4 Comments
Christopher Riesbeck:

+1 for abolishing entries in a mental lexicon, Sean. When we were developing understanding systems for episodic knowledge-based reasoning systems at Yale in the late 1980s, it became clear that language comprehension needed all knowledge, not just some small bits crammed into a lexicon. For you, Elman was the inspiration. For me, it was Quillian's Teachable Language Comprehender (https://dl.acm.org/doi/10.1145/363196.363214). TLC understood phrases like "the lawyer's client" or "the doctor's patient" by finding the connecting paths in a semantic network. TLC was a model with no lexicon! Our application of that idea to our episodic knowledge networks was Direct Memory Access Parsing, a model of language understanding as lexically-cued memory recognition. Will Fitzgerald and I wrote a non-technical introduction to the idea in a response to Gernsbacher's Language Comprehension as Structure Building (https://www.cogsci.ecs.soton.ac.uk/cgi/psyc/newpsy?5.38). More technical points are in https://users.cs.northwestern.edu/~livingston/papers/others/From_CA_to_DMAP.pdf.
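The path-finding idea behind TLC can be sketched in a few lines. This is an illustrative toy, not Quillian's actual implementation: the network links below are invented, and comprehension is reduced to a breadth-first search for the shortest chain of links between two concepts.

```python
from collections import deque

# Tiny hand-built semantic network: concept -> set of linked concepts.
# The links are invented for illustration.
SEMANTIC_NET = {
    "lawyer": {"client", "law", "person"},
    "client": {"lawyer", "person"},
    "doctor": {"patient", "medicine", "person"},
    "patient": {"doctor", "person"},
    "person": {"lawyer", "client", "doctor", "patient"},
}

def connecting_path(start, goal):
    """Breadth-first search for a shortest path linking two concepts."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in SEMANTIC_NET.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# "the lawyer's client": a direct link, so the path is just two nodes.
print(connecting_path("lawyer", "client"))
```

The key point is that no lexicon entry for "client" stores its relationship to "lawyer"; the relationship is simply whatever path the knowledge network supplies.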

Quillian 1969, myself 1986, Elman 2009 -- we're due for another attempt to dump the mental lexicon. It all depends -- as it should -- on the quality of the knowledge base.

Sean Trott:

Very cool to see the thread of this line of thinking through research on Natural Language Understanding—another nice example of the same kinds of debates and issues resurfacing time and again. I'll check out those papers, thank you!

JM:

Perhaps it is interesting to note that "cap" in "market cap" could be considered a close mapping if you understood it directly as something like "the head/cap on top of the market value." This would be an incorrect understanding of "cap" in this context (the term is short for "capitalization" and is thus connected to "pen cap" only via the Latin root), but the end result is entirely coherent, and the 'vectors' involved should work fine for most sentences using either term. Our mental models can connect the two words sensibly even if they get the historical etymology (and thus the common dictionary definition) wrong.

Sean Trott:

Great point—yeah, polysemous meanings are definitely *related*, often metaphorically or metonymically so. You could also have a baseball cap and a bottle cap, and both of those clearly relate semantically (probably metonymically) to pen cap: there's some kind of spatial/conceptual mapping about something going on top of something else, as you say. I think those are actually better examples than "market cap," since, as you note, the "cap" in market cap is not really related to the pen cap meaning.

Crucially though, those relations (pen cap vs. baseball cap) are usually thought to be less systematic than relations that generalize across words, such as "Animal for Meat" (marinated lamb, marinated chicken, etc.) or "Contents for Container" (pass me the water/wine/etc.), "Container for Contents" (I drank the whole bottle/glass/etc.), "Place for Institution" (White House, Wall Street, etc.), and so on. With regular polysemy, the expectation is that you should see the same relation across multiple words in the same language and also that it'll have a very good chance of recurring across languages; irregular polysemy might recur across languages too (more so than homonymy), but the idea is that it's less likely to than regular polysemy. This paper does some great work cataloguing examples of these systematic relations across multiple languages: https://www.sciencedirect.com/science/article/pii/S0024384114002885

In terms of theories of the mental lexicon, regular polysemy is easier to handle under a "core representation" than irregular polysemy is because you can just list a bunch of generative/productive rules that apply to multiple words ("Animal for Meat"). With irregular polysemy the relations are more idiosyncratic and may apply to only specific words, even though there clearly *is* a relation. That puts you in a tough spot where you still just have to enumerate the different meanings.
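That contrast can be made concrete with a small sketch. This is an invented illustration of the distinction described above, not a real lexicon model: regular polysemy is captured by productive rules that apply to whole word classes, while irregular polysemy has to be listed sense by sense.

```python
# Productive rules for regular polysemy: each rule derives a new sense
# from a word's core sense, and applies to a whole class of words.
REGULAR_RULES = {
    "Animal for Meat": lambda w: f"meat of the {w}",
    "Container for Contents": lambda w: f"contents of the {w}",
}

# Which rules apply to which words (class membership, invented here).
WORD_CLASSES = {
    "lamb": ["Animal for Meat"],
    "chicken": ["Animal for Meat"],
    "bottle": ["Container for Contents"],
    "glass": ["Container for Contents"],
}

# Irregular polysemy: the senses are clearly related, but no rule
# generalizes across words, so each sense is simply enumerated.
IRREGULAR_SENSES = {
    "cap": ["covering for a pen", "covering for a bottle",
            "hat worn on the head"],
}

def senses(word, core_sense):
    """Core sense, plus rule-derived senses, plus any listed senses."""
    derived = [REGULAR_RULES[r](word) for r in WORD_CLASSES.get(word, [])]
    return [core_sense] + derived + IRREGULAR_SENSES.get(word, [])

print(senses("lamb", "a young sheep"))
```

Note the asymmetry: adding a new animal word to `WORD_CLASSES` gets the meat sense for free, whereas every new irregular sense has to be written into `IRREGULAR_SENSES` by hand—exactly the "tough spot" of enumeration.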

The state-space model sidesteps many of these issues by not worrying about how these things are "stored." There are explanatory limitations to such an approach (I'll discuss this in an upcoming post), but it is nicely elegant.
