How I started studying large language models
It all comes back to what we mean by what we say.
These days, it feels like almost everyone I talk to has heard of Large Language Models (LLMs). Even if they’re not familiar with the term “LLM”, they probably know about ChatGPT.
As a researcher studying LLMs in multiple capacities, it’s been strange to observe this societal transition. Before ChatGPT, explaining what I worked on and why it might matter often required giving some background details about what a “language model” was and how it worked. After ChatGPT, I’ve found that people have a much easier entry point into my research. Perhaps not coincidentally, interest in this newsletter has also grown considerably.
But I also don’t want to give the impression that I’ve always been interested in language models. When I first entered graduate school in 2016, I was interested in the complexities of human communication (and I still am!): how do we figure out what other people mean, especially when what they say is ambiguous or otherwise under-specified?1 This interest in ambiguity eventually pushed me towards ambiguous words, i.e., homonyms (“river bank” vs. “financial bank”) and polysemes (“marinated lamb” vs. “friendly lamb”). It wasn’t until my third year of graduate school (2018-2019) that I came across a series of papers by former UCSD professor Jeffrey Elman2 that completely reshaped my research interests and how I thought about “meaning” more generally.
The original view: words as entries
What do words mean?
The traditional approach in Psychology and Linguistics conceives of words as mapping onto meanings, similar to a dictionary. In some cases, the dictionary metaphor has been taken quite literally, leading to the concept of a “mental lexicon”. In this mental lexicon, each word is an “entry”, complete with the core grammatical properties necessary to understand that word. For example, the entry for house might look something like:
House (n.): building for human habitation.
There was considerable debate about how these properties were represented. Some theorists advocated for a feature-based approach:
(+artifact) (+shelter) (+for humans)
Others advocated for more imagistic representations, e.g., some kind of rough “mental picture” of what a house looks like; other theorists emphasized the taxonomic relationships between concepts (a “cat” is a kind of “mammal”, which is a kind of “animal”, etc.); and still others advocated for an “all of the above” approach. Despite their differences, all of these approaches assumed that the mind “stored” meanings and that these meanings were arranged like those in a dictionary. A natural question to ask at this point is: how did this system contend with ambiguous words?
The solution to homonyms—words with multiple, unrelated meanings—was basically to (again) mirror the structure of a dictionary. Words like “bank” were simply given two distinct entries:
Bank (1) (n.): A financial establishment that holds and invests money deposited by consumers.
Bank (2) (n.): The terrain alongside a river or stream.
A number of theoretical and empirical research papers were dedicated to figuring out exactly how and when these different meanings were accessed in the context of language comprehension. Again, a common thread was that distinct meanings were, in fact, “stored” in distinct entries in the mental lexicon.
The real challenge began with polysemous words: those with distinct but related meanings (e.g., “marinated lamb” vs. “friendly lamb”). Polysemy is much more common than homonymy3, and has also inspired much more debate among cognitive scientists as to the nature of its representation. One school of thought is that polysemous meanings are represented more or less like homonymous ones, i.e., in distinct entries; this is called the “Sense Enumeration” view. Another view is that distinct polysemous meanings are derived from a single “core” representation, using a set of semantic rules or relations (like “animal for meat”); this is called the “Core Representation” view. You might think of this latter view as also being like a dictionary entry with “sub-senses”:
Chicken —> a. The common domestic fowl.
—> b. The flesh of the animal used as food.
The long and short of it is that, as you might expect, these different accounts each have their own theoretical issues. The Sense Enumeration accounts have a hard time dealing with the fact that people often respond differently to polysemous meanings than homonymous ones (unsurprisingly, because they are more related!). The Core Representation accounts, in contrast, don’t always give satisfying explanations of how these meanings are generated “on the fly”: some polysemous meanings are systematic (like “animal for meat”), but some are harder to represent as formal rules (e.g., “pen cap” vs. “baseball cap”4).
Further, neither account really grapples with the more general problem that all words—even monosemous ones—mean slightly different things in different contexts. Does “paint” mean different things when we say “She painted portraits” vs. “She painted houses”? What about “She painted her fingernails”? Distinguishing ambiguity from vagueness is notoriously difficult: some lexicographers have gone so far as to assert that they don’t believe in word senses.5
What goes into these meanings anyway?
I mentioned earlier that Jeff Elman wrote a series of papers motivating an alternative view of the mental lexicon. From the standpoint of historical impact and comprehensiveness, I think the most important of these papers is probably his 2009 paper, “On the Meaning of Words and Dinosaur Bones: Lexical Knowledge without a Lexicon”, though an earlier 2004 paper also provides a helpful overview of his account.
One of the things I really like about Elman’s 2009 paper is that he carefully describes both theoretical and empirical problems with the traditional “mental lexicon” view. In addition to the issues I described above, one central challenge is that people seem to encode a remarkable amount of world knowledge in their understanding of a word—and crucially, how those words fit together in a broader sentence context.
For example, people rapidly detect incongruities between an instrument and its patient, as in “Susan used the scissors to cut the expensive wood”. They also have strong expectations about verbs and their typical patients, driven by the agent of a sentence: “The lifeguard saved ____” is more likely to end with lives than money. Crucially, people can also override these expectations in the right context (such as if the lifeguard is specified to be looking for a discount at the store). As Elman writes (bolding mine):
Cumulatively, all of these data are usually interpreted as indicating a far more central role for the lexicon in sentence processing than was initially envisioned, and they also suggest that lexical representations contain a significant amount of detailed word-specific information that is available and used during on-line sentence processing.
The question is where all this information gets stored:
So why would one even think of putting this sort of information into the lexicon? This raises the more fundamental question: What criteria should we use to decide what goes into the lexicon and what does not?
This echoes an older debate about whether the mental lexicon is more like a dictionary (sparse and relatively parsimonious) or an encyclopedia (rich and intimately connected to world knowledge). Elman’s point is that people clearly extract a bunch of information from words very rapidly during language processing. That makes the sparseness of the dictionary view somewhat untenable—yet it also raises the question of where we draw the boundaries (if at all) around these lexical entries.
Interlude: what makes a good explanation?
At this point, some readers might be wondering why any of this matters. Of course, the mind is not literally a dictionary—it’s a (possibly) helpful metaphor. Moreover, unless you’re a dualist, you probably think that this is “all just neurons firing”. Why does it matter how many entries we have for “chicken”?
There are a couple of ways to think about this.
First, while I agree that many mental phenomena might be ontologically reducible to neural phenomena6, those neural phenomena may not serve adequately as explanations for the mental phenomena. That is, it’s unclear that we’ll be able to derive a satisfying explanation for something like “what do words mean?” by describing the actions of neurons in the brain. Further, even if such a reductionist explanation is possible, it might be helpful to start out with higher-level constructs such as “lexical entries”.
It’s also important to note that “this is all neurons firing” is not an explanation. It’s a hand-wave that just shifts the burden of explanation elsewhere. You could also say “this is all atoms bumping around” or “this is all electricity oscillating through tissue”. Each of these assertions may point to a useful avenue for future explanations but is not in itself an explanation and should not be accepted as such.
Ultimately, though, I’m actually sympathetic to the concern implicit in those questions at the top of this section: namely, that certain theoretical debates really only arise because of limitations built into the underlying metaphor7 we’re using to construe the thing we’re interested in. If you insist on viewing word meanings as dictionary entries, then you run into all these questions about how many entries there are and what information goes into those entries.
Elman’s move was to propose, essentially, a different metaphor for thinking about word meanings.
An alternative view: words as cues
Instead of viewing words as mappings onto stored “entries”, Elman’s argument was that we should view them as cues—just like other forms of sensory stimuli. And instead of viewing the mind as a passive dictionary, we can conceptualize it as a kind of dynamical system, such as a simple recurrent network (SRN).8 I’ve included a rough sketch of an SRN below:
[Figure: a rough sketch of a simple recurrent network (SRN), with input, hidden, and output layers, plus a recurrent connection feeding the previous hidden state back into the hidden layer.]
Language unfolds over time, and this temporal unfolding is built into the very structure of an SRN. Like a normal feed-forward neural network, an SRN has an input and an output (i.e., what it’s trying to predict), and a hidden layer mediating between them. For example, the job of an SRN might be to predict the next word based on the previous word (“and ___”). What makes SRNs particularly interesting is their recurrent connection. In addition to a normal hidden state, they also “remember” their previous hidden state, i.e., the state produced at time t - 1. This previous hidden state feeds back into the hidden state at time t.
Concretely: the hidden state at time t is the result of both the current input and the previous hidden state.
This may seem superficially similar to a feedforward neural network with a large context window, but it’s importantly different. The SRN doesn’t attend to multiple words at once: it processes its input incrementally, and the effect of each new word is an interaction between that word and the accumulated effect of all the words that came before it.
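To make that recurrence concrete, here’s a minimal NumPy sketch of a single SRN step. This is just a toy illustration (not code from Elman’s papers): the weight names and layer sizes are arbitrary, and a real model would learn its weights from data rather than use random ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a tiny vocabulary and hidden layer.
vocab_size, hidden_size = 10, 8

# Randomly initialized weights (in practice, learned via backpropagation).
W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # previous hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> next-word prediction

def srn_step(x_t, h_prev):
    """One time step: the new hidden state mixes the current input with the
    previous hidden state; the output is a distribution over next words."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    logits = W_hy @ h_t
    y_t = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
    return h_t, y_t

# Process a "sentence" of one-hot word vectors, one word at a time.
sentence = [0, 3, 7]  # hypothetical word indices
h = np.zeros(hidden_size)
for word_id in sentence:
    x = np.zeros(vocab_size)
    x[word_id] = 1.0
    h, y = srn_step(x, h)  # h now reflects the entire preceding context
```

The key line is the hidden-state update: the state at time t is a function of both the current input and the state at time t − 1, which is how context accumulates word by word.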
In his 2004 article, Elman uses the example of the word “runs”. If this lexical stimulus is presented to a person (or an SRN), it’ll presumably elicit some kind of representation of the word and what it means. But the exact nature of that representation will depend on the context, such as the previous words that were uttered. If we think of an SRN’s representations as a “state-space” (i.e., encapsulating all the dimensions of its hidden state), the location in that state-space will be a function of both the word itself (“runs”) and the prior state of the network, which will itself be a function of the previous word (e.g., “cheetah” vs. “toddler” vs. “clock”):
[Figure: the word “runs” landing in different regions of the network’s hidden state-space depending on the preceding word, e.g., “cheetah”, “toddler”, or “clock”.]
Notably, such a system could in principle produce the kinds of behavior to which we might be tempted to attribute constructs such as “lexical entries”. That is, a network’s state (and predictions) for the same word (“runs”) will be different depending on which words the network has already encountered. It might also behave more similarly for polysemous meanings (e.g., “marinated chicken” vs. “friendly chicken”) than for homonymous ones (e.g., “river bank” vs. “financial bank”). (Networks trained in this way also develop a number of other interesting representations, such as those distinguishing animate from inanimate nouns.)
Crucially, however, this all happens without explicit supervision. If all goes well in training, the network doesn’t have to be directly told that “river bank” and “financial bank” are homonyms. Rather, it can learn distinctions like these from the training data.9
When I first read Elman’s 2009 paper in 2017, I was really taken with this idea. I’ve always been suspicious of accounts that rely on too many explicit constructs we can’t directly observe—these things have a way of exploding combinatorially—and this alternative view of word meaning seemed incredibly elegant and parsimonious. I wondered (with the naïveté of an early graduate student): why isn’t anyone working on this?10
Now, in 2025, these ideas may not seem particularly noteworthy.11 But in 2017, the dominant vector-based approach to word meanings was word2vec, which was powerful but also had clear limitations, like the fact that you had only a single vector for each word (i.e., the same vector for “river bank” and “financial bank”). It was a useful means to an end (e.g., as a quick proxy for word similarities), but it didn’t seem like a viable model of meaning to me: I thought (and still do) that meaning has to be contextualized.
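To illustrate that limitation, here’s a rough sketch using gensim’s word2vec implementation; the toy corpus and parameters are invented for the example. No matter which sentence “bank” appears in, the model stores exactly one vector for it.

```python
from gensim.models import Word2Vec

# A tiny, made-up corpus purely for illustration.
corpus = [
    ["she", "deposited", "money", "at", "the", "bank"],
    ["they", "walked", "along", "the", "river", "bank"],
]

# Train a toy word2vec model (gensim 4.x API).
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=50)

# One vector per word type: the "financial" and "river" uses of "bank"
# both map onto the same representation.
vec = model.wv["bank"]
print(vec.shape)  # (50,)
```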
At the time, the most popular architecture for language modeling was probably a long short-term memory network (or LSTM), which is based on an SRN. Along with my labmates, I began experimenting with the “state-space” of an LSTM, i.e., asking how well its hidden states distinguished different meanings of the same word in different contexts. Later that year, models like ELMo (another LSTM) and BERT (a transformer) came out; both (but especially BERT) were more powerful and also easier to use.
All this led to a project investigating whether the contextualized representations in models like BERT could account for human behavioral responses to ambiguous words. Some of that work was published at ACL in 2021, and a more fully-formed version was published in Psychological Review in 2023. I’ll talk about the details of that work in a later post, but the key point here is that these early language models first appealed to me as a way to test this alternative conception of word meaning.
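For a sense of what that kind of probing looks like in practice, here’s an illustrative sketch (not the analysis code from those papers) that pulls BERT’s contextualized representation of “bank” out of two different sentences with the HuggingFace transformers library and compares them:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return BERT's final-layer representation of `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    # Find the (first) token position corresponding to the target word.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

river = embed_word("she sat on the river bank", "bank")
money = embed_word("she deposited money at the bank", "bank")

cosine = torch.nn.functional.cosine_similarity(river, money, dim=0)
print(f"similarity between the two 'bank' tokens: {cosine.item():.2f}")
```

In a real analysis you’d want to handle sub-word tokenization more carefully and average over many sentence contexts, but the basic move is the same: the “same” word gets a different representation in each context, and you can ask whether those differences track human judgments.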
From SRNs to LLMs
The theoretical backbone of much of my current research program—the use of LLMs as a kind of “model organism”—emerged between 2018 and 2022. Initially drawn in by Elman’s “words as cues to meaning” framework, I began to think about what other kinds of questions neural networks trained on the statistics of language could help answer.
Part of this development in my thinking was driven by improvements in the underlying technology. Sometimes this is seen as a bad thing, as though researchers were simply chasing the latest hype. But I think that in many cases it makes sense: new tools allow us to do different things and address questions in potentially new and informative ways.
The other, more influential factor affecting my research trajectory was my immediate research environment. I was fortunate enough to be working with a great PhD advisor, Benjamin Bergen, who has a very discerning eye for what makes a research question interesting and tractable. Some new students also joined Ben’s lab during this period (James Michaelov, Cameron Jones, and Tyler Chang); I’ve had a ton of conversations over the years with each of them about LLMs and human cognition. It really is true that spatial proximity drives a ton of intellectual innovation.12
All of which is to say: by ~2020-2021, we’d coalesced around this idea of using LLMs as distributional baselines. Here’s how I defined distributional baselines in a previous post:
This is what I call the distributional baseline question: if human behavior on a psycholinguistic task can be approximated by an LLM, it suggests that the mechanisms responsible for generating human behavior could in principle be the same as those responsible for generating LLM behavior. That is, linguistic input is sufficient to account for human behavior. (Note that sufficiency ≠ necessity.)
This culminated in a collaborative project investigating the performance of GPT-3 on the false belief task, an instrument commonly used to assess Theory of Mind. At the time, OpenAI’s Python API made it possible to access the probabilities assigned to specific word tokens by the model, which meant we could carefully probe the probability assigned to correct vs. incorrect answers on the task. We found that GPT-3 did show sensitivity to a character’s implied belief states, but it under-performed humans (on average). This work was eventually published in Cognitive Science in 2023.
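The same style of probing is easy to reproduce with open models. Here’s a minimal sketch, using GPT-2 as a stand-in for GPT-3 and an invented false-belief-style vignette, of how one might compare the log-probability a model assigns to two candidate completions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def completion_logprob(prompt, completion):
    """Sum of log-probabilities the model assigns to `completion` given `prompt`.
    Assumes `completion` starts with a space so the tokenizations line up."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the completion tokens, each conditioned on everything before it.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

# A made-up false-belief-style vignette, purely for illustration.
prompt = ("Sally puts her ball in the basket and leaves. "
          "Anne moves the ball to the box. "
          "When Sally returns, she looks for the ball in the")
print(completion_logprob(prompt, " basket"))  # consistent with Sally's (false) belief
print(completion_logprob(prompt, " box"))     # consistent with the ball's true location
```

Comparing the two numbers is the basic move: whichever completion receives the higher log-probability is the one the model “prefers” given the vignette.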
Where I stand now
It’s now July 2025: about seven or eight years after I first turned to those early articles by Jeff Elman. I’m interested in many of the same research ideas, but (I like to think) I’ve developed a better perspective on them. I’ve found that this sort of thing is common in research: you find yourself circling around the same cluster of ideas, but you’re not repeating yourself, exactly, or at least not usually. Rather, you’re gradually honing those ideas: figuring out how to articulate them more precisely; identifying weak points in an argument and modifying accordingly; and putting them into practice.
A major development in the last couple of years is that I’ve started moving away from using state-of-the-art, closed-source models (like GPT-4) in my research. Not coincidentally, over that same time period, the companies building those models have focused their efforts more on building useful products than on supporting reproducible research tools (a move I find understandable, given that they are companies and obviously need to make money). There are a few research questions where I think it still makes sense to use such models (i.e., when you’re asking questions about “LLM-equipped software tools” or how humans interact with them), but when it comes to LLM-ology, I’m generally in favor of using open models, i.e., those for which both the weights and training data are made available. I’ve been moving in the direction of mechanistic interpretability research, and you can’t really do that kind of research if you don’t have access to the internal states of the model. But even if you’re only interested in model behaviors, it’s important that other researchers can reproduce your methods and analyses as closely as possible.
Related to this, the biggest change is probably that I find myself more and more interested in the epistemological foundations of the emerging field of LLM-ology. It’s not that I wasn’t interested in these topics before—like I wrote above, I’ve circled around the same cluster of ideas for a while. But they’ve really been thrown into sharp relief as more and more research is conducted on LLMs without a clear sense of what populations we’re drawing inferences about or how best to assess the capabilities we’re interested in. These days, I am drawn most to questions about how we know what we know.
The field of research focused on meaning in context is called Pragmatics.
I met Elman only a few times during my time in graduate school; sadly, he passed away in 2018. Elman’s pioneering work on recurrent neural networks (RNNs) in the early 1990s was hugely influential on the development of neural network models of language. In particular, I recommend his 1990 article, “Finding Structure in Time”.
Some estimates put the rate of polysemous words in English as well above 50%.
I initially had “market cap” here, but a commenter correctly pointed out that the “cap” in that phrase actually means something different; it was an interesting example of me projecting a shared etymology/semantics onto the word because of the shortening of “capitalization” over time. But more generally, the argument is that even with clearly related concepts like “pen cap” and “baseball cap”, the meanings are harder to represent with a formal rule than with something like “Animal for Meat”. Of course, all this stuff occupies a spectrum and there’s no fine line dividing regular from irregular polysemy.
Notably, there’s an important distinction between accepting that word senses are a convenient line to draw in the sand when making a dictionary, vs. arguing that they are psychologically real.
Though not all—many aspects of cognition are grounded not only in the brain but in the body more generally or even in distributed systems.
Or in some cases, the entire paradigm propping up the field.
Sometimes called an “Elman net” (given that Elman pioneered the idea), SRNs were the theoretical backbone of many neural network architectures to come.
The question of what kind of training data you need (and how much) to learn distinctions like these (or others) is itself an interesting theoretical and practical question!
(They were.)
Modern transformer language models are, of course, far more prevalent and powerful, and they’re also natural operationalizations of this idea of words as “locations in state-space”. I do think it’s worth noting that transformers work a little differently from recurrent models; I have a personal aesthetic preference for the structure of a recurrent model, even if recurrent models have fallen slightly out of fashion (whether state-space models like Mamba ultimately succeed remains to be seen).
Like me, those students have all graduated, but Ben’s lab is still going strong, and I’ve recently published a paper with Sam Taylor, one of his current graduate students.
+1 for abolishing entries in a mental lexicon, Sean. When we were developing understanding systems for episodic knowledge-based reasoning systems at Yale in the late 1980s, it became clear that language comprehension needed all knowledge, not just some small bits crammed into a lexicon. For you, Elman was the inspiration. For me, it was Quillian's Teachable Language Comprehender (https://dl.acm.org/doi/10.1145/363196.363214). TLC understood phrases like "the lawyer's client" or "the doctor's patient" by finding the connecting paths in a semantic network. TLC was a model with no lexicon! Our application of that idea to our episodic knowledge networks was Direct Memory Access Parsing, a model of language understanding as lexically-cued memory recognition. Will Fitzgerald and I wrote a non-technical introduction to the idea in a response to Gernsbacher's Language Comprehension as Structure Building (https://www.cogsci.ecs.soton.ac.uk/cgi/psyc/newpsy?5.38). More technical points are in https://users.cs.northwestern.edu/~livingston/papers/others/From_CA_to_DMAP.pdf.
Quillian 1969, myself 1986, Elman 2009 -- we're due for another attempt to dump the mental lexicon. It all depends -- as it should -- on the quality of the knowledge base.
Perhaps it is interesting to note that "cap" in market cap could be considered a close mapping if you understood it to mean directly something like "the head / cap on top of the market value". This would be an incorrect understanding of cap in this context (since the term is short for capitalization and thus is connected to pen cap only via the Latin root) but the end result is entirely coherent and the 'vectors' involved should work fine for most sentences using either term. Our mental models can connect the two words sensibly even if they get the historical etymology (and thus the common dictionary definition) incorrect.