12 Comments
Aug 9 · Liked by Sean Trott

Very nice discussion. The LLMs as crowds metaphor ties to the joke:

An LLM walks into a bar. "What'll you have?" asks the bartender. The LLM looks around and replies "what is everyone else having?"

My current stance is the alien metaphor. Specifically I imagine an electrically complex gas cloud on Jupiter that after decades of listening to radio and TV begins generating and transmitting new episodes of "I Love Lucy", "All in the Family", etc. I accept LLM behavior as robust and flexible enough to count as intelligence, though a kind far different than our own.

author

I hadn't heard that before, I like that joke!

And yes, I think that's a fair perspective. The challenge moving forward (in my view) will be testing the limits of that robustness and flexibility and coming up with a theory of LLM "cognition" that's satisfying and accurate.


Absolutely agree.

Aug 9 · Liked by Sean Trott

Maybe this falls somewhere between copies and crowds in your categorization, but Alison Gopnik's framing of LLMs as a "cultural technology"—i.e., like the Internet, libraries, and printed text, which all serve to enhance cognitive capacity and aid knowledge transmission—is an interesting metaphor as well!

Her recent talk: https://www.youtube.com/watch?v=qoCl_OuyaDw

author

This looks great, thanks! I didn't realize she'd given a talk on this topic—I'm a big fan of her work, so I'm looking forward to watching it.

I actually initially had another section I was going to include called "LLMs as tools" but I had a harder time finding good examples. I'm personally very partial to the idea that LLMs can be well-understood as a kind of cultural technology (much like language itself). Interesting that she mentions libraries specifically—I'm working on another post exploring a thought experiment in which LLMs are construed as big libraries.


I can't remember if we've talked about "cognitive gadgets theory" as postulated by Cecilia Heyes? Tools of the mind!

https://www.educationnext.org/cognitive-gadgets-theory-might-change-your-mind-literally/


I'm pretty partial to the "LLMs as tools" framing, but maybe I'm just biased because the only thing I've personally found them useful for is coding, like GitHub Copilot. So far it feels like the only substance underneath all the hype is really good (carefully prompted) autocompletion tools.

Cool, looking forward to that post! :)


Interestingly enough, I've talked to Murray Shanahan about writing something around what Richard Rorty would say about LLMs today. One idea in my mind: Rorty described metaphor as the key tool we use to shift away from a current vocabulary to a new one. Your essay here nicely explores what metaphors we use for our mental model of LLMs, but I wonder, will we ever see an LLM invoke new metaphors to shift its own vocabulary? (I have my doubts!)

author

I'm also somewhat doubtful, though it'd be fascinating to see!


Sigh...

I confess that I do not have exhaustive or even very extensive knowledge of current LLMs, but I am led to believe that all are linear mathematical interpolation and extrapolation systems. As always in mathematics, one does well to study the properties of the general class of systems, after which it becomes very easy to fully understand the properties of restricted sub-classes, such as LLMs. Analogies can serve only to cloud or distort real understanding.

It is well known that linear mathematical interpolation and extrapolation processes are approximated only to quite a limited extent in the cognition of mammals, including humans. (Too little is known, as yet, of non-mammalian cognition to permit confident generalizations.) Thus analogies between LLMs and human cognition are of very limited validity at best, and likely to significantly mislead.

author

Yes, that's fair, and I think it'd be great if there were more theoretical/mathematical work characterizing LLMs. I'm certainly not the first person to note this, but the ML field sometimes feels quite empirical and a little disconnected from the more theoretical perspective.


It is somewhat odd. I've lost touch with most of the AIers I once knew, but I believe it is still true that many do have pretty good backgrounds in mathematics.

About 60 years ago, as I recall, I was eating with Richard Bellman and Robert Kalaba before Dick was to present a paper. We fell to talking about a mutual acquaintance who had earlier presented a paper featuring wildly exaggerated claims for AI. Dick, who had a famously sharp tongue, quipped that too many of the enthusiasts for artificial intelligence seemed to suffer a deficit of natural intelligence.

There was a little too much truth to it for comfort, but I think we would all have agreed that the real problem was (and remains) less one of deficient natural intelligence than of a failure to think like mathematicians about what is fundamentally a mathematical issue.
