Is ChatGPT "grown, not made"?
What we do and don't know about large language models.
A feature of modern large language models (LLMs)—the architecture underlying technologies like ChatGPT—that is often hard to convey to non-experts is their opacity.
Systems like ChatGPT, after all, are human artifacts. They didn’t spontaneously materialize one day from the sky or from the depths of the ocean; they are the result of human research, engineering, and product design. We tend to assume such artifacts are, ultimately, comprehensible: many of us live, arguably, in a “disenchanted world”, in which we assume that much (if not all) of that world—at minimum, the human-designed portion of it—can be explained in terms of legible, mechanical principles.
Put another way: I don’t know exactly how a toaster oven works, but I assume that someone, somewhere does.
It seems strange, then, to assert that even the people building LLMs don’t entirely understand how or why they work. How could we build them if we didn’t understand them? I don’t want to overstate the case here: as I point out below, we obviously know some things about how LLMs work. But there are, nonetheless, epistemic gaps. The reason for these gaps is sometimes conveyed in the following metaphor:[1]
LLMs are grown, not made.
As I’ve written before, all metaphors have their strengths and weaknesses: any given metaphor highlights certain aspects of the frame it’s describing and obscures other aspects. This metaphor is no different. I like some aspects of it, and I’ve relied on it (or a variant of it) in the past to communicate the basic idea of LLM opacity, but I also have some reservations about applying it uncritically. Below, I attempt to convey more precisely what I mean when I say “we don’t understand how or why LLMs work”; at the end of the post, I return to the “grown, not made” metaphor and discuss my mixed feelings in more detail.
What we do know
There is much we do know. We know, first, what LLMs are trained to do: predict upcoming words on the basis of their prior context.[2] For so-called “open”[3] models, we also know what data the LLM was trained on and, ideally, in what order.
We also know the mechanisms underlying this training process: as Tim Lee and I wrote in our explainer on LLMs, you can think of training as updating a bunch of “knobs”—like changing the temperature in the shower—to help the LLM get better and better at predicting upcoming words. The direction and magnitude by which we turn each knob is based on the error signal the LLM gets during training. After lots of examples, it gets quite good at predicting words. Moreover, for open-weight models, we know the final weights—roughly, the values of these “knobs”—resulting from this training process, and we know the mathematical operations by which particular inputs are transformed (via a “forward pass”) into predictions about the next word.
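To make the “knobs” picture concrete, here is a minimal sketch of gradient descent on a toy problem. It is not how any production LLM is trained: real models adjust billions of weights to predict words, whereas this example has two hypothetical knobs (`w` and `b`) and fits a straight line. The function name, learning rate, and data are all assumptions chosen for illustration. What it shares with LLM training is the loop itself: make a prediction, measure the error, and turn each knob a little in the direction that reduces that error.

```python
def train_knobs(examples, steps=500, learning_rate=0.05):
    """Fit y ~ w * x + b by repeatedly nudging two 'knobs', w and b."""
    w, b = 0.0, 0.0  # arbitrary starting values for the knobs
    n = len(examples)
    for _ in range(steps):
        grad_w, grad_b = 0.0, 0.0
        for x, y in examples:
            error = (w * x + b) - y        # the error signal for this example
            grad_w += 2 * error * x / n    # how much w contributed to the squared error
            grad_b += 2 * error / n        # how much b contributed to the squared error
        # Turn each knob against its gradient; the size of the turn scales with the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated by the rule y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_knobs(examples)
print(round(w, 3), round(b, 3))
```

After enough passes over the examples, the knobs settle close to the values that generated the data. Note that nothing in the loop says what the final values should be; they fall out of the data and the error signal.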
Of course, systems like ChatGPT are also subject to extensive “post-training”, which includes teaching them to follow instructions, giving them human feedback, and incentivizing them to “think step by step” when solving hard problems. We also know, more or less, how this works.
It’s also worth noting, here, that even though I don’t know all the training details or weights for closed-source models like ChatGPT, OpenAI employees presumably do (even if that cumulative knowledge is distributed across multiple individuals). The information is knowable and known.
Why is this not enough?
What we don’t know
For some people, it is. As I wrote in my mechanistic interpretability explainer, some have argued that interpreting LLMs is effectively a solved problem. In fact, some researchers in the field don’t even seem to think it’s a relevant problem: recently, a manuscript I submitted was criticized by one reviewer on the grounds that we already know, more or less, how transformer language models work—there was no need to conduct further research on the mechanisms they use to contextualize the meaning of words. I don’t think this is a widespread opinion in the field (the other reviewers liked the paper, and found the topic interesting and important), but it is held by some.
That review was a good opportunity to clarify exactly what it means to say “we don’t understand how or why LLMs work”. When people say this, they usually mean at least one of two things (or both).
First, LLMs appear capable of doing things that they weren’t explicitly trained to do. In the course of learning to predict upcoming words, LLMs acquire internal representations and mechanisms that enable them to do this more effectively; these mechanisms, in turn, allow LLMs to produce predictions that are sensitive to myriad factors, like whether an utterance is grammatically well-formed, whether a sentence conveys a plausible event, or even what a character in a story knows or doesn’t know.
Now, as I’ve argued before, the question of whether LLMs actually possess certain capacities (like “grammar” or “Theory of Mind”) is a deep question about the construct validity of the measures we use to assess those capacities, which will require extensive philosophical and empirical work to resolve. The crucial thing, however, is that LLMs weren’t explicitly trained to produce these interesting behaviors: they are a byproduct of the thing the LLM has actually been trained to do.[4] Moreover, we lack a thorough understanding of which capacities we should expect to develop through this training process, or why these capacities would develop at all. We have hypotheses (e.g., the idea that learning to predict text encourages a model to “reverse-engineer” the causal process giving rise to that text), but this is very different from the mechanical understanding we have about (say) a toaster oven.
Second, although we know the values of the weights (“knobs”) in an LLM, we don’t know how exactly those weights implement the computations and behaviors we observe. Epistemologically, you might think of this as a gap between levels of analysis. We have a high-level description of what the components of a transformer language model do (e.g., attention heads are a “matchmaking service for words”); we also know, at a very low-level, the precise mathematical operations underlying a specific instantiation of those components (e.g., the result of the query/key/value operation for a particular attention head for a particular input). But in the absence of empirical investigation, we don’t know which behaviors those individual components subserve.
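The low-level side of this gap can be written down in a few lines. The sketch below implements the standard scaled dot-product attention operation for a single head, with a causal mask so that each position only looks at itself and earlier positions. It makes some simplifying assumptions: in a real transformer the queries, keys, and values are produced by learned projections of the token representations, whereas here they are hand-picked, hypothetical vectors passed in directly, and the function names are my own.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_head(queries, keys, values):
    """Causal scaled dot-product attention for one head, one position at a time."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score the current query against every key at positions 0..i.
        scores = [dot(q, keys[j]) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        # The output is a weighted mix of the value vectors at those positions.
        mixed = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional vectors for a 3-token input.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs, weights = attention_head(queries, keys, values)
for row in weights:
    print([round(w, 3) for w in row])
```

Every number here is fully inspectable, which is the point: knowing exactly how the weights at each position were computed does not, by itself, tell you what job the head is doing in the model's overall behavior.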
Figuring this out is, in large part, the goal of mechanistic interpretability. And I’d argue we’re making some progress! For instance, interpretability researchers have discovered interesting components like “induction circuits” in a number of different LLMs. These circuits determine whether the current token (e.g., “the”) has previously occurred in the context (e.g., at position t), then predict that the next token will be whichever token followed that earlier occurrence (e.g., the token at position t + 1). There’s even some evidence that these circuits might play a role in important abilities like in-context learning (though recent evidence suggests the situation is more complex than it initially appeared).
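The behavior attributed to an induction circuit can be stated as a simple rule, sketched below. This is only a description of the input-output pattern, under the assumption that we look at the most recent earlier occurrence; it is not a claim about how the circuit is implemented in a model's weights, where the same pattern is carried out by attention heads working together. The function name is hypothetical.

```python
def induction_prediction(tokens):
    """Predict the next token by copying whatever followed the most recent
    earlier occurrence of the current (last) token. Returns None if the
    current token has not appeared earlier in the context."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions t, excluding the current one.
    for t in range(len(tokens) - 2, -1, -1):
        if tokens[t] == current:
            return tokens[t + 1]  # the token at position t + 1
    return None


print(induction_prediction(["the", "cat", "sat", "on", "the"]))  # prints "cat"
print(induction_prediction(["a", "b", "c"]))  # prints "None"
```

Writing the rule is the easy part. The interpretability work lies in showing that particular components of a trained model actually compute something like it.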
At the same time, interpretability has a long way to go: many fundamental questions remain about the nature of circuits and the best ways to go about finding them.
More broadly, it’s notable that this picture is, again, very different from the epistemic picture of a toaster oven. With LLMs, we attempt to identify circuits in a post-hoc manner, i.e., after the system has been trained; and even in the best-case scenario, we generally can’t be certain that we’ve accurately identified the correct function of a given circuit.
An analogy might be in order here. Suppose we construct a mechanical system of cogs and levers, which can be modified according to the various inputs to that system. We then “train” that system to produce certain patterns of exhaust fumes in response to various inputs. The system excels at this task, and in the process, we learn—to our surprise—that training it to produce these exhaust fumes has, inadvertently, taught the system to move forward and backward; moreover, these movements appear to be appropriately calibrated in response to different patterns of input. That is, the system can “drive”. We don’t know how or why this happened, though we have some plausible hypotheses. We also don’t know exactly which cogs and levers are responsible for the system’s “decisions” to move forward or backward in different contexts, though, again, we’re making some progress in the endeavor to map these components onto observable behaviors.
Grown, not made?
Let’s return, then, to the metaphor at hand: is a system like ChatGPT better described as “grown” than “made”?
In the language of conceptual metaphor theory, metaphors typically work by construing some target frame (e.g., ChatGPT) in terms of a source frame (e.g., biological growth processes). As I noted at the start, metaphors don’t necessarily construe every aspect of the target frame, nor do they use every dimension of the source frame. My sense is that the “grown, not made” metaphor emphasizes the contingency and unpredictability of living organisms. While fields like genetic engineering have clearly made incredible strides in recent years, I think most people would still endorse the claim that we understand less about the processes underlying the growth and development of biological organisms than those underlying the construction of a toaster oven or a combustion engine.
But there are lots of things we don’t understand as well as toaster ovens: dark matter, anesthesia, consciousness. Why select the grown vs. made contrast in particular? I think there are a couple of aspects of the biological growth and development frame that make it a convenient vehicle for the goals of this metaphor.
First, there’s a human element to both growing and making that makes them similar enough to be meaningful counterparts. Biological organisms grow on their own, too, of course, but human societies have long reshaped their environments to better suit their needs, which includes practices like agriculture and animal husbandry. Even the verb “grow” acknowledges this ambiguity: the subject of the verb can be the thing growing (“plants grow when provided sunlight”) or it can be the agent controlling the growth process (“humans grow plants for food”). Framing ChatGPT as something “grown” accommodates the fact that it was, after all, created by humans—unlike, presumably, dark matter.
Second, there’s much we still don’t understand (and can’t control) about growing life. For instance, a gardener can establish the conditions that facilitate growth (e.g., the approximate quality of the soil, the number and depth of seeds planted, the volume of water, the exposure to sunlight), but they can’t directly control the growth processes themselves, and there’s always some element of contingency. A proponent of the “grown, not made” metaphor might argue that there’s something similar about training LLMs: engineers set the training conditions (e.g., the training data, the initial parameters[5], the training objective, and the architecture), but typically don’t control the specific representations or mechanisms developed by the LLM.
As noted earlier, I have mixed feelings about this metaphor. Its strength is that it concisely and effectively conveys both that humans do create LLMs and that the engineers creating them lack the kind of direct control and understanding we typically associate with technology. An argument in favor of the metaphor would thus point out that it’s a quick way to illustrate to someone why it’s possible to create something without fully understanding it.
That said, the metaphor doesn’t, on its own, convey what it is that we don’t understand or why, which is what I’ve tried to spell out in this post. The engineers training LLMs know what the models are trained to do, and how they’re trained to do it, but they don’t always know why LLMs end up doing other things besides what they’re trained to do, or how they do those other things. I don’t think that’s a problem with the metaphor, per se; a metaphor can’t do everything. But the one-sentence summary I just wrote also does (I think) a reasonably effective job of conveying, in the absence of metaphor, one of the core points that the metaphor is also trying to convey. I think there’s virtue in saying things as precisely as one can while still getting the basic point across, especially when discussing scientific topics, so I will strive to describe it this way in the future.
[1] You see this metaphor, for instance, in discussion of the new book If Anyone Builds It, Everyone Dies, as well as this article breaking down the strange “seahorse emoji glitch” observed in ChatGPT and other models.
[2] This holds for auto-regressive models; bidirectional LLMs are trained to predict masked tokens using both “left” and “right” context.
[3] I’m glossing over, here, the distinction sometimes made between “open-data” and “open-weight” models, as well as “fully open-source” (in which all training code is also made available). Openness, safe to say, is a gradient.
[4] I’ve tried to avoid, here, relying on the terminology of “emergence”, though that is typically how these “byproducts” are described. This paper by David Krakauer, John Krakauer, and Melanie Mitchell discusses the technical definition of “emergence” in depth, along with when and why it may not apply to LLMs.
[5] Literally referred to as the “random seed” in some cases.

Love this!
I don't see how you can import growth from nature without complexity and emergence coming along for the ride. Nature grows complex things and complex things are inherently nonlinear and indeterministic. Nature's capacities are not the result of design but emergence. Sometimes humans create complex systems that grow in this way, like cities, markets, and now LLMs.