Bad habits
LLMs and the grooves of thought
In July of 2022—now almost four years ago—I wrote about the possibility that large language models (LLMs) could change the way we use language. This was pre-ChatGPT, but after the release of GPT-3. It was a strange liminal period for researchers studying LLMs: it seemed like something significant was happening, but it was difficult, at the time, to predict the shape of how these new systems would impact society.1 Much has changed since then, to put it mildly.
In general, I am not one for forecasting, mostly because I am not very good at it, though I have great respect for others who are. Of more personal interest to me is whether someone produces a conceptual framework that makes future events more legible. From this perspective, that initial post succeeded in identifying (as did others at the time, and since) that LLMs and other machine learning technologies could reshape cultural practices in significant ways. I also continue to think the two competing hypotheses for how language, specifically, might change—what I called the nova hypothesis and the homogenization hypothesis—are interesting and helpful (and reality seems to be leaning towards homogenization, at least for now), though here, too, I owe a conceptual debt to the writings of Jenny Odell and others on topics like “algorithmic entombment”.
At the same time, my initial essays on this topic missed the mark in two pretty significant ways. First, I drastically underestimated the extent to which people would use LLMs to generate large swaths of text for them wholesale. I was imagining effects “around the margins”, so to speak, driven by a kind of sophisticated autocomplete; I did not imagine that people would use an LLM to write entire Substack posts or even academic papers. In retrospect, I think this is partly because of the state of the technology (LLMs have improved since 2022), and partly because of the interface in which the technology was embedded: a chat interface using instruction-tuned models makes it much easier to use these tools for freeform text generation.
Second, and more significantly, I underestimated how polarizing LLM-generated text would be. I assumed, of course, that people might not react positively to the idea that someone has sent them an email written entirely by an LLM. But I did not anticipate the level of antipathy or even revulsion that would develop towards synthetic text—indeed, I failed to predict even my own sense of despair upon encountering an online landscape increasingly filled with synthetic writing.2
Much has been written on the topic of LLM-generated text, including by writers much more eloquent than me, and I don’t presume to add much to this discourse. To be honest, much of it makes me sad, even though I’ve written about it before and even investigated statistical signatures of LLM-generated text. I feel sad when I encounter text that seems to be LLM-generated, especially in the context of academic writing or peer review, an experience that has, unfortunately, become much more frequent in the last year or so. But I also feel sad when I consider how frustrating it must be for someone to be incorrectly accused of using LLM-generated text in their prose. I feel sad that the well of online discourse has been further poisoned in this particular way.
But I do want to discuss two topics that are, I think, related. Why does LLM-generated text inspire such strong negative reactions in many people? And what attitude should we have towards using LLMs, especially in our own lives? In both cases, I can, of course, speak only for myself, though perhaps some generalizable lesson can be drawn.
Disgust, horror, mechanism
There’s been considerable discussion in recent weeks about a prize-winning short story that bears many signs of having been at least partially LLM-generated. I don’t much care for the story, and I suspect I’d feel that way regardless of whether I was primed to think it might’ve been LLM-generated (e.g., if I’d read it in 2016 rather than 2026).
One answer to why people don’t like LLM-generated text, then, is that it is bad, and distinctively so. “Bad”, here, might mean many different things to different people, but in my own experience, the aesthetic weaknesses of LLM-generated text often feel like exaggerations of stylistic motifs one might encounter in human writing—including human writing we might even consider good. The infamous em-dash is, of course, an example, and one I refuse to relinquish; but so is the “it’s not just X, it’s Y” construction, along with many of the other constructions we’ve come to associate with LLM-generated text.
In moderation, some of these constructions might be effective rhetorical devices, or might at least appear in the prose of effective rhetoricians. But LLMs make use of them, and in some cases misuse them, to a degree that lays bare their status as devices, resulting in the feeling that one is reading an essay or story by someone trying very hard to sound profound, but without the patience to construct an argument or narrative to earn that profundity.
Those devices really are recognizably human, though, even if now we take them to be undeniable signatures of LLM-generated text. Perhaps the grating irritation we feel upon encountering these ersatz constructions is driven in part by this recognition and the accompanying inference (regardless of whether it is true) that much of language is, in fact, a mechanical thing, akin to the horror some people might feel upon observing the ceaseless working of organs in the human body and feeling that one is, in the end, a kind of meat machine.
Simulacra
The problem with this aesthetic account is that it relies on the assumption that LLM-generated text is necessarily distinctive. But this is by no means guaranteed. LLMs can already pass as human in the adversarial context of an online Turing Test; it stands to reason that an unsuspecting reader or interlocutor could easily mistake synthetic text for human writing “in the wild”.3 Moreover, synthetic text is even now not a singular thing: people with particular interests or expertise can elicit strange and novel outputs from these systems that don’t exhibit the traditional hallmarks of LLM writing, some of which might even be quite interesting to read (provided it’s clearly marked, as I suggest below).
Thus, as an explanation for why people dislike LLM-generated text, I think the claim that the dislike is born specifically of a distinctive aesthetic quality falls short. It’s also insufficient for mounting a principled opposition to synthetic text making its way into certain domains of human writing. Put another way: I think many people don’t like the idea of reading LLM-generated text in certain contexts regardless of how that text is written, and basing an argument on certain contingent empirical facts about LLM-generated text makes that stance difficult to maintain if (say) models are changed in ways that eliminate current aesthetic signatures.
In many situations, we’re interested in reading something because we think a human wrote it. If we later find out that it was produced by an LLM—even if it is aesthetically indistinguishable from what some human might have written—we might feel irritated or even betrayed. The explanation for this might involve embarrassment (we were fooled), offense (we assume communication involves honesty and effort from both parties), or even loneliness (we thought we were communing with another person with thoughts and experiences of their own4). This is by no means an original thought, but it’s worth disentangling from the argument that people dislike LLM-generated text purely on the basis of its aesthetic properties. Indeed, in some cases we might generate a distaste for the aesthetics because of the source, not the other way around.5
For my part: if I discover that text was LLM-generated in a context when I was expecting and hoping to read something written by a human, I am upset for the reasons I enumerated above. But I am not intrinsically opposed to reading text produced by an LLM; I use LLMs frequently, in fact! In some cases, the fact that something was generated by an LLM is actually the source of my interest in it. I can even imagine creative uses of an LLM, for instance, that play with our understanding of how language works, or that reveal the intriguing or surprising consequences of representing language as a statistical process. In other cases—the cases when I want to read something by a human—the knowledge that something is LLM-generated might well make me lose interest in reading it, which I think is (again) a strong argument that LLM-generated text should come with a tag.
One interesting question is how the discovery that something was LLM-generated compares to the discovery that something was written by someone other than the person you thought wrote it. Two examples come to mind. First, in Her, Joaquin Phoenix’s character (Theodore) makes a living writing cards (condolences, anniversaries, etc.) for other people; it’s not clear to me whether the cards are passed off as written by the sender, but let’s suppose, for the moment, that they are. Second, the central conceit of Edmond Rostand’s Cyrano de Bergerac is, of course, that a handsome but inarticulate man (Christian) collaborates with a less handsome but very eloquent man (Cyrano) to woo the woman they both love.
Opinions differ, I’m sure, on the ethical dimensions of both Theodore’s and Cyrano’s actions here. But I suspect that even those critical of either protagonist would acknowledge that these situations seem distinct, somehow, from passing off LLM-generated text as your own. An obvious difference is that the “generative process” for producing the substitute language in both Her and Cyrano de Bergerac relies on another human with the capacity for phenomenological experience.
And in Cyrano de Bergerac specifically, that other human is in love with the recipient, just like the putative sender (Christian); the feelings the putative sender wishes to convey, then, are shared by the actual writer. All of which is to say: even if one is uncomfortable with Christian’s and Cyrano’s deceit, here, it feels simply like a categorically different kind of act than using an LLM to write (say) your love letters—though I acknowledge that one’s intuition on the distinction here will depend in part on one’s beliefs about the capacity for LLMs to have phenomenological experience.
My broader point, however, is that distaste for LLM-generated text is not solely an empirical property of the words or constructions contained in that text. It is also, and perhaps more fundamentally, a property of the process we believe was responsible for the text.
Modes of use
I’ve focused, so far, on the question of why many people have such strong distaste for LLM-generated text. But I think this issue is to some extent inextricable from the debates about using an LLM in one’s own creative or cognitive process. These debates often center around writing, for reasons I’ll discuss momentarily, but in principle the contours of the debates also fit with other areas of life.
I’ll be direct: I don’t like the idea of using an LLM in my writing process, and thus far have not found LLMs to be particularly useful either for ideation or for crafting prose. For me, this is most clearly true of creative writing (for which the idea of using an LLM feels like a category error6), but it also applies to writing these Substack posts and to academic writing. I have used LLMs for quickly finding typos in a large body of text, and I think they are fairly useful in this role.7
I mention this to point out that it is difficult, in fact, for me to imagine why I would want to use an LLM to write the things I write. It’s become something of a cliche to point out, but it really is true for me that the writing process is deeply intertwined with the thinking process. Writing clarifies thought, and it also identifies where thought is unclear; how often have I started an essay, confident that my point was clear, and realized in the act of writing that there was some deep confusion—or, more interestingly, that my point was actually something else entirely? Even when I do have a good idea of what I want to say ahead of time, writing adds meat to an outline’s bones. This is to say nothing of more meandering essays (like the ones I write at the Leaky Margin) or creative fiction.8
I should acknowledge, however, that I enjoy writing. Many people don’t enjoy writing, or they have to write prose in contexts where they don’t believe (rightly or wrongly) the act of writing is central to their thinking process. Here, an instructive contrast might be with programming: lest you think me some kind of purist, this is one area where I do regularly rely on LLMs. LLMs really have gotten quite good at writing code, at least for applications that are prevalent in their training data. This makes them useful for the kinds of targeted modeling or analysis scripts I’d usually write by hand. They’re also useful for walking through code I don’t understand, line by line.
Why do I feel comfortable using LLMs to write code but not prose? The answer is not as simple as suggesting that I enjoy the latter but not the former, in part because I do, in fact, enjoy writing code in some cases. I think the more accurate, and more interesting, explanation is that there are some tasks that seem purely instrumental (a means to an outcome), and other tasks where the doing of the task seems somehow constitutive of the outcome. For me, the point of writing is not (only) to produce a chunk of text that efficiently conveys a thesis to a reader; the point is also to craft that text myself. I’m reminded, here, of something I read in an essay by Derek Thompson a few months back:
As AI gets better at automating more tasks, I suspect that students and workers will have to cultivate and sustain a new kind of wisdom. They’ll have to answer the question: What are the parts of life where I could use AI, but I shouldn’t, because I want to protect this skill or habit from atrophy?
I loved this bit of wisdom from the author Agustin Lebron. A simple way to figure out whether to use AI at work, or in life, is to think about the difference between a gym and a job. At a gym, the point isn’t for the weight to be lifted, but for you to lift the weight. At a mere job, however, “the point is for the weight to be lifted.”
Use AI for the jobs in your life. Don’t use AI for the gyms in your life.
The personal challenge that I think many people (including myself) will face is delineating “jobs” and “gyms”. I worry that there are currently too many pressures pushing people to automate their cognitive processes with an LLM: a fear of “falling behind”, for instance, which is exacerbated by the rhetoric one encounters from official and unofficial marketing for LLMs. Even in the absence of this fear, though, I think there would still be a deep temptation to reach for an LLM when one encounters a moment of struggle. It takes a great deal of intentionality to determine, first, that one wants to do something oneself; and a great deal of inhibitory control to resist, in the moment of struggle, that temptation to reach for a potentially easy solution.
It is my belief that this will largely be a question of habit.
The grooves of thought
One of my favorite essays by William James is his chapter on Habit. In it, he emphasizes the importance of routine action in inculcating the principles and virtues one wishes to uphold. Each time we take an action, we might think of it as deepening a particular set of grooves in our mind that make that action easier to take in the future—and, perhaps, an alternative action less easy to take or even contemplate. He writes (bolding mine):
Seize the very first possible opportunity to act on every resolution you make, and on every emotional prompting you may experience in the direction of the habits you aspire to gain. It is not in the moment of their forming, but in the moment of their producing motor effects, that resolves and aspirations communicate the new 'set' to the brain…A tendency to act only becomes effectively ingrained in us in proportion to the uninterrupted frequency with which the actions actually occur, and the brain 'grows' to their use.
Moreover, a failure to act on our convictions will over time weaken those convictions and resolve (bolding again mine):
These latter cases make us aware that it is not simply particular lines of discharge, but also general forms of discharge, that seem to be grooved out by habit in the brain. Just as, if we let our emotions evaporate, they get into a way of evaporating; so there is reason to suppose that if we often flinch from making an effort, before we know it the effort-making capacity will be gone; and that, if we suffer the wandering of our attention, presently it will wander all the time.
I think this is all quite relevant to our use of LLMs and external tools or devices more generally. Anyone who’s tried to reduce their screen time likely knows that it is difficult to do so without the use of some additional commitment device, like leaving your phone at home or disabling access to (say) social media. Speaking as someone who’s taken, at various points, both actions: these commitment devices actually alert you to the strange sensations that occasionally bubble up, typically in moments of anxiety or boredom, which manifest as an impulse to “check your phone”. When your phone is in your pocket, the impulse is essentially continuous with the “motor effect” (to quote James). But when your phone is back at home, you cannot, of course, act on the impulse; the sensation thus becomes more perceptible and discontinuous from the motor effect.
What I am describing here is what, I think, others might refer to as mindfulness: a kind of attention to our interior thoughts and impulses and an ability to recognize them as, in principle, separable from our motor actions. My suggestion is that on an individual level, we would likely benefit from this kind of mindfulness regarding our use of LLMs. As I’ve tried to argue throughout this essay, I am not opposed in principle to the use of an LLM in any context (though I understand that some are); I also recognize that people will likely come to different conclusions about which processes they view as instrumental and which they view as constitutive. But I do think people should be clear-eyed about the costs involved in each case, and I also think that, having identified the things they wish to do and understand themselves, it is useful to view the use of LLMs as a kind of “habit”, which might well be a bad habit in many circumstances.
For me, that means using LLMs with a very clear goal in mind, for tasks that I don’t particularly need or want to do myself. Perhaps it is their interactive structure, or perhaps I am particularly susceptible, but I find that if I consult an LLM with a more open-ended question, it is easy to be “tugged” along in the course of an interaction in ways that feel disconcertingly out of my own control. That doesn’t mean these goals are restricted to identifying typos or writing boilerplate code: I recently used Claude to help walk through the math in this 2021 paper. But I like to know what I want from an interaction before I start it.
It is strange to have to make these decisions. Something I have not mentioned here is, of course, the more fundamental question of whether an LLM can competently perform a task in the first place; but I take that to be separable from the question of whether, if it can perform the task, it should. I have also neglected the topic of education, which is full of tasks that appear instrumental but are at least intended to be constitutive. My point here is more personal: I do not presume to prescribe or proscribe anything for anyone else, but I hope that in enumerating my own thoughts, others might be given some kind of conceptual framework for navigating these issues themselves.
1. Though, of course, many people tried (including me, in that original article).
2. Actually, upon reading this introduction, my wife kindly reminded me that I did, in fact, anticipate some of this despair, bemoaning at some point in mid-2022 the question of my own value in a universe of simulacra.
3. Here, a reader might object that people with more experience reading LLM-generated text are better at identifying it. This is true (for now), and as a practical matter it is, of course, relevant to the question of detection. But I think it’s separable from the question of why people don’t like LLM-generated text.
4. But what if, a reader might object, LLMs are conscious entities themselves? I’m not going to address this debate here, but I think the sense of betrayal still holds: even if you think Claude is conscious, you might still be frustrated by someone passing off Claude’s text as their own.
5. It’s worth noting that the social norms here are still developing, and it might be some time before different people align on their communicative preferences. Just as politeness norms vary substantially across cultures, norms about whether and when it is appropriate to use LLM-generated text will likely vary across people and contexts. Some people might object to the use of an LLM for any purpose. Others might distinguish between “cosmetic” uses (e.g., copyediting an essay) and more “generative” uses (e.g., prompting an LLM to write the entire essay). Still others might be entirely fine with reading any LLM-generated text as long as it is marked as such.
6. Again, this is not to say the outputs of an LLM can’t make for interesting art, but I suspect that this art would look quite different from (say) prompting ChatGPT to write a short story, and might (for example) center instead around pushing the model into different regions of state-space, so to speak, revealing properties of language and how the model represents linguistic structure. (I’m thinking, here, of a rough analogy to the visual art Eryk Salvaggio has made.)
7. Though they are non-deterministic in a way that makes them different from, say, a standard grammar-checker. Pasting the same essay into ChatGPT or Claude multiple times might reveal different grammatical or spelling errors.
8. This is also true, I think, of reading. I, like Daniel Muñoz, do not much care for the idea of replacing reading books with “vibe-reading”. I do not think reading a summary of a text gives you the “same information” as engaging with the text itself. Ezra Klein has described the belief that these things are equivalent as something like the “Matrix view of the mind”, in which one can simply “download” the raw, distilled “information” from a text into one’s brain. This is not to say reading summaries is bad! It just seems obviously different to me from what happens in your mind when you read the source.
"In many situations, we’re interested in reading something because we think a human wrote it." This is definitely the issue for me when reading Substack or LinkedIn posts, because of the corollary that follows: if a human wrote it, they might care about what I say about it. Whatever intelligence I might grant LLMs right now, caring about what I think about what they said is not among them. For now, responses to comments often make clear what was originally human-authored and personal and what was not.
It's really interesting to read some normative thoughts about AI use from this research-informed perspective! I think I largely agree with the "job" vs "gym" sort of model; however, what worries me is that (1) I think, in practice, very few things would or should be considered "jobs" in Thompson's sense, and (2) the general direction many people seem to be going in involves classifying more and more things as "jobs".
I think research is a great example, where plenty of people have already been incorporating LLMs at nearly every stage. I've seen AI used to generate research protocol proposals, to run stats on data, and to generate a manuscript. I don't think any of this is inherently bad! In fact, I think it's likely that AI models are quickly outpacing humans in important things like, say, interpreting radiological images. This will probably be great for patient outcomes and so on. On the other hand, I like to think that at least part of the point of research is for the researchers (and people in general) to better understand things. So if I let an LLM do all these "job" tasks for me (and I don't also ask it to explain what it's doing), it seems like my actual job is incomplete in some way.
A separate, but related thought: in my experience, many people -- especially young people -- are also fearful of AI precisely *because* it threatens to do their jobs (I mean it in both senses here). In the first sense, I'm referring of course to the more pragmatic worry that current grads might be unable to find work as roles get filled by AI. In the second sense, though, I'm referring to the idea that people might desire both "job" and "gym." That is, one might want to spend effort on something *because* the outcome is important. Having put the effort in (rather than having automated it) makes one feel, maybe, like they meaningfully contributed to something. I think this notion is more widely discussed and advocated for in creative disciplines like art, as you mention in this article, but I suspect it could be relevant to a lot of other disciplines as well (whether people acknowledge it or not).