How I use (and don't use) ChatGPT
An update on when and how I use LLM-equipped software tools, and when (and why) I don't.
As a computational cognitive scientist, much of my research these days involves large language models (LLMs) in some capacity: either as “model organisms” to better understand humans, or as objects of study in their own right—a discipline I call “LLM-ology”. I also developed and taught a course at UC San Diego on the intersection of LLMs and Cognitive Science. And, of course, I regularly write about LLMs and LLM-ology research here on the Counterfactual. All of which is to say: LLMs play a fairly large role in my working life.
That said, most people are not conducting academic research on LLMs. Rather, most people likely use LLMs embedded in some kind of larger software system: well-known tools like ChatGPT, as well as programming assistants like Cursor.
Notably, however, many of the commercial LLMs people interact with are not pure “vanilla” LLMs: they’ve undergone various forms of fine-tuning, and they often have access to external applications, like a Python shell or a search engine. Sometimes they’re even programmed to run somewhat autonomously, executing many “actions” in the absence of direct user input.
In addition to studying LLMs, I use LLM-equipped software tools pretty regularly in my work. About two years ago, I wrote an article about some of these use cases. People seemed generally interested in that article, and both LLMs and the software systems in which they are embedded have changed a fair bit in the last couple of years—so I thought it was time for an updated perspective.
I also hear a lot from students or readers of the newsletter that they just don’t know what to use these systems for. To be clear, it’s entirely plausible that many people don’t have a ton of use cases. Contrary to the views of some pundits, I don’t really think anyone should feel pressured to start using LLMs all the time. The point of this article is simply to describe how one person (me) uses LLM-equipped software tools (primarily ChatGPT)—and a few examples of things I’m uninterested in using them for.
How I use ChatGPT
I have a number of ongoing “conversations” with ChatGPT. To write this article, I did a quick survey of my last 50-60 conversations and developed a rough taxonomy of use cases, along with a few relevant examples. Some of these are repeats from my article two years ago, and some are new. I’ve also provided rough estimates of how much of my overall ChatGPT use each category accounts for.
Use case #1: Coding and analysis (~70%)
I’ve been programming in Python and R for over ten years, and I also teach classes on both to undergraduate and graduate students at UC San Diego. I consider myself a relatively competent programmer, though more in the realm of modeling and data analysis than building elaborate software architectures. Much of what I do falls into one of these categories:
Writing Python code to run open-weight LLMs in inference mode using the transformers package. This includes measuring their responses (e.g., surprisal) to various inputs, accessing their internal representations (e.g., activations, attention scores), and more recently, intervening on those internal representations (e.g., using interpretability techniques). (A minimal sketch of the surprisal piece appears just after this list.)
Writing Python code to scrape, merge, and otherwise wrangle various datasets (e.g., text corpora, tabular data, human behavioral responses). Here, I’d also include tools from the “standard” machine learning pipeline, e.g., anything you might find in the scikit-learn library.
Writing Python code to fine-tune or train language models from scratch, again using the transformers package. Most recently, Pam and I trained a suite of Spanish language models, with the goal of conducting interpretability analyses over the course of pre-training.
Writing R code to read in pre-processed datasets (often the outputs of what I’ve done in Python), merge them, analyze them, and produce publication-ready figures. I really love data analysis and visualization in R and I’m a big fan of the tidyverse suite of libraries. R, in my view, is also superior to Python when it comes to statistical analyses such as linear mixed effects models (e.g., using the lme4 library).
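To make that first bullet a bit more concrete, here is a minimal sketch of what measuring per-token surprisal with the transformers package can look like. It is illustrative only, not code from my actual projects: the model (gpt2) and the sentence are arbitrary placeholders.

```python
# A minimal sketch (not actual research code): per-token surprisal from a small
# open-weight causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM on the Hub works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Surprisal of token t is -log p(token_t | tokens_<t), so the logits at position
# t-1 are scored against the token actually observed at position t.
log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
targets = inputs["input_ids"][:, 1:]
surprisal = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)

for token, s in zip(tokenizer.convert_ids_to_tokens(targets[0].tolist()), surprisal[0]):
    print(f"{token:>12s}  {s.item():.2f} nats")
```

From there, passing output_hidden_states=True or output_attentions=True to the forward call is the standard transformers route to the activations and attention scores mentioned above.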
I can do all this stuff without the aid of ChatGPT. In some cases, ChatGPT doesn’t help at all. For example, if I want to make a quick ggplot in R, filter or summarise a dataframe, or even conduct a model comparison, it’s faster to do it myself.
I find ChatGPT useful for programming in at least two different situations.
The first (and more mundane) use case is basically as a replacement for looking up something simple but hard to remember via Google or StackOverflow. That is, I know what I need to do, but I’ve just forgotten the exact syntax for a particular function or API call, or I’m wondering whether there’s a faster implementation of something I’m working on. The advantage of using ChatGPT (or a coding assistant) is, of course, that the answer is even more tailored to my specific situation. This saves me some time, but just on the margins.
The second kind of use case is, in my view, much more interesting, and has also had a bigger impact on my productivity and research progress. Research often involves asking new questions and learning new things (at least, new to me). Recent examples of this for me include conducting more advanced interpretability research, training language models from scratch, and learning how to deploy these tasks on a GPU cluster I just got access to. As I mentioned above, in principle I could learn how to do these things without ChatGPT: for instance, Neel Nanda has some excellent interpretability tutorials, as well as Python packages specifically built to help with interpretability. Similarly, there’s no end of tutorials on training a language model from scratch.
I’ve read many of these tutorials, and they’re incredibly helpful—but it’s also extremely helpful to supplement my learning with ChatGPT, which (again) produces content and suggestions that are much more tailored to what I’m actually working on. With ChatGPT, I can also ask questions about why a piece of code is written in a certain way, and connect specific implementations (i.e., in Python) to more fundamental theories and concepts (e.g., different kinds of “ablation” for interpretability). It essentially functions as a customized, interactive tutorial.
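To give a flavor of what “connecting implementations to concepts” looks like in this setting, here is a rough sketch of one of the simplest interventions, zero ablation: silencing a single GPT-2 MLP block with a PyTorch forward hook and checking how the next-token prediction changes. Again, this is a toy illustration under arbitrary assumptions (layer 5, a one-line prompt), not my actual analysis code.

```python
# A toy illustration of zero ablation: replace one GPT-2 MLP block's output with
# zeros via a forward hook and compare next-token log probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")

def next_token_logprobs():
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next position
    return torch.log_softmax(logits, dim=-1)

baseline = next_token_logprobs()

# Zero out the MLP output of block 5 (GPT-2 small has 12 blocks, indexed 0-11).
handle = model.transformer.h[5].mlp.register_forward_hook(
    lambda module, inp, out: torch.zeros_like(out)
)
ablated = next_token_logprobs()
handle.remove()  # remove the hook so the model behaves normally afterwards

paris_id = tokenizer(" Paris")["input_ids"][0]
print(f"log p(' Paris'): baseline {baseline[paris_id].item():.2f}, "
      f"ablated {ablated[paris_id].item():.2f}")
```

Hook-based interventions like this are what interpretability libraries (e.g., Neel Nanda’s TransformerLens) wrap in a friendlier API; writing the bare-bones version at least once can make it clearer what those abstractions are doing.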
In neither case would I feel comfortable relying solely on ChatGPT. I still derive a lot of value (obviously) from actual tutorials written by human experts. But ChatGPT’s really good for supplementing that content.
Of course, if your job involves a lot of fairly routine programming that you could implement faster than the time it takes to type something into ChatGPT, then it might not be that helpful (though it does seem plausible that an LLM plugin could solve some of those tasks on its own pretty soon). But if you find yourself doing lots of new things—especially things for which there might be some existing documentation and tutorials online—then ChatGPT can be very useful for jumping into a new domain.
Use case #2: Research support (~15%)
Programming is only a small percentage of what I actually do. Much of what constitutes “research” is what I’ll broadly call planning. Here, I’m including things like: operationalizing vague research questions; reviewing relevant literature; and interpreting or contextualizing the actual results of an empirical study (whether mine or an existing study). ChatGPT is less useful here than for programming, but it’s more useful than it used to be.
For instance, I wouldn’t recommend relying solely on ChatGPT for a literature review, but as I’ve written before, it can be really helpful for getting started—either in terms of finding relevant papers or in terms of coming up with a helpful strategy for searching the available literature. Once you find relevant papers, you need to make sure you actually read them and form opinions on them. (There’s no substitute for actually reading the classics!) But here, again, ChatGPT can be helpful for “delving deeper”: in my case, I find it useful for working through complicated mathematical equations and tying the terms in an equation to the concepts in the paper.
Similarly, when I’m deciding which analyses to run on a dataset to answer specific theoretical questions, sometimes I bounce those ideas off ChatGPT to see if there are any obvious issues I’ve overlooked. That said, I still haven’t found it very useful for coming up with genuinely interesting, novel research questions or operationalizations of those questions (more on that later).
Use case #3: Personal life stuff (~15%)
This last category isn’t actually work-related at all, but I thought I’d mention it because it’s still a nontrivial part of how I use ChatGPT: basically, as an information-seeking and planning tool to help me make both short-term and long-term decisions.
This includes “fun” stuff, like searching for author recommendations. For example, I love to read plays, and I’m always looking for new playwrights whose work I can dive into. The best recommendations, as always, come from friends, but ChatGPT also does a decent job when given a list of plays and playwrights I already like. Recently, I read two plays by Martyna Majok at ChatGPT’s suggestion and really enjoyed both of them.
It also includes complicated life decisions, like financial planning or a cross-country move. My family and I are planning a move to New Jersey at the end of 2025—a somewhat daunting prospect with a 1-year-old and two cats—and I used ChatGPT to help compile a list of moving companies. This helped my wife and me avoid a particular moving company that was trying to quickly “close” a deal over the phone, and it also gave me a better sense of the landscape of moving companies actually out there. Further, because my wife and I are both changing jobs, there are a number of stressful issues to figure out (e.g., relating to gaps in salary or health insurance), and ChatGPT has been helpful for pointing me to specific resources that can give more information.
Speaking of health advice, I’ve dealt with a number of different medical issues over the past year or so, both my own and those of people close to me. I certainly would not recommend relying solely on ChatGPT for health or medical advice. But just as I sometimes find it helpful to watch YouTube videos or read subreddit posts about dealing with an issue (e.g., lower back pain), ChatGPT can be a useful additional source of information.1 The unfortunate reality is that for many health issues, medical professionals are simply too busy or overwhelmed to answer the many questions or concerns that arise. Again, as with research, I use ChatGPT as a jumping off point—not as the final arbiter of my decisions. That way, I can bring specific (rather than open-ended) questions to my actual medical providers.
The big caveat here—especially with important decisions relating to health and finances—is that it’s very important to independently confirm the suggestions or answers ChatGPT gives. ChatGPT can be confidently and spectacularly wrong, and in some cases, those errors could be actually harmful. Recently, for example, ChatGPT gave me some suggested physical therapy exercises for engaging my lower back; without going into details, I know enough about spinal anatomy to know that the advice didn’t make any sense from the standpoint of physical mechanics. When I inquired further, ChatGPT first doubled down, insisting that my questions constituted a “common misunderstanding”. It was only when I directly pointed out its errors that it admitted fault, though it implied that the error was simply a miscommunication. All in all, it was an annoying experience, and if I’d been less informed about how my back works, I could’ve hurt myself badly. (To be fair, the same could happen with bad advice from a YouTube video!)
This is why it’s so difficult to develop the right kind of “epistemic intuitions” for how to weigh outputs from ChatGPT: their epistemic stance is authoritative, but in my experience, the system underlying ChatGPT lacks the right kind of epistemic capacities to merit that stance.
How I don’t use ChatGPT
Above, I divided my ChatGPT usage into three broad categories: programming and analysis (~70%), research support (~15%), and personal life stuff (~15%).
Focusing solely on work, programming help accounts for ~82% of my ChatGPT usage and research support for the other ~18%. Notably, I’m not saying that ~82% of my programming is done with ChatGPT’s help (the actual number is probably more like ~50-60%); rather, ~82% of my work-related ChatGPT usage is for programming.
Further, this ~82%/18% breakdown doesn’t reflect the amount of time I spend on programming or research planning, respectively. Even ignoring teaching (a huge component of my job), as well as the various administrative tasks that come my way2, hands-on programming constitutes a minority of my research time—probably less than 10%. The rest of the time is spent:
Deciding what high-level research questions are important and interesting.
Translating abstract research questions into something concrete and actionable.
Planning the stages of a research project.
For experimental work, designing and assessing stimuli.
Reading academic papers.
Writing up the results of a study.
To be honest, I still don’t find ChatGPT (or any other LLM-equipped software tools) particularly helpful for these things.
In my experience, ChatGPT doesn’t have great “research taste”. It’s also not very good at operationalizing research questions. Perhaps it’s simply too sycophantic, but it tends to be overly eager to suggest specific analyses that, even if implemented successfully, wouldn’t actually shed light on the underlying research question. Of course, in the spirit of fairness, many people are not very good at operationalizing their research ideas either. Probably the single most useful thing I got from graduate school—and specifically from working with my advisor, Ben Bergen—was learning to distinguish “this idea sounds cool and vaguely related to the broader research question” from “this idea will actually concretely address the research question”. I wouldn’t even say I’m good at it. I’m just better than I was ten years ago.
It may surprise some people to learn that ChatGPT is not useful for generating stimuli, especially given the rise of synthetic datasets in the LLM research world. Perhaps it’s my background in psycholinguistics, but I think developing stimuli is one of the most important and edifying stages of the research process. Many people want to skip through it because it seems boring or tedious, but it’s where the rubber actually meets the road. If you’re running an experiment (on humans or LLMs), the stimuli (or survey questions, etc.) are what participants actually see. Thus, it’s essential that you’ve controlled for all relevant confounds and that the differences between stimuli actually reflect the theoretical constructs of interest. This is really hard to do well, and ChatGPT has failed badly each time I’ve tried to use it to help. That’s also why I have a hard time trusting research that relies on entirely synthetic datasets that haven’t been carefully validated by human experts.3
It may also surprise people to learn that I don’t find ChatGPT useful for writing. Producing text is, after all, one of the main things that ChatGPT can do! But for me, writing is very much intertwined with thinking. This works in two ways: first, if I can’t write my thoughts out in a clearly structured argument, I know my thoughts still aren’t very clear; and second, the process of actually writing out my thoughts forces me to think more rigorously and carefully, which helps hone my underlying argument. Substituting ChatGPT for the writing process would be like substituting out much of the thinking process too (at least for me). Further, I also just really enjoy writing, and I have no interest in using a software tool to replace that process—either for academic articles or newsletter posts.4 That said, I know many people find AI software (e.g., Grammarly) useful for checking their grammar or writing style, and I’ve sometimes used ChatGPT to check for typos. These use cases seem more cosmetic than fundamental to the writing process, though I recognize some people might disagree.
For work, then, there are still many activities I just don’t find ChatGPT very useful for yet. That could change, and certainly there are some people who think that many types of “cognitive” labor could soon be automated. I don’t think that’s impossible in principle—and it’s also not the point of this post—but I think it’s easy to underrate how hard it is to get some things right. I worry sometimes that certain processes will be automated prematurely: the result won’t necessarily be immediately catastrophic, but the overall quality of a system or service will simply decline.
Broader lessons?
I’m just one person, and my use of ChatGPT is surely somewhat idiosyncratic. Are there any broader lessons here?
With regards to work, I do think it’s striking that, despite the many real advances in LLM technology, my current use cases are pretty similar to how I used ChatGPT two years ago. One explanation is that I’ve simply been too slow to adapt to technological improvements. But another possibility is that the improvements have mostly made ChatGPT better at things it was already decent at, without substantively improving it at other tasks. The latter explanation tracks with my own experience: ChatGPT has gotten a lot better at writing reliable code, and accordingly, I use it for a much larger share of my programming (~50-60%) than I did two years ago (more like ~10-15%). In contrast, ChatGPT still isn’t very useful for designing carefully controlled stimuli, and I still don’t use it for this (though I try intermittently).
In principle, I think ChatGPT could be more useful for high-level research planning or theorizing, and that I’m under-utilizing it for these purposes. There’s certainly lots of information stored (for lack of a better word) within the weights, and many potentially novel recombinations of those bits of knowledge. The problem, in my experience, is that ChatGPT in practice is simply too agreeable: it doesn’t productively poke holes in my ideas the way my advisor did, and instead rushes to come up with analysis ideas. It’s also, as I noted earlier, confidently wrong much of the time: it uses many of the right words (e.g., referencing jargon from the research topic in question), but I get the feeling it’s gesturing excitedly towards concepts in an effort to impress, rather than to deepen our shared understanding of the issue. Then again, perhaps this is a “skill issue” on my part.
I’m not sure what lessons one could draw in terms of extrapolating towards the future. By temperament, I’m quite hesitant to make forecasts. But one theme does run throughout this self-analysis: I find ChatGPT to be a really useful tool when I already have some idea of what I want to do and when I’m actually engaged with the issue. I find it much less reliable or useful for completely automating parts of the process. There’s no substitute for reading the paper.
1. That said, for people with anxious or obsessive tendencies (like myself), it’s important not to fall into the trap of using ChatGPT as a form of “reassurance seeking” (a problem that can also arise with Reddit or other online communities).
2. Emails, research paperwork, and so on.
3. Note that this concern is primarily about experimental stimuli designed by LLMs, not necessarily synthetic text corpora or the training data automatically developed for reinforcement learning with verifiable rewards (RLVR).
4. You can rest assured that any text you read on the Counterfactual was written by me, unless clearly marked otherwise (e.g., to “quote” outputs from an LLM or LLM-equipped software system).