Reflections: my class about LLMs and Cognitive Science
Class overview, what I felt went well, and what I'd like to change in the future.
In my day job, I’m an Assistant Teaching Professor at UC San Diego. That means that I conduct research and teach classes in the Cognitive Science department here (and in the new Computational Social Sciences program). Last quarter, I designed and taught our department’s first-ever class on Large Language Models (LLMs).
Now that we’re a few weeks into spring quarter, I wanted to write about my new course: why I created it, what I felt went well, and what I might want to change the next time around.
Why a new LLMs course, and why in Cognitive Science?
I’d been mulling over the need for a course focused on advancements in Artificial Intelligence, and specifically LLMs, for a couple of years. Of course, UCSD already has courses on AI and deep learning in the Computer Science and Engineering department, but those courses tend to adopt a more technical perspective (as well they should). I thought there would be a lot of value in having a course aimed at Cognitive Science students in particular, which would approach the topic using concepts and methods from the study of human cognition—in short, the kind of stuff I often write about here on the Counterfactual.
After ChatGPT was released in late 2022, the notion that LLMs were both conceptually and societally important became much more tangible. Here was a new technology, built on insights from the study of minds and brains, that at least appeared to demonstrate a variety of impressive capacities. Much of the discourse around LLMs centered around questions like: Are these things really intelligent? How would we know if they had a “mind”? How do these things work “under the hood” in the first place? And what will be their impact on human society?
In my view, these are all questions Cognitive Science is well-positioned to tackle, so a course built around the intersection of LLMs and Cognitive Science was a sensible proposition. I was happy to have the course officially approved in mid-2023, with the first offering in winter quarter 2024.
What the course was about
Naturally, I thought a lot about what such a course should cover.
This process is in part a question of time budgeting. Academic quarters are only ten weeks, which actually isn’t a ton of time: lectures meet for three hours a week, plus an hour of TA-led discussion, so that’s about 40 hours of in-class time throughout the quarter. Of course, there are also readings and assignments, which are invaluable for providing additional hands-on experience with the material. Nevertheless, time is limited, so you have to make some hard decisions about what to cover and what to leave out.
When I design new courses, I try to follow the principle of starting with some learning outcomes—the skills and concepts I want students to walk away with—and then figuring out how best to meet those goals. I wanted students to get a sense of the history behind the development of LLMs, as well as some of the methods that Cognitive Science researchers use to probe and understand LLMs (what I call “LLM-ology”); I also wanted students to think about some of the big-picture impacts, from ethical concerns to cultural evolution. Finally, I realized that making sense of these discussions would require some conceptual and technical understanding of how LLMs are actually built.
Here’s the course overview I came up with, straight from the syllabus:
Large Language Models (LLMs) like ChatGPT have made incredible strides in recent years. Yet despite intense interest in LLMs across both academic and commercial sectors, much remains unknown about exactly how they work. In this course, we’ll tackle the topic of LLMs from a Cognitive Science perspective, asking whether Cognitive Science offers useful conceptual and methodological tools to help unpack these “black box” systems.
The course will begin by discussing the intertwined history of Artificial Intelligence (AI) and Cognitive Science. We’ll then dive deeper into the foundations of language modeling, including hands-on experience building and interpreting simple n-gram models, as well as discussion of recent advances like the Transformer architecture. In the latter half of the course, we’ll review various approaches to probing the mechanics of LLMs (“LLM-ology”), using tools from Cognitive Science. Finally, we’ll discuss debates around the application of these tools, including purported benefits and harms to society.
Looking back, I’m happy with what I came up with, and with how it was carried out. In terms of time, we spent:
Roughly 1 week on the history. This included a discussion of things like perceptrons and the XOR problem, as well as back-propagation and classic debates about connectionist vs. symbolic approaches to AI.
Roughly 3-4 weeks on the technical foundations. We started with n-gram models, which I think are helpful for building intuition about what a language model is meant to do (see the short bigram sketch just after this list); then we talked about “neural” language models, from recurrent neural networks to the transformer architecture. Students read my explainer with Timothy Lee on LLMs, as well as some chapters from the great textbook on Speech and Language Processing by Jurafsky & Martin.1 We ended this section with a lecture on how LLMs have been “tamed”, e.g., using techniques like fine-tuning or reinforcement learning from human feedback (RLHF).
Roughly 1 week on the philosophy of LLMs. We covered thought experiments like the Chinese room, “Blockheads”, and more; this section was particularly fun because there were lots of opportunities for in-class discussion and debate.
Roughly 3 weeks on LLM-ology. What do LLMs “know”, and how do we know what they know? This included topics like grammar, meaning, Theory of Mind, and mechanistic interpretability. We also discussed some of the conceptual and empirical challenges with this work, such as external validity (i.e., GPT-4 is WEIRD) and construct validity (i.e., how do we measure these capacities in the first place?). This section was probably the closest to what I typically write about on the Counterfactual.
Roughly 2 weeks on “looking ahead” at impacts, frontier models, and machine consciousness. This included a discussion of potential harms from LLMs, such as those outlined in the paper by Bender et al. (2021). We also discussed this more recent paper on “machine culture”, which considers how AI systems will modify human cultural practices (see also my article on whether LLMs could change language). Our final lecture covered the hard problem of consciousness, taking inspiration from this article by David Chalmers in the Boston Review, which sparked lots of great discussion.
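To give a flavor of the n-gram portion mentioned above, here’s a minimal bigram sketch in Python. The toy corpus and helper function are purely illustrative; this isn’t an excerpt from the labs, just the kind of intuition-building exercise I have in mind:

```python
from collections import Counter, defaultdict

# A tiny toy corpus, just for illustration; a real exercise would use much more text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word (bigram counts).
counts = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    counts[prev][curr] += 1

# Turn counts into conditional probabilities P(next word | previous word).
def next_word_probs(prev):
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```

The point is just that language modeling, at its core, means estimating the probability of the next word given some context; everything else in the course builds on that idea.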
In terms of assessments, I tried to focus on things that would encourage the development of those skills I mentioned earlier. This included quizzes checking in on students’ understanding of core concepts; weekly reading reflections; two technical labs, conducted in Python Colab notebooks; a “position piece”, in which students had to argue for or against a proposition (e.g., “LLMs understand language”); and a final project, in which students had to devise and implement an original empirical study assessing an LLM’s sensitivity to some experimental manipulation (and/or the impact of a prompt engineering approach).
How it went
It’s very hard to predict how a new class will go. Once the class is done, I (along with most professors I know) immediately start thinking about what I’d like to change next time around—tinkering with this assignment here, that lecture there. But doing this systematically requires, I think, a careful review of what went well and what didn’t. This process can in turn be informed by introspection and feedback from students—either explicitly, in the form of end-of-quarter reviews, or implicitly, in the form of the quality of specific assignments.
What went well
Looking back, one thing I think went well was our discussion of various LLM architectures and how they worked. Designing these lectures—particularly the one on transformers and the attention mechanism—was quite difficult and time-consuming, but it was well worth the investment.
In doing so, I converged on a strategy for balancing depth and accessibility that I have since translated to other topics. To illustrate what I mean, I’ll walk through an example from my lecture on the transformer architecture; I don’t expect the details to make a ton of sense without the requisite background, but if you’d like to follow along, I’ve also included the option to download a PDF of my slides.
I start with a high-level motivation for the concept. This is all, of course, in the context of previous course material, so I can draw on that. For example, when I introduced attention, I started with some of the key limitations of recurrent neural networks (RNNs), then pointed out that it’d be useful to have some mechanism for tracking how different elements of a sequence relate to each other. Concretely, a successful prediction about what word comes next sometimes requires looking pretty far back in a sentence or paragraph; here, I used an example from this e2eml article (“Check whether the battery ran down, please” vs. “Check whether the program ran, please”). It also requires keeping track of fairly complex syntactic and semantic relationships in that text, like that of a pronoun to its antecedent (“The animal didn’t cross the street because it was tired”). Thus, “attention” is useful for figuring out the relevance of other items in a sequence to a given item.
Then, I discuss a simple implementation of the concept in question. In the case of attention, that’s dot-product attention. Here, the relevance of any given word in a sentence to a target word is computed by taking the dot product of their embeddings. Words with more similar embeddings will have larger dot products, so those words are counted as more “relevant”. These “attention weights” can then be used to build a new representation of the target word: a weighted average that combines the target word’s own representation with the (weighted) representations of the other words. This is a simple, but also pretty naive, way to produce attention weights: relevance clearly isn’t just about having similar meanings—it’s often about playing a complementary role in an event.
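To make that concrete, here’s a minimal NumPy sketch of dot-product attention. The embedding values are made up for illustration (real embeddings would come from a trained model):

```python
import numpy as np

# Made-up 4-dimensional embeddings for a toy sentence (purely illustrative).
words = ["the", "battery", "ran", "down"]
embeddings = np.array([
    [0.1, 0.0, 0.2, 0.1],   # the
    [0.9, 0.1, 0.3, 0.7],   # battery
    [0.2, 0.8, 0.1, 0.3],   # ran
    [0.3, 0.7, 0.2, 0.2],   # down
])

target = embeddings[2]  # "ran" is the target word

# Relevance of each word to the target = dot product of their embeddings.
scores = embeddings @ target

# Softmax turns the raw scores into attention weights that sum to 1.
weights = np.exp(scores) / np.exp(scores).sum()

# The new representation of "ran" is a weighted average of all the word vectors.
new_representation = weights @ embeddings
```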
Using that naive version as a jumping-off point, I discuss a high-level metaphor for the target concept (in this case, self-attention). Here, I drew heavily from the illustrated GPT-2 tutorial, as well as my tutorial with Timothy Lee. I described self-attention as a kind of match-making service for words. The target word is represented as a “query” looking for other words that are relevant; other, potentially relevant words are represented as “keys”, which can be compared to that query to determine how relevant they actually are; combined, these queries and keys give us more nuanced attention weights than the dot-product attention approach. In turn, these weights tell us how much to pay attention to each of the “values” representing those words in the context. Ultimately, then, you can view this approach as a more sophisticated way to create a weighted average of the words in the context.
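And here’s a rough sketch of that query/key/value version, with random numbers standing in for learned weights. It’s the scaled dot-product form, stripped of batching and multiple heads, so treat it as an illustration rather than a faithful reimplementation of any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4

# Five word embeddings (random values standing in for real ones).
X = rng.normal(size=(seq_len, d_model))

# Projection matrices (random here, standing in for learned parameters).
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

Q, K, V = X @ W_q, X @ W_k, X @ W_v   # queries, keys, values

# Compare each query to every key; scaling by sqrt(d_head) keeps scores stable.
scores = Q @ K.T / np.sqrt(d_head)

# Row-wise softmax gives one set of attention weights per word.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each word's output is a weighted average of the value vectors.
output = weights @ V   # shape: (seq_len, d_head)
```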
Then, I lead a more detailed walkthrough of the process in question, focusing on visualizations, implementation in code, or both. In the case of self-attention, I walked through, step by step, how the attention weights would be calculated for a given sequence—with reference to a simplified diagram of the query, key, and value matrices. This part takes the longest to design, but it often has the biggest pay-off if you can get it right. There’s a big difference between grasping the metaphor and grasping the actual implementation, and this step is crucial for bridging that gap.
Finally, I try to contextualize that concept by zooming out and asking why it matters. As Timothy Lee and I described, self-attention plays an important role in each layer of a transformer model. I took some time to discuss how self-attention is situated within these layers, how different attention heads might learn to do different things, and so on.
I was particularly happy with my lecture on attention, but I think (hope) that the other technical lectures were also pretty solid.
I also thought the class integrated hands-on technical content pretty well. My students varied quite a lot in how much programming experience they had, so I tried to make my technical labs accessible to beginners, while also including content for people with more expertise. In case you’re interested, I’ve made the Colab notebook for my lab on using the huggingface transformers package in Python available for anyone to view (link here). If you want to run or edit it, you can save your own copy in Google Drive; please just credit me and Cameron Jones if you end up using it for research or commercial purposes.
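The notebook itself is linked above; as a taste of the sort of thing the transformers package makes easy, here’s a minimal, self-contained sketch of loading a small model (GPT-2 here, chosen just as an example) and inspecting its next-word predictions for a prompt. This is an illustrative sketch, not an excerpt from the lab:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small, openly available model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The animal didn't cross the street because it was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Probabilities for the next token come from the final position's logits.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")
```

Many of the final projects started from exactly this kind of setup: pick a prompt manipulation, compare the model’s output probabilities across conditions, and reason about what that implies.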
Finally, I was really pleased by the quality of the final projects. Students were so creative in terms of how they decided to probe or evaluate LLMs; many used the code from that Colab notebook as a point of departure, and then really ran with an idea. Topics included the presence of gender bias in LLMs, taxonomic knowledge about plants and animals, and understanding idiomatic or non-literal language.
What to change
My first instinct, when it comes to what to change, would be to suggest that the course should actually be a two-course sequence. One course would spend more time on the technical foundations and how to use LLMs in Python, and another course would focus more on probing the models, developing hypotheses about how they work, and discussing the impacts on culture and society. (Alternatively, a three-course sequence: 1) technical foundations; 2) LLM-ology; and 3) philosophy and society. But now I’m just getting carried away.)
At any rate, it’s always easy to think of what I’d like to add to a new course. My class didn’t really discuss tokenization much—I think that was an oversight, and I’d like to incorporate it in future iterations. I also think it’d benefit from some more technical exercises; I was happy with the two labs that students completed, but I think more time spent here could be beneficial. Finally, our discussion of LLM consciousness and culture more generally was pretty abridged, and it would’ve been nice to have at least one additional lecture on this.
It’s much harder to decide what to take out—which, of course, is necessary if I want to add anything to the class. It feels a little ruthless, kind of like editing a paper. But one thing I think I could afford to cut would be my lecture on frontier models. Not because these aren’t important, but because I thought my implementation was a little weak: it wasn’t detailed enough to give students a real understanding of what’s out there (beyond a cursory discussion), yet it still took up roughly an hour of class time. We could’ve spent that time discussing another topic in more depth, or introducing a new topic (like tokenization). If I do discuss frontier models next time—a category that includes multimodal models, language “agents”, and selective state-space models like Mamba—I’ll probably focus on a specific topic (e.g., language agents) so I can do it justice.
Final thoughts
I’ve re-designed a couple of courses throughout my time at UCSD, but this was my first time building a course entirely from scratch. It’s a challenging, but also invigorating, process. So much is still unknown about exactly how LLMs work, what they can and can’t do, and what their impact on society will be. That means there’s some inherent uncertainty about what to teach and how to teach it—which is why I tried to emphasize the questions and methods that cognitive scientists bring to bear on these kinds of questions, rather than the provisional answers we have at any given slice in time. And at least from my perspective, the fact that so much remains to be answered seemed like a source of inspiration and excitement for students, because it meant that they too could contribute to this growing field.
For anyone who wants a more technical discussion of LLMs than our explainer, I highly recommend checking out some of the chapters from that textbook. In particular: chapter 3 (n-gram models), chapter 6 (embeddings), chapter 7 (basic neural models), chapter 9 (RNNs), and chapter 10 (transformers). I also recommend the “illustrated GPT-2”.