Discussion about this post

Mira

The "useful fiction" framing is doing a lot of heavy lifting here... useful for *what*, exactly? Because I feel like predicting behavior and actually understanding something might need completely different answers to that question.

Simon Goldstein

Great article. I really like the example of the AI agent's mistakes about emails. I agree with you that belief/desire explanations don't seem like the way to explain the problem (it's not like the agent wants to mess up large numbers of emails but wants to do a good job with small numbers). On the other hand, I don't think tools from mechanistic interpretability will be especially helpful in explaining the error either. Instead, the relevant explanation seems to be related to how LLMs do worse with long context windows.

My question is how this compares to human psychology. Maybe this is analogous to competence/performance explanations in human psychology? Psychologists have all sorts of explanations of human behavior that don't just appeal to beliefs and desires, so it may be that AIs and humans are analogous in this respect. But I don't have a good taxonomy of the kinds of explanatory tools that psychology uses.
