6 Comments
Neural Foundry

Brilliant framing of how GOFAI's habitability issue never got solved, just got eclipsed by GUIs. I've been debugging LLM-based tools lately and it's wild how much the overestimation problem shows up when non-technical users hit edge cases. What makes it trickier than older NLIs is the illusion of competence: the failures aren't consistent, which makes calibration way harder.

Sean Trott

Thank you!

Agree that the inconsistency and non-determinism of LLMs (and also just their mechanistic opacity) make calibration particularly thorny.

Benjamin Riley

What a phenomenal essay. Gonna need to re-read this a few times to absorb the various puzzles you've presented!

Sean Trott

Thank you, Ben!

Christopher Riesbeck

No arguments with any of this, except that I don't believe that education is the answer to aligning expectations. Instead, we have to stop rewarding responses like "Hmm, let me think about that" and "You're right, I'm so sorry I missed that" in the reinforcement phase of training. As long as LLMs claim intelligence and emotion in their interactions, overestimation will result.

Sean Trott

Yeah, I think that's also a good and relevant point. Also relating to the RLHF stage: I think there's some evidence that RLHF discourages answers like "I don't know." I'm not sure what the best way to phrase "I don't know" is, but LLMs presumably need some way to calibrate their own uncertainty about a question (of course, a lot has been written on this already!).
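
As a concrete (and purely illustrative, not from the essay) picture of what "calibrating uncertainty" can mean: one common measure is expected calibration error, which compares a model's stated confidence on each answer to how often it is actually right. The function name and numbers below are hypothetical; this is just a minimal Python sketch under that assumption.

def expected_calibration_error(confidences, correct, n_bins=10):
    # confidences: floats in [0, 1]; correct: 0/1 outcomes, same length.
    assert len(confidences) == len(correct)
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Gather predictions whose confidence falls in this bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        # Weight each bin's confidence/accuracy gap by its share of examples.
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

# A model that claims ~90% confidence but is right only half the time
# is overconfident, and that gap shows up as a large error (0.5 here).
print(expected_calibration_error([0.9, 0.9, 0.8, 0.6], [1, 0, 0, 1]))

A well-calibrated "I don't know" would correspond to low confidence on exactly the questions the model tends to get wrong, driving this gap toward zero.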