When Models Drive a Hard Bargain

Nov 20, 2023

Detecting Misaligned Intelligences Within Game Environments

3 Comments

Nov 20, 2023

Thought-provoking post. Somewhat relatedly, and given the recent drama at OpenAI, I've been trying to call people's attention to some of the AI safety papers authored by Helen Toner, who sits on the OpenAI board (at least as of this writing) and also leads CSET at Georgetown. In the paper linked below, she addresses the challenge of specification in AI models: the potential for them to diverge from intended behavior increases with the complexity of the environment in which they are deployed. There are no easy solutions, but the paper outlines some techniques you may find interesting.

https://cset.georgetown.edu/wp-content/uploads/Key-Concepts-in-AI-Safety-Specification-in-Machine-Learning.pdf

Sean Trott

Nov 20, 2023

Great pointer, thanks! I think this is especially relevant since so many applications built on GPT revolve around some form of prompt specification, which, as others have pointed out, is not too unlike an incantation.

Benjamin Riley

Nov 21, 2023

Better than incanting "feel the AGI!", as some were apparently doing at a certain AI org that is currently imploding.

The Counterfactual