Detecting Misaligned Intelligences Within Game Environments
Thought-provoking post. Somewhat relatedly, and given the recent drama at OpenAI, I've been trying to call attention to some of the AI safety papers authored by Helen Toner, who sits on the OpenAI board (at least as of this writing) and is also a director at CSET at Georgetown. In the paper linked below, she addresses the challenge of specification in AI models: the potential for a model to diverge from intended behavior increases with the complexity of the environment it is deployed in. There are no easy solutions, but the paper outlines some techniques you may find interesting.