Postmortems

Specification gaming: the flip side of AI ingenuity

Training of machine learning systems can give rise to unexpected results, as the system can find cheats - unexpected approaches which score well but miss the point. There’s a spreadsheet listing 60 examples linked in the article:

Two examples which could have implications for AI safety:

  • Agent kills itself at the end of level 1 to avoid losing in level 2
  • Self-driving car rewarded for speed learns to spin in circles