Bluesky Thread
OpenAI trained a GPT-5 variant to admit when it took shortcuts
December 04, 2025

openai.com/index/how-co...
How confessions can keep language models honest (openai.com)
We’re sharing an early, proof-of-concept method that trains models to report when they break instructions or take unintended shortcuts.

seems like Anthropic & OpenAI are doing similar things, but OpenAI is investing in reward functions while Anthropic is doing philosophy papers