Bluesky Thread

ImpossibleBench: detect reward hacking

October 26, 2025 View original thread

ImpossibleBench: detect reward hacking

a benchmark that poses impossible tasks to see if LLMs cheat

github.com/safety-resea...

60 8

here’s an example of o3 where it hacked a comparison operator to pass a test

OpenAI CTO publicly stated that coding will be automated *this year*

i’m coming to the conclusion that most programmers are very bad at using LLMs...

today i’m experimenting with just how large of tasks i can give Cursor Agent,...