oh!!! o3-mini now shows its thought trace
chatgpt.com/share/67a556...
my theory after reading the s1 paper:
high difficulty (long) traces are most valuable to train on, thus you won’t see the top models like o1-pro and o3 offering up their traces
bsky.app/profile/timk...
s1: The $6 R1 Competitor?
This isn't an R1 replication, it's a brilliant breakthrough in data reduction, and just plain dumb engineering ingenuity. I considered not writing this up, but I don't think it's obvious why it's so important. Enjoy!
timkellogg.me/blog/2025/02...