Bluesky Thread

a researcher on X explains why RL alone didn’t work before

it mostly comes down to today’s base models being smarter and better at exploration

GSM8K simply isn’t a hard enough test to grow interesting emergent behavior

x.com/its_dibya/st...
this implies that today’s base models aren’t smart enough for tomorrow’s emergent behavior

we may need that distillation loop between reasoning models -> new pretraining data in order to elicit higher levels of behavior from RL