the owners of TikTok scandalously release:
REER: a learning method that exposes the logic that led to a good result
Booty: when shaken properly elicits creative writing
ASS: (they couldn’t get an acronym to work, but i’m sure they tried)
huggingface.co/papers/2509....
seriously this is very cool! all jokes aside
with REER, they discover a more scalable alternative to RL. instead of teaching *why* behavior leads to a good result, it *deconstructs* what led to a correct result
it feels like the unsupervised RL that we need
and it works!
they made a tiny 8B model that holds up well against many large MoE models