Bluesky Thread

Skills are learned through RL!

pre-training — individual skills learned

post-training (RL) — composed skills learned

this is a very clarifying way for me to think about how RL works

husky-morocco-f72.notion.site/From-f-x-and...
The RL Compositionality Hypothesis
Once a model has acquired the necessary atomic, non-decomposable skills for a task through next-token-prediction (NTP) training, RL enables the composition of these skills into more complex capabilities when properly incentivized.
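The hypothesis can be illustrated with a toy sketch. Everything below is a hypothetical illustration, not from the thread: the "atomic skills" are hardcoded primitives standing in for what pre-training provides, and a simple bandit-style RL loop learns only which skill to invoke at each step, i.e. the composition, driven by a sparse reward at the end of the chain.

```python
import random

# Toy sketch of the compositionality hypothesis. The "atomic skills" are
# hardcoded primitives (standing in for abilities baked in by pre-training);
# RL only learns WHICH skill to invoke at each step, i.e. the composition.
SKILLS = ["grasp", "lift", "place"]   # hypothetical pre-trained primitives
GOAL = ["grasp", "lift", "place"]     # the task: execute them in this order

def run_episode(q, epsilon=0.3):
    """Pick one skill per step (epsilon-greedy); reward is 1 only if the
    whole chain matches the goal -- composition is what gets rewarded."""
    chain = []
    for step in range(len(GOAL)):
        if random.random() < epsilon:
            a = random.choice(SKILLS)             # explore
        else:
            a = max(SKILLS, key=lambda s: q[(step, s)])  # exploit
        chain.append(a)
    return chain, 1.0 if chain == GOAL else 0.0

def train(episodes=5000, lr=0.1, seed=0):
    random.seed(seed)
    q = {(t, s): 0.0 for t in range(len(GOAL)) for s in SKILLS}
    for _ in range(episodes):
        chain, r = run_episode(q)
        for t, a in enumerate(chain):             # bandit-style value update
            q[(t, a)] += lr * (r - q[(t, a)])
    return q

q = train()
learned = [max(SKILLS, key=lambda s: q[(t, s)]) for t in range(len(GOAL))]
print(learned)  # the greedy policy recovers the full skill chain
```

Note the asymmetry the thread describes: the primitives themselves are never updated; only the selection over them is learned, and a sparse end-of-chain reward is enough to discover the composition.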
in high school wrestling, we spent weeks practicing single moves, over and over, 50 times in a row

that’s what pre-training does. Muscle memory

then we spent a bit more time sparring and learning to chain the moves together to win matches

that’s RL
how do you learn skills? put them in pre-training

that’s where synthetic data comes in, frequently

i’ve noticed a lot of agentic models being trained on traces from other agents. it’s all about baking in the individual skills

RL is ineffective without learning the core skills
31 likes 4 reposts
