Bluesky Thread

Automated Curriculum Learning

in RL, the key is to progressively tackle harder and harder problems. More can be learned if the problem is “within reach”

but it’s hard to predict what is “just right”

this paper uses entropy to dynamically score and curate the dataset

arxiv.org/html/2509.01...
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
this is an entropix paper

they use the logit entropy across all tokens in the rollout to calculate uncertainty, and use that as a proxy for “explorability”
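a minimal sketch of that uncertainty score, assuming the per-token entropy is taken over the softmax of the logits and then averaged across the rollout. the averaging is an assumption (the thread doesn't say how the paper aggregates), and `token_entropy` / `rollout_entropy` are hypothetical names, not the paper's code:

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) at one token position."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # skip zero-probability entries, which contribute nothing to the sum
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rollout_entropy(rollout_logits):
    """Mean token entropy over a rollout (assumed aggregation).

    rollout_logits: list of per-token logit vectors.
    Higher values mean the model was less certain while generating.
    """
    if not rollout_logits:
        return 0.0
    return sum(token_entropy(l) for l in rollout_logits) / len(rollout_logits)

# toy logits over a 3-token vocabulary
confident = [[10.0, 0.0, 0.0], [0.0, 12.0, 0.0]]  # peaked distributions
uncertain = [[1.0, 1.0, 1.0], [0.5, 0.4, 0.6]]    # near-uniform distributions
```

here `rollout_entropy(uncertain)` comes out higher than `rollout_entropy(confident)`, which is the signal being used as a proxy for “explorability”.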

if a problem/rollout is deemed “explorable”, then it’s prioritized to be reused in future epochs

dynamic dataset curation
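a rough sketch of the reuse step, assuming each problem already has an explorability score and that the top-k scored problems are simply put at the front of the next epoch. the top-k rule and the names `select_for_replay` and `build_next_epoch` are illustrative assumptions, not the paper's actual selection rule:

```python
import heapq

def select_for_replay(scores, k):
    """Return the k problem ids with the highest explorability scores.

    scores: dict mapping problem id -> score (e.g. mean rollout entropy).
    """
    return heapq.nlargest(k, scores, key=scores.get)

def build_next_epoch(fresh_ids, scores, replay_k):
    """Put prioritized replay problems first, then fresh ones, without duplicates."""
    replay = select_for_replay(scores, replay_k)
    seen = set(replay)
    return replay + [p for p in fresh_ids if p not in seen]

# toy scores measured with the current model
scores = {"a": 0.1, "b": 0.9, "c": 0.5}
next_epoch = build_next_epoch(["c", "d"], scores, replay_k=2)
```

because the scores come from the current model, they would need to be recomputed as training progresses, so the replay set shifts from epoch to epoch.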
intuitively i’d want a training dataset pre-calculated before training starts

but getting the difficulty right is critical, and you can’t tell whether a problem is at the right difficulty without looking at the current state of the model

so difficulty has to be calculated dynamically, against the current model, for the curriculum to be optimal
