Automated Curriculum Learning
in RL, the key is to progressively tackle harder and harder problems. More can be learned if the problem is “within reach”
but it’s hard to predict what is “just right”
this paper uses entropy to dynamically score and curate the dataset
arxiv.org/html/2509.01...
this is an entropix paper
they use the logit entropy across all tokens in the rollout to calculate uncertainty, and use that as a proxy for “explorability”
if a problem/rollout is deemed “explorable”, then it’s prioritized to be reused in future epochs
dynamic dataset curation
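
a minimal sketch of that scoring idea, not the paper's actual implementation. it assumes the rollout score is the mean per-token softmax entropy and that "prioritized" means keeping the top-k problems; the thread confirms neither detail, and the names `token_entropy`, `rollout_entropy`, and `prioritize` are hypothetical.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) at one token position."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rollout_entropy(rollout_logits):
    """Mean per-token entropy over a rollout (assumed aggregation)."""
    return sum(token_entropy(l) for l in rollout_logits) / len(rollout_logits)

def prioritize(scores, keep):
    """Return the `keep` problem ids with the highest score (assumed rule)."""
    return sorted(scores, key=scores.get, reverse=True)[:keep]

# toy rollouts: two token positions each, vocabulary of size 2
scores = {
    "uncertain": rollout_entropy([[0.0, 0.0], [0.0, 0.0]]),  # flat logits
    "confident": rollout_entropy([[9.0, 0.0], [9.0, 0.0]]),  # peaked logits
}
reuse_next_epoch = prioritize(scores, keep=1)
```

flat logits give the highest entropy, so the "uncertain" problem is the one kept for reuse. a real setup would read the logits from the policy's forward pass instead of hand-written lists.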
intuitively i’d want a training dataset pre-calculated before training starts
but getting the difficulty right is critical, and you can’t tell whether a problem’s difficulty is right without looking at the current state of the model
so it has to be calculated dynamically, during training
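
a toy illustration of why a pre-calculated ordering goes stale. everything here is a made-up stand-in: the model is reduced to a single `skill` number, each problem to a `difficulty` number, and uncertainty to the binary entropy of a hypothetical success probability `sigmoid(skill - difficulty)`. none of this comes from the paper.

```python
import math

def toy_uncertainty(skill, difficulty):
    """Binary entropy of a toy success probability sigmoid(skill - difficulty)."""
    p = 1.0 / (1.0 + math.exp(-(skill - difficulty)))
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def pick_next(problems, skill):
    """Re-score every problem against the current skill; return the most uncertain."""
    return max(problems, key=lambda name: toy_uncertainty(skill, problems[name]))

# hypothetical problems with fixed difficulty values
problems = {"easy": 0.0, "medium": 3.0, "hard": 6.0}

# the "just right" problem changes as the toy model's skill grows
schedule = [pick_next(problems, skill) for skill in (0.0, 3.0, 6.0)]
```

the problem set never changes, but the top pick does, because the score depends on the model's current state. a ranking frozen at skill 0.0 would keep serving the "easy" problem long after it stopped being informative.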