Bluesky Thread

Automated Curriculum Learning

September 04, 2025

in RL, the key is to progressively tackle harder and harder problems. More can be learned if the problem is “within reach”

but it’s hard to predict what is “just right”

this paper uses entropy to dynamically score and curate the dataset

arxiv.org/html/2509.01...

Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward

this is an entropix paper

they use the logit entropy across all tokens in the rollout to calculate uncertainty, and use that as a proxy for “explorability”
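a rough sketch of that uncertainty score, in Python. the thread doesn't say how the paper aggregates entropy over tokens, so the plain mean here is an assumption, and `token_entropy` / `rollout_entropy` are hypothetical names, not the paper's code:

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) at one token position."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rollout_entropy(rollout_logits):
    """Mean per-token entropy over a rollout (assumed aggregation)."""
    if not rollout_logits:
        return 0.0
    return sum(token_entropy(l) for l in rollout_logits) / len(rollout_logits)
```

a flat distribution over the vocabulary gives the highest score and a sharply peaked one gives a score near zero, which is what lets the number stand in for how much there is left to explore.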

if a problem/rollout is deemed “explorable”, then it’s prioritized to be reused in future epochs
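one way the reuse step could look, as a minimal sketch. the top-k rule, the mix of reused and unseen problems, and the names `select_for_reuse` / `next_epoch` are all assumptions for illustration; the thread doesn't describe the paper's actual selection rule:

```python
import heapq

def select_for_reuse(scores, k):
    """Return the ids of the k problems with the highest explorability score."""
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return [pid for pid, _ in top]

def next_epoch(all_ids, scores, reuse_k, fresh_k, seen):
    """Build the next epoch: top-scored seen problems plus some unseen ones."""
    reused = select_for_reuse(scores, reuse_k)
    fresh = [pid for pid in all_ids if pid not in seen][:fresh_k]
    return reused + fresh
```

the scores would be recomputed from fresh rollouts each epoch, so the set of prioritized problems moves as the model changes.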

dynamic dataset curation

intuitively i’d want a training dataset pre-calculated before training starts

but getting the difficulty right is critical, and you can’t truly tell whether a problem’s difficulty is right without looking at the current state of the model

so for best results it has to be calculated dynamically, during training
