Self-Improving Transformers
The authors found that you can train LLMs on their own outputs by:
1. generating *slightly harder* problems each round
2. filtering out low-quality outputs via majority voting
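Rough sketch of what the majority-voting filter in step 2 could look like. This is my own illustration, not the paper's code: `majority_vote`, `self_improvement_round`, the `sample_answers` callback, and the `min_agreement` threshold are all hypothetical names and values I made up for the example.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def majority_vote(answers: List[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the most common answer if enough samples agree, else None.

    min_agreement is an assumed threshold, not a value from the paper.
    """
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) >= min_agreement else None


def self_improvement_round(
    problems: List[str],
    sample_answers: Callable[[str], List[str]],
    min_agreement: float = 0.6,
) -> List[Tuple[str, str]]:
    """Keep only (problem, answer) pairs where the sampled answers mostly agree.

    sample_answers is a stand-in for sampling the model several times on the
    same problem; the kept pairs would become training data for the next round.
    """
    kept = []
    for problem in problems:
        label = majority_vote(sample_answers(problem), min_agreement)
        if label is not None:
            kept.append((problem, label))
    return kept
```

The idea would be to call `self_improvement_round` on a batch of slightly harder problems each round, fine-tune on whatever survives the filter, and repeat.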
I have thoughts. I’m going to write a blog post. tl;dr: it’s interesting, but not the singularity
arxiv.org/abs/2502.01612