🚨Great Paper Alert🚨 GSPO (Group Sequence Policy Optimization)
tbh it's a bit tough on the math, but it's EXCELLENT at explaining the situation
it's an RL algorithm that fixes stability problems with GRPO (R1's algo) to enable training huge models easily
arxiv.org/abs/2507.180...
their innovation, simplified, is a lot like what Moonshot did with MuonClip (Kimi K2 pretraining & posttraining optimizer)
effectively: slow down learning, look at less data at a time, and training gets a hella lot more stable
i imagine american labs must have discovered similar things
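for the curious, here's a rough sketch of the core difference as i understand it from the paper: GRPO clips one importance ratio per token, while GSPO clips a single length-normalized ratio for the whole response. the function names, the toy log-probs, and the eps value are all mine (hypothetical), not the paper's code, so treat this as an illustration rather than the real implementation.

```python
import math

def token_ratios(new_logps, old_logps):
    """GRPO-style: one importance ratio per token (new policy / old policy)."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]

def sequence_ratio(new_logps, old_logps):
    """GSPO-style: one length-normalized ratio for the whole response,
    which works out to the geometric mean of the per-token ratios."""
    mean_diff = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_diff)

def clipped_term(ratio, advantage, eps):
    """PPO-style clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# toy per-token log-probs for one sampled response (made-up numbers)
old_logps = [-1.0, -2.0, -0.5, -1.5]
new_logps = [-0.9, -2.6, -0.4, -1.2]

per_token = token_ratios(new_logps, old_logps)   # swings well above and below 1
per_seq = sequence_ratio(new_logps, old_logps)   # stays close to 1

print("token-level ratios:", [round(r, 3) for r in per_token])
print("sequence-level ratio:", round(per_seq, 3))
# eps=0.2 is an arbitrary illustrative clip range, not a value from the paper
print("clipped sequence term (A=1):", round(clipped_term(per_seq, 1.0, 0.2), 3))
```

the point of the toy numbers: individual token ratios can be noisy even when the response as a whole barely moved, and averaging in log space smooths that out before clipping.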
one thing they do very well with this GSPO paper is explaining (and proving!) the problems with GRPO as a means to show why GSPO works so well
it's such a phenomenal writing style. More papers should read like this. Juxtapose your thing against some other popular thing