i haven’t fully wrapped my head around R1. i need to read the paper. they seem to have found an extremely effective distillation process: the distilled R1 1.5B beats gpt4o and sonnet-3.5 on math benchmarks!
something about the iterative training process, but also that they favored simple methods over complex ones
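as i understand it, the distillation step is just supervised fine-tuning on reasoning traces sampled from the big model — no RL on the student. a minimal sketch of assembling that SFT data (template and names are my guesses, not the paper's exact format):

```python
# Hypothetical sketch: R1-style distillation data prep.
# The teacher (a large reasoning model) emits a chain of thought plus a
# final answer; the student is then plain-SFT'd on these traces.

def format_trace(prompt: str, thinking: str, answer: str) -> str:
    """Pack one teacher sample into a single training string.

    The <think>...</think> tags mirror the convention reasoning models
    use to delimit the chain of thought; the exact template here is an
    assumption, not the paper's verbatim format.
    """
    return f"{prompt}\n<think>\n{thinking}\n</think>\n{answer}"

def build_sft_dataset(teacher_samples):
    """teacher_samples: iterable of (prompt, thinking, answer) triples,
    ideally already filtered so only traces with correct final answers
    survive (rejection sampling)."""
    return [format_trace(p, t, a) for p, t, a in teacher_samples]

samples = [
    ("What is 12 * 13?",
     "12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
     "156"),
]
dataset = build_sft_dataset(samples)
print(dataset[0])
```

the student never sees a reward signal here, just next-token prediction on the teacher's traces — which is part of why "simple over complex" seems to be the takeaway.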
i recall @tedunderwood.me declaring a few days ago that he thought post-training would be trickier than we think (and for good reason! all the methods being floated were kinda insane)
but R1 basically says, “nope, it’s not tricky, stick to the basics and be patient”