S1: The $6 R1 Competitor?
A new paper released on Friday is making waves in the AI community, not because of the model it describes, but because it shows how close we are to some very large breakthroughs in AI. The model sits just below the state of the art, yet it can run on my laptop. More importantly, it sheds light on how all this stuff works, and it's not complicated.
Inference Scaling: “Wait” For Me!
OpenAI was the first to claim the inference-time scaling laws. Basically, an LLM can get higher performance if it can "think" longer before answering. But, like, how do you do that? How do you make it think longer?
OpenAI and R1 had cool graphs showing performance scaling with average thinking time (this from the s1 paper):

[figure from the s1 paper: benchmark performance scaling with average thinking time]
But how do they control the length of an LLM response? Everyone skipped over that part, but...
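The section heading gives it away: the s1 paper's answer is a decoding-time trick it calls budget forcing. Here's a minimal sketch of the idea. The `generate(prompt, stop)` helper and the `</think>` delimiter are hypothetical stand-ins for illustration, not any particular library's API; the real end-of-thinking delimiter depends on the model.

```python
END_THINK = "</think>"  # stand-in delimiter; the real one is model-specific

def think_with_budget(prompt: str, extensions: int, generate) -> str:
    """Budget forcing, per the s1 paper: make the model think longer by
    refusing to let it stop. `generate(text, stop)` is a hypothetical
    helper that decodes until it emits `stop` (inclusive), or EOS if None."""
    text = prompt
    for _ in range(extensions):
        chunk = generate(text, stop=END_THINK)
        # The model tried to end its thinking block. Strip the delimiter
        # and append "Wait" so it second-guesses itself and keeps going.
        text += chunk.removesuffix(END_THINK) + " Wait"
    # Final round: let the thinking block close, then decode the answer.
    text += generate(text, stop=END_THINK)
    return text + generate(text, stop=None)

# Toy demo with a fake "model", just to show the control flow:
fake = iter(["hmm, 2+2 is 4</think>", " actually, still 4</think>", " 4"])
print(think_with_budget("What is 2+2?<think>", 1, lambda t, stop: next(fake)))
```

The same knob works in reverse: to cap thinking, inject the end-of-thinking delimiter early and the model has no choice but to start answering.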