this fits my mental model: LLMs *do* learn procedures, but via the same mechanics that learn facts. So of course they also hallucinate procedures
but also: what does procedure hallucination look like? i don’t think i have a grasp on that
machinelearning.apple.com/research/ill...
This paper is being advertised as evidence against AGI, but I’m looking at these charts and…that’s how people operate too. Those are human failure modes. Computers’ failure modes normally look entirely different. This is AGI straight into your veins
this is a fine conclusion, but it’s phrased poorly
it leads the reader to believe that LRMs are inherently limited, whereas it’s actually just saying that LRMs aren’t computers, they’re something else
again, that’s what we’ve been saying
all of this is very intuitive
some models are inherently more capable than others. Simply “thinking longer” doesn’t magically solve harder problems. The LLM still must be capable of tackling said problem
everyone i’ve ever talked to has this intuition, technical or not
the labs have also been saying this
OpenAI and Anthropic have been talking about how this new crop of models comes from a combo of pre-training & post-training scaling
if they still need pre-training, that means they still need to improve the fundamental nature of the model. Thinking isn’t a panacea
there was a time when i did think reasoning would be a panacea: that simply thinking longer during inference would tackle all problems
but that phase didn’t last long. even by the time my s1 post landed it didn’t feel right
timkellogg.me/blog/2025/02...
for example, if you misconfigure ollama so that it forgets to stop on the stop token, the response doesn’t get better, it gets FAR worse
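to make that concrete, here’s a rough sketch of the kind of misconfiguration i mean, assuming ollama’s REST /api/generate endpoint (raw mode skips the prompt template, an empty stop list drops the stop sequences); the model name, prompt, and numbers are placeholders, not my exact setup:

```python
# rough sketch of a "nothing tells it when to stop" ollama call
# (placeholders throughout; the real-world version is usually a broken
# prompt template in the Modelfile, but this captures the spirit)
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",                 # placeholder model
        "prompt": "Why is the sky blue?",  # no chat template applied
        "raw": True,                       # bypass the model's prompt template
        "stream": False,
        "options": {
            "stop": [],                    # drop the configured stop sequences
            "num_predict": 2048,           # generation only ends at the token budget
        },
    },
)
# the extra tokens tend to be rambling and repetition, not a better answer
print(resp.json()["response"])
```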
so then maybe you can RL it into just thinking longer
s1 showed that’s VERY easy to do (just force it to say “wait,”), but we didn’t see anything like takeoff
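for context, the “wait,” trick is roughly this: a minimal sketch assuming a Hugging Face causal LM and some explicit end-of-thinking marker (the model name, marker string, and budgets below are illustrative, not s1’s actual setup):

```python
# minimal sketch of s1-style budget forcing: when the model tries to wrap up,
# cut its stop marker and append "Wait," so it keeps reasoning
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; s1 fine-tuned a larger Qwen model
END_OF_THINKING = "Final Answer:"     # placeholder marker for "i'm done reasoning"

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def generate_with_wait(prompt: str, num_waits: int = 2, chunk_tokens: int = 256) -> str:
    """Keep swapping the model's stop marker for 'Wait,' to extend its reasoning."""
    text = prompt
    for _ in range(num_waits):
        ids = tok(text, return_tensors="pt")
        out = model.generate(**ids, max_new_tokens=chunk_tokens, do_sample=False)
        text = tok.decode(out[0], skip_special_tokens=True)
        if END_OF_THINKING in text:
            # model tried to stop early: strip the marker and nudge it to continue
            text = text.split(END_OF_THINKING)[0].rstrip() + " Wait,"
    # final pass: let it actually finish and produce an answer
    ids = tok(text, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=chunk_tokens, do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)
```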