🚨 Alert: Very Readable Paper 🚨
The “do LLMs think?” question always bugged me because I have no idea what that means. This paper focuses narrowly on “do LLMs learn **how to do** things?”
Unlike most ML/AI papers, it’s easy to read, and you actually come away understanding WHY the conclusions hold
arxiv.org/abs/2411.12580
The paper is about procedural knowledge, or the knowledge of a process or procedure.
People have described LLMs as “just doing next token prediction”, which presumably means it’s all just retrieval. This paper shows (conclusively imo) that this is not the case: the LLM actually learns the math process
Their methods are cRaZy. They found a way to directly attribute which documents (e.g. a web page or PDF from the training dataset) the LLM used to produce an answer
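Since someone will ask what “attribute which documents” even means mechanically, here’s a rough toy sketch of the idea in code. To be clear, this is my own simplification, not the paper’s implementation: the authors use EK-FAC influence functions over pretraining data, while this sketch drops the curvature term and just scores candidate documents by how well their loss gradients line up with the gradient of the query completion. The model name, query, and `candidate_docs` list are made-up placeholders.

```python
# Toy sketch of gradient-based training-data attribution.
# NOT the paper's method: they use EK-FAC influence functions at scale;
# this version skips the inverse-Hessian preconditioning and just ranks
# candidate documents by gradient similarity to the query completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def loss_grad(text: str) -> torch.Tensor:
    """Flattened gradient of the LM loss on `text` w.r.t. all parameters."""
    model.zero_grad()
    enc = tok(text, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    return torch.cat(
        [p.grad.detach().flatten() for p in model.parameters() if p.grad is not None]
    )

# Gradient for the query: the prompt plus the completion we want to attribute.
query = "Q: What is 12 * 7? A: 84"
g_query = loss_grad(query)

# Hypothetical candidate training documents to score.
candidate_docs = [
    "Multiplication worked example: 12 * 7 = 84 ...",
    "Recipe for banana bread ...",
]

# Higher dot product ~ this document "pushed" the model toward that answer.
scores = [(doc, torch.dot(loss_grad(doc), g_query).item()) for doc in candidate_docs]
for doc, s in sorted(scores, key=lambda x: -x[1]):
    print(f"{s:+.3e}  {doc[:50]}")
```

Even this crude version gets the flavor across: documents whose gradients point in the same direction as the query’s are the ones that contributed to the model producing that answer. The real trick in the paper is making something like this tractable over an actual pretraining corpus.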
lemme just pause here and marvel at how cool that is
Since they took this approach, they can prove the result (imo) pretty much irrefutably
For reasoning problems, they showed that the answer actually was in the training dataset, but the LLM chose not to use it, instead opting for similar math problems that showed the process
The paper is truly one of the greats. Every time I had a “hold on 🤔” reaction, the very next paragraph answered my question. Over and over.
Give it a read! For real, not just NotebookLM; this is one you can make it through (but maybe skip the math section)