🚨 Alert: Very Readable Paper 🚨
The “do LLMs think?” question always bugged me because I have no idea what that means. This paper focuses narrowly on “do LLMs learn **how to do** things?”
Unlike most ML/AI papers, it’s easy to read, and you actually come away understanding WHY the conclusions hold
arxiv.org/abs/2411.12580
The paper is about procedural knowledge, or the knowledge of a process or procedure.
People have described LLMs as “just doing next token prediction”, which presumably means it’s all just retrieval. This paper shows (conclusively imo) that this is not the case: the LLM actually learns the math process
Their methods are cRaZy. They found a way to directly attribute which documents (e.g. a web page or PDF from the training dataset) the LLM used to produce an answer
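Since someone will ask what “attribute which documents” even means mechanically, here’s a rough toy sketch of the idea in code. To be clear, this is my own simplification, not the paper’s implementation: the authors use EK-FAC influence functions over pretraining data, while this sketch drops the curvature term and just scores candidate documents by how well their loss gradients line up with the gradient of the query completion. The model name, query, and `candidate_docs` list are made-up placeholders.

```python
# Toy sketch of gradient-based training-data attribution.
# NOT the paper's method: they use EK-FAC influence functions at scale;
# this version skips the inverse-Hessian preconditioning and just ranks
# candidate documents by gradient similarity to the query completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def loss_grad(text: str) -> torch.Tensor:
    """Flattened gradient of the LM loss on `text` w.r.t. all parameters."""
    model.zero_grad()
    enc = tok(text, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    return torch.cat(
        [p.grad.detach().flatten() for p in model.parameters() if p.grad is not None]
    )

# Gradient for the query: the prompt plus the completion we want to attribute.
query = "Q: What is 12 * 7? A: 84"
g_query = loss_grad(query)

# Hypothetical candidate training documents to score.
candidate_docs = [
    "Multiplication worked example: 12 * 7 = 84 ...",
    "Recipe for banana bread ...",
]

# Higher dot product ~ this document "pushed" the model toward that answer.
scores = [(doc, torch.dot(loss_grad(doc), g_query).item()) for doc in candidate_docs]
for doc, s in sorted(scores, key=lambda x: -x[1]):
    print(f"{s:+.3e}  {doc[:50]}")
```

Even this crude version gets the flavor across: documents whose gradients point in the same direction as the query’s are the ones that contributed to the model producing that answer. The real trick in the paper is making something like this tractable over an actual pretraining corpus.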
lemme just pause here and marvel at how cool that is
Since they took this approach, they can prove the result (imo) pretty much irrefutably
For reasoning problems, they showed that the answer actually was in the training dataset, but the LLM chose not to use it, instead opting for similar math problems that showed the process
The paper is truly one of the greats. Every time I had a “hold on 🤔” reaction, the very next paragraph answered my question. Over and over.
Give it a read! For real, not just NotebookLM; this is one you can make it through (but maybe skip the math section)