Bluesky Thread

the biggest reason for fully open models is science, and the downstream effects

1. rebuild it, but with your own domain-specific mid-training data
2. try methods out on several snapshots
3. attribute answers to specific documents in the training dataset
4. …
Simon Willison @simonwillison.net
Olmo 3 is notable as a "fully open" LLM - all of the training data is published, plus complete details on how the training process was run. I tried out the 32B thinking model and the 7B instruct models, + thoughts on why transparent training data is so important simonwillison.net/2025/Nov/22/...
on #3, this paper uses a method that can directly attribute model outputs to specific documents in the pretraining dataset

they used it to show that LLMs do in fact learn procedures, not just autocomplete. But you could take this so much further with Olmo 3

arxiv.org/abs/2411.12580
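
For concreteness, here's a minimal sketch of the core move behind this kind of attribution: score each training document by how well its loss gradient aligns with the gradient of the query you want to explain, after preconditioning by an approximate inverse Hessian. This toy uses a tiny linear model and a damped identity in place of the paper's EK-FAC Hessian approximation; every name and shape below is illustrative, not the paper's code.

```python
# Toy sketch of gradient-based training-data attribution.
# Assumption: a tiny linear regressor stands in for the LLM, and a damped
# identity stands in for the paper's EK-FAC inverse-Hessian. Illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)   # stand-in for the language model
loss_fn = nn.MSELoss()

def loss_grad(x, y):
    """Flattened gradient of the loss at (x, y) w.r.t. all model parameters."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

# The "fingerprint": the gradient of the completion we want to explain.
query_x, query_y = torch.randn(1, 8), torch.randn(1, 1)
g_query = loss_grad(query_x, query_y)

# Score every training document: influence ~ g_query^T H^{-1} g_doc.
# Here H^{-1} is just (1/damping) * I; EK-FAC would give a structured inverse.
damping = 0.1
train_docs = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(100)]
scores = [
    (i, float(torch.dot(g_query, loss_grad(x, y))) / damping)
    for i, (x, y) in enumerate(train_docs)
]

# Documents whose gradients point the same way as the query's are the ones
# that most plausibly pushed the model toward this answer.
top5 = sorted(scores, key=lambda s: -s[1])[:5]
print(top5)
```

At LLM scale the gradients are huge and the Hessian is intractable, which is exactly why the paper reaches for EK-FAC (sketched after the image description below).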
A comic-style infographic titled “THE AI CHEF’S ‘PROCEDURAL’ SECRET: AN ATTRIBUTION ANALOGY.” It uses a robot chef baking a soufflé to explain how attribution and gradient-based tracing in AI works. The diagram proceeds left to right in five labeled steps.

⸻

1. THE TASK (REASONING)

A friendly robot chef stands in a kitchen, holding up a perfectly baked soufflé. A math bubble shows x + 2y = 10 as an analogy for solving a problem.
Caption: AI Chef (LLM) solves a problem (bakes a soufflé).

⸻

2. THE “FINGERPRINT” (GRADIENT)

Close-up of the robot whisking batter. A glowing network of abstract swirls appears over the bowl.
Caption: We record the exact, unique actions & “effort” (Gradient) used for this specific soufflé.

⸻

3. THE “BRAIN MAP” (EK-FAC)

The robot stands before floating diagram bubbles labeled Whisking Techniques, Aeration Physics, Heat Transfer, Simplified Linkages.
Caption: We use a simplified map of how the chef connects concepts (Hessian/EK-FAC approximation).

⸻

4. THE LIBRARY MATCH (ATTRIBUTION)

The robot enters a vast library with floor-to-ceiling bookshelves. A giant glowing fingerprint projection shines onto one shelf as the robot scans for the best match.
Caption: We scan the entire “cookbook library” (pre-training data) to find which book’s instructions best match the fingerprint via the brain map.

⸻

5. THE RESULT: PROCEDURAL KNOWLEDGE

The robot chef proudly holds a glowing lightbulb while a book opens nearby with a concept diagram. A large reference book beside him is titled “THE PHYSICS OF FOAMS & AERATION (NOT a Soufflé Recipe Book!)”
Caption: We find the source was NOT a recipe, but a foundational PRINCIPLE (procedural knowledge) applied to a new task.

⸻

Overall, the image uses the story of baking a soufflé to explain how AI models trace reasoning: capturing gradients, mapping conceptual relations, searching training data, and revealing underlying procedural knowledge rather than direct memorization.
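
On step 3, the "brain map" is the piece that makes the library scan affordable. Here's a hedged sketch (single linear layer, toy shapes, names invented for illustration): EK-FAC approximates a layer's Hessian as a Kronecker product of two small covariance matrices, so applying the inverse Hessian to any gradient costs two small rotations instead of one enormous solve.

```python
# Toy sketch of the EK-FAC-style "brain map" for a single linear layer.
# Assumption: the layer Hessian is approximated as a Kronecker product of
# two small covariances, so (H + damping*I)^{-1} g becomes two cheap
# rotations. All shapes and names here are invented for illustration.
import torch

torch.manual_seed(0)
d_in, d_out, n = 8, 4, 256

# Layer inputs and backpropagated output gradients, collected over a sample
# of training data (in the real method, from the LLM's forward/backward passes).
acts = torch.randn(n, d_in)
grads_out = torch.randn(n, d_out)

A = acts.T @ acts / n              # input covariance       (d_in  x d_in)
S = grads_out.T @ grads_out / n    # output-grad covariance (d_out x d_out)

# Eigendecompose the two small factors once; reuse them for every
# query/document pair during the "library scan".
eva, QA = torch.linalg.eigh(A)
evs, QS = torch.linalg.eigh(S)
damping = 0.1

def precondition(g):
    """Apply the damped inverse of the Kronecker-factored Hessian to a
    layer gradient g of shape (d_out, d_in)."""
    g_rot = QS.T @ g @ QA                                     # rotate into eigenbasis
    g_rot = g_rot / (evs[:, None] * eva[None, :] + damping)   # divide by eigenvalues
    return QS @ g_rot @ QA.T                                  # rotate back

g_doc = torch.randn(d_out, d_in)     # per-document gradient for this layer
g_query = torch.randn(d_out, d_in)   # the query's "fingerprint" for this layer
influence = torch.sum(g_query * precondition(g_doc))
print(float(influence))
```

The payoff is that the full Hessian never gets built: two eigendecompositions of small matrices replace one solve over millions of parameters, which is what makes scanning huge numbers of pretraining documents feasible at all.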