Bluesky Thread

Olmo 3 7B & 32B base & thinking models

@ai2.bsky.social has done it again, fully open models, fully open process

seems competitive with Qwen 3, except you can fully reproduce any part of the training process

allenai.org/blog/olmo3
[Image: two-panel line chart on a dark teal background. Left panel "Base Model Training" (y-axis: Base Eval Average %, covering pretraining, midtraining, and long context); right panel "Post-Training" (y-axis: Adapt Eval Average %, covering SFT, DPO, and RL). A pink OLMo 3 curve climbs steadily across both panels, passing baselines such as Marin 32B, OLMo 2 32B, and Apertus 70B, and ends near Qwen 3 32B at roughly 80%, above Gemma 3 27B and Qwen 2.5 32B.]
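since the weights are fully open, here's a minimal sketch of trying them with Hugging Face transformers. the repo id below is an assumption, not confirmed by the thread; check the blog post for the official model names.

```python
# Minimal sketch: load an Olmo 3 checkpoint and generate text.
# NOTE: the repo id is an assumption; see allenai.org/blog/olmo3
# for the official Hugging Face model names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Olmo-3-7B"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```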
the qwen models have been fantastic for accelerating AI research. i’m hoping this helps even more
tech report: www.datocms-assets.com/64837/176364...
7 hours later
bsky.app/profile/nato...
Nathan Lambert @natolambert.bsky.social
We present Olmo 3, our next family of fully open, leading language models.
This family of 7B and 32B models represents:

1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.
