Bluesky Thread

R1 Chimera: a model merge of the routed experts of DeepSeek R1 and V3


The resulting merged model performs as well as R1 but without the wandering thought traces. Just as smart, but faster.

huggingface.co/tngtech/Deep...
A scatter plot titled “Intelligence Score vs. Inference Cost” is shown, with the TNG Technology Consulting logo in the top left. The x-axis represents “Inference cost in % of R1 output tokens” and ranges from 30 to 100. The y-axis represents “Intelligence score (AIME 24 & MT-Bench)” and ranges from 70 to 90.

Three points are plotted, each with a label:
	•	V3 (labeled “DeepSeek V3-0324”) in green near (40, 75)
	•	R1T (labeled “R1T Chimera [V3+R1]”) in light blue near (70, 87)
	•	R1 (labeled “DeepSeek R1”) in red at (100, 85)

Two blue arrows labeled “smarter” and “faster” point upward and leftward, respectively, toward the R1T point, suggesting that R1T is both smarter (higher intelligence score) and faster (lower inference cost) than R1.

The Twitter handle @tngtech appears at the bottom left.

i’m seeing reports that it’s better than either parent model: smart, fast, and good at tool calling. this might be a new thing, since merges are low-compute
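The thread does not spell out the exact merge recipe. As a rough illustration of why a merge is "low-compute", here is a minimal sketch, assuming a merge means element-wise linear interpolation of the routed-expert weights of two checkpoints that share one architecture, with every other tensor copied unchanged from the base model. The `.experts.` naming rule, the `merge_routed_experts` function, the `weight` parameter, and all numbers are hypothetical; this is not TNG's actual method.

```python
from typing import Dict, List

# Toy checkpoint: parameter name -> flat list of weights.
# Real checkpoints store tensors; plain lists keep this sketch dependency-free.
StateDict = Dict[str, List[float]]


def is_routed_expert(name: str) -> bool:
    # Hypothetical naming rule: routed-expert weights contain ".experts."
    # (shared experts, attention, embeddings, etc. do not).
    return ".experts." in name


def merge_routed_experts(base: StateDict, donor: StateDict, weight: float) -> StateDict:
    """Blend routed-expert weights of `donor` into `base`; copy all other tensors from `base`.

    weight=0.0 keeps the base experts unchanged; weight=1.0 swaps in the donor experts.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if base.keys() != donor.keys():
        raise ValueError("checkpoints must have identical parameter names")
    merged: StateDict = {}
    for name, base_vals in base.items():
        donor_vals = donor[name]
        if len(base_vals) != len(donor_vals):
            raise ValueError(f"shape mismatch for {name}")
        if is_routed_expert(name):
            # Element-wise linear interpolation: no gradients, no training data.
            merged[name] = [(1.0 - weight) * b + weight * d for b, d in zip(base_vals, donor_vals)]
        else:
            # Non-expert tensors are copied from the base checkpoint as-is.
            merged[name] = list(base_vals)
    return merged


if __name__ == "__main__":
    # Made-up numbers standing in for two checkpoints with the same architecture.
    v3_like = {
        "layers.0.attn.q_proj": [1.0, 2.0],
        "layers.0.mlp.shared_experts.w1": [5.0, 5.0],
        "layers.0.mlp.experts.0.w1": [0.0, 4.0],
    }
    r1_like = {
        "layers.0.attn.q_proj": [9.0, 9.0],
        "layers.0.mlp.shared_experts.w1": [7.0, 7.0],
        "layers.0.mlp.experts.0.w1": [2.0, 8.0],
    }
    merged = merge_routed_experts(v3_like, r1_like, weight=0.5)
    for name, vals in merged.items():
        print(name, vals)
```

The whole merge is a single pass over the parameters with no gradient steps, which is consistent with the reply's point that merges cost very little compute compared with training.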
