Bluesky Thread

R1 Chimera: a model merge of the routed experts of DeepSeek R1 and V3

April 27, 2025

The resulting merged model performs as well as R1 but without the wandering thought traces. Just as smart, but faster.

huggingface.co/tngtech/Deep...

i’m seeing reports that it’s better than either model: smart, fast and good at tool calling. this might be a new thing, since merges are low-compute
