R1 Chimera: a model merge of the routed experts of DeepSeek R1 and V3
The resulting merged model performs as well as R1 but without the wandering thought traces. Just as smart, but faster.
huggingface.co/tngtech/Deep...
I’m seeing reports that it’s better than either model: smart, fast, and good at tool calling. This might be a new thing, since merges are low-compute.
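To illustrate why a merge is low-compute, here is a rough sketch of combining only the routed-expert weights of two checkpoints. It assumes plain linear interpolation, which may not be what the Chimera authors actually did. The function name, the `alpha` weight, and the `.mlp.experts.` key marker are all hypothetical, and toy lists of floats stand in for real tensors.

```python
def merge_routed_experts(base, donor, alpha=0.5, marker=".mlp.experts."):
    """Interpolate routed-expert weights; copy everything else from `base`.

    `base` and `donor` map parameter names to flat lists of floats
    (toy stand-ins for tensors). `marker` is a hypothetical substring
    identifying routed-expert parameters; real checkpoints may differ.
    """
    merged = {}
    for name, weights in base.items():
        if marker in name and name in donor:
            # Elementwise linear interpolation: no gradients, no training.
            merged[name] = [
                (1 - alpha) * a + alpha * b
                for a, b in zip(weights, donor[name])
            ]
        else:
            # Non-expert parameters are taken unchanged from `base`.
            merged[name] = list(weights)
    return merged


# Toy example with made-up parameter names and values.
base = {"layers.0.mlp.experts.0.w": [0.0, 2.0], "layers.0.attn.w": [1.0]}
donor = {"layers.0.mlp.experts.0.w": [2.0, 4.0], "layers.0.attn.w": [9.0]}
merged = merge_routed_experts(base, donor, alpha=0.5)
```

The whole operation is a single pass of arithmetic over the weights, which is why it costs far less than any fine-tuning run.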