here it is:
* benchmarks: tough competition with Sonnet-4
* 256K context, expandable to 1M with YaRN (config sketch after the link below)
there’s also a CLI forked from gemini-cli
qwenlm.github.io/blog/qwen3-c...
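a minimal sketch of how that YaRN extension is typically wired up in Hugging Face transformers — the model id, the 4x factor, and the exact `rope_scaling` fields are assumptions here, not from the post; check the blog/model card for the real values:

```python
# Sketch: stretching the 256K native window toward ~1M via YaRN rope scaling.
# Assumptions (not from the post): the model id, the 4x factor, and that the
# model uses the standard `rope_scaling` config mechanism in transformers.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"  # assumed id

config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                               # 256K * 4 ≈ 1M tokens
    "original_max_position_embeddings": 262144,  # native 256K window
}
config.max_position_embeddings = 1_048_576

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```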
on the inside:
* shallower than Qwen3 (62 vs 94 layers)
* more experts (160 vs 128, in the direction of K2)
* more attention heads (96 vs 64, the opposite of K2)
curious what the thinking is here.. (sketch for checking these dims below)
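if you want to verify those dims yourself, a minimal sketch — the field names assume the standard Hugging Face config keys for Qwen-style MoE models, so they may differ:

```python
# Sketch: reading the architecture dims above from the published config.
# Field names (num_hidden_layers, num_experts, num_attention_heads) are the
# usual HF keys for Qwen-style MoE models — an assumption, not from the post.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen3-Coder-480B-A35B-Instruct")  # assumed id
print(cfg.num_hidden_layers)    # depth: the post says 62 (vs 94 in Qwen3)
print(cfg.num_experts)          # MoE experts: 160 (vs 128)
print(cfg.num_attention_heads)  # attention heads: 96 (vs 64)
```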