Qwen3-Next-80B-A3B Base, Instruct & Thinking
- performs similarly to Qwen3-235B-A22B
- 10% of the training cost of Qwen3-32B
- 10x the throughput of Qwen3-32B
- outperforms Gemini-2.5-flash on some benchmarks
- native MTP for speculative decoding
qwen.ai/blog?id=4074...
this really is a big deal. in agent workloads, the context grows fast. this architecture was designed from the ground up for that scenario
1 hour later
hmm ngl i don’t like this Qwen either. it’s got a similar vibe that i just do not like
bsky.app/profile/timk...
can an AI be an asshole? this one might be an asshole