Bluesky Thread

i can’t get over this — qwen3 32B dense is only *slightly* better than 30B-A3B


but it runs as fast as a 3B, because only ~3B of its parameters are active per token
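a rough way to see why ~3B active params makes it fast: per-token decode compute scales with the *active* weights, not the total. a minimal back-of-the-envelope sketch (the 2 × params FLOPs rule is an approximation, and real decode speed also depends on memory bandwidth, since only active expert weights need to be read per token):

```python
# Rough per-token decode compute: dense model vs MoE with few active params.
# Rule of thumb (assumption): forward-pass FLOPs per token ~ 2 x active parameters.

def flops_per_token(active_params_billions: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params_billions * 1e9

dense_32b = flops_per_token(32)   # dense: all 32B weights used every token
moe_3b_active = flops_per_token(3)  # MoE: ~3B of 30B weights active per token

print(f"dense 32B:    {dense_32b:.1e} FLOPs/token")
print(f"MoE 30B-A3B:  {moe_3b_active:.1e} FLOPs/token")
print(f"compute ratio ~ {dense_32b / moe_3b_active:.1f}x")
```

so per token the dense 32B does roughly 10x the compute of the 3B-active MoE, which is why it decodes at 3B-class speed despite storing 30B weights.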

and both of these perform near qwen2.5-70B

fast and smart (also very good at tool use)
A performance comparison table shows various large language models evaluated across multiple benchmarks. The focus here is on Qwen3-32B (Dense) and Qwen3-30B-A3B (MoE), highlighted in two separate blue-outlined columns.

Benchmark                 Qwen3-32B (Dense)   Qwen3-30B-A3B (MoE)
ArenaHard                 93.8                91.0
AIME’24                   81.4                80.4
AIME’25                   72.9                70.9
LiveCodeBench             65.7                62.6
CodeForces                1977                1974
Aider (Pass@2)            50.2                65.8
LiveBench (2024-11-25)    74.9                74.3
BFCL v3                   70.3                69.1
MultiIF (8 Languages)     73.0                72.2

Notable Observations:
	•	Qwen3-32B generally scores higher on reasoning-heavy benchmarks such as ArenaHard, the AIME’24/25 math sets, and LiveCodeBench.
	•	Qwen3-30B-A3B scores significantly better on Aider (65.8 vs 50.2), suggesting stronger code-editing capability on that benchmark.
	•	CodeForces ratings are nearly identical (1977 vs 1974), showing similar competitive-programming strength.
	•	The 32B model is slightly ahead on multilingual instruction following (MultiIF) and tool use / function calling (BFCL v3).

Overall, Qwen3-32B (Dense) performs slightly better on most benchmarks, while Qwen3-30B-A3B (MoE) is far more compute-efficient and excels on the Aider code-editing benchmark.
runs as fast as a 3B
they really need edits on this. oh well — enjoy my mistakes
