Bluesky Thread

i can’t get over this — qwen3 32B dense is only *slightly* better than 30B-A3B


but it runs as fast as a 3B, because only ~3B of its parameters are active per token
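a rough way to see why ~3B active params makes it fast: per-token decode compute scales with the *active* weights, not the total. a minimal back-of-the-envelope sketch (the 2 × params FLOPs rule is an approximation, and real decode speed also depends on memory bandwidth, since only active expert weights need to be read per token):

```python
# Rough per-token decode compute: dense model vs MoE with few active params.
# Rule of thumb (assumption): forward-pass FLOPs per token ~ 2 x active parameters.

def flops_per_token(active_params_billions: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params_billions * 1e9

dense_32b = flops_per_token(32)   # dense: all 32B weights used every token
moe_3b_active = flops_per_token(3)  # MoE: ~3B of 30B weights active per token

print(f"dense 32B:    {dense_32b:.1e} FLOPs/token")
print(f"MoE 30B-A3B:  {moe_3b_active:.1e} FLOPs/token")
print(f"compute ratio ~ {dense_32b / moe_3b_active:.1f}x")
```

so per token the dense 32B does roughly 10x the compute of the 3B-active MoE, which is why it decodes at 3B-class speed despite storing 30B weights.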

and both of these perform near qwen2.5-70B

fast and smart (also very good at tool use)
A performance comparison table shows various large language models evaluated across multiple benchmarks. The focus here is on Qwen3-32B (Dense) and Qwen3-30B-A3B (MoE), highlighted in two separate blue-outlined columns.

Benchmark                 Qwen3-32B (Dense)   Qwen3-30B-A3B (MoE)
ArenaHard                 93.8                91.0
AIME’24                   81.4                80.4
AIME’25                   72.9                70.9
LiveCodeBench             65.7                62.6
CodeForces                1977                1974
Aider (Pass@2)            50.2                65.8
LiveBench (2024-11-25)    74.9                74.3
BFCL v3                   70.3                69.1
MultiIF (8 Languages)     73.0                72.2

Notable Observations:
	•	Qwen3-32B generally scores higher on reasoning-heavy benchmarks such as ArenaHard, the AIME’24/25 math sets, and LiveCodeBench.
	•	Qwen3-30B-A3B scores significantly better on Aider (65.8 vs 50.2), suggesting stronger code-editing capability on that benchmark.
	•	CodeForces ratings are nearly identical (1977 vs 1974), showing similar competitive-programming strength.
	•	The 32B model is slightly ahead on multilingual instruction following (MultiIF) and tool use / function calling (BFCL v3).

Overall, Qwen3-32B (Dense) performs slightly better on most benchmarks, while Qwen3-30B-A3B (MoE) is far more compute-efficient and excels on the Aider code-editing benchmark.
runs as fast as a 3B
they really need edits on this. oh well — enjoy my mistakes
