Is 32B-4bit equal to 16B-8bit? Depends on the task
* math: precision matters
* knowledge: effective param count is more important
* 4B-8bit is the threshold: above it prefer quant, below it prefer more params
* parallel TTC only works above 4B-8bit
arxiv.org/abs/2510.10964
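
Why those two configs get compared at all: they take roughly the same memory for weights. A back-of-the-envelope sketch, assuming plain bits-per-weight storage and ignoring quantization overhead (scales, zero points) and the KV cache; the function name is made up for illustration:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: every parameter costs exactly bits_per_weight bits,
    with no extra metadata from the quantization scheme.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


print(weight_memory_gb(32, 4))  # 32B params at 4-bit
print(weight_memory_gb(16, 8))  # 16B params at 8-bit
print(weight_memory_gb(4, 8))   # the 4B-8bit threshold mentioned above
```

The first two come out to the same footprint, so the question is which way of spending that budget scores better on a given task.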
The study is compelling because of how thorough they were
1700 experiments varying
* model size (0.6B-32B)
* weight precision (4-bit, 8, 16)
* serial TTC budget (2k tokens -> 30k)
* parallel TTC (maj@k, up to k=16)
* KV cache compression (eviction, quantization, StreamingLLM, HQQ)
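
For anyone unfamiliar with maj@k: sample k answers in parallel and keep the most common one. A minimal sketch, assuming the final answers are already extracted as strings; how the paper extracts answers or breaks ties isn't stated in this thread, so the tie rule here (first seen wins) is my assumption:

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among k samples (maj@k).

    Assumption: ties go to whichever answer appeared first, which is
    the order Counter.most_common uses for equal counts.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


# k=4 hypothetical samples for the same question
samples = ["42", "41", "42", "7"]
print(majority_vote(samples))
```

Each extra sample costs more generated tokens and more KV cache, which is why parallel TTC competes with model size and precision for the same memory.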
all experiments were with the Qwen3 family
Qwen remains the absolute biggest help for science. Is anyone else producing a huge number of model sizes?