Bluesky Thread

Granite-4.0-H-Small: a 32B-A9B MoE Mamba for high efficiency

Damn! IBM is on the map. The American Qwen? I barely even knew IBM made LLMs, this is solid

www.ibm.com/new/announce...
The bar chart is titled **“Retrieval Augmented Generation (RAG)”** and shows **MTRAG mean accuracy** on the y-axis (0–80 scale).

### Results by model:

* **Granite-4.0-H-Small**: **73** (blue bar, highest)
* **Granite-4.0-Micro**: **72** (blue bar, nearly tied with H-Small)
* **GPT-OSS-20B**: **68** (green bar)
* **Llama-3.3-70B-Instruct**: **61** (green bar)
* **Qwen3-8B**: **55** (green bar)
* **Llama-3.2-Instruct**: **53** (green bar)
* **Mistral-Small-3.2-Instruct**: **48** (green bar, lowest)

### Key takeaway:

The **Granite-4.0 models (H-Small and Micro)** take the top two spots at 73 and 72, with **GPT-OSS-20B** third at 68. The weakest score is **Mistral-Small-3.2-Instruct** at 48.
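For context on the "32B-A9B" naming in the title: the model holds 32B parameters total but activates only ~9B per token, because each token is routed to a small subset of experts in the mixture-of-experts (MoE) layers. A minimal sketch of top-k expert routing in Python; all names, shapes, and the per-token loop are illustrative, not IBM's implementation:

```python
import numpy as np

def topk_moe_layer(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x       : (tokens, d_model) token activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of (w_in, w_out) weight pairs, one tiny MLP per expert
    Only k experts run per token, so active params << total params.
    """
    logits = x @ gate_w                             # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]      # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over selected experts

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                     # per-token dispatch (illustrative)
        for j, e in enumerate(topk[t]):
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)        # ReLU MLP expert
            out[t] += weights[t, j] * (h @ w_out)
    return out

# toy usage: 4 tokens, d_model=8, 6 experts, 2 active per token
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
gate_w = rng.standard_normal((8, 6))
experts = [(rng.standard_normal((8, 16)), rng.standard_normal((16, 8)))
           for _ in range(6)]
print(topk_moe_layer(x, gate_w, experts).shape)     # (4, 8)
```

The point of the routing: per-token compute scales with the k selected experts, not the total expert count, which is how a 32B-parameter checkpoint can run at roughly the cost of a ~9B dense model.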
initial take: okay, it can work as a model. it doesn't have a fun personality, but that's kinda what you get with a 32b-a9b. fwiw it doesn't hallucinate too badly afaict

could be worth looking into if you're thinking about RL'ing LoRAs, it's probably malleable
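For anyone who does want to poke at that, here is a minimal PEFT LoRA setup sketch. The Hugging Face model ID and the `target_modules` names are assumptions to verify against the actual checkpoint, since a hybrid Mamba/MoE stack won't expose the usual attention projections everywhere:

```python
# pip install torch transformers peft
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed Hugging Face ID; verify against IBM's actual release.
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-4.0-h-small", torch_dtype="auto"
)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Guessed module names: a hybrid Mamba/MoE stack won't have
    # q_proj/v_proj in every block, so inspect model.named_modules().
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Only the adapter weights train, which is what keeps an RL loop over a 32B checkpoint relatively cheap.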
also: Mamba isn't a transformer, so this isn't technically a transformer-based LLM, whatever..
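The practical difference: attention compares every pair of tokens (quadratic in sequence length), while a Mamba-style state-space model carries a fixed-size recurrent state (linear). A bare-bones sketch of the linear state-space recurrence, leaving out Mamba's input-dependent (selective) parameters and fused scan kernel; all shapes are illustrative:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.

    The state h has a fixed size, so cost grows linearly with sequence
    length, versus attention's quadratic all-pairs comparison. Real Mamba
    makes A/B/C input-dependent (selective) and fuses the scan into one
    kernel; this is just the skeleton of the idea.
    """
    T = x.shape[0]
    h = np.zeros(A.shape[0])           # fixed-size recurrent state
    ys = np.empty((T, C.shape[0]))
    for t in range(T):
        h = A @ h + B @ x[t]           # update state with current input
        ys[t] = C @ h                  # read out
    return ys

# toy usage: sequence of 10 inputs of dim 4, state size 8, output dim 4
rng = np.random.default_rng(0)
A = 0.9 * np.eye(8)                    # stable state transition
B = rng.standard_normal((8, 4))
C = rng.standard_normal((4, 8))
print(ssm_scan(rng.standard_normal((10, 4)), A, B, C).shape)  # (10, 4)
```

The fixed-size state is also why inference needs no KV cache that grows with context length.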
31 likes 2 reposts
