Bluesky Thread

Gemma 3n: the 4b LLM that’s up there with sonnet-3.7 in chatbot arena

May 20, 2025


the key innovation is Per-Layer Embeddings (PLE), which let it use dramatically less memory
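A toy sketch of the general idea, assuming (from Google's high-level description, not Gemma 3n's actual code) that each layer has its own token-indexed embedding table kept outside accelerator memory, so only the rows for tokens in the current input need to be fetched. All names and sizes are hypothetical.

```python
# Toy illustration of per-layer embedding lookups.
# Hypothetical sizes and names; NOT Gemma 3n's real implementation.
import random

VOCAB_SIZE = 1000   # hypothetical vocabulary size
NUM_LAYERS = 4      # hypothetical number of transformer layers
PLE_DIM = 8         # hypothetical width of each per-layer embedding

random.seed(0)

# One embedding table per layer, imagined to live in CPU RAM or flash
# rather than in accelerator memory.
offloaded_tables = [
    [[random.random() for _ in range(PLE_DIM)] for _ in range(VOCAB_SIZE)]
    for _ in range(NUM_LAYERS)
]

def fetch_per_layer_embeddings(token_ids):
    """Gather, for every layer, only the rows for the given tokens."""
    return [[table[t] for t in token_ids] for table in offloaded_tables]

tokens = [17, 42, 42, 999]
fetched = fetch_per_layer_embeddings(tokens)
rows_fetched = sum(len(layer) for layer in fetched)
rows_total = NUM_LAYERS * VOCAB_SIZE
print(f"fetched {rows_fetched} of {rows_total} rows")  # prints: fetched 16 of 4000 rows
```

In this toy, the working set grows with the number of input tokens, not with the size of each per-layer table.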

it was designed for phones and will roll out to Android soon

developers.googleblog.com/en/introduci...

Bar chart titled “Chatbot Arena Elo Score” compares five chatbot models ranked by performance. Each vertical bar represents a model and its Elo score:
• Claude 3.7 Sonnet scores highest at 1287 (Proprietary).
• Gemma 3n is second with 1283, shown in a glowing blue gradient bar. A note clarifies it is a 4B model, with 1.4B active parameters using PLE caching and a total of 7B parameters.
• GPT-4.1-nano-2025-04-14 follows with 1268 (Proprietary).
• Llama-4-Maverick-17B-128E-Instruct has a score of 1266, with 17B parameters and a MoE (Mixture-of-Experts) configuration.
• Phi 4 ranks last at 1202 and is listed with 14B parameters.

A footnote at the bottom notes that scores are as of May 19, 2025, and lists Gemma 3n’s confidence interval as ±11.
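The 4-point gap between Claude 3.7 Sonnet and Gemma 3n is smaller than Gemma 3n's ±11 interval. Chatbot Arena reports ratings on an Elo-style scale; assuming the standard Elo expected-score formula (base 10, scale 400) applies, a quick check of what that gap means head to head:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (roughly, win probability) of A vs. B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

claude = 1287  # Claude 3.7 Sonnet, from the chart
gemma = 1283   # Gemma 3n, from the chart (±11 interval)

p = elo_expected_score(claude, gemma)
print(f"Claude 3.7 Sonnet expected to win {p:.1%} of head-to-heads")  # prints 50.6%
```

Under that assumption, the two are close to a coin flip, which is why the ±11 interval matters more than the ranking order.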


small note: they call it a 4b, but it’s actually an 8b

it uses the memory of a 4b without quantization. i’m not entirely supportive of that naming, but i can see what they were going for
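A back-of-envelope sketch of why the naming is contentious, using the thread's own round numbers (8b total, 4b-sized memory) as assumptions and counting only unquantized 16-bit weights (no KV cache or activations):

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

BF16 = 2  # bytes per parameter for unquantized 16-bit weights

total_params = 8e9     # the thread's "actually an 8b"
resident_params = 4e9  # the thread's "memory of a 4b"; assumes the rest stays off-accelerator

print(f"all weights resident: {weight_memory_gb(total_params, BF16):.0f} GB")      # 16 GB
print(f"4b-sized resident set: {weight_memory_gb(resident_params, BF16):.0f} GB")  # 8 GB
```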

2 hours later

wait what’s this MatFormer thing?

What’s different this time?

Bumps in the road:
• Per-Layer Embeddings (PLE)
  Why it matters for GGUF/Ollama: new tensor layout; llama.cpp needs a small parser patch so quants know which “inactive” slices to skip.
  Realistic impact: a day or two. PLE is just metadata; the fallback is to export the full dense tensors (bigger RAM hit, but it works).
• MatFormer nesting
  Why it matters for GGUF/Ollama: the 4B “parent” can spin off 2B submodels on the fly, and GGUF can’t express that yet.
  Realistic impact: the first Ollama release will almost certainly ship the fixed-4B variant. Dynamic nesting may come later.
• Audio tokens
  Why it matters for GGUF/Ollama: llama.cpp doesn’t handle raw audio chunks today.
  Realistic impact: not a blocker. Ollama can expose the text-/vision-only checkpoint first (same as the HF preview).
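A toy sketch of the nesting idea from the MatFormer (Matryoshka Transformer) paper, assuming a smaller submodel simply uses a prefix of the parent's feed-forward hidden units and shares their weights. Sizes and names are hypothetical; this is not Gemma 3n's code.

```python
# Toy sketch of Matryoshka-style (MatFormer) nesting in one feed-forward block.
# Assumption: a submodel reuses the first k hidden units of the parent FFN.
import random

random.seed(1)
D_MODEL = 4    # hypothetical model width
D_HIDDEN = 16  # hypothetical parent FFN hidden width

# W_in: D_HIDDEN x D_MODEL, W_out: D_MODEL x D_HIDDEN (shared by all submodels)
W_in = [[random.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(D_HIDDEN)]
W_out = [[random.uniform(-1, 1) for _ in range(D_HIDDEN)] for _ in range(D_MODEL)]

def relu(v):
    return max(0.0, v)

def ffn(x, k):
    """Run the FFN using only the first k hidden units (k == D_HIDDEN is the parent)."""
    hidden = [relu(sum(W_in[j][i] * x[i] for i in range(D_MODEL))) for j in range(k)]
    return [sum(W_out[o][j] * hidden[j] for j in range(k)) for o in range(D_MODEL)]

def ffn_param_count(k):
    """Weights actually touched: k rows of W_in plus k columns of W_out."""
    return 2 * D_MODEL * k

x = [0.5, -1.0, 0.25, 2.0]
parent = ffn(x, D_HIDDEN)      # full-width "parent"
child = ffn(x, D_HIDDEN // 2)  # nested submodel: same weights, half the hidden units
print(ffn_param_count(D_HIDDEN), ffn_param_count(D_HIDDEN // 2))  # prints: 128 64
```

Switching sizes in this toy changes only `k`; no separate set of weights is stored for the smaller model.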
