Bluesky Thread

Kimi-Linear: more efficient attention


New Moonshot model!!

It’s a 48B-A3B model built as an experiment in efficient long-context attention: a hybrid of Kimi Delta Attention (KDA) and Multi-head Latent Attention (MLA)

- very fast inference
- strong performance
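The hybrid idea can be sketched as a layer schedule: most layers use KDA-style linear attention (fixed-size recurrent state, no growing KV cache), with full MLA attention interleaved periodically. A minimal sketch, assuming a 3:1 KDA-to-MLA interleaving ratio for illustration (check the tech report for the actual layout):

```python
# Hedged sketch of a hybrid attention layer schedule.
# The 3:1 KDA:MLA ratio is an assumption, not confirmed from this thread.

def layer_types(n_layers: int, kda_per_mla: int = 3) -> list[str]:
    """Every (kda_per_mla + 1)-th layer uses full (MLA) attention;
    the rest use linear (KDA) attention with O(1) per-token state."""
    types = []
    for i in range(n_layers):
        if (i + 1) % (kda_per_mla + 1) == 0:
            types.append("MLA")  # global attention, keeps a KV cache
        else:
            types.append("KDA")  # linear attention, fixed-size state
    return types

print(layer_types(8))  # ['KDA', 'KDA', 'KDA', 'MLA', 'KDA', 'KDA', 'KDA', 'MLA']
```

Because only the MLA layers store per-token keys and values, the KV cache shrinks in proportion to how sparsely they are interleaved.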

github.com/MoonshotAI/K...
[Figure: scatter plot of performance (y-axis) vs. decoding acceleration (x-axis, 1× to 4×), comparing Kimi-Linear, GDN-H, and MLA on two benchmarks.
- RULER (~128k context): MLA 81.3 at ~1×; GDN-H 80.5 and Kimi-Linear 84.3 at ~4×.
- MMLU-Pro (4k context): MLA 47.2 and GDN-H 47.9 at ~1×; Kimi-Linear 51.0 at ~1.2×.
Kimi-Linear scores highest on both benchmarks while also decoding fastest, so it gains speed without giving up accuracy relative to the baselines.]
Tech Report here: github.com/MoonshotAI/K...

- 1M context
- 75% reduction in KV cache
- 6x decoding throughput
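The 75% KV-cache reduction is consistent with simple interleaving arithmetic: if only the full-attention (MLA) layers keep a per-token KV cache and 1 in 4 layers is MLA, the cache is 25% of an all-full-attention model's. A back-of-the-envelope sketch, assuming that 3:1 ratio and a hypothetical 48-layer stack:

```python
# Hedged arithmetic sketch: hybrid KV-cache savings.
# The 3:1 KDA:MLA ratio and 48-layer depth are illustrative assumptions.

def kv_cache_fraction(n_layers: int, kda_per_mla: int = 3) -> float:
    """Fraction of an all-full-attention KV cache that the hybrid keeps:
    only the full-attention (MLA) layers store per-token KV entries."""
    mla_layers = n_layers // (kda_per_mla + 1)
    return mla_layers / n_layers

reduction = 1 - kv_cache_fraction(48)
print(f"{reduction:.0%}")  # 75%
```

At long contexts the KV cache dominates decoding memory traffic, which is why a 75% smaller cache can translate into multi-x decoding throughput gains.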
