Kimi-Linear: more efficient attention
New Moonshot model!!
It’s a 48B-A3B model built as an experiment in efficient long-context attention: a hybrid of Kimi Delta Attention (KDA) and MLA (toy sketch of the delta rule below)
- very fast inference
- strong performance
github.com/MoonshotAI/K...
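For context: KDA builds on delta-rule linear attention, which keeps a fixed-size state per layer instead of a growing KV cache. A minimal NumPy sketch of the plain (ungated) delta rule follows; KDA reportedly adds fine-grained gating on top, which is omitted here, and all names are illustrative, not Moonshot's code:

```python
import numpy as np

# Illustrative only: plain delta-rule linear attention (DeltaNet-style).
# KDA refines this with gating; that part is not shown.
def delta_rule_step(S, k, v, q, beta):
    """S: (d_k, d_v) fixed-size state; k, q: (d_k,); v: (d_v,); beta: scalar.
    Corrects the stored k->v association toward v, then reads out with q."""
    pred = S.T @ k                          # value currently stored under k
    S = S + beta * np.outer(k, v - pred)    # delta-rule correction
    o = S.T @ q                             # output; state never grows
    return S, o

rng = np.random.default_rng(0)
d = 4
S = np.zeros((d, d))
for _ in range(3):                          # state stays (d, d) at every step
    k, v, q = rng.normal(size=(3, d))
    S, o = delta_rule_step(S, k, v, q, beta=0.5)
print(S.shape, o.shape)                     # (4, 4) (4,)
```

Because the state is constant-size, decode cost per token stays flat no matter how long the context gets, which is where the throughput win at 1M context comes from.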
Tech Report here: github.com/MoonshotAI/K...
- 1M context
- 75% reduction in KV cache (back-of-envelope after this list)
- 6x decoding throughput
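Where the 75% plausibly comes from (my reading of the report, not stated in this thread): KDA and MLA layers are interleaved at a 3:1 ratio, so only one layer in four keeps a per-token KV cache. Toy arithmetic:

```python
# Back-of-envelope, assuming only MLA layers keep a per-token KV cache
# and a 3:1 KDA:MLA interleave (an assumption, not a measured number):
kda_per_block, mla_per_block = 3, 1
cache_fraction = mla_per_block / (kda_per_block + mla_per_block)
print(f"KV cache vs. all-MLA baseline: {cache_fraction:.0%} "
      f"-> {1 - cache_fraction:.0%} reduction")
# KV cache vs. all-MLA baseline: 25% -> 75% reduction
```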