Kimi K2 paper is out!
lessons:
1. they explicitly suppressed long CoT
2. more MoE experts > more attention
3. 20k MCP tools (17k synthetic)
4. agents all the way down
github.com/MoonshotAI/K...
this part feels incredibly consequential
first there's the visual of MCP tool generation. but also, it's agents all the way down
agents to generate tools, agents to simulate humans, agents to judge/score.. all to make a model that can be used as an agent
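To make the "agents all the way down" loop concrete, here is a rough sketch of how the three roles could chain together. Everything in it is a hypothetical stand-in: `generate_tools`, `simulate_user`, `run_agent`, `judge`, and the `min_score` threshold are plain stub functions invented for illustration, not anything taken from the report, and a real pipeline would put a model behind each one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tool:
    name: str
    description: str


@dataclass
class Trajectory:
    tool: Tool
    user_request: str
    agent_reply: str
    score: float


def generate_tools(domains: List[str]) -> List[Tool]:
    # Hypothetical "tool generator" agent: one synthetic tool per domain.
    return [Tool(f"{d}_lookup", f"Look up {d} records") for d in domains]


def simulate_user(tool: Tool) -> str:
    # Hypothetical "simulated human" agent: writes a request for the tool.
    return f"Please use {tool.name} to help me."


def run_agent(tool: Tool, request: str) -> str:
    # Hypothetical agent under training: answers with a tool call.
    return f"call {tool.name}"


def judge(tool: Tool, reply: str) -> float:
    # Hypothetical "judge" agent: scores 1.0 if the right tool was called.
    return 1.0 if tool.name in reply else 0.0


def build_dataset(domains: List[str], min_score: float = 0.5) -> List[Trajectory]:
    # Chain the roles and keep only trajectories the judge accepts.
    kept = []
    for tool in generate_tools(domains):
        request = simulate_user(tool)
        reply = run_agent(tool, request)
        score = judge(tool, reply)
        if score >= min_score:
            kept.append(Trajectory(tool, request, reply, score))
    return kept
```

Calling `build_dataset(["weather", "flights"])` returns one accepted trajectory per synthetic tool. The judge acts as a filter, so only trajectories it accepts end up as training data.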
great deep dive! bsky.app/profile/timf...
Moonshot have released the Kimi K2 technical report, here are some parts of it I found interesting:
The best data was used in multiple epochs, but was rephrased between them. Their testing showed this produces large gains relative to training repeatedly on the same phrasing.
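A minimal sketch of what "rephrased between epochs" could look like in a data loader. The names `rephrase_stub` and `epochs_with_rephrasing` are hypothetical, the stub only shuffles sentence order where a real setup would call a rewriting model, and rephrasing from the original text each epoch (rather than from the previous rewrite) is an assumption made here, not a detail from the report.

```python
import random
from typing import Callable, Iterator, List


def rephrase_stub(text: str, rng: random.Random) -> str:
    # Hypothetical stand-in for an LLM rephraser: reorders the sentences.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


def epochs_with_rephrasing(
    docs: List[str],
    num_epochs: int,
    rephrase: Callable[[str, random.Random], str],
    seed: int = 0,
) -> Iterator[List[str]]:
    # Epoch 0 sees the original text; later epochs see a fresh rephrasing
    # of the original (assumption), so the model never repeats one wording
    # across every pass.
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        if epoch == 0:
            yield list(docs)
        else:
            yield [rephrase(d, rng) for d in docs]
```

Each yielded list is one epoch's worth of training text: same content, different surface form.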