Bluesky Thread

MiniMax open sources M2


This model has been shaking up the benchmarks over the past week. Now that it's open, we can see it's 230B-A10B (230B total parameters, ~10B active), dueling with, and arguably beating, Sonnet 4.5 at 8% of the cost

github.com/MiniMax-AI/M...
Eight side-by-side bar charts (2 rows × 4 columns) comparing models; each bar carries a small icon matching the legend at the bottom.

Top row (left → right within each chart):

- SWE-bench Verified: 69.4, 67.8, 68.0, 69.2, 63.8, 77.2, 74.9
- Multi-SWE-Bench: 36.2, 30.6, 30.0, 33.5, 44.3 (only five bars shown)
- Terminal-Bench: 46.3, 37.7, 40.5, 44.5, 25.3, 50.0, 43.8
- ArtifactsBench: 66.8, 55.8, 59.8, 54.2, 57.7, 61.5, 73.0

Bottom row (left → right within each chart):

- τ²-Bench: 77.2, 66.7, 75.9, 70.3, 59.2, 84.7, 80.1
- GAIA (text only): 75.7, 63.5, 71.9, 60.2, 60.2, 71.2, 76.4
- BrowseComp: 44.0, 40.1, 45.1, 14.1, 9.9, 19.6, 54.9
- FinSearchComp-global: 65.5, 26.2, 29.2, 29.5, 42.6, 60.8, 63.9

Legend (icons → model names): MiniMax-M2, DeepSeek-V3.2, GLM-4.6, Kimi K2 0905, Gemini 2.5 Pro, Claude Sonnet 4.5, GPT-5 (thinking).
why build a model? vertical integration

seems like US models actually do work fine, they’re just too expensive

also, a little bit of an admission that Chinese models are a little wonky
Our team has been building a variety of Agents to help tackle the challenges of our company's rapid growth. These Agents are beginning to complete increasingly complex tasks, from analyzing online data and researching technical issues to daily programming, processing user feedback, and even screening HR resumes. These Agents, working alongside our team, are driving the company's development, building an AI-native organization that is evolving from developing AGI to advancing together with AGI. We have an ever-stronger conviction that AGI is a force of production, and Agents are an excellent vehicle for it, representing an evolution from the simple Q&A of conversational assistants to the independent completion of complex tasks by Agents.
However, we found that no single model could fully meet our needs for these Agents. The challenge lies in finding a model that strikes the right balance between performance, price, and inference speed: an almost "impossible triangle." The best overseas models offer good performance but are very expensive and relatively slow. Domestic models are cheaper, but there is a gap in their performance and speed.
This has led to existing Agent products often being very expensive or slow to achieve good results. For instance, many Agent subscriptions cost tens or even hundreds of dollars per month, and completing a single task can often take hours.
how they’re thinking about model architecture
* Why activation size matters
By maintaining activations around 10B, the plan → act → verify loop in the agentic workflow is streamlined, improving responsiveness and reducing compute overhead:
• Faster feedback cycles in compile-run-test and browse-retrieve-cite chains.
• More concurrent runs on the same budget for regression suites and multi-seed explorations.
• Simpler capacity planning with smaller per-request memory and steadier tail latency.
In short: 10B activations = responsive agent loops + better unit economics.
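the unit-economics claim above is basically arithmetic on active parameters. a minimal sketch, assuming the standard ~2 FLOPs per active parameter per decoded token (the 230B/10B sizes come from the thread; the dense comparison model is hypothetical):

```python
# Rough decode-time compute comparison: a sparse MoE model only runs its
# *active* parameters per token, so a 230B-total / 10B-active model costs
# roughly what a 10B dense model costs per generated token.

def decode_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 per active param)."""
    return 2.0 * active_params

minimax_m2 = decode_flops_per_token(10e9)    # 230B total, ~10B active
dense_230b = decode_flops_per_token(230e9)   # hypothetical dense model of equal size

print(f"M2 per-token FLOPs:   {minimax_m2:.1e}")
print(f"dense-230B per token: {dense_230b:.1e}")
print(f"compute ratio:        {dense_230b / minimax_m2:.0f}x")
```

same story for the bullet about "more concurrent runs on the same budget": a ~23x per-token compute gap translates fairly directly into how many agent loops you can keep in flight on fixed hardware.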
this is another model in fp8 format, definitely a trend toward lower precision

gpt-oss was in fp4, but several others have been in fp8. i’m a bit surprised that attention is in fp8, but the model does seem to work well

1M context (if you can pay for the KV cache)
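to see why "if you can pay for the KV cache" is the catch, here's a back-of-the-envelope sizing sketch. the layer/head dimensions below are illustrative placeholders, not M2's published architecture; only the fp8 storage and the 1M-token context come from the post:

```python
# Per-token KV cache = 2 (K and V) x layers x kv_heads x head_dim x bytes.
# Dims below are hypothetical: 60 layers, 8 KV heads (GQA), head_dim 128.

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int) -> int:
    """Total KV cache size in bytes for one sequence."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

ctx = 1_000_000
fp8  = kv_cache_bytes(ctx, 60, 8, 128, 1)  # fp8: 1 byte per value
bf16 = kv_cache_bytes(ctx, 60, 8, 128, 2)  # bf16: 2 bytes per value

print(f"fp8  KV cache @ 1M tokens: {fp8 / 2**30:.1f} GiB")
print(f"bf16 KV cache @ 1M tokens: {bf16 / 2**30:.1f} GiB")
```

even with these modest made-up dims, a single 1M-token sequence eats a three-digit number of GiB, which is why fp8 KV storage (half of bf16) matters as much as fp8 weights.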
gents, i think we’ve reached it
doomslide @doomslide • 13h
Personally I draw the line at superduperintelligence.
1 hour later
bsky.app/profile/dori...
Alexander Doria @dorialexander.bsky.social
New MiniMax release today. Still waiting for the tech report, but the blogpost makes a compelling case for mastering the technology end-to-end to get actual agentic automation www.minimax.io/news/minimax...
34 likes 3 reposts
