K2 is the first i’m aware of that did this, directly training on *thousands* of tools
o3 was narrowly designed for deep research & chatgpt. most models followed their lead
also — they have an agentic model w/o chain of thought 🤔 i didn’t know that was possible
that essentially means this is a dramatically cheaper agentic model, bc it’s not spending its token budget on thinking
if you buy “more tokens = more intelligence”, then this one is shocking
i don’t recall seeing “user agent” used this way before, although i’ve wanted to use it at work
if you’re training or eval’ing an agent, you need an “agent” that represents a user, otherwise you need someone who has a very boring job — the “user agent”
why does HTTP say “User Agent”?
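a rough sketch of what i mean by a "user agent" in an eval loop. everything here is hypothetical (`ScriptedUser`, `run_episode`, `echo_agent` are names i made up for illustration). in practice the user side would probably be an LLM, not a fixed script:

```python
from typing import Callable, Optional

# assumption: an "agent" is just a function from a user message to a reply
Agent = Callable[[str], str]


class ScriptedUser:
    """hypothetical stand-in for a human user: replays a fixed list of turns."""

    def __init__(self, turns: list[str]) -> None:
        self._turns = list(turns)

    def next_message(self, last_reply: Optional[str]) -> Optional[str]:
        # a real user agent would look at last_reply; this one ignores it
        return self._turns.pop(0) if self._turns else None


def run_episode(user: ScriptedUser, agent: Agent, max_turns: int = 10) -> list[tuple[str, str]]:
    """alternate user and agent turns, returning (user message, agent reply) pairs."""
    transcript: list[tuple[str, str]] = []
    reply: Optional[str] = None
    for _ in range(max_turns):
        message = user.next_message(reply)
        if message is None:  # the user has nothing left to say
            break
        reply = agent(message)
        transcript.append((message, reply))
    return transcript


def echo_agent(message: str) -> str:
    """placeholder agent under test; swap in the real one."""
    return f"you said: {message}"


if __name__ == "__main__":
    for user_msg, agent_msg in run_episode(ScriptedUser(["hi", "book a flight"]), echo_agent):
        print(f"user:  {user_msg}")
        print(f"agent: {agent_msg}")
```

the transcript is what you'd score in an eval, or turn into a reward signal in training.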
it seems the takeaway for researchers is that, if you RL first on math, then “Wait” & CoT emerge
but if you RL on agentic environments first, you get the same behavior but with fewer tokens
this feels like the heart of context engineering — we need better info (via tools), not more thinking
1 hour later
a tool i made at work — MCP auto mocker
take any FastMCP object (a server) and it generates mock versions of all the tools. use a data sheet to set up the mocks and you’ve got evals
and since it’s a real FastMCP, you can serve it over HTTP to test an out-of-process agent while the mocks stay in-process
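a minimal sketch of the mocking idea, not the actual tool. to keep it self-contained it works on plain python functions instead of a FastMCP object, and the data sheet format (rows with `tool`, `args`, `result`) is an assumption of mine. registering the mocks on a real FastMCP server is left out:

```python
import functools
import inspect
from typing import Any, Callable

Row = dict[str, Any]  # assumed shape: {"tool": str, "args": dict, "result": Any}


def make_mock(tool: Callable[..., Any], rows: list[Row]) -> Callable[..., Any]:
    """build a mock with the same name, docstring, and signature as `tool`."""
    sig = inspect.signature(tool)

    @functools.wraps(tool)
    def mock(*args: Any, **kwargs: Any) -> Any:
        # bind() rejects bad arguments with TypeError, like calling the real tool
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for row in rows:
            # a row matches when every argument it lists equals the call's value
            if all(bound.arguments.get(k) == v for k, v in row["args"].items()):
                return row["result"]
        raise LookupError(f"no data sheet row for {tool.__name__}({dict(bound.arguments)})")

    return mock


def mock_all(tools: dict[str, Callable[..., Any]], sheet: list[Row]) -> dict[str, Callable[..., Any]]:
    """replace every tool with a mock driven by its rows in the data sheet."""
    return {
        name: make_mock(fn, [row for row in sheet if row["tool"] == name])
        for name, fn in tools.items()
    }


def get_weather(city: str, units: str = "metric") -> str:
    """hypothetical real tool; the mock never calls it."""
    raise RuntimeError("would hit the network")


SHEET: list[Row] = [
    {"tool": "get_weather", "args": {"city": "Paris"}, "result": "sunny"},
    {"tool": "get_weather", "args": {"city": "Oslo", "units": "imperial"}, "result": "23F"},
]

mocks = mock_all({"get_weather": get_weather}, SHEET)
```

because each mock keeps the original signature and docstring, anything that builds a tool schema from the function should see the same schema as the real tool.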