Bluesky Thread

pretty strong argument for multi-agents


www.anthropic.com/engineering/...
Once intelligence reaches a threshold, multi-agent systems become a vital way to scale performance. For instance, although individual humans have become more intelligent in the last 100,000 years, human societies have become exponentially more capable in the information age because of our collective intelligence and ability to coordinate. Even generally-intelligent agents face limits when operating as individuals; groups of agents can accomplish far more.
2 hours later
there are constraints though — good use of multi-agents involves heavy parallelization

if your agents can’t figure out how to parallelize, it’s not going to work well
For instance, most coding tasks involve fewer truly parallelizable tasks than research, and LLM agents are not yet great at coordinating and delegating to other agents in real time.
We've found that multi-agent systems excel at valuable tasks that involve heavy parallelization, information that exceeds single context windows, and interfacing with numerous complex tools.
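The fan-out shape behind that claim can be sketched in a few lines. This is an illustrative stdlib-only sketch, not Anthropic's implementation: `run_subagent` here is a hypothetical stand-in for dispatching a real LLM subagent.

```python
import asyncio

# Hypothetical stand-in for a subagent call: in a real system this would
# invoke an LLM agent with its own context window and tools.
async def run_subagent(query: str) -> str:
    await asyncio.sleep(0)  # placeholder for model/network latency
    return f"findings for: {query}"

async def research(queries: list[str]) -> list[str]:
    # Research-style work fans out: each query is independent, so a lead
    # agent can launch all subagents at once and gather their results.
    return await asyncio.gather(*(run_subagent(q) for q in queries))

results = asyncio.run(
    research(["AI agent companies", "agent frameworks", "agent verticals"])
)
```

The point of the thread post is the inverse case: if the task decomposes into steps that depend on each other (as most coding tasks do), the `gather` collapses into a sequential chain and the multi-agent overhead buys you nothing.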
FYI this is how you call A2A from MCP (use an MCP tool to call an agent)
A flowchart diagram titled “High-level Architecture of Advanced Research” shows how a multi-agent research system processes a user request from Claude.ai chat.
	•	On the left, a sample Claude.ai chat input asks:
“what are all the companies in the united states working on AI agents in 2025? make a list of at least 100. for each company, include the name, website, product, description of what they do, type of agents they build, and their vertical/industry.”
	•	An arrow labeled “User request” passes from this chat box into the Multi-agent research system on the right.
	•	At the center of that system is the Lead agent (orchestrator), which uses:
Tools: search tools + MCP tools + memory + run_subagent + complete_task
	•	The lead agent coordinates with:
	•	A Citations subagent (top left)
	•	Three Search subagents (bottom row)
	•	A Memory module (top right)
	•	Double-headed arrows indicate the lead agent both gives instructions to and receives information from all the components.
	•	Once the task is complete, an arrow labeled “Final report” returns to Claude.ai chat.

The architecture emphasizes modular delegation: a central agent orchestrates research, citing sources, managing search subagents, and assembling a detailed final report.
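The "call A2A from MCP" pattern mentioned above can be sketched as follows. Everything here is illustrative (`AgentClient`, the `tool` registry, the `run_subagent` name) rather than a real MCP or A2A SDK: the idea is only that the model sees an ordinary tool, while the tool's handler delegates the task to another agent.

```python
# Sketch of "use an MCP tool to call an agent": the tool surface looks
# like any other tool to the model; its handler forwards to an agent.

class AgentClient:
    """Stand-in for an A2A client; a real one would POST a task to the
    remote agent's endpoint and poll or stream until completion."""
    def send_task(self, instructions: str) -> str:
        return f"[agent result] {instructions}"

TOOLS = {}

def tool(name):
    # Minimal tool registry, mimicking how an MCP server exposes tools.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

agent = AgentClient()

@tool("run_subagent")
def run_subagent(instructions: str) -> str:
    # From the model's point of view this is just a tool call;
    # under the hood it delegates the work to another agent.
    return agent.send_task(instructions)

result = TOOLS["run_subagent"]("find US companies building AI agents")
```

This matches the diagram's `run_subagent` entry in the lead agent's tool list: delegation to subagents rides on the same interface as any other tool.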
btw all these interactions can be managed via A2A bsky.app/profile/timk...
Tim Kellogg @timkellogg.me
New Post: A2A is for UI

There’s a lot of skepticism around A2A, Google’s Agent-to-Agent protocol. But a lot’s changed, and it’s worth taking a look again

I’d like to convince you that you should be thinking about A2A as a protocol for giving agents a UI

timkellogg.me/blog/2025/06...
10 hours later
building agents with MCP tools

Anthropic did succeed in building their Research agent to support arbitrary MCP servers, but it was challenging, because people suck at building good interfaces
4. Tool design and selection are critical. Agent-tool interfaces are as critical as human-computer interfaces. Using the right tool is efficient; often, it's strictly necessary. For instance, an agent searching the web for context that only exists in Slack is doomed from the start. With MCP servers that give the model access to external tools, this problem compounds, as agents encounter unseen tools with descriptions of wildly varying quality. We gave our agents explicit heuristics: for example, examine all available tools first, match tool usage to user intent, search the web for broad external exploration, or prefer specialized tools over generic ones. Bad tool descriptions can send agents down completely wrong paths, so each tool needs a distinct purpose and a clear description.
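The heuristics quoted there (examine all tools, match usage to intent, prefer specialized over generic) are prompt-level instructions in the real system, but the logic they describe can be made concrete as plain scoring code. The tool names, descriptions, and scoring function below are all assumptions for illustration:

```python
# Illustrative sketch of the quoted heuristics: examine every available
# tool, score it against the user's intent, and prefer specialized tools
# over generic ones when they match.

AVAILABLE_TOOLS = [
    {"name": "web_search",   "description": "generic web search",
     "specialized": False},
    {"name": "slack_search", "description": "search internal Slack messages",
     "specialized": True},
]

def pick_tool(intent: str, tools: list[dict]) -> str:
    # Examine all available tools first, then score each against intent.
    def score(t: dict) -> tuple:
        overlap = len(set(intent.lower().split())
                      & set(t["description"].lower().split()))
        # Ties and near-ties break toward specialized tools.
        return (overlap, t["specialized"])
    return max(tools, key=score)["name"]

choice = pick_tool("find the design discussion in slack", AVAILABLE_TOOLS)
```

An agent with only `web_search` would be the article's "doomed from the start" case: no amount of searching the open web surfaces context that lives in Slack, which is why the selection step matters at all.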
in summary, multi-agent = more tokens = more intelligence

they split work across multiple agents, but not in an object-oriented way: it's not about responsibilities, it's about parallelizing work

more agents = tokens consumed faster (lots of work went into doing it effectively)
weird that everything is still just coming down to test-time compute
44 likes 4 reposts
