Bluesky Thread

two views from Anthropic


1. Claude Skills are for collaboration
2. Skills are for continual learning
A large presentation screen dominates the image, hanging in a modern glass-walled auditorium with tall windows, greenery outside, and wood-paneled ceiling beams above. Two presenters stand on a circular red carpet beneath the screen.

On the screen (pink slide):
At the top: “The Skills ecosystem” followed by “Enterprise & Team Skills.”
Two large rounded rectangles appear side by side:
	•	Left rectangle: light pink with a simple line icon of a briefcase. Beneath it: “Fortune 100 Org-wide Skills.”
	•	Right rectangle: light pink with a line icon of a laptop cursor on a screen. Beneath it: “Enterprise FinTech for 1000s of SWEs.”

Bottom-left corner of the slide shows the Anthropic logo.

Presenters:
On the left, a person with shoulder-length dark hair, glasses, a gray sweater, light pants, and sneakers. On the right, a person with short dark hair wearing a dark gray sweater and black pants. They stand in front of the screen facing the audience.

The setting is bright and spacious, with natural outdoor light filtering through the glass behind them.
A large presentation slide fills most of the image, displayed in a modern glass-walled auditorium. Two presenters stand beneath it on a circular red carpet.

Slide content (pink background):

At the top, bold text reads:
“Skills are a concrete step towards continuous learning”

Below is a rising curved line from left to right, marked with three labeled milestones:
	•	DAY 1 — a small red icon with a white starburst. Text underneath:
“No skills”
“Intelligent”
	•	DAY 5 — the same icon, slightly larger. Text underneath:
“A few skills”
“Capable”
	•	DAY 30 — the icon again, larger and placed at the top-right of the curve. Text underneath:
“Many skills”
“Useful”

Bottom-left corner of the slide shows the Anthropic logo.

Presenters:
On the left, a person with glasses, a gray sweater, light pants, and sneakers holding a small device. On the right, a person with short dark hair, wearing a muted blue-gray sweater and dark pants, also holding a clicker. Behind them, tall glass windows reveal trees, greenery, and interior hallways.
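For a rough sense of what that Day 1 → Day 30 curve means mechanically: a skill is just a folder of instructions the model can pull in on demand, so "learning" is the library growing. Below is a minimal sketch of an agent harness indexing such a library; the directory layout and frontmatter fields are assumptions for illustration, not Anthropic's actual implementation.

```python
# Illustrative sketch: surface an accumulating skill library to a model.
# Layout assumed here: skills/<name>/SKILL.md with "name:" / "description:"
# frontmatter lines. Only the metadata goes into the prompt up front; a
# full skill body would be loaded later if the model decides it's relevant.
from pathlib import Path

def skill_index(skills_dir: str) -> str:
    entries = []
    for skill_md in sorted(Path(skills_dir).glob("*/SKILL.md")):
        meta = {}
        for line in skill_md.read_text().splitlines():
            if line.strip() == "---" and meta:
                break  # end of frontmatter
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        entries.append(f"- {meta.get('name', skill_md.parent.name)}: "
                       f"{meta.get('description', '')}")
    return "Available skills:\n" + "\n".join(entries)

print(skill_index("./skills"))  # day 30: many entries, more useful agent
```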
“spec driven development is broken…a spec isn’t just a more detailed prompt.”
A Meta superintelligence researcher working on CWM (Code World Model) has big ambitions for the problems a CWM could solve by loosely simulating code execution in latent space:

- the halting problem
- distributed systems behavior
A large presentation slide fills most of the image, displayed in a modern glass-walled auditorium. A presenter stands beneath it on a stage, mid-gesture, wearing a light gray sweater, collared shirt, dark pants, and sneakers.

Slide content (pink background):

Top-left text:
“What can we do with CWM?”

Centered large title:
“The Halting Problem”

Below the title, a subtitle in italicized text:
“Given a program and an input, will a program run forever?”

On the right side of the slide is a dark code-window graphic with orange header text: “Conv Neural Approximation • Probabilistic.”
Inside the window is Python-like pseudocode defining a function colatz that computes the Collatz sequence via a while loop. Beneath it is sample output showing steps of the sequence (“Simulating recursive function… Time = 27 of 27 … steps: 82 41  … 1 … Pattern: Converged oscillation detected”) and a final line:
“> Prediction: HALTS | Confidence: 0.918”

Bottom-left corner of the slide shows a small Meta AI logo.
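For reference, here's what the slide's example does when you simply run it instead of simulating it in latent space: a plain Collatz step counter. The slide's exact code isn't legible in the photo, so this is a baseline sketch, not Meta's pseudocode.

```python
# Baseline for the slide's example: actually execute the Collatz iteration.
# The halting problem is undecidable in general; a CWM only *predicts*
# halting ("HALTS | Confidence: 0.918"), which for input 27 we can check.
def collatz_steps(n: int, max_steps: int = 10_000) -> int | None:
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
        if steps >= max_steps:
            return None  # out of budget: inconclusive, not a proof of looping
    return steps

print(collatz_steps(27))  # 111 steps: 27 -> 82 -> 41 -> ... -> 1
```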
btw yes i’m at a conference today bsky.app/profile/timk...
Tim Kellogg @timkellogg.me
fyi i’ll be at AI Engineer Code Summit on Friday in NY, arriving tomorrow evening. lmk if you want to meet up
Will Brown (willccbb) of Prime Intellect is talking about the other side of RL scaling. How do you scale up your workforce of AI researchers without actually paying more? Increase the pool of researchers
A large presentation slide hangs in a modern glass-walled auditorium. A speaker stands below on a red circular carpet, wearing a gray T-shirt, a brown leather jacket, light beige pants, and white sneakers, holding a clicker.

Slide content (dark background):

Title in large white text:
Prime Intellect

Subheading: We are:
Bulleted list:
	•	a research lab
	•	a compute provider
	•	a platform company
	•	an open-source ecosystem

Another section reads:
Our mission:
increase the accessibility of doing AI research

On the right side of the slide are UI mockups and dashboards:
	•	A performance chart with an upward-trending line.
	•	Interface panels showing GPU offerings labeled “A100,” “H100,” “$2.00/hr,” “$3.84/hr”, with purple “Get an H100”/“Get an A100” buttons.
	•	A lower panel showing a social-feed style list of posts under the heading Prime Intellect.

The background of the auditorium shows tall windows, greenery, and modern architecture outside.
Will: RL environments are the web apps of AI research

Cursor’s composer-1 and OpenAI’s Codex-Max were each trained in an RL env containing the Cursor and Codex products, respectively. Model & product intertwined

Prime Intellect’s environment hub is like github for these environments
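To make "RL environment" concrete: it's a rollout loop plus a verifiable reward. Here's a generic sketch (interface names are mine, not Prime Intellect's hub API or any lab's internal harness), which also shows why a cloud coding agent already has most of the pieces.

```python
# Generic sketch of a verifiable RL environment for a coding agent.
# The prompt, sandboxed repo, and tool transcript already exist in any
# cloud-agent product; the only genuinely new piece is a programmatic reward.
from dataclasses import dataclass, field

@dataclass
class CodeTaskEnv:
    prompt: str                 # task given to the agent
    repo_snapshot: str          # commit/sandbox the rollout starts from
    transcript: list = field(default_factory=list)

    def reset(self) -> str:
        """Start a fresh rollout; return the initial observation."""
        self.transcript = []
        return self.prompt

    def step(self, action: dict) -> tuple[str, bool]:
        """Apply one tool call; return (observation, done)."""
        self.transcript.append(action)
        done = action.get("tool") == "submit"
        return ("" if done else f"ran {action.get('tool')}"), done

    def reward(self) -> float:
        """Verifiable score for the finished rollout, e.g. did the tests pass."""
        # A real env would run the project's test suite against the edits;
        # this stub just checks that something was submitted.
        return 1.0 if any(a.get("tool") == "submit" for a in self.transcript) else 0.0
```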
OpenAI is talking about their RL finetuning APIs

they find that RFT is really good at teaching an agent how to call tools, when to do it in parallel, etc. Overall it’s good for squeezing out that last bit of efficiency
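One way to picture that kind of grading (purely illustrative, not OpenAI's RFT grader API; the trajectory fields are made up): reward correctness first, then add a small bonus for batching independent tool calls in parallel, which is the behavior the talk says RFT teaches.

```python
# Hypothetical grader: mostly pay for the right answer, with a small bonus
# for packing tool calls into fewer sequential batches (more parallelism).
def grade(trajectory: dict) -> float:
    correct = 1.0 if trajectory["answer_correct"] else 0.0
    calls = trajectory["tool_calls"]                   # [{"name", "batch_id"}, ...]
    if not calls:
        return 0.9 * correct
    batches = len({c["batch_id"] for c in calls})
    efficiency = min(1.0, len(calls) / batches - 1.0)  # > 0 only if calls are batched
    return 0.9 * correct + 0.1 * efficiency

print(grade({
    "answer_correct": True,
    "tool_calls": [
        {"name": "read_file", "batch_id": 0},
        {"name": "read_file", "batch_id": 0},          # issued in parallel
        {"name": "shell", "batch_id": 1},
    ],
}))  # 0.95: correct answer, some parallelism
```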
A presenter stands beneath a large slide in a glass-walled auditorium, speaking while standing on a red circular carpet. The slide has a dark background with the Cognition logo (a hexagonal flower-like icon) and the title:

“Code Edit Planning Agent”

Under the title is a bulleted list with white text:
	•	Given an initial user prompt, determine which files to edit to complete the task.
	•	Tools – shell, read file
	•	Dataset – user queries with labelled ground truth of edited files
	•	Reward – F1 score of the list of files returned

To the right of the bullets is a diagram showing multiple Devin VMs connected through the OpenAI Platform:
	•	Devin VM 1: an orange box labeled “Tool call 1: shell,” leading to a peach box labeled “Tool call 9: read_file,” then a purple box labeled “Final Answer,” ending in a gray “Grader.”
	•	Devin VM 35: similar sequence, starting with “Tool call 1: shell,” then “Tool call 7: shell,” then “Final Answer,” then “Grader.”
	•	Ellipses between them indicate many similar VMs.

The presenter on stage wears a black shirt, dark gray jeans, and white sneakers, holding a clicker. Behind her, tall glass windows reveal greenery and modern architecture outside.
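That reward is simple enough to write down exactly: F1 between the set of files the agent returns and the labelled ground truth. Minimal version below (the file names are made up):

```python
# F1 over predicted vs. ground-truth edit-file sets, i.e. the reward
# described on the slide for the code-edit planning agent.
def file_f1(predicted: set[str], ground_truth: set[str]) -> float:
    if not predicted or not ground_truth:
        return 0.0
    tp = len(predicted & ground_truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical rollout: agent finds 2 of 3 relevant files plus 1 extra.
print(file_f1({"api/routes.py", "api/models.py", "README.md"},
              {"api/routes.py", "api/models.py", "api/schema.py"}))  # ≈ 0.667
```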
a theme that’s forming across sessions — RL environments are *extremely* sensitive, they have to look identical to your prod environment

which ofc is why everyone has async cloud agents, that’s no mistake!

Cursor, Codex, etc. cloud agents are most of the way to an RL environment
truth hurts
A large slide fills most of the image, projected in a modern glass-walled auditorium. The slide has a pink background with bold red text at the top:

“Managers have been vibe coding forever”

Below is a bulleted list in plain red text:
	•	tell dev to implement a new feature (vibe coding)
	•	dev makes changes to code
	•	manager tests app
	•	manager does not read the code
	•	manager complains about bugs
	•	dev makes changes to fix bugs
	•	manager doesn’t read the code (again)
	•	dev says “done, by now”
	•	manager says “gj but be faster next time” or insults the living hell out of the dev

In the lower-left foreground, a speaker stands behind a black podium labeled:

AI Engineer
Code Summit
Presented by Google DeepMind

The room is bright with floor-to-ceiling windows behind the stage showing greenery and modern architecture outside.
this guy’s on fire
A large pink slide is displayed in a modern glass-walled auditorium. The slide title, in bold red text, reads:

What is MCP?

Below it is a bulleted list of humorous expansions:
	•	Marketing Charged Protocol
	•	Mythical Compatibility Promise
	•	Manufactured Complexity Pipeline
	•	A fancy word for API

In the lower-left foreground, a speaker stands behind a black podium labeled:

AI Engineer
Code Summit
Presented by Google DeepMind

Tall windows behind the stage show greenery, trees, and modern architecture outside.
Software 2.0 relies on validation

If your codebase doesn't have verification & controls that are as good as or better than your senior dev, you'll get slop
A large pink slide fills most of the image, displayed in a bright glass-walled auditorium. A presenter stands beneath it on a red circular carpet, wearing a dark T-shirt and dark pants, holding a clicker.

Slide title (top, in large red text):
“The Problem: Most Codebases Lack Sufficient Verifiability”

Subheading in smaller text:
“Humans work around incomplete infrastructure. AI agents cannot.”

The slide is divided into two rounded pink boxes:

⸻

Left box: “What Humans Can Handle”

A bulleted list in red text:
	•	60% test coverage (“I’ll test manually”)
	•	Outdated docs (“I’ll ask the team”)
	•	No linters/formatters (“I’ll review it”)
	•	Flaky builds (“I’ll retry”)
	•	Complex setup (“I’ll help onboard”)
	•	Missing observability (“Check logs”)
	•	No security scanning (“We’ll catch it later”)
	•	Inconsistent patterns (“I know the history”)

⸻

Right box: “What Breaks AI Agents”

Bulleted list with each line marked by a red “X”:
	•	No tests → can’t validate correctness
	•	Outdated docs → makes wrong assumptions
	•	No quality checks → generates bad code
	•	Flaky builds → can’t verify changes
	•	Complex setup → can’t reproduce environment
	•	No observability → can’t debug failures
	•	No security checks → introduces vulnerabilities
	•	No standards → creates inconsistency

⸻

At the bottom in a wide pink bar:
“Most organizations have partial infrastructure across the eight pillars. AI agents need systematic coverage to succeed.”

Tall windows behind the stage reveal greenery and modern architecture outside.
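A concrete reading of "verification & controls as good as your senior dev": a gate the agent's change has to clear before it counts as done. Sketch below; the specific commands are placeholders for whatever those eight pillars actually look like in a given repo.

```python
# Illustrative verification gate: an agent's edit only "lands" if the same
# checks a senior reviewer would lean on all pass. Commands are placeholders.
import subprocess

CHECKS = [
    ("tests", ["pytest", "-q"]),
    ("lint",  ["ruff", "check", "."]),
    ("types", ["mypy", "."]),
]

def verify() -> bool:
    for name, cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"[{name}] FAILED\n{result.stdout}{result.stderr}")
            return False
        print(f"[{name}] ok")
    return True

if __name__ == "__main__":
    raise SystemExit(0 if verify() else 1)
```

An agent that can't get past a gate like this is exactly the slop failure mode from the post above.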