Bluesky Thread

two views from Anthropic


1. Claude Skills are for collaboration
2. Skills are for continual learning
A large presentation screen dominates the image, hanging in a modern glass-walled auditorium with tall windows, greenery outside, and wood-paneled ceiling beams above. Two presenters stand on a circular red carpet beneath the screen.

On the screen (pink slide):
At the top: “The Skills ecosystem” followed by “Enterprise & Team Skills.”
Two large rounded rectangles appear side by side:
	•	Left rectangle: light pink with a simple line icon of a briefcase. Beneath it: “Fortune 100 Org-wide Skills.”
	•	Right rectangle: light pink with a line icon of a laptop cursor on a screen. Beneath it: “Enterprise FinTech for 1000s of SWEs.”

Bottom-left corner of the slide shows the Anthropic logo.

Presenters:
On the left, a person with shoulder-length dark hair, glasses, a gray sweater, light pants, and sneakers. On the right, a person with short dark hair wearing a dark gray sweater and black pants. They stand in front of the screen facing the audience.

The setting is bright and spacious, with natural outdoor light filtering through the glass behind them.
A large presentation slide fills most of the image, displayed in a modern glass-walled auditorium. Two presenters stand beneath it on a circular red carpet.

Slide content (pink background):

At the top, bold text reads:
“Skills are a concrete step towards continuous learning”

Below is a rising curved line from left to right, marked with three labeled milestones:
	•	DAY 1 — a small red icon with a white starburst. Text underneath:
“No skills”
“Intelligent”
	•	DAY 5 — the same icon, slightly larger. Text underneath:
“A few skills”
“Capable”
	•	DAY 30 — the icon again, larger and placed at the top-right of the curve. Text underneath:
“Many skills”
“Useful”

Bottom-left corner of the slide shows the Anthropic logo.

Presenters:
On the left, a person with glasses, a gray sweater, light pants, and sneakers holding a small device. On the right, a person with short dark hair, wearing a muted blue-gray sweater and dark pants, also holding a clicker. Behind them, tall glass windows reveal trees, greenery, and interior hallways.
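For a rough sense of what that Day 1 → Day 30 curve means mechanically: a skill is just a folder of instructions the model can pull in on demand, so "learning" is the library growing. Below is a minimal sketch of an agent harness indexing such a library; the directory layout and frontmatter fields are assumptions for illustration, not Anthropic's actual implementation.

```python
# Illustrative sketch: surface an accumulating skill library to a model.
# Layout assumed here: skills/<name>/SKILL.md with "name:" / "description:"
# frontmatter lines. Only the metadata goes into the prompt up front; a
# full skill body would be loaded later if the model decides it's relevant.
from pathlib import Path

def skill_index(skills_dir: str) -> str:
    entries = []
    for skill_md in sorted(Path(skills_dir).glob("*/SKILL.md")):
        meta = {}
        for line in skill_md.read_text().splitlines():
            if line.strip() == "---" and meta:
                break  # end of frontmatter
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        entries.append(f"- {meta.get('name', skill_md.parent.name)}: "
                       f"{meta.get('description', '')}")
    return "Available skills:\n" + "\n".join(entries)

print(skill_index("./skills"))  # day 30: many entries, more useful agent
```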
“spec driven development is broken…a spec isn’t just a more detailed prompt.”
A Meta superintelligence researcher working on CWM (Code World Model) has big ambitions for the problems a CWM could solve by loosely simulating code execution in latent space:

- the halting problem
- distributed systems behavior
A large presentation slide fills most of the image, displayed in a modern glass-walled auditorium. A presenter stands beneath it on a stage, mid-gesture, wearing a light gray sweater, collared shirt, dark pants, and sneakers.

Slide content (pink background):

Top-left text:
“What can we do with CWM?”

Centered large title:
“The Halting Problem”

Below the title, a subtitle in italicized text:
“Given a program and an input, will a program run forever?”

On the right side of the slide is a dark code-window graphic with orange header text: “Conv Neural Approximation • Probabilistic.”
Inside the window is Python-like pseudocode defining a function colatz that computes the Collatz sequence via a while loop. Beneath it is sample output showing steps of the sequence (“Simulating recursive function… Time = 27 of 27 … steps: 82 41  … 1 … Pattern: Converged oscillation detected”) and a final line:
“> Prediction: HALTS | Confidence: 0.918”

Bottom-left corner of the slide shows a small Meta AI logo.
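For reference, here's what the slide's example does when you simply run it instead of simulating it in latent space: a plain Collatz step counter. The slide's exact code isn't legible in the photo, so this is a baseline sketch, not Meta's pseudocode.

```python
# Baseline for the slide's example: actually execute the Collatz iteration.
# The halting problem is undecidable in general; a CWM only *predicts*
# halting ("HALTS | Confidence: 0.918"), which for input 27 we can check.
def collatz_steps(n: int, max_steps: int = 10_000) -> int | None:
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
        if steps >= max_steps:
            return None  # out of budget: inconclusive, not a proof of looping
    return steps

print(collatz_steps(27))  # 111 steps: 27 -> 82 -> 41 -> ... -> 1
```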
btw yes i’m at a conference today bsky.app/profile/timk...
Tim Kellogg @timkellogg.me
fyi i’ll be at AI Engineer Code Summit on Friday in NY, arriving tomorrow evening. lmk if you want to meet up
Will Brown (willccbb) of Prime Intellect is talking about the other side of RL scaling. How do you scale up your workforce of AI researchers without actually paying more? Increase the pool of researchers
A large presentation slide hangs in a modern glass-walled auditorium. A speaker stands below on a red circular carpet, wearing a gray T-shirt, a brown leather jacket, light beige pants, and white sneakers, holding a clicker.

Slide content (dark background):

Title in large white text:
Prime Intellect

Subheading: We are:
Bulleted list:
	•	a research lab
	•	a compute provider
	•	a platform company
	•	an open-source ecosystem

Another section reads:
Our mission:
increase the accessibility of doing AI research

On the right side of the slide are UI mockups and dashboards:
	•	A performance chart with an upward-trending line.
	•	Interface panels showing GPU offerings labeled “A100,” “H100,” “$2.00/hr,” “$3.84/hr”, with purple “Get an H100”/“Get an A100” buttons.
	•	A lower panel showing a social-feed style list of posts under the heading Prime Intellect.

The background of the auditorium shows tall windows, greenery, and modern architecture outside.
Will: RL environments are the web apps of AI research

Cursor’s composer-1 and OpenAI’s Codex-Max were each trained in an RL env containing the Cursor and Codex products, respectively. Model & product intertwined

Prime Intellect’s environment hub is like github for these environments
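To make "RL environment" concrete: it's a rollout loop plus a verifiable reward. Here's a generic sketch (interface names are mine, not Prime Intellect's hub API or any lab's internal harness), which also shows why a cloud coding agent already has most of the pieces.

```python
# Generic sketch of a verifiable RL environment for a coding agent.
# The prompt, sandboxed repo, and tool transcript already exist in any
# cloud-agent product; the only genuinely new piece is a programmatic reward.
from dataclasses import dataclass, field

@dataclass
class CodeTaskEnv:
    prompt: str                 # task given to the agent
    repo_snapshot: str          # commit/sandbox the rollout starts from
    transcript: list = field(default_factory=list)

    def reset(self) -> str:
        """Start a fresh rollout; return the initial observation."""
        self.transcript = []
        return self.prompt

    def step(self, action: dict) -> tuple[str, bool]:
        """Apply one tool call; return (observation, done)."""
        self.transcript.append(action)
        done = action.get("tool") == "submit"
        return ("" if done else f"ran {action.get('tool')}"), done

    def reward(self) -> float:
        """Verifiable score for the finished rollout, e.g. did the tests pass."""
        # A real env would run the project's test suite against the edits;
        # this stub just checks that something was submitted.
        return 1.0 if any(a.get("tool") == "submit" for a in self.transcript) else 0.0
```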
OpenAI is talking about their RL finetuning APIs

they find that RFT is really good at teaching an agent how to call tools, when to do it in parallel, etc. Overall it’s good for squeezing out that last bit of efficiency
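One way to picture that kind of grading (purely illustrative, not OpenAI's RFT grader API; the trajectory fields are made up): reward correctness first, then add a small bonus for batching independent tool calls in parallel, which is the behavior the talk says RFT teaches.

```python
# Hypothetical grader: mostly pay for the right answer, with a small bonus
# for packing tool calls into fewer sequential batches (more parallelism).
def grade(trajectory: dict) -> float:
    correct = 1.0 if trajectory["answer_correct"] else 0.0
    calls = trajectory["tool_calls"]                   # [{"name", "batch_id"}, ...]
    if not calls:
        return 0.9 * correct
    batches = len({c["batch_id"] for c in calls})
    efficiency = min(1.0, len(calls) / batches - 1.0)  # > 0 only if calls are batched
    return 0.9 * correct + 0.1 * efficiency

print(grade({
    "answer_correct": True,
    "tool_calls": [
        {"name": "read_file", "batch_id": 0},
        {"name": "read_file", "batch_id": 0},          # issued in parallel
        {"name": "shell", "batch_id": 1},
    ],
}))  # 0.95: correct answer, some parallelism
```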
A presenter stands beneath a large slide in a glass-walled auditorium, speaking while standing on a red circular carpet. The slide has a dark background with the Cognition logo (a hexagonal flower-like icon) and the title:

“Code Edit Planning Agent”

Under the title is a bulleted list with white text:
	•	Given an initial user prompt, determine which files to edit to complete the task.
	•	Tools – shell, read file
	•	Dataset – user queries with labelled ground truth of edited files
	•	Reward – F1 score of the list of files returned

To the right of the bullets is a diagram showing multiple Devin VMs connected through the OpenAI Platform:
	•	Devin VM 1: an orange box labeled “Tool call 1: shell,” leading to a peach box labeled “Tool call 9: read_file,” then a purple box labeled “Final Answer,” ending in a gray “Grader.”
	•	Devin VM 35: similar sequence, starting with “Tool call 1: shell,” then “Tool call 7: shell,” then “Final Answer,” then “Grader.”
	•	Ellipses between them indicate many similar VMs.

The presenter on stage wears a black shirt, dark gray jeans, and white sneakers, holding a clicker. Behind her, tall glass windows reveal greenery and modern architecture outside.
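That reward is simple enough to write down exactly: F1 between the set of files the agent returns and the labelled ground truth. Minimal version below (the file names are made up):

```python
# F1 over predicted vs. ground-truth edit-file sets, i.e. the reward
# described on the slide for the code-edit planning agent.
def file_f1(predicted: set[str], ground_truth: set[str]) -> float:
    if not predicted or not ground_truth:
        return 0.0
    tp = len(predicted & ground_truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical rollout: agent finds 2 of 3 relevant files plus 1 extra.
print(file_f1({"api/routes.py", "api/models.py", "README.md"},
              {"api/routes.py", "api/models.py", "api/schema.py"}))  # ≈ 0.667
```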
a theme that’s forming across sessions — RL environments are *extremely* sensitive, they have to look identical to your prod environment

which ofc is why everyone has async cloud agents, that’s no mistake!

Cursor, Codex, etc. cloud agents are most of the way to an RL environment
truth hurts
A large slide fills most of the image, projected in a modern glass-walled auditorium. The slide has a pink background with bold red text at the top:

“Managers have been vibe coding forever”

Below is a bulleted list in plain red text:
	•	tell dev to implement a new feature (vibe coding)
	•	dev makes changes to code
	•	manager tests app
	•	manager does not read the code
	•	manager complains about bugs
	•	dev makes changes to fix bugs
	•	manager doesn’t read the code (again)
	•	dev says “done, by now”
	•	manager says “gj but be faster next time” or insults the living hell out of the dev

In the lower-left foreground, a speaker stands behind a black podium labeled:

AI Engineer
Code Summit
Presented by Google DeepMind

The room is bright with floor-to-ceiling windows behind the stage showing greenery and modern architecture outside.
this guy’s on fire
A large pink slide is displayed in a modern glass-walled auditorium. The slide title, in bold red text, reads:

What is MCP?

Below it is a bulleted list of humorous expansions:
	•	Marketing Charged Protocol
	•	Mythical Compatibility Promise
	•	Manufactured Complexity Pipeline
	•	A fancy word for API

In the lower-left foreground, a speaker stands behind a black podium labeled:

AI Engineer
Code Summit
Presented by Google DeepMind

Tall windows behind the stage show greenery, trees, and modern architecture outside.
Software 2.0 relies on validation

If your codebase doesn't have verification & controls that are as good as or better than your senior dev, you'll get slop
A large pink slide fills most of the image, displayed in a bright glass-walled auditorium. A presenter stands beneath it on a red circular carpet, wearing a dark T-shirt and dark pants, holding a clicker.

Slide title (top, in large red text):
“The Problem: Most Codebases Lack Sufficient Verifiability”

Subheading in smaller text:
“Humans work around incomplete infrastructure. AI agents cannot.”

The slide is divided into two rounded pink boxes:

⸻

Left box: “What Humans Can Handle”

A bulleted list in red text:
	•	60% test coverage (“I’ll test manually”)
	•	Outdated docs (“I’ll ask the team”)
	•	No linters/formatters (“I’ll review it”)
	•	Flaky builds (“I’ll retry”)
	•	Complex setup (“I’ll help onboard”)
	•	Missing observability (“Check logs”)
	•	No security scanning (“We’ll catch it later”)
	•	Inconsistent patterns (“I know the history”)

⸻

Right box: “What Breaks AI Agents”

Bulleted list with each line marked by a red “X”:
	•	No tests → can’t validate correctness
	•	Outdated docs → makes wrong assumptions
	•	No quality checks → generates bad code
	•	Flaky builds → can’t verify changes
	•	Complex setup → can’t reproduce environment
	•	No observability → can’t debug failures
	•	No security checks → introduces vulnerabilities
	•	No standards → creates inconsistency

⸻

At the bottom in a wide pink bar:
“Most organizations have partial infrastructure across the eight pillars. AI agents need systematic coverage to succeed.”

Tall windows behind the stage reveal greenery and modern architecture outside.
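A concrete reading of "verification & controls as good as your senior dev": a gate the agent's change has to clear before it counts as done. Sketch below; the specific commands are placeholders for whatever those eight pillars actually look like in a given repo.

```python
# Illustrative verification gate: an agent's edit only "lands" if the same
# checks a senior reviewer would lean on all pass. Commands are placeholders.
import subprocess

CHECKS = [
    ("tests", ["pytest", "-q"]),
    ("lint",  ["ruff", "check", "."]),
    ("types", ["mypy", "."]),
]

def verify() -> bool:
    for name, cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"[{name}] FAILED\n{result.stdout}{result.stderr}")
            return False
        print(f"[{name}] ok")
    return True

if __name__ == "__main__":
    raise SystemExit(0 if verify() else 1)
```

An agent that can't get past a gate like this is exactly the slop failure mode from the post above.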