Bluesky Thread

oh, this is a wild new take on AI development


Prime Intellect offers pre-built and shareable RL environments

these are pre-built harnesses for training your agentic model to do a particular type of task. the big labs all pour their vast resources into building their own

www.primeintellect.ai/blog/environ...
Environments Hub: A Community Hub To Scale RL To Open AGI
RL environments are the playgrounds where agents learn. Until now, they’ve been fragmented, closed, and hard to share. We are launching the Environments Hub to change that: an open, community-powered platform that gives environments a true home. Environments define the world, rules, and feedback loop of state, action, and reward. From games to coding tasks to dialogue, they’re the contexts where AI learns; without them, RL is just an algorithm with nothing to act on.
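
to make "state, action and reward" concrete: a minimal episode loop in the classic gym-style API (Gymnasium here, the maintained fork of OpenAI gym), the same framing Karpathy brings up below. the random agent is just a placeholder for a trained policy:

```python
# one episode of the state/action/reward loop via the Gymnasium API
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()  # placeholder: a real agent chooses here
    obs, reward, terminated, truncated, info = env.step(action)  # world responds
    episode_return += reward
    done = terminated or truncated

print(f"episode return: {episode_return}")
env.close()
```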
think simulator + world model for robots

or generated MCP servers for code agents

or fake web apps for computer use

then you only have to implement the training script & reward function, and let a pre-built harness from Prime Intellect handle the rest
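
roughly, the split looks like this. a minimal sketch where `Rollout` and the commented training loop are hypothetical stand-ins for whatever harness you pull off the hub, not Prime Intellect's actual API; the reward function is the only part you'd actually write:

```python
# illustrative only: `Rollout` and the commented loop are hypothetical
# stand-ins for a hub harness, not a real Prime Intellect API
from dataclasses import dataclass

@dataclass
class Rollout:
    prompt: str
    completion: str
    passed_tests: int   # filled in by the environment's verifier
    total_tests: int

def reward_fn(rollout: Rollout) -> float:
    """the part you write: score one finished episode.
    here: fraction of unit tests passed, a common shape for coding envs."""
    if rollout.total_tests == 0:
        return 0.0
    return rollout.passed_tests / rollout.total_tests

# the harness would own rollout generation; you plug in reward + optimizer:
# for batch in env.generate_rollouts(policy):
#     rewards = [reward_fn(r) for r in batch]
#     policy.update(batch, rewards)  # e.g. a PPO/GRPO step
```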
Karpathy’s take
Andrej Karpathy (@karpathy) on X.com:
In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from.
In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit like what you'd see on Stack Overflow / Quora, or etc., but geared towards LLM use cases.
Neither of the two above are going away (imo), but in this era of reinforcement learning, it is now environments. Unlike the above, they give the LLM an opportunity to actually interact - take actions, see outcomes, etc. This means you can hope to do a lot better than statistical expert imitation. And they can be used both for model training and evaluation. But just like before, the core problem now is needing a large, diverse, high quality set of environments, as exercises for the LLM to practice against.
In some ways, I'm reminded of OpenAI's very first project (gym), which was exactly a framework hoping to build a large collection of environments in the same schema, but this was way before LLMs. So the environments were simple academic control tasks of the time, like cartpole, ATARI, etc. The @PrimeIntellect environments hub (and the verifiers repo on GitHub) builds the modernized version specifically targeting LLMs, and it's a great effort/idea. I pitched that someone build something like it earlier this year:
x.com/karpathy/statu...
Environments have the property that once the skeleton of the framework is in place, in principle the community / industry can parallelize across many different domains, which is exciting.
Final thought - personally and long-term, I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically. I think that reward functions are super sus, and I think humans don't use RL to learn (maybe they do for some motor tasks etc, but not intellectual problem solving tasks).
Humans use different learning paradigms that are significantly more powerful and sample efficient and that haven't been properly invented and scaled yet, though early sketches and ideas exist (as just one example, the idea of "system prompt learning", moving the update to tokens/contexts not weights and optionally distilling to weights as a separate process a bit like sleep does).
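
that last idea is concrete enough to sketch. a toy version of system prompt learning where the "update" appends a distilled lesson to the context instead of taking a gradient step; `llm` and `verify` are hypothetical stand-ins, not any real API:

```python
# toy illustration of "system prompt learning": failures become lessons
# appended to the context, not gradient updates. `llm` and `verify` are
# hypothetical stand-ins for a chat model and a task-specific checker.
from typing import Callable

def learn_in_context(
    llm: Callable[[str, str], str],      # (system, user) -> answer
    verify: Callable[[str, str], bool],  # (task, answer) -> correct?
    tasks: list[str],
) -> str:
    system_prompt = "You are a careful problem solver."
    for task in tasks:
        answer = llm(system_prompt, task)
        if not verify(task, answer):
            # the "update": distill the failure into a one-line rule
            # and append it to the prompt instead of touching weights
            lesson = llm("Summarize the mistake below as a one-line rule.",
                         f"Task: {task}\nFailed answer: {answer}")
            system_prompt += f"\n- {lesson}"
    # the grown prompt could later be distilled into weights, "like sleep"
    return system_prompt
```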
