Bluesky Thread

DeepSeek is shipping a theorem prover (automate math proofs)

no paper yet, but word is they used MCTS, which would be surprising bc one of my big takeaways from the R1 paper was that MCTS didn’t work and RL alone was enough

huggingface.co/deepseek-ai/...
MCTS = Monte Carlo Tree Search = test-time compute that searches a tree of possible answers and uses a second "reward model" to verify or rate results
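
a toy sketch of that loop, for the curious — `propose_next_steps` and `score_with_reward_model` are made-up stand-ins for the policy and reward models, not anything from this release:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state            # partial proof / answer prefix
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def ucb(self, c=1.4):
        # unvisited nodes get explored first
        if self.visits == 0:
            return float("inf")
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def propose_next_steps(state):
    # stand-in for sampling candidate next steps from the policy model
    return [state + step for step in ("a", "b", "c")]

def score_with_reward_model(state):
    # stand-in for a learned reward model rating a (partial) answer
    return random.random()

def mcts(root_state, iterations=100):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. selection: walk down by UCB until we hit a leaf
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # 2. expansion: add candidate continuations to a visited leaf
        if node.visits > 0:
            node.children = [Node(s, node) for s in propose_next_steps(node.state)]
            node = node.children[0]
        # 3. evaluation: the reward model plays the rollout's role
        reward = score_with_reward_model(node.state)
        # 4. backpropagation: update stats on the path back to the root
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).state

print(mcts("proof: "))
```

the second model scoring every node is the expensive part; that's the "test-time compute" trade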

RL = reinforcement learning = post-training that teaches the model to do chain-of-thought reasoning on its own
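
and the contrast, again as a toy: REINFORCE against a verifier signal, no tree and no second model at inference. all stand-ins, not the R1 recipe:

```python
import math
import random

# tabular "policy": preference scores over 3 candidate proof tactics
logits = [0.0, 0.0, 0.0]

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def verifier_reward(action):
    # stand-in for a proof checker: only tactic 2 actually works
    return 1.0 if action == 2 else 0.0

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    action = random.choices(range(3), weights=probs)[0]
    reward = verifier_reward(action)
    # REINFORCE update: gradient of log pi(action) w.r.t. each logit
    for a in range(3):
        grad = (1.0 if a == action else 0.0) - probs[a]
        logits[a] += lr * reward * grad

print(softmax(logits))  # probability mass concentrates on tactic 2
```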
same general config as V3: huggingface.co/deepseek-ai/...

training script: huggingface.co/deepseek-ai/...

i don’t see MCTS here, but i might have missed it
the model weights are there, in the open, but so huge i don’t stand a chance at running them
previous paper that likely explains how the verifier part of the MCTS flow works

bsky.app/profile/timk...
Tim Kellogg @timkellogg.me
🚨New DeepSeek Model Incoming🚨

but first they release the paper describing generative reward modeling (GRM) via Self-Principled Critique Tuning (SPCT)

looking forward to DeepSeek-GRM!

arxiv.org/abs/2504.02495
[image: Figure 1, "Inference-time scaling performance with different RMs on all tested RM benchmarks": performance (66.5 to 72.5) vs. k sampled rewards, log scale from 1 to 32. DeepSeek-GRM-27B (MetaRM@k) climbs steeply to ~72.5 and DeepSeek-GRM-27B (Voting@k) peaks just above 70.5, both passing GPT-4o (Greedy) at just under 71; other scalar/voting baselines, including LLM-as-a-Judge w/ TokenProb, Skywork-Reward-Gemma-2-27B, and DeepSeek-BTRM-27B, plateau between ~66.5 and ~68.5. Results go up to 8 samples, extrapolated to 32 for the DeepSeek models; non-italic models use Gemma-2-27B as their base.]
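
rough sketch of what voting@k means in that plot: sample k reward judgments and aggregate. `sample_reward` is a fake stand-in for a generative RM, not DeepSeek-GRM:

```python
import random
import statistics

def sample_reward(response):
    # stand-in: a generative RM writes a critique, then emits a 1-10 score;
    # faked here as noise around a hidden "true" quality
    true_quality = {"answer_a": 7.0, "answer_b": 5.0}[response]
    return min(10.0, max(1.0, random.gauss(true_quality, 2.0)))

def voting_at_k(response, k=8):
    # averaging k sampled judgments cuts the variance, which is why
    # the curves in figure 1 keep climbing as k grows
    return statistics.mean(sample_reward(response) for _ in range(k))

best = max(["answer_a", "answer_b"], key=lambda r: voting_at_k(r, k=32))
print(best)  # almost always "answer_a" at k=32
```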