DeepSeek is shipping a theorem prover (automating math proofs)
no paper yet, but word is they used MCTS, which would be surprising because one of my big takeaways from the R1 paper was that MCTS didn’t work and RL alone was enough
huggingface.co/deepseek-ai/...
MCTS = Monte Carlo Tree Search = test-time compute that searches a tree of possible answers and uses a second “reward model” to verify or rate results
RL = reinforcement learning = use post-training to teach the model to do chain-of-thought reasoning on its own
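to make the MCTS definition above concrete, here’s a toy sketch of the general technique: search a tree of candidate answer steps, score finished answers with a stand-in “reward model”, and backpropagate scores up the tree. everything here (the binary steps, `reward_model`, the UCB constant) is my own illustration, not anything from DeepSeek’s code:

```python
import math
import random

random.seed(0)

def reward_model(answer):
    # stand-in verifier: rates a finished answer (tuple of bits) in [0, 1]
    return sum(answer) / (len(answer) or 1)

def expand(node):
    # each node is a tuple of binary "steps"; branch on the next step
    return [node + (0,), node + (1,)]

def ucb(stats, node, parent_visits, c=1.4):
    # upper-confidence bound: balance exploiting high-value children
    # against exploring rarely-visited ones
    visits, value = stats.get(node, (0, 0.0))
    if visits == 0:
        return float("inf")
    return value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def mcts(depth=4, iters=200):
    root = ()
    stats = {root: (0, 0.0)}  # node -> (visit count, total reward)
    for _ in range(iters):
        # selection: walk down by UCB while all children have been visited
        path, node = [root], root
        while len(node) < depth and all(c in stats for c in expand(node)):
            parent_visits = stats[node][0]
            node = max(expand(node), key=lambda c: ucb(stats, c, parent_visits))
            path.append(node)
        # expansion: try one unvisited child
        if len(node) < depth:
            node = random.choice(expand(node))
            path.append(node)
        # rollout: finish the answer with random steps, then score it
        rollout = node + tuple(random.randint(0, 1) for _ in range(depth - len(node)))
        reward = reward_model(rollout)
        # backpropagation: update counts and reward sums along the path
        for n in path:
            v, s = stats.get(n, (0, 0.0))
            stats[n] = (v + 1, s + reward)
    # pick the most-visited first step from the root
    best = max(expand(root), key=lambda c: stats.get(c, (0, 0.0))[0])
    return best, stats
```

in a prover setting the “steps” would be proof tactics and the reward model a learned verifier, but the select / expand / rollout / backprop loop is the same shape.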
same general config as V3: huggingface.co/deepseek-ai/...
training script: huggingface.co/deepseek-ai/...
i don’t see MCTS here, but i might have missed it
the model weights are there, in the open, but so huge i don’t stand a chance of running them
previous paper that likely explains how the verifier part of the MCTS flow works
bsky.app/profile/timk...
🚨New DeepSeek Model Incoming🚨
but first they release the paper describing generative reward modeling (GRM) via Self-Principled Critique Tuning (SPCT)
looking forward to DeepSeek-GRM!
arxiv.org/abs/2504.02495
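my rough read of the GRM idea: instead of a scalar reward head, the reward model *generates* principles and a critique as text, and the numeric score gets parsed out of that text. a toy sketch — `fake_grm` and the “Score: N/M” format are my stand-ins, not anything from the paper:

```python
import re

def fake_grm(prompt, response):
    # a real generative reward model would be an LLM producing this text;
    # here we hard-code a plausible output to show the shape of the flow
    return (
        "Principle 1: the answer should be correct.\n"
        "Critique: the response addresses the question directly.\n"
        "Score: 8/10"
    )

def extract_score(critique_text):
    # pull the numeric score out of the generated critique
    m = re.search(r"Score:\s*(\d+)\s*/\s*(\d+)", critique_text)
    if not m:
        return None
    return int(m.group(1)) / int(m.group(2))

critique = fake_grm("What is 2+2?", "4")
print(extract_score(critique))  # → 0.8
```

the appeal vs a scalar head is that the critique text itself is inspectable, and (per the SPCT framing) the principles it generates can be tuned.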