Bluesky Thread

DeepSeek-Math-V2: self-verification

November 27, 2025


Fascinating paper that explores how to do RL with the focus on process over outcome

It’s sort of similar to a GAN, but with a loop each for the generator and the verifier, as well as an outer loop

github.com/deepseek-ai/...
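To make the loop structure concrete, here is a minimal sketch of the schedule only: one inner loop for the verifier, one for the generator, and an outer loop around both. Everything in it is an assumption for illustration. The function name `train`, the step counts, and the "relabel" step (loosely based on the auto-labeling panel in the image) are hypothetical and not taken from the paper or the repo.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records the order in which the toy components are updated."""
    events: list = field(default_factory=list)


def train(outer_rounds: int, verifier_steps: int, generator_steps: int) -> Trace:
    """Hypothetical schedule: two inner loops wrapped in one outer loop."""
    trace = Trace()
    for r in range(outer_rounds):
        # Inner loop 1 (assumed): update the verifier on the current proofs.
        for s in range(verifier_steps):
            trace.events.append(("verifier", r, s))
        # Inner loop 2 (assumed): update the generator, scored by the verifier.
        for s in range(generator_steps):
            trace.events.append(("generator", r, s))
        # Outer step (assumed): label fresh generator output for the next round.
        trace.events.append(("relabel", r, 0))
    return trace


if __name__ == "__main__":
    for event in train(outer_rounds=2, verifier_steps=2, generator_steps=1).events:
        print(event)
```

The GAN comparison shows up in the alternation: each side improves against the other, and the outer loop feeds the newer output back in as training data.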


⸻

This illustration shows a closed-loop AI training and verification system, centered around a glowing cube labeled “Unified Self-Verification Model.” Multiple subsystems connect to it with curved arrows, creating a multi-stage pipeline.

Top-left

A small panel labeled “COLD START” shows silhouettes of human experts handing documents to a robot. A label reads “EXPERT DATA”. The robot is marked “INITIAL VERIFIER.”

Left side

A machine labeled “GENERATOR” emits data toward the central model via blue arrows. Nearby is a panel titled “AUTO-LABELING VIA SCALED COMPUTE” showing branching lines of generated labels flowing into the loop.

Top-center

A blocky structure marked “META-VERIFIER (STATIC)” sends a bright golden beam into the central model.

Right side

A cube-like “VERIFIER” module receives outputs from the central model and displays mixed “pass/fail” icons and red flags. It feeds its evaluations back into the central loop.

Bottom

A golden arrow flows into a container titled “GOLDEN DATASET,” which represents validated high-quality data feeding back into earlier steps of the pipeline.

Overall

Blue arrows represent generation and verification flows; golden arrows represent validated or high-confidence data circulating back into the system. Circuit-pattern artwork forms the background.

⸻


the end result is a model that knows the **process** for finding correct results

they argue that this is how you solve unsolved problems — you refine the process itself

it seems likely that OpenAI’s IMO model is doing something along these lines: a general self-verification process

a key innovation here is the inclusion of a meta-verifier

it’s an anchor, so its weights aren’t updated. And its job isn’t really a hard one: it doesn’t spot mistakes, it only spots bullshit like reward hacking
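A toy sketch of the anchor idea, under loud assumptions: the thread doesn't say how the meta-verifier's judgment enters the reward, so the multiplication below is a guess, and the "does the critique point at a real step" check is a hypothetical stand-in for a learned, frozen model. The names `meta_verifier` and `verifier_reward` are made up for illustration.

```python
def meta_verifier(proof_steps: list, claimed_issues: list) -> float:
    """Frozen check (no trainable state): fraction of the verifier's claimed
    issues that point at a step index that actually exists in the proof.
    Hypothetical stand-in for a learned meta-verifier."""
    if not claimed_issues:
        return 1.0  # nothing claimed, nothing to flag as made up
    grounded = sum(1 for idx in claimed_issues if 0 <= idx < len(proof_steps))
    return grounded / len(claimed_issues)


def verifier_reward(predicted_score: float, reference_score: float,
                    proof_steps: list, claimed_issues: list) -> float:
    """Assumed reward shape: accuracy of the score, scaled down when the
    critique itself looks fabricated."""
    accuracy = 1.0 - abs(predicted_score - reference_score)
    return accuracy * meta_verifier(proof_steps, claimed_issues)


if __name__ == "__main__":
    proof = ["assume n is even", "then n = 2k", "so n^2 = 4k^2"]
    # Same score both times; only the quality of the critique differs.
    print(verifier_reward(0.5, 0.5, proof, claimed_issues=[1]))
    print(verifier_reward(0.5, 0.5, proof, claimed_issues=[7, 9]))
```

The point of the sketch is that the meta check never judges whether the proof is right. It only checks whether the verifier's critique is grounded, which is a much easier job and a plausible reason it can stay frozen.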

1 hour later

bsky.app/profile/dori...

Alexander Doria @dorialexander.bsky.social

DeepSeek just released a new state-of-the-art math prover, DeepSeek-Math-V2, competitive with Google, OpenAI or ByteDance, while being a publicly documented open-weight model. A few reading notes along the way:
