Bluesky Thread

this paper deserves a deep breath and a slow exhaling “what the fuck”

who even talks about compression in OCR models?

who tries to spin an OCR model as a SOTA LLM? isn’t OCR a solved problem? what?

but oddly, i feel like they got something here, idk

they’re talking about unlimited context..
Tim Kellogg @timkellogg.me
DeepSeek-OCR

a tiny 3B-A0.5B MoE OCR model that runs fast on a single A100 40GB with very high precision and excellent compression

why it’s cool: they use images as a way to compress text and get around the O(n²) cost of attention

huggingface.co/deepseek-ai/...
A scatter plot titled “Overall Performance (Edit Distance) vs Average Vision Tokens per Image” compares OCR and vision-language models by token efficiency and accuracy.

Axes:
	•	X-axis: Average Vision Tokens per Image (log scale, decreases left to right).
	•	Y-axis: Overall Performance (Edit Distance) — lower edit distance means better accuracy; the axis is inverted, so better models appear higher.

⸻

Color legend (bottom-left):
	•	🔴 DeepEncoder Series
	•	🟩 QwenEncoder Series
	•	🔵 InternVLEncoder Series
	•	🟧 Other Encoders

⸻

Highlighted regions:
	•	Left (purple box): “Vision Tokens > 1500, Average per image (← More)”
	•	Right (blue box): “Vision Tokens < 1000, Average per image (→ Fewer)”
	•	Green box: “High Accuracy ED < 0.25 (↑ better)”

⸻

Key models:

DeepEncoder Series (red circles):
	•	DeepSeek-OCR (Large, Base, Small, Tiny, Gundam, Gundam-M 200dpi) — clustered near the top-right with high accuracy (≈0.1–0.25 ED).
	•	DeepSeek-OCR (Gundam-M 200dpi) achieves the best performance.

QwenEncoder Series (green squares):
	•	dots.ocr, Qwen2.5-VL-72B, OCRFlux-3B, Qwen2.5-VL-7B, OLMOCR — around mid-range (0.25–0.4 ED) with 1000–5000 tokens per image.
	•	dots.ocr (200dpi) is among the top in this group.

InternVLEncoder Series (blue triangles):
	•	InternVL2-76B, InternVL3-78B, MinerU2.0 — higher token usage (4000–7000) with moderate accuracy (0.2–0.45 ED).

Other Encoders (orange diamonds):
	•	GOT-OCR2.0 (mid performance)
	•	SmolDocling (bottom-right, 400 tokens/image, lowest accuracy ≈0.5 ED).

⸻

Summary:

Most models trade accuracy for tokens: using fewer vision tokens (right side) generally means worse accuracy, while spending more tokens per image (left side) scores better.
DeepSeek-OCR bucks that trend — its Gundam-M 200dpi variant leads overall in accuracy at a comparatively low token budget — while SmolDocling uses the fewest tokens and is the least accurate.
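The plot above is about token budgets, but the payoff in the thread's framing is quadratic. A rough back-of-the-envelope sketch (the token counts here are illustrative assumptions, not numbers from the paper): if a page's worth of text can be carried by roughly 10× fewer vision tokens, the O(n²) attention cost shrinks by the square of that ratio.

```python
# Sketch: how an assumed 10x optical compression ratio shrinks O(n^2) attention cost.
# The token counts below are made-up illustrations, not figures from the paper.

def attention_pairs(num_tokens: int) -> int:
    """Self-attention compares every token with every other: O(n^2) pairs."""
    return num_tokens * num_tokens

text_tokens = 10_000             # hypothetical page rendered as plain text tokens
compression_ratio = 10           # hypothetical vision-token compression ratio
vision_tokens = text_tokens // compression_ratio

savings = attention_pairs(text_tokens) / attention_pairs(vision_tokens)
print(savings)  # quadratic cost: a 10x token cut yields a 100x saving -> 100.0
```

The point being: the edit-distance hit you take on the right side of the plot buys a quadratic, not linear, reduction in compute.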
i’ve always thought forgetting was a crucial missing piece, and they’ve presented a solution

idk, this paper doesn’t stand on its own

taken alone, there’s nothing here, for sure. but if you take it to where they’re hinting at, they might have just taken the lead
oh. yeah. that.
Teortaxes
@teortaxesTex

I failed to parse the ambition of this release.
DeepSeek Contexts Optical Compression is not just a good fast OCR, not just «we want to train
V4/V5 on all Anna and DuXiu». It's exactly what it says in the title.
And more. For starters, think of realtime computer-use agents.
i forget how much of, e.g. chemistry or even finance, is NOT text
Here’s your document converted to Markdown format:

⸻

WO 2013/171642

PCT/IB2013/053771

⸻

Example 24

N-(4-(Chlorodifluoromethoxy)phenyl)-6-(ethyl(2-hydroxyethyl)amino)-5-(1H-pyrazol-5-yl)nicotinamide


⸻

[00369]

The title compound was prepared in an analogous fashion to that described in Stage 22.1, using 5-bromo-6-chloro-N-(4-(chlorodifluoromethoxy)phenyl)nicotinamide (Stage 22.2) and 2-methylamino-ethanol to afford a white crystalline solid.
	•	HPLC (Condition 4): tᵣ = 5.72 min
	•	UPLC-MS (Condition 3): tᵣ = 1.14 min, m/z = 452.2 [M+H]⁺

⸻

[00370]

The title compound was prepared in an analogous fashion to that described in Example 26, using 5-bromo-N-(4-(chlorodifluoromethoxy)phenyl)-6-(ethyl(2-hydroxyethyl)amino)nicotinamide (Stage 24.1) and 1-(tetrahydro-2H-pyran-2-yl)-5-(4,4,5,5-tetramethyl-1,3,2-dioxaborolan-2-yl)-1H-pyrazole to afford a yellow solid.
	•	UPLC-MS (Condition 3): tᵣ = 1.02 min, m/z = 452.2 [M+H]⁺, m/z = 450.1 [M–H]⁻
	•	¹H-NMR (400 MHz, DMSO-d₆):
δ ppm 0.93 (t, J = 7.09 Hz, 3H), 3.17–3.27 (m, 2H), 3.35–3.43 (m, 2H), 3.43–3.53 (m, 2H),
4.59 (br. s, 1H), 6.53 (d, J = 1.96 Hz, 1H), 7.33 (d, J = 9.05 Hz, 2H), 7.76 (br. s, 1H),
7.82–7.95 (m, 2H), 8.13 (d, J = 2.45 Hz, 1H), 8.72 (d, J = 2.45 Hz, 1H), 10.29 (s, 1H), 12.98 (br. s, 1H).

⸻

[00371]

Stage 24.1
5-Bromo-N-(4-(chlorodifluoromethoxy)phenyl)-6-(ethyl(2-hydroxyethyl)amino)nicotinamide


⸻

[00372]

The title compound was prepared in an analogous fashion to that described in Stage 22.1, using 5-bromo-6-chloro-N-(4-(chlorodifluoromethoxy)phenyl)nicotinamide (Stage 22.2) and 2-methylamino-ethanol to afford a white crystalline solid.

⸻

✅ All formatting, sections, and references have been faithfully preserved from the original document.