DeepSeek-OCR
a tiny 3B-A0.5B MoE OCR model (3B total parameters, ~0.5B active) that runs fast on a single A100 40GB with very high precision and excellent compression
why it’s cool: they use images to compress text into fewer tokens and get around the O(n^2) cost of attention over long text
huggingface.co/deepseek-ai/...
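Rough arithmetic on why fewer tokens matters. Full self-attention over n tokens computes about n * n query-key scores. If a page that would take n text tokens is encoded as n / r vision tokens, that count drops by roughly r^2. This is a minimal sketch: the compression ratios are hypothetical, the helper names are made up, and it ignores the MLP layers and the vision encoder's own cost.

```python
import math


def attention_scores(num_tokens: int) -> int:
    """Query-key scores computed by full self-attention over num_tokens tokens: n * n."""
    return num_tokens * num_tokens


def vision_tokens_for(text_tokens: int, ratio: float) -> int:
    """Vision tokens needed if each one stands in for `ratio` text tokens (rounded up)."""
    return math.ceil(text_tokens / ratio)


def attention_savings(text_tokens: int, ratio: float) -> float:
    """How many times fewer attention scores the compressed version needs."""
    vision_tokens = vision_tokens_for(text_tokens, ratio)
    return attention_scores(text_tokens) / attention_scores(vision_tokens)


if __name__ == "__main__":
    # Hypothetical compression ratios, chosen for illustration only.
    for ratio in (5, 10, 20):
        v = vision_tokens_for(10_000, ratio)
        savings = attention_savings(10_000, ratio)
        print(f"ratio {ratio:>2}: {v} vision tokens, {savings:.0f}x fewer attention scores")
```

At a 10x ratio, 10,000 text tokens become 1,000 vision tokens and the attention layers score 100x fewer pairs. Whether decoding stays accurate at that ratio is the empirical question.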
instead of focusing only on accuracy, they also focus on visual compression
“for a document containing 1000 words, how many vision tokens are at least needed for decoding? This question holds significant importance for research in the principle that ‘a picture is worth a thousand words.’”
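One way to frame the quoted question as arithmetic: pick the highest compression ratio whose decoding precision still meets your target, then divide the page's text-token count by it. A sketch under stated assumptions: the (ratio, precision) pairs below are made-up placeholders, not results from the DeepSeek-OCR paper; the 1.3 tokens-per-word figure is an assumed rule of thumb for English; and both function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def min_vision_tokens(num_words: int, tokens_per_word: float, ratio: float) -> int:
    """Vision tokens needed to cover num_words at a given text:vision compression ratio."""
    text_tokens = math.ceil(num_words * tokens_per_word)
    return math.ceil(text_tokens / ratio)


def max_usable_ratio(measurements: Iterable[Tuple[float, float]], target_precision: float) -> float:
    """Largest compression ratio whose measured decoding precision meets the target.

    `measurements` is a list of (ratio, precision) pairs you measured yourself.
    """
    usable = [ratio for ratio, precision in measurements if precision >= target_precision]
    if not usable:
        raise ValueError("no measured ratio reaches the target precision")
    return max(usable)


if __name__ == "__main__":
    # Made-up placeholder measurements, NOT numbers from the paper.
    measured = [(4.0, 0.98), (8.0, 0.93), (16.0, 0.70)]
    ratio = max_usable_ratio(measured, target_precision=0.9)
    # 1.3 tokens per word is an assumption; it depends on the tokenizer and language.
    print(min_vision_tokens(1000, tokens_per_word=1.3, ratio=ratio))
```

The interesting part is the measurement table, which has to come from actually decoding rendered pages at each ratio; the division afterwards is trivial.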
Textual Forgetting
An advantage of using images to represent text is that you can progressively reduce the resolution of older context, “forgetting” things that happened further in the past
The implication is that you could scale this up to very long (text) contexts
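A toy version of that forgetting schedule, assuming (hypothetically) that the context is split into chunks, each chunk is rendered as an image, and every step further into the past shrinks its vision-token budget by a fixed factor. A factor of 4 matches halving the image's side length if tokens scale with pixel area. The function name, base budget, shrink factor, and floor are all illustrative assumptions, not DeepSeek-OCR's actual settings.

```python
from typing import List


def token_budgets(num_chunks: int, base_tokens: int = 256, shrink: int = 4, floor: int = 16) -> List[int]:
    """Vision-token budget per context chunk, newest first.

    Hypothetical schedule: each step back in age divides the budget by `shrink`,
    never dropping below `floor`. With floor=0, old chunks fade to zero tokens,
    i.e. they are forgotten entirely.
    """
    budgets = []
    for age in range(num_chunks):
        budgets.append(max(floor, base_tokens // (shrink ** age)))
    return budgets


if __name__ == "__main__":
    for n in (4, 16, 64):
        total = sum(token_budgets(n))
        print(f"{n} chunks: {total} tokens with decay vs {n * 256} at full resolution")
```

With a nonzero floor the total still grows linearly with context length, but only at `floor` tokens per chunk instead of `base_tokens`; with `floor=0` the total is bounded by a geometric series, which is the "very long context" case at the cost of losing the oldest chunks.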
This paper is disorienting to me. I’m not sure if it’s a revolutionary breakthrough or bullshit. I’m leaning towards the former
The question seems to be whether this can scale up to >1T
But also, since the text is processed as images, inline diagrams in a document can actually help the model instead of being lost
Time will tell