I got access to Gemini Diffusion. It definitely has small model feels, but i like it
long responses appear in evenly-sized chunks, so i think they're doing ~1000 tokens at a time. i did not anticipate that, but it makes sense
ooooo, i'm definitely right about that, not making it up. It did this in chunks and eventually gave a "server busy" error
also, easy problems go A LOT faster than hard problems
(btw this is about Grok, a dinosaur hunter, in Gemini's own story)
so i suppose that means this could be unlimited output, it just loops until it generates an <|eot|> token
i was thinking these are good for fixed upper-bound latency, but that's really not true. i suppose the bigger thing is that they do easy problems fast (code editing????)
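one way easy inputs could finish faster is an early-exit denoising loop that stops refining once predictions stop changing between steps. this is pure assumption on my part (the function and convergence check are made up, not how Gemini Diffusion is documented to work):

```python
# Toy early-exit denoiser: run refinement steps until the output
# stabilizes, so "easy" inputs converge in fewer steps than hard ones.

def denoise_until_stable(step_fn, x, max_steps=64):
    """Apply step_fn repeatedly; return (result, steps_used).
    step_fn is a stand-in for one denoising/refinement step."""
    for steps in range(1, max_steps + 1):
        new_x = step_fn(x)
        if new_x == x:  # predictions converged -> stop early
            return x, steps
        x = new_x
    return x, max_steps
```

with a scheme like this, latency is data-dependent rather than fixed: a quick code edit converges in a couple of steps while a hard logic problem burns the full budget.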
i gave it a logic problem and it indeed tried to solve it step-by-step, like a reasoning model
it got cut off by the "server busy" error, but it certainly seems like traditional test-time-compute is not off the table for diffusion models