Bluesky Thread

Large Language Diffusion Models

Large Language Diffusion Models

A wildly new AI architecture: this uses diffusion (all tokens at once), not next-token prediction

ml-gsai.github.io/LLaDA-demo/
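
To make "all tokens at once" concrete, here is a toy sketch of masked-diffusion-style decoding. It is not LLaDA's code. `toy_predict`, the `target` sentence, and the `conf` scores are hypothetical stand-ins for a trained model. The idea it illustrates: every step proposes a token for every masked position, only the most confident proposals are committed, and the rest stay masked for the next step.

```python
import math

MASK = "_"

def toy_predict(seq, target, conf):
    # Hypothetical stand-in for a trained mask predictor: in one pass it
    # proposes a (token, confidence) pair for every masked position.
    return {i: (target[i], conf[i]) for i, tok in enumerate(seq) if tok == MASK}

def diffusion_decode(target, conf, steps):
    # Start from a fully masked sequence and fill it in over a few steps.
    seq = [MASK] * len(target)
    history = [list(seq)]
    for step in range(steps):
        proposals = toy_predict(seq, target, conf)
        if not proposals:
            break
        # Commit only the most confident proposals; the rest stay masked
        # and are predicted again on the next step.
        keep = math.ceil(len(proposals) / (steps - step))
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:keep]
        for i in best:
            seq[i] = proposals[i][0]
        history.append(list(seq))
    return history

# Made-up sentence and confidence scores, for illustration only.
target = ["the", "cat", "sat", "on", "the", "mat"]
conf = [0.9, 0.4, 0.8, 0.3, 0.7, 0.2]

for row in diffusion_decode(target, conf, steps=3):
    print(" ".join(row))
```

Positions fill in by confidence rather than left to right, which is the contrast with next-token prediction that the post is pointing at.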
i’m trying to wrap my head around why this is a good thing, but the reliable parts of twitter are taken with it, so i guess i just have to look harder
9 hours later
consider my head wrapped, here's the write-up: bsky.app/profile/timk...
Tim Kellogg @timkellogg.me
LLMs That Don't Gaslight You

A new language model uses diffusion instead of next-token prediction. That means it can back out of a hallucination before it commits to the text. This is a big win for areas like law & contracts, where global consistency is valued.

timkellogg.me/blog/2025/02...
40 likes 7 reposts
