🚨Llama 4 Is Out!🚨
2 out of 3 models just released
- Scout: 109B / 17B active
- Maverick: 400B / 17B active
- Behemoth: 2T / 288B active
ai.meta.com/blog/llama-4...
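those total/active splits are MoE configs: only a few experts fire per token. a quick sketch of the arithmetic — the expert counts (16 for scout, 128 for maverick) are from memory of the blog post, so double-check them:

```python
# Rough active-parameter arithmetic for the Llama 4 MoE configs.
# Expert counts are assumptions recalled from the announcement, not verified.
models = {
    "Scout":    {"total_b": 109, "active_b": 17, "experts": 16},
    "Maverick": {"total_b": 400, "active_b": 17, "experts": 128},
}

for name, m in models.items():
    frac = m["active_b"] / m["total_b"]
    print(f"{name}: {m['active_b']}B / {m['total_b']}B active "
          f"({frac:.0%}) across {m['experts']} experts")
```

same 17B active either way, so per-token compute is similar while maverick has far more total capacity to route into.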
fascinating that they kept the active size the same between scout & maverick, this is going to be fun to dig into
i made that unclear, but behemoth is still in training
oooo, it’s early fusion!
iirc gpt4o is still not early fusion, right?
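for anyone who hasn't seen the term: early fusion means image tokens go into the same transformer stack as text from layer 0, instead of being bolted on later via cross-attention. a toy sketch (all the dims here are made up):

```python
# "Early fusion" in miniature: project vision patches into the text
# embedding space, then run ONE interleaved sequence through the stack.
# Shapes and dims are illustrative only, not Llama 4's real config.
import numpy as np

d_model = 64
text_tokens = np.random.randn(10, d_model)   # already-embedded text
image_patches = np.random.randn(5, 32)       # raw vision patch features
proj = np.random.randn(32, d_model)          # patch -> token-space projection

image_tokens = image_patches @ proj
# one unified sequence; the same transformer layers attend over both modalities
sequence = np.concatenate([image_tokens, text_tokens], axis=0)
print(sequence.shape)  # (15, 64)
```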
fp8 is the new hotness, thanks deepseek
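quick sanity check on what fp8 buys you, assuming the usual OCP e4m3 variant used for training:

```python
# OCP e4m3 fp8: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
# The all-ones exponent + all-ones mantissa pattern is NaN, so the
# largest finite value uses mantissa 110.
bias = 7
max_finite = (1 + 6 / 8) * 2 ** (15 - bias)   # 448.0
# smallest subnormal: exponent field 0000, mantissa 001
min_subnormal = (1 / 8) * 2 ** (1 - bias)     # 2**-9 ≈ 0.00195
print(max_finite, min_subnormal)
```

tiny dynamic range compared to bf16, which is why fp8 training leans on per-tensor/per-block scaling.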
oh damn! scout has a 10M context window!
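back-of-envelope on what 10M tokens means for the KV cache. the layer/head numbers below are placeholders, not scout's actual config:

```python
# KV-cache size at a 10M-token context, under assumed (not real) dims.
n_layers = 48        # assumption
n_kv_heads = 8       # assumption (GQA-style shared KV heads)
head_dim = 128       # assumption
bytes_per = 1        # fp8 K/V cache
seq_len = 10_000_000

# 2 tensors (K and V) per layer, per token
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per * seq_len
print(f"{kv_bytes / 1e9:.0f} GB of KV cache per sequence")
```

even at fp8 that's on the order of a terabyte per sequence under these assumptions, which is why long-context serving needs more than a plain dense cache.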
s1 strikes again: difficult problems are crucial. in post-training they used llama3 to curate the dataset down to just hard problems
oh btw, something from the DS paper today — small LLMs are unreasonably good judges and critics
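one plausible shape of that curation loop, sketched out. `judge` here is a stub standing in for a small-LLM grader (the thread says the real pipeline used llama3); the keep/drop rule is my guess at the idea, not their actual recipe:

```python
# Difficulty-based curation sketch: keep only the problems a weaker
# model's answer scores poorly on, i.e. problems that are still hard.
def judge(problem: str, answer: str) -> float:
    """Stub grader returning a score in [0, 1]. A real judge would be a
    small LLM prompted to grade the answer; this toy keys off the text."""
    return 0.0 if "hard" in problem else 1.0

def curate(dataset, threshold=0.5):
    # Low judge score on the weak model's answer => the problem survives.
    return [ex for ex in dataset
            if judge(ex["problem"], ex["weak_answer"]) < threshold]

data = [
    {"problem": "easy sum", "weak_answer": "4"},
    {"problem": "hard proof", "weak_answer": "?"},
]
print(curate(data))  # only the hard example survives
```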
whoah, interleaved attention layers with no positional embeddings
i’ll have to dig into iRoPE
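a minimal sketch of the interleaving idea: RoPE in most layers, and no positional signal at all (NoPE) in the interleaved ones (the period of 4 below is a guess, not the real schedule):

```python
# Interleaved positional encoding sketch: most layers rotate Q/K with
# RoPE, but every Nth layer skips positional encoding entirely, which
# is the part credited with better length extrapolation.
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate channel pairs by position-dependent angles (standard RoPE)."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)           # (half,)
    angles = np.arange(seq)[:, None] * freqs[None, :]   # (seq, half)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate(
        [x1 * np.cos(angles) - x2 * np.sin(angles),
         x1 * np.sin(angles) + x2 * np.cos(angles)], axis=-1)

def apply_layer_pos(x: np.ndarray, layer_idx: int, nope_every: int = 4):
    # NoPE layer: attention sees no positional information at all.
    if layer_idx % nope_every == nope_every - 1:
        return x
    return rope(x)

x = np.random.randn(16, 64)
out0 = apply_layer_pos(x, layer_idx=0)  # RoPE applied
out3 = apply_layer_pos(x, layer_idx=3)  # NoPE layer, x passes through
```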