Bluesky Thread

Day 1 of DeepSeek open source week: FlashMLA

MLA = Multi-head Latent Attention

One of the big innovations that made DeepSeek-V3 such a notable model

github.com/deepseek-ai/...
GitHub - deepseek-ai/FlashMLA
fyi here’s a good overview of MLA

medium.com/towards-data...
DeepSeek-V3 Explained 1: Multi-head Latent Attention
Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
it’s MIT licensed. if DeepSeek flexes any more they’re gonna tear a muscle