recently we got
1. DeepSeek Sparse Attention (DSA), which solved the cost angle of long context
2. DeepSeek-OCR, which solved the performance angle (packing text into far fewer tokens)
also there’s memory systems like Letta, and then cartridges pushing memory into latent space
seems obvious that we’re rapidly approaching the 1B “cognitive core” model
i.e. a model that has just enough smarts to know what to do with 1B token context
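rough napkin math on why the cost angle matters at that scale. a minimal sketch, assuming dense attention scores every key vs a sparse scheme where each query only touches a top-k subset; the context length, head dim, and k below are illustrative, not DSA’s actual numbers:

```python
# illustrative arithmetic only, not DSA's real mechanism.
n = 1_000_000_000   # 1B-token context (the target above)
d = 128             # attention head dim (assumed)
k = 2_048           # keys each query actually scores under sparsity (assumed)

dense_flops  = 2 * n * n * d   # every query vs every key: O(n^2 * d)
sparse_flops = 2 * n * k * d   # every query vs only k keys: O(n * k * d)

print(f"dense : {dense_flops:.1e} FLOPs")            # ~2.6e+20
print(f"sparse: {sparse_flops:.1e} FLOPs")           # ~5.2e+14
print(f"saving: {dense_flops / sparse_flops:,.0f}x") # n/k, roughly 488,281x
```

the OCR-style angle attacks the same product from the other side: if text gets packed into far fewer (vision) tokens, n itself shrinks before attention ever runs.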
add in highly sparse MoE models, like gpt-oss:120b only computing ~5b active params per token
this stuff is going to run on a wrist watch real fast. gosh, why even bother with phones..
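a toy of the top-k routing that makes that possible. a minimal sketch with made-up shapes and expert counts (not gpt-oss’s real config): the router scores every expert, but only the chosen few do any matmuls for a given token:

```python
import numpy as np

d_model, d_ff   = 512, 2048     # made-up dims for illustration
n_experts, topk = 64, 4         # made-up expert count / routing width

rng     = np.random.default_rng(0)
router  = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02)
           for _ in range(n_experts)]

def moe_forward(x):
    """One token through a sparse MoE layer: only top-k experts compute."""
    logits = x @ router
    chosen = np.argsort(logits)[-topk:]   # indices of the top-k experts
    gates  = np.exp(logits[chosen])
    gates /= gates.sum()                  # softmax over the chosen experts only
    out = np.zeros(d_model)
    for g, i in zip(gates, chosen):       # only topk of n_experts ever run
        w_in, w_out = experts[i]
        out += g * (np.maximum(x @ w_in, 0.0) @ w_out)
    return out

y = moe_forward(rng.standard_normal(d_model))
print(y.shape, f"-- touched {topk}/{n_experts} experts' params for this token")
```

total params scale with the expert count, but per-token compute scales with topk, which is why a 120b model can feel like a few-billion-param model at inference.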
but the context itself is another story: all that data is hard to keep secure and available. that’s always been the case, and i can’t see that changing
data services.. hard to know what the shape will be, but if you can seamlessly keep data secure and available, you’re gonna have a job for a while