correct
i’ve been saying this for a couple of months. RL is driving towards specialization
my hunch is it’s temporary and something will shift back again towards generalization, but for now… buckle up!
the most exciting specialization method, imo, is Cartridges
basically take a gigantic context, and then *train* a KV cache
this is wild, who even thinks to train a KV cache? but it works. incredible compression & recall
hazyresearch.stanford.edu/blog/2025-06...
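(a minimal sketch of the idea in PyTorch, not the authors’ code: the model name, cache size, random init, and the HF-style legacy tuple `past_key_values` format are all my assumptions. the gist: freeze the weights, make per-layer K/V tensors the only trainable parameters, and distill a teacher that actually read the gigantic context into them)

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

# any open-weights model works; Llama here is just a placeholder
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model.requires_grad_(False)  # weights stay frozen; only the cache trains

cfg = model.config
n_kv_heads = getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)
head_dim = cfg.hidden_size // cfg.num_attention_heads
cache_len = 2048  # cartridge size: far smaller than the corpus it encodes

def kv():
    # one trainable tensor shaped like a real KV cache entry
    # (random init is a simplification for this sketch)
    return (0.02 * torch.randn(1, n_kv_heads, cache_len, head_dim)).requires_grad_()

# one trainable (K, V) pair per layer: this is the whole cartridge
cartridge = [(kv(), kv()) for _ in range(cfg.num_hidden_layers)]
opt = torch.optim.Adam([t for pair in cartridge for t in pair], lr=1e-3)

def train_step(input_ids, teacher_logits):
    # run the frozen model on top of the trainable cache, then match the
    # next-token distribution of a teacher that saw the full context
    out = model(input_ids, past_key_values=[(k, v) for k, v in cartridge])
    loss = F.kl_div(
        out.logits.log_softmax(-1),
        teacher_logits.log_softmax(-1),
        reduction="batchmean",
        log_target=True,
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```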
the neat part with Cartridges is you can do it with any off-the-shelf model using standard inference tooling (uh, it needs to be open weights)
it’s just stacking the KV cache, that’s already a supported feature in vLLM & sglang
and you can still prefill on top, so it’s purely context extension
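(again a hedged sketch, in HF transformers terms rather than vLLM/sglang since their cache-injection APIs differ; `cartridge.pt` is a hypothetical file holding the trained per-layer (K, V) tensors from above. the point is the trained cache slots in like ordinary prefix KV, and the live prompt prefills on top of it)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# hypothetical file holding the trained per-layer (K, V) tensors
cartridge = torch.load("cartridge.pt")

# prefill the live prompt on top of the cartridge: the model attends to the
# trained cache exactly as if it were ordinary prefix KV entries
prompt = tok("your question about the gigantic document here", return_tensors="pt")
out = model(prompt.input_ids, past_key_values=cartridge, use_cache=True)
past = out.past_key_values  # cartridge + freshly prefilled prompt

# greedy decode from the combined cache, one token at a time
next_id = out.logits[:, -1].argmax(-1, keepdim=True)
generated = [next_id]
for _ in range(63):
    out = model(next_id, past_key_values=past, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)
    generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0], skip_special_tokens=True))
```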
they stuffed 484K tokens into a model with a 128K-token context window