if i could wish something into being: a complete decoupling of LLM knowledge from reasoning
seems like the key would be a “database” model that answers queries with raw vectors, raw information rather than snippets of documents
the innovation behind such a database would be (rough sketch after this list)
1. most of the n*n attention computation is cacheable across requests, so multi-gigabyte contexts become feasible
2. the returned vectors carry information already reasoned across documents, raw info inferred from snippets spread over hundreds of docs
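a minimal sketch of what that vector-returning knowledge database could look like, assuming a PyTorch-style setup; every name here (KnowledgeDB, encode_corpus, query) is made up for illustration, not an existing API:

```python
# Hypothetical sketch of a vector-returning "knowledge database".
# The expensive pass over the corpus (the analogue of the n*n attention over
# a huge context) is computed once and cached across requests; each query then
# returns a raw information vector pooled across many chunks/documents,
# instead of text snippets.

import torch
import torch.nn.functional as F


class KnowledgeDB:
    def __init__(self, encoder_dim: int = 256):
        self.encoder_dim = encoder_dim
        # Cache of precomputed document tensors, reused across requests.
        self.doc_keys: torch.Tensor | None = None    # (num_chunks, dim)
        self.doc_values: torch.Tensor | None = None  # (num_chunks, dim)

    def encode_corpus(self, chunk_embeddings: torch.Tensor) -> None:
        """One-time, cacheable pass over the corpus.

        `chunk_embeddings` stands in for whatever encoder produces per-chunk
        representations; here they are just projected into key/value spaces.
        """
        w_k = torch.randn(chunk_embeddings.shape[-1], self.encoder_dim)
        w_v = torch.randn(chunk_embeddings.shape[-1], self.encoder_dim)
        self.doc_keys = chunk_embeddings @ w_k
        self.doc_values = chunk_embeddings @ w_v

    def query(self, query_vec: torch.Tensor, top_k: int = 32) -> torch.Tensor:
        """Return an information vector, not document snippets.

        Attends over the cached corpus and returns a weighted mixture of
        value vectors, i.e. information blended across hundreds of chunks.
        """
        assert self.doc_keys is not None, "call encode_corpus() first"
        scores = (query_vec @ self.doc_keys.T) / self.encoder_dim ** 0.5
        topk = scores.topk(min(top_k, scores.shape[-1]), dim=-1)
        weights = F.softmax(topk.values, dim=-1)
        return weights @ self.doc_values[topk.indices]


# Usage: encode once, then answer many queries against the cached tensors.
db = KnowledgeDB()
db.encode_corpus(torch.randn(10_000, 512))   # expensive, done once
info_vec = db.query(torch.randn(256))        # cheap, per request
print(info_vec.shape)                        # torch.Size([256])
```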
on my mind: highly sparse MoE models are hot right now, but they’re lame af. you’re forcing roughly 7/8ths of your hardware to sit idle on any given token (or 31/32nds, in the case of R1/V3), which is nuts
this is the way: dense reasoning models paired with huge remote async “knowledge” models
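a toy, purely hypothetical sketch of the call pattern that split implies: the local dense model keeps decoding while a query to the remote knowledge model is in flight, and the returned vector is injected whenever it lands. none of these function names are real APIs.

```python
# Hypothetical sketch of "dense reasoning model + remote async knowledge model".
# The knowledge lookup is fired off immediately but never blocks decoding.

import asyncio
import random


async def remote_knowledge_lookup(query: str) -> list[float]:
    """Stand-in for the big remote knowledge model: slow, returns a vector."""
    await asyncio.sleep(0.2)                    # network + model latency
    return [random.random() for _ in range(8)]  # placeholder info vector


async def dense_reasoning_step(step: int, knowledge: list[float] | None) -> str:
    """Stand-in for one decode step of the local dense reasoning model."""
    await asyncio.sleep(0.01)
    tag = "with-knowledge" if knowledge is not None else "pending"
    return f"token_{step} ({tag})"


async def generate(prompt: str, num_steps: int = 20) -> list[str]:
    # Kick off the knowledge query right away; keep decoding while it runs.
    lookup = asyncio.create_task(remote_knowledge_lookup(prompt))
    knowledge: list[float] | None = None
    tokens: list[str] = []
    for step in range(num_steps):
        if knowledge is None and lookup.done():
            knowledge = lookup.result()  # inject the vector once it arrives
        tokens.append(await dense_reasoning_step(step, knowledge))
    return tokens


print(asyncio.run(generate("why is the sky blue?")))
```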
and RAG still doesn’t go away; you still need it for rapidly fluctuating or highly structured data
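a trivial, made-up dispatch sketch of where RAG keeps its place next to the vector knowledge store; all three backends here are stubs:

```python
# Hypothetical routing: structured data goes to the source of truth, rapidly
# changing data goes through classic RAG, stable knowledge goes to the vector
# knowledge store. Every function is a stub for illustration only.

def query_structured_source(q: str) -> str:
    return f"rows matching {q!r} from the live database"  # stub

def rag_snippets(q: str) -> str:
    return f"fresh text snippets about {q!r}"  # stub

def knowledge_vectors(q: str) -> str:
    return f"pooled info vector for {q!r}"  # stub

def route(q: str, structured: bool = False, fast_changing: bool = False) -> str:
    if structured:
        return query_structured_source(q)  # e.g. prices, inventory, schedules
    if fast_changing:
        return rag_snippets(q)             # e.g. today's news
    return knowledge_vectors(q)            # stable background knowledge

print(route("current BTC price", structured=True))
```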