mxbai-edge-colbert-v0: tiny long context multi-vector embedding models
This report is huge; it gives us:
- Apache 2.0 licensed 17M(!!) and 32M parameter models
- tops the LongEmbed benchmark
- reproducible(!!) training pipelines
- extensive ablations to understand ColBERT models
www.mixedbread.com/blog/edge-v0
tech report has more details: www.mixedbread.com/papers/small...
it’s basically a how-to manual for training a SOTA late interaction model
steps are:
1. contrastive pre-training
2. fine-tuning
3. knowledge distillation
they call out distillation as the key that lets their models outperform much larger ones (rough sketch of that loss below)
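for the curious, a minimal sketch of what a score-distillation loss can look like. this is my own PyTorch illustration, not the mixedbread training code; teacher_scores would come from whatever bigger model is being distilled from:

```python
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores, temperature=1.0):
    # student_scores / teacher_scores: [batch, n_candidates] similarity scores
    # for each query against its candidate passages. softening both and matching
    # them with KL divergence teaches the student the teacher's ranking
    # preferences instead of just hard positive/negative labels.
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```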
but all that is just on *single-vector* training data. They start with traditional embeddings and then shift to multi-vector.
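for context, the multi-vector part means ColBERT-style late interaction: each query token picks its best-matching document token and the per-token maxima get summed. a rough sketch of that scoring (my own illustration, not their code):

```python
import torch

def maxsim_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    # query_embs: [n_query_tokens, dim], doc_embs: [n_doc_tokens, dim],
    # both assumed L2-normalized. every query token takes its max similarity
    # over all document tokens, and those maxima are summed into one score.
    sim = query_embs @ doc_embs.T          # [n_query_tokens, n_doc_tokens]
    return sim.max(dim=-1).values.sum()
```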
whoah, Muon works for tiny models?! i thought it was only for training huge models
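for reference, the core of Muon: keep a momentum buffer per weight matrix, approximately orthogonalize it with a few Newton-Schulz iterations, and use that as the update. a heavily simplified sketch of the idea (coefficients follow the public reference implementation loosely; the real optimizer adds scaling, mixed precision, and special handling for non-2D params):

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # approximately orthogonalize a 2D matrix with a quintic Newton-Schulz iteration
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * (A @ A)) @ x
    return x.T if transposed else x

def muon_update(param: torch.Tensor, grad: torch.Tensor, momentum_buf: torch.Tensor,
                lr: float = 0.02, beta: float = 0.95) -> None:
    # momentum on the raw gradient, then an orthogonalized step applied in place
    momentum_buf.mul_(beta).add_(grad)
    param.data.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```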