Bluesky Thread

this is nuts

a new 7B llama-style LLM for genome embedding & pathogen detection in wastewater

i’ve had a hunch that LLMs could lead to some big bio breakthroughs, since it feels like genes & proteins are a lot like a language
Ollie Liu @oliu-io.bsky.social
📈METAGENE-1 achieves state-of-the-art results in:
- Pathogen detection
- Genomic embedding benchmarks
- Generalization to multi-species tasks
It already shows promise in public health and biosurveillance, and we are collaborating with experts to unlock its full impact.
🧵5/
7 hours later
does this mean we can chat with DNA/RNA?

no, and that’s kind of a flaw in multimodal models. to train one, they’d need DNA -> text annotation pairs, but the whole problem being solved here is that our languages don’t describe DNA well
so to use this, you’ll already need high-throughput sequencing infra set up, and then you’ll do things like use the DNA embeddings to compare traits of the strands, and generally get more of an understanding of how it works via contrastive methods
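a rough sketch of the "compare strands via embeddings" idea, assuming you already have one vector per token from some genomic embedding model (the vectors below are made-up toy values, not real METAGENE-1 output). it mean-pools token vectors into one embedding per read, then uses cosine similarity to find the closest labeled reference. this only shows the comparison step, not contrastive training itself, and every name here (`mean_pool`, `nearest_reference`, the labels) is hypothetical.

```python
import math

# Hypothetical toy data: real embeddings would come from a genomic model's
# hidden states and have thousands of dimensions, not 3.

def mean_pool(token_vectors):
    """Average per-token vectors into a single fixed-size read embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_reference(query, references):
    """Return the label whose reference embedding is most similar to the query."""
    return max(references, key=lambda label: cosine_similarity(query, references[label]))

# Hypothetical labeled reference embeddings.
references = {
    "known_pathogen": [0.9, 0.1, 0.0],
    "gut_commensal": [0.0, 0.2, 0.9],
}

# One sequenced read, represented as two toy token vectors.
query = mean_pool([[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]])

print(nearest_reference(query, references))  # prints "known_pathogen"
```

in a real pipeline you'd swap the toy dicts for model outputs and likely use a vector index instead of a linear scan, but the core comparison is the same.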
1 hour later
my dream: doctor’s office microbiome sequencing

the doc doesn’t just tell you that you have “a cold” or “a virus” and shuffle you off. they give you a probiotic or tell you how to eat differently, because they actually understand what’s going on in your body, unlike today
i think this model brings us close to that

sequencing tech is kinda expensive but available and fast enough

there are a few gaps beyond that, but we’re getting there