Bluesky Thread

Gemini 2.5 tech report is out!

the tech report goes into great detail on the training of gemini, i’ll do my take later when i have time

they also announced gemini-2.5-flash-lite, which they note is not the same as gemini-2.5-torch

blog.google/products/gem...
blog.google
We’re expanding our Gemini 2.5 family of models
Gemini 2.5 Flash and Pro are now generally available, and we’re introducing 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet.
6 hours later
overall impression: The infrastructure

one big reason you should pay attention to Google: TPUv5 delivers 2x the compute per watt of v4

but also, they tweaked some algorithms to get rid of the I/O bottleneck, so their training run was 93% efficient (incredible!)
these TPUs are worth paying attention to because the chip design is done mostly by AI

each successive generation of chips trains stronger models, which in turn help design dramatically more capable hardware

hard to overstate why that’s important

research.google/blog/chip-de...
research.google
Chip Design with Deep Reinforcement Learning
Posted by Anna Goldie, Senior Software Engineer and Azalia Mirhoseini, Senior Research Scientist, Google Research, Brain Team Update, June 9, 202...
k-sparse logits: a multi-pronged optimization

the gist: when they save distillation data, they keep only sparse logits, so there’s a lot less of it

that means the distillation data is small enough that they’re no longer I/O bound: transferring the data over the network is now faster than the training compute
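
a rough sketch of the sparse logits idea, to make the storage saving concrete. this is my own toy version, not the report's code: the function names, the choice of k, and renormalizing the softmax over only the kept entries are all assumptions.

```python
import math

def top_k_sparse(logits, k):
    """Keep the k largest teacher logits as (index, value) pairs; drop the rest."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [(i, logits[i]) for i in idx]

def sparse_soft_targets(sparse):
    """Softmax over only the stored entries (assumption: renormalize over the top k)."""
    m = max(v for _, v in sparse)
    exps = [(i, math.exp(v - m)) for i, v in sparse]
    z = sum(e for _, e in exps)
    return {i: e / z for i, e in exps}

def distill_loss(student_logits, targets):
    """Cross-entropy of the student against the teacher's sparse soft targets."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in student_logits))
    return -sum(p * (student_logits[i] - log_z) for i, p in targets.items())

# toy vocab of 4 tokens, keep the top 2 teacher logits
teacher = [0.1, 2.0, -1.0, 3.0]
stored = top_k_sparse(teacher, k=2)        # this is all that gets written to disk
targets = sparse_soft_targets(stored)
loss = distill_loss([0.0, 1.0, 0.0, 2.0], targets)
```

per token you store k index/value pairs instead of one float for every vocab entry, which is where the I/O saving comes from.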
hierarchical checkpoints

this is cool — pro, flash & lite are kinda the same model but with some experts removed

- attention blocks: identical
- experts: remove like 50% + short distill

they don’t re-do pre & post training, they just do light distilling to soothe the shock of removing experts
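
here's a toy sketch of what "same attention, fewer experts" could look like. it's purely illustrative: the checkpoint layout, the `prune_experts` name, and picking experts by routing counts are my assumptions, the thread doesn't say how the experts to drop are chosen.

```python
def prune_experts(checkpoint, usage_counts, keep_fraction=0.5):
    """Build a smaller checkpoint: attention is shared, only the most-used experts stay.

    Assumption: experts are ranked by how often the router picked them.
    """
    n_keep = max(1, int(len(checkpoint["experts"]) * keep_fraction))
    ranked = sorted(usage_counts, key=usage_counts.get, reverse=True)
    kept = sorted(ranked[:n_keep])
    return {
        "attention": checkpoint["attention"],  # reused unchanged
        "experts": {name: checkpoint["experts"][name] for name in kept},
    }

# hypothetical 4-expert layer with made-up routing counts
big = {
    "attention": {"w": [1, 2]},
    "experts": {"e0": [0], "e1": [1], "e2": [2], "e3": [3]},
}
usage = {"e0": 10, "e1": 500, "e2": 30, "e3": 200}
small = prune_experts(big, usage)  # keeps e1 and e3
```

the short distill would then run on `small` to recover whatever quality was lost, that step isn't shown here.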
flash-lite (i’m calling it torch) is just an 8-bit quant of flash. less memory, fewer transistors required for compute = lighter model
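
for anyone who hasn't seen 8-bit quantization, a minimal sketch. this is plain symmetric per-tensor int8, which is my assumption for illustration; the thread doesn't say which 8-bit scheme is actually used.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 if all zeros
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)    # each weight now fits in one byte
approx = dequantize(q, scale)        # close to the originals, within scale / 2
```

each weight takes 1 byte instead of 2 or 4, at the cost of a rounding error of at most half the scale per weight.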