huge 1T+ models are fascinating bc they’re like tree rings. they take so long to train that several evolutions of LLM architecture happen during the process
in this case, DeepSeek in January was unignorable, but Behemoth was likely too deep into training to change course. hence scout & maverick
part of the calculation: if they pause training on behemoth to do scout & maverick, maybe some new post-training innovations will land while behemoth waits