Bluesky Thread

huge 1T+ models are fascinating bc they’re like tree rings. they take so long to train that several evolutions of LLM architecture happen during the process

in this case, DeepSeek in January was unignorable, but Behemoth was likely too deep into training to change course. hence scout & maverick
xjdr (@_xjdr)
Hmmm …
Maverick is DeepSeek shaped
Behemoth is GPT4 / Opus shaped

I wonder why?
part of the calculation — if they pause training on behemoth to do scout & maverick, maybe some new post-training innovations will land while behemoth waits