it’s true, the last two Qwens have been beautiful works of art, until you talk to them. gpt-oss same, even GPT-5 to some extent
specifically this is referring to model architecture. cool new optimizers (K2’s) don’t seem to be what causes it
my hunch is there’s something shared between overly math-RL’d models and overly sparse models, because too much math-only RL produces the same kind of degradation
i bet RL on a narrow set of tasks shuts down a lot of pathways/circuits, and sparse MoE is the same but with those pathways physically removed
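to make the “physically removed” framing concrete, here’s a minimal sketch of top-k MoE routing (not any particular model’s actual code; layer sizes and routing details are illustrative assumptions). the point: for each token, experts outside the top-k get exactly zero weight, so those circuits never execute on that token at all

```python
# Toy top-k sparse MoE layer: per token, only k of n_experts ever run.
# Hypothetical sizes; real models use far more experts and learned balancing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                     # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # keep only k experts/token
        weights = F.softmax(weights, dim=-1)        # renormalize over survivors
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out  # non-selected experts contributed nothing: pathway removed

x = torch.randn(16, 64)
print(SparseMoE()(x).shape)  # torch.Size([16, 64])
```

the hard zero from top-k is the contrast with narrow RL: RL can only down-weight pathways in a dense net, while sparse routing skips them outright per token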