Bluesky Thread

Sparse Circuits

View original thread
Sparse Circuits

a new mech interp paper from OpenAI proposes a way to train models so that they’re natively easier to understand

openai.com/index/unders...
openai.com
Understanding neural networks through sparse circuits
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
23 3
it’s exactly what it sounds like. a much larger model is trained such that its interconnections can be disentangled and easier to trace and observe
But what if we trained untangled neural networks, with many more neurons, but where each neuron has only a few dozen connections? Then maybe the resulting network will be simpler, and easier to understand. This is the central research bet of our work.
4
23 likes 3 reposts

More like this

×