astonishing: using fp16 instead of bf16 results in more stable training runs as well as a smaller performance gap between training & inference
this is critical for RL, which is mostly inference and very sensitive to whether training and inference produce the same results
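for concreteness, a minimal sketch (Python/PyTorch, with a placeholder model and loss) of what the switch looks like in a training step; the main practical difference assumed here is that fp16 autocast usually needs loss scaling via GradScaler, while bf16 does not:

import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # loss scaler, used only on the fp16 path

def train_step(batch, use_fp16=True):
    # pick the autocast dtype: fp16 (IEEE half) or bf16 (brain float)
    dtype = torch.float16 if use_fp16 else torch.bfloat16
    with torch.autocast(device_type="cuda", dtype=dtype):
        loss = model(batch).pow(2).mean()  # stand-in loss
    if use_fp16:
        scaler.scale(loss).backward()      # scale loss to avoid fp16 gradient underflow
        scaler.step(optimizer)
        scaler.update()
    else:
        loss.backward()                    # bf16 has fp32-like range, no scaler needed
        optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.detach()

# e.g. train_step(torch.randn(32, 1024, device="cuda"), use_fp16=True)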
fyi
bf16: 16-bit brain float, originated at Google Brain. Newer than fp16 but widely implemented in hardware. Very popular.
fp16: the 16-bit IEEE half-precision standard; compared to bf16 it has a smaller dynamic range (5 exponent bits vs 8) and dedicates more bits to precision (10 mantissa bits vs 7)
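a quick way to see the trade-off (assuming PyTorch is available) is to compare torch.finfo for the two dtypes:

import torch

for dtype in (torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    # max = largest representable value, tiny = smallest normal value, eps = precision near 1.0
    print(f"{dtype}: max={info.max:.3e}, smallest normal={info.tiny:.3e}, eps={info.eps:.3e}")

# roughly: float16 tops out near 6.55e4 with eps ~9.8e-4 (narrow range, finer precision),
# while bfloat16 reaches ~3.39e38 with eps ~7.8e-3 (fp32-like range, coarser precision)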
in case it’s still not clear bsky.app/profile/dori...
ml halloween costume concept