Bluesky Thread

October 31, 2025

astonishing: using fp16 instead of bf16 results in more stable training runs as well as a smaller performance gap between training & inference

this is critical for RL, which is mostly inference and very sensitive to reproducibility

A 3×4 grid of line charts comparing BF16 (blue) and FP16 (green) across various training setups.
Each subplot shows Training Steps (x-axis) vs. a performance metric (y-axis, 0.0–1.0).

Top row:
(a) Sanity GRPO – FP16 consistently outperforms BF16; smooth upward trend to ~0.95.
(b) Sanity GRPO-Token-TIS – similar behavior; FP16 steadier and higher.
(c) Sanity GRPO-Seq-MIS – FP16 reaches ~0.95; BF16 lags below 0.85.
(d) Sanity GSPO – FP16 again higher and smoother; BF16 converges slower.

Middle row:
(e) Sanity PG-Seq-IS and (f) Sanity PG-Seq-MIS – both show FP16 > BF16; FP16 reaches near 1.0.
(g) OctoThinker GRPO – FP16 stable near 1.0; BF16 spikes early, then collapses.
(h) Lora GRPO-Token-TIS – FP16 stable around 0.8; BF16 fluctuates sharply and drops after ~800 steps.

Bottom row:
(i) MoE GRPO-Seq-MIS, (j) MoE GRPO-Token-TIS, and (k) MoE PG-Seq-TIS – both precisions close; FP16 slightly faster early on.
(l) Dense-14B DAPO – both curves rise smoothly; FP16 maintains a small advantage.

In nearly all cases, FP16 (green) trains more stably and converges to higher performance than BF16 (blue).

fyi

bf16: 16-bit brain float, i.e. Google Brain. Came after fp16 but is widely implemented in hardware. Very popular.

fp16: 16-bit IEEE 754 half precision. Has a smaller dynamic range than bf16 (5 exponent bits vs. 8) and dedicates more bits to precision (10 mantissa bits vs. 7)
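
To make the range-vs-precision trade-off concrete, here is a minimal sketch that rounds a Python float to each format using only the standard library. The helper names `round_fp16` and `round_bf16` are made up for illustration; bf16 is emulated by rounding a float32 bit pattern to its top 16 bits, and NaN inputs are not handled.

```python
import math
import struct


def round_fp16(x: float) -> float:
    """Round x to the nearest fp16 value (1 sign, 5 exponent, 10 mantissa bits)."""
    try:
        # struct's "e" format is IEEE 754 half precision.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Python raises instead of saturating; IEEE rounding would give infinity.
        return math.copysign(math.inf, x)


def round_bf16(x: float) -> float:
    """Round x to the nearest bf16 value (1 sign, 8 exponent, 7 mantissa bits).

    Assumption: bf16 is treated as the top 16 bits of a float32, rounded
    to nearest with ties to even. NaN is out of scope for this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Precision: near 1.0, fp16 steps by 2**-10 while bf16 steps by 2**-7.
print(round_fp16(1.001))    # 1.0009765625
print(round_bf16(1.001))    # 1.0

# Range: fp16 tops out at 65504, bf16 reaches roughly float32 range.
print(round_fp16(70000.0))  # inf
print(round_bf16(70000.0))  # 70144.0
```

So a small change such as 1.0 to 1.001 survives in fp16 but vanishes in bf16, while a large value such as 70000 overflows fp16 but fits (coarsely) in bf16.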

link arxiv.org/abs/2510.26788

Defeating the Training-Inference Mismatch via FP16

Reinforcement learning (RL) fine-tuning of large language models (LLMs) often suffers from instability due to the numerical mismatch between the training and inference policies. While prior work has a...

3 hours later

in case it’s still not clear bsky.app/profile/dori...

Alexander Doria @dorialexander.bsky.social

ml halloween costume concept
