feat(graph): bf16/fp16 Parameter.AddGradient (unblock bf16 autograd training) #153

Merged: dndungu merged 1 commit into main from feat/bf16-addgradient on Jun 16, 2026

dndungu commented Jun 16, 2026


Summary

The host gradient-accumulation type switches in Parameter.AddGradient/ClearGradient covered float32, float64, and the integer types but had no reduced-precision float case. As a result, bf16 (or fp16) autograd training failed at the first backward pass with the error: AddGradient unsupported for this numeric type; use engine ops instead. This is the blocker gating bf16 CrossAsset training, whose layer backward passes (LayerNorm, Linear, Bias) accumulate gradients through this path.

Fix

Add float16.BFloat16 and float16.Float16 cases: accumulate through f32 and round on store (matching how every bf16 op publishes its result). bf16 shares f32's exponent range, so no overflow vs f32.
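
To illustrate "accumulate through f32 and round on store", here is a minimal, self-contained sketch. It is not the code from this PR: the `bf16` type, `fromF32`, `toF32`, and `addGradBF16` are hypothetical stand-ins written for this example (the PR itself uses the `float16.BFloat16` and `float16.Float16` types), and the round-to-nearest-even narrowing is an assumption about the rounding mode.

```go
package main

import (
	"fmt"
	"math"
)

// bf16 is a hypothetical stand-in for a bfloat16 type:
// the upper 16 bits of an IEEE 754 float32.
type bf16 uint16

// toF32 widens a bf16 value to float32. Widening is exact.
func (b bf16) toF32() float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// fromF32 narrows a float32 to bf16 using round-to-nearest-even
// (assumed rounding mode for this sketch).
func fromF32(f float32) bf16 {
	bits := math.Float32bits(f)
	if f != f {
		// NaN: truncate and force a mantissa bit so the result stays NaN.
		return bf16(bits>>16) | 0x0040
	}
	// Adding 0x7FFF plus the lowest kept bit makes ties round to even.
	bits += 0x7FFF + ((bits >> 16) & 1)
	return bf16(bits >> 16)
}

// addGradBF16 adds src into dst element-wise. Each sum is computed in
// float32 and rounded to bf16 once, when it is stored back.
func addGradBF16(dst, src []bf16) error {
	if len(dst) != len(src) {
		return fmt.Errorf("length mismatch: %d vs %d", len(dst), len(src))
	}
	for i := range dst {
		sum := dst[i].toF32() + src[i].toF32() // accumulate in f32
		dst[i] = fromF32(sum)                  // round on store
	}
	return nil
}

func main() {
	grad := []bf16{fromF32(0.5), fromF32(1.0)}
	update := []bf16{fromF32(0.25), fromF32(0.5)}
	if err := addGradBF16(grad, update); err != nil {
		panic(err)
	}
	fmt.Println(grad[0].toF32(), grad[1].toF32())
}
```

An fp16 case would follow the same widen, add, narrow pattern with a different narrowing routine.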

Tests

  • TestParameter_AddGradient_BFloat16 / _Float16 (round-trip via the shared helper)
  • TestParameter_AddGradient_BFloat16_Value — asserts two accumulations sum exactly (0.5+0.25→0.75, 1.0+0.5→1.5, both bf16-exact).
  • f32/f64/int CPU paths byte-identical (untouched).
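
The "bf16-exact" claim in the value test can be checked independently of the framework. The sketch below assumes only that bfloat16 keeps the top 16 bits of a float32, so a finite value is representable without rounding when the low 16 bits of its float32 encoding are zero. `isBF16Exact` is a hypothetical helper written for this example, not part of the PR.

```go
package main

import (
	"fmt"
	"math"
)

// isBF16Exact reports whether a finite float32 is representable in
// bfloat16 without rounding, i.e. the low 16 bits of its encoding are zero.
func isBF16Exact(f float32) bool {
	return math.Float32bits(f)&0xFFFF == 0
}

func main() {
	// Operands and sums from the value test, plus 0.1 as a counterexample.
	for _, v := range []float32{0.5, 0.25, 0.75, 1.0, 1.5, 0.1} {
		fmt.Printf("%v bf16-exact: %t\n", v, isBF16Exact(v))
	}
}
```

Because both the operands and the sums are exact, the test can compare with equality rather than a tolerance.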

General framework fix — any bf16/fp16 consumer benefits, nothing Wolf-specific. Follow-up to v1.13.0 (native bf16 GPU kernels).

The host gradient-accumulation type switch in AddGradient/ClearGradient
had float32/float64/ints but no reduced-precision float case, so any
bf16 (or fp16) autograd training failed at the first backward with
'AddGradient unsupported for this numeric type'. Add float16.BFloat16 and
float16.Float16 cases: accumulate through f32 and round on store (matching
how every bf16 op publishes its result). bf16 shares f32's exponent range,
so no overflow vs f32. Unblocks bf16 CrossAsset training (its layer
backwards accumulate grads here). General framework fix -- any bf16/fp16
consumer benefits; CPU f32/f64/int paths byte-identical.
@dndungu dndungu merged commit cfa1b45 into main Jun 16, 2026
1 check passed