BF16, or bfloat16, is a 16-bit floating-point format developed by Google Brain. It uses the same 8-bit exponent as FP32 (giving it the same numerical range) but reduces the mantissa to 7 bits. This makes it more robust for training large models where gradient ranges can span many orders of magnitude.
BF16 trades precision for dynamic range relative to FP16. Where FP16 can underflow on small gradient values and require loss scaling tricks, BF16 typically does not. Most modern datacenter GPUs (H100, A100, MI300X, B200) support BF16 at the same throughput as FP16.
For LLM training in particular, BF16 has become a de facto standard, especially for models above ~10B parameters. AIMC tracks the FP16 throughput rating for each GPU; BF16 throughput is typically equivalent on supported hardware.