Mixed precision is a training technique where most operations (forward pass, backward pass, weight matrix multiplications) run at FP16 or BF16 for speed and memory savings, while critical operations (loss scaling, weight updates, certain normalization layers) remain in FP32 for numerical stability.
The technique was popularized by NVIDIA's Apex/AMP and PyTorch's torch.cuda.amp. It typically delivers 2-3x training speedup over pure FP32 with minimal accuracy loss when implemented carefully. BF16-based mixed precision avoids the loss-scaling complexity that FP16 requires due to its wider exponent range.
Modern training frameworks (PyTorch, JAX, MLX) handle mixed precision largely automatically. AIMC's fit-score assumes mixed-precision training as the default workload profile for LLM training; pure FP32 training is rare in 2026.