Workload Concept

Mixed Precision

Training technique using FP16/BF16 for most operations while keeping critical paths in FP32.

Mixed precision is a training technique where most operations (forward pass, backward pass, weight matrix multiplications) run at FP16 or BF16 for speed and memory savings, while critical operations (loss scaling, weight updates, certain normalization layers) remain in FP32 for numerical stability.

The technique was popularized by NVIDIA's Apex/AMP and PyTorch's torch.cuda.amp. It typically delivers 2-3x training speedup over pure FP32 with minimal accuracy loss when implemented carefully. BF16-based mixed precision avoids the loss-scaling complexity that FP16 requires due to its wider exponent range.

Modern training frameworks (PyTorch, JAX, MLX) handle mixed precision largely automatically. AIMC's fit-score assumes mixed-precision training as the default workload profile for LLM training; pure FP32 training is rare in 2026.

Related Terms

Concepts directly relevant to Mixed Precision.

FP16

16-bit half-precision floating-point format used heavily in deep learning.

BF16 (bfloat16)

Alternative 16-bit floating-point format with wider exponent range than FP16.

LLM Training

Compute-intensive process of training large language models from scratch or via continued pre-training.

Workloads Where Mixed Precision Matters

GPU fit analysis for the workloads this concept directly influences.

Llm Training

Ranked GPUs →

This definition is part of AIMC's reference glossary — 36 concepts across 10 categories.

Browse full glossary