Workload Concept

LLM Training

Compute-intensive process of training large language models from scratch or via continued pre-training.

LLM training is the process of training a large language model (typically a transformer with billions of parameters) on large text corpora. Frontier-model training runs on clusters of thousands to tens of thousands of top-tier GPUs for weeks or months, adding up to millions of GPU-hours; even full fine-tuning of a 70B-parameter model usually requires multi-GPU systems with high-bandwidth interconnects.
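A rough way to estimate the GPU-hours a training run needs is the widely used 6 * N * D rule of thumb (N parameters, D training tokens), divided by the throughput a GPU actually sustains. The sketch below is illustrative only: the function names, the 1,000 TFLOPS peak, and the 40% model FLOPs utilization (MFU) are assumptions, not figures from any specific GPU or run.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs with the 6 * N * D rule of thumb:
    about 2 * N FLOPs per token for the forward pass and 4 * N for the backward pass."""
    return 6.0 * n_params * n_tokens


def gpu_hours(total_flops: float, peak_tflops: float, mfu: float) -> float:
    """Convert a FLOP budget into GPU-hours.

    peak_tflops: assumed dense peak throughput of one GPU at the training precision, in TFLOPS.
    mfu: assumed model FLOPs utilization, the fraction of that peak the run actually sustains.
    """
    sustained_flops_per_second = peak_tflops * 1e12 * mfu
    return total_flops / sustained_flops_per_second / 3600.0


if __name__ == "__main__":
    # Hypothetical run: 70B parameters, 2T training tokens,
    # GPUs with an assumed 1,000 TFLOPS dense peak running at an assumed 40% MFU.
    flops = training_flops(70e9, 2e12)
    hours = gpu_hours(flops, peak_tflops=1000.0, mfu=0.4)
    print(f"{flops:.2e} FLOPs, about {hours:,.0f} GPU-hours")
```

With these assumed inputs the estimate comes to roughly 8.4e23 FLOPs and about 583,000 GPU-hours, which is why even mid-sized runs are spread across many GPUs rather than run on one.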

The compute profile is dominated by matrix multiplications at FP16 or BF16 precision. The backward pass costs roughly twice the forward pass in FLOPs, so a full training step is about 3x a forward pass, and memory must hold gradients, optimizer states, and saved activations on top of the weights. Datacenter-class GPUs (H100, H200, B200, A100, MI300X) are the typical hardware tier; consumer cards rarely have enough VRAM, and most lack a high-bandwidth GPU-to-GPU interconnect such as NVLink, which multi-GPU training at scale relies on.
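One common way to size training memory is the mixed-precision Adam breakdown popularized by the ZeRO paper: about 16 bytes per parameter for weights, gradients, and optimizer states. The sketch below turns that into a lower bound on GPU count. The helper names are hypothetical, and it deliberately leaves out activations (which grow with batch size and sequence length) and framework overhead, so real requirements are higher.

```python
import math

# Bytes per parameter for mixed-precision training with Adam:
# 2 (FP16/BF16 weights) + 2 (FP16/BF16 gradients) + 4 (FP32 master weights)
# + 4 (Adam first moment) + 4 (Adam second moment) = 16 bytes.
BYTES_PER_PARAM_MIXED_ADAM = 2 + 2 + 4 + 4 + 4


def training_state_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> float:
    """Memory for weights, gradients, and optimizer states in GB (1e9 bytes).
    Activations, temporary buffers, and fragmentation are NOT included."""
    return n_params * bytes_per_param / 1e9


def min_gpus(n_params: float, vram_gb: float) -> int:
    """Lower bound on GPU count, assuming the training state is sharded evenly
    across GPUs (ZeRO-3 / FSDP style) and nothing else occupies VRAM."""
    return math.ceil(training_state_gb(n_params) / vram_gb)


if __name__ == "__main__":
    for params in (7e9, 70e9):
        print(f"{params / 1e9:.0f}B params: {training_state_gb(params):.0f} GB of state, "
              f">= {min_gpus(params, 80)} x 80 GB GPUs, >= {min_gpus(params, 24)} x 24 GB GPUs")
```

Under this estimate a 70B model needs about 1,120 GB just for training state, at least 14 GPUs with 80 GB each before counting activations, which matches the point that full fine-tuning at that size is a multi-GPU job.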

AIMC tracks LLM training as one of 10 workloads, with a minimum of 24 GB of VRAM and a threshold of 500 FP16 TFLOPS. The /for/llm-training hub ranks every viable GPU in the index by fit score.
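The viability check described above can be sketched as a simple threshold filter. This is a minimal illustration under assumptions: both limits are treated as inclusive, each GPU is reduced to one VRAM figure and one FP16 TFLOPS figure, and `GpuSpec`, `is_viable`, `viable_gpus`, and the example GPUs are hypothetical. AIMC's fit-score ranking is not reproduced because its formula isn't given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    name: str
    vram_gb: float
    fp16_tflops: float


# Thresholds for the LLM-training workload as stated in the text.
MIN_VRAM_GB = 24.0
MIN_FP16_TFLOPS = 500.0


def is_viable(gpu: GpuSpec) -> bool:
    """Assumes both thresholds are inclusive minimums that must both be met."""
    return gpu.vram_gb >= MIN_VRAM_GB and gpu.fp16_tflops >= MIN_FP16_TFLOPS


def viable_gpus(index: list[GpuSpec]) -> list[GpuSpec]:
    """Keep only GPUs that pass both thresholds, preserving index order."""
    return [gpu for gpu in index if is_viable(gpu)]


if __name__ == "__main__":
    # Hypothetical entries, not real product specs.
    index = [
        GpuSpec("gpu-a", vram_gb=80, fp16_tflops=900),
        GpuSpec("gpu-b", vram_gb=24, fp16_tflops=400),
        GpuSpec("gpu-c", vram_gb=16, fp16_tflops=800),
    ]
    print([gpu.name for gpu in viable_gpus(index)])
```

Here only `gpu-a` passes: `gpu-b` meets the VRAM minimum but not the TFLOPS threshold, and `gpu-c` has the throughput but too little memory.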