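LoRA (Low-Rank Adaptation) is a fine-tuning technique that keeps a pre-trained model's weights frozen and learns a low-rank update for selected weight matrices. For a frozen weight matrix W of shape d_out × d_in, LoRA trains two small matrices, B (d_out × r) and A (r × d_in), where the rank r is much smaller than either dimension, and adds their product BA to W in the forward pass. Instead of updating billions of base parameters, training touches only A and B, typically 0.1% to 1% of the original parameter count.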
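The low-rank update can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: the helper names (`matmul`, `lora_forward`, `lora_param_fraction`) and the tiny dimensions are made up for the example, and the `alpha / r` scaling and zero initialization of B follow the usual LoRA convention.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def lora_forward(x, W, A, B, alpha):
    """Return x W^T + (alpha / r) * x A^T B^T for a batch of row vectors x.

    W is the frozen d_out x d_in base weight; A (r x d_in) and B (d_out x r)
    are the only trainable matrices.
    """
    r = len(A)
    scale = alpha / r
    base = matmul(x, transpose(W))                        # frozen path
    low = matmul(matmul(x, transpose(A)), transpose(B))   # low-rank path
    return [[b + scale * l for b, l in zip(b_row, l_row)]
            for b_row, l_row in zip(base, low)]

def lora_param_fraction(d_in, d_out, r):
    """Trainable LoRA parameters as a fraction of the frozen matrix's size."""
    return r * (d_in + d_out) / (d_in * d_out)

random.seed(0)
d_in, d_out, r = 8, 6, 2  # toy sizes, chosen only for illustration
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the update starts at zero
x = [[random.gauss(0, 1) for _ in range(d_in)]]

out = lora_forward(x, W, A, B, alpha=16)
print(out == matmul(x, transpose(W)))        # True: with B = 0 the model is unchanged
print(lora_param_fraction(4096, 4096, 16))   # 0.0078125, i.e. about 0.78%
```

Because B starts at zero, the adapted layer initially reproduces the base model exactly, and training only moves the output away from it through A and B. For a 4096 × 4096 projection at rank 16, the adapter holds under 1% as many parameters as the matrix it modifies.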
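The technique drastically reduces VRAM requirements. Full fine-tuning of a 70B-parameter model needs several hundred GB to over a terabyte of VRAM (model weights, gradients, optimizer state, and activations), because gradients and optimizer state are kept for every parameter. With LoRA, gradients and optimizer state exist only for the adapter, so the frozen base weights dominate the footprint: about 140 GB for a 70B model in 16-bit precision, or roughly 35 GB when the base is quantized to 4 bits (the QLoRA variant). It is the quantized setup that lets LoRA fine-tuning of the same model fit on a single 80 GB datacenter card.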
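The memory comparison can be checked with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not a profiler: the byte counts per parameter (16-bit weights and gradients, 8 bytes of Adam state, 16 bytes per trainable adapter parameter) and the 0.5% trainable fraction are assumed values, the function names are hypothetical, and activations are left out entirely.

```python
GB = 1e9

def full_finetune_gb(n_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=8):
    """Weights + gradients + Adam state for every parameter (activations excluded)."""
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes) / GB

def lora_finetune_gb(n_params, trainable_fraction, base_weight_bytes,
                     adapter_bytes_per_param=16):
    """Frozen base weights plus full training state for the adapter only.

    adapter_bytes_per_param=16 assumes a 16-bit weight, a 16-bit gradient,
    a 32-bit master copy, and two 32-bit Adam moments per adapter parameter.
    """
    base = n_params * base_weight_bytes
    adapter = n_params * trainable_fraction * adapter_bytes_per_param
    return (base + adapter) / GB

n = 70e9  # 70B parameters

full = full_finetune_gb(n)                                   # roughly 840 GB
lora_16bit = lora_finetune_gb(n, 0.005, base_weight_bytes=2)  # roughly 146 GB
lora_4bit = lora_finetune_gb(n, 0.005, base_weight_bytes=0.5) # roughly 41 GB

for label, gb in [("full fine-tuning", full),
                  ("LoRA, 16-bit base", lora_16bit),
                  ("LoRA, 4-bit base", lora_4bit)]:
    print(f"{label:>18}: {gb:7.1f} GB before activations")
```

Under these assumptions only the 4-bit configuration leaves headroom for activations on an 80 GB card; the 16-bit base alone already exceeds it.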
LoRA adapters are typically tens to a few hundred megabytes, depending on rank and model size, which is small enough to swap dynamically at inference time. This enables multi-tenant serving, where one base model serves many specialized variants. LoRA has been among the most widely used approaches to fine-tuning large language models since 2023, with libraries such as PEFT, Axolotl, and Unsloth providing optimized implementations.
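To make the adapter sizes and the multi-tenant idea concrete, here is a small sketch. Everything in it is illustrative: `AdapterSpec` and `AdapterRegistry` are hypothetical names rather than any serving framework's API, the tenant names are invented, and the model shape (32 layers, four 4096 × 4096 attention projections per layer) is an assumed 7B-class configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdapterSpec:
    name: str
    rank: int
    n_layers: int
    target_shapes: tuple  # (d_in, d_out) of each adapted projection in a layer

    def n_params(self):
        # Each adapted projection adds A (r x d_in) and B (d_out x r).
        per_layer = sum(self.rank * (d_in + d_out)
                        for d_in, d_out in self.target_shapes)
        return self.n_layers * per_layer

    def size_mb(self, bytes_per_param=2):
        # 2 bytes per parameter assumes 16-bit storage.
        return self.n_params() * bytes_per_param / 1e6

class AdapterRegistry:
    """Maps tenants to adapters that all share one frozen base model."""

    def __init__(self):
        self._adapters = {}

    def register(self, tenant, spec):
        self._adapters[tenant] = spec

    def select(self, tenant):
        # Swapping variants is a lookup; the base weights are never reloaded.
        return self._adapters[tenant]

    def total_mb(self):
        return sum(spec.size_mb() for spec in self._adapters.values())

attention = ((4096, 4096),) * 4  # assumed q, k, v, o projections

registry = AdapterRegistry()
registry.register("support-bot", AdapterSpec("support-bot", 16, 32, attention))
registry.register("sql-helper", AdapterSpec("sql-helper", 8, 32, attention))

for tenant in ("support-bot", "sql-helper"):
    spec = registry.select(tenant)
    print(f"{tenant}: {spec.n_params():,} params, {spec.size_mb():.1f} MB")
print(f"all adapters: {registry.total_mb():.1f} MB on top of one shared base")
```

With these assumed shapes the two adapters come to about 34 MB and 17 MB, so dozens of variants add little on top of a base model that occupies many gigabytes.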