LoRA (Low-Rank Adaptation)

Parameter-efficient fine-tuning method that adds small trainable matrices to a frozen base model.

LoRA, or Low-Rank Adaptation, is a fine-tuning technique that adds small low-rank decomposition matrices to a pre-trained model while keeping the base weights frozen. Instead of updating billions of base parameters, LoRA trains only the rank-decomposition matrices — typically 0.1% to 1% of the original parameter count.

The technique drastically reduces VRAM requirements: full fine-tuning of a 70B-parameter model requires hundreds of GB of VRAM (model weights plus optimizer state plus gradients plus activations), while LoRA fine-tuning of the same model can fit in a single 80 GB datacenter card.

LoRA adapters are typically just tens of megabytes — small enough to swap dynamically at inference time, enabling multi-tenant serving where one base model serves many specialized variants. The technique has become the dominant approach for production fine-tuning since 2023, with libraries like PEFT, axolotl, and Unsloth providing optimized implementations.

Related Terms

Concepts directly relevant to LoRA (Low-Rank Adaptation).

QLoRA (Quantized LoRA)

LoRA fine-tuning combined with 4-bit quantization of the base model — dramatically reduces VRAM needs.

PEFT (Parameter-Efficient Fine-Tuning)

Umbrella term for techniques that fine-tune large models by updating only a small fraction of parameters.

LLM Training

Compute-intensive process of training large language models from scratch or via continued pre-training.

Workloads Where LoRA (Low-Rank Adaptation) Matters

GPU fit analysis for the workloads this concept directly influences.

This definition is part of AIMC's reference glossary — 36 concepts across 10 categories.

Browse full glossary