Parameter-Efficient Fine-Tuning (PEFT) is the family of techniques that adapt pre-trained models to specific tasks by training only a small number of parameters, either a subset of the existing weights or small added modules, while the bulk of the original weights stays frozen. The approach contrasts with full fine-tuning, which updates every parameter and therefore needs VRAM several times the size of the model weights, because gradients and optimizer state must be held for every trainable parameter.
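The memory gap can be illustrated with a rough back-of-the-envelope estimate. The sketch below is an approximation, not a measurement: it assumes 2 bytes per frozen weight (16-bit storage) and about 16 bytes per trainable parameter (weight, gradient, and Adam state in mixed precision), and it ignores activations and framework overhead. The 7B model size and the 0.5% trainable fraction are hypothetical values chosen for illustration.

```python
def training_memory_gb(n_params, trainable_fraction,
                       frozen_bytes=2, trainable_bytes=16):
    """Rough VRAM estimate in GB, ignoring activations and overhead.

    frozen_bytes and trainable_bytes are assumed per-parameter costs,
    not values taken from any particular framework.
    """
    trainable = n_params * trainable_fraction
    frozen = n_params - trainable
    # Frozen weights are only stored; trainable weights also carry
    # gradients and optimizer state.
    total_bytes = frozen * frozen_bytes + trainable * trainable_bytes
    return total_bytes / 1e9


n_params = 7e9  # hypothetical 7B-parameter model
full_gb = training_memory_gb(n_params, trainable_fraction=1.0)
peft_gb = training_memory_gb(n_params, trainable_fraction=0.005)

print(f"full fine-tuning:      ~{full_gb:.0f} GB")   # ~112 GB
print(f"PEFT (0.5% trainable): ~{peft_gb:.1f} GB")   # ~14.5 GB
```

Under these assumptions the frozen weights dominate the PEFT budget, which is why quantizing the base model reduces memory further.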
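PEFT methods include LoRA, QLoRA (LoRA applied on top of a quantized base model), prefix tuning, prompt tuning, adapter layers, and IA3. Of these, LoRA and QLoRA dominate production usage in 2026 because they offer the best accuracy-per-VRAM tradeoff and integrate cleanly with existing inference stacks: LoRA weights can be merged into the base weights after training, so they add no extra layers at inference time.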
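To give a sense of why LoRA is so economical, the following sketch counts the parameters it trains for a single weight matrix. LoRA learns two small matrices whose product is added to the frozen weight, so the trainable count grows with the rank rather than with the full matrix size. The 4096 x 4096 projection and rank 8 are hypothetical values picked for illustration.

```python
def lora_param_count(d_in, d_out, rank):
    # LoRA trains A (rank x d_in) and B (d_out x rank); their product
    # B @ A has the same shape as the frozen weight matrix.
    return rank * (d_in + d_out)


d_in = d_out = 4096  # hypothetical projection size
rank = 8             # hypothetical LoRA rank

full_matrix_params = d_in * d_out
lora_matrix_params = lora_param_count(d_in, d_out, rank)
ratio = lora_matrix_params / full_matrix_params

print(f"full matrix: {full_matrix_params:,}")  # 16,777,216
print(f"LoRA:        {lora_matrix_params:,}")  # 65,536
print(f"ratio:       {ratio:.2%}")             # 0.39%
```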
The Hugging Face PEFT library is the de facto standard implementation. It is built on PyTorch, integrates with Transformers, Diffusers, and Accelerate, and covers LoRA, prefix tuning, prompt tuning, and IA3, among other methods. Specialized frameworks such as axolotl and Unsloth add throughput optimizations, with reported training speedups of 2-5x over vanilla PEFT.