Fine-tuning is the process of further training a pre-trained language model on a smaller, domain-specific dataset to improve performance on targeted tasks. It sits between using a model off-the-shelf and training one from scratch.
The dominant fine-tuning approaches in 2026 are LoRA (Low-Rank Adaptation), QLoRA (quantized LoRA), and full-weight fine-tuning. LoRA freezes the base model and adds small trainable rank-decomposition matrices, drastically reducing VRAM requirements — a QLoRA fine-tune of Llama 3 8B fits comfortably in 24 GB, where full fine-tuning would need 80+ GB.
Workloads vary widely by approach: simple LoRA fine-tunes can run on workstation cards, while full fine-tuning of 70B+ models requires multi-GPU datacenter setups with NVLink. PEFT libraries from Hugging Face, axolotl, and Unsloth have substantially lowered the hardware bar for production fine-tuning over the past two years.