QLoRA is LoRA fine-tuning applied to a base model whose weights have been quantized to 4-bit precision, typically NF4 (4-bit NormalFloat). The combination sharply reduces VRAM consumption during fine-tuning while preserving most of the accuracy of LoRA on an unquantized base model.
The base model weights are stored in 4-bit format, kept frozen, and dequantized on the fly whenever a layer is computed. The trainable LoRA adapters remain in higher precision (FP16 or BF16), which keeps gradient computation well-conditioned. Memory savings are substantial: a 70B-parameter model fits in roughly 48 GB under QLoRA, whereas LoRA on a 16-bit base model needs about 140 GB for the weights alone (70B parameters at 2 bytes each), and full fine-tuning would require 600+ GB.
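The dequantize-then-add-adapter computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how any particular library implements it: uniform 4-bit absmax quantization stands in for NF4 (which uses a non-uniform 16-value codebook), the block size is kept tiny for readability, and the names `quantize_blockwise`, `qlora_forward`, `A`, `B`, `alpha`, and `r` are illustrative choices following common LoRA notation.

```python
# Simplified sketch: uniform 4-bit absmax quantization stands in for NF4.
BLOCK = 4  # hypothetical block size, kept tiny for readability


def quantize_blockwise(values, block=BLOCK):
    """Map floats to integer codes in [-7, 7] with one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        # Fall back to a scale of 1.0 when the whole block is zero.
        scale = max(abs(v) for v in chunk) / 7 or 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in chunk)
    return codes, scales


def dequantize_blockwise(codes, scales, block=BLOCK):
    """Recover approximate float weights from 4-bit codes and per-block scales."""
    return [c * scales[i // block] for i, c in enumerate(codes)]


def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


def qlora_forward(codes, scales, shape, A, B, x, alpha, r):
    """Compute y = dequant(W_q) x + (alpha / r) * B (A x).

    codes and scales hold the frozen quantized base weights; only A (r x in_dim)
    and B (out_dim x r) would receive gradients during training.
    """
    out_dim, in_dim = shape
    flat = dequantize_blockwise(codes, scales)  # dequantized on the fly
    W = [flat[i * in_dim:(i + 1) * in_dim] for i in range(out_dim)]
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + (alpha / r) * l for b, l in zip(base, low_rank)]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.1, -1.0, 0.3, 0.3, -0.7, 0.05]  # 2 x 4 matrix, flattened
    codes, scales = quantize_blockwise(weights)
    A = [[0.01, -0.02, 0.03, 0.0]]  # r = 1, in_dim = 4
    B = [[0.0], [0.0]]              # zero init: the adapter starts as a no-op
    print(qlora_forward(codes, scales, (2, 4), A, B, [1.0, 2.0, 3.0, 4.0], alpha=8, r=1))
```

With `B` initialized to zeros, as is conventional for LoRA, the output equals that of the dequantized base layer alone, so training starts from the quantized model's behavior and the adapters learn a correction on top of it.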
QLoRA made large-model fine-tuning broadly accessible by bringing it within reach of a single workstation card. A 7B-parameter QLoRA fine-tune fits comfortably on a 16 GB card, and a 13B-parameter one fits on 24 GB. It remains the dominant approach when VRAM is the binding constraint.
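A rough back-of-the-envelope calculation shows why these card sizes work. The sketch below counts only the memory taken by the base weights; the helper name `weight_memory_gb` is hypothetical, and adapter weights, gradients, optimizer state, activations, and quantization constants all add to the total, so the results are lower bounds rather than predictions of actual VRAM use.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the base weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    four_bit = weight_memory_gb(n_params, 4)
    sixteen_bit = weight_memory_gb(n_params, 16)
    print(f"{name}: 4-bit weights {four_bit:.1f} GB, 16-bit weights {sixteen_bit:.1f} GB")

# 7B: 4-bit weights 3.5 GB, 16-bit weights 14.0 GB
# 13B: 4-bit weights 6.5 GB, 16-bit weights 26.0 GB
```

At 4 bits the weights of a 7B model leave most of a 16 GB card free for activations and adapter training state, while at 16 bits the weights alone nearly fill it. For a 13B model, 16-bit weights already exceed 24 GB before any training state is counted.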