Full Fine-Tuning vs LoRA

LoRA (Low-Rank Adaptation) has become the default fine-tuning method for LLMs because it's 10-100x cheaper and produces portable adapters. Full fine-tuning still has a place — when you need to deeply change model behavior or when the base model is small enough to fine-tune cheaply. The choice is mostly about cost, quality ceiling, and deployment flexibility.
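The core trick can be sketched in a few lines: the frozen weight matrix W is augmented with a trainable low-rank product B·A, scaled by alpha/r, so only the two small factors are updated. A minimal pure-Python sketch (all names, shapes, and values here are illustrative assumptions, not any library's API):

```python
# Minimal LoRA sketch, pure Python and illustrative only; real
# implementations (e.g. the PEFT library) do this on GPU tensors.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_weight(w, a, b, alpha, r):
    """Effective weight: W + (alpha / r) * B @ A.

    W stays frozen; only the small factors A and B are trained.
    """
    scale = alpha / r
    delta = matmul(b, a)
    return [[wx + scale * dx for wx, dx in zip(wr, dr)]
            for wr, dr in zip(w, delta)]

# 4x4 base weight with a rank-1 adapter: 2 * 4 * 1 = 8 trainable
# values instead of 16.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[0.5, 0.5, 0.5, 0.5]]        # r x in, random-init in practice
B = [[0.0], [0.0], [0.0], [0.0]]  # out x r, zero-init, so the
                                  # adapter starts as a no-op
assert lora_weight(W, A, B, alpha=8, r=1) == W
```

Because B is zero-initialized, the adapter contributes nothing before training, which is why fine-tuning starts from exactly the base model's behavior.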

Side-by-side

Criterion | Full Fine-Tuning | LoRA
Parameters updated | All (billions) | ~0.1-1% (low-rank adapters)
GPU memory (70B model) | ~1.2 TB for training | ~140 GB (fits on 2× H100)
Training time | Days to weeks | Hours to days
Cost | $10,000s per run | $10s-$1,000s per run
Quality ceiling | Highest (if data is good) | Slightly lower in theory, close in practice
Deployment | Full model weights | Base + small adapter (hot-swappable)
Catastrophic forgetting risk | High if not managed | Low (base weights frozen)
Composable with other adapters | No | Yes (swap or merge multiple LoRAs)
Use-case fit | Deep domain adaptation, style rewrites at scale | Task-specific tuning, multi-tenant adapters
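The "~0.1-1%" row follows from simple arithmetic. A back-of-envelope sketch, assuming Llama-2-70B-like shapes (hidden size 8192, 80 layers, LoRA rank 16 on the four attention projections; these are assumptions, and grouped-query attention would shrink the k/v terms further):

```python
# Back-of-envelope for the "~0.1-1% of parameters" row.
HIDDEN = 8192
LAYERS = 80
RANK = 16          # a typical LoRA rank
TOTAL = 70e9       # base model parameter count

# LoRA on the four attention projections (q, k, v, o): each gets
# A (r x h) and B (h x r), i.e. 2 * h * r extra parameters.
per_layer = 4 * 2 * HIDDEN * RANK
trainable = per_layer * LAYERS
print(f"trainable: {trainable / 1e6:.0f}M "
      f"({100 * trainable / TOTAL:.2f}% of base)")
# -> trainable: 84M (0.12% of base)
```

Raising the rank or adapting the MLP projections as well pushes the fraction toward the upper end of the quoted range.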

Verdict

LoRA (and its variants like QLoRA, DoRA) is the default for ~95% of LLM fine-tuning work in 2026. It's cheap, fast, portable, and quality is close to full fine-tuning on most tasks. Full fine-tuning is worth the cost only when you have enough high-quality data to justify it and LoRA genuinely plateaus — in practice, that's rare. For small models (7B-13B) full FT can be reasonable; for 70B+ models, LoRA is almost always the right call.

When to choose each

Choose Full Fine-Tuning if…

  • You have massive, high-quality domain data (>100k examples).
  • You need to deeply change model behavior, not add a skill.
  • Your model is small enough (7B-13B) that full FT is affordable.
  • LoRA has been tried and quality plateaued.

Choose LoRA if…

  • You want to fine-tune without a dedicated ML platform team.
  • You need to ship multiple tenant/task-specific adapters from one base.
  • Your data set is small to medium (<50k examples).
  • You want to iterate fast and keep the base model unchanged.

Frequently asked questions

How much worse is LoRA than full fine-tuning?

On most tasks, indistinguishable. Published studies show LoRA reaching 95-99% of full-FT quality with roughly 1% of the compute. The gap widens when the training set is very large and the task is far from the pretraining distribution.

What's QLoRA?

LoRA applied on top of a quantized (typically 4-bit) base model. The QLoRA paper demonstrated fine-tuning a 65B-parameter model on a single 48 GB GPU, and the technique is now the default for cost-conscious fine-tuning.
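The memory headroom is straightforward arithmetic. A rough sketch (ignoring activations, KV cache, and optimizer overhead, which QLoRA mitigates with paged optimizers; numbers are estimates, not measurements):

```python
# Rough memory math behind QLoRA for a 70B-class model.
PARAMS = 70e9

base_4bit_gb = PARAMS * 0.5 / 1e9   # 4 bits = 0.5 bytes per param
base_16bit_gb = PARAMS * 2 / 1e9    # 16 bits = 2 bytes per param

print(f"4-bit base:  ~{base_4bit_gb:.0f} GB")   # -> ~35 GB
print(f"16-bit base: ~{base_16bit_gb:.0f} GB")  # -> ~140 GB
```

At ~35 GB for the quantized weights, the LoRA factors and their optimizer state add only a few extra GB, which is what makes a single 48 GB GPU viable.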

Can I merge a LoRA back into the base model?

Yes — the merge is additive: the scaled low-rank product is folded into the base weights once (W' = W + (alpha/r)·B·A), after which the adapter adds no inference overhead. Most inference servers (vLLM, TGI) support both merged and hot-swap modes.
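The two modes are equivalent: applying the adapter at runtime as two small matmuls (hot-swap) and folding B·A into W once (merged) produce identical outputs. A pure-Python sketch with illustrative values chosen so exact float comparison is safe (alpha/r taken as 1 for brevity):

```python
# Hot-swap vs merged LoRA give the same result.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

W = [[2.0, 0.0], [0.0, 2.0]]  # frozen base weight
A = [[1.0, 1.0]]              # r x in  (rank r = 1)
B = [[1.0], [0.5]]            # out x r
x = [3.0, 4.0]                # input vector

# Hot-swap: y = W x + B (A x), two extra small matmuls per call.
ax = matvec(A, x)
y_hot = [w + b for w, b in zip(matvec(W, x), matvec(B, ax))]

# Merged: fold delta = B @ A into W once, then a single matvec.
delta = [[brow[0] * a for a in A[0]] for brow in B]
W_merged = [[w + d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]
y_merged = matvec(W_merged, x)

assert y_hot == y_merged == [13.0, 11.5]
```

Merging maximizes throughput for a single adapter; hot-swap keeps the base weights untouched so one server can serve many tenant adapters.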

Sources

  1. LoRA paper (Hu et al., 2021) — accessed 2026-04-20
  2. QLoRA paper (Dettmers et al., 2023) — accessed 2026-04-20