Full Fine-Tuning vs LoRA
LoRA (Low-Rank Adaptation) has become the default fine-tuning method for LLMs because it is typically 10-100x cheaper and produces small, portable adapters. Full fine-tuning still has a place: when you need to deeply change model behavior, or when the base model is small enough to train end-to-end cheaply. The choice comes down to cost, quality ceiling, and deployment flexibility.
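The cost gap starts with parameter count. For a d_out x d_in weight matrix, full fine-tuning updates every entry, while LoRA trains two low-rank factors instead. A back-of-the-envelope sketch, where the 4096 hidden size and rank r=16 are illustrative assumptions, not values from any specific model:

```python
# Full fine-tuning trains all d_out * d_in entries of W.
# LoRA trains B (d_out x r) and A (r x d_in) instead, so the
# trainable count scales with r rather than with the hidden size.
# Hidden size 4096 and rank r=16 are illustrative assumptions.

d_out, d_in, r = 4096, 4096, 16

full_params = d_out * d_in            # every entry of W is trainable
lora_params = r * (d_out + d_in)      # only the two factors

print(full_params)                         # 16777216
print(lora_params)                         # 131072
print(f"{lora_params / full_params:.2%}")  # 0.78%
```

That ~0.78% per adapted matrix is where the table's "~0.1-1% of parameters" figure comes from; the exact fraction depends on rank and on which layers you adapt.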
Side-by-side
| Criterion | Full Fine-Tuning | LoRA |
|---|---|---|
| Parameters updated | All (billions) | ~0.1-1% (low-rank adapters) |
| GPU memory (70B model) | ~1.2TB (weights, gradients, Adam states) | ~140GB (frozen bf16 base) — fits on 2xH100 |
| Training time | Days to weeks | Hours to days |
| Cost | $10,000s per run | $10s-$1,000s per run |
| Quality ceiling | Highest (if data is good) | Slightly lower in theory, close in practice |
| Deployment | Full model weights | Base + small adapter (hot-swappable) |
| Catastrophic forgetting risk | High if not managed | Low — base weights frozen |
| Composable with other adapters | No | Yes — swap or merge multiple LoRAs |
| Use case fit | Deep domain adaptation, style rewrites at scale | Task-specific tuning, multi-tenant adapters |
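The memory row in the table can be reproduced with simple arithmetic. This sketch assumes bf16 weights and gradients plus Adam with fp32 master weights and two fp32 moment buffers, a common mixed-precision setup stated here as an assumption, not a universal rule:

```python
# Rough memory arithmetic behind the table's 70B row.
# Assumption: bf16 weights/grads, Adam with fp32 master weights
# and two fp32 moment buffers (16 bytes per trainable parameter).

params = 70e9

# Full FT: 2 (weights) + 2 (grads) + 4 (fp32 master copy)
# + 4 + 4 (Adam m and v) = 16 bytes per parameter.
full_ft_bytes = params * (2 + 2 + 4 + 4 + 4)

# LoRA: the frozen base sits in bf16 (2 bytes/param); optimizer
# state exists only for the tiny adapters, which we ignore here.
lora_bytes = params * 2

print(f"full FT: ~{full_ft_bytes / 1e12:.2f} TB")  # ~1.12 TB
print(f"LoRA:    ~{lora_bytes / 1e9:.0f} GB")      # ~140 GB
```

Activations and framework overhead come on top of both numbers, which is why the table rounds full FT up to ~1.2TB.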
Verdict
LoRA (and its variants like QLoRA, DoRA) is the default for ~95% of LLM fine-tuning work in 2026. It's cheap, fast, portable, and quality is close to full fine-tuning on most tasks. Full fine-tuning is worth the cost only when you have enough high-quality data to justify it and LoRA genuinely plateaus — in practice, that's rare. For small models (7B-13B) full FT can be reasonable; for 70B+ models, LoRA is almost always the right call.
When to choose each
Choose Full Fine-Tuning if…
- You have massive, high-quality domain data (>100k examples).
- You need to deeply change model behavior, not add a skill.
- Your model is small enough (7B-13B) that full FT is affordable.
- LoRA has been tried and quality plateaued.
Choose LoRA if…
- You want to fine-tune without a dedicated ML platform team.
- You need to ship multiple tenant/task-specific adapters from one base.
- Your dataset is small to medium (<50k examples).
- You want to iterate fast and keep the base model unchanged.
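The "keep the base model unchanged" point is visible directly in LoRA's forward pass: the output is h = Wx + (α/r)·B(Ax), with W frozen and only A and B trained. A minimal pure-Python sketch of one linear layer; all matrix values and the rank r=1 are toy assumptions:

```python
# LoRA forward pass on a single linear layer: the frozen base
# weight W is never modified; the adapter contributes
# (alpha / r) * B @ A @ x on top. Toy values throughout, r = 1.

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

W = [[1.0, 2.0],
     [3.0, 4.0]]          # frozen base weights
A = [[0.5, 0.5]]          # trainable, shape r x d_in
B = [[1.0], [0.0]]        # trainable, shape d_out x r
alpha, r = 2.0, 1

def lora_forward(x):
    base = matvec(W, x)                   # frozen path
    low_rank = matvec(B, matvec(A, x))    # adapter path
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

print(lora_forward([1.0, 1.0]))   # → [5.0, 7.0]
```

Because W never changes, swapping tenants means swapping only A and B, which is what makes multi-tenant adapter serving cheap.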
Frequently asked questions
How much worse is LoRA than full fine-tuning?
On most tasks, the difference is hard to measure. Published comparisons show LoRA reaching 95-99% of full-FT quality while training under 1% of the parameters. The gap widens when the dataset is very large and the task is far from the pretraining distribution.
What's QLoRA?
LoRA on top of a quantized (typically 4-bit) base model. Enables fine-tuning 65-70B-class models on a single 48GB GPU. Now the default for cost-conscious fine-tuning.
Can I merge a LoRA back into the base model?
Yes. Mathematically, the scaled adapter product (α/r)·BA is added into the base weight matrix, so merged inference carries no adapter overhead. Most inference servers (vLLM, TGI) support both merged and hot-swap modes.
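Concretely, the merge computes W' = W + (α/r)·B·A, after which the adapter files can be discarded. A toy sketch with the same illustrative shapes as above (all values assumptions):

```python
# Merging a LoRA adapter: W_merged = W + (alpha / r) * B @ A.
# After merging, a plain forward through W_merged matches
# base-plus-adapter inference exactly. Toy values, r = 1.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

W = [[1.0, 2.0],
     [3.0, 4.0]]
A = [[0.5, 0.5]]          # r x d_in
B = [[1.0], [0.0]]        # d_out x r
alpha, r = 2.0, 1
scale = alpha / r

delta = matmul(B, A)      # rank-1 update, same shape as W
W_merged = [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

print(W_merged)           # → [[2.0, 3.0], [3.0, 4.0]]
```

Hot-swap mode skips this step and keeps W and (A, B) separate at inference time, trading a small per-token cost for the ability to switch adapters per request.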
Sources
- Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (2021) — accessed 2026-04-20
- Dettmers et al., "QLoRA: Efficient Finetuning of Quantized LLMs" (2023) — accessed 2026-04-20