Full Fine-Tuning vs LoRA

LoRA (Low-Rank Adaptation) has become the default fine-tuning method for LLMs because it's 10-100x cheaper and produces portable adapters. Full fine-tuning still has a place — when you need to deeply change model behavior or when the base model is small enough to fine-tune cheaply. The choice is mostly about cost, quality ceiling, and deployment flexibility.
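The core trick can be sketched in a few lines: the frozen weight matrix W is augmented with a trainable low-rank product B·A, scaled by alpha/r, so only the two small factors are updated. A minimal pure-Python sketch (all names, shapes, and values here are illustrative assumptions, not any library's API):

```python
# Minimal LoRA sketch, pure Python and illustrative only; real
# implementations (e.g. the PEFT library) do this on GPU tensors.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_weight(w, a, b, alpha, r):
    """Effective weight: W + (alpha / r) * B @ A.

    W stays frozen; only the small factors A and B are trained.
    """
    scale = alpha / r
    delta = matmul(b, a)
    return [[wx + scale * dx for wx, dx in zip(wr, dr)]
            for wr, dr in zip(w, delta)]

# 4x4 base weight with a rank-1 adapter: 2 * 4 * 1 = 8 trainable
# values instead of 16.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[0.5, 0.5, 0.5, 0.5]]        # r x in, random-init in practice
B = [[0.0], [0.0], [0.0], [0.0]]  # out x r, zero-init, so the
                                  # adapter starts as a no-op
assert lora_weight(W, A, B, alpha=8, r=1) == W
```

Because B is zero-initialized, the adapter contributes nothing before training, which is why fine-tuning starts from exactly the base model's behavior.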

Side-by-side

Criterion | Full Fine-Tuning | LoRA
Parameters updated | All (billions) | ~0.1-1% (low-rank adapters)
GPU memory (70B model) | ~1.2 TB for training | ~140 GB (fits on 2× H100)
Training time | Days to weeks | Hours to days
Cost | $10,000s per run | $10s-$1,000s per run
Quality ceiling | Highest (if data is good) | Slightly lower in theory, close in practice
Deployment | Full model weights | Base + small adapter (hot-swappable)
Catastrophic forgetting risk | High if not managed | Low (base weights frozen)
Composable with other adapters | No | Yes (swap or merge multiple LoRAs)
Use-case fit | Deep domain adaptation, style rewrites at scale | Task-specific tuning, multi-tenant adapters
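The "~0.1-1%" row follows from simple arithmetic. A back-of-envelope sketch, assuming Llama-2-70B-like shapes (hidden size 8192, 80 layers, LoRA rank 16 on the four attention projections; these are assumptions, and grouped-query attention would shrink the k/v terms further):

```python
# Back-of-envelope for the "~0.1-1% of parameters" row.
HIDDEN = 8192
LAYERS = 80
RANK = 16          # a typical LoRA rank
TOTAL = 70e9       # base model parameter count

# LoRA on the four attention projections (q, k, v, o): each gets
# A (r x h) and B (h x r), i.e. 2 * h * r extra parameters.
per_layer = 4 * 2 * HIDDEN * RANK
trainable = per_layer * LAYERS
print(f"trainable: {trainable / 1e6:.0f}M "
      f"({100 * trainable / TOTAL:.2f}% of base)")
# -> trainable: 84M (0.12% of base)
```

Raising the rank or adapting the MLP projections as well pushes the fraction toward the upper end of the quoted range.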

Verdict

LoRA (and its variants like QLoRA, DoRA) is the default for ~95% of LLM fine-tuning work in 2026. It's cheap, fast, portable, and quality is close to full fine-tuning on most tasks. Full fine-tuning is worth the cost only when you have enough high-quality data to justify it and LoRA genuinely plateaus — in practice, that's rare. For small models (7B-13B) full FT can be reasonable; for 70B+ models, LoRA is almost always the right call.

When to choose each

Choose Full Fine-Tuning if…

  • You have massive, high-quality domain data (>100k examples).
  • You need to deeply change model behavior, not add a skill.
  • Your model is small enough (7B-13B) that full FT is affordable.
  • LoRA has been tried and quality plateaued.

Choose LoRA if…

  • You want to fine-tune without a dedicated ML platform team.
  • You need to ship multiple tenant/task-specific adapters from one base.
  • Your data set is small to medium (<50k examples).
  • You want to iterate fast and keep the base model unchanged.

Frequently asked questions

How much worse is LoRA than full fine-tuning?

On most tasks, indistinguishable. Published studies show LoRA reaching 95-99% of full-FT quality with roughly 1% of the compute. The gap widens when the training set is very large and the task is far from the pretraining distribution.

What's QLoRA?

LoRA applied on top of a quantized (typically 4-bit) base model. The QLoRA paper demonstrated fine-tuning a 65B-parameter model on a single 48 GB GPU, and the technique is now the default for cost-conscious fine-tuning.
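The memory headroom is straightforward arithmetic. A rough sketch (ignoring activations, KV cache, and optimizer overhead, which QLoRA mitigates with paged optimizers; numbers are estimates, not measurements):

```python
# Rough memory math behind QLoRA for a 70B-class model.
PARAMS = 70e9

base_4bit_gb = PARAMS * 0.5 / 1e9   # 4 bits = 0.5 bytes per param
base_16bit_gb = PARAMS * 2 / 1e9    # 16 bits = 2 bytes per param

print(f"4-bit base:  ~{base_4bit_gb:.0f} GB")   # -> ~35 GB
print(f"16-bit base: ~{base_16bit_gb:.0f} GB")  # -> ~140 GB
```

At ~35 GB for the quantized weights, the LoRA factors and their optimizer state add only a few extra GB, which is what makes a single 48 GB GPU viable.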

Can I merge a LoRA back into the base model?

Yes — the merge is additive: the scaled low-rank product is folded into the base weights once (W' = W + (alpha/r)·B·A), after which the adapter adds no inference overhead. Most inference servers (vLLM, TGI) support both merged and hot-swap modes.
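The two modes are equivalent: applying the adapter at runtime as two small matmuls (hot-swap) and folding B·A into W once (merged) produce identical outputs. A pure-Python sketch with illustrative values chosen so exact float comparison is safe (alpha/r taken as 1 for brevity):

```python
# Hot-swap vs merged LoRA give the same result.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

W = [[2.0, 0.0], [0.0, 2.0]]  # frozen base weight
A = [[1.0, 1.0]]              # r x in  (rank r = 1)
B = [[1.0], [0.5]]            # out x r
x = [3.0, 4.0]                # input vector

# Hot-swap: y = W x + B (A x), two extra small matmuls per call.
ax = matvec(A, x)
y_hot = [w + b for w, b in zip(matvec(W, x), matvec(B, ax))]

# Merged: fold delta = B @ A into W once, then a single matvec.
delta = [[brow[0] * a for a in A[0]] for brow in B]
W_merged = [[w + d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]
y_merged = matvec(W_merged, x)

assert y_hot == y_merged == [13.0, 11.5]
```

Merging maximizes throughput for a single adapter; hot-swap keeps the base weights untouched so one server can serve many tenant adapters.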

Sources

  1. LoRA paper (Hu et al., 2021) — accessed 2026-04-20
  2. QLoRA paper (Dettmers et al., 2023) — accessed 2026-04-20