
In-Context Learning vs Fine-Tuning

Two canonical ways to make an LLM do your task. In-context learning (ICL) stuffs examples into the prompt at inference time: fast to iterate, no training run. Fine-tuning changes the model's weights, usually via LoRA or a full fine-tune: slower to set up, cheaper per request, more durable. Pick based on how stable and how high-volume your task is.
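The ICL half of this comparison fits in a few lines: the "training data" is just example pairs assembled into the prompt on every call. A minimal sketch; the classification task, examples, and labels below are hypothetical.

```python
# In-context learning sketch: adaptation lives in the prompt, not the weights.
# The sentiment task and these example pairs are illustrative, not from any dataset.
EXAMPLES = [
    ("The battery died in two days.", "negative"),
    ("Setup took thirty seconds. Love it.", "positive"),
]

def build_few_shot_prompt(query: str) -> str:
    """Assemble a few-shot classification prompt from in-line examples."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The model is expected to complete the final, unlabelled slot.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(build_few_shot_prompt("Stopped working after a week."))
```

Note the cost structure this implies: every example ships with every request, which is exactly the "long prompts every call" row in the table below.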

Side-by-side

Criterion              | Fine-Tuning                                             | In-Context Learning
Where adaptation lives | Model weights (persistent)                              | Prompt (ephemeral)
Setup cost             | Training compute + data curation                        | Prompt engineering only
Inference cost         | Same as base model (smaller prompts)                    | Higher: long prompts on every call
Iteration speed        | Hours to days per experiment                            | Minutes per experiment
Data requirement       | Hundreds to thousands of labelled examples              | A handful of examples
Behaviour stability    | Very high                                               | Depends on prompt; drift-prone
Safety / bias risk     | Bakes your data into weights                            | Contained per call; easier to audit
Best fit               | Stable, high-volume tasks with style/output constraints | Prototyping, low-volume, or rapidly changing tasks

Verdict

Start with in-context learning — it's cheaper, faster, and tells you whether the task is learnable at all. Move to fine-tuning when three things are true: the task is stable, your prompts are long and expensive, and you have enough clean labelled data to beat a well-crafted prompt. Prompt engineering + evals is almost always the right first move; fine-tuning is an optimisation, not a default.
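The "long and expensive prompts" condition can be made concrete with a back-of-envelope break-even: fine-tuning pays off once the per-call token saving has repaid the one-off training cost. Every number here (token price, prompt sizes, training spend) is an illustrative assumption, not real vendor pricing.

```python
# Back-of-envelope break-even between ICL and fine-tuning.
# All figures are illustrative assumptions for the sake of the arithmetic.
PRICE_PER_1K_INPUT_TOKENS = 0.0005  # assumed $ per 1K input tokens
ICL_PROMPT_TOKENS = 3_000           # long few-shot prompt sent on every call
FT_PROMPT_TOKENS = 300              # short prompt once behaviour lives in the weights
FT_TRAINING_COST = 500.0            # assumed one-off training spend, $

def cost_per_call(prompt_tokens: int) -> float:
    """Input-token cost of a single request."""
    return prompt_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS

saving_per_call = cost_per_call(ICL_PROMPT_TOKENS) - cost_per_call(FT_PROMPT_TOKENS)
break_even_calls = FT_TRAINING_COST / saving_per_call
print(f"fine-tuning pays off after ~{break_even_calls:,.0f} calls")
```

Under these assumptions the break-even lands in the hundreds of thousands of calls, which is why the verdict reserves fine-tuning for stable, high-volume tasks.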

When to choose each

Choose Fine-Tuning if…

  • Your task is stable and runs at high volume.
  • You have hundreds to thousands of clean labelled examples.
  • Your prompts are long (style guides, tool schemas, rubrics).
  • You care about consistency and latency at scale.

Choose In-Context Learning if…

  • You're prototyping and don't know what the task should even look like.
  • Your task changes every week.
  • You have a handful of examples, not thousands.
  • You value auditability of the exact instructions sent on each call.

Frequently asked questions

Is fine-tuning obsolete with long context windows?

Not quite. Long context helps ICL a lot, but for high-volume tasks, fine-tuning still wins on cost and latency. For task-specific behaviour that needs to be rock-solid — JSON schemas, safety constraints — fine-tuning remains useful.

When should I use LoRA vs full fine-tuning?

Use LoRA for most adaptation tasks: it's cheap, fast, and easy to serve. Reserve a full fine-tune for when you're changing the model's core behaviour or creating a derivative model for open-weights distribution.
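One way to see why LoRA is the default: it trains a pair of low-rank matrices per projection instead of the full weight matrices, so the trainable-parameter count collapses. A rough count, using illustrative GPT-2-small-like shapes (not tied to any specific model):

```python
# Rough trainable-parameter comparison: full fine-tune vs LoRA.
# Model dimensions below are illustrative assumptions, loosely GPT-2-small-like.
hidden = 768   # hidden size
layers = 12    # transformer layers
rank = 8       # LoRA rank

# Full fine-tuning updates every attention projection weight (q, k, v, o).
full_params_per_layer = 4 * hidden * hidden
# LoRA trains two low-rank factors (hidden x rank and rank x hidden) per projection.
lora_params_per_layer = 4 * 2 * hidden * rank

full_total = layers * full_params_per_layer
lora_total = layers * lora_params_per_layer
print(f"full: {full_total:,} trainable params")
print(f"LoRA: {lora_total:,} trainable params ({lora_total / full_total:.1%})")
```

At these shapes LoRA touches only a few percent of the attention parameters, which is what makes it cheap to train and easy to hot-swap at serving time.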

Which should VSET students learn first?

Prompt engineering and ICL first — they're the foundation. Fine-tuning becomes valuable in year three or four when students have enough labelled data and compute to make it worthwhile.

Sources

  1. OpenAI — fine-tuning guide — accessed 2026-04-20
  2. Anthropic — prompt engineering overview — accessed 2026-04-20