Capability · Framework — fine-tuning
torchtune
torchtune is PyTorch's answer to the fine-tuning ecosystem: a native library that uses standard PyTorch patterns (no Transformers wrapper) with readable 'recipes' for each training setup. It's preferred by teams who want to understand and modify their training code without fighting framework abstractions, and by researchers who need a clean base for novel methods.
Framework facts
- Category: fine-tuning
- Language: Python / PyTorch
- License: BSD-3-Clause
- Repository: https://github.com/pytorch/torchtune
Install
pip install torchtune
Quickstart
# Download model
tune download meta-llama/Llama-3.1-8B --output-dir ./llama3
# Run LoRA recipe with custom config
tune run lora_finetune_single_device \
  --config llama3/8B_lora_single_device \
  dataset.source=my-jsonl-dataset
Alternatives
- TRL — Hugging Face alternative
- Axolotl — config-driven, HF-based
- Unsloth — optimized for single-GPU speed and memory
- LLaMA-Factory — web-UI and config-driven trainer
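The quickstart above runs a LoRA recipe. As a back-of-envelope illustration of why LoRA fine-tuning is cheap, here is a pure-Python parameter count for a single linear layer; the dimensions and rank are illustrative assumptions, not torchtune defaults:

```python
# Trainable parameters: LoRA adapter vs. full fine-tuning of one linear layer.
# Dimensions below are illustrative, not torchtune defaults.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains two small matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

d_in = d_out = 4096                          # a typical attention projection size
full = d_in * d_out                          # weights updated in full fine-tuning
lora = lora_params(d_in, d_out, rank=16)     # weights updated with rank-16 LoRA

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

At rank 16 the adapter trains under 1% of the layer's weights, which is why LoRA recipes fit on a single device.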
Frequently asked questions
Why pick torchtune over TRL?
torchtune has a smaller surface area and avoids the Transformers dependency tree. If you value code clarity or plan to modify training internals, torchtune is easier to fork. TRL has broader model coverage and algorithm support.
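If modifying training internals is the goal, torchtune's CLI can copy a built-in recipe and config into your project for editing. A sketch, reusing the recipe and config names from the quickstart (local file names are hypothetical):

```shell
# Copy the built-in recipe and its config locally for editing
tune cp lora_finetune_single_device my_lora_recipe.py
tune cp llama3/8B_lora_single_device my_config.yaml

# Run the customized copies
tune run my_lora_recipe.py --config my_config.yaml
```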
Does it scale to multi-node?
Yes — torchtune supports FSDP and tensor parallelism out of the box using PyTorch's native distributed primitives, which many practitioners find simpler to debug than Accelerate/DeepSpeed stacks.
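A multi-GPU launch is the same `tune run` command with torchrun-style flags in front of a distributed recipe. A sketch; the recipe and config names here are assumptions patterned on the quickstart:

```shell
# Launch the distributed LoRA recipe across 4 GPUs on one node
tune run --nnodes 1 --nproc_per_node 4 lora_finetune_distributed \
  --config llama3/8B_lora
```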
Sources
- torchtune — docs — accessed 2026-04-20
- torchtune on GitHub — accessed 2026-04-20