Megatron-LM
Megatron-LM is NVIDIA's research repository for training transformers at trillion-parameter scale. Its papers introduced tensor parallelism (splitting attention heads across GPUs) and pipeline parallelism, and the code is the reference implementation the wider ecosystem benchmarks against. Megatron-Core is the modular library derived from Megatron-LM, used by NeMo, DeepSpeed, and most frontier-scale training stacks.
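The core idea behind tensor parallelism can be sketched in a few lines: a linear layer's weight matrix is split column-wise across devices, each device computes its own shard of the output, and the shards are gathered back together. The NumPy toy below is purely illustrative (the variable names and the use of `np.split` are this sketch's assumptions, not Megatron's API; the real implementation shards `torch` tensors across GPUs and uses collective communication for the gather).

```python
import numpy as np

# Illustrative sketch of tensor (intra-layer) parallelism as introduced
# by the Megatron-LM paper: split a weight matrix column-wise across
# "GPUs", compute each shard independently, then concatenate the partial
# outputs (an all-gather in the real, multi-GPU implementation).
# This is conceptual NumPy, not Megatron's actual API.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # a batch of activations
W = rng.standard_normal((8, 16))      # the full weight matrix

n_gpus = 4
shards = np.split(W, n_gpus, axis=1)            # each "GPU" holds 4 columns
partials = [x @ w for w in shards]              # computed independently per device
y_parallel = np.concatenate(partials, axis=1)   # gather the output shards

assert np.allclose(y_parallel, x @ W)           # matches the unsharded layer
```

In Megatron's terminology this is a column-parallel linear layer; pairing it with a row-parallel layer lets an entire MLP block run with only one communication step per direction.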
Framework facts
- Category: fine-tuning
- Language: Python / CUDA
- License: BSD-3-Clause
- Repository: https://github.com/NVIDIA/Megatron-LM
Install
git clone https://github.com/NVIDIA/Megatron-LM
pip install -e Megatron-LM
pip install megatron-core
Quickstart
# Train a 345M GPT on a single DGX node
cd Megatron-LM
bash examples/gpt3/train_gpt3_345m_distributed.sh
Alternatives
- DeepSpeed
- Colossal-AI
- NVIDIA NeMo — higher-level wrapper
Frequently asked questions
Megatron-LM or NeMo?
NeMo is NVIDIA's higher-level, supported product that uses Megatron-Core under the hood. If you're fine-tuning or training production models, start with NeMo. Use raw Megatron-LM when you need unreleased research features or full control.
Does Megatron-LM run on non-NVIDIA hardware?
Officially it targets NVIDIA GPUs (CUDA, plus Apex for fused kernels and mixed precision). Community forks exist for AMD ROCm and TPUs, but they are much less mature.
Sources
- Megatron-LM GitHub — accessed 2026-04-20
- Megatron-Core docs — accessed 2026-04-20