Capability · Framework — fine-tuning

Megatron-LM

Megatron-LM is NVIDIA's research repository for training transformers at trillion-parameter scale. Its papers introduced tensor parallelism (splitting each layer's weight matrices, including the attention heads, across GPUs) and pipeline parallelism, and the code is the reference implementation the wider ecosystem benchmarks against. Megatron-Core is the modular library derived from Megatron-LM, used by NeMo, DeepSpeed, and most frontier-scale training stacks.
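The core idea of tensor parallelism can be shown without any GPUs. A minimal NumPy sketch of the Megatron MLP scheme, simulated on one process (ReLU stands in for GeLU, and the final sum stands in for the all-reduce that real Megatron performs across devices; all names here are illustrative, not Megatron APIs):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff, world = 8, 16, 2            # toy sizes; 2 simulated "GPUs"
X = rng.standard_normal((4, d_model))      # input activations
A = rng.standard_normal((d_model, d_ff))   # first MLP weight
B = rng.standard_normal((d_ff, d_model))   # second MLP weight

relu = lambda t: np.maximum(t, 0)          # elementwise stand-in for GeLU

# Serial reference: Z = relu(X A) B
Z_ref = relu(X @ A) @ B

# Column-parallel A: each rank holds a slice of A's columns,
# so relu(X @ A_r) is exactly that rank's slice of relu(X A).
A_shards = np.split(A, world, axis=1)
# Row-parallel B: each rank holds the matching slice of B's rows.
B_shards = np.split(B, world, axis=0)

# Each rank computes its partial output with no communication ...
partials = [relu(X @ A_shards[r]) @ B_shards[r] for r in range(world)]
# ... then one all-reduce (here: a plain sum) recovers the full result.
Z_tp = sum(partials)

assert np.allclose(Z_ref, Z_tp)
```

The column-then-row split is why the MLP needs only one all-reduce in the forward pass: the nonlinearity is elementwise, so it commutes with the column sharding.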

Framework facts

Category
fine-tuning
Language
Python / CUDA
License
BSD-3-Clause
Repository
https://github.com/NVIDIA/Megatron-LM

Install

# From source, for the full training scripts and research features:
git clone https://github.com/NVIDIA/Megatron-LM
pip install -e Megatron-LM

# Or just the library, from PyPI:
pip install megatron-core

Quickstart

# Train a 345M GPT on a single DGX node
cd Megatron-LM
bash examples/gpt3/train_gpt3_345m_distributed.sh
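Scripts like this ultimately launch `pretrain_gpt.py` with `torchrun`. A hedged sketch of that kind of invocation, using the classic 345M shape (24 layers, 1024 hidden, 16 heads); the data and tokenizer paths are placeholders, and the flags your checkout supports may differ, so treat the example script itself as authoritative:

```shell
# Illustrative only: check examples/gpt3/train_gpt3_345m_distributed.sh
# for the exact flags in your version.
torchrun --nproc_per_node 8 pretrain_gpt.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 1 \
    --num-layers 24 --hidden-size 1024 --num-attention-heads 16 \
    --seq-length 1024 --max-position-embeddings 1024 \
    --micro-batch-size 4 --global-batch-size 32 \
    --train-iters 5000 --lr 3.0e-4 \
    --data-path <path-to-preprocessed-data> \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file gpt2-vocab.json --merge-file gpt2-merges.txt
```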

Alternatives

  • DeepSpeed
  • Colossal-AI
  • NVIDIA NeMo — higher-level wrapper

Frequently asked questions

Megatron-LM or NeMo?

NeMo is NVIDIA's higher-level, supported product that uses Megatron-Core under the hood. If you're fine-tuning or training production models, start with NeMo. Use raw Megatron-LM when you need unreleased research features or full control.

Does Megatron-LM run on non-NVIDIA hardware?

Officially it targets NVIDIA GPUs (CUDA, plus Apex for fused kernels). Community forks for AMD ROCm and for TPUs exist, but they are much less mature.

Sources

  1. Megatron-LM GitHub — accessed 2026-04-20
  2. Megatron-Core docs — accessed 2026-04-20