Capability · Framework — fine-tuning

Colossal-AI

Colossal-AI is a deep-learning training system from HPC-AI Tech (Singapore). Like DeepSpeed, it focuses on scaling very large models, with its own implementations of tensor parallelism, pipeline parallelism, sequence parallelism, and ZeRO-like sharding. Its Gemini heterogeneous memory manager offloads to CPU and NVMe at chunk granularity, and ColossalChat was one of the first open reproductions of ChatGPT's RLHF pipeline.
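The core idea behind ZeRO-like sharding is that optimizer state (and optionally gradients and parameters) is partitioned across data-parallel ranks, so no single rank holds a full copy. A framework-free toy sketch of the partitioning step (hypothetical helper names, not Colossal-AI internals):

```python
# Toy illustration of ZeRO-style partitioning: each data-parallel
# rank owns the optimizer state for only a contiguous slice of the
# flattened parameters. Hypothetical sketch, not Colossal-AI code.

def shard_params(num_params: int, world_size: int) -> list[range]:
    """Split parameter indices into one contiguous shard per rank."""
    base, rem = divmod(num_params, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)  # spread the remainder
        shards.append(range(start, start + size))
        start += size
    return shards

shards = shard_params(num_params=10, world_size=4)
# Every parameter index is owned by exactly one rank,
# so each rank stores ~1/world_size of the optimizer state.
assert [i for s in shards for i in s] == list(range(10))
```

Stages differ in what gets sharded (state only, state + gradients, or everything), but the memory saving in each case comes from this same 1/world_size ownership split.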

Framework facts

Category
fine-tuning
Language
Python / CUDA
License
Apache-2.0
Repository
https://github.com/hpcaitech/ColossalAI

Install

pip install colossalai

Quickstart

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

# Initialize distributed state from torchrun's environment variables
# (older releases required colossalai.launch_from_torch(config={})).
colossalai.launch_from_torch()

model = ...       # your torch.nn.Module
optimizer = ...   # e.g. colossalai.nn.optimizer.HybridAdam(model.parameters())
dataloader = ...  # your torch DataLoader

plugin = GeminiPlugin(placement_policy='auto')  # Gemini heterogeneous memory manager
booster = Booster(plugin=plugin)
# boost() returns (model, optimizer, criterion, dataloader, lr_scheduler)
model, optimizer, _, dataloader, _ = booster.boost(model, optimizer, dataloader=dataloader)
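Behind the GeminiPlugin, parameters are grouped into fixed-size chunks that migrate between GPU and CPU/NVMe as the working set changes. A toy LRU sketch of that placement idea in plain Python (a hypothetical `ChunkPlacer`, illustrating the concept rather than Gemini's actual policy):

```python
from collections import OrderedDict

# Toy chunk placement: keep at most `budget` chunks "on device",
# evicting the least recently used chunk to "host" memory.
# Hypothetical sketch of the idea behind Gemini, not its real policy.

class ChunkPlacer:
    def __init__(self, budget: int):
        self.budget = budget
        self.device = OrderedDict()  # chunk_id -> data, in LRU order
        self.host = {}               # chunk_id -> data

    def access(self, chunk_id: int, data=None):
        """Touch a chunk before compute; fetch it to device if needed."""
        if chunk_id in self.device:
            self.device.move_to_end(chunk_id)    # mark most recently used
        else:
            payload = self.host.pop(chunk_id, data)
            if len(self.device) >= self.budget:  # over budget: evict LRU
                victim, vdata = self.device.popitem(last=False)
                self.host[victim] = vdata
            self.device[chunk_id] = payload
        return self.device[chunk_id]

placer = ChunkPlacer(budget=2)
placer.access(0, "w0")
placer.access(1, "w1")
placer.access(2, "w2")  # over budget: chunk 0 is evicted to host
```

The real system also weighs compute overlap and warm-up statistics (that is what placement_policy='auto' tunes), but the budget-plus-eviction loop is the essence of heterogeneous offload.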

Alternatives

  • DeepSpeed
  • Megatron-LM
  • FSDP

Frequently asked questions

Colossal-AI or DeepSpeed?

DeepSpeed has a wider user base and tighter integration with HF Transformers. Colossal-AI often posts better throughput on identical hardware for tensor-parallel-heavy models, and its Gemini manager helps when the model does not fit in GPU memory. Benchmark both on your own workload before committing.

Is Colossal-Chat still relevant?

As reference code, yes — it walks through the full SFT + reward model + PPO pipeline. For production RLHF/DPO, most teams now use TRL or veRL, which are more actively maintained.

Sources

  1. Colossal-AI docs — accessed 2026-04-20
  2. Colossal-AI GitHub — accessed 2026-04-20