Modular MAX Platform
Modular's MAX platform combines the MAX Engine (a graph-optimised runtime), MAX Serve (an OpenAI-compatible inference server), and Mojo (Modular's systems language for writing GPU kernels). It aims to let a single codebase run on NVIDIA, AMD, Apple silicon, and CPU targets with minimal porting, and ships hundreds of preconfigured model recipes on the MAX Hub.
Framework facts
- Category: inference / serving
- Language: Python / Mojo
- License: Apache-2.0 (MAX + Mojo core)
- Repository: https://github.com/modular/modular
Install
# Install Magic, Modular's pixi-based CLI, then use it to install MAX
curl -ssL https://magic.modular.com | bash
magic global install max

Quickstart
# Pull a recipe from the MAX Hub and serve it as OpenAI-compatible
max-pipelines serve --model-path modularai/Llama-3.1-8B-Instruct-GGUF
# then
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"llama-3.1","messages":[{"role":"user","content":"hi"}]}'

Alternatives
- vLLM — OSS inference engine
- TensorRT-LLM — NVIDIA-optimised peer
- TGI — Hugging Face alternative
- llama.cpp — minimal CPU-first alternative
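Because MAX Serve exposes an OpenAI-compatible endpoint, the Quickstart's curl call can also be made from Python. The sketch below uses only the standard library and assumes the server from the Quickstart is running on localhost:8000; the model name "llama-3.1" mirrors the curl example above.

```python
import json
from urllib import request


def build_chat_request(model: str, user_message: str) -> bytes:
    """Serialise an OpenAI-style chat-completion payload to JSON bytes."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return json.dumps(payload).encode("utf-8")


def chat(base_url: str, model: str, user_message: str) -> str:
    """POST to the OpenAI-compatible endpoint and return the reply text."""
    req = request.Request(
        base_url + "/v1/chat/completions",
        data=build_chat_request(model, user_message),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (with the Quickstart server running):
#   print(chat("http://localhost:8000", "llama-3.1", "hi"))
```

Any OpenAI-compatible client (including the official `openai` Python package pointed at the local base URL) works the same way; nothing here is MAX-specific beyond the endpoint address.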
Frequently asked questions
Is MAX open-source?
As of early 2025, Modular has open-sourced the core MAX Engine and the Mojo standard library under Apache-2.0. Some enterprise tooling remains commercial.
Do I need to learn Mojo?
Not for the common case: you serve existing models through Python recipes. Mojo is there when you want to write custom high-performance kernels that run across all supported hardware targets.
Sources
- Modular MAX docs — accessed 2026-04-20
- Modular GitHub — accessed 2026-04-20