Capability · Framework — fine-tuning

Modular MAX Platform

Modular's MAX platform combines the MAX Engine (a graph-optimised runtime), MAX Serve (an OpenAI-compatible inference server), and Mojo (Modular's systems language for writing GPU kernels). It aims to let one codebase run on NVIDIA, AMD, Apple, and CPU targets with minimal porting, and ships hundreds of preconfigured model recipes on the MAX Hub.

Framework facts

Category
fine-tuning
Language
Python / Mojo
License
Apache-2.0 (MAX + Mojo core)
Repository
https://github.com/modular/modular

Install

# Install the Magic CLI (pixi-based), then install MAX globally
curl -ssL https://magic.modular.com | bash
magic global install max

Quickstart

# Pull a recipe from the MAX Hub and serve it as OpenAI-compatible
max-pipelines serve --model-path modularai/Llama-3.1-8B-Instruct-GGUF
# then
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"modularai/Llama-3.1-8B-Instruct-GGUF","messages":[{"role":"user","content":"hi"}]}'
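Because the endpoint is OpenAI-compatible, the same call works from plain Python. The sketch below builds the chat-completions request with only the standard library; the host, port, and model id match the serve command above, so adjust them to your deployment.

```python
import json
import urllib.request

# These values assume the `max-pipelines serve` command above;
# change them to match your own deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "modularai/Llama-3.1-8B-Instruct-GGUF"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions POST request (not yet sent)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("hi")
# With the server running, send it and read the reply:
#   body = json.loads(urllib.request.urlopen(req).read())
#   print(body["choices"][0]["message"]["content"])
```

Any OpenAI client SDK should also work by pointing its base URL at the local server, since MAX Serve speaks the same API.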

Alternatives

  • vLLM — OSS inference engine
  • TensorRT-LLM — NVIDIA-optimised peer
  • TGI — Hugging Face alternative
  • llama.cpp — minimal CPU-first alternative

Frequently asked questions

Is MAX open-source?

As of early 2025, Modular has open-sourced the core MAX engine and the Mojo standard library under Apache-2.0; some enterprise tooling remains commercial.

Do I need to learn Mojo?

Not for the common case: you serve existing models through Python recipes. Mojo is there when you want to write custom high-performance kernels that run on every supported target.

Sources

  1. Modular MAX docs — accessed 2026-04-20
  2. Modular GitHub — accessed 2026-04-20