Capability · Framework — fine-tuning

Modular MAX Platform

Modular's MAX platform combines the MAX Engine (a graph-optimised runtime), MAX Serve (an OpenAI-compatible inference server), and Mojo (Modular's systems language for writing GPU kernels). It aims to let one codebase run on NVIDIA, AMD, Apple, and CPU targets with minimal porting, and ships hundreds of preconfigured model recipes on the MAX Hub.

Framework facts

Category
fine-tuning
Language
Python / Mojo
License
Apache-2.0 (MAX + Mojo core)
Repository
https://github.com/modular/modular

Install

# Install the Magic CLI (pixi-based), then install MAX globally
curl -ssL https://magic.modular.com | bash
magic global install max

Quickstart

# Pull a recipe from the MAX Hub and serve it as OpenAI-compatible
max-pipelines serve --model-path modularai/Llama-3.1-8B-Instruct-GGUF
# then
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"modularai/Llama-3.1-8B-Instruct-GGUF","messages":[{"role":"user","content":"hi"}]}'
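Because the endpoint is OpenAI-compatible, the same call works from plain Python. The sketch below builds the chat-completions request with only the standard library; the host, port, and model id match the serve command above, so adjust them to your deployment.

```python
import json
import urllib.request

# These values assume the `max-pipelines serve` command above;
# change them to match your own deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "modularai/Llama-3.1-8B-Instruct-GGUF"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions POST request (not yet sent)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("hi")
# With the server running, send it and read the reply:
#   body = json.loads(urllib.request.urlopen(req).read())
#   print(body["choices"][0]["message"]["content"])
```

Any OpenAI client SDK should also work by pointing its base URL at the local server, since MAX Serve speaks the same API.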

Alternatives

  • vLLM — OSS inference engine
  • TensorRT-LLM — NVIDIA-optimised peer
  • TGI — Hugging Face alternative
  • llama.cpp — minimal CPU-first alternative

Frequently asked questions

Is MAX open-source?

As of early 2025, Modular has open-sourced the core MAX engine and the Mojo standard library under Apache-2.0; some enterprise tooling remains commercial.

Do I need to learn Mojo?

Not for the common case: you serve existing models through Python recipes. Mojo is there when you want to write custom high-performance kernels that run on every supported target.

Sources

  1. Modular MAX docs — accessed 2026-04-20
  2. Modular GitHub — accessed 2026-04-20