Capability · Framework — fine-tuning

Modal

Modal lets you take any Python function, add a decorator, and run it on managed GPUs (A100, H100, H200, B200) in the cloud with per-second billing. It's widely used for LLM fine-tuning jobs, batch inference, embedding pipelines, and hosting OpenAI-compatible endpoints without writing Dockerfiles or YAML.

Framework facts

Category
fine-tuning
Language
Python
License
Proprietary (open-source client)
Repository
https://github.com/modal-labs/modal-client

Install

pip install modal
modal setup

Quickstart

import modal

app = modal.App('finetune')
image = modal.Image.debian_slim().pip_install('axolotl')

@app.function(image=image, gpu='H100', timeout=60 * 60 * 4)  # 4-hour cap
def train():
    # Import inside the function body so it resolves in the remote container.
    import subprocess
    subprocess.run(['accelerate', 'launch', 'train.py'], check=True)

@app.local_entrypoint()
def main():
    train.remote()

Run it with modal run finetune.py (assuming the file is saved as finetune.py).

Alternatives

  • Replicate — model-centric hosting
  • Runpod — GPU marketplace
  • Baseten — model serving platform
  • SkyPilot — multi-cloud launcher

Frequently asked questions

What is Modal best at?

GPU workloads with bursty demand — fine-tuning, batch embeddings, scheduled eval jobs — where per-second billing and fast cold starts matter. Also great for serving FastAPI endpoints backed by GPU inference.
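
As a rough illustration of why per-second billing matters for bursty jobs, here is a back-of-the-envelope comparison. The hourly rate below is a hypothetical placeholder, not Modal's actual pricing:

```python
# Back-of-the-envelope cost comparison: per-second vs. whole-hour billing.
# RATE_PER_HOUR is a hypothetical placeholder, not an actual Modal price.
import math

RATE_PER_HOUR = 4.00  # hypothetical $/hr for one GPU


def cost_per_second(seconds: float) -> float:
    """Bill exactly the seconds used."""
    return RATE_PER_HOUR * seconds / 3600


def cost_hourly_rounded(seconds: float) -> float:
    """Bill in whole-hour increments, as fixed-instance rentals often do."""
    return RATE_PER_HOUR * math.ceil(seconds / 3600)


# A 10-minute scheduled eval job:
job_seconds = 10 * 60
print(round(cost_per_second(job_seconds), 2))      # 0.67
print(round(cost_hourly_rounded(job_seconds), 2))  # 4.0
```

For a job that runs a few times a day for minutes at a time, the gap compounds; for a GPU that is busy around the clock, the two models converge.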

Can I run Modal on my own infra?

No. The runtime is a proprietary managed service; only the Python client is open source. If portability matters, keep Modal-specific calls behind your own interfaces so the backend can be swapped later.
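
One way to keep that seam, sketched in plain Python. The names Trainer, LocalTrainer, and ModalTrainer are illustrative, not part of Modal's API; only the Modal-backed class ever touches the client:

```python
# Portability sketch: hide Modal-specific calls behind a small interface.
# Trainer, LocalTrainer, and ModalTrainer are illustrative names, not Modal API.
from typing import Protocol


class Trainer(Protocol):
    def run(self, config_path: str) -> str: ...


class LocalTrainer:
    """Runs the job in-process; useful for tests and local development."""

    def run(self, config_path: str) -> str:
        return f'trained locally with {config_path}'


class ModalTrainer:
    """Delegates to a remote Modal function; only this class knows about Modal."""

    def __init__(self, modal_fn):
        # e.g. the `train` function from the quickstart above
        self._fn = modal_fn

    def run(self, config_path: str) -> str:
        return self._fn.remote(config_path)


def finetune(trainer: Trainer, config_path: str) -> str:
    # Application code depends only on the Trainer interface.
    return trainer.run(config_path)


print(finetune(LocalTrainer(), 'axolotl.yaml'))  # trained locally with axolotl.yaml
```

Swapping backends is then a one-line change at the call site, and the rest of the pipeline stays testable without a Modal account.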

Sources

  1. Modal — docs — accessed 2026-04-20
  2. Modal — GitHub — accessed 2026-04-20