Capability · Framework — fine-tuning

Replicate

Replicate hosts thousands of open-source models behind a unified HTTP API. Every model has a stable URL, versioned weights, and pay-per-second billing. Teams use it to ship image, audio, and LLM features without managing GPUs — and publish their own models by wrapping them in Cog.

Framework facts

Category
fine-tuning
Language
Python / HTTP
License
Proprietary (Cog is Apache-2.0)
Repository
https://github.com/replicate/replicate-python

Install

pip install replicate
export REPLICATE_API_TOKEN=r8_...

Quickstart

import replicate

# Reads REPLICATE_API_TOKEN from the environment (see Install above)
output = replicate.run(
    'black-forest-labs/flux-schnell',
    input={'prompt': 'A retro poster of VSET, Delhi'},
)
print(output)  # typically a list of generated image outputs
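Under the hood, `replicate.run` POSTs to the model's predictions endpoint on the unified HTTP API. A minimal stdlib-only sketch of that request, built but not sent (the token is a placeholder, and the exact headers follow Replicate's HTTP API docs):

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"

def build_prediction_request(token: str, owner: str, model: str,
                             inputs: dict) -> urllib.request.Request:
    """Assemble (without sending) the POST that creates a prediction
    for an official model. Send it with urllib.request.urlopen(req)."""
    url = f"{API_BASE}/models/{owner}/{model}/predictions"
    body = json.dumps({"input": inputs}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # token is a placeholder
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_prediction_request(
    "r8_your_token_here",  # replace with your real token
    "black-forest-labs",
    "flux-schnell",
    {"prompt": "A retro poster of VSET, Delhi"},
)
```

The Python client adds conveniences on top of this (polling until the prediction completes, wrapping file outputs), which is why `replicate.run` is the recommended path.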

Alternatives

  • Modal — serverless GPU for custom code
  • Fal.ai — low-latency image/audio
  • Hugging Face Inference Endpoints
  • Runpod serverless

Frequently asked questions

When is Replicate the right choice?

When you want a hosted HTTP endpoint for popular open-source models, or when you want to publish your own model as a product with zero infra work via Cog.

Does Replicate support fine-tuning?

Yes, for many models (e.g. SDXL, Flux LoRAs, Llama). Upload training data, start a fine-tune via the trainings API, and get back a new private model version.
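The flow above can be sketched against the trainings endpoint of the HTTP API. Everything below is a placeholder illustration: the trainer model, version id, destination, and input fields are assumptions you would replace with the values from the specific trainer's page on Replicate.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"

def build_training_request(token: str, owner: str, model: str,
                           version_id: str, destination: str,
                           training_input: dict) -> urllib.request.Request:
    """Assemble (without sending) the POST that starts a fine-tune.
    The resulting training produces a new private version under
    `destination` once it finishes."""
    url = f"{API_BASE}/models/{owner}/{model}/versions/{version_id}/trainings"
    body = json.dumps({
        "destination": destination,   # where the fine-tuned model lands
        "input": training_input,      # trainer-specific fields
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # token is a placeholder
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_training_request(
    "r8_your_token_here",               # placeholder token
    "some-owner", "some-lora-trainer",  # hypothetical trainer model
    "<version-id>",                     # placeholder version id
    "your-username/flux-fine-tuned",    # hypothetical destination
    {"input_images": "https://example.com/images.zip"},  # hypothetical field
)
```

Sending the request returns a training object whose status you poll; when it succeeds, the destination model has a new version you can call like any other hosted model.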

Sources

  1. Replicate — docs — accessed 2026-04-20
  2. Cog — GitHub — accessed 2026-04-20