Capability · Framework — inference

Ollama

Ollama makes running open-weight LLMs on a laptop or workstation as simple as docker run. It bundles model weights, quantisation, and an OpenAI-compatible HTTP server into a single binary for macOS, Linux, and Windows. It supports a growing catalogue of models (Llama, Qwen, Mistral, Gemma, Phi, DeepSeek, Mixtral, and many more), GPU acceleration on Apple Silicon, NVIDIA, and AMD hardware, and has become the de facto local inference runtime for LangChain, LlamaIndex, Continue.dev, and similar tools.
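Model packaging is driven by a Modelfile, which borrows Dockerfile-style syntax. A minimal sketch (the base model tag, parameter value, and system prompt here are illustrative, not prescriptive):

```
FROM llama3.1:8b
PARAMETER temperature 0.2
SYSTEM "You are a concise technical assistant."
```

Build it with ollama create tech-assistant -f Modelfile, then chat with ollama run tech-assistant.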

Framework facts

Category
inference
Language
Go
License
MIT
Repository
https://github.com/ollama/ollama

Install

# macOS (Homebrew)
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh

Quickstart

# in one shell
ollama serve

# in another
ollama pull llama3.1:8b
ollama run llama3.1:8b 'Capital of India?'

# OpenAI-compatible API at http://localhost:11434/v1
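Because the server speaks the OpenAI wire format, any HTTP client can call it directly. A minimal sketch using only the Python standard library (assumes ollama serve is listening on the default port 11434 and llama3.1:8b has been pulled):

```python
import json
import urllib.request
from urllib.error import URLError

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

# Standard Chat Completions payload — same shape the hosted OpenAI API uses.
body = json.dumps({
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Capital of India?"}],
    "stream": False,
}).encode()

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
print(req.full_url)  # http://localhost:11434/v1/chat/completions

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
except URLError:
    print("server not reachable — start it with `ollama serve`")
```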

Alternatives

  • LM Studio — desktop GUI alternative
  • llama.cpp — the lower-level C/C++ inference engine that Ollama builds on
  • vLLM — server-grade high-throughput alternative
  • Jan — open-source desktop chat app + server

Frequently asked questions

Is Ollama production-ready?

Ollama is excellent for local development, laptops, desktops, and small internal deployments. For production multi-GPU serving at high QPS, use vLLM, TGI, or SGLang instead; those are built for server workloads and far higher throughput.

Does Ollama work with LangChain / LlamaIndex?

Yes. Both frameworks ship first-class Ollama integrations. Ollama also exposes an OpenAI-compatible /v1 endpoint, so anything that speaks the OpenAI API can talk to it with a base-URL change.
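The base-URL swap can be sketched concretely. ChatClient below is a hypothetical minimal OpenAI-style client (stdlib only, not a real library API); the point is that the hosted API and a local Ollama server share one code path and differ only in the base URL:

```python
import json
import urllib.request


class ChatClient:
    """Hypothetical minimal OpenAI-style client for illustration."""

    def __init__(self, base_url, api_key="ollama"):
        # Ollama accepts any token; hosted OpenAI requires a real key.
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def build_request(self, model, prompt):
        payload = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode()
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=payload,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_key}",
            },
        )


# Same client, different base URL — nothing else changes:
hosted = ChatClient("https://api.openai.com/v1", api_key="sk-...")
local = ChatClient("http://localhost:11434/v1")
print(local.build_request("llama3.1:8b", "hi").full_url)
```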

Sources

  1. Ollama — docs — accessed 2026-04-20
  2. Ollama — GitHub — accessed 2026-04-20