Capability · Framework — local inference
Ollama
Ollama makes running open-weight LLMs on a laptop or workstation as simple as docker run. It packages models, quantisation, and an OpenAI-compatible HTTP server into a single binary for macOS, Linux, and Windows. It supports a growing catalogue of models (Llama, Qwen, Mistral, Gemma, Phi, DeepSeek, Mixtral, and many more), GPU acceleration on Apple Silicon / NVIDIA / AMD, and has become the de facto local inference runtime for LangChain, LlamaIndex, Continue.dev, and similar tools.
Framework facts
- Category: local inference
- Language: Go
- License: MIT
- Repository: https://github.com/ollama/ollama
Install
# macOS (Homebrew)
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Quickstart
# in one shell
ollama serve
# in another
ollama pull llama3.1:8b
ollama run llama3.1:8b 'Capital of India?'
# OpenAI-compatible API at http://localhost:11434/v1
Alternatives
- LM Studio — desktop GUI alternative
- llama.cpp — lower-level C/C++ runtime Ollama builds on
- vLLM — server-grade high-throughput alternative
- Jan — open-source desktop chat app + server
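Beyond the CLI, the server started by ollama serve also exposes a native REST API on port 11434. A minimal sketch of a /api/generate request follows, using only the Python standard library; the model name assumes the llama3.1:8b pull from the quickstart, and the final call is commented out because it needs a running server.

```python
import json
from urllib import request

# Ollama's native generate endpoint; `ollama serve` listens on
# port 11434 by default.
URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3.1:8b",   # assumes the quickstart's `ollama pull`
    "prompt": "Capital of India?",
    "stream": False,          # single JSON object instead of a token stream
}

req = request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once `ollama serve` is running locally:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Setting stream to False trades incremental tokens for a single, easy-to-parse response body, which is usually what scripts want.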
Frequently asked questions
Is Ollama production-ready?
Ollama is excellent for local development, laptops, desktops, and small internal deployments. For production multi-GPU serving at high QPS, reach for vLLM, TGI, or SGLang instead — those are built for server workloads and much higher throughput.
Does Ollama work with LangChain / LlamaIndex?
Yes. Both frameworks ship first-class Ollama integrations. Ollama also exposes an OpenAI-compatible /v1 endpoint, so anything that speaks the OpenAI API can talk to it with a base-URL change.
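That base-URL change is essentially the whole integration surface. The sketch below builds an OpenAI-style chat request against the local /v1 endpoint with only the standard library; the model name is a placeholder (any pulled model works), and the send is commented out because it requires a running server.

```python
import json
from urllib import request

# Point any OpenAI-style client at Ollama's local /v1 endpoint. No API
# key is needed for a local server, though some clients insist on a dummy one.
BASE_URL = "http://localhost:11434/v1"

payload = {
    "model": "llama3.1:8b",  # any locally pulled model
    "messages": [{"role": "user", "content": "Capital of India?"}],
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once `ollama serve` is running:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

With the official OpenAI client libraries, the same change is just passing base_url="http://localhost:11434/v1" at construction time.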
Sources
- Ollama — docs — accessed 2026-04-20
- Ollama — GitHub — accessed 2026-04-20