Capability · Framework — evals
Argilla
Argilla (now part of Hugging Face) is the open-source UI for building the high-quality datasets that modern LLM work depends on. It supports text classification, span annotation, SFT demonstrations, pairwise preference data for DPO/RLHF, and LLM eval feedback — all versioned in a Postgres backend and exportable as Hugging Face Datasets. Teams use it for both initial dataset creation and continuous production-feedback loops.
Framework facts
- Category
- evals
- Language
- Python / TypeScript
- License
- Apache-2.0
- Repository
- https://github.com/argilla-io/argilla
Install
pip install argilla
# or self-host via Docker
docker pull argilla/argilla-quickstart:latest
Quickstart
import argilla as rg
# Connect with the Argilla 2.x SDK client (rg.init is the legacy 1.x entry point)
client = rg.Argilla(api_url='http://localhost:6900', api_key='argilla.apikey')
settings = rg.Settings(
    fields=[rg.TextField(name='prompt')],
    questions=[rg.RatingQuestion(name='quality', values=[1, 2, 3, 4, 5])],
)
dataset = rg.Dataset(name='dpo-pairs', settings=settings, client=client)
dataset.create()
Alternatives
- Label Studio — broader task types
- Prodigy — scriptable Python
- Scale AI — managed annotation
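Once a dataset like the quickstart's `dpo-pairs` exists, records are logged as dicts keyed by field name. A minimal pure-Python sketch of shaping pairwise preference data into such records (the `chosen` and `rejected` field names are illustrative assumptions, not an Argilla-mandated schema):

```python
# Shape raw preference pairs into record dicts keyed by field name.
# The keys below are illustrative; they must match whatever fields
# the dataset's Settings actually declare.
def to_records(pairs):
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": rejected}
        for prompt, chosen, rejected in pairs
    ]

pairs = [
    ("Summarize DPO in one line.",
     "DPO fine-tunes directly on preference pairs without a separate reward model.",
     "It is a database thing."),
]
records = to_records(pairs)
print(records[0]["prompt"])  # → Summarize DPO in one line.
```

Dicts shaped this way can then be passed to the SDK's record-logging call for the dataset.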
Frequently asked questions
Argilla or Label Studio?
Label Studio covers more modalities (audio, images, video). Argilla is purpose-built for NLP / LLM workflows and has native support for pairwise preference tasks, LLM-as-judge, and `datasets` export — usually the better fit for LLM teams.
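Because exported rows are ultimately plain dicts, post-annotation filtering is straightforward. A hedged sketch of keeping only records whose quality rating clears a threshold (the flattened record shape is an assumption for illustration; a real Argilla export nests responses per question and goes through the SDK):

```python
# Keep only records whose "quality" rating meets a minimum threshold.
# The flat {"prompt": ..., "quality": ...} shape is illustrative, not
# the exact structure the SDK returns.
def keep_high_quality(records, min_rating=4):
    return [r for r in records if r.get("quality", 0) >= min_rating]

annotated = [
    {"prompt": "Explain DPO.", "quality": 5},
    {"prompt": "What is RLHF?", "quality": 2},
]
kept = keep_high_quality(annotated)
print(len(kept))  # → 1
```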
Does Argilla require Hugging Face?
No — you can self-host with Docker and use it fully offline. HF Spaces offers a one-click template if you want a hosted instance.
Sources
- Argilla docs — accessed 2026-04-20
- Argilla GitHub — accessed 2026-04-20