Argilla

Argilla (now part of Hugging Face) is an open-source UI for building the high-quality datasets that modern LLM work depends on. It supports text classification, span annotation, SFT demonstrations, pairwise preference data for DPO/RLHF, and LLM eval feedback; records live in the server's relational database (SQLite or PostgreSQL, with Elasticsearch for search) and export cleanly as Hugging Face Datasets. Teams use it both for initial dataset creation and for continuous production-feedback loops.

Framework facts

Category
evals
Language
Python / TypeScript
License
Apache-2.0
Repository
https://github.com/argilla-io/argilla

Install

pip install argilla
# or self-host via Docker
docker pull argilla/argilla-quickstart:latest
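
A minimal way to start the pulled quickstart image (detached, UI on localhost:6900, which is the server's default port; the container name here is arbitrary):

docker run -d --name argilla -p 6900:6900 argilla/argilla-quickstart:latest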

Quickstart

import argilla as rg

# Argilla 2.x client (replaces the deprecated 1.x rg.init() call)
client = rg.Argilla(api_url="http://localhost:6900", api_key="argilla.apikey")

settings = rg.Settings(
    fields=[rg.TextField(name="prompt")],
    questions=[rg.RatingQuestion(name="quality", values=[1, 2, 3, 4, 5])],
)
dataset = rg.Dataset(name="dpo-pairs", settings=settings, client=client)
dataset.create()
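
Once the dataset exists, records can be logged to it. A minimal sketch, assuming the 2.x behavior where dict keys matching a field name ("prompt") map to fields and keys matching a question name ("quality") become pre-filled suggestions; the commented call assumes a running server and the `dataset` created above:

# Records as plain dicts: "prompt" maps to the TextField, "quality"
# becomes a suggested answer for annotators to review in the UI.
records = [
    {"prompt": "Explain DPO in one sentence.", "quality": 4},
    {"prompt": "Summarize RLHF for a product manager.", "quality": 5},
]

# With a live Argilla server, these would be sent with:
# dataset.records.log(records)

print(len(records))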

Alternatives

  • Label Studio — broader task types
  • Prodigy — scriptable Python
  • Scale AI — managed annotation

Frequently asked questions

Argilla or Label Studio?

Label Studio covers more modalities (audio, images, video). Argilla is purpose-built for NLP / LLM workflows and has native support for pairwise preference tasks, LLM-as-judge, and `datasets` export — usually the better fit for LLM teams.
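
The pairwise-preference workflow mentioned above can be sketched in plain Python before any Argilla calls: turn (prompt, chosen, rejected) triples into annotation records, optionally pre-filled with an LLM judge's pick as a suggestion. The field and question names below are hypothetical and would have to match the Settings your dataset was created with:

def to_preference_records(triples):
    """Turn (prompt, chosen, rejected) triples into annotation records.

    Field names ("prompt", "response_a", "response_b") and the question
    name ("preferred") are illustrative, not an Argilla convention.
    """
    records = []
    for prompt, chosen, rejected in triples:
        records.append({
            "prompt": prompt,
            "response_a": chosen,
            "response_b": rejected,
            # Pre-fill a suggestion (e.g. from an LLM judge) for
            # annotators to confirm or overturn in the UI.
            "preferred": "response_a",
        })
    return records

pairs = [("What is DPO?", "A preference-tuning method...", "I don't know.")]
recs = to_preference_records(pairs)
print(recs[0]["preferred"])  # → response_a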

Does Argilla require Hugging Face?

No — you can self-host with Docker and use it fully offline. HF Spaces offers a one-click template if you want a hosted instance.

Sources

  1. Argilla docs — accessed 2026-04-20
  2. Argilla GitHub — accessed 2026-04-20