Open-Source Annotation Tools Compared
An honest comparison of data annotation tools (Potato, Label Studio, Prodigy, Doccano, brat, and Argilla) and how to choose between them. Most are open-source; Prodigy is a paid commercial product, included because it is a common alternative.
There is no single best annotation tool; the right choice depends on your modalities, your budget, whether you need agent/LLM evaluation, and how much setup you can tolerate. This guide compares the main options fairly so you can match one to your project.
The options at a glance
| Tool | License | Strengths | Best when |
|---|---|---|---|
| Potato | Free, open-source (research) | 30+ task types across text/image/audio/video, agent & LLM evaluation, zero-code YAML, built-in agreement metrics | Research, agent/LLM eval, fast setup without code |
| Label Studio | Open-source + paid tiers | Broad modality support, polished UI, large ecosystem | Teams wanting a commercial-backed platform |
| Prodigy | Paid (commercial) | Scriptable, active-learning-first, tight spaCy integration | spaCy users comfortable with a paid, code-driven tool |
| Doccano | Open-source | Simple, clean, easy to self-host | Straightforward text classification and NER |
| brat | Open-source | Mature; rich entity and relation annotation for text | Linguistic annotation of entities and relations |
| Argilla | Open-source | LLM-data focus, Hugging Face integration | Feedback/RLHF data collection in the HF stack |
(Details change over time; check each project for current licensing and features.)
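To make "agreement metrics" in the table concrete, here is a minimal sketch of one common metric, Cohen's kappa, for two annotators who labeled the same items. It is a generic illustration in plain Python, not the implementation used by Potato or any other tool listed here; the function name `cohen_kappa` and the sample labels are made up for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (illustrative)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators on six items.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

The two annotators agree on four of six items, but half of that agreement would be expected by chance, so kappa lands well below the raw agreement rate. A tool with agreement metrics built in saves you from exporting the labels and running this kind of calculation yourself.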
How to choose
- What are you annotating? For text-only NER, Doccano or brat are simple. For mixed text/image/audio/video, Potato and Label Studio cover the range.
- Do you need agent or LLM evaluation? This is where Potato is unusual: it reads agent traces in many formats and has purpose-built tools for trajectory, process reward, web-agent, and coding-agent evaluation. Most general tools don't.
- Budget. Potato, Label Studio (core), Doccano, brat, and Argilla are free and open-source; Prodigy and some Label Studio tiers are paid.
- Setup effort. Potato is configured with a YAML file and needs no code; Prodigy is code-first; the others sit in between.
- Ecosystem. Prodigy pairs with spaCy; Argilla with Hugging Face; Potato exports to many ML formats including CoNLL, spaCy, Hugging Face, and COCO/YOLO.
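The criteria above can be sketched as a small shortlist function. This is a deliberate simplification of the guidance in this article, not an official decision procedure; the `Project` fields and the `shortlist` function are hypothetical names invented for the example, and the text-only branch extends the "text-only NER" advice to text tasks in general.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical description of an annotation project."""
    modalities: set[str]            # e.g. {"text"} or {"text", "image"}
    needs_agent_eval: bool = False  # agent or LLM evaluation required?
    can_pay: bool = False           # is a paid license acceptable?
    uses_spacy: bool = False
    uses_hugging_face: bool = False


def shortlist(p: Project) -> list[str]:
    """Return candidate tools in rough order of fit, following the list above."""
    if p.needs_agent_eval:
        # Purpose-built agent/LLM evaluation is the unusual requirement.
        return ["Potato"]
    if p.modalities - {"text"}:
        # Anything beyond text: the two tools that cover the range.
        return ["Potato", "Label Studio"]
    # Text-only work: start from the simple options.
    picks = ["Doccano", "brat"]
    if p.uses_spacy and p.can_pay:
        picks.insert(0, "Prodigy")
    if p.uses_hugging_face:
        picks.insert(0, "Argilla")
    return picks


print(shortlist(Project(modalities={"text", "audio"})))
print(shortlist(Project(modalities={"text"}, uses_spacy=True, can_pay=True)))
```

Treat the result as a starting list to evaluate, since setup effort and team preference still matter once the hard requirements are met.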
Where Potato fits
Potato came out of academic NLP (it was presented at EMNLP 2022 and HCOMP 2024) and is built for the full research workflow: many task types, quality control and agreement metrics in the box, crowdsourcing integrations, and, more recently, a deep set of AI-agent evaluation tools. If your work spans several modalities or includes evaluating LLMs and agents, it's worth a look.
If you mainly need a single text task with a hosted commercial product, or you live entirely inside spaCy or Hugging Face, one of the others may suit you better. Pick the tool that fits the work.