How Many Annotators Do You Need?

How to decide annotator count and overlap for an annotation project, balancing agreement, cost, and statistical confidence, with Potato overlap settings.

There is no single right number, but the decision comes down to three levers: how many people label each item (overlap), how clear the task is, and your budget. Clear tasks need little overlap; subjective tasks need more. This guide gives rules of thumb and the settings to implement them.

Overlap vs. coverage

Every annotation budget is split between two goals:

  • Coverage: labeling more distinct items (each once).
  • Overlap: labeling the same items multiple times, which buys you agreement estimates and the ability to aggregate.

You cannot maximize both. A common pattern: fully overlap a subset to measure agreement, then single-annotate the rest once you trust the task.
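
The trade-off is simple arithmetic, and a small sketch makes it concrete. The function name `split_budget` and the numbers below are hypothetical, chosen only to illustrate the pattern of overlapping a subset and single-annotating the rest.

```python
def split_budget(total_labels: int, overlap_items: int, overlap: int) -> dict:
    """Split a fixed label budget between an overlapped subset and single-annotated items.

    total_labels:  total number of individual labels you can afford
    overlap_items: number of items that will be labeled multiple times
    overlap:       number of annotators per overlapped item
    """
    overlap_cost = overlap_items * overlap
    if overlap_cost > total_labels:
        raise ValueError("the overlapped subset alone exceeds the budget")
    # Every remaining label buys exactly one new, single-annotated item.
    single_items = total_labels - overlap_cost
    return {
        "overlap_labels": overlap_cost,
        "single_items": single_items,
        "distinct_items": overlap_items + single_items,
    }


# Hypothetical budget: 10,000 labels, 500 items triple-annotated.
plan = split_budget(total_labels=10_000, overlap_items=500, overlap=3)
print(plan)
# {'overlap_labels': 1500, 'single_items': 8500, 'distinct_items': 9000}
```

In this example, triple-annotating 500 items costs 1,000 items of coverage compared with labeling everything once.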

Rules of thumb

  • Objective tasks (clear categories, high agreement): 1 annotator for most items, with 2–3× overlap on a 5–10% sample to monitor quality.
  • Moderately subjective tasks: 3 annotators per item, resolved by majority vote or MACE (Multi-Annotator Competence Estimation).
  • Highly subjective tasks (offense, emotion, preference): 5+ annotators per item, and consider keeping the full label distribution rather than collapsing it.
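
To illustrate the difference between collapsing labels and keeping the distribution, here is a minimal sketch in plain Python. It is not Potato code; the helper names and the example votes are made up for illustration.

```python
from collections import Counter


def majority_label(labels):
    """Collapse one item's labels to the single most frequent label.

    Ties are broken by first-seen order here; a real pipeline
    should handle ties explicitly.
    """
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Keep the full distribution: the fraction of annotators per label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


# Hypothetical votes from 5 annotators on one item.
votes = ["offensive", "not_offensive", "offensive", "offensive", "not_offensive"]
maj = majority_label(votes)
dist = label_distribution(votes)
print(maj)   # offensive
print(dist)  # {'offensive': 0.6, 'not_offensive': 0.4}
```

The majority label hides the fact that two of five annotators disagreed; the distribution keeps that signal available for downstream use.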

More annotators reduce the variance of an item's aggregate label, but with diminishing returns: going from 1 to 3 annotators helps far more than going from 7 to 9.
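
One way to see the diminishing returns is to compute how often a majority vote lands on the correct label. The sketch below rests on simplifying assumptions that are not from this guide: a binary task, annotators who err independently, and a shared per-annotator accuracy `p` (0.7 here, an arbitrary value).

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent annotators,
    each correct with probability p, picks the right binary label.
    Use odd n so that ties cannot occur."""
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


p = 0.7  # assumed per-annotator accuracy
for n in (1, 3, 5, 7, 9):
    print(n, round(majority_accuracy(n, p), 3))

gain_1_to_3 = majority_accuracy(3, p) - majority_accuracy(1, p)
gain_7_to_9 = majority_accuracy(9, p) - majority_accuracy(7, p)
print(gain_1_to_3 > gain_7_to_9)  # True
```

Under these assumptions, the first two extra annotators buy roughly three times the improvement of the last two. Real annotators are rarely independent, so treat this as an upper bound on what overlap delivers.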

Setting overlap in Potato

Potato's task assignment controls how many annotators see each item and how items are distributed.

```yaml
automatic_assignment:
  on: true
  instance_per_annotator: 50     # how many items each person labels
  labels_per_instance: 3         # how many annotators label each item (overlap)
```
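
These two settings together imply a minimum headcount. The sketch below is plain arithmetic rather than anything Potato computes for you; it assumes every annotator completes their full quota, and the dataset size of 1,000 items is hypothetical.

```python
import math


def annotators_needed(num_items: int, labels_per_instance: int,
                      instance_per_annotator: int) -> int:
    """Minimum number of annotators needed to reach the target overlap,
    assuming each one labels exactly instance_per_annotator items."""
    total_labels = num_items * labels_per_instance
    return math.ceil(total_labels / instance_per_annotator)


# Hypothetical dataset of 1,000 items with the settings shown above.
needed = annotators_needed(num_items=1000, labels_per_instance=3,
                           instance_per_annotator=50)
print(needed)  # 60
```

In practice, recruit more than this minimum, since some annotators drop out or are excluded by quality checks.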

Don't forget quality checks

Headcount doesn't help if some annotators are unreliable. Pair overlap with gold standards and attention checks so you can weight or exclude low-quality work before aggregating.
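
As a rough illustration of weighting and excluding annotators before aggregation, the sketch below scores each annotator on gold items and then runs an accuracy-weighted vote. The data layout, the function names, and the 0.6 cutoff are all assumptions made for this example; they do not describe Potato's output format or built-in behavior.

```python
from collections import defaultdict


def gold_accuracy(annotations, gold):
    """annotations: list of (annotator, item, label); gold: {item: true_label}.
    Returns each annotator's accuracy on the gold items they labeled."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for annotator, item, label in annotations:
        if item in gold:
            seen[annotator] += 1
            correct[annotator] += int(label == gold[item])
    return {a: correct[a] / seen[a] for a in seen}


def weighted_vote(annotations, weights, min_weight=0.6):
    """Accuracy-weighted vote per item. Annotators below min_weight,
    or with no gold score at all, are excluded."""
    scores = defaultdict(lambda: defaultdict(float))
    for annotator, item, label in annotations:
        w = weights.get(annotator, 0.0)
        if w >= min_weight:
            scores[item][label] += w
    return {item: max(labels, key=labels.get) for item, labels in scores.items()}


# Hypothetical data: g1 and g2 are gold items, x1 is a regular item.
annotations = [
    ("ann_a", "g1", "pos"), ("ann_a", "g2", "neg"), ("ann_a", "x1", "pos"),
    ("ann_b", "g1", "pos"), ("ann_b", "g2", "neg"), ("ann_b", "x1", "pos"),
    ("ann_c", "g1", "neg"), ("ann_c", "g2", "pos"), ("ann_c", "x1", "neg"),
]
gold = {"g1": "pos", "g2": "neg"}

acc = gold_accuracy(annotations, gold)
result = weighted_vote(annotations, acc)
print(acc)           # {'ann_a': 1.0, 'ann_b': 1.0, 'ann_c': 0.0}
print(result["x1"])  # pos
```

Here `ann_c` fails both gold items and is dropped, so their vote on `x1` does not count toward the aggregate.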

Further reading