Adjudication and Resolving Disagreement

What to do when annotators disagree: adjudication workflows, aggregation by majority vote, and statistical models like MACE that weight annotators by competence.

Disagreement is normal and informative. Resolving it means turning several annotators' labels into one trusted label, by expert review, by aggregation, or by a statistical model that weights annotators by how reliable they are. Forcing a single answer too early throws away signal about which items are genuinely hard.

Three ways to resolve

  1. Majority vote. Simple and transparent: take the most common label. Works well when annotators are roughly equal and the task is clear, but it treats a careless annotator the same as a careful one.

  2. Expert adjudication. Route disagreed items to an expert who makes the final call. Most accurate, most expensive. Use it for high-stakes items and for items where aggregation is unreliable.

  3. Statistical aggregation. Models like MACE (Multi-Annotator Competence Estimation) infer each annotator's reliability from their pattern of agreement and produce a weighted "best guess" label plus a competence score per annotator. This down-weights spammers automatically without hand-checking every item. The underlying idea is a latent-variable model for crowdsourced labels: the true label of each item is treated as unobserved and is inferred jointly with each annotator's reliability.
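
To make the difference between options 1 and 3 concrete, here is a minimal Python sketch. It is not MACE: it swaps in a much simpler iterative scheme in which each annotator's weight is their smoothed rate of agreement with the current consensus. The function names, the annotator IDs, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most common label. Ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_aggregate(annotations, n_iter=10):
    """Simplified reliability-weighted vote (NOT MACE; illustration only).

    annotations: dict mapping item_id -> {annotator_id: label}.
    Returns (consensus, weights).
    """
    annotators = {a for votes in annotations.values() for a in votes}
    weights = {a: 1.0 for a in annotators}  # start with equal trust
    consensus = {}
    for _ in range(n_iter):
        # Step 1: weighted vote per item using the current weights.
        for item, votes in annotations.items():
            scores = defaultdict(float)
            for annotator, label in votes.items():
                scores[label] += weights[annotator]
            consensus[item] = max(scores, key=scores.get)
        # Step 2: re-estimate each weight as a smoothed agreement rate.
        agree, total = defaultdict(int), defaultdict(int)
        for item, votes in annotations.items():
            for annotator, label in votes.items():
                total[annotator] += 1
                agree[annotator] += int(label == consensus[item])
        weights = {a: (agree[a] + 1) / (total[a] + 2) for a in annotators}
    return consensus, weights


# Hypothetical toy data: "spam" disagrees with the others on most items.
annotations = {
    "i1": {"ann_a": "pos", "ann_b": "pos", "spam": "neg"},
    "i2": {"ann_a": "neg", "ann_b": "neg", "spam": "pos"},
    "i3": {"ann_a": "pos", "ann_b": "pos", "spam": "neg"},
    "i4": {"ann_a": "neg", "ann_b": "pos", "spam": "neg"},
    "i5": {"spam": "neg", "ann_a": "pos"},  # a 1-vs-1 tie
}
consensus, weights = weighted_aggregate(annotations)
tie_by_majority = majority_vote(list(annotations["i5"].values()))
```

On this toy data, plain majority vote can only break the tie on item `i5` arbitrarily, while the weighted version sides with the annotator who agreed with the consensus more often elsewhere. A real MACE run fits a probabilistic model rather than this heuristic, so treat the sketch as intuition only.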

A practical workflow

  • Collect overlapping annotations (several people per item).
  • Aggregate with majority vote or MACE to get a draft label and flag low-agreement items.
  • Send only the flagged items to expert adjudication.
  • Feed what you learn back into the guidelines.
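
The aggregate-and-flag steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Potato's implementation: the function name `draft_and_flag`, the `min_agreement` threshold of 0.75, and the example items are all hypothetical.

```python
from collections import Counter


def draft_and_flag(annotations, min_agreement=0.75):
    """Produce a draft label per item and flag low-agreement items.

    annotations: dict mapping item_id -> list of labels from different annotators.
    min_agreement: assumed threshold on the share of annotators who chose
                   the winning label; items below it go to an expert.
    """
    drafts, flagged = {}, []
    for item_id, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        drafts[item_id] = label  # majority-vote draft label
        if count / len(labels) < min_agreement:
            flagged.append(item_id)  # route to expert adjudication
    return drafts, flagged


# Hypothetical overlapping annotations: three annotators per item.
annotations = {
    "doc-1": ["spam", "spam", "spam"],
    "doc-2": ["spam", "ham", "ham"],
    "doc-3": ["ham", "spam", "other"],
}
drafts, flagged = draft_and_flag(annotations)
```

With three annotators per item, a 0.75 threshold flags every item that is not unanimous. Lowering the threshold sends fewer items to the expert, at the cost of accepting more contested draft labels.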

Potato supports an adjudication workflow where a reviewer sees all annotators' labels side by side and records the resolved answer.

When disagreement is the data

For subjective tasks such as humor, offense, or emotion, persistent disagreement can reflect real differences between people, not error. In those cases, consider keeping the full distribution of labels (sometimes called soft labels or perspectivist annotation) instead of collapsing to one answer. Potato supports capturing distributions rather than forcing consensus.
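
As a rough illustration of what keeping the distribution means, the Python sketch below turns raw votes into a soft label and uses entropy as one possible measure of how split the annotators are. It does not use Potato's API; the function names, the label set, and the votes are hypothetical.

```python
import math
from collections import Counter


def soft_label(labels, label_set):
    """Return the share of annotators who chose each label in label_set."""
    if not labels:
        raise ValueError("need at least one annotation")
    counts = Counter(labels)
    return {label: counts[label] / len(labels) for label in label_set}


def entropy_bits(distribution):
    """Shannon entropy in bits: 0 means full agreement, higher means more split."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Hypothetical votes from four annotators on one joke.
votes = ["funny", "funny", "not_funny", "funny"]
dist = soft_label(votes, ["funny", "not_funny"])
disagreement = entropy_bits(dist)
```

Here `dist` keeps the 3-to-1 split instead of reducing the item to "funny", so a downstream model or analysis can still see that one annotator in four disagreed.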

Further reading