Gold Standards and Attention Checks
How to use gold-standard items and attention checks to catch low-quality annotators and keep a project calibrated, with Potato configuration.
Gold standards and attention checks are items where you already know the correct answer. Mixing them into the stream lets you measure each annotator's accuracy and catch people who are rushing, confused, or gaming the task. They are the front line of annotation quality control, especially in crowdsourcing.
Gold standards
A gold-standard item has an expert-verified answer. Sprinkle them in and compare each annotator's response against the known answer to get a per-person accuracy score. Gold items can be silent (used only for scoring) or give immediate feedback (used for training).
gold_standards:
  enabled: true
  items_file: "gold_standards.json"
  mode: mixed      # silent scoring + occasional feedback
  frequency: 20    # roughly one gold item per 20
Build your gold set from the unambiguous cases your guidelines settled. Don't use genuinely ambiguous items as gold: you'll punish good annotators for reasonable choices.
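The per-annotator accuracy score described above is a simple comparison against the known answers. A minimal sketch follows, assuming responses have been exported as (annotator, item, label) tuples. The function name and data layout are hypothetical illustrations, not part of Potato's API.

```python
from collections import defaultdict


def gold_accuracy(responses, gold_answers):
    """Return {annotator_id: fraction of gold items answered correctly}.

    responses: iterable of (annotator_id, item_id, label) tuples (assumed layout).
    gold_answers: dict mapping gold item_id -> expert-verified label.
    Responses to items that are not in gold_answers are ignored.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for annotator_id, item_id, label in responses:
        if item_id not in gold_answers:
            continue  # regular item, not used for scoring
        seen[annotator_id] += 1
        if label == gold_answers[item_id]:
            correct[annotator_id] += 1
    return {annotator: correct[annotator] / seen[annotator] for annotator in seen}


gold = {"g1": "positive", "g2": "negative"}
responses = [
    ("ann_a", "g1", "positive"),
    ("ann_a", "g2", "negative"),
    ("ann_a", "x7", "neutral"),  # not a gold item
    ("ann_b", "g1", "negative"),
    ("ann_b", "g2", "negative"),
]
scores = gold_accuracy(responses, gold)
print(scores)  # {'ann_a': 1.0, 'ann_b': 0.5}
```

Annotators who have not yet seen any gold item are absent from the result, which avoids assigning them a misleading score of zero.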
Attention checks
An attention check is an item with an obvious, instruction-embedded answer ("Select 'Disagree' for this item"). It catches annotators who aren't reading. Potato can also flag suspicious timing: answers submitted faster than a human could read.
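To make the timing idea concrete, here is a rough sketch of one way such a flag could work. It is not Potato's implementation: the function name, the word-count heuristic, and the reading-speed bound of 8 words per second are all assumptions chosen for illustration.

```python
def too_fast(item_text, seconds_spent, max_words_per_second=8.0):
    """Return True if the answer arrived faster than the text could plausibly be read.

    max_words_per_second is an assumed upper bound on reading speed; tune it per task.
    """
    word_count = len(item_text.split())
    min_plausible_seconds = word_count / max_words_per_second
    return seconds_spent < min_plausible_seconds


item_text = "word " * 40  # stand-in for a 40-word item

print(too_fast(item_text, seconds_spent=2.0))   # True: 40 words needs at least 5 s here
print(too_fast(item_text, seconds_spent=12.0))  # False
```

A single fast answer proves little. The signal is more useful as a rate across many items for the same annotator.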
attention_checks:
  enabled: true
  items_file: "attention_checks.json"
  frequency: 10
Using the signal
- Set a passing accuracy threshold. Annotators below it can be retrained or excluded.
- Combine with a training phase. Require a passing score on gold items before live work begins.
- Don't over-check. Too many checks annoy good annotators and inflate cost. A small, steady rate is enough.
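The threshold rule from the list above can be sketched as follows. The threshold of 0.8 and the function names are assumptions for illustration; the right cut-off depends on the task and on how hard the gold items are.

```python
PASS_THRESHOLD = 0.8  # assumed value; pick one that suits the task


def passes(accuracy, threshold=PASS_THRESHOLD):
    """Return True if a gold-item accuracy meets the passing threshold."""
    return accuracy >= threshold


def split_by_threshold(accuracy_by_annotator, threshold=PASS_THRESHOLD):
    """Split annotators into (passing, failing) lists, each sorted by annotator id."""
    passing, failing = [], []
    for annotator, accuracy in sorted(accuracy_by_annotator.items()):
        (passing if passes(accuracy, threshold) else failing).append(annotator)
    return passing, failing


scores = {"ann_a": 0.95, "ann_b": 0.60, "ann_c": 0.80}
passing, failing = split_by_threshold(scores)
print(passing)  # ['ann_a', 'ann_c']
print(failing)  # ['ann_b'] -> candidates for retraining or exclusion
```

The same `passes` check can gate a training phase: compute accuracy on the training gold items and admit the annotator to live work only when it returns True.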
To estimate annotator competence and infer labels statistically from disagreement, see "Adjudication and Disagreement" and Potato's MACE support.