Heterogeneous Annotator Coverage
Assign different numbers of annotators to different items. Configure a default cap, a stratified overlap sample for quality monitoring, adaptive disagreement boosts, per-annotator quotas, and automatic adjudication routing.
Heterogeneous coverage lets you assign different numbers of annotators to different items instead of a uniform cap. The common research design is one annotator on most items, with two or three overlapping on a 5–10% sample to monitor quality. Potato expresses that through the num_annotators_per_item and per_annotator_quota config blocks.
Per-item annotator caps
num_annotators_per_item is the canonical key. It accepts a single integer for a uniform cap, or a structured mapping with a default, an overlap sample, and an optional adaptive boost:

    num_annotators_per_item:
      default: 1
      overlap_sample:
        fraction: 0.1
        count: 3
        stratify_by: domain
        seed: 42
      adaptive:
        enabled: true
        disagreement_threshold: 0.5
        boost_to: 3
      min: 1

max_annotations_per_item is now a deprecated alias for num_annotators_per_item: <int>.
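
To make the two accepted shapes and the deprecated alias concrete, here is a minimal sketch of how a loader might normalize them into one mapping. It is not Potato's actual loader: the function name `normalize_caps`, the returned layout, and the error handling are assumptions made for illustration.

```python
def normalize_caps(config):
    """Return per-item cap settings as a mapping that always has a 'default' key.

    Hypothetical helper for illustration; not part of Potato's API.
    """
    raw = config.get("num_annotators_per_item")
    if raw is None:
        # Fall back to the deprecated alias, which only carries an integer.
        raw = config.get("max_annotations_per_item")
    if raw is None:
        # Assumption: a missing cap is treated as an error in this sketch.
        raise KeyError("no per-item annotator cap configured")
    if isinstance(raw, bool) or not isinstance(raw, (int, dict)):
        raise TypeError("cap must be an integer or a mapping")
    if isinstance(raw, int):
        # Uniform cap: every item gets the same number of annotators.
        return {"default": raw}
    if "default" not in raw:
        raise ValueError("mapping form needs a 'default' cap")
    return dict(raw)
```

With this shape, downstream code can read the default cap from one place regardless of which form the config file used.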
Overlap sample
The overlap_sample block raises the cap on a deterministic subset of items for quality monitoring. Sampling happens once at startup, and the chosen items are stamped with required_annotations so the assignment logic treats them as high-coverage.
| Field | Type | Description |
|---|---|---|
| fraction | float in (0, 1] | proportion of items to sample |
| count | int ≥ 2 | annotator cap for sampled items (must exceed default) |
| stratify_by | string (optional) | item-data field used to stratify the sample |
| seed | int (optional) | RNG seed; defaults to the global random_seed |
When stratify_by is set, the fraction is applied per stratum, so every category contributes proportionally.
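
The sampling step can be pictured with a short sketch. This is an illustration of seeded, per-stratum sampling in general, not Potato's implementation: the function name `pick_overlap_items`, the item layout (dicts with an `id` key), and the round-up rule for each stratum are all assumptions.

```python
import math
import random
from collections import defaultdict


def pick_overlap_items(items, fraction, stratify_by=None, seed=0):
    """Return the ids of the items chosen for the overlap sample.

    `items` is a list of dicts with an 'id' key (illustrative layout).
    """
    rng = random.Random(seed)  # private RNG so the draw is reproducible
    strata = defaultdict(list)
    for item in items:
        key = item.get(stratify_by) if stratify_by else None
        strata[key].append(item["id"])

    chosen = set()
    for key in sorted(strata, key=str):  # fixed stratum order for determinism
        ids = sorted(strata[key])
        # Assumed rounding rule: round up, after trimming float noise.
        k = min(len(ids), math.ceil(round(fraction * len(ids), 9)))
        chosen.update(rng.sample(ids, k))
    return chosen
```

Because the fraction is applied inside each stratum, a small category still contributes at least one item under this rounding rule, and rerunning with the same seed yields the same set.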
Adaptive boost
Adaptive boost expands the cap on an item whose early annotators disagreed. Once an item has at least two annotations and its disagreement score crosses disagreement_threshold, its cap is raised to boost_to and the item re-enters the assignment queue. The boost is one-shot per item.
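
The one-shot behavior can be sketched as follows. The disagreement score used here (the share of annotations that differ from the most common label) is an assumption, as are the names `disagreement` and `maybe_boost`, the item layout, and the choice to treat a score equal to the threshold as crossing it; Potato's own score may be defined differently.

```python
from collections import Counter


def disagreement(labels):
    """Share of annotations that differ from the most common label (assumed metric)."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 1 - most_common_count / len(labels)


def maybe_boost(item, threshold, boost_to):
    """Raise item['cap'] at most once; return True if the cap was boosted."""
    labels = item["labels"]
    if item.get("boosted") or len(labels) < 2:
        return False  # already boosted, or too few annotations to judge
    if disagreement(labels) < threshold:
        return False  # assumption: a score equal to the threshold counts as crossing it
    item["cap"] = max(item["cap"], boost_to)
    item["boosted"] = True  # one-shot: later calls leave the cap alone
    return True
```

Under this metric, two annotators who disagree produce a score of 0.5, which is enough to trigger a boost at the threshold shown in the config example.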
Per-annotator quota
per_annotator_quota controls how many items each annotator is assigned, independent of per-item caps:

    per_annotator_quota:
      default: 100
      by_user:
        alice: 30
      by_user_role:
        expert: 30
        novice: 200
      user_roles:
        alice: expert
        carol: novice

Resolution order: by_user[uid] → by_user_role[user_roles[uid]] → default.
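
The resolution order reads naturally as a chain of lookups. The sketch below is illustrative only: `resolve_quota` is a hypothetical name, and the role mapping is passed in explicitly so the example does not depend on where user_roles sits in the config file.

```python
def resolve_quota(uid, quota_cfg, user_roles):
    """Return the item quota for one annotator (hypothetical helper)."""
    by_user = quota_cfg.get("by_user", {})
    if uid in by_user:
        return by_user[uid]  # 1. explicit per-user override
    role = user_roles.get(uid)
    by_role = quota_cfg.get("by_user_role", {})
    if role in by_role:
        return by_role[role]  # 2. quota attached to the user's role
    return quota_cfg["default"]  # 3. global fallback


quota_cfg = {
    "default": 100,
    "by_user": {"alice": 30},
    "by_user_role": {"expert": 30, "novice": 200},
}
user_roles = {"alice": "expert", "carol": "novice"}
```

With the values from the config above, alice resolves through by_user, carol through her novice role, and any user with no entry falls back to the default.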
Adjudication auto-routing
When the adjudication block is enabled, overlap-sample items that reach their cap are scored automatically and pushed into the adjudication queue if agreement falls below agreement_threshold. Low-quality items surface as soon as the sample saturates, rather than when an adjudicator manually rebuilds the queue.
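
As a rough picture of the routing rule, the sketch below scores a saturated item and decides whether it belongs in the adjudication queue. The agreement score (fraction of annotator pairs that chose the same label), the function names, and the way min_annotations is combined with the item cap are assumptions, not Potato's documented behavior.

```python
from itertools import combinations


def pairwise_agreement(labels):
    """Fraction of annotator pairs that chose the same label (assumed score)."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def should_adjudicate(labels, cap, min_annotations=2, agreement_threshold=0.75):
    """Return True if a saturated item should enter the adjudication queue."""
    if len(labels) < 2:
        return False  # agreement is undefined for a single annotation
    if len(labels) < max(cap, min_annotations):
        return False  # the item has not reached its cap yet
    return pairwise_agreement(labels) < agreement_threshold
```

Checking at the moment an item saturates is what lets low-agreement items surface without a manual queue rebuild.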

    adjudication:
      enabled: true
      adjudicator_users: [admin]
      min_annotations: 2
      agreement_threshold: 0.75

Inspecting agreement
Once overlap-sample items saturate, agreement statistics are available at /admin/iaa, which computes the metric set appropriate to each schema's annotation_type — for example Cohen's and Fleiss' kappa for nominal schemes, weighted kappa for ordinal ones, and token-level kappa plus span F1 for spans. See the inter-annotator agreement guide for what these metrics mean.
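
For readers who want to see what one of these metrics computes, here is a small standalone implementation of Cohen's kappa for two raters on nominal labels. It is a textbook sketch for illustration and is not taken from Potato's code; the function name `cohens_kappa` is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be much lower than the percentage of matching labels when one label dominates.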
Example
A runnable demonstration lives at examples/advanced/heterogeneous-coverage/. From the repository root:

    python potato/flask_server.py start examples/advanced/heterogeneous-coverage/config.yaml -p 8000

It uses 20 items across two domains, samples 20% for 3-annotator overlap stratified by domain, enables an adaptive boost at threshold 0.5, defines two expertise tiers, and routes low-agreement items into adjudication.
Related
- Task Assignment — assignment strategies
- Inter-annotator agreement guide — the metrics behind /admin/iaa
- Crowdsourcing — MTurk and Prolific integration
For implementation details, see the source documentation.