Heterogeneous Annotator Coverage

Assign different numbers of annotators to different items. Configure a default cap, a stratified overlap sample for quality monitoring, adaptive disagreement boosts, per-annotator quotas, and automatic adjudication routing.

Heterogeneous coverage lets you assign different numbers of annotators to different items instead of applying a uniform cap. A common research design uses one annotator for most items, with two or three annotators overlapping on a 5–10% sample to monitor quality. Potato expresses this design through the num_annotators_per_item and per_annotator_quota config blocks.

Per-item annotator caps

num_annotators_per_item is the canonical key. It accepts a single integer for a uniform cap, or a structured mapping with a default, an overlap sample, and an optional adaptive boost:

```yaml
num_annotators_per_item:
  default: 1
  overlap_sample:
    fraction: 0.1
    count: 3
    stratify_by: domain
    seed: 42
  adaptive:
    enabled: true
    disagreement_threshold: 0.5
    boost_to: 3
  min: 1
```

max_annotations_per_item is now a deprecated alias for num_annotators_per_item: <int>.
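The key accepts two shapes, plus a deprecated alias. The Python sketch below shows how a loader might normalize all three into one structure. The names normalize_coverage and CoverageConfig are hypothetical and are not Potato's internals. The validation mirrors the constraints listed for overlap_sample below. The min key is passed through without interpretation.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class CoverageConfig:
    """Hypothetical normalized view of num_annotators_per_item."""
    default: int
    overlap: Optional[Mapping[str, Any]] = None
    adaptive: Optional[Mapping[str, Any]] = None
    minimum: Optional[int] = None  # the `min` key, passed through as-is


def normalize_coverage(config: Mapping[str, Any]) -> CoverageConfig:
    """Accept the integer form, the mapping form, or the deprecated alias."""
    raw = config.get("num_annotators_per_item")
    if raw is None:
        # Deprecated alias: equivalent to the integer form of the canonical key.
        raw = config.get("max_annotations_per_item")
    if raw is None:
        raise ValueError("no per-item annotator cap configured")
    if isinstance(raw, int):
        return CoverageConfig(default=raw)

    default = int(raw["default"])
    overlap = raw.get("overlap_sample")
    if overlap is not None:
        if not 0 < float(overlap["fraction"]) <= 1:
            raise ValueError("overlap_sample.fraction must be in (0, 1]")
        if int(overlap["count"]) < 2 or int(overlap["count"]) <= default:
            raise ValueError("overlap_sample.count must be >= 2 and exceed default")
    return CoverageConfig(
        default=default,
        overlap=overlap,
        adaptive=raw.get("adaptive"),
        minimum=raw.get("min"),
    )
```

If both keys are present, the sketch lets the canonical key win. This is an assumption; the page does not say how Potato resolves that case.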

Overlap sample

The overlap_sample block raises the cap on a deterministic subset of items for quality monitoring. Sampling happens once at startup, and the chosen items are stamped with required_annotations so the assignment logic treats them as high-coverage.

| Field | Type | Description |
|---|---|---|
| `fraction` | float in (0, 1] | Proportion of items to sample |
| `count` | int ≥ 2 | Annotator cap for sampled items (must exceed `default`) |
| `stratify_by` | string (optional) | Item-data field used to stratify the sample |
| `seed` | int (optional) | RNG seed; defaults to the global `random_seed` |

When stratify_by is set, the fraction is applied per stratum, so every category contributes proportionally.
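The deterministic, stratified sampling described above can be sketched in Python as follows. This illustrates the idea under stated assumptions and is not Potato's code:

- sample_overlap_items is a hypothetical name.
- Items are assumed to be dicts with an id field.
- The per-stratum rounding rule (Python's round, with at least one item per stratum) is a guess.

A private random.Random(seed) keeps the sample identical across restarts.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional


def sample_overlap_items(
    items: List[Dict[str, Any]],
    fraction: float,
    count: int,
    stratify_by: Optional[str] = None,
    seed: int = 0,
) -> Dict[str, int]:
    """Return {item_id: required_annotations} for the overlap sample (hypothetical)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # private RNG: same seed, same sample
    strata: Dict[Any, List[str]] = defaultdict(list)
    for item in items:
        key = item.get(stratify_by) if stratify_by else None
        strata[key].append(item["id"])

    chosen: Dict[str, int] = {}
    for key in sorted(strata, key=str):  # fixed stratum order for determinism
        ids = sorted(strata[key])
        # Apply the fraction per stratum; assumed rounding, at least one item each.
        k = max(1, round(fraction * len(ids)))
        for item_id in rng.sample(ids, k):
            chosen[item_id] = count  # stands in for the required_annotations stamp
    return chosen
```

The returned mapping plays the role of the required_annotations stamp. Assignment logic would give those items `count` annotators instead of `default`.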

Adaptive boost

Adaptive boost raises the cap on an item whose early annotators disagree. Once an item has at least two annotations and its disagreement score crosses disagreement_threshold, its cap is raised to boost_to and the item re-enters the assignment queue. Each item is boosted at most once.
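The one-shot boost can be sketched in Python as follows. The page does not define the disagreement score, so this sketch assumes one: the share of annotations that differ from the majority label. It also reads "crosses" as "reaches or exceeds". AdaptiveBooster and its methods are hypothetical names, and putting the item back in the assignment queue is left to the caller.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AdaptiveBooster:
    """Hypothetical one-shot cap boost driven by disagreement."""
    disagreement_threshold: float
    boost_to: int
    boosted: Set[str] = field(default_factory=set)

    @staticmethod
    def disagreement(labels: List[str]) -> float:
        # Assumed score: share of annotations that differ from the majority label.
        top = Counter(labels).most_common(1)[0][1]
        return 1 - top / len(labels)

    def maybe_boost(self, item_id: str, labels: List[str], caps: Dict[str, int]) -> bool:
        """Raise caps[item_id] to boost_to at most once; True if boosted now."""
        if item_id in self.boosted or len(labels) < 2:
            return False
        if self.disagreement(labels) < self.disagreement_threshold:
            return False
        caps[item_id] = max(caps[item_id], self.boost_to)
        self.boosted.add(item_id)  # one-shot: never boost this item again
        return True
```

Under this assumed score, two annotators who disagree give a disagreement of 0.5. With the example threshold of 0.5, that split triggers the boost.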

Per-annotator quota

per_annotator_quota controls how many items each annotator is assigned, independent of per-item caps:

```yaml
per_annotator_quota:
  default: 100
  by_user:
    alice: 30
  by_user_role:
    expert: 30
    novice: 200

user_roles:
  alice: expert
  carol: novice
```

Resolution order: by_user[uid] → by_user_role[user_roles[uid]] → default.
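The resolution order is a first-match lookup. Below is a minimal Python sketch of it. resolve_quota is a hypothetical helper, and returning None when no default is configured is an assumption.

```python
from typing import Any, Mapping, Optional


def resolve_quota(
    uid: str,
    quota_cfg: Mapping[str, Any],
    user_roles: Mapping[str, str],
) -> Optional[int]:
    """First match wins: by_user, then by_user_role, then default (hypothetical helper)."""
    by_user = quota_cfg.get("by_user", {})
    if uid in by_user:
        return by_user[uid]
    role = user_roles.get(uid)
    by_role = quota_cfg.get("by_user_role", {})
    if role is not None and role in by_role:
        return by_role[role]
    return quota_cfg.get("default")  # None if no default is set (assumption)
```

With the YAML above, alice resolves to 30 through by_user and carol resolves to 200 through her novice role. Any other user falls back to the default of 100. A by_user entry always overrides the user's role entry.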

Adjudication auto-routing

When the adjudication block is enabled, overlap-sample items that reach their cap are scored automatically. They are pushed into the adjudication queue if their agreement falls below agreement_threshold. Low-agreement items therefore surface as soon as they saturate, rather than waiting for an adjudicator to rebuild the queue manually.

```yaml
adjudication:
  enabled: true
  adjudicator_users: [admin]
  min_annotations: 2
  agreement_threshold: 0.75
```
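The routing rule can be sketched in Python as follows. The page does not say which agreement measure is compared against agreement_threshold. This sketch assumes per-item pairwise observed agreement: the share of annotator pairs giving the same label. It also assumes min_annotations acts as a floor on how many annotations an item needs before it is scored. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def pairwise_agreement(labels: List[str]) -> float:
    """Assumed per-item score: share of annotator pairs giving the same label."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def route_saturated_items(
    annotations: Dict[str, List[str]],
    caps: Dict[str, int],
    overlap_ids: Set[str],
    agreement_threshold: float = 0.75,
    min_annotations: int = 2,
) -> List[str]:
    """Return overlap items that reached their cap with agreement below threshold."""
    queue = []
    for item_id in sorted(overlap_ids):
        labels = annotations.get(item_id, [])
        if len(labels) < max(caps[item_id], min_annotations):
            continue  # not saturated yet, so not scored
        if pairwise_agreement(labels) < agreement_threshold:
            queue.append(item_id)
    return queue
```

An item labeled x, y, y agrees on only one of its three pairs (about 0.33). It would therefore be routed at a threshold of 0.75.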

Inspecting agreement

Once overlap-sample items saturate, agreement statistics are available at /admin/iaa, which computes the metric set appropriate to each schema's annotation_type. For example, it reports Cohen's and Fleiss' kappa for nominal schemes, weighted kappa for ordinal ones, and token-level kappa plus span F1 for span annotations. See the inter-annotator agreement guide for what these metrics mean.
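To illustrate the simplest of these metrics, here is Cohen's kappa for two annotators on a nominal scheme, written as a small Python function. It is the textbook formula κ = (p_o − p_e) / (1 − p_e), not Potato's implementation behind /admin/iaa.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items (textbook formula)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)     # agreement expected by chance
    if p_e == 1:
        return 1.0  # both used one identical label; kappa is undefined, report 1 by convention
    return (p_o - p_e) / (1 - p_e)
```

Take labels pos, pos, neg, neg against pos, neg, neg, neg. The annotators agree on 3 of 4 items (p_o = 0.75), chance agreement is 0.5, and κ = 0.5.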

Example

A runnable demonstration lives at examples/advanced/heterogeneous-coverage/. From the repository root:

```bash
python potato/flask_server.py start examples/advanced/heterogeneous-coverage/config.yaml -p 8000
```

The demo uses 20 items across two domains and samples 20% of them for 3-annotator overlap, stratified by domain. It also enables an adaptive boost at threshold 0.5, defines two expertise tiers, and routes low-agreement items into adjudication.

Related

Task Assignment: assignment strategies
Inter-annotator agreement guide: the metrics behind /admin/iaa
Crowdsourcing: MTurk and Prolific integration

For implementation details, see the source documentation.
