
Heterogeneous Annotator Coverage

Assign different numbers of annotators to different items. Configure a default cap, a stratified overlap sample for quality monitoring, adaptive disagreement boosts, per-annotator quotas, and automatic adjudication routing.

Heterogeneous coverage lets you assign different numbers of annotators to different items instead of a uniform cap. The common research design is one annotator on most items, with two or three overlapping on a 5–10% sample to monitor quality. Potato expresses that through the num_annotators_per_item and per_annotator_quota config blocks.

Per-item annotator caps

num_annotators_per_item is the canonical key. It accepts a single integer for a uniform cap, or a structured mapping with a default, an overlap sample, and an optional adaptive boost:

```yaml
num_annotators_per_item:
  default: 1
  overlap_sample:
    fraction: 0.1
    count: 3
    stratify_by: domain
    seed: 42
  adaptive:
    enabled: true
    disagreement_threshold: 0.5
    boost_to: 3
  min: 1
```

max_annotations_per_item is now a deprecated alias for num_annotators_per_item: <int>.
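
Because the key accepts either an integer or a mapping, code that reads the config usually normalizes both forms first. The following is a minimal sketch of that idea, not Potato's actual loader: the function name `normalize_cap_config`, the precedence given to the canonical key when both keys are present, and the error raised when neither is set are all assumptions made for illustration.

```python
from typing import Any, Dict


def normalize_cap_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the per-item cap as a mapping, whichever form the config used."""
    raw = config.get("num_annotators_per_item")
    if raw is None:
        # Fall back to the deprecated alias, which only carries an integer.
        raw = config.get("max_annotations_per_item")
    if raw is None:
        # Assumption: a missing cap is treated as a configuration error.
        raise KeyError("no per-item annotator cap configured")
    if isinstance(raw, int):
        # A bare integer is the uniform-cap form.
        return {"default": raw}
    return dict(raw)
```

After normalization, downstream code can always read `cfg["default"]` and treat `overlap_sample` and `adaptive` as optional keys.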

Overlap sample

The overlap_sample block raises the cap on a deterministic subset of items for quality monitoring. Sampling happens once at startup, and the chosen items are stamped with required_annotations so the assignment logic treats them as high-coverage.

| Field | Type | Description |
| --- | --- | --- |
| fraction | float in (0, 1] | proportion of items to sample |
| count | int ≥ 2 | annotator cap for sampled items (must exceed default) |
| stratify_by | string (optional) | item-data field used to stratify the sample |
| seed | int (optional) | RNG seed; defaults to the global random_seed |

When stratify_by is set, the fraction is applied per stratum, so every category contributes proportionally.
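
To make the per-stratum behavior concrete, here is a rough sketch of a seeded, stratified draw. It is not Potato's implementation: the function name `pick_overlap_sample`, the item shape (dicts with an `id` key), and the rounding rule (nearest integer, at least one item per stratum) are assumptions.

```python
import random
from collections import defaultdict


def pick_overlap_sample(items, fraction, stratify_by=None, seed=42):
    """Return the set of item ids chosen for the overlap sample.

    `items` is a list of dicts with an "id" key (hypothetical shape).
    """
    rng = random.Random(seed)  # a local RNG keeps the draw reproducible
    strata = defaultdict(list)
    for item in items:
        # Without stratify_by, every item lands in one shared stratum.
        key = item.get(stratify_by) if stratify_by else None
        strata[key].append(item["id"])

    chosen = set()
    for key in sorted(strata, key=str):  # fixed stratum order for determinism
        ids = sorted(strata[key])
        # Assumption: round to the nearest integer, but sample at least one.
        k = max(1, round(fraction * len(ids)))
        chosen.update(rng.sample(ids, k))
    return chosen


items = [{"id": i, "domain": "news" if i < 10 else "forum"} for i in range(20)]
sample = pick_overlap_sample(items, fraction=0.2, stratify_by="domain", seed=42)
```

With ten items per domain and a fraction of 0.2, each domain contributes two items, and rerunning with the same seed returns the same set.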

Adaptive boost

Adaptive boost expands the cap on an item whose early annotators disagreed. Once an item has at least two annotations and its disagreement score crosses disagreement_threshold, its cap is raised to boost_to and the item re-enters the assignment queue. The boost is one-shot per item.
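
The one-shot rule can be sketched as follows. The disagreement score used here (the share of annotations that differ from the most common label) is an assumption, since this page does not define the metric. The item shape, the function names, and the choice of "at or above the threshold" as the trigger are also hypothetical.

```python
from collections import Counter


def disagreement(labels):
    """Share of annotations that differ from the most common label (assumed metric)."""
    top_count = Counter(labels).most_common(1)[0][1]
    return 1 - top_count / len(labels)


def maybe_boost(item, threshold=0.5, boost_to=3):
    """Raise the item's cap once if its early annotations disagree enough.

    `item` is a hypothetical dict with "labels", "cap", and "boosted" keys.
    Returns True when the item should re-enter the assignment queue.
    """
    if item["boosted"] or len(item["labels"]) < 2:
        return False
    if disagreement(item["labels"]) < threshold:
        return False
    item["cap"] = max(item["cap"], boost_to)
    item["boosted"] = True  # one-shot: later calls are no-ops
    return True


item = {"labels": ["toxic", "ok"], "cap": 1, "boosted": False}
first = maybe_boost(item)   # two annotators split, so the cap is raised
second = maybe_boost(item)  # already boosted, so nothing changes
```

Tracking a `boosted` flag per item is one simple way to guarantee the boost fires at most once, even if later annotations push the score across the threshold again.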

Per-annotator quota

per_annotator_quota controls how many items each annotator is assigned, independent of per-item caps:

```yaml
per_annotator_quota:
  default: 100
  by_user:
    alice: 30
  by_user_role:
    expert: 30
    novice: 200

user_roles:
  alice: expert
  carol: novice
```

Resolution order: by_user[uid] → by_user_role[user_roles[uid]] → default.
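
The resolution order amounts to a chain of lookups that falls through to the default. A minimal sketch, assuming the config has already been parsed into plain dicts; `resolve_quota` is a hypothetical helper name, not part of Potato's API:

```python
def resolve_quota(uid, quota_cfg, user_roles):
    """Pick the quota for one annotator: per-user, then per-role, then default."""
    by_user = quota_cfg.get("by_user", {})
    if uid in by_user:
        return by_user[uid]
    role = user_roles.get(uid)  # None when the user has no assigned role
    by_role = quota_cfg.get("by_user_role", {})
    if role in by_role:
        return by_role[role]
    return quota_cfg["default"]


quota_cfg = {
    "default": 100,
    "by_user": {"alice": 30},
    "by_user_role": {"expert": 30, "novice": 200},
}
user_roles = {"alice": "expert", "carol": "novice"}

alice_quota = resolve_quota("alice", quota_cfg, user_roles)  # per-user entry
carol_quota = resolve_quota("carol", quota_cfg, user_roles)  # role entry
bob_quota = resolve_quota("bob", quota_cfg, user_roles)      # default
```

Using the values from the config above, alice resolves to 30 through her per-user entry, carol to 200 through the novice role, and an unlisted user such as bob to the default of 100.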

Adjudication auto-routing

When the adjudication block is enabled, overlap-sample items that reach their cap are scored automatically and pushed into the adjudication queue if agreement falls below agreement_threshold. Low-quality items surface as soon as the sample saturates, rather than when an adjudicator manually rebuilds the queue.

```yaml
adjudication:
  enabled: true
  adjudicator_users: [admin]
  min_annotations: 2
  agreement_threshold: 0.75
```
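
The routing decision can be sketched as a check that runs when an overlap-sample item saturates. This is an illustration under stated assumptions rather than Potato's logic: the agreement score used here is the fraction of annotator pairs that chose the same label, `min_annotations` is read as a lower bound on how many annotations an item needs before it is scored, and both function names are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(labels):
    """Fraction of annotator pairs that chose the same label (assumed metric)."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def should_route(labels, cap, min_annotations=2, agreement_threshold=0.75):
    """Return True when a saturated item belongs in the adjudication queue."""
    # At least two labels are needed for any pairwise comparison.
    needed = max(cap, min_annotations, 2)
    if len(labels) < needed:
        return False  # not saturated yet, so nothing to score
    return pairwise_agreement(labels) < agreement_threshold


split = should_route(["pos", "pos", "neg"], cap=3)       # 1 of 3 pairs agree
unanimous = should_route(["pos", "pos", "pos"], cap=3)   # full agreement
pending = should_route(["pos", "neg"], cap=3)            # cap not reached
```

Only the split item is routed: its pairwise agreement of one third falls below 0.75, while the unanimous item passes and the pending item is not scored until it reaches its cap.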

Inspecting agreement

Once overlap-sample items saturate, agreement statistics are available at /admin/iaa, which computes the metric set appropriate to each schema's annotation_type: for example, Cohen's and Fleiss' kappa for nominal schemas, weighted kappa for ordinal ones, and token-level kappa plus span F1 for spans. See the inter-annotator agreement guide for what these metrics mean.
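
As a reference point for reading those numbers, Cohen's kappa for two annotators is (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below is a standalone textbook computation, not the code behind /admin/iaa; the function name and the convention of returning 1.0 when chance agreement is already 1 are choices made for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of nominal labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal label rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        # Both raters used one identical label throughout; kappa is
        # formally undefined here, so this sketch reports full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


kappa = cohens_kappa(
    ["pos", "pos", "neg", "neg"],
    ["pos", "neg", "neg", "neg"],
)
```

In this example the annotators agree on three of four items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5.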

Example

A runnable demonstration lives at examples/advanced/heterogeneous-coverage/. From the repository root:

```bash
python potato/flask_server.py start examples/advanced/heterogeneous-coverage/config.yaml -p 8000
```

It uses 20 items across two domains, samples 20% for 3-annotator overlap stratified by domain, enables an adaptive boost at threshold 0.5, defines two expertise tiers, and routes low-agreement items into adjudication.

For implementation details, see the source documentation.