Note: This post describes Potato 2.2 as it was at release. Some configuration keys and features have been updated in later versions. See the current documentation for up-to-date configuration syntax.

Potato 2.2.0 is out. It broadens both what you can annotate and how you maintain annotation quality, adding 9 new annotation schemas, a pluggable export system, MACE competence estimation, 55 validated survey instruments, and remote data sources.

New annotation schemas

Event annotation

The biggest schema addition in 2.2 is N-ary event annotation. An event has a trigger span (the word that signals the event) and argument spans with typed semantic roles. A hub-and-spoke arc visualization connects each trigger to its arguments.

yaml

annotation_schemes:
  - annotation_type: event_annotation
    name: events
    span_schema: entities
    event_types:
      - type: "ATTACK"
        trigger_labels: ["EVENT_TRIGGER"]
        arguments:
          - role: "attacker"
            entity_types: ["PERSON", "ORGANIZATION"]
            required: true
          - role: "target"
            entity_types: ["PERSON", "ORGANIZATION", "LOCATION"]
            required: true

This covers information extraction, semantic role labeling, and knowledge graph construction, all of which used to need custom tooling.
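To make the trigger-plus-arguments structure concrete, here is a minimal Python sketch that checks an event against the ATTACK definition from the config above. The names `Span`, `ArgumentSpec`, `Event`, and `validate_event` are hypothetical and are not Potato's API; the sketch only assumes what the config shows, namely that each role lists its allowed entity types and a `required` flag.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    text: str
    label: str  # entity type or trigger label, e.g. "PERSON"


@dataclass(frozen=True)
class ArgumentSpec:
    role: str
    entity_types: tuple
    required: bool = False


@dataclass
class Event:
    event_type: str
    trigger: Span
    arguments: dict = field(default_factory=dict)  # role -> Span


# Mirrors the ATTACK definition from the YAML config above.
ATTACK_SPECS = [
    ArgumentSpec("attacker", ("PERSON", "ORGANIZATION"), required=True),
    ArgumentSpec("target", ("PERSON", "ORGANIZATION", "LOCATION"), required=True),
]


def validate_event(event, specs, trigger_labels=("EVENT_TRIGGER",)):
    """Return a list of problems; an empty list means the event is well-formed."""
    problems = []
    if event.trigger.label not in trigger_labels:
        problems.append(f"trigger label {event.trigger.label!r} not allowed")
    known_roles = {spec.role for spec in specs}
    for role in event.arguments:
        if role not in known_roles:
            problems.append(f"unknown role {role!r}")
    for spec in specs:
        span = event.arguments.get(spec.role)
        if span is None:
            if spec.required:
                problems.append(f"missing required role {spec.role!r}")
        elif span.label not in spec.entity_types:
            problems.append(f"role {spec.role!r} cannot be filled by {span.label}")
    return problems


event = Event(
    event_type="ATTACK",
    trigger=Span("bombed", "EVENT_TRIGGER"),
    arguments={"attacker": Span("the rebels", "ORGANIZATION")},
)
print(validate_event(event, ATTACK_SPECS))
# prints ["missing required role 'target'"]
```

Returning a list of problems rather than raising on the first one lets an interface show every missing or mistyped argument at once.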

Read the event annotation documentation →

Entity linking

Span annotations can now point at external knowledge bases. An annotator highlights text, assigns a label, then uses a search modal to find and link the matching Wikidata, UMLS, or custom KB entity.

yaml

annotation_schemes:
  - annotation_type: span
    name: ner
    labels: [PERSON, ORGANIZATION, LOCATION]
    entity_linking:
      enabled: true
      knowledge_bases:
        - name: wikidata
          type: wikidata
          language: en

Entity linking also supports a multi-select mode for ambiguous entities and lets you configure several knowledge bases in one task.

Read the entity linking documentation →

Triage, pairwise, coreference, and more

Six more annotation types fill out the v2.2 additions:

Triage gives you an accept/reject/skip interface for screening data fast, with auto-advance and keyboard shortcuts
Pairwise comparison offers a binary A/B choice or a scale slider for preference learning and RLHF data
Conversation trees support hierarchical tree annotation with per-node ratings and path selection
Coreference chains let you group coreferring mentions, with visual indicators showing the chains
Segmentation masks add fill, eraser, and brush tools for pixel-level image annotation
Discontinuous spans (allow_discontinuous: true) handle non-contiguous text selections
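As an illustration of the last item, a discontinuous span can be thought of as an ordered list of character ranges that together form one annotation. The sketch below uses a hypothetical helper, `discontinuous_text`, and an assumed `(start, end)` offset representation; it is not Potato's storage format.

```python
def discontinuous_text(text, segments, joiner=" ... "):
    """Join the pieces of a non-contiguous selection.

    `segments` is a list of (start, end) character offsets, end exclusive.
    """
    ordered = sorted(segments)
    # Adjacent segments must not overlap, otherwise text would be counted twice.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("segments must not overlap")
    return joiner.join(text[start:end] for start, end in ordered)


sentence = "She picked the heavy box up."
# "picked ... up" is one phrasal verb split by its object.
segments = [(4, 10), (25, 27)]
print(discontinuous_text(sentence, segments))
# prints picked ... up
```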

Smarter annotation

MACE competence estimation

MACE (Multi-Annotator Competence Estimation) runs a Variational Bayes EM algorithm that estimates the true labels and each annotator's competence (a score from 0.0 to 1.0) at the same time. It flags reliable annotators, catches spammers, and produces better predicted labels.

yaml

mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 3

MACE runs automatically in the background and feeds into the admin dashboard and the adjudication system.
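For intuition about what the estimator is doing, here is a small pure-Python sketch of the underlying model: each annotator either knows the true label (with probability equal to their competence) or guesses from a personal label distribution. Note the assumptions: this uses plain EM with additive smoothing rather than the Variational Bayes variant mentioned above, the function name `mace_em` is hypothetical, and none of it is Potato's actual implementation.

```python
def mace_em(annotations, labels, iterations=50, smoothing=0.1):
    """Jointly estimate true labels and annotator competence with plain EM.

    annotations: one dict per item, mapping annotator id -> chosen label.
    Returns (predicted_labels, competence) where competence[a] is in (0, 1).
    """
    annotators = sorted({a for item in annotations for a in item})
    prior = 1.0 / len(labels)
    theta = {a: 0.8 for a in annotators}  # P(annotator is not guessing)
    xi = {a: {lab: prior for lab in labels} for a in annotators}  # guessing habits

    posteriors = []
    for _ in range(iterations):
        knows = {a: smoothing for a in annotators}
        guesses = {a: {lab: smoothing for lab in labels} for a in annotators}
        posteriors = []
        for item in annotations:
            # E-step, part 1: posterior over the item's true label.
            scores = {}
            for t in labels:
                p = prior
                for a, given in item.items():
                    honest = theta[a] if given == t else 0.0
                    p *= honest + (1.0 - theta[a]) * xi[a][given]
                scores[t] = p
            total = sum(scores.values())
            q = {t: s / total for t, s in scores.items()}
            posteriors.append(q)
            # E-step, part 2: expected "knew it" vs "guessed" counts.
            for a, given in item.items():
                guessed = (1.0 - theta[a]) * xi[a][given]
                p_knows = q[given] * theta[a] / (theta[a] + guessed)
                knows[a] += p_knows
                guesses[a][given] += 1.0 - p_knows
        # M-step: re-estimate competence and guessing distribution.
        for a in annotators:
            guess_total = sum(guesses[a].values())
            theta[a] = knows[a] / (knows[a] + guess_total)
            xi[a] = {lab: guesses[a][lab] / guess_total for lab in labels}

    predicted = [max(q, key=q.get) for q in posteriors]
    return predicted, theta


# Toy data: three careful annotators and one who always answers "pos".
truth = ["pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg"]
annotations = [
    {"ann1": t, "ann2": t, "ann3": t, "spammer": "pos"} for t in truth
]
predicted, competence = mace_em(annotations, ["pos", "neg"])
print(predicted)
print({a: round(c, 2) for a, c in competence.items()})
```

On this toy data the always-"pos" annotator ends up with a much lower competence score than the other three, even though they agree with the majority on every "pos" item.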

Read the MACE documentation →

Option highlighting

This new AI feature reads the content and highlights the options most likely to be correct on discrete tasks. The top-k options show at full opacity with a star next to them; the rest are dimmed.

yaml

ai_support:
  option_highlighting:
    enabled: true
    top_k: 3
    dim_opacity: 0.4

Read the option highlighting documentation →
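The display logic implied by `top_k` and `dim_opacity` can be sketched in a few lines of Python. This assumes the AI backend returns a score per option, which the post does not specify; the function name `option_opacities` is hypothetical and not part of Potato.

```python
def option_opacities(scores, top_k=3, dim_opacity=0.4):
    """Map each option to (opacity, starred) given model scores.

    `scores` maps option label -> estimated likelihood of being correct.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    highlighted = set(ranked[:top_k])
    return {
        option: (1.0, True) if option in highlighted else (dim_opacity, False)
        for option in scores
    }


scores = {"joy": 0.55, "anger": 0.05, "sadness": 0.25, "fear": 0.15}
for option, (opacity, starred) in option_opacities(scores, top_k=2).items():
    print(option, opacity, "*" if starred else "")
```

Every option stays selectable in this sketch; the dimming only changes emphasis, so an annotator can still pick an option the model ranked low.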

Diversity ordering

Sentence-transformer embeddings group similar items into clusters, then round-robin sampling pulls items from different clusters in turn. Annotators see more variety, which keeps them fresh and gives you better coverage of the topic space.

yaml

assignment_strategy: diversity_clustering
diversity_ordering:
  enabled: true
  prefill_count: 100

Read the diversity ordering documentation →
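The round-robin step is easy to picture in code. The sketch below assumes the clustering has already happened elsewhere (for example, k-means over sentence embeddings) and that you have one cluster id per item; `round_robin_order` is a hypothetical name, not a Potato function.

```python
from collections import defaultdict, deque


def round_robin_order(item_ids, cluster_ids):
    """Take one item from each cluster in turn until every item is used."""
    queues = defaultdict(deque)
    for item, cluster in zip(item_ids, cluster_ids):
        queues[cluster].append(item)

    ordered = []
    active = deque(queues.values())
    while active:
        queue = active.popleft()
        ordered.append(queue.popleft())
        if queue:  # clusters with items left rejoin the rotation
            active.append(queue)
    return ordered


items = ["a1", "a2", "a3", "b1", "b2", "c1"]
clusters = [0, 0, 0, 1, 1, 2]
print(round_robin_order(items, clusters))
# prints ['a1', 'b1', 'c1', 'a2', 'b2', 'a3']
```

Once the smaller clusters run out, the remaining items all come from the largest cluster, so the variety is strongest at the start of the queue.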

Export system

The new export CLI (python -m potato.export) converts annotations to 6 standard formats in one command:

bash

python -m potato.export --config config.yaml --format coco --output ./export/
python -m potato.export --config config.yaml --format yolo --output ./export/
python -m potato.export --config config.yaml --format conll_2003 --output ./export/

Supported formats: COCO, YOLO, Pascal VOC, CoNLL-2003, CoNLL-U, and segmentation masks. If you need a format that is not in the list, subclass BaseExporter and write your own.
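To show the kind of transformation an exporter performs, here is a simplified Python sketch that turns token-level spans into CoNLL-style lines with BIO tags. It is deliberately independent of `BaseExporter`, whose interface this post does not show, and it emits only two columns; real CoNLL-2003 files carry four (word, part-of-speech tag, chunk tag, named-entity tag). The function name and span representation are assumptions.

```python
def to_conll_lines(tokens, spans):
    """Render one sentence as 'token tag' lines with BIO tags.

    `spans` is a list of (start_token, end_token, label), end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # continuation tokens
    return [f"{token} {tag}" for token, tag in zip(tokens, tags)]


tokens = ["Angela", "Merkel", "visited", "Paris", "."]
spans = [(0, 2, "PERSON"), (3, 4, "LOCATION")]
print("\n".join(to_conll_lines(tokens, spans)))
```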

Read the export formats documentation →

Remote data sources

Load annotation data from URLs, S3, Google Drive, Dropbox, Hugging Face datasets, Google Sheets, and SQL databases:

yaml

data_sources:
  - type: huggingface
    dataset: "squad"
    split: "train"

  - type: s3
    bucket: "my-annotation-data"
    key: "datasets/items.jsonl"

Remote loading also supports partial and incremental loading for large datasets, caches data locally, and keeps credentials in environment variables rather than in your config.
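Incremental loading boils down to reading a stream in batches instead of pulling the whole file into memory. The following Python sketch shows the idea for a JSONL source; `iter_batches` is a hypothetical helper, and an in-memory `StringIO` stands in for the remote file, so nothing here reflects how Potato itself fetches data.

```python
import io
import json
from itertools import islice


def iter_batches(lines, batch_size):
    """Yield lists of parsed JSONL records, at most `batch_size` at a time."""
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


# An in-memory stand-in for a remote items.jsonl stream.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
for batch in iter_batches(stream, batch_size=2):
    print([record["id"] for record in batch])
# prints [1, 2] and then [3]
```

Because the records are parsed lazily, only one batch is held in memory at a time, which is the property that matters for large datasets.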

Read the remote data sources documentation →

Survey instruments

A library of 55 validated questionnaires you can drop into prestudy and poststudy phases:

yaml

phases:
  prestudy:
    type: prestudy
    instrument: "tipi"      # 10-item personality questionnaire

  poststudy:
    type: poststudy
    instrument: "phq-9"     # 9-item depression screening

They span 8 categories: personality (BFI-2, TIPI), mental health (PHQ-9, GAD-7), affect (PANAS), self-concept (RSE), social attitudes (SDO-7, MFQ), response style, short-form versions, and demographic batteries from major surveys (ANES, GSS, ESS).

Read the survey instruments documentation →

Smaller additions

Video object tracking with keyframe interpolation
Bounding box annotation on PDF pages
Support for an external AI config file
Form layout grid improvements

Upgrading to v2.2

bash

pip install --upgrade potato-annotation

Your v2.0 and v2.1 configs keep working unchanged. Everything new is opt-in through extra config blocks.

Getting started

What's New, the full v2.2 feature overview
Event Annotation, N-ary event structures
Entity Linking, knowledge base linking
MACE, annotator competence estimation
Export Formats, the export CLI
Survey Instruments, 55 validated questionnaires

For the full changelog, including any config keys that changed, see the v2.2.0 release notes in the repository.

Have questions or feedback? Join our Discord or open an issue on GitHub.