Potato 2.2: Events, Entity Linking, Export, and 55 Survey Instruments
Note: This post describes Potato 2.2 as it was at release. Some configuration keys and features have been updated in later versions. See the current documentation for up-to-date configuration syntax.
Potato 2.2.0 is out, and it widens both what you can annotate and how you keep the quality up. It adds 9 new annotation schemas, a pluggable export system, MACE competence estimation, 55 validated survey instruments, and remote data sources.
New annotation schemas
Event annotation
The biggest schema addition in 2.2 is N-ary event annotation. An event has a trigger span (the word that signals the event) and argument spans with typed semantic roles. A hub-and-spoke arc visualization connects each trigger to its arguments.
    annotation_schemes:
      - annotation_type: event_annotation
        name: events
        span_schema: entities
        event_types:
          - type: "ATTACK"
            trigger_labels: ["EVENT_TRIGGER"]
            arguments:
              - role: "attacker"
                entity_types: ["PERSON", "ORGANIZATION"]
                required: true
              - role: "target"
                entity_types: ["PERSON", "ORGANIZATION", "LOCATION"]
                required: true
This covers information extraction, semantic role labeling, and knowledge graph construction, all of which used to need custom tooling.
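To make the trigger-plus-roles structure concrete, here is a minimal Python sketch of how an ATTACK event could be represented and checked against its role constraints. The `Span`, `Event`, and `validate_event` names and the `EVENT_TYPES` table are hypothetical and written for this illustration; they are not Potato's internal data model or API.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory mirror of the ATTACK event type from the config.
EVENT_TYPES = {
    "ATTACK": {
        "attacker": {"entity_types": {"PERSON", "ORGANIZATION"}, "required": True},
        "target": {"entity_types": {"PERSON", "ORGANIZATION", "LOCATION"}, "required": True},
    }
}


@dataclass
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


@dataclass
class Event:
    event_type: str
    trigger: Span
    arguments: dict[str, Span] = field(default_factory=dict)  # role -> span


def validate_event(event: Event) -> list[str]:
    """Return a list of problems; an empty list means the event is well-formed."""
    roles = EVENT_TYPES.get(event.event_type)
    if roles is None:
        return [f"unknown event type: {event.event_type}"]
    problems = []
    for role, spec in roles.items():
        span = event.arguments.get(role)
        if span is None:
            if spec["required"]:
                problems.append(f"missing required role: {role}")
        elif span.label not in spec["entity_types"]:
            problems.append(f"role {role} cannot be filled by {span.label}")
    for role in event.arguments:
        if role not in roles:
            problems.append(f"unknown role: {role}")
    return problems


# "Rebels attacked the city": trigger "attacked", attacker "Rebels", no target yet.
event = Event(
    "ATTACK",
    trigger=Span(7, 15, "EVENT_TRIGGER"),
    arguments={"attacker": Span(0, 6, "ORGANIZATION")},
)
print(validate_event(event))  # ['missing required role: target']
```

Storing arguments as a role-to-span mapping keeps the hub-and-spoke shape explicit: one trigger in the middle, one typed edge per role.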
Read the event annotation documentation →
Entity linking
Span annotations can now point at external knowledge bases. An annotator highlights text, assigns a label, then uses a search modal to find and link the matching Wikidata, UMLS, or custom KB entity.
    annotation_schemes:
      - annotation_type: span
        name: ner
        labels: [PERSON, ORGANIZATION, LOCATION]
        entity_linking:
          enabled: true
          knowledge_bases:
            - name: wikidata
              type: wikidata
              language: en
Entity linking also supports a multi-select mode for ambiguous entities and lets you wire up several knowledge bases in one task.
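For a sense of what a knowledge base lookup involves, the sketch below builds a search request for Wikidata's public `wbsearchentities` API. This post does not describe how Potato's search modal queries Wikidata, so treat it as an assumption about the general shape of such a lookup; the `wikidata_search_url` helper is hypothetical.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wikidata_search_url(query: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL (assumed lookup, not Potato's own code)."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


url = wikidata_search_url("Marie Curie")
print(url)
```

Fetching that URL returns candidate entities with their IDs and descriptions, which is the kind of list an annotator would pick from when linking a span.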
Read the entity linking documentation →
Triage, pairwise, coreference, and more
Six more annotation types fill out the v2.2 additions:
- Triage gives you an accept/reject/skip interface for screening data fast, with auto-advance and keyboard shortcuts
- Pairwise comparison offers a binary A/B choice or a scale slider for preference learning and RLHF data
- Conversation trees support hierarchical tree annotation with per-node ratings and path selection
- Coreference chains let you group coreferring mentions, with visual indicators showing the chains
- Segmentation masks add fill, eraser, and brush tools for pixel-level image annotation
- Discontinuous spans (allow_discontinuous: true) handle non-contiguous text selections
Smarter annotation
MACE competence estimation
MACE runs a Variational Bayes EM algorithm to estimate the true labels and each annotator's competence (a score from 0.0 to 1.0) at the same time. It flags reliable annotators, catches spammers, and produces better predicted labels.
    mace:
      enabled: true
      trigger_every_n: 10
      min_annotations_per_item: 3
MACE runs in the background on its own and hooks into the admin dashboard and the adjudication system.
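The core idea, estimating labels and annotator competence together, can be shown with a much simpler scheme than MACE. The sketch below is not MACE and not Variational Bayes: it just alternates between a competence-weighted vote and a per-annotator agreement rate. The `estimate_competence` function and its input format are assumptions made for this illustration.

```python
from collections import defaultdict


def estimate_competence(annotations, n_iter=10):
    """Simplified stand-in for joint label/competence estimation (not MACE).

    annotations: list of (item_id, annotator_id, label) tuples with string labels.
    Returns (labels, competence): the estimated label per item and a
    0.0-1.0 agreement score per annotator.
    """
    annotators = {annotator for _, annotator, _ in annotations}
    competence = {a: 1.0 for a in annotators}  # start by trusting everyone equally
    labels = {}
    for _ in range(n_iter):
        # Step 1: competence-weighted vote per item.
        votes = defaultdict(lambda: defaultdict(float))
        for item, annotator, label in annotations:
            votes[item][label] += competence[annotator]
        # sorted() makes tie-breaking deterministic (alphabetically first label wins).
        labels = {
            item: max(sorted(counts), key=counts.get)
            for item, counts in votes.items()
        }
        # Step 2: competence = share of an annotator's labels matching the estimate.
        hits, totals = defaultdict(int), defaultdict(int)
        for item, annotator, label in annotations:
            totals[annotator] += 1
            hits[annotator] += int(labels[item] == label)
        competence = {a: hits[a] / totals[a] for a in annotators}
    return labels, competence


# Two careful annotators ("a", "b") and one who always answers "pos" ("s").
data = [
    ("i1", "a", "neg"), ("i1", "b", "neg"), ("i1", "s", "pos"),
    ("i2", "a", "pos"), ("i2", "b", "pos"), ("i2", "s", "pos"),
    ("i3", "a", "neg"), ("i3", "b", "neg"), ("i3", "s", "pos"),
    ("i4", "a", "neg"), ("i4", "b", "neg"), ("i4", "s", "pos"),
]
labels, competence = estimate_competence(data)
print(labels)
print(competence)
```

On this toy data the always-"pos" annotator ends up with a low score while the other two stay at 1.0. MACE goes further by modeling spamming behavior explicitly and estimating its parameters probabilistically instead of counting agreement.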
Option highlighting
This new AI feature reads the content and highlights the options most likely to be correct on discrete tasks. The top-k options show at full opacity with a star next to them; the rest are dimmed.
    ai_support:
      option_highlighting:
        enabled: true
        top_k: 3
        dim_opacity: 0.4
Read the option highlighting documentation →
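The display rule behind `top_k` and `dim_opacity` is easy to sketch. Assuming some model has already produced a score per option (how Potato obtains those scores is not covered here), a hypothetical `option_opacities` helper could map scores to opacities like this:

```python
def option_opacities(scores, top_k=3, dim_opacity=0.4):
    """Return 1.0 for the top_k highest-scoring options and dim_opacity for the rest.

    scores: dict mapping option name -> model score (hypothetical input format).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    highlighted = set(ranked[:top_k])
    return {
        option: 1.0 if option in highlighted else dim_opacity
        for option in scores
    }


scores = {"joy": 0.5, "anger": 0.1, "fear": 0.3, "sadness": 0.05, "surprise": 0.05}
print(option_opacities(scores))
# {'joy': 1.0, 'anger': 1.0, 'fear': 1.0, 'sadness': 0.4, 'surprise': 0.4}
```

Dimming instead of hiding keeps every option selectable, so the annotator can still overrule the suggestion.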
Diversity ordering
Sentence-transformer embeddings group similar items into clusters, then round-robin sampling pulls items from different clusters in turn. Annotators see more variety, which keeps them fresh and gives you better coverage of the topic space.
    assignment_strategy: diversity_clustering
    diversity_ordering:
      enabled: true
      prefill_count: 100
Read the diversity ordering documentation →
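The round-robin step can be sketched on its own, assuming the clustering has already assigned each item a cluster ID. The `round_robin_order` function below is an illustration written for this post, not Potato's implementation, and it skips the embedding and clustering stages entirely.

```python
from collections import defaultdict, deque


def round_robin_order(item_clusters):
    """Interleave items so that clusters take turns.

    item_clusters: dict mapping item_id -> cluster_id (insertion order is kept).
    Consecutive items come from different clusters for as long as more than
    one cluster still has items left.
    """
    queues = defaultdict(deque)
    for item, cluster in item_clusters.items():
        queues[cluster].append(item)

    order = []
    active = deque(queues.values())
    while active:
        queue = active.popleft()       # next cluster in the rotation
        order.append(queue.popleft())  # take its oldest remaining item
        if queue:
            active.append(queue)       # re-queue the cluster if it has items left
    return order


clusters = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "c1": 2, "c2": 2}
print(round_robin_order(clusters))
# ['a1', 'b1', 'c1', 'a2', 'c2', 'a3']
```

Large clusters are spread across the whole queue instead of arriving as one long run of near-duplicates.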
Export system
The new export CLI (python -m potato.export) converts annotations to 6 standard formats in one command:
    python -m potato.export --config config.yaml --format coco --output ./export/
    python -m potato.export --config config.yaml --format yolo --output ./export/
    python -m potato.export --config config.yaml --format conll_2003 --output ./export/
Supported formats: COCO, YOLO, Pascal VOC, CoNLL-2003, CoNLL-U, and segmentation masks. If you need a format that is not in the list, subclass BaseExporter and write your own.
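To show the kind of transformation a token-level export performs, here is a standalone sketch that turns character-offset spans into BIO-tagged tokens, the tagging style used by CoNLL-like formats. It does not use the BaseExporter interface, whose methods are not described in this post, and the `spans_to_bio` function, its whitespace tokenization, and its span format are assumptions for illustration.

```python
def spans_to_bio(text, spans):
    """Convert character-offset spans to one BIO tag per whitespace token.

    spans: list of (start, end, label) tuples with end exclusive.
    Returns a list of (token, tag) pairs. Tokens only partly covered by a
    span are tagged "O" in this simplified version.
    """
    rows = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # locate the token at or after pos
        end = start + len(token)
        pos = end
        tag = "O"
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                prefix = "B-" if start == span_start else "I-"
                tag = prefix + label
                break
        rows.append((token, tag))
    return rows


text = "Marie Curie worked in Paris"
spans = [(0, 11, "PERSON"), (22, 27, "LOCATION")]
for token, tag in spans_to_bio(text, spans):
    print(f"{token}\t{tag}")
```

A real exporter would also need a proper tokenizer and a policy for spans that cut through tokens, which is where format-specific exporters earn their keep.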
Read the export formats documentation →
Remote data sources
Load annotation data from URLs, S3, Google Drive, Dropbox, Hugging Face datasets, Google Sheets, and SQL databases:
    data_sources:
      - type: huggingface
        dataset: "squad"
        split: "train"
      - type: s3
        bucket: "my-annotation-data"
        key: "datasets/items.jsonl"
Remote loading also handles partial and incremental loading for large datasets, caches data locally, and keeps credentials in environment variables rather than in your config.
Read the remote data sources documentation →
Survey instruments
Potato 2.2 ships a library of 55 validated questionnaires that you can drop into prestudy and poststudy phases:
    phases:
      prestudy:
        type: prestudy
        instrument: "tipi"  # 10-item personality questionnaire
      poststudy:
        type: poststudy
        instrument: "phq-9"  # 9-item depression screening
The instruments span 8 categories: personality (BFI-2, TIPI), mental health (PHQ-9, GAD-7), affect (PANAS), self-concept (RSE), social attitudes (SDO-7, MFQ), response style, short-form versions, and demographic batteries from major surveys (ANES, GSS, ESS).
Read the survey instruments documentation →
Smaller fixes
- Video object tracking with keyframe interpolation
- Bounding box annotation on PDF pages
- Support for an external AI config file
- Form layout grid improvements
Upgrading to v2.2
    pip install --upgrade potato-annotation
Your v2.0 and v2.1 configs keep working unchanged. Everything new is opt-in through extra config blocks.
Getting started
- What's New, the full v2.2 feature overview
- Event Annotation, N-ary event structures
- Entity Linking, knowledge base linking
- MACE, annotator competence estimation
- Export Formats, the export CLI
- Survey Instruments, 55 validated questionnaires
For the full changelog, including any config keys that changed, see the v2.2.0 release notes in the repository.
Have questions or feedback? Join our Discord or open an issue on GitHub.