Potato 2.2: Events, Entity Linking, Export, and 55 Survey Instruments
Potato 2.2.0 adds 9 new annotation schemas, a pluggable export system, MACE competence estimation, 55 validated survey instruments, and remote data sources.
We're excited to announce Potato 2.2.0, a major feature release that expands both what you can annotate and how you manage annotation quality.
New Annotation Schemas
Event Annotation
The headline annotation feature of v2.2 is N-ary event annotation. Events consist of a trigger span (the word indicating the event) and argument spans with typed semantic roles. A hub-spoke arc visualization connects triggers to their arguments.
annotation_schemes:
  - annotation_type: event_annotation
    name: events
    span_schema: entities
    event_types:
      - type: "ATTACK"
        trigger_labels: ["EVENT_TRIGGER"]
        arguments:
          - role: "attacker"
            entity_types: ["PERSON", "ORGANIZATION"]
            required: true
          - role: "target"
            entity_types: ["PERSON", "ORGANIZATION", "LOCATION"]
            required: true
This opens up information extraction, semantic role labeling, and knowledge graph construction tasks that previously required custom tooling.
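To make the trigger-plus-arguments structure concrete, here is a minimal Python sketch of how an event record could be checked against the role definitions in the config above. The `ArgumentSpec`, `Event`, and `validate_event` names are hypothetical and are not part of Potato's API; the sketch only illustrates the idea of required roles with allowed entity types.

```python
from dataclasses import dataclass, field


@dataclass
class ArgumentSpec:
    # Mirrors one entry under `arguments:` in the config (hypothetical class).
    role: str
    entity_types: list
    required: bool = False


@dataclass
class Event:
    # A trigger span plus arguments keyed by role (hypothetical class).
    event_type: str
    trigger: tuple  # (start, end) character offsets of the trigger word
    arguments: dict = field(default_factory=dict)  # role -> (entity_type, start, end)


ATTACK_SPECS = [
    ArgumentSpec("attacker", ["PERSON", "ORGANIZATION"], required=True),
    ArgumentSpec("target", ["PERSON", "ORGANIZATION", "LOCATION"], required=True),
]


def validate_event(event, specs):
    """Return a list of problems; an empty list means the event is complete."""
    problems = []
    for spec in specs:
        arg = event.arguments.get(spec.role)
        if arg is None:
            if spec.required:
                problems.append(f"missing required role: {spec.role}")
            continue
        if arg[0] not in spec.entity_types:
            problems.append(f"{spec.role} has disallowed entity type: {arg[0]}")
    return problems
```

Calling `validate_event` on an event that has an attacker but no target returns a single "missing required role" message, which is the kind of check a `required: true` flag implies.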
Read the Event Annotation documentation →
Entity Linking
Span annotations can now be linked to external knowledge bases. Annotators highlight text, assign a label, then use a search modal to find and link the matching Wikidata, UMLS, or custom KB entity.
annotation_schemes:
  - annotation_type: span
    name: ner
    labels: [PERSON, ORGANIZATION, LOCATION]
    entity_linking:
      enabled: true
      knowledge_bases:
        - name: wikidata
          type: wikidata
          language: en
Entity linking supports multi-select mode for ambiguous entities and multiple knowledge bases in a single task.
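For a sense of what a knowledge base lookup involves, the sketch below builds a query against Wikidata's public `wbsearchentities` endpoint and pulls candidate IDs out of a response. Whether Potato's search modal calls this endpoint is an assumption, and `build_search_url` and `parse_candidates` are hypothetical helper names.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(query, language="en", limit=5):
    """Build a wbsearchentities URL for a highlighted span of text."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def parse_candidates(payload):
    """Extract (entity id, label) pairs from a decoded JSON response."""
    return [(hit.get("id"), hit.get("label")) for hit in payload.get("search", [])]
```

An annotator's highlighted text would be passed as `query`, and the returned candidates would populate the list the annotator picks from.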
Read the Entity Linking documentation →
Triage, Pairwise, Coreference, and More
Six additional annotation types round out the v2.2 schema additions:
- Triage — Accept/reject/skip interface for rapid data screening with auto-advance and keyboard shortcuts
- Pairwise Comparison — Binary A/B or scale slider for preference learning and RLHF data collection
- Conversation Trees — Hierarchical tree annotation with per-node ratings and path selection
- Coreference Chains — Group coreferring mentions into chains with visual indicators
- Segmentation Masks — New fill, eraser, and brush tools for pixel-level image annotation
- Discontinuous Spans — Set allow_discontinuous: true to enable non-contiguous text selections
Intelligent Annotation
MACE Competence Estimation
MACE (Multi-Annotator Competence Estimation) uses a Variational Bayes EM algorithm to jointly estimate the true label of each item and a competence score (0.0-1.0) for each annotator. It identifies reliable annotators, detects spammers, and produces higher-quality predicted labels.
mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 3
MACE runs automatically in the background and integrates with the admin dashboard and adjudication system.
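To give some intuition for joint estimation, here is a deliberately simplified Python sketch. It uses plain EM rather than Variational Bayes, assumes that a "spamming" annotator picks labels uniformly at random, and is not Potato's implementation; `estimate_competence` is a hypothetical name.

```python
from collections import defaultdict


def estimate_competence(annotations, labels, n_iter=50):
    """annotations: list of (item_id, annotator_id, label) triples.

    Returns (competence per annotator, predicted label per item).
    """
    k = len(labels)
    by_item = defaultdict(list)
    for item, annotator, label in annotations:
        by_item[item].append((annotator, label))
    annotators = {a for _, a, _ in annotations}
    theta = {a: 0.8 for a in annotators}  # initial competence guess
    posteriors = {}
    for _ in range(n_iter):
        # E-step: posterior over each item's true label given current competences.
        for item, votes in by_item.items():
            scores = {}
            for t in labels:
                p = 1.0 / k  # uniform prior over labels
                for a, lab in votes:
                    # Competent annotators copy the true label; spammers guess uniformly.
                    p *= theta[a] * (lab == t) + (1 - theta[a]) / k
                scores[t] = p
            total = sum(scores.values())
            posteriors[item] = {t: s / total for t, s in scores.items()}
        # M-step: competence = expected share of non-spam annotations.
        credit = defaultdict(float)
        count = defaultdict(int)
        for item, votes in by_item.items():
            for a, lab in votes:
                honest = theta[a] / (theta[a] + (1 - theta[a]) / k)
                credit[a] += posteriors[item][lab] * honest
                count[a] += 1
        # Clamp away from 0 and 1 so no likelihood collapses to exactly zero.
        theta = {a: min(max(credit[a] / count[a], 1e-6), 1 - 1e-6) for a in annotators}
    predicted = {item: max(post, key=post.get) for item, post in posteriors.items()}
    return theta, predicted
```

With two annotators who always agree and a third who disagrees on most items, the first two end up with competence near 1.0 and the third with a much lower score, while the predicted labels follow the reliable pair.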
Option Highlighting
Option highlighting is a new AI feature that analyzes each item's content and highlights the most likely correct options in discrete annotation tasks. The top-k options display at full opacity with a star indicator, while less likely options are dimmed.
ai_support:
  option_highlighting:
    enabled: true
    top_k: 3
    dim_opacity: 0.4
Read the Option Highlighting documentation →
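The interplay of `top_k` and `dim_opacity` can be sketched in a few lines of Python. This assumes the model produces one likelihood score per option; `option_opacities` is a hypothetical helper, not a Potato function.

```python
def option_opacities(scores, top_k=3, dim_opacity=0.4):
    """Map each option to 1.0 if it is among the top_k scores, else dim_opacity."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:top_k])
    return {option: (1.0 if option in top else dim_opacity) for option in scores}
```

Every option stays visible and selectable in this scheme; dimming only changes emphasis, so annotators can still choose a less likely option.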
Diversity Ordering
Items are embedded with a sentence-transformer model and clustered by similarity; round-robin sampling then presents items from different clusters in turn. This reduces annotator fatigue and improves coverage of the topic space.
assignment_strategy: diversity_clustering
diversity_ordering:
  enabled: true
  prefill_count: 100
Read the Diversity Ordering documentation →
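The round-robin step is easy to picture in isolation. The Python sketch below assumes cluster assignments have already been computed (the embedding and clustering steps are left out), and `round_robin_order` is a hypothetical name rather than part of Potato.

```python
from collections import defaultdict, deque


def round_robin_order(cluster_of):
    """cluster_of: dict mapping item id -> cluster id, in original item order.

    Returns item ids interleaved so consecutive items come from different
    clusters for as long as more than one cluster still has items left.
    """
    queues = defaultdict(deque)
    for item, cluster in cluster_of.items():
        queues[cluster].append(item)
    order = []
    while queues:
        # Take one item from each remaining cluster per pass.
        for cluster in list(queues):
            order.append(queues[cluster].popleft())
            if not queues[cluster]:
                del queues[cluster]
    return order
```

For example, with three items in cluster 0 and one each in clusters 1 and 2, the first pass serves one item from every cluster before the remaining cluster 0 items appear.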
Export System
The new export CLI (python -m potato.export) converts annotations to 6 industry-standard formats with a single command:
python -m potato.export --config config.yaml --format coco --output ./export/
python -m potato.export --config config.yaml --format yolo --output ./export/
python -m potato.export --config config.yaml --format conll_2003 --output ./export/
Supported formats: COCO, YOLO, Pascal VOC, CoNLL-2003, CoNLL-U, and Segmentation Masks. The system is extensible: create custom exporters by subclassing BaseExporter.
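As an illustration of the kind of conversion a CoNLL-style export performs, the Python sketch below turns character-offset span annotations into per-token BIO tags. It is not Potato's exporter and does not use the BaseExporter interface; `spans_to_bio` is a hypothetical function, whitespace tokenization is an assumption, and the full CoNLL-2003 layout also carries part-of-speech and chunk columns that are omitted here.

```python
import re


def spans_to_bio(text, spans):
    """spans: list of (start, end, label) character offsets, end exclusive.

    Returns a list of (token, tag) pairs using B-/I-/O tags.
    """
    rows = []
    prev_span = None
    for match in re.finditer(r"\S+", text):
        # Find the span, if any, that fully contains this token.
        current = None
        for idx, (start, end, _label) in enumerate(spans):
            if match.start() >= start and match.end() <= end:
                current = idx
                break
        if current is None:
            tag = "O"
        elif current == prev_span:
            tag = "I-" + spans[current][2]  # continuing the same span
        else:
            tag = "B-" + spans[current][2]  # first token of a span
        rows.append((match.group(), tag))
        prev_span = current
    return rows
```

Writing each pair as a `token tag` line, with a blank line between sentences, gives output in the general shape that sequence-labeling tools expect.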
Read the Export Formats documentation →
Remote Data Sources
Load annotation data from URLs, S3, Google Drive, Dropbox, Hugging Face datasets, Google Sheets, and SQL databases:
data_sources:
  - type: huggingface
    dataset: "squad"
    split: "train"
  - type: s3
    bucket: "my-annotation-data"
    key: "datasets/items.jsonl"
Remote loading includes partial/incremental loading for large datasets, local caching, and secure credential management with environment variables.
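Incremental loading is simple to picture for a JSONL file that has already been fetched or cached locally. The Python sketch below reads records in fixed-size batches instead of loading the whole file at once; it is a generic illustration, not Potato's loader, and `iter_jsonl_batches` is a hypothetical name.

```python
import json
from itertools import islice


def iter_jsonl_batches(path, batch_size=100):
    """Yield lists of up to batch_size parsed records from a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        # Lazily parse one line at a time, skipping blank lines.
        records = (json.loads(line) for line in handle if line.strip())
        while True:
            batch = list(islice(records, batch_size))
            if not batch:
                break
            yield batch
```

Because the function is a generator, only one batch is held in memory at a time, which is the property that matters for large datasets.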
Read the Remote Data Sources documentation →
Survey Instruments
A library of 55 validated questionnaires ready to use in prestudy and poststudy phases:
phases:
  prestudy:
    type: prestudy
    instrument: "tipi" # 10-item personality questionnaire
  poststudy:
    type: poststudy
    instrument: "phq-9" # 9-item depression screening
Instruments span 8 categories: Personality (BFI-2, TIPI), Mental Health (PHQ-9, GAD-7), Affect (PANAS), Self-Concept (RSE), Social Attitudes (SDO-7, MFQ), Response Style, Short-Form versions, and Demographic Batteries from major surveys (ANES, GSS, ESS).
Read the Survey Instruments documentation →
UX Improvements
- Video object tracking with keyframe interpolation
- Bounding box annotation on PDF pages
- External AI config file support
- Form layout grid improvements
Upgrading to v2.2
pip install --upgrade potato-annotation
Existing v2.0 and v2.1 configurations work without changes; all new features are opt-in through additional config blocks.
Getting Started
- What's New — Full v2.2 feature overview
- Event Annotation — N-ary event structures
- Entity Linking — Knowledge base linking
- MACE — Annotator competence estimation
- Export Formats — Export CLI
- Survey Instruments — 55 validated questionnaires
Have questions or feedback? Join our Discord or open an issue on GitHub.