What's New
Overview of new features and improvements in Potato v2.x releases.
Potato 2.3.0
Released March 9, 2026
Potato 2.3 is the largest release in Potato's history, introducing agentic annotation, Solo Mode, Best-Worst Scaling, SSO/OAuth authentication, Parquet export, 15 new demo projects, and security hardening.
Agentic Annotation
A complete system for evaluating AI agents through human annotation. Includes 12 trace format converters, 3 specialized display types, and 9 pre-built annotation schemas.
12 Trace Format Converters — Import agent traces from OpenAI, Anthropic, SWE-bench, OpenTelemetry, MCP, CrewAI/AutoGen/LangGraph, LangChain, LangFuse, ReAct, WebArena/VisualWebArena, ATIF, and raw browser recordings. Auto-detection available.
```yaml
agentic:
  enabled: true
  trace_converter: react  # or openai, anthropic, webarena, auto, etc.
  trace_file: "data/traces.jsonl"
```

3 Display Types:
- Agent Trace Display — Color-coded step cards with collapsible observations, JSON pretty-printing, and timeline sidebar for tool-using agents
- Web Agent Trace Display — Full screenshots with SVG overlays showing click targets, text inputs, and scroll actions; filmstrip navigation for browsing agents
- Interactive Chat Display — Live chat mode (annotator interacts with agent via proxy) and trace review mode for conversational agents
Per-Turn Ratings — Rate individual steps alongside the overall trace for fine-grained evaluation.
9 Pre-Built Schemas — agent_task_success, agent_step_correctness, agent_error_taxonomy, agent_safety, agent_efficiency, agent_instruction_following, agent_explanation_quality, agent_web_action_correctness, agent_conversation_quality.
Agent Proxy System — OpenAI, HTTP, and echo proxies for live agent evaluation.
Learn more about Agentic Annotation →
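Auto-detection typically works by inspecting the shape of a trace record. The sketch below illustrates the idea with a few hypothetical key checks; the key names (apart from `resourceSpans`, which is standard OTLP/JSON) are assumptions, not Potato's actual detection rules.

```python
def detect_trace_format(record: dict) -> str:
    """Guess a trace format from the keys of one trace record.

    The key checks here are illustrative assumptions, not Potato's
    actual auto-detection logic.
    """
    if "resourceSpans" in record:
        return "opentelemetry"  # OTLP/JSON exports wrap spans this way
    if "thought" in record and "action" in record:
        return "react"          # ReAct traces interleave thoughts and actions
    if "messages" in record:
        return "openai"         # chat-completions style message lists
    return "unknown"
```

With `trace_converter: auto`, a check like this would select a converter per file instead of requiring the format to be declared up front.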
Solo Mode
A 12-phase intelligent workflow where a single human annotator collaborates with an LLM to label entire datasets, achieving 95%+ agreement with multi-annotator pipelines while requiring only 10-15% of total human labels.
The 12 Phases:
- Seed Annotation — human labels 50 diverse instances
- Initial LLM Calibration — LLM labels using seed examples
- Confusion Analysis — identify systematic disagreement patterns
- Guideline Refinement — LLM proposes, human approves updated guidelines
- Labeling Function Generation — ALCHEmist-inspired programmatic rules
- Active Labeling — human labels most informative instances
- Automated Refinement Loop — iterative re-labeling with improved guidelines
- Disagreement Exploration — human resolves LLM/LF conflicts
- Edge Case Synthesis — LLM generates ambiguous examples for human labeling
- Cascaded Confidence Escalation — human reviews lowest-confidence labels
- Prompt Optimization — DSPy-inspired automated prompt search
- Final Validation — random sample review
```yaml
solo_mode:
  enabled: true
  llm:
    endpoint_type: openai
    model: "gpt-4o"
    api_key: ${OPENAI_API_KEY}
  seed_count: 50
  accuracy_threshold: 0.92
```

Multi-Signal Instance Prioritization — 6 weighted pools (uncertain, disagreement, boundary, novel, error_pattern, random) for selecting the most valuable instances.
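The weighted-pool idea can be sketched as follows. This is a simplified illustration, not Potato's implementation: the pool names mirror the six pools above, but the selection logic and data shapes are assumptions.

```python
import random

def next_instance(pools: dict, weights: dict, rng: random.Random):
    """Pick the next instance to label: choose a non-empty pool in
    proportion to its weight, then pop the front of that pool.

    A simplified sketch of multi-signal prioritization; not Potato's code.
    """
    nonempty = [name for name, items in pools.items() if items]
    if not nonempty:
        return None  # everything has been labeled
    pool = rng.choices(nonempty, weights=[weights[n] for n in nonempty])[0]
    return pools[pool].pop(0)
```

In practice the pools are refilled as the LLM and labeling functions produce new uncertainty and disagreement signals.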
Best-Worst Scaling
Efficient comparative annotation where annotators select the best and worst items from tuples. Automatic tuple generation with balanced incomplete block designs and three scoring methods (Counting, Bradley-Terry, Plackett-Luce).
```yaml
annotation_schemes:
  - annotation_type: best_worst_scaling
    name: fluency
    items_key: "translations"
    tuple_size: 4
    best_label: "Most Fluent"
    worst_label: "Least Fluent"
    scoring:
      method: bradley_terry
```

Learn more about Best-Worst Scaling →
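The simplest of the three scoring methods, Counting, assigns each item the score (times chosen best − times chosen worst) / appearances. A minimal sketch, assuming judgments arrive as (tuple_items, best_item, worst_item) triples (that input shape is an assumption for illustration):

```python
from collections import Counter

def bws_counting_scores(judgments):
    """Counting-method BWS scores in [-1, 1] per item.

    `judgments` is a list of (tuple_items, best_item, worst_item).
    Bradley-Terry and Plackett-Luce instead fit probabilistic choice
    models and are more involved.
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in judgments:
        seen.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}
```

Items always picked best score 1.0; items always picked worst score -1.0; unchosen items sit at 0.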
SSO & OAuth Authentication
Production-ready authentication with Google OAuth (domain restriction), GitHub OAuth (organization restriction), and generic OIDC (Okta, Azure AD, Auth0, Keycloak). Supports auto-registration, mixed mode, and session management.
```yaml
authentication:
  method: google_oauth
  google_oauth:
    client_id: ${GOOGLE_CLIENT_ID}
    client_secret: ${GOOGLE_CLIENT_SECRET}
    allowed_domains:
      - "umich.edu"
    auto_register: true
```

Learn more about SSO & OAuth →
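The domain restriction amounts to comparing the domain part of the authenticated email against the allow-list. A minimal sketch of that check (not Potato's actual auth code):

```python
def email_domain_allowed(email: str, allowed_domains: list) -> bool:
    """True if the email's domain matches an allowed domain.

    Comparison is case-insensitive; a sketch of the allowed_domains
    idea, not Potato's implementation.
    """
    domain = email.rsplit("@", 1)[-1].lower()
    return domain in {d.lower() for d in allowed_domains}
```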
Parquet Export
Export annotations to Apache Parquet format, producing three structured files: annotations.parquet, spans.parquet, and items.parquet. Supports snappy, gzip, zstd, lz4, and brotli compression, incremental export, and date/annotator partitioning. Compatible with pandas, DuckDB, PyArrow, Polars, and Hugging Face Datasets.
```yaml
parquet_export:
  enabled: true
  output_dir: "output/parquet/"
  compression: zstd
  auto_export: true
```

Learn more about Parquet Export →
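Date/annotator partitioning conventionally uses Hive-style `key=value` directories, which pandas, PyArrow, DuckDB, and Polars all understand. The exact layout Potato writes may differ; this sketch only illustrates the convention:

```python
from pathlib import PurePosixPath

def partition_path(base: str, date: str, annotator: str) -> str:
    """Build a Hive-style partition path for one annotations file.

    The directory layout is an illustrative assumption; the key=value
    convention itself is what partitioned-Parquet readers expect.
    """
    return str(PurePosixPath(base) / f"date={date}"
               / f"annotator={annotator}" / "annotations.parquet")
```

Readers can then filter on `date` or `annotator` without scanning every file.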
15 New Demo Projects
New demos in project-hub/ covering agentic annotation (5 demos), Solo Mode (3 demos), Best-Worst Scaling (3 demos), authentication (2 demos), and export workflows (2 demos). Start any demo with potato start config.yaml.
Security Hardening
- Cryptographically secure session tokens with configurable expiration
- CSRF protection enabled by default
- Rate limiting on authentication endpoints
- Input sanitization for user-provided content
- Dependency audit with all packages updated
- Content Security Policy headers
Other Improvements
- Custom trace converters for unsupported agent frameworks
- Hybrid Solo Mode with multi-annotator verification sampling
- BWS admin dashboard tab with score convergence charts
- Incremental Parquet export with date partitioning
v2.2 vs v2.3 Comparison
| Feature | v2.2 | v2.3 |
|---|---|---|
| Agentic Annotation | Not available | 12 converters, 3 displays, 9 schemas |
| Solo Mode | Not available | 12-phase human-LLM workflow |
| Best-Worst Scaling | Not available | BWS with 3 scoring methods |
| Authentication | Username only | + Google OAuth, GitHub OAuth, OIDC |
| Parquet Export | Not available | 3-file Parquet with 6 compression options |
| Demo Projects | 125+ | 140+ (15 new) |
| Security | Basic | CSRF, rate limiting, CSP, secure sessions |
Potato 2.2.0
Released February 20, 2026
Potato 2.2 is a major feature release with 9 new annotation schemas, a pluggable export system, MACE competence estimation, 55 validated survey instruments, and remote data sources.
New Annotation Schemas (9)
Event Annotation — N-ary event structures with trigger spans and typed argument roles. Annotate events like ATTACK, HIRE, and TRAVEL with constrained entity arguments and hub-spoke arc visualization.
```yaml
annotation_schemes:
  - annotation_type: event_annotation
    name: events
    span_schema: entities
    event_types:
      - type: "ATTACK"
        trigger_labels: ["EVENT_TRIGGER"]
        arguments:
          - role: "attacker"
            entity_types: ["PERSON", "ORGANIZATION"]
            required: true
```

Learn more about Event Annotation →
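Constrained arguments mean each annotated event can be checked against its type's spec: required roles must be filled, and each filled role must carry an allowed entity type. A sketch of that validation, where the dict shapes mirror the config above but are assumptions about the data model:

```python
def validate_event(event: dict, type_specs: dict) -> list:
    """Return a list of constraint violations for one annotated event.

    `event` and `type_specs` shapes are illustrative assumptions, not
    Potato's internal representation. An empty list means valid.
    """
    spec = type_specs[event["type"]]
    filled = {a["role"]: a for a in event.get("arguments", [])}
    errors = []
    for arg in spec["arguments"]:
        role = arg["role"]
        if role not in filled:
            if arg.get("required"):
                errors.append(f"missing required role: {role}")
        elif filled[role]["entity_type"] not in arg["entity_types"]:
            errors.append(f"disallowed entity type for role: {role}")
    return errors
```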
Entity Linking — Link span annotations to external knowledge bases (Wikidata, UMLS, custom REST APIs). Add an entity_linking: block to any span schema to enable KB search and linking.
Learn more about Entity Linking →
Triage — Prodigy-style accept/reject/skip interface for rapid data screening. Customizable labels, keyboard shortcuts, and auto-advance for high-throughput annotation.
Pairwise Comparison — Compare two items with binary (click the preferred tile) or scale (slider) modes. Supports items_key, allow_tie, and a scale: block with a configurable range.
Learn more about Pairwise Comparison →
Conversation Trees — Annotate hierarchical conversation structures with per-node ratings, path selection, and branch comparison.
Learn more about Conversation Trees →
Coreference Chains — Group coreferring text mentions into chains with visual indicators. Supports entity types, singleton control, and multiple highlight modes.
Learn more about Coreference Chains →
Segmentation Masks — New fill, eraser, and brush tools for pixel-level image segmentation.
Bounding Box for PDF/Documents — Draw boxes on PDF pages for document annotation tasks.
Discontinuous Spans — allow_discontinuous: true enables selecting non-contiguous text segments as a single span.
Intelligent Annotation
MACE Competence Estimation — Variational Bayes EM algorithm that jointly estimates true labels and annotator competence scores (0.0-1.0). Works with radio, likert, select, and multiselect schemas.
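The estimation loop can be sketched in much-simplified form: alternate between a competence-weighted consensus label per item and a per-annotator agreement rate. This is only in the spirit of MACE; the real algorithm is variational Bayes EM over a model of annotator spamming.

```python
from collections import Counter

def estimate_competence(annotations: dict, n_iter: int = 10):
    """Simplified competence estimation: alternate weighted consensus
    and agreement rates. `annotations` maps item -> {annotator: label}.

    A sketch in the spirit of MACE, not the actual VB-EM algorithm.
    """
    annotators = sorted({a for labels in annotations.values() for a in labels})
    competence = {a: 1.0 for a in annotators}
    consensus = {}
    for _ in range(n_iter):
        # Competence-weighted vote per item
        for item, labels in annotations.items():
            votes = Counter()
            for annotator, label in labels.items():
                votes[label] += competence[annotator]
            consensus[item] = votes.most_common(1)[0][0]
        # Competence = agreement with the current consensus
        for a in annotators:
            answered = [item for item in annotations if a in annotations[item]]
            agree = sum(annotations[item][a] == consensus[item] for item in answered)
            competence[a] = agree / len(answered)
    return competence, consensus
```

Annotators who consistently disagree with the weighted consensus end up with low competence, so their votes count for less.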
```yaml
mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 3
```

Option Highlighting — LLM-based highlighting of likely correct options for discrete annotation tasks. Highlights top-k options with a star indicator while dimming less-likely options.
```yaml
ai_support:
  option_highlighting:
    enabled: true
    top_k: 3
    dim_opacity: 0.4
```

Learn more about Option Highlighting →
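Once the LLM has scored each option, the top_k split is a simple partition. The function name below is hypothetical and the scores are passed in rather than fetched from a model:

```python
def split_highlight(option_scores: dict, top_k: int = 3):
    """Partition options into (highlighted, dimmed) by descending score.

    In Potato the scores come from an LLM; here they are supplied
    directly, and the function name is an illustrative assumption.
    """
    ranked = sorted(option_scores, key=option_scores.get, reverse=True)
    return ranked[:top_k], ranked[top_k:]
```

The dimmed group would then be rendered at dim_opacity while the highlighted group gets the star indicator.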
Diversity Ordering — Embedding-based clustering and round-robin sampling to ensure annotators see diverse content rather than similar items in sequence.
```yaml
assignment_strategy: diversity_clustering
diversity_ordering:
  enabled: true
  prefill_count: 100
```

Learn more about Diversity Ordering →
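The round-robin step itself is straightforward: take one item from each cluster in turn so consecutive items come from different clusters. Only this step is sketched here; Potato also computes the embeddings and clusters upstream.

```python
from itertools import chain, zip_longest

_GAP = object()  # sentinel for exhausted clusters

def round_robin(clusters: list) -> list:
    """Interleave cluster members so neighbors come from different clusters.

    A sketch of the ordering step only, under the assumption that
    clusters are given as lists of item ids.
    """
    interleaved = chain.from_iterable(zip_longest(*clusters, fillvalue=_GAP))
    return [item for item in interleaved if item is not _GAP]
```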
Export System
A new pluggable export CLI (python -m potato.export) converts annotations to 6 industry-standard formats: COCO, YOLO, Pascal VOC, CoNLL-2003, CoNLL-U, and Segmentation Masks.
```shell
python -m potato.export --config config.yaml --format coco --output ./export/
```

Learn more about Export Formats →
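As a flavor of what a format plugin produces, here is a minimal CoNLL-style BIO renderer. The output shape follows CoNLL-2003 (token, tab, tag), but the input span format, (start, end, label) token indices with exclusive end, is an assumption rather than Potato's internal representation:

```python
def to_conll_bio(tokens: list, spans: list) -> list:
    """Render tokens plus (start, end, label) token-index spans as BIO lines.

    `end` is exclusive; overlapping spans are not handled in this sketch.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return [f"{token}\t{tag}" for token, tag in zip(tokens, tags)]
```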
Remote Data Sources
Load annotation data from URLs, S3, Google Drive, Dropbox, Hugging Face, Google Sheets, and SQL databases via the new data_sources: config block. Includes partial loading, caching, and credential management.
Learn more about Remote Data Sources →
Survey Instruments
55 validated questionnaires across 8 categories (Personality, Mental Health, Affect, Self-Concept, Social Attitudes, Response Style, Short-Form, Demographics). Use in prestudy/poststudy phases with instrument: "tipi".
Learn more about Survey Instruments →
Other Improvements
- Video object tracking with keyframe interpolation
- External AI config file support
- Form layout grid improvements
- Format handlers for PDF, Word, code, and spreadsheets
Potato 2.1.0
Released February 5, 2026
Potato 2.1 introduces the instance display system, visual AI support, span linking, multi-field span annotation, and layout customization.
Instance Display System
A new instance_display config block that separates content display from annotation. Display any combination of images, videos, audio, text, and dialogues alongside any annotation schemes.
```yaml
instance_display:
  fields:
    - key: image_url
      type: image
      display_options:
        max_width: 600
        zoomable: true
    - key: description
      type: text
annotation_schemes:
  - annotation_type: radio
    name: category
    labels: [nature, urban, people]
```

Supports 11 display types: text, html, image, video, audio, dialogue, pairwise, code, spreadsheet, document, and pdf.
Learn more about Instance Display →
Multi-Field Span Annotation
Span annotation schemes now support a target_field option to annotate across multiple text fields in the same instance.
```yaml
annotation_schemes:
  - annotation_type: span
    name: source_entities
    target_field: "source_text"
    labels: [PERSON, ORGANIZATION]
  - annotation_type: span
    name: summary_entities
    target_field: "summary"
    labels: [PERSON, ORGANIZATION]
```

Learn more about Span Annotation →
Span Linking
A new span_link annotation type for creating typed relationships between annotated spans. Supports directed and undirected links, n-ary relationships, visual arc display, and label constraints.
```yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - name: "PERSON"
        color: "#3b82f6"
      - name: "ORGANIZATION"
        color: "#22c55e"
  - annotation_type: span_link
    name: relations
    span_schema: entities
    link_types:
      - name: "WORKS_FOR"
        directed: true
        allowed_source_labels: ["PERSON"]
        allowed_target_labels: ["ORGANIZATION"]
        color: "#dc2626"
```

Learn more about Span Linking →
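Label constraints mean the UI can reject a link before it is created: a WORKS_FOR arc may only run from a PERSON span to an ORGANIZATION span. A sketch of that check, where the dict shape mirrors the link_types config but is an assumption:

```python
def link_allowed(link_type: dict, source_label: str, target_label: str) -> bool:
    """Check a proposed link against allowed_source/target_labels.

    A missing constraint list is treated as "allow anything". The dict
    shape is an illustrative assumption, not Potato's data model.
    """
    sources = link_type.get("allowed_source_labels")
    targets = link_type.get("allowed_target_labels")
    return ((sources is None or source_label in sources)
            and (targets is None or target_label in targets))
```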
Visual AI Support
Four new vision endpoints for AI-powered image and video annotation assistance:
- YOLO — Fast local object detection
- Ollama Vision — Local vision-language models (LLaVA, Qwen-VL)
- OpenAI Vision — GPT-4o cloud vision
- Anthropic Vision — Claude with vision
Features include object detection, pre-annotation, classification, hints, scene detection, keyframe detection, and object tracking.
Learn more about Visual AI Support →
Layout Customization
Create sophisticated custom visual layouts using HTML templates and CSS. Potato generates an editable layout file, or you can provide a fully custom template with grid layouts, color-coded options, and section styling.
```yaml
task_layout: layouts/custom_task_layout.html
```

Three example layouts are included: content moderation, dialogue QA, and medical review.
Learn more about Layout Customization →
Label Rationales
A fourth AI capability that generates balanced explanations for why each label might apply, helping annotators understand different classification perspectives.
```yaml
ai_support:
  features:
    rationales:
      enabled: true
```

Other Improvements
- 50+ new tests for improved reliability
- Responsive design improvements
- Enhanced project-hub organization with layout examples
- Bug fixes across annotation types
v2.0 vs v2.1 Comparison
| Feature | v2.0 | v2.1 |
|---|---|---|
| Instance Display | Via annotation hacks | Dedicated instance_display block |
| Span Targets | Single text field | Multi-field with target_field |
| Span Linking | Not available | Full span_link type |
| Visual AI | Not available | YOLO, Ollama Vision, OpenAI Vision, Anthropic Vision |
| Layout Customization | Basic auto-generated | Auto-generated + custom templates |
| AI Capabilities | 3 (hints, keywords, suggestions) | 4 (+ rationales) |
Potato 2.0
Potato 2.0 is a major release that introduces powerful new features for intelligent, scalable annotation. This section highlights the key additions and improvements.
AI Support
Integrate Large Language Models to assist annotators with intelligent hints, keyword highlighting, and label suggestions.
Supported providers:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3, Claude 3.5)
- Google (Gemini)
- Ollama (local models)
- vLLM (self-hosted)
```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    hints:
      enabled: true
    label_suggestions:
      enabled: true
```

Audio Annotation
Full-featured audio annotation with waveform visualization powered by Peaks.js. Create segments, label time regions, and annotate speech with keyboard shortcuts.
Key features:
- Waveform visualization
- Segment creation and labeling
- Per-segment annotation questions
- 15+ keyboard shortcuts
- Server-side waveform caching
```yaml
annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    labels:
      - Speaker A
      - Speaker B
```

Learn more about Audio Annotation →
Active Learning
Automatically prioritize annotation instances based on model uncertainty. Train classifiers on existing annotations and focus annotators on the most informative examples.
Capabilities:
- Multiple classifier options (LogisticRegression, RandomForest, SVC, MultinomialNB)
- Various vectorizers (TF-IDF, Count, Hashing)
- Model persistence across restarts
- LLM-enhanced selection
- Multi-schema support
```yaml
active_learning:
  enabled: true
  schema_names:
    - sentiment
  min_instances_for_training: 30
  update_frequency: 50
  classifier:
    type: LogisticRegression
```

Learn more about Active Learning →
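A common uncertainty measure is the entropy of the classifier's predicted class distribution: the flatter the distribution, the more informative the instance. A minimal sketch, assuming probabilities come from something like scikit-learn's predict_proba:

```python
import math

def most_uncertain(probabilities: dict) -> str:
    """Return the instance id whose class distribution has maximum entropy.

    `probabilities` maps instance id -> list of class probabilities.
    A sketch of entropy-based uncertainty sampling, not Potato's code.
    """
    def entropy(ps):
        return -sum(p * math.log(p) for p in ps if p > 0)
    return max(probabilities, key=lambda k: entropy(probabilities[k]))
```

Instances the model is confident about (e.g. [0.99, 0.01]) are deprioritized in favor of ones near the decision boundary (e.g. [0.5, 0.5]).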
Training Phase
Qualify annotators with practice questions before the main task. Provide immediate feedback and ensure quality through configurable passing criteria.
Features:
- Practice questions with known answers
- Immediate feedback and explanations
- Configurable passing criteria
- Retry options
- Progress tracking in admin dashboard
```yaml
phases:
  training:
    enabled: true
    data_file: "data/training.json"
    passing_criteria:
      min_correct: 8
      total_questions: 10
```

Learn more about Training Phase →
Enhanced Admin Dashboard
Comprehensive monitoring and management interface for annotation tasks.
Dashboard tabs:
- Overview: High-level metrics and completion rates
- Annotators: Performance tracking, timing analysis
- Instances: Browse data with disagreement scores
- Configuration: Real-time settings adjustment
```yaml
admin_api_key: ${ADMIN_API_KEY}
```

Learn more about Admin Dashboard →
Database Backend
MySQL support for large-scale deployments with connection pooling and transaction support.
```yaml
database:
  type: mysql
  host: localhost
  database: potato_db
  user: ${DB_USER}
  password: ${DB_PASSWORD}
```

Potato automatically creates required tables on first startup.
Annotation History
Complete tracking of all annotation changes with timestamps, user IDs, and action types. Enables auditing and behavioral analysis.
```json
{
  "history": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "user": "annotator_1",
      "action": "create",
      "schema": "sentiment",
      "value": "Positive"
    }
  ]
}
```

Multi-Phase Workflows
Build complex annotation workflows with multiple sequential phases:
- Consent - Informed consent collection
- Pre-study - Demographics and screening
- Instructions - Task guidelines
- Training - Practice questions
- Annotation - Main task
- Post-study - Feedback surveys
```yaml
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
  training:
    enabled: true
    data_file: "data/training.json"
  poststudy:
    enabled: true
    data_file: "data/feedback.json"
```

Learn more about Multi-Phase Workflows →
v2.0 Configuration Changes
New Configuration Structure
Potato 2.0 uses a cleaner configuration format:
v1 (old):

```yaml
data_files:
  - data.json
id_key: id
text_key: text
output_file: annotations.json
```

v2 (new):

```yaml
data_files:
  - "data/data.json"
item_properties:
  id_key: id
  text_key: text
output_annotation_dir: "output/"
output_annotation_format: "json"
```

Security Requirement
Configuration files must now be located within the task_dir:
# Valid - config.yaml is in the project directory
task_dir: "."
# Valid - config in configs/ subdirectory
task_dir: "my_project/"Quick Comparison
| Feature | v1 | v2.0 | v2.1 | v2.2 | v2.3 |
|---|---|---|---|---|---|
| AI/LLM Support | No | Yes | Yes + Visual AI + Rationales | + Option Highlighting | + Solo Mode |
| Agentic Annotation | No | No | No | No | 12 converters, 3 displays |
| Best-Worst Scaling | No | No | No | No | Yes (3 scoring methods) |
| Audio Annotation | Basic | Full waveform | Full waveform | Full waveform | Full waveform |
| Active Learning | No | Yes | Yes | Yes + Diversity Ordering | + Solo Mode integration |
| Instance Display | No | No | Yes | Yes | Yes |
| Span Linking | No | No | Yes | Yes | Yes |
| Event Annotation | No | No | No | Yes | Yes |
| Entity Linking | No | No | No | Yes | Yes |
| Pairwise/Triage/Coreference/Trees | No | No | No | Yes | Yes |
| Layout Customization | No | Auto-generated | Auto + Custom templates | Auto + Custom templates | Auto + Custom templates |
| Training Phase | No | Yes | Yes | Yes | Yes |
| Admin Dashboard | Basic | Enhanced | Enhanced | Enhanced + MACE | + BWS tab, Solo Mode |
| Database Backend | File only | File + MySQL | File + MySQL | File + MySQL | File + MySQL |
| Export CLI | No | No | No | Yes (COCO, YOLO, CoNLL, etc.) | + Parquet |
| Authentication | Username | Username | Username | Username | + Google/GitHub OAuth, OIDC |
| Survey Instruments | No | No | No | 55 validated questionnaires | 55 validated questionnaires |
| Remote Data Sources | No | No | No | S3, GDrive, HuggingFace, etc. | S3, GDrive, HuggingFace, etc. |
Migration Guide
Updating Your Configuration (v1 to v2)
1. Data configuration

```yaml
# Old
id_key: id
text_key: text

# New
item_properties:
  id_key: id
  text_key: text
```

2. Output configuration

```yaml
# Old
output_file: annotations.json

# New
output_annotation_dir: "output/"
output_annotation_format: "json"
```

3. Config file location: ensure your config file is inside the project directory.
Starting the Server
```shell
# v2 command
python -m potato start config.yaml -p 8000

# Or shorthand
potato start config.yaml
```

Getting Started
Ready to try Potato? Start with the Quick Start Guide or explore specific features:
v2.3 Features:
- Agentic Annotation - Evaluate AI agents with 12 converters and 3 display types
- Solo Mode - Human-LLM collaborative labeling
- Best-Worst Scaling - Comparative annotation with scoring
- SSO & OAuth - Google, GitHub, and OIDC authentication
- Parquet Export - Columnar data export
v2.2 Features:
- Event Annotation - N-ary event structures
- Entity Linking - Knowledge base linking
- Triage - Rapid data screening
- Coreference Chains - Entity coreference
- Conversation Trees - Hierarchical dialogue annotation
- MACE - Annotator competence estimation
- Option Highlighting - AI-assisted option guidance
- Diversity Ordering - Embedding-based item ordering
- Export Formats - Export CLI with 6 formats
- Remote Data Sources - Cloud data loading
- Survey Instruments - 55 validated questionnaires
v2.1 Features:
- Instance Display - Multi-modal content display
- Visual AI Support - AI for image and video annotation
- Span Linking - Entity relationship annotation
Core Features:
- AI Support - Intelligent annotation assistance
- Active Learning - Smart instance prioritization
- Audio Annotation - Waveform-based annotation
- Training Phase - Annotator qualification
- Admin Dashboard - Monitoring and management