
What's New

Overview of new features and improvements in Potato v2.x releases.



Potato 2.3.0

Released March 9, 2026

Potato 2.3 is the largest release in Potato's history, introducing agentic annotation, Solo Mode, Best-Worst Scaling, SSO/OAuth authentication, Parquet export, 15 new demo projects, and security hardening.

Agentic Annotation

A complete system for evaluating AI agents through human annotation. Includes 12 trace format converters, 3 specialized display types, and 9 pre-built annotation schemas.

12 Trace Format Converters — Import agent traces from OpenAI, Anthropic, SWE-bench, OpenTelemetry, MCP, CrewAI/AutoGen/LangGraph, LangChain, LangFuse, ReAct, WebArena/VisualWebArena, ATIF, and raw browser recordings. Auto-detection available.

yaml
agentic:
  enabled: true
  trace_converter: react       # or openai, anthropic, webarena, auto, etc.
  trace_file: "data/traces.jsonl"

3 Display Types:

  • Agent Trace Display — Color-coded step cards with collapsible observations, JSON pretty-printing, and timeline sidebar for tool-using agents
  • Web Agent Trace Display — Full screenshots with SVG overlays showing click targets, text inputs, and scroll actions; filmstrip navigation for browsing agents
  • Interactive Chat Display — Live chat mode (annotator interacts with agent via proxy) and trace review mode for conversational agents
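The display type would presumably be selected in the same agentic block shown above. The display_type key in the sketch below is an assumption, not a confirmed option name; check the Agentic Annotation docs for the exact key:

yaml
agentic:
  enabled: true
  trace_converter: webarena
  trace_file: "data/web_traces.jsonl"
  display_type: web_agent_trace   # hypothetical key name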

Per-Turn Ratings — Rate individual steps alongside the overall trace for fine-grained evaluation.

9 Pre-Built Schemas — agent_task_success, agent_step_correctness, agent_error_taxonomy, agent_safety, agent_efficiency, agent_instruction_following, agent_explanation_quality, agent_web_action_correctness, agent_conversation_quality.

Agent Proxy System — OpenAI, HTTP, and echo proxies for live agent evaluation.
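A live-evaluation proxy might be configured roughly as sketched below. The agent_proxy block and its keys are assumptions based on the proxy types listed above, not confirmed option names:

yaml
# Hypothetical block - consult the Agentic Annotation docs for exact keys
agent_proxy:
  type: openai            # or http, echo
  model: "gpt-4o"
  api_key: ${OPENAI_API_KEY}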

Learn more about Agentic Annotation →


Solo Mode

A 12-phase intelligent workflow where a single human annotator collaborates with an LLM to label entire datasets, achieving 95%+ agreement with multi-annotator pipelines while requiring only 10-15% of total human labels.

The 12 Phases:

  1. Seed Annotation — human labels 50 diverse instances
  2. Initial LLM Calibration — LLM labels using seed examples
  3. Confusion Analysis — identify systematic disagreement patterns
  4. Guideline Refinement — LLM proposes, human approves updated guidelines
  5. Labeling Function Generation — ALCHEmist-inspired programmatic rules
  6. Active Labeling — human labels most informative instances
  7. Automated Refinement Loop — iterative re-labeling with improved guidelines
  8. Disagreement Exploration — human resolves LLM/LF conflicts
  9. Edge Case Synthesis — LLM generates ambiguous examples for human labeling
  10. Cascaded Confidence Escalation — human reviews lowest-confidence labels
  11. Prompt Optimization — DSPy-inspired automated prompt search
  12. Final Validation — random sample review
yaml
solo_mode:
  enabled: true
  llm:
    endpoint_type: openai
    model: "gpt-4o"
    api_key: ${OPENAI_API_KEY}
  seed_count: 50
  accuracy_threshold: 0.92

Multi-Signal Instance Prioritization — 6 weighted pools (uncertain, disagreement, boundary, novel, error_pattern, random) for selecting the most valuable instances.
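The six pool names come from the feature description above; the prioritization block and pool_weights key in this sketch are assumptions about how the weights might be expressed:

yaml
solo_mode:
  prioritization:          # hypothetical block name
    pool_weights:
      uncertain: 0.3
      disagreement: 0.2
      boundary: 0.2
      novel: 0.1
      error_pattern: 0.1
      random: 0.1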

Learn more about Solo Mode →


Best-Worst Scaling

Efficient comparative annotation where annotators select the best and worst items from tuples. Automatic tuple generation with balanced incomplete block designs and three scoring methods (Counting, Bradley-Terry, Plackett-Luce).

yaml
annotation_schemes:
  - annotation_type: best_worst_scaling
    name: fluency
    items_key: "translations"
    tuple_size: 4
    best_label: "Most Fluent"
    worst_label: "Least Fluent"
    scoring:
      method: bradley_terry

Learn more about Best-Worst Scaling →


SSO & OAuth Authentication

Production-ready authentication with Google OAuth (domain restriction), GitHub OAuth (organization restriction), and generic OIDC (Okta, Azure AD, Auth0, Keycloak). Supports auto-registration, mixed mode, and session management.

yaml
authentication:
  method: google_oauth
  google_oauth:
    client_id: ${GOOGLE_CLIENT_ID}
    client_secret: ${GOOGLE_CLIENT_SECRET}
    allowed_domains:
      - "umich.edu"
    auto_register: true

Learn more about SSO & OAuth →


Parquet Export

Export annotations to Apache Parquet format, producing three structured files: annotations.parquet, spans.parquet, and items.parquet. Supports snappy, gzip, zstd, lz4, and brotli compression, incremental export, and date/annotator partitioning. Compatible with pandas, DuckDB, PyArrow, Polars, and Hugging Face Datasets.

yaml
parquet_export:
  enabled: true
  output_dir: "output/parquet/"
  compression: zstd
  auto_export: true

Learn more about Parquet Export →


15 New Demo Projects

New demos in project-hub/ covering agentic annotation (5 demos), Solo Mode (3 demos), Best-Worst Scaling (3 demos), authentication (2 demos), and export workflows (2 demos). Start any demo with potato start config.yaml.


Security Hardening

  • Cryptographically secure session tokens with configurable expiration
  • CSRF protection enabled by default
  • Rate limiting on authentication endpoints
  • Input sanitization for user-provided content
  • Dependency audit with all packages updated
  • Content Security Policy headers

Other Improvements

  • Custom trace converters for unsupported agent frameworks
  • Hybrid Solo Mode with multi-annotator verification sampling
  • BWS admin dashboard tab with score convergence charts
  • Incremental Parquet export with date partitioning

v2.2 vs v2.3 Comparison

| Feature | v2.2 | v2.3 |
| --- | --- | --- |
| Agentic Annotation | Not available | 12 converters, 3 displays, 9 schemas |
| Solo Mode | Not available | 12-phase human-LLM workflow |
| Best-Worst Scaling | Not available | BWS with 3 scoring methods |
| Authentication | Username only | + Google OAuth, GitHub OAuth, OIDC |
| Parquet Export | Not available | 3-file Parquet with 6 compression options |
| Demo Projects | 125+ | 140+ (15 new) |
| Security | Basic | CSRF, rate limiting, CSP, secure sessions |

Potato 2.2.0

Released February 20, 2026

Potato 2.2 is a major feature release with 9 new annotation schemas, a pluggable export system, MACE competence estimation, 55 validated survey instruments, and remote data sources.

New Annotation Schemas (9)

Event Annotation — N-ary event structures with trigger spans and typed argument roles. Annotate events like ATTACK, HIRE, and TRAVEL with constrained entity arguments and hub-spoke arc visualization.

yaml
annotation_schemes:
  - annotation_type: event_annotation
    name: events
    span_schema: entities
    event_types:
      - type: "ATTACK"
        trigger_labels: ["EVENT_TRIGGER"]
        arguments:
          - role: "attacker"
            entity_types: ["PERSON", "ORGANIZATION"]
            required: true

Learn more about Event Annotation →

Entity Linking — Link span annotations to external knowledge bases (Wikidata, UMLS, custom REST APIs). Add an entity_linking: block to any span schema to enable KB search and linking.
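A minimal sketch of attaching entity linking to a span schema. The entity_linking: block name comes from the description above, but the kb key and its value are assumptions; see the Entity Linking docs for the real option names:

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels: [PERSON, ORGANIZATION]
    entity_linking:
      kb: wikidata        # hypothetical key and value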

Learn more about Entity Linking →

Triage — Prodigy-style accept/reject/skip interface for rapid data screening. Customizable labels, keyboard shortcuts, and auto-advance for high-throughput annotation.
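A triage schema might look like the sketch below. The accept/reject/skip labels come from the description above; the annotation_type value and other keys are assumptions:

yaml
annotation_schemes:
  - annotation_type: triage    # assumed type name
    name: screening
    labels: [accept, reject, skip]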

Learn more about Triage →

Pairwise Comparison — Compare two items with binary (click preferred tile) or scale (slider) modes. Supports items_key, allow_tie, scale: block with configurable range.
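The items_key, allow_tie, and scale: options come from the description above; the annotation_type value and the scale range in this sketch are assumptions:

yaml
annotation_schemes:
  - annotation_type: pairwise_comparison   # assumed type name
    name: preference
    items_key: "responses"
    allow_tie: true
    scale:
      min: -3
      max: 3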

Learn more about Pairwise Comparison →

Conversation Trees — Annotate hierarchical conversation structures with per-node ratings, path selection, and branch comparison.

Learn more about Conversation Trees →

Coreference Chains — Group coreferring text mentions into chains with visual indicators. Supports entity types, singleton control, and multiple highlight modes.

Learn more about Coreference Chains →

Segmentation Masks — New fill, eraser, and brush tools for pixel-level image segmentation.

Bounding Box for PDF/Documents — Draw boxes on PDF pages for document annotation tasks.

Discontinuous Spans — allow_discontinuous: true enables selecting non-contiguous text segments as a single span.
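The allow_discontinuous flag comes from the description above; the rest of this span scheme is illustrative:

yaml
annotation_schemes:
  - annotation_type: span
    name: symptoms
    allow_discontinuous: true
    labels: [SYMPTOM]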


Intelligent Annotation

MACE Competence Estimation — Variational Bayes EM algorithm that jointly estimates true labels and annotator competence scores (0.0-1.0). Works with radio, likert, select, and multiselect schemas.

yaml
mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 3

Learn more about MACE →

Option Highlighting — LLM-based highlighting of likely correct options for discrete annotation tasks. Highlights top-k options with a star indicator while dimming less-likely options.

yaml
ai_support:
  option_highlighting:
    enabled: true
    top_k: 3
    dim_opacity: 0.4

Learn more about Option Highlighting →

Diversity Ordering — Embedding-based clustering and round-robin sampling to ensure annotators see diverse content rather than similar items in sequence.

yaml
assignment_strategy: diversity_clustering
diversity_ordering:
  enabled: true
  prefill_count: 100

Learn more about Diversity Ordering →


Export System

A new pluggable export CLI (python -m potato.export) converts annotations to 6 industry-standard formats: COCO, YOLO, Pascal VOC, CoNLL-2003, CoNLL-U, and Segmentation Masks.

bash
python -m potato.export --config config.yaml --format coco --output ./export/

Learn more about Export Formats →


Remote Data Sources

Load annotation data from URLs, S3, Google Drive, Dropbox, Hugging Face, Google Sheets, and SQL databases via the new data_sources: config block. Includes partial loading, caching, and credential management.
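The data_sources: block name and the supported source types come from the description above, but the per-source keys in this sketch are assumptions; check the Remote Data Sources docs for the real field names:

yaml
# Sketch only - key names under each source are assumptions
data_sources:
  - type: s3
    bucket: "my-annotation-data"
    path: "batch1/data.jsonl"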

Learn more about Remote Data Sources →


Survey Instruments

55 validated questionnaires across 8 categories (Personality, Mental Health, Affect, Self-Concept, Social Attitudes, Response Style, Short-Form, Demographics). Use in prestudy/poststudy phases with instrument: "tipi".
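The instrument: "tipi" usage comes from the description above, and the phases structure mirrors the multi-phase config shown elsewhere on this page; treat the exact nesting as a sketch:

yaml
phases:
  prestudy:
    enabled: true
    instrument: "tipi"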

Learn more about Survey Instruments →


Other Improvements

  • Video object tracking with keyframe interpolation
  • External AI config file support
  • Form layout grid improvements
  • Format handlers for PDF, Word, code, and spreadsheets

Potato 2.1.0

Released February 5, 2026

Potato 2.1 introduces the instance display system, visual AI support, span linking, multi-field span annotation, and layout customization.

Instance Display System

A new instance_display config block that separates content display from annotation. Display any combination of images, videos, audio, text, and dialogues alongside any annotation schemes.

yaml
instance_display:
  fields:
    - key: image_url
      type: image
      display_options:
        max_width: 600
        zoomable: true
    - key: description
      type: text
 
annotation_schemes:
  - annotation_type: radio
    name: category
    labels: [nature, urban, people]

Supports 11 display types: text, html, image, video, audio, dialogue, pairwise, code, spreadsheet, document, and pdf.

Learn more about Instance Display →


Multi-Field Span Annotation

Span annotation schemes now support a target_field option to annotate across multiple text fields in the same instance.

yaml
annotation_schemes:
  - annotation_type: span
    name: source_entities
    target_field: "source_text"
    labels: [PERSON, ORGANIZATION]
 
  - annotation_type: span
    name: summary_entities
    target_field: "summary"
    labels: [PERSON, ORGANIZATION]

Learn more about Span Annotation →


Span Linking

A new span_link annotation type for creating typed relationships between annotated spans. Supports directed and undirected links, n-ary relationships, visual arc display, and label constraints.

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - name: "PERSON"
        color: "#3b82f6"
      - name: "ORGANIZATION"
        color: "#22c55e"
 
  - annotation_type: span_link
    name: relations
    span_schema: entities
    link_types:
      - name: "WORKS_FOR"
        directed: true
        allowed_source_labels: ["PERSON"]
        allowed_target_labels: ["ORGANIZATION"]
        color: "#dc2626"

Learn more about Span Linking →


Visual AI Support

Four new vision endpoints for AI-powered image and video annotation assistance:

  • YOLO — Fast local object detection
  • Ollama Vision — Local vision-language models (LLaVA, Qwen-VL)
  • OpenAI Vision — GPT-4o cloud vision
  • Anthropic Vision — Claude with vision

Features include object detection, pre-annotation, classification, hints, scene detection, keyframe detection, and object tracking.
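A vision endpoint would presumably be configured through the ai_support block shown in the v2.0 section below; the endpoint_type value and model name here are assumptions:

yaml
ai_support:
  enabled: true
  endpoint_type: ollama_vision   # assumed value for the Ollama Vision endpoint
  ai_config:
    model: llava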

Learn more about Visual AI Support →


Layout Customization

Create sophisticated custom visual layouts using HTML templates and CSS. Potato generates an editable layout file, or you can provide a fully custom template with grid layouts, color-coded options, and section styling.

yaml
task_layout: layouts/custom_task_layout.html

Three example layouts included: content moderation, dialogue QA, and medical review.

Learn more about Layout Customization →


Label Rationales

A fourth AI capability that generates balanced explanations for why each label might apply, helping annotators understand different classification perspectives.

yaml
ai_support:
  features:
    rationales:
      enabled: true

Learn more about AI Support →


Other Improvements

  • 50+ new tests for improved reliability
  • Responsive design improvements
  • Enhanced project-hub organization with layout examples
  • Bug fixes across annotation types

v2.0 vs v2.1 Comparison

| Feature | v2.0 | v2.1 |
| --- | --- | --- |
| Instance Display | Via annotation hacks | Dedicated instance_display block |
| Span Targets | Single text field | Multi-field with target_field |
| Span Linking | Not available | Full span_link type |
| Visual AI | Not available | YOLO, Ollama Vision, OpenAI Vision, Anthropic Vision |
| Layout Customization | Basic auto-generated | Auto-generated + custom templates |
| AI Capabilities | 3 (hints, keywords, suggestions) | 4 (+ rationales) |

Potato 2.0

Potato 2.0 is a major release that introduces powerful new features for intelligent, scalable annotation. This section highlights the key additions and improvements.

AI Support

Integrate Large Language Models to assist annotators with intelligent hints, keyword highlighting, and label suggestions.

Supported providers:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3, Claude 3.5)
  • Google (Gemini)
  • Ollama (local models)
  • vLLM (self-hosted)
yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    hints:
      enabled: true
    label_suggestions:
      enabled: true

Learn more about AI Support →


Audio Annotation

Full-featured audio annotation with waveform visualization powered by Peaks.js. Create segments, label time regions, and annotate speech with keyboard shortcuts.

Key features:

  • Waveform visualization
  • Segment creation and labeling
  • Per-segment annotation questions
  • 15+ keyboard shortcuts
  • Server-side waveform caching
yaml
annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    labels:
      - Speaker A
      - Speaker B

Learn more about Audio Annotation →


Active Learning

Automatically prioritize annotation instances based on model uncertainty. Train classifiers on existing annotations and focus annotators on the most informative examples.

Capabilities:

  • Multiple classifier options (LogisticRegression, RandomForest, SVC, MultinomialNB)
  • Various vectorizers (TF-IDF, Count, Hashing)
  • Model persistence across restarts
  • LLM-enhanced selection
  • Multi-schema support
yaml
active_learning:
  enabled: true
  schema_names:
    - sentiment
  min_instances_for_training: 30
  update_frequency: 50
  classifier:
    type: LogisticRegression

Learn more about Active Learning →


Training Phase

Qualify annotators with practice questions before the main task. Provide immediate feedback and ensure quality through configurable passing criteria.

Features:

  • Practice questions with known answers
  • Immediate feedback and explanations
  • Configurable passing criteria
  • Retry options
  • Progress tracking in admin dashboard
yaml
phases:
  training:
    enabled: true
    data_file: "data/training.json"
    passing_criteria:
      min_correct: 8
      total_questions: 10

Learn more about Training Phase →


Enhanced Admin Dashboard

Comprehensive monitoring and management interface for annotation tasks.

Dashboard tabs:

  • Overview: High-level metrics and completion rates
  • Annotators: Performance tracking, timing analysis
  • Instances: Browse data with disagreement scores
  • Configuration: Real-time settings adjustment
yaml
admin_api_key: ${ADMIN_API_KEY}

Learn more about Admin Dashboard →


Database Backend

MySQL support for large-scale deployments with connection pooling and transaction support.

yaml
database:
  type: mysql
  host: localhost
  database: potato_db
  user: ${DB_USER}
  password: ${DB_PASSWORD}

Potato automatically creates required tables on first startup.


Annotation History

Complete tracking of all annotation changes with timestamps, user IDs, and action types. Enables auditing and behavioral analysis.

json
{
  "history": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "user": "annotator_1",
      "action": "create",
      "schema": "sentiment",
      "value": "Positive"
    }
  ]
}

Multi-Phase Workflows

Build complex annotation workflows with multiple sequential phases:

  1. Consent - Informed consent collection
  2. Pre-study - Demographics and screening
  3. Instructions - Task guidelines
  4. Training - Practice questions
  5. Annotation - Main task
  6. Post-study - Feedback surveys
yaml
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
  training:
    enabled: true
    data_file: "data/training.json"
  poststudy:
    enabled: true
    data_file: "data/feedback.json"

Learn more about Multi-Phase Workflows →


v2.0 Configuration Changes

New Configuration Structure

Potato 2.0 uses a cleaner configuration format:

v1 (old):

yaml
data_files:
  - data.json
id_key: id
text_key: text
output_file: annotations.json

v2 (new):

yaml
data_files:
  - "data/data.json"
 
item_properties:
  id_key: id
  text_key: text
 
output_annotation_dir: "output/"
output_annotation_format: "json"

Security Requirement

Configuration files must now be located within the task_dir:

yaml
# Valid - config.yaml is in the project directory
task_dir: "."
 
# Valid - config.yaml is inside my_project/
task_dir: "my_project/"

Quick Comparison

| Feature | v1 | v2.0 | v2.1 | v2.2 | v2.3 |
| --- | --- | --- | --- | --- | --- |
| AI/LLM Support | No | Yes | Yes + Visual AI + Rationales | + Option Highlighting | + Solo Mode |
| Agentic Annotation | No | No | No | No | 12 converters, 3 displays |
| Best-Worst Scaling | No | No | No | No | Yes (3 scoring methods) |
| Audio Annotation | Basic | Full waveform | Full waveform | Full waveform | Full waveform |
| Active Learning | No | Yes | Yes | Yes + Diversity Ordering | + Solo Mode integration |
| Instance Display | No | No | Yes | Yes | Yes |
| Span Linking | No | No | Yes | Yes | Yes |
| Event Annotation | No | No | No | Yes | Yes |
| Entity Linking | No | No | No | Yes | Yes |
| Pairwise/Triage/Coreference/Trees | No | No | No | Yes | Yes |
| Layout Customization | No | Auto-generated | Auto + Custom templates | Auto + Custom templates | Auto + Custom templates |
| Training Phase | No | Yes | Yes | Yes | Yes |
| Admin Dashboard | Basic | Enhanced | Enhanced | Enhanced + MACE | + BWS tab, Solo Mode |
| Database Backend | File only | File + MySQL | File + MySQL | File + MySQL | File + MySQL |
| Export CLI | No | No | No | Yes (COCO, YOLO, CoNLL, etc.) | + Parquet |
| Authentication | Username | Username | Username | Username | + Google/GitHub OAuth, OIDC |
| Survey Instruments | No | No | No | 55 validated questionnaires | 55 validated questionnaires |
| Remote Data Sources | No | No | No | S3, GDrive, HuggingFace, etc. | S3, GDrive, HuggingFace, etc. |

Migration Guide

Updating Your Configuration (v1 to v2)

  1. Data configuration

    yaml
    # Old
    id_key: id
    text_key: text
     
    # New
    item_properties:
      id_key: id
      text_key: text
  2. Output configuration

    yaml
    # Old
    output_file: annotations.json
     
    # New
    output_annotation_dir: "output/"
    output_annotation_format: "json"
  3. Config file location Ensure your config file is inside the project directory.

Starting the Server

bash
# v2 command
python -m potato start config.yaml -p 8000
 
# Or shorthand
potato start config.yaml

Getting Started

Ready to try Potato? Start with the Quick Start Guide, or explore any of the feature pages linked throughout the release sections above.