
What's New

Overview of new features and improvements in Potato v2.x releases.



Potato 2.3.0

Released March 9, 2026

Potato 2.3 is the largest release in Potato's history, introducing agentic annotation, Solo Mode, Best-Worst Scaling, SSO/OAuth authentication, Parquet export, 15 new demo projects, and security hardening.

Agentic Annotation

A complete system for evaluating AI agents through human annotation. Includes 12 trace format converters, 3 specialized display types, and 9 pre-built annotation schemas.

12 Trace Format Converters — Import agent traces from OpenAI, Anthropic, SWE-bench, OpenTelemetry, MCP, CrewAI/AutoGen/LangGraph, LangChain, LangFuse, ReAct, WebArena/VisualWebArena, ATIF, and raw browser recordings. Auto-detection available.

yaml
agentic:
  enabled: true
  trace_converter: react       # or openai, anthropic, webarena, auto, etc.
  trace_file: "data/traces.jsonl"

3 Display Types:

  • Agent Trace Display — Color-coded step cards with collapsible observations, JSON pretty-printing, and timeline sidebar for tool-using agents
  • Web Agent Trace Display — Full screenshots with SVG overlays showing click targets, text inputs, and scroll actions; filmstrip navigation for browsing agents
  • Interactive Chat Display — Live chat mode (annotator interacts with agent via proxy) and trace review mode for conversational agents
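The display type would presumably be selected in the same agentic block shown above. The display_type key in the sketch below is an assumption, not a confirmed option name; check the Agentic Annotation docs for the exact key:

yaml
agentic:
  enabled: true
  trace_converter: webarena
  trace_file: "data/web_traces.jsonl"
  display_type: web_agent_trace   # hypothetical key name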

Per-Turn Ratings — Rate individual steps alongside the overall trace for fine-grained evaluation.

9 Pre-Built Schemas — agent_task_success, agent_step_correctness, agent_error_taxonomy, agent_safety, agent_efficiency, agent_instruction_following, agent_explanation_quality, agent_web_action_correctness, agent_conversation_quality.

Agent Proxy System — OpenAI, HTTP, and echo proxies for live agent evaluation.
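A live-evaluation proxy might be configured roughly as sketched below. The agent_proxy block and its keys are assumptions based on the proxy types listed above, not confirmed option names:

yaml
# Hypothetical block - consult the Agentic Annotation docs for exact keys
agent_proxy:
  type: openai            # or http, echo
  model: "gpt-4o"
  api_key: ${OPENAI_API_KEY}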

Learn more about Agentic Annotation →


Solo Mode

A 12-phase intelligent workflow where a single human annotator collaborates with an LLM to label entire datasets, achieving 95%+ agreement with multi-annotator pipelines while requiring only 10-15% of total human labels.

The 12 Phases:

  1. Seed Annotation — human labels 50 diverse instances
  2. Initial LLM Calibration — LLM labels using seed examples
  3. Confusion Analysis — identify systematic disagreement patterns
  4. Guideline Refinement — LLM proposes, human approves updated guidelines
  5. Labeling Function Generation — ALCHEmist-inspired programmatic rules
  6. Active Labeling — human labels most informative instances
  7. Automated Refinement Loop — iterative re-labeling with improved guidelines
  8. Disagreement Exploration — human resolves LLM/LF conflicts
  9. Edge Case Synthesis — LLM generates ambiguous examples for human labeling
  10. Cascaded Confidence Escalation — human reviews lowest-confidence labels
  11. Prompt Optimization — DSPy-inspired automated prompt search
  12. Final Validation — random sample review
yaml
solo_mode:
  enabled: true
  llm:
    endpoint_type: openai
    model: "gpt-4o"
    api_key: ${OPENAI_API_KEY}
  seed_count: 50
  accuracy_threshold: 0.92

Multi-Signal Instance Prioritization — 6 weighted pools (uncertain, disagreement, boundary, novel, error_pattern, random) for selecting the most valuable instances.
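The six pool names come from the feature description above; the prioritization block and pool_weights key in this sketch are assumptions about how the weights might be expressed:

yaml
solo_mode:
  prioritization:          # hypothetical block name
    pool_weights:
      uncertain: 0.3
      disagreement: 0.2
      boundary: 0.2
      novel: 0.1
      error_pattern: 0.1
      random: 0.1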

Learn more about Solo Mode →


Best-Worst Scaling

Efficient comparative annotation where annotators select the best and worst items from tuples. Automatic tuple generation with balanced incomplete block designs and three scoring methods (Counting, Bradley-Terry, Plackett-Luce).

yaml
annotation_schemes:
  - annotation_type: best_worst_scaling
    name: fluency
    items_key: "translations"
    tuple_size: 4
    best_label: "Most Fluent"
    worst_label: "Least Fluent"
    scoring:
      method: bradley_terry

Learn more about Best-Worst Scaling →


SSO & OAuth Authentication

Production-ready authentication with Google OAuth (domain restriction), GitHub OAuth (organization restriction), and generic OIDC (Okta, Azure AD, Auth0, Keycloak). Supports auto-registration, mixed mode, and session management.

yaml
authentication:
  method: google_oauth
  google_oauth:
    client_id: ${GOOGLE_CLIENT_ID}
    client_secret: ${GOOGLE_CLIENT_SECRET}
    allowed_domains:
      - "umich.edu"
    auto_register: true

Learn more about SSO & OAuth →


Parquet Export

Export annotations to Apache Parquet format, producing three structured files: annotations.parquet, spans.parquet, and items.parquet. Supports snappy, gzip, zstd, lz4, and brotli compression, incremental export, and date/annotator partitioning. Compatible with pandas, DuckDB, PyArrow, Polars, and Hugging Face Datasets.

yaml
parquet_export:
  enabled: true
  output_dir: "output/parquet/"
  compression: zstd
  auto_export: true

Learn more about Parquet Export →


15 New Demo Projects

New demos in project-hub/ covering agentic annotation (5 demos), Solo Mode (3 demos), Best-Worst Scaling (3 demos), authentication (2 demos), and export workflows (2 demos). Start any demo with potato start config.yaml.


Security Hardening

  • Cryptographically secure session tokens with configurable expiration
  • CSRF protection enabled by default
  • Rate limiting on authentication endpoints
  • Input sanitization for user-provided content
  • Dependency audit with all packages updated
  • Content Security Policy headers

Other Improvements

  • Custom trace converters for unsupported agent frameworks
  • Hybrid Solo Mode with multi-annotator verification sampling
  • BWS admin dashboard tab with score convergence charts
  • Incremental Parquet export with date partitioning

v2.2 vs v2.3 Comparison

| Feature | v2.2 | v2.3 |
| --- | --- | --- |
| Agentic Annotation | Not available | 12 converters, 3 displays, 9 schemas |
| Solo Mode | Not available | 12-phase human-LLM workflow |
| Best-Worst Scaling | Not available | BWS with 3 scoring methods |
| Authentication | Username only | + Google OAuth, GitHub OAuth, OIDC |
| Parquet Export | Not available | 3-file Parquet with 6 compression options |
| Demo Projects | 125+ | 140+ (15 new) |
| Security | Basic | CSRF, rate limiting, CSP, secure sessions |

Potato 2.2.0

Released February 20, 2026

Potato 2.2 is a major feature release with 9 new annotation schemas, a pluggable export system, MACE competence estimation, 55 validated survey instruments, and remote data sources.

New Annotation Schemas (9)

Event Annotation — N-ary event structures with trigger spans and typed argument roles. Annotate events like ATTACK, HIRE, and TRAVEL with constrained entity arguments and hub-spoke arc visualization.

yaml
annotation_schemes:
  - annotation_type: event_annotation
    name: events
    span_schema: entities
    event_types:
      - type: "ATTACK"
        trigger_labels: ["EVENT_TRIGGER"]
        arguments:
          - role: "attacker"
            entity_types: ["PERSON", "ORGANIZATION"]
            required: true

Learn more about Event Annotation →

Entity Linking — Link span annotations to external knowledge bases (Wikidata, UMLS, custom REST APIs). Add an entity_linking: block to any span schema to enable KB search and linking.
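A minimal sketch of attaching entity linking to a span schema. The entity_linking: block name comes from the description above, but the kb key and its value are assumptions; see the Entity Linking docs for the real option names:

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels: [PERSON, ORGANIZATION]
    entity_linking:
      kb: wikidata        # hypothetical key and value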

Learn more about Entity Linking →

Triage — Prodigy-style accept/reject/skip interface for rapid data screening. Customizable labels, keyboard shortcuts, and auto-advance for high-throughput annotation.
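A triage schema might look like the sketch below. The accept/reject/skip labels come from the description above; the annotation_type value and other keys are assumptions:

yaml
annotation_schemes:
  - annotation_type: triage    # assumed type name
    name: screening
    labels: [accept, reject, skip]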

Learn more about Triage →

Pairwise Comparison — Compare two items with binary (click preferred tile) or scale (slider) modes. Supports items_key, allow_tie, scale: block with configurable range.
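The items_key, allow_tie, and scale: options come from the description above; the annotation_type value and the scale range in this sketch are assumptions:

yaml
annotation_schemes:
  - annotation_type: pairwise_comparison   # assumed type name
    name: preference
    items_key: "responses"
    allow_tie: true
    scale:
      min: -3
      max: 3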

Learn more about Pairwise Comparison →

Conversation Trees — Annotate hierarchical conversation structures with per-node ratings, path selection, and branch comparison.

Learn more about Conversation Trees →

Coreference Chains — Group coreferring text mentions into chains with visual indicators. Supports entity types, singleton control, and multiple highlight modes.

Learn more about Coreference Chains →

Segmentation Masks — New fill, eraser, and brush tools for pixel-level image segmentation.

Bounding Box for PDF/Documents — Draw boxes on PDF pages for document annotation tasks.

Discontinuous Spans — allow_discontinuous: true enables selecting non-contiguous text segments as a single span.
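The allow_discontinuous flag comes from the description above; the rest of this span scheme is illustrative:

yaml
annotation_schemes:
  - annotation_type: span
    name: symptoms
    allow_discontinuous: true
    labels: [SYMPTOM]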


Intelligent Annotation

MACE Competence Estimation — Variational Bayes EM algorithm that jointly estimates true labels and annotator competence scores (0.0-1.0). Works with radio, likert, select, and multiselect schemas.

yaml
mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 3

Learn more about MACE →

Option Highlighting — LLM-based highlighting of likely correct options for discrete annotation tasks. Highlights top-k options with a star indicator while dimming less-likely options.

yaml
ai_support:
  option_highlighting:
    enabled: true
    top_k: 3
    dim_opacity: 0.4

Learn more about Option Highlighting →

Diversity Ordering — Embedding-based clustering and round-robin sampling to ensure annotators see diverse content rather than similar items in sequence.

yaml
assignment_strategy: diversity_clustering
diversity_ordering:
  enabled: true
  prefill_count: 100

Learn more about Diversity Ordering →


Export System

A new pluggable export CLI (python -m potato.export) converts annotations to 6 industry-standard formats: COCO, YOLO, Pascal VOC, CoNLL-2003, CoNLL-U, and Segmentation Masks.

bash
python -m potato.export --config config.yaml --format coco --output ./export/

Learn more about Export Formats →


Remote Data Sources

Load annotation data from URLs, S3, Google Drive, Dropbox, Hugging Face, Google Sheets, and SQL databases via the new data_sources: config block. Includes partial loading, caching, and credential management.
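The data_sources: block name and the supported source types come from the description above, but the per-source keys in this sketch are assumptions; check the Remote Data Sources docs for the real field names:

yaml
# Sketch only - key names under each source are assumptions
data_sources:
  - type: s3
    bucket: "my-annotation-data"
    path: "batch1/data.jsonl"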

Learn more about Remote Data Sources →


Survey Instruments

55 validated questionnaires across 8 categories (Personality, Mental Health, Affect, Self-Concept, Social Attitudes, Response Style, Short-Form, Demographics). Use in prestudy/poststudy phases with instrument: "tipi".
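The instrument: "tipi" usage comes from the description above, and the phases structure mirrors the multi-phase config shown elsewhere on this page; treat the exact nesting as a sketch:

yaml
phases:
  prestudy:
    enabled: true
    instrument: "tipi"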

Learn more about Survey Instruments →


Other Improvements

  • Video object tracking with keyframe interpolation
  • External AI config file support
  • Form layout grid improvements
  • Format handlers for PDF, Word, code, and spreadsheets

Potato 2.1.0

Released February 5, 2026

Potato 2.1 introduces the instance display system, visual AI support, span linking, multi-field span annotation, and layout customization.

Instance Display System

A new instance_display config block that separates content display from annotation. Display any combination of images, videos, audio, text, and dialogues alongside any annotation schemes.

yaml
instance_display:
  fields:
    - key: image_url
      type: image
      display_options:
        max_width: 600
        zoomable: true
    - key: description
      type: text
 
annotation_schemes:
  - annotation_type: radio
    name: category
    labels: [nature, urban, people]

Supports 11 display types: text, html, image, video, audio, dialogue, pairwise, code, spreadsheet, document, and pdf.

Learn more about Instance Display →


Multi-Field Span Annotation

Span annotation schemes now support a target_field option to annotate across multiple text fields in the same instance.

yaml
annotation_schemes:
  - annotation_type: span
    name: source_entities
    target_field: "source_text"
    labels: [PERSON, ORGANIZATION]
 
  - annotation_type: span
    name: summary_entities
    target_field: "summary"
    labels: [PERSON, ORGANIZATION]

Learn more about Span Annotation →


Span Linking

A new span_link annotation type for creating typed relationships between annotated spans. Supports directed and undirected links, n-ary relationships, visual arc display, and label constraints.

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - name: "PERSON"
        color: "#3b82f6"
      - name: "ORGANIZATION"
        color: "#22c55e"
 
  - annotation_type: span_link
    name: relations
    span_schema: entities
    link_types:
      - name: "WORKS_FOR"
        directed: true
        allowed_source_labels: ["PERSON"]
        allowed_target_labels: ["ORGANIZATION"]
        color: "#dc2626"

Learn more about Span Linking →


Visual AI Support

Four new vision endpoints for AI-powered image and video annotation assistance:

  • YOLO — Fast local object detection
  • Ollama Vision — Local vision-language models (LLaVA, Qwen-VL)
  • OpenAI Vision — GPT-4o cloud vision
  • Anthropic Vision — Claude with vision

Features include object detection, pre-annotation, classification, hints, scene detection, keyframe detection, and object tracking.
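A vision endpoint would presumably be configured through the ai_support block shown in the v2.0 section below; the endpoint_type value and model name here are assumptions:

yaml
ai_support:
  enabled: true
  endpoint_type: ollama_vision   # assumed value for the Ollama Vision endpoint
  ai_config:
    model: llava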

Learn more about Visual AI Support →


Layout Customization

Create sophisticated custom visual layouts using HTML templates and CSS. Potato generates an editable layout file, or you can provide a fully custom template with grid layouts, color-coded options, and section styling.

yaml
task_layout: layouts/custom_task_layout.html

Three example layouts included: content moderation, dialogue QA, and medical review.

Learn more about Layout Customization →


Label Rationales

A fourth AI capability that generates balanced explanations for why each label might apply, helping annotators understand different classification perspectives.

yaml
ai_support:
  features:
    rationales:
      enabled: true

Learn more about AI Support →


Other Improvements

  • 50+ new tests for improved reliability
  • Responsive design improvements
  • Enhanced project-hub organization with layout examples
  • Bug fixes across annotation types

v2.0 vs v2.1 Comparison

| Feature | v2.0 | v2.1 |
| --- | --- | --- |
| Instance Display | Via annotation hacks | Dedicated instance_display block |
| Span Targets | Single text field | Multi-field with target_field |
| Span Linking | Not available | Full span_link type |
| Visual AI | Not available | YOLO, Ollama Vision, OpenAI Vision, Anthropic Vision |
| Layout Customization | Basic auto-generated | Auto-generated + custom templates |
| AI Capabilities | 3 (hints, keywords, suggestions) | 4 (+ rationales) |

Potato 2.0

Potato 2.0 is a major release that introduces powerful new features for intelligent, scalable annotation. This section highlights the key additions and improvements.

AI Support

Integrate Large Language Models to assist annotators with intelligent hints, keyword highlighting, and label suggestions.

Supported providers:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3, Claude 3.5)
  • Google (Gemini)
  • Ollama (local models)
  • vLLM (self-hosted)
yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    hints:
      enabled: true
    label_suggestions:
      enabled: true

Learn more about AI Support →


Audio Annotation

Full-featured audio annotation with waveform visualization powered by Peaks.js. Create segments, label time regions, and annotate speech with keyboard shortcuts.

Key features:

  • Waveform visualization
  • Segment creation and labeling
  • Per-segment annotation questions
  • 15+ keyboard shortcuts
  • Server-side waveform caching
yaml
annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    labels:
      - Speaker A
      - Speaker B

Learn more about Audio Annotation →


Active Learning

Automatically prioritize annotation instances based on model uncertainty. Train classifiers on existing annotations and focus annotators on the most informative examples.

Capabilities:

  • Multiple classifier options (LogisticRegression, RandomForest, SVC, MultinomialNB)
  • Various vectorizers (TF-IDF, Count, Hashing)
  • Model persistence across restarts
  • LLM-enhanced selection
  • Multi-schema support
yaml
active_learning:
  enabled: true
  schema_names:
    - sentiment
  min_instances_for_training: 30
  update_frequency: 50
  classifier:
    type: LogisticRegression

Learn more about Active Learning →


Training Phase

Qualify annotators with practice questions before the main task. Provide immediate feedback and ensure quality through configurable passing criteria.

Features:

  • Practice questions with known answers
  • Immediate feedback and explanations
  • Configurable passing criteria
  • Retry options
  • Progress tracking in admin dashboard
yaml
phases:
  training:
    enabled: true
    data_file: "data/training.json"
    passing_criteria:
      min_correct: 8
      total_questions: 10

Learn more about Training Phase →


Enhanced Admin Dashboard

Comprehensive monitoring and management interface for annotation tasks.

Dashboard tabs:

  • Overview: High-level metrics and completion rates
  • Annotators: Performance tracking, timing analysis
  • Instances: Browse data with disagreement scores
  • Configuration: Real-time settings adjustment
yaml
admin_api_key: ${ADMIN_API_KEY}

Learn more about Admin Dashboard →


Database Backend

MySQL support for large-scale deployments with connection pooling and transaction support.

yaml
database:
  type: mysql
  host: localhost
  database: potato_db
  user: ${DB_USER}
  password: ${DB_PASSWORD}

Potato automatically creates required tables on first startup.


Annotation History

Complete tracking of all annotation changes with timestamps, user IDs, and action types. Enables auditing and behavioral analysis.

json
{
  "history": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "user": "annotator_1",
      "action": "create",
      "schema": "sentiment",
      "value": "Positive"
    }
  ]
}

Multi-Phase Workflows

Build complex annotation workflows with multiple sequential phases:

  1. Consent - Informed consent collection
  2. Pre-study - Demographics and screening
  3. Instructions - Task guidelines
  4. Training - Practice questions
  5. Annotation - Main task
  6. Post-study - Feedback surveys
yaml
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
  training:
    enabled: true
    data_file: "data/training.json"
  poststudy:
    enabled: true
    data_file: "data/feedback.json"

Learn more about Multi-Phase Workflows →


v2.0 Configuration Changes

New Configuration Structure

Potato 2.0 uses a cleaner configuration format:

v1 (old):

yaml
data_files:
  - data.json
id_key: id
text_key: text
output_file: annotations.json

v2 (new):

yaml
data_files:
  - "data/data.json"
 
item_properties:
  id_key: id
  text_key: text
 
output_annotation_dir: "output/"
output_annotation_format: "json"

Security Requirement

Configuration files must now be located within the task_dir:

yaml
# Valid - config.yaml is in the project directory
task_dir: "."
 
# Valid - config.yaml is inside my_project/
task_dir: "my_project/"

Quick Comparison

| Feature | v1 | v2.0 | v2.1 | v2.2 | v2.3 |
| --- | --- | --- | --- | --- | --- |
| AI/LLM Support | No | Yes | Yes + Visual AI + Rationales | + Option Highlighting | + Solo Mode |
| Agentic Annotation | No | No | No | No | 12 converters, 3 displays |
| Best-Worst Scaling | No | No | No | No | Yes (3 scoring methods) |
| Audio Annotation | Basic | Full waveform | Full waveform | Full waveform | Full waveform |
| Active Learning | No | Yes | Yes | Yes + Diversity Ordering | + Solo Mode integration |
| Instance Display | No | No | Yes | Yes | Yes |
| Span Linking | No | No | Yes | Yes | Yes |
| Event Annotation | No | No | No | Yes | Yes |
| Entity Linking | No | No | No | Yes | Yes |
| Pairwise/Triage/Coreference/Trees | No | No | No | Yes | Yes |
| Layout Customization | No | Auto-generated | Auto + Custom templates | Auto + Custom templates | Auto + Custom templates |
| Training Phase | No | Yes | Yes | Yes | Yes |
| Admin Dashboard | Basic | Enhanced | Enhanced | Enhanced + MACE | + BWS tab, Solo Mode |
| Database Backend | File only | File + MySQL | File + MySQL | File + MySQL | File + MySQL |
| Export CLI | No | No | No | Yes (COCO, YOLO, CoNLL, etc.) | + Parquet |
| Authentication | Username | Username | Username | Username | + Google/GitHub OAuth, OIDC |
| Survey Instruments | No | No | No | 55 validated questionnaires | 55 validated questionnaires |
| Remote Data Sources | No | No | No | S3, GDrive, HuggingFace, etc. | S3, GDrive, HuggingFace, etc. |

Migration Guide

Updating Your Configuration (v1 to v2)

  1. Data configuration

    yaml
    # Old
    id_key: id
    text_key: text
     
    # New
    item_properties:
      id_key: id
      text_key: text
  2. Output configuration

    yaml
    # Old
    output_file: annotations.json
     
    # New
    output_annotation_dir: "output/"
    output_annotation_format: "json"
  3. Config file location Ensure your config file is inside the project directory.

Starting the Server

bash
# v2 command
python -m potato start config.yaml -p 8000
 
# Or shorthand
potato start config.yaml

Getting Started

Ready to try Potato? Start with the Quick Start Guide, or explore any of the feature pages linked throughout the release sections above.