What's New in v2

Overview of new features and improvements in Potato 2.0.

Potato 2.0 is a major release that introduces powerful new features for intelligent, scalable annotation. This page highlights the key additions and improvements.

Major New Features

AI Support

Integrate large language models (LLMs) to assist annotators with hints, keyword highlighting, and label suggestions.

Supported providers:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3, Claude 3.5)
  • Google (Gemini)
  • Ollama (local models)
  • vLLM (self-hosted)

Example configuration:

ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    hints:
      enabled: true
    label_suggestions:
      enabled: true
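The configuration above refers to the API key as ${OPENAI_API_KEY} rather than embedding it. This page does not describe how Potato resolves such placeholders, so the following is only a minimal sketch that assumes they are read from environment variables. The helper name missing_env_vars is hypothetical and not part of Potato; it reports placeholders that have no value, which can be useful to check before starting the server.

```python
import os
import re

# Matches ${NAME} placeholders such as ${OPENAI_API_KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, environ=None):
    """Return sorted names of ${VAR} placeholders that have no value.

    Hypothetical helper: assumes placeholders map to environment variables.
    """
    env = os.environ if environ is None else environ
    names = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in names if not env.get(name))


sample = "ai_config:\n  model: gpt-4\n  api_key: ${OPENAI_API_KEY}\n"
print(missing_env_vars(sample, environ={}))  # ['OPENAI_API_KEY']
```

An empty result means every placeholder in the text has a non-empty value in the environment that was checked.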

Learn more about AI Support →


Audio Annotation

Full-featured audio annotation with waveform visualization powered by Peaks.js. Create segments, label time regions, and annotate speech with keyboard shortcuts.

Key features:

  • Waveform visualization
  • Segment creation and labeling
  • Per-segment annotation questions
  • 15+ keyboard shortcuts
  • Server-side waveform caching

Example configuration:

annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    labels:
      - Speaker A
      - Speaker B

Learn more about Audio Annotation →


Active Learning

Automatically prioritize annotation instances based on model uncertainty. Train classifiers on existing annotations and focus annotators on the most informative examples.

Capabilities:

  • Multiple classifier options (LogisticRegression, RandomForest, SVC, MultinomialNB)
  • Various vectorizers (TF-IDF, Count, Hashing)
  • Model persistence across restarts
  • LLM-enhanced selection
  • Multi-schema support

Example configuration:

active_learning:
  enabled: true
  schema_names:
    - sentiment
  min_instances_for_training: 30
  update_frequency: 50
  classifier:
    type: LogisticRegression
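Prioritizing by model uncertainty can be illustrated with a small sketch. This page does not state which uncertainty measure Potato uses, so the example below assumes Shannon entropy over predicted label probabilities, which is one common choice. The function names and the input format are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(predictions):
    """Order instance IDs from most to least uncertain.

    `predictions` maps an instance ID to the classifier's predicted
    probabilities over labels (hypothetical input format).
    """
    return sorted(predictions, key=lambda iid: entropy(predictions[iid]), reverse=True)


predictions = {
    "item_1": [0.98, 0.01, 0.01],  # confident prediction
    "item_2": [0.34, 0.33, 0.33],  # near-uniform, so most uncertain
    "item_3": [0.70, 0.20, 0.10],
}
print(rank_by_uncertainty(predictions))  # ['item_2', 'item_3', 'item_1']
```

Instances near the front of the list are those where the classifier is least sure, which is where a human label tends to add the most information.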

Learn more about Active Learning →


Training Phase

Qualify annotators with practice questions before the main task. Provide immediate feedback and ensure quality through configurable passing criteria.

Features:

  • Practice questions with known answers
  • Immediate feedback and explanations
  • Configurable passing criteria
  • Retry options
  • Progress tracking in admin dashboard

Example configuration:

phases:
  training:
    enabled: true
    data_file: "data/training.json"
    passing_criteria:
      min_correct: 8
      total_questions: 10
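The passing criteria above (at least 8 correct out of 10 questions) amount to a simple threshold check. The sketch below is illustrative only: the function name and the answer format are assumptions, not Potato's internal implementation.

```python
def passes_training(answers, gold, min_correct=8):
    """Return True when at least `min_correct` answers match the known answers.

    `answers` and `gold` map a question ID to a label (hypothetical format).
    Unanswered questions count as incorrect.
    """
    correct = sum(1 for qid, label in gold.items() if answers.get(qid) == label)
    return correct >= min_correct


# Ten practice questions whose known answer is "Positive".
gold = {f"q{i}": "Positive" for i in range(10)}

# An annotator who gets two of the ten questions wrong.
answers = dict(gold, q0="Negative", q1="Negative")
print(passes_training(answers, gold, min_correct=8))  # True
```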

Learn more about Training Phase →


Enhanced Admin Dashboard

Comprehensive monitoring and management interface for annotation tasks.

Dashboard tabs:

  • Overview: High-level metrics and completion rates
  • Annotators: Performance tracking, timing analysis
  • Instances: Browse data with disagreement scores
  • Configuration: Real-time settings adjustment

Example configuration:

admin_api_key: ${ADMIN_API_KEY}

Learn more about Admin Dashboard →


Database Backend

MySQL support for large-scale deployments with connection pooling and transaction support.

database:
  type: mysql
  host: localhost
  database: potato_db
  user: ${DB_USER}
  password: ${DB_PASSWORD}

Potato automatically creates required tables on first startup.


Annotation History

Complete tracking of all annotation changes with timestamps, user IDs, and action types. Enables auditing and behavioral analysis.

{
  "history": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "user": "annotator_1",
      "action": "create",
      "schema": "sentiment",
      "value": "Positive"
    }
  ]
}
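As a rough illustration of the auditing use case, the sketch below counts actions per user from a record shaped like the example above. Only the "create" action appears on this page; the "update" event is a hypothetical addition for the demo, and the function name is an assumption.

```python
import json
from collections import Counter

# Record shaped like the example above; the second event is hypothetical.
record = json.loads("""
{
  "history": [
    {"timestamp": "2024-01-15T10:30:00Z", "user": "annotator_1",
     "action": "create", "schema": "sentiment", "value": "Positive"},
    {"timestamp": "2024-01-15T10:31:00Z", "user": "annotator_1",
     "action": "update", "schema": "sentiment", "value": "Negative"}
  ]
}
""")


def actions_per_user(history):
    """Count how often each (user, action) pair occurs in a history list."""
    counts = Counter()
    for event in history:
        counts[(event["user"], event["action"])] += 1
    return counts


for (user, action), n in sorted(actions_per_user(record["history"]).items()):
    print(user, action, n)
```

A high ratio of changes to creations for one annotator could, for example, flag labels worth reviewing.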

Multi-Phase Workflows

Build complex annotation workflows with multiple sequential phases:

  1. Consent - Informed consent collection
  2. Pre-study - Demographics and screening
  3. Instructions - Task guidelines
  4. Training - Practice questions
  5. Annotation - Main task
  6. Post-study - Feedback surveys

Example configuration:

phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
  training:
    enabled: true
    data_file: "data/training.json"
  poststudy:
    enabled: true
    data_file: "data/feedback.json"

Learn more about Multi-Phase Workflows →


Configuration Changes

New Configuration Structure

Potato 2.0 uses a cleaner configuration format:

v1 (old):

data_files:
  - data.json
id_key: id
text_key: text
output_file: annotations.json

v2 (new):

data_files:
  - "data/data.json"
 
item_properties:
  id_key: id
  text_key: text
 
output_annotation_dir: "output/"
output_annotation_format: "json"

Security Requirement

Configuration files must now be located inside task_dir, either directly or in one of its subdirectories:

# Valid - config.yaml is in the project directory
task_dir: "."
 
# Valid - config in configs/ subdirectory
task_dir: "my_project/"
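A containment check of this kind can be sketched with pathlib. This is not Potato's actual validation code: the function name is hypothetical, and the sketch assumes both paths are interpreted relative to the current working directory.

```python
from pathlib import Path


def config_is_inside_task_dir(config_path, task_dir):
    """Return True if config_path resolves to a location inside task_dir.

    Hypothetical helper; both paths are resolved against the current
    working directory, so ".." segments cannot escape the check.
    """
    config = Path(config_path).resolve()
    root = Path(task_dir).resolve()
    try:
        config.relative_to(root)
    except ValueError:
        return False
    return True


print(config_is_inside_task_dir("my_project/configs/config.yaml", "my_project/"))  # True
print(config_is_inside_task_dir("other/config.yaml", "my_project/"))  # False
```

Resolving both paths before comparing them means a path such as my_project/../config.yaml is rejected even though it begins with the task directory's name.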

Quick Comparison

Feature                  v1          v2
AI/LLM Support           No          Yes
Audio Annotation         Basic       Full waveform
Active Learning          No          Yes
Training Phase           No          Yes
Admin Dashboard          Basic       Enhanced
Database Backend         File only   File + MySQL
Annotation History       No          Yes
Multi-Phase Workflows    Limited     Full support

Migration Guide

Updating Your Configuration

  1. Data configuration

    # Old
    id_key: id
    text_key: text
     
    # New
    item_properties:
      id_key: id
      text_key: text

  2. Output configuration

    # Old
    output_file: annotations.json
     
    # New
    output_annotation_dir: "output/"
    output_annotation_format: "json"

  3. Config file location

    Ensure your config file is inside the project directory (the task_dir).
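The first two migration steps can be sketched as a small transformation on a config that has already been loaded into a Python dict (for example with a YAML parser). The function name is hypothetical, only the keys shown on this page are handled, and the "output/" directory and "json" format are simply the values from the example above, not necessarily the right choice for every project.

```python
import copy


def migrate_v1_config(v1):
    """Return a copy of a v1 config dict restructured along the lines shown above."""
    v2 = copy.deepcopy(v1)

    # Step 1: move id_key / text_key under item_properties.
    item_properties = dict(v2.get("item_properties", {}))
    for key in ("id_key", "text_key"):
        if key in v2:
            item_properties[key] = v2.pop(key)
    if item_properties:
        v2["item_properties"] = item_properties

    # Step 2: replace output_file with an output directory and a format.
    # The values below are taken from the example, not derived from output_file.
    if "output_file" in v2:
        v2.pop("output_file")
        v2.setdefault("output_annotation_dir", "output/")
        v2.setdefault("output_annotation_format", "json")
    return v2


v1 = {
    "data_files": ["data.json"],
    "id_key": "id",
    "text_key": "text",
    "output_file": "annotations.json",
}
print(migrate_v1_config(v1))
```

The input dict is left unmodified, so the old and new structures can be compared before anything is written back to disk.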

Starting the Server

# v2 command
python -m potato start config.yaml -p 8000
 
# Or shorthand
potato start config.yaml

Getting Started

Ready to try Potato 2.0? Start with the Quick Start Guide, or explore the specific features described above.