What's New in Potato 2.0
Potato 2.0 is a major release that introduces powerful new features for intelligent, scalable annotation. This page highlights the key additions and improvements.
Major New Features
AI Support
Integrate Large Language Models to assist annotators with intelligent hints, keyword highlighting, and label suggestions.
Supported providers:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3, Claude 3.5)
- Google (Gemini)
- Ollama (local models)
- vLLM (self-hosted)
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    hints:
      enabled: true
    label_suggestions:
      enabled: true
Audio Annotation
Full-featured audio annotation with waveform visualization powered by Peaks.js. Create segments, label time regions, and annotate speech with keyboard shortcuts.
Key features:
- Waveform visualization
- Segment creation and labeling
- Per-segment annotation questions
- 15+ keyboard shortcuts
- Server-side waveform caching
annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    labels:
      - Speaker A
      - Speaker B
Learn more about Audio Annotation →
Active Learning
Automatically prioritize annotation instances based on model uncertainty. Train classifiers on existing annotations and focus annotators on the most informative examples.
Capabilities:
- Multiple classifier options (LogisticRegression, RandomForest, SVC, MultinomialNB)
- Various vectorizers (TF-IDF, Count, Hashing)
- Model persistence across restarts
- LLM-enhanced selection
- Multi-schema support
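To make "prioritize by model uncertainty" concrete, here is a minimal sketch of entropy-based uncertainty sampling. It is an illustration only: the source does not state which uncertainty measure Potato uses, and the names `entropy`, `rank_by_uncertainty`, and the sample probabilities are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(predictions: Dict[str, List[float]]) -> List[str]:
    """Return instance IDs ordered from most to least uncertain."""
    return sorted(predictions, key=lambda iid: entropy(predictions[iid]), reverse=True)


# Hypothetical classifier outputs: instance ID -> class probabilities.
predictions = {
    "item_1": [0.98, 0.01, 0.01],  # confident prediction
    "item_2": [0.34, 0.33, 0.33],  # nearly uniform, so most uncertain
    "item_3": [0.70, 0.20, 0.10],
}

queue = rank_by_uncertainty(predictions)
print(queue)  # ['item_2', 'item_3', 'item_1']
```

Instances at the front of the queue are those where the classifier is least sure, which is where a human label is likely to be most informative.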
active_learning:
  enabled: true
  schema_names:
    - sentiment
  min_instances_for_training: 30
  update_frequency: 50
  classifier:
    type: LogisticRegression
Learn more about Active Learning →
Training Phase
Qualify annotators with practice questions before the main task. Provide immediate feedback and ensure quality through configurable passing criteria.
Features:
- Practice questions with known answers
- Immediate feedback and explanations
- Configurable passing criteria
- Retry options
- Progress tracking in admin dashboard
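The passing criteria can be read as a simple threshold check. The sketch below assumes that `min_correct` is the number of correct answers required out of the first `total_questions` practice questions; that interpretation, and the function name `passes_training`, are assumptions rather than Potato's documented behavior.

```python
from typing import List


def passes_training(results: List[bool], min_correct: int, total_questions: int) -> bool:
    """Return True if enough of the graded practice answers are correct."""
    graded = results[:total_questions]  # ignore answers beyond the configured total
    return sum(graded) >= min_correct


# Hypothetical annotator: 8 correct answers followed by 2 incorrect ones.
answers = [True] * 8 + [False] * 2
print(passes_training(answers, min_correct=8, total_questions=10))  # True
```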
phases:
  training:
    enabled: true
    data_file: "data/training.json"
    passing_criteria:
      min_correct: 8
      total_questions: 10
Learn more about Training Phase →
Enhanced Admin Dashboard
Comprehensive monitoring and management interface for annotation tasks.
Dashboard tabs:
- Overview: High-level metrics and completion rates
- Annotators: Performance tracking, timing analysis
- Instances: Browse data with disagreement scores
- Configuration: Real-time settings adjustment
admin_api_key: ${ADMIN_API_KEY}
Learn more about Admin Dashboard →
Database Backend
MySQL support for large-scale deployments with connection pooling and transaction support.
database:
  type: mysql
  host: localhost
  database: potato_db
  user: ${DB_USER}
  password: ${DB_PASSWORD}
Potato automatically creates required tables on first startup.
Annotation History
Complete tracking of all annotation changes with timestamps, user IDs, and action types. Enables auditing and behavioral analysis.
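As a rough illustration of the auditing use case, the sketch below tallies actions per user and measures the time between the first and last action. It assumes records shaped like the sample in this section. The second record, its `update` action type, and the helper names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime

# Sample log: the first record mirrors this section's example;
# the second is a hypothetical follow-up edit added for illustration.
RAW_LOG = """
{
  "history": [
    {"timestamp": "2024-01-15T10:30:00Z", "user": "annotator_1",
     "action": "create", "schema": "sentiment", "value": "Positive"},
    {"timestamp": "2024-01-15T10:31:30Z", "user": "annotator_1",
     "action": "update", "schema": "sentiment", "value": "Neutral"}
  ]
}
"""


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2024-01-15T10:30:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def actions_per_user(history: list) -> Counter:
    """Count (user, action) pairs across all history records."""
    return Counter((entry["user"], entry["action"]) for entry in history)


history = json.loads(RAW_LOG)["history"]
counts = actions_per_user(history)
elapsed = parse_timestamp(history[-1]["timestamp"]) - parse_timestamp(history[0]["timestamp"])

print(counts[("annotator_1", "create")])  # 1
print(elapsed.total_seconds())            # 90.0
```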
{
  "history": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "user": "annotator_1",
      "action": "create",
      "schema": "sentiment",
      "value": "Positive"
    }
  ]
}
Multi-Phase Workflows
Build complex annotation workflows with multiple sequential phases:
- Consent - Informed consent collection
- Pre-study - Demographics and screening
- Instructions - Task guidelines
- Training - Practice questions
- Annotation - Main task
- Post-study - Feedback surveys
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
  training:
    enabled: true
    data_file: "data/training.json"
  poststudy:
    enabled: true
    data_file: "data/feedback.json"
Learn more about Multi-Phase Workflows →
Configuration Changes
New Configuration Structure
Potato 2.0 uses a cleaner configuration format:
v1 (old):
data_files:
  - data.json
id_key: id
text_key: text
output_file: annotations.json
v2 (new):
data_files:
  - "data/data.json"
item_properties:
  id_key: id
  text_key: text
output_annotation_dir: "output/"
output_annotation_format: "json"
Security Requirement
Configuration files must now be located within the task_dir:
# Valid - config.yaml is in the project directory
task_dir: "."
# Valid - config in configs/ subdirectory
task_dir: "my_project/"
Quick Comparison
| Feature | v1 | v2 |
|---|---|---|
| AI/LLM Support | ❌ | ✅ |
| Audio Annotation | Basic | Full waveform |
| Active Learning | ❌ | ✅ |
| Training Phase | ❌ | ✅ |
| Admin Dashboard | Basic | Enhanced |
| Database Backend | File only | File + MySQL |
| Annotation History | ❌ | ✅ |
| Multi-Phase Workflows | Limited | Full support |
Migration Guide
Updating Your Configuration
- Data configuration
  # Old
  id_key: id
  text_key: text
  # New
  item_properties:
    id_key: id
    text_key: text
- Output configuration
  # Old
  output_file: annotations.json
  # New
  output_annotation_dir: "output/"
  output_annotation_format: "json"
- Config file location: ensure your config file is inside the project directory.
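The three migration steps can be sketched as a small script operating on an already-parsed config dictionary. This is not a tool shipped with Potato: `migrate_config` and `config_is_inside` are hypothetical helpers. Deriving the output format from the old file's extension and defaulting the directory to `output/` are also assumptions, based only on the examples in this guide.

```python
from pathlib import Path
from typing import Any, Dict


def migrate_config(v1: Dict[str, Any], output_dir: str = "output/") -> Dict[str, Any]:
    """Return a v2-style copy of a v1 config dictionary."""
    moved = ("id_key", "text_key", "output_file")
    v2 = {k: v for k, v in v1.items() if k not in moved}

    # Step 1: nest id_key and text_key under item_properties.
    item_properties = dict(v2.get("item_properties", {}))
    for key in ("id_key", "text_key"):
        if key in v1:
            item_properties[key] = v1[key]
    if item_properties:
        v2["item_properties"] = item_properties

    # Step 2: replace output_file with a directory plus a format.
    if "output_file" in v1:
        suffix = Path(v1["output_file"]).suffix.lstrip(".")
        v2["output_annotation_dir"] = output_dir
        v2["output_annotation_format"] = suffix or "json"  # assumed fallback
    return v2


def config_is_inside(config_path: str, project_dir: str) -> bool:
    """Step 3: check that the config file lives under the project directory."""
    config = Path(config_path).resolve()
    root = Path(project_dir).resolve()
    try:
        config.relative_to(root)
    except ValueError:
        return False
    return True


old = {
    "data_files": ["data.json"],
    "id_key": "id",
    "text_key": "text",
    "output_file": "annotations.json",
}
new = migrate_config(old)
print(new["item_properties"])           # {'id_key': 'id', 'text_key': 'text'}
print(new["output_annotation_format"])  # json
print(config_is_inside("my_project/config.yaml", "my_project"))  # True
```

Working on plain dictionaries keeps the sketch independent of any YAML library; in practice the config would be loaded and written back with a YAML parser of your choice.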
Starting the Server
# v2 command
python -m potato start config.yaml -p 8000
# Or shorthand
potato start config.yaml
Getting Started
Ready to try Potato 2.0? Start with the Quick Start Guide or explore specific features:
- AI Support - Intelligent annotation assistance
- Active Learning - Smart instance prioritization
- Audio Annotation - Waveform-based annotation
- Training Phase - Annotator qualification
- Admin Dashboard - Monitoring and management