# AI Support

Integrate LLMs for intelligent annotation assistance.

Potato 2.0 includes built-in support for Large Language Models (LLMs) to assist annotators with intelligent hints, keyword highlighting, and label suggestions.
## Supported Providers

Potato supports multiple LLM providers:

**Cloud providers:**

- OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
- Anthropic (Claude 3, Claude 3.5)
- Google (Gemini Pro)
- Hugging Face
- OpenRouter

**Local/self-hosted:**

- Ollama (run models locally)
- vLLM (high-performance self-hosted inference)
## Configuration

### Basic Setup

Add an `ai_support` section to your configuration file:

```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
```

### Provider-Specific Configuration
#### OpenAI

```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4-turbo-preview
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
```

#### Anthropic Claude
```yaml
ai_support:
  enabled: true
  endpoint_type: anthropic
  ai_config:
    model: claude-3-sonnet-20240229
    api_key: ${ANTHROPIC_API_KEY}
    temperature: 0.3
    max_tokens: 500
```

#### Google Gemini
```yaml
ai_support:
  enabled: true
  endpoint_type: google
  ai_config:
    model: gemini-pro
    api_key: ${GOOGLE_API_KEY}
```

#### Local Ollama
```yaml
ai_support:
  enabled: true
  endpoint_type: ollama
  ai_config:
    model: llama2
    base_url: http://localhost:11434
```

#### vLLM (Self-Hosted)
```yaml
ai_support:
  enabled: true
  endpoint_type: vllm
  ai_config:
    model: meta-llama/Llama-2-7b-chat-hf
    base_url: http://localhost:8000/v1
```

## AI Features
Potato's AI support provides three primary capabilities:
### 1. Intelligent Hints

Provide contextual guidance to annotators without revealing the answer:

```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  # Hints appear as tooltips or sidebars
  features:
    hints:
      enabled: true
```

### 2. Keyword Highlighting
Automatically highlight relevant keywords in the text:

```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    keyword_highlighting:
      enabled: true
      highlight_color: "#fef08a"
```

### 3. Label Suggestions
Suggest labels for annotator consideration (shown with confidence indicators):

```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    label_suggestions:
      enabled: true
      show_confidence: true
```

## Caching and Performance
AI responses can be cached to improve performance and reduce API costs:
```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  cache_config:
    enabled: true
    cache_dir: "ai_cache/"
    # Pre-generate hints on startup
    warmup:
      enabled: true
      num_instances: 100
    # Generate hints for upcoming instances
    prefetch:
      enabled: true
      lookahead: 5
```

### Caching Strategies
- **Warmup**: Pre-generates AI hints for an initial batch of instances when the server starts
- **Prefetch**: Generates hints for upcoming instances as annotators work
- **Disk persistence**: Caches are saved to disk and persist across server restarts
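The disk-persistence idea can be sketched as a simple key-to-file cache. This is an illustrative pattern only, not Potato's actual cache implementation; the `HintCache` class and its method names are invented for the example:

```python
import hashlib
import json
import tempfile
from pathlib import Path


class HintCache:
    """Illustrative disk-backed cache: one JSON file per (instance, feature) key."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, instance_id, feature):
        # Hash the key so arbitrary instance IDs map to safe filenames.
        key = hashlib.sha256(f"{instance_id}:{feature}".encode()).hexdigest()
        return self.cache_dir / f"{key}.json"

    def get(self, instance_id, feature):
        path = self._path(instance_id, feature)
        if path.exists():
            return json.loads(path.read_text())
        return None  # cache miss -> caller would query the LLM

    def put(self, instance_id, feature, response):
        # Written to disk, so entries survive server restarts.
        self._path(instance_id, feature).write_text(json.dumps(response))


cache = HintCache(tempfile.mkdtemp())
cache.put("doc-1", "hints", {"hint": "Focus on the reviewer's tone."})
print(cache.get("doc-1", "hints"))
```

A warmup pass would simply call `put` for the first N instances at startup, and a prefetch pass would do the same for the next few instances in each annotator's queue.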
## Custom Prompts

Potato includes default prompts for each annotation type, stored in `potato/ai/prompt/`. You can customize these for your specific task:
| Annotation Type | Prompt File |
|---|---|
| Radio buttons | `radio_prompt.txt` |
| Likert scales | `likert_prompt.txt` |
| Checkboxes | `checkbox_prompt.txt` |
| Span annotation | `span_prompt.txt` |
| Sliders | `slider_prompt.txt` |
| Dropdowns | `dropdown_prompt.txt` |
| Number input | `number_prompt.txt` |
| Text input | `text_prompt.txt` |
Prompts support variable substitution:

- `{text}`: The document text
- `{labels}`: Available labels for the scheme
- `{description}`: The scheme description
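For instance, a customized `radio_prompt.txt` using all three placeholders might look like the following (the wording is a made-up illustration, not Potato's shipped default):

```text
You are assisting a human annotator with the following task: {description}

Text to annotate:
{text}

The available labels are: {labels}

Give a short, neutral hint that helps the annotator decide,
without directly stating which label is correct.
```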
## Multi-Schema Support

For tasks with multiple annotation schemes, you can enable AI support selectively:

```yaml
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  # Only enable for specific schemes
  special_include:
    - page: 1
      schema: sentiment
    - page: 1
      schema: topics
```

## Full Example
Complete configuration for AI-assisted sentiment analysis:

```yaml
task_name: "AI-Assisted Sentiment Analysis"
task_dir: "."
port: 8000

data_files:
  - "data/reviews.json"

item_properties:
  id_key: id
  text_key: text

annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this review?"
    labels:
      - Positive
      - Negative
      - Neutral

ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
  features:
    hints:
      enabled: true
    keyword_highlighting:
      enabled: true
      highlight_color: "#fef08a"
    label_suggestions:
      enabled: true
      show_confidence: true
  cache_config:
    enabled: true
    cache_dir: "ai_cache/"
    warmup:
      enabled: true
      num_instances: 50
    prefetch:
      enabled: true
      lookahead: 3

output_annotation_dir: "output/"
output_annotation_format: "json"
allow_all_users: true
```

## Environment Variables
Store API keys securely using environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
```

Reference them in your config with `${VARIABLE_NAME}` syntax.
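This style of `${VARIABLE_NAME}` substitution can be reproduced with Python's standard library. The sketch below only illustrates the expansion mechanism; it is not Potato's actual config-loading code:

```python
import os

# Illustrative only: expand a ${VAR} reference in a config value,
# the way a loader might before handing the key to a provider client.
os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder value

raw_value = "${OPENAI_API_KEY}"
expanded = os.path.expandvars(raw_value)
print(expanded)  # sk-example
```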
## Cost Considerations
- AI calls are made per-instance by default
- Enable caching to reduce repeated API calls
- Use warmup and prefetch to pre-generate hints
- Consider using smaller/cheaper models for simple tasks
- Local providers (Ollama, vLLM) have no API costs
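As a back-of-envelope illustration of why caching matters, the snippet below estimates API spend for one pass over a dataset. All numbers here, including the per-token prices, are placeholder assumptions for the arithmetic, not current provider rates:

```python
# Rough cost estimate for one uncached pass over a dataset.
# Prices are PLACEHOLDER assumptions (USD per 1K tokens), not real rates.
PRICE_PER_1K_INPUT = 0.01
PRICE_PER_1K_OUTPUT = 0.03

num_instances = 5000
input_tokens_per_call = 400    # prompt + document text (assumed)
output_tokens_per_call = 150   # hint / suggestion (assumed)

cost_per_call = (input_tokens_per_call / 1000) * PRICE_PER_1K_INPUT \
    + (output_tokens_per_call / 1000) * PRICE_PER_1K_OUTPUT
total = num_instances * cost_per_call
print(f"~${total:.2f} without caching")
```

With caching enabled, repeated views of the same instance reuse the stored response, so the spend stays close to this one-pass figure regardless of how many annotators see each item.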
## Best Practices

- **Use AI as assistance, not replacement**: let annotators make the final decisions
- **Enable caching for production**: reduces latency and costs
- **Test prompts thoroughly**: validate custom prompts before launching a task
- **Monitor API costs**: track usage, especially with cloud providers
- **Consider local providers**: Ollama or vLLM for high-volume annotation
- **Protect API credentials**: use environment variables, never commit keys