AI Support

Integrate LLMs for intelligent annotation assistance.

Potato 2.0 includes built-in support for Large Language Models (LLMs) to assist annotators with intelligent hints, keyword highlighting, and label suggestions.

Supported Providers

Potato supports multiple LLM providers:

Cloud Providers:

  • OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
  • Anthropic (Claude 3, Claude 3.5)
  • Google (Gemini Pro)
  • Hugging Face
  • OpenRouter

Local/Self-Hosted:

  • Ollama (run models locally)
  • vLLM (high-performance self-hosted inference)

Configuration

Basic Setup

Add an ai_support section to your configuration file:

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500

Provider-Specific Configuration

OpenAI

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4-turbo-preview
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500

Anthropic Claude

ai_support:
  enabled: true
  endpoint_type: anthropic
 
  ai_config:
    model: claude-3-sonnet-20240229
    api_key: ${ANTHROPIC_API_KEY}
    temperature: 0.3
    max_tokens: 500

Google Gemini

ai_support:
  enabled: true
  endpoint_type: google
 
  ai_config:
    model: gemini-pro
    api_key: ${GOOGLE_API_KEY}

Local Ollama

ai_support:
  enabled: true
  endpoint_type: ollama
 
  ai_config:
    model: llama2
    base_url: http://localhost:11434

vLLM (Self-Hosted)

ai_support:
  enabled: true
  endpoint_type: vllm
 
  ai_config:
    model: meta-llama/Llama-2-7b-chat-hf
    base_url: http://localhost:8000/v1

AI Features

Potato's AI support provides three primary capabilities:

1. Intelligent Hints

Provide contextual guidance to annotators without revealing the answer:

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
 
  # Hints appear as tooltips or sidebars
  features:
    hints:
      enabled: true

2. Keyword Highlighting

Automatically highlight relevant keywords in the text:

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
 
  features:
    keyword_highlighting:
      enabled: true
      highlight_color: "#fef08a"

3. Label Suggestions

Suggest labels for annotator consideration (shown with confidence indicators):

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
 
  features:
    label_suggestions:
      enabled: true
      show_confidence: true

Caching and Performance

AI responses can be cached to improve performance and reduce API costs:

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
 
  cache_config:
    enabled: true
    cache_dir: "ai_cache/"
 
    # Pre-generate hints on startup
    warmup:
      enabled: true
      num_instances: 100
 
    # Generate hints for upcoming instances
    prefetch:
      enabled: true
      lookahead: 5

Caching Strategies

  1. Warmup: Pre-generates AI hints for an initial batch of instances when the server starts
  2. Prefetch: Generates hints for upcoming instances as annotators work
  3. Disk Persistence: Caches are saved to disk and persist across server restarts
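The disk-persistence strategy can be sketched in a few lines. This is an illustrative mock, not Potato's actual cache implementation: the `DiskCache` class and its file-per-key layout are assumptions made up for this example; only the `ai_cache/` directory name comes from the config above.

```python
import hashlib
import json
from pathlib import Path


class DiskCache:
    """Minimal disk-persisted cache: one JSON file per key, so entries
    survive server restarts (the "Disk Persistence" strategy above)."""

    def __init__(self, cache_dir="ai_cache/"):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, model, instance_text):
        # Key on model + instance text so changing models invalidates entries.
        key = hashlib.sha256(f"{model}:{instance_text}".encode()).hexdigest()
        return self.dir / f"{key}.json"

    def get(self, model, instance_text):
        path = self._path(model, instance_text)
        return json.loads(path.read_text()) if path.exists() else None

    def put(self, model, instance_text, response):
        self._path(model, instance_text).write_text(json.dumps(response))


cache = DiskCache("ai_cache/")
cache.put("gpt-4", "Great product!", {"hint": "Focus on the adjective."})
print(cache.get("gpt-4", "Great product!"))  # served from disk, no API call
```

Warmup and prefetch then reduce to calling `put` ahead of time for the instances annotators are about to see, so their `get` calls hit the cache.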

Custom Prompts

Potato includes default prompts for each annotation type, stored in potato/ai/prompt/. You can customize these for your specific task:

  Annotation Type     Prompt File
  Radio buttons       radio_prompt.txt
  Likert scales       likert_prompt.txt
  Checkboxes          checkbox_prompt.txt
  Span annotations    span_prompt.txt
  Sliders             slider_prompt.txt
  Dropdowns           dropdown_prompt.txt
  Number input        number_prompt.txt
  Text input          text_prompt.txt

Prompts support variable substitution:

  • {text} - The document text
  • {labels} - Available labels for the scheme
  • {description} - The scheme description

Multi-Schema Support

For tasks with multiple annotation schemes, you can enable AI support selectively:

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
 
  # Only enable for specific schemes
  special_include:
    - page: 1
      schema: sentiment
    - page: 1
      schema: topics

Full Example

Complete configuration for AI-assisted sentiment analysis:

task_name: "AI-Assisted Sentiment Analysis"
task_dir: "."
port: 8000
 
data_files:
  - "data/reviews.json"
 
item_properties:
  id_key: id
  text_key: text
 
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this review?"
    labels:
      - Positive
      - Negative
      - Neutral
 
ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
 
  features:
    hints:
      enabled: true
    keyword_highlighting:
      enabled: true
      highlight_color: "#fef08a"
    label_suggestions:
      enabled: true
      show_confidence: true
 
  cache_config:
    enabled: true
    cache_dir: "ai_cache/"
    warmup:
      enabled: true
      num_instances: 50
    prefetch:
      enabled: true
      lookahead: 3
 
output_annotation_dir: "output/"
output_annotation_format: "json"
allow_all_users: true

Environment Variables

Store API keys securely using environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

Reference them in your config with ${VARIABLE_NAME} syntax.
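Assuming the substitution behaves like shell-style variable expansion, Python's `os.path.expandvars` models what happens to a config value at load time (the key below is a dummy, not a real credential):

```python
import os

# Simulate the environment variable being set (a dummy value).
os.environ["OPENAI_API_KEY"] = "sk-demo-not-a-real-key"

# A config value as written in YAML, before expansion.
raw = "api_key: ${OPENAI_API_KEY}"
resolved = os.path.expandvars(raw)
print(resolved)  # api_key: sk-demo-not-a-real-key
```

If the variable is unset, shell-style expansion leaves the literal `${OPENAI_API_KEY}` in place, which is a common cause of authentication errors worth checking first.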

Cost Considerations

  • AI calls are made per-instance by default
  • Enable caching to reduce repeated API calls
  • Use warmup and prefetch to pre-generate hints
  • Consider using smaller/cheaper models for simple tasks
  • Local providers (Ollama, vLLM) have no API costs
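A back-of-envelope estimate makes the caching math concrete. The per-token prices and token counts below are placeholders, not real provider pricing; substitute your provider's current rates.

```python
def estimate_cost(num_instances, prompt_tokens, completion_tokens,
                  price_in_per_1k, price_out_per_1k, cache_hit_rate=0.0):
    """Rough API cost in dollars; cached calls are treated as free."""
    billed = num_instances * (1 - cache_hit_rate)
    cost_in = billed * prompt_tokens / 1000 * price_in_per_1k
    cost_out = billed * completion_tokens / 1000 * price_out_per_1k
    return cost_in + cost_out


# 10,000 instances, ~400 prompt / ~100 completion tokens each,
# placeholder prices of $0.01 in / $0.03 out per 1K tokens:
no_cache = estimate_cost(10_000, 400, 100, 0.01, 0.03)
with_cache = estimate_cost(10_000, 400, 100, 0.01, 0.03, cache_hit_rate=0.5)
print(round(no_cache, 2), round(with_cache, 2))  # 70.0 35.0
```

Even a 50% cache hit rate halves the bill, which is why warmup and prefetch pay off on large datasets.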

Best Practices

  1. Use AI as assistance, not replacement - Let annotators make final decisions
  2. Enable caching for production - Reduces latency and costs
  3. Test prompts thoroughly - Custom prompts should be validated
  4. Monitor API costs - Track usage especially with cloud providers
  5. Consider local providers - Ollama or vLLM for high-volume annotation
  6. Protect API credentials - Use environment variables, never commit keys