AI-powered keyword highlighting pulls the annotator's eye toward the terms, entities, or patterns that matter in a piece of text. This guide walks through Potato's built-in AI support and shows how to configure it so that relevant keywords are highlighted automatically.

Why use keyword highlighting?

It guides annotators to the part of the text that actually matters, which means they find the key information faster and are less likely to skim past an important term. Because the highlighting comes from an LLM, it can adapt to the context of each item instead of relying on a fixed word list.

For how Potato's option and keyword highlighting works under the hood, see the source documentation.

Basic AI-powered highlighting

Potato leans on its AI support system to find and highlight important keywords. Here is a minimal config:

yaml

annotation_task_name: "Keyword Highlighted Annotation"
 
data_files:
  - path: "data/reviews.json"
    format: json
 
item_properties:
  id_key: id
  text_key: text
 
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the overall sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
 
ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
 
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text

When AI keyword highlighting is enabled, relevant terms are automatically highlighted in the annotation text:

Figure: AI-powered keyword highlighting in the annotation interface. Important keywords and entities are automatically highlighted to guide annotator attention.
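
Before launching a task with the minimal config, it can save a debugging round to confirm that the API key is exported and that every item in the data file carries the fields named under item_properties. The following is a rough pre-flight sketch, not part of Potato: the function names `load_items` and `preflight` are hypothetical, the script accepts either a JSON array or one JSON object per line because the guide does not say which layout the file uses, and the unique-id check is an assumption about what a sane dataset looks like.

```python
import json
import os


def load_items(data_path):
    """Read items from a JSON array file, falling back to one object per line."""
    with open(data_path, encoding="utf-8") as f:
        raw = f.read()
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: if the file is not one JSON document, treat it as JSON Lines.
        parsed = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]


def preflight(data_path, id_key="id", text_key="text", env_var="OPENAI_API_KEY"):
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = []

    # The config reads the key from the environment via ${OPENAI_API_KEY}.
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")

    try:
        items = load_items(data_path)
    except (OSError, ValueError) as exc:
        problems.append(f"cannot read {data_path}: {exc}")
        return problems

    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index} is not a JSON object")
            continue
        # id_key and text_key mirror the item_properties block in the config.
        for key in (id_key, text_key):
            if key not in item:
                problems.append(f"item {index} is missing '{key}'")
        if id_key in item:
            item_id = str(item[id_key])
            if item_id in seen_ids:
                problems.append(f"item {index} repeats id {item_id!r}")
            seen_ids.add(item_id)
    return problems


if __name__ == "__main__":
    for problem in preflight("data/reviews.json"):
        print("WARNING:", problem)
```

For the Anthropic config, pass `env_var="ANTHROPIC_API_KEY"` instead.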

Using different AI providers

OpenAI

yaml

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4o
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
 
  features:
    keyword_highlighting:
      enabled: true

Anthropic Claude

yaml

ai_support:
  enabled: true
  endpoint_type: anthropic
 
  ai_config:
    model: claude-3-sonnet-20240229
    api_key: ${ANTHROPIC_API_KEY}
    temperature: 0.3
    max_tokens: 500
 
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text

Local Ollama (no API costs)

yaml

ai_support:
  enabled: true
  endpoint_type: ollama
 
  ai_config:
    model: llama2
    base_url: http://localhost:11434
 
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
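
Comparing the three provider snippets, only endpoint_type and the contents of ai_config change; the features block is identical. The sketch below makes that explicit by building the ai_support section as a plain Python dict. It is illustrative only: `PROVIDERS` and `ai_support_for` are hypothetical names, and the values are copied from the snippets above rather than from any list of supported models.

```python
# Per-provider ai_config values, copied from the snippets in this guide.
PROVIDERS = {
    "openai": {
        "model": "gpt-4o",
        "api_key": "${OPENAI_API_KEY}",
        "temperature": 0.3,
        "max_tokens": 500,
    },
    "anthropic": {
        "model": "claude-3-sonnet-20240229",
        "api_key": "${ANTHROPIC_API_KEY}",
        "temperature": 0.3,
        "max_tokens": 500,
    },
    # The local Ollama snippet uses a base_url and no API key.
    "ollama": {
        "model": "llama2",
        "base_url": "http://localhost:11434",
    },
}


def ai_support_for(provider):
    """Build the ai_support section for one provider as a dict."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return {
        "enabled": True,
        "endpoint_type": provider,
        "ai_config": dict(PROVIDERS[provider]),
        # The features block does not depend on the provider.
        "features": {"keyword_highlighting": {"enabled": True}},
    }


if __name__ == "__main__":
    for name in PROVIDERS:
        print(name, sorted(ai_support_for(name)["ai_config"]))
```

Serializing the returned dict with a YAML library of your choice would give a block equivalent to the hand-written ones.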

Combining features

The AI features stack, and they tend to be more useful together than alone:

yaml

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
 
  features:
    # Highlight important keywords
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
 
    # Show contextual hints
    hints:
      enabled: true
 
    # Suggest labels for consideration
    label_suggestions:
      enabled: true
      show_confidence: true

Complete configuration example

Here is a full config for entity-aware annotation with AI highlighting:

yaml

annotation_task_name: "Entity-Aware Annotation"
 
data_files:
  - path: "data/documents.json"
    format: json
 
item_properties:
  id_key: id
  text_key: text
 
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - name: PERSON
        color: "#FECACA"
      - name: ORG
        color: "#BBF7D0"
      - name: LOCATION
        color: "#BFDBFE"
 
ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
 
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
    hints:
      enabled: true
    label_suggestions:
      enabled: true
      show_confidence: true
 
  cache_config:
    disk_cache:
      enabled: true
      path: "ai_cache/cache.json"
    prefetch:
      warm_up_page_count: 50
      on_next: 3
      on_prev: 2
 
output_annotation_dir: "output/"
export_annotation_format: json
allow_all_users: true
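
The prefetch block in the config above takes three numbers, and this guide does not spell out their exact semantics. One plausible reading, offered here as an assumption rather than as Potato's documented behavior, is that warm_up_page_count items are processed at startup, and that each move forward or backward triggers prefetching of on_next or on_prev neighboring items. The function names below are hypothetical.

```python
def warm_up_indices(total, warm_up_page_count=50):
    """Item indices assumed to be pre-generated at startup."""
    return list(range(min(total, warm_up_page_count)))


def prefetch_after_move(current, direction, total, on_next=3, on_prev=2):
    """Item indices assumed to be prefetched after the annotator navigates.

    `current` is the index the annotator has just landed on.
    """
    if direction == "next":
        candidates = range(current + 1, current + 1 + on_next)
    elif direction == "prev":
        candidates = range(current - 1, current - 1 - on_prev, -1)
    else:
        raise ValueError("direction must be 'next' or 'prev'")
    # Drop indices that fall off either end of the dataset.
    return [i for i in candidates if 0 <= i < total]


if __name__ == "__main__":
    print(prefetch_after_move(10, "next", total=100))  # [11, 12, 13]
    print(prefetch_after_move(10, "prev", total=100))  # [9, 8]
```

Under this reading, larger values trade more up-front API calls for fewer waits while annotating.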

Caching for performance

Turn on caching to cut down on API calls and speed up responses:

yaml

ai_support:
  enabled: true
  endpoint_type: openai
 
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
 
  features:
    keyword_highlighting:
      enabled: true
 
  cache_config:
    disk_cache:
      enabled: true
      path: "ai_cache/cache.json"
 
    # Pre-generate highlights on startup and prefetch nearby items
    prefetch:
      warm_up_page_count: 100
      on_next: 5
      on_prev: 2
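
To see why a disk cache cuts down on API calls, here is a minimal sketch of the general technique: results are stored in a JSON file under a key derived from the model, the feature, and the item text, so the same content is only sent to the model once, even across restarts. This is not Potato's implementation; the `DiskCache` class, its key scheme, and the `compute` callback are all hypothetical.

```python
import hashlib
import json
import os


class DiskCache:
    """Tiny JSON-file cache keyed by a hash of model, feature, and item text."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        # Reload earlier results so a restart does not repeat API calls.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.entries = json.load(f)

    @staticmethod
    def key(model, feature, text):
        payload = json.dumps([model, feature, text], ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, model, feature, text, compute):
        """Return the cached result, calling compute(text) only on a miss."""
        k = self.key(model, feature, text)
        if k not in self.entries:
            # compute must return something JSON-serializable.
            self.entries[k] = compute(text)
            self._save()
        return self.entries[k]

    def _save(self):
        directory = os.path.dirname(self.path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)


if __name__ == "__main__":
    def fake_model(text):
        # Stand-in for a real LLM call.
        return [word for word in text.split() if len(word) > 6]

    cache = DiskCache("ai_cache/cache.json")
    print(cache.get_or_compute("gpt-4", "keyword_highlighting",
                               "The battery drains quickly", fake_model))
```

Including the model and feature in the key means that switching models or enabling another feature does not return stale results for the same text.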

Tips

Pick highlight colors that sit well next to your annotation scheme rather than fighting it. Keep caching on so you are not paying for the same content twice. If you are annotating at high volume, Ollama runs locally and skips the API bill entirely. And remember the features stack: keyword highlighting pairs naturally with hints and label suggestions.

Full documentation at /docs/features/ai-support.