Automatic Keyword Highlighting
Configure AI-powered keyword highlighting in Potato to draw annotator attention to important terms. Covers OpenAI, Anthropic Claude, and local Ollama configuration.
AI-powered keyword highlighting pulls the annotator's eye toward the terms, entities, or patterns that matter in a piece of text. This guide walks through Potato's built-in AI support and how to set it up so relevant keywords get highlighted on their own.
Why use keyword highlighting?
It guides annotators to the part of the text that actually matters, which means they find the key information faster and are less likely to skim past an important term. Because the highlighting comes from an LLM, it can adapt to the context of each item instead of relying on a fixed word list.
For how Potato's option and keyword highlighting work under the hood, see the source documentation.
Basic AI-powered highlighting
Potato leans on its AI support system to find and highlight important keywords. Here is a minimal config:
annotation_task_name: "Keyword Highlighted Annotation"
data_files:
  - path: "data/reviews.json"
    format: json
item_properties:
  id_key: id
  text_key: text
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the overall sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
When AI keyword highlighting is enabled, relevant terms are automatically highlighted in the annotation text:
Important keywords and entities are automatically highlighted to guide annotator attention
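How Potato asks the model for keywords and draws its boxes is internal to Potato. As a rough illustration of the last step only, the sketch below assumes the model hands back a plain list of keyword strings and turns them into character offsets that a front end could draw overlays over. `keyword_spans` is a hypothetical helper, not part of Potato's API.

```python
import re
from typing import List, Tuple

def keyword_spans(text: str, keywords: List[str]) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: return (start, end, matched_text) for every
    case-insensitive, whole-word occurrence of any keyword.

    Overlapping matches are resolved by keeping the earliest match, and the
    longest one when two matches start at the same offset. Note that \\b word
    boundaries assume keywords start and end with word characters."""
    spans = []
    for kw in keywords:
        if not kw:
            continue  # an empty pattern would produce zero-length matches
        pattern = r"\b" + re.escape(kw) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), m.group(0)))
    # Sort by start offset, longest match first for ties
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    result = []
    last_end = -1
    for start, end, matched in spans:
        if start >= last_end:  # skip anything overlapping a kept span
            result.append((start, end, matched))
            last_end = end
    return result

if __name__ == "__main__":
    review = "The battery life is great, but the screen cracked after a week."
    for start, end, word in keyword_spans(review, ["battery life", "screen", "cracked"]):
        print(start, end, word)
```

Character offsets rather than token indices keep the overlay independent of any tokenizer, which is also how span annotations are usually stored.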
Using different AI providers
OpenAI
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4o
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
  features:
    keyword_highlighting:
      enabled: true
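The hosted-provider examples read the API key through a `${...}` placeholder such as `${OPENAI_API_KEY}`, which suggests the value is taken from the environment when the config is loaded. Assuming that is the case, a small pre-flight check can catch an unset key before you launch the server. `missing_env_vars` is a hypothetical helper, not something Potato ships.

```python
import os
import re
from typing import List, Mapping, Optional

# Matches ${VAR_NAME} placeholders such as ${OPENAI_API_KEY}
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def missing_env_vars(config_text: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return placeholder names referenced in the config
    text that are unset or empty in `env` (defaults to the process environment)."""
    env = os.environ if env is None else env
    names = dict.fromkeys(PLACEHOLDER.findall(config_text))  # dedupe, keep order
    return [name for name in names if not env.get(name)]

if __name__ == "__main__":
    sample = "api_key: ${OPENAI_API_KEY}\n"
    missing = missing_env_vars(sample)
    print("missing:", missing or "none")
```

In practice you would pass the contents of your config file; an empty result means every referenced variable is set.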
Anthropic Claude
ai_support:
  enabled: true
  endpoint_type: anthropic
  ai_config:
    model: claude-3-sonnet-20240229
    api_key: ${ANTHROPIC_API_KEY}
    temperature: 0.3
    max_tokens: 500
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
Local Ollama (No API Costs)
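The Ollama endpoint needs a server running at `base_url` with the configured model already pulled (for example with `ollama pull llama2`). The sketch below queries Ollama's own `GET /api/tags` endpoint, which lists locally available models, to confirm both before starting Potato. The helper names are hypothetical and not part of Potato.

```python
import json
import urllib.error
import urllib.request
from typing import List

def local_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> List[str]:
    """Return the model names the Ollama server reports via GET /api/tags."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]

def has_model(names: List[str], model: str) -> bool:
    """True if `model` is available; a bare name like 'llama2' also
    matches a tagged entry like 'llama2:latest'."""
    if ":" in model:
        return model in names
    return any(name == model or name.split(":", 1)[0] == model for name in names)

if __name__ == "__main__":
    try:
        names = local_models()
    except (urllib.error.URLError, OSError) as exc:
        print("Ollama is not reachable:", exc)
    else:
        print("llama2 available:", has_model(names, "llama2"))
```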
ai_support:
  enabled: true
  endpoint_type: ollama
  ai_config:
    model: llama2
    base_url: http://localhost:11434
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
Combining features
The AI features stack, and they tend to be more useful together than alone:
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
  features:
    # Highlight important keywords
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
    # Show contextual hints
    hints:
      enabled: true
    # Suggest labels for consideration
    label_suggestions:
      enabled: true
      show_confidence: true
Complete configuration example
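The complete example reads its items from `data/documents.json`, taking each item's ID from the `id` key and its text from the `text` key (`id_key` and `text_key`). The sketch below produces such a file, assuming `format: json` means one JSON object per line, as in Potato's bundled example projects; check this against your Potato version. `write_items` and `read_items` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List

def write_items(path: str, texts: Iterable[str]) -> int:
    """Hypothetical helper: write one JSON object per line with `id` and
    `text` keys (matching id_key/text_key), creating parent dirs; return the count."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out.open("w", encoding="utf-8") as f:
        for i, text in enumerate(texts, start=1):
            record = {"id": f"doc_{i}", "text": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count

def read_items(path: str) -> List[Dict[str, str]]:
    """Read the file back, one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `write_items("data/documents.json", texts)` from the project directory yields IDs `doc_1`, `doc_2`, and so on; any stable, unique string works as an ID.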
Here is a full config for entity-aware annotation with AI highlighting:
annotation_task_name: "Entity-Aware Annotation"
data_files:
  - path: "data/documents.json"
    format: json
item_properties:
  id_key: id
  text_key: text
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - name: PERSON
        color: "#FECACA"
      - name: ORG
        color: "#BBF7D0"
      - name: LOCATION
        color: "#BFDBFE"
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.3
    max_tokens: 500
  features:
    keyword_highlighting:
      enabled: true
      # Highlights are rendered as box overlays on the text
    hints:
      enabled: true
    label_suggestions:
      enabled: true
      show_confidence: true
  cache_config:
    disk_cache:
      enabled: true
      path: "ai_cache/cache.json"
    prefetch:
      warm_up_page_count: 50
      on_next: 3
      on_prev: 2
output_annotation_dir: "output/"
export_annotation_format: json
allow_all_users: true
Caching for performance
Turn on caching to cut down on API calls and speed up responses:
ai_support:
  enabled: true
  endpoint_type: openai
  ai_config:
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
  features:
    keyword_highlighting:
      enabled: true
  cache_config:
    disk_cache:
      enabled: true
      path: "ai_cache/cache.json"
    # Pre-generate highlights on startup and prefetch upcoming
    prefetch:
      warm_up_page_count: 100
      on_next: 5
      on_prev: 2
Tips
Pick highlight colors that sit well next to your annotation scheme rather than fighting it. Keep caching on so you are not paying for the same content twice. If you are annotating at high volume, Ollama runs locally and skips the API bill entirely. And remember the features stack: keyword highlighting pairs naturally with hints and label suggestions.
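The caching tip pairs with the `prefetch` block in the caching examples. Going only by the key names and the comment "Pre-generate highlights on startup and prefetch upcoming", a plausible reading is that `warm_up_page_count` items are generated at startup and that, as an annotator moves through the data, the next `on_next` and previous `on_prev` items are generated ahead of time. The sketch below models that assumed behavior with hypothetical helpers; Potato's actual scheduling (for example per-user item order) may differ.

```python
from typing import List

def prefetch_indices(current: int, total: int, on_next: int, on_prev: int) -> List[int]:
    """Assumed behavior: indices around `current` whose highlights would be
    generated ahead of time, clipped to the range [0, total)."""
    start = max(0, current - on_prev)
    stop = min(total, current + on_next + 1)
    return [i for i in range(start, stop) if i != current]

def warm_up_indices(total: int, warm_up_page_count: int) -> List[int]:
    """Assumed behavior: the first `warm_up_page_count` items, generated at startup."""
    return list(range(min(total, warm_up_page_count)))
```

Under this reading, with `on_next: 5` and `on_prev: 2`, moving to item 10 of 100 would prefetch items 8, 9, and 11 through 15, so each extra unit of `on_next` trades one more API call for lower latency later.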
Full documentation at /docs/features/ai-support.