Span Annotation
Highlight and label text spans for named entity recognition and more.
Span annotation allows annotators to select and label portions of text, commonly used for named entity recognition (NER), part-of-speech tagging, and text highlighting tasks.
Basic Configuration
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight named entities in the text"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
Configuration Options
Entity Labels
Define the types of spans annotators can create:
labels:
  - PERSON
  - ORGANIZATION
  - LOCATION
  - DATE
  - EVENT
Label Colors
Customize colors for visual distinction:
label_colors:
  PERSON: "#3b82f6"
  ORGANIZATION: "#10b981"
  LOCATION: "#f59e0b"
  DATE: "#8b5cf6"
  EVENT: "#ec4899"
Colors can be hex (#ff0000) or RGB (rgb(255, 0, 0)).
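Label colors may come from a shared or generated config. In that case, a quick pre-flight check can catch typos before annotators see them. The sketch below is a strict, hypothetical validator that accepts only the two formats named above: six-digit hex and rgb(r, g, b). The function names are made up for illustration, and Potato itself may accept other CSS color forms.

```python
import re

# Hypothetical check: accepts only 6-digit hex (#ff0000) and rgb(r, g, b).
HEX_RE = re.compile(r"#[0-9a-fA-F]{6}")
RGB_RE = re.compile(r"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)")


def is_valid_color(value: str) -> bool:
    """Return True if value is a 6-digit hex color or an rgb(r, g, b) triple."""
    if HEX_RE.fullmatch(value):
        return True
    match = RGB_RE.fullmatch(value)
    # Each RGB channel must fall in 0..255.
    return bool(match) and all(0 <= int(c) <= 255 for c in match.groups())


def invalid_colors(label_colors: dict[str, str]) -> dict[str, str]:
    """Return the label -> color entries that fail is_valid_color."""
    return {label: color for label, color in label_colors.items()
            if not is_valid_color(color)}


label_colors = {
    "PERSON": "#3b82f6",
    "ORGANIZATION": "rgb(16, 185, 129)",  # same color as #10b981
    "LOCATION": "orange",                 # named color: rejected by this strict check
}
print(invalid_colors(label_colors))  # {'LOCATION': 'orange'}
```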
Keyboard Shortcuts
Speed up annotation with keyboard bindings:
keyboard_shortcuts:
  PERSON: "1"
  ORGANIZATION: "2"
  LOCATION: "3"
  DATE: "4"
Tooltips
Provide guidance for each label:
tooltips:
  PERSON: "Names of people, characters, or personas"
  ORGANIZATION: "Companies, agencies, institutions"
  LOCATION: "Physical locations, addresses, geographic regions"
Overlapping Spans
Allow Overlapping
Let spans overlap one another:
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ROLE
  allow_overlapping: true
This is useful when the same text needs more than one label (e.g., "Dr. Smith" can be labeled both PERSON and ROLE).
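Exported spans can be checked for overlaps directly from their offsets, whichever setting a project uses. Below is a minimal sketch. It assumes the {"start", "end", "label"} span shape and the end-exclusive offsets used in the examples on this page. overlapping_pairs is a hypothetical helper, and the ROLE span on "Dr." is only an illustration.

```python
# Sketch: find overlapping spans, assuming end-exclusive character offsets.


def overlapping_pairs(spans):
    """Return index pairs (i, j) of spans whose character ranges intersect."""
    pairs = []
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            a, b = spans[i], spans[j]
            # Half-open ranges overlap iff each one starts before the other ends.
            if a["start"] < b["end"] and b["start"] < a["end"]:
                pairs.append((i, j))
    return pairs


text = "Dr. Smith arrived."
spans = [
    {"start": 0, "end": 9, "label": "PERSON"},  # "Dr. Smith"
    {"start": 0, "end": 3, "label": "ROLE"},    # "Dr."
]
print(overlapping_pairs(spans))  # [(0, 1)]
```

With allow_overlapping: false, every saved instance should produce an empty list. The pairwise loop is quadratic. That is fine for the handful of spans in a typical document, but sorting by start offset would scale better for very long texts.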
Disable Overlapping (Default)
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_overlapping: false  # Default behavior
Span Selection Modes
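The two modes below differ in how a drag across the text becomes character offsets. As a rough illustration only, and not Potato's implementation, word-level selection behaves like expanding the raw range outward to the nearest word boundaries, while character-level selection keeps the range as dragged. The sketch treats any run of non-whitespace characters as a word, which is an assumption. snap_to_words is a hypothetical helper.

```python
import re


def snap_to_words(text: str, start: int, end: int) -> tuple[int, int]:
    """Expand the half-open range [start, end) to cover every word it touches.

    A "word" here is any run of non-whitespace characters (an assumption;
    real tokenization may treat punctuation differently).
    """
    new_start, new_end = start, end
    for match in re.finditer(r"\S+", text):
        # Any word intersecting the selection pulls the boundaries outward.
        if match.start() < end and start < match.end():
            new_start = min(new_start, match.start())
            new_end = max(new_end, match.end())
    return new_start, new_end


text = "John Smith works at Microsoft in Seattle."
# A sloppy drag from inside "John" to inside "Smith": characters 2..7 ("hn Sm").
print(snap_to_words(text, 2, 7))  # (0, 10), i.e. "John Smith"
```

Under this naive word definition, snapping a selection inside "Seattle" yields "Seattle." with the trailing period included. Cases like this are where character-level selection earns its keep.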
Word-Level Selection
Select complete words only:
- annotation_type: span
  name: entities
  selection_mode: word
  labels:
    - ENTITY
Character-Level Selection
Allow selection of partial words:
- annotation_type: span
  name: entities
  selection_mode: character
  labels:
    - ENTITY
Pre-Annotated Spans
Load existing annotations for review or correction:
{
  "id": "doc1",
  "text": "John Smith works at Microsoft in Seattle.",
  "spans": [
    {"start": 0, "end": 10, "label": "PERSON"},
    {"start": 20, "end": 29, "label": "ORGANIZATION"},
    {"start": 33, "end": 40, "label": "LOCATION"}
  ]
}
Offsets are zero-based character positions, and end is exclusive: 0 to 10 covers "John Smith".
Configure the span scheme to load pre-annotations:
- annotation_type: span
  name: entities
  load_pre_annotations: true
  pre_annotation_field: spans
Common NER Configurations
Standard NER (4 Types)
- annotation_type: span
  name: ner
  description: "Label named entities"
  labels:
    - PER   # Person
    - ORG   # Organization
    - LOC   # Location
    - MISC  # Miscellaneous
  label_colors:
    PER: "#3b82f6"
    ORG: "#10b981"
    LOC: "#f59e0b"
    MISC: "#6b7280"
  keyboard_shortcuts:
    PER: "1"
    ORG: "2"
    LOC: "3"
    MISC: "4"
Extended NER (OntoNotes Style)
- annotation_type: span
  name: ner_extended
  labels:
    - PERSON
    - NORP  # Nationalities, religious/political groups
    - FAC   # Facilities
    - ORG
    - GPE   # Geopolitical entities
    - LOC
    - PRODUCT
    - EVENT
    - WORK_OF_ART
    - LAW
    - LANGUAGE
    - DATE
    - TIME
    - PERCENT
    - MONEY
    - QUANTITY
    - ORDINAL
    - CARDINAL
Biomedical NER
- annotation_type: span
  name: bio_ner
  labels:
    - GENE
    - PROTEIN
    - DISEASE
    - DRUG
    - SPECIES
  label_colors:
    GENE: "#22c55e"
    PROTEIN: "#3b82f6"
    DISEASE: "#ef4444"
    DRUG: "#f59e0b"
    SPECIES: "#8b5cf6"
Social Media NER
- annotation_type: span
  name: social_ner
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - PRODUCT
    - CREATIVE_WORK
    - GROUP
Span with Attributes
Add attributes to spans for richer annotation:
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - PERSON
      - ORGANIZATION
  - annotation_type: radio
    name: entity_type
    description: "What type of entity is this?"
    show_for_span: entities
    labels:
      - Named
      - Nominal
      - Pronominal
Multiple Span Schemes
Annotate different aspects separately:
annotation_schemes:
  # Named entities
  - annotation_type: span
    name: entities
    description: "Label named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
  # Sentiment expressions
  - annotation_type: span
    name: sentiment_spans
    description: "Highlight sentiment expressions"
    labels:
      - POSITIVE
      - NEGATIVE
    label_colors:
      POSITIVE: "#22c55e"
      NEGATIVE: "#ef4444"
Multi-Field Span Annotation
New in v2.1.0
Span annotation can target specific text fields in multi-field data using the target_field option. This is useful when your data contains multiple text fields and you want to annotate spans in a particular one.
Configuration
annotation_schemes:
  - annotation_type: span
    name: source_entities
    description: "Label entities in the source text"
    target_field: "source_text"
    labels:
      - PERSON
      - ORGANIZATION
  - annotation_type: span
    name: summary_entities
    description: "Label entities in the summary"
    target_field: "summary"
    labels:
      - PERSON
      - ORGANIZATION
Multi-Field Data Format
Each data item should include every field referenced by a target_field:
{
  "id": "doc1",
  "source_text": "John Smith works at Microsoft in Seattle.",
  "summary": "Smith is employed by Microsoft."
}
Output Format
When target_field is set, each scheme's spans are nested under the name of the field they were drawn from:
{
  "id": "doc1",
  "source_entities": {
    "source_text": [
      {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
      {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"}
    ]
  },
  "summary_entities": {
    "summary": [
      {"start": 0, "end": 5, "text": "Smith", "label": "PERSON"},
      {"start": 21, "end": 30, "text": "Microsoft", "label": "ORGANIZATION"}
    ]
  }
}
For a complete working example, see project-hub/simple_examples/simple-multi-span/ in the Potato repository.
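Every saved span stores both its offsets and its text, so output can be sanity-checked against the original data. This catches off-by-one offsets in hand-written pre-annotations or in downstream transforms. Below is a minimal sketch that assumes the target_field layout shown above (scheme name, then field name, then a list of spans) and end-exclusive offsets. check_span_offsets is a hypothetical helper, and the shifted span is deliberately wrong for illustration.

```python
# Sketch: re-check saved offsets against the source record.


def check_span_offsets(record, output, scheme_names):
    """Return (scheme, field, span) triples whose offsets don't match span["text"]."""
    problems = []
    for scheme in scheme_names:
        # Each scheme maps a target field name to the spans drawn from that field.
        for field, spans in output.get(scheme, {}).items():
            source = record[field]
            for span in spans:
                if source[span["start"]:span["end"]] != span["text"]:
                    problems.append((scheme, field, span))
    return problems


record = {
    "id": "doc1",
    "source_text": "John Smith works at Microsoft in Seattle.",
    "summary": "Smith is employed by Microsoft.",
}
output = {
    "id": "doc1",
    "source_entities": {"source_text": [
        {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
        {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"},
    ]},
    "summary_entities": {"summary": [
        {"start": 0, "end": 5, "text": "Smith", "label": "PERSON"},
        # Deliberately shifted by one character to show what gets flagged.
        {"start": 1, "end": 6, "text": "Smith", "label": "PERSON"},
    ]},
}

problems = check_span_offsets(record, output, ["source_entities", "summary_entities"])
for scheme, field, span in problems:
    print(scheme, field, span["start"], span["end"])  # summary_entities summary 1 6
```

For single-field output, the same slice-and-compare check applies with the record's text field as the source.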
Display Options
Show Label in Span
Display the label text within highlighted spans:
- annotation_type: span
  name: entities
  show_label_in_span: true
Underline Style
Use underlines instead of background highlighting:
- annotation_type: span
  name: entities
  display_style: underline
Output Format
Span annotations are saved with character offsets:
{
  "id": "doc1",
  "entities": [
    {
      "start": 0,
      "end": 10,
      "text": "John Smith",
      "label": "PERSON"
    },
    {
      "start": 20,
      "end": 29,
      "text": "Microsoft",
      "label": "ORGANIZATION"
    }
  ]
}
Full Example: NER Task
task_name: "Named Entity Recognition"
data_files:
  - path: data/documents.json
    text_field: text
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight and label all named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
      - DATE
      - MONEY
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MONEY: "#ec4899"
    keyboard_shortcuts:
      PERSON: "1"
      ORGANIZATION: "2"
      LOCATION: "3"
      DATE: "4"
      MONEY: "5"
    tooltips:
      PERSON: "Names of people"
      ORGANIZATION: "Companies, agencies, institutions"
      LOCATION: "Cities, countries, addresses"
      DATE: "Dates and time expressions"
      MONEY: "Monetary values"
    allow_overlapping: false
    selection_mode: word
  - annotation_type: radio
    name: difficulty
    description: "How difficult was this document to annotate?"
    labels:
      - Easy
      - Medium
      - Hard
Discontinuous Spans
New in v2.2.0
Enable non-contiguous text spans with the allow_discontinuous parameter. This allows annotators to select multiple non-adjacent text segments as a single span annotation, useful for discontinuous entities or split expressions.
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_discontinuous: true
When enabled, annotators can hold a modifier key while selecting additional text segments to add them to the current span. The output then records one start/end pair per segment, so a single span can carry several pairs.
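This page does not spell out the exact JSON for discontinuous spans. It only says the output contains multiple start/end pairs. The sketch below assumes a hypothetical "segments" list of {"start", "end"} objects with end-exclusive offsets. Under that assumption, consumers mostly need to put the segments in reading order and recover their text. segment_texts is a hypothetical helper.

```python
# Hypothetical shape: the "segments" key is an assumption, not Potato's documented schema.


def segment_texts(text, span):
    """Return the text of each segment of a discontinuous span, in reading order."""
    segments = sorted(span["segments"], key=lambda s: s["start"])
    # Reject segments that overlap each other (end-exclusive offsets).
    for prev, cur in zip(segments, segments[1:]):
        if cur["start"] < prev["end"]:
            raise ValueError(f"overlapping segments: {prev} and {cur}")
    return [text[s["start"]:s["end"]] for s in segments]


text = "Bill and Melinda Gates founded the foundation."
# One PERSON mention, "Bill ... Gates", interrupted by "and Melinda".
span = {
    "label": "PERSON",
    "segments": [{"start": 17, "end": 22}, {"start": 0, "end": 4}],
}
print(" ".join(segment_texts(text, span)))  # Bill Gates
```

Sorting first means the reconstructed mention reads correctly even if annotators added segments out of order.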
Entity Linking Integration
New in v2.2.0
Span annotations can be linked to external knowledge bases (Wikidata, UMLS, or custom REST APIs) by adding an entity_linking configuration block to the span schema:
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
  entity_linking:
    enabled: true
    knowledge_bases:
      - name: wikidata
        type: wikidata
        language: en
When entity linking is enabled, a link icon appears on each span's control bar. Clicking it opens a search modal to find and link matching KB entities. See the Entity Linking documentation for full details.
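For readers building their own tooling around a Wikidata knowledge base, the kind of lookup a search modal performs can be sketched with Wikidata's public wbsearchentities API. This is an assumption about the general approach, not a description of how Potato's search is implemented. wikidata_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public Wikidata API endpoint; the wbsearchentities action searches entity labels and aliases.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wikidata_search_url(mention: str, language: str = "en", limit: int = 5) -> str:
    """Build a Wikidata entity-search URL for a span's surface text."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,  # analogous to the language: en setting above
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


print(wikidata_search_url("Seattle"))
# https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Seattle&language=en&limit=5&format=json
```

Fetching that URL returns JSON whose "search" list holds candidate entities, each with an "id" (a Q-number), a "label", and usually a "description". Wikimedia asks API clients to send a descriptive User-Agent header.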
Best Practices
- Use distinct colors for easy visual differentiation
- Provide clear tooltips with examples for each entity type
- Enable keyboard shortcuts for faster annotation
- Use word-level selection unless character precision is needed
- Consider pre-annotation for faster correction workflows
- Choose the allow_overlapping setting to match your annotation guidelines
Further Reading
- Entity Linking - Link spans to knowledge bases
- Coreference Chains - Group coreferring mentions
- Event Annotation - N-ary event structures with span arguments
- Span Linking - Create relationships between spans
- Instance Display - Multi-field content display with span targets
- UI Configuration - Customize span colors
- Productivity Features - Keyboard shortcuts
For implementation details, see the source documentation.