Span Annotation
Highlight and label text spans for named entity recognition and more.
Span annotation allows annotators to select and label portions of text, commonly used for named entity recognition (NER), part-of-speech tagging, and text highlighting tasks.
Basic Configuration
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight named entities in the text"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
Configuration Options
Entity Labels
Define the types of spans annotators can create:
labels:
  - PERSON
  - ORGANIZATION
  - LOCATION
  - DATE
  - EVENT
Label Colors
Customize colors for visual distinction:
label_colors:
  PERSON: "#3b82f6"
  ORGANIZATION: "#10b981"
  LOCATION: "#f59e0b"
  DATE: "#8b5cf6"
  EVENT: "#ec4899"
Colors can be hex (#ff0000) or RGB (rgb(255, 0, 0)).
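As a rough illustration of the two color formats mentioned above, the following sketch checks a `label_colors` mapping before it goes into a config file. The helper `is_valid_color` is hypothetical and not part of the annotation tool. It assumes six-digit hex values and `rgb(r, g, b)` with channels in 0-255; short forms such as `#f00` are rejected here only because the text does not say whether they are supported.

```python
import re

# Hypothetical helper for illustration; not part of the annotation tool.
# Assumption: hex colors are six digits, rgb() channels are integers in 0-255.
HEX_RE = re.compile(r"^#[0-9a-fA-F]{6}$")
RGB_RE = re.compile(r"^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$")


def is_valid_color(value: str) -> bool:
    """Return True for '#rrggbb' or 'rgb(r, g, b)' with each channel in 0-255."""
    if HEX_RE.match(value):
        return True
    match = RGB_RE.match(value)
    return bool(match) and all(0 <= int(c) <= 255 for c in match.groups())


label_colors = {
    "PERSON": "#3b82f6",
    "ORGANIZATION": "#10b981",
    "LOCATION": "rgb(245, 158, 11)",
}

# Collect any labels whose color string fails the check.
invalid = {label: c for label, c in label_colors.items() if not is_valid_color(c)}
print(invalid)
```

Running a check like this before starting a task can catch a mistyped color early, rather than discovering it as a missing highlight in the annotation interface.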
Keyboard Shortcuts
Speed up annotation with keyboard bindings:
keyboard_shortcuts:
  PERSON: "1"
  ORGANIZATION: "2"
  LOCATION: "3"
  DATE: "4"
Tooltips
Provide guidance for each label:
tooltips:
  PERSON: "Names of people, characters, or personas"
  ORGANIZATION: "Companies, agencies, institutions"
  LOCATION: "Physical locations, addresses, geographic regions"
Overlapping Spans
Allow Overlapping
Enable spans that can overlap:
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ROLE
  allow_overlapping: true
This is useful when the same text can carry more than one label (e.g., "Dr. Smith" can be labeled both PERSON and ROLE).
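To make the notion of overlap concrete, here is a minimal sketch that finds overlapping pairs in a list of spans. It assumes half-open character offsets (`start` inclusive, `end` exclusive), which matches the offsets used in the examples on this page. The function `find_overlaps` and the sample sentence are illustrative only and are not part of the tool.

```python
from itertools import combinations


def find_overlaps(spans):
    """Return pairs of spans whose [start, end) ranges share at least one character."""
    return [
        (a, b)
        for a, b in combinations(spans, 2)
        if a["start"] < b["end"] and b["start"] < a["end"]
    ]


text = "Dr. Smith met Anna."
spans = [
    {"start": 0, "end": 9, "label": "PERSON"},    # "Dr. Smith"
    {"start": 0, "end": 3, "label": "ROLE"},      # "Dr."
    {"start": 14, "end": 18, "label": "PERSON"},  # "Anna"
]

for a, b in find_overlaps(spans):
    print(a["label"], repr(text[a["start"]:a["end"]]),
          "overlaps", b["label"], repr(text[b["start"]:b["end"]]))
```

A check like this can be run over exported annotations to confirm that overlaps only appear in tasks where the guidelines permit them.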
Disable Overlapping (Default)
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_overlapping: false  # Default behavior
Span Selection Modes
Word-Level Selection
Select complete words only:
- annotation_type: span
  name: entities
  selection_mode: word
  labels:
    - ENTITY
Character-Level Selection
Allow selection of partial words:
- annotation_type: span
  name: entities
  selection_mode: character
  labels:
    - ENTITY
Pre-Annotated Spans
Load existing annotations for review or correction:
{
  "id": "doc1",
  "text": "John Smith works at Microsoft in Seattle.",
  "spans": [
    {"start": 0, "end": 10, "label": "PERSON"},
    {"start": 20, "end": 29, "label": "ORGANIZATION"},
    {"start": 33, "end": 40, "label": "LOCATION"}
  ]
}
Configure to load pre-annotations:
- annotation_type: span
  name: entities
  load_pre_annotations: true
  pre_annotation_field: spans
Common NER Configurations
Standard NER (4 Types)
- annotation_type: span
  name: ner
  description: "Label named entities"
  labels:
    - PER   # Person
    - ORG   # Organization
    - LOC   # Location
    - MISC  # Miscellaneous
  label_colors:
    PER: "#3b82f6"
    ORG: "#10b981"
    LOC: "#f59e0b"
    MISC: "#6b7280"
  keyboard_shortcuts:
    PER: "1"
    ORG: "2"
    LOC: "3"
    MISC: "4"
Extended NER (OntoNotes Style)
- annotation_type: span
  name: ner_extended
  labels:
    - PERSON
    - NORP  # Nationalities, religious/political groups
    - FAC   # Facilities
    - ORG
    - GPE   # Geopolitical entities
    - LOC
    - PRODUCT
    - EVENT
    - WORK_OF_ART
    - LAW
    - LANGUAGE
    - DATE
    - TIME
    - PERCENT
    - MONEY
    - QUANTITY
    - ORDINAL
    - CARDINAL
Biomedical NER
- annotation_type: span
  name: bio_ner
  labels:
    - GENE
    - PROTEIN
    - DISEASE
    - DRUG
    - SPECIES
  label_colors:
    GENE: "#22c55e"
    PROTEIN: "#3b82f6"
    DISEASE: "#ef4444"
    DRUG: "#f59e0b"
    SPECIES: "#8b5cf6"
Social Media NER
- annotation_type: span
  name: social_ner
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - PRODUCT
    - CREATIVE_WORK
    - GROUP
Span with Attributes
Add attributes to spans for richer annotation:
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - PERSON
      - ORGANIZATION
  - annotation_type: radio
    name: entity_type
    description: "What type of entity is this?"
    show_for_span: entities
    labels:
      - Named
      - Nominal
      - Pronominal
Multiple Span Schemes
Annotate different aspects separately:
annotation_schemes:
  # Named entities
  - annotation_type: span
    name: entities
    description: "Label named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
  # Sentiment expressions
  - annotation_type: span
    name: sentiment_spans
    description: "Highlight sentiment expressions"
    labels:
      - POSITIVE
      - NEGATIVE
    label_colors:
      POSITIVE: "#22c55e"
      NEGATIVE: "#ef4444"
Display Options
Show Label in Span
Display the label text within highlighted spans:
- annotation_type: span
  name: entities
  show_label_in_span: true
Underline Style
Use underlines instead of background highlighting:
- annotation_type: span
  name: entities
  display_style: underline
Output Format
Span annotations are saved with character offsets:
{
  "id": "doc1",
  "entities": [
    {
      "start": 0,
      "end": 10,
      "text": "John Smith",
      "label": "PERSON"
    },
    {
      "start": 20,
      "end": 29,
      "text": "Microsoft",
      "label": "ORGANIZATION"
    }
  ]
}
Full Example: NER Task
task_name: "Named Entity Recognition"
data_files:
  - path: data/documents.json
    text_field: text
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight and label all named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
      - DATE
      - MONEY
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MONEY: "#ec4899"
    keyboard_shortcuts:
      PERSON: "1"
      ORGANIZATION: "2"
      LOCATION: "3"
      DATE: "4"
      MONEY: "5"
    tooltips:
      PERSON: "Names of people"
      ORGANIZATION: "Companies, agencies, institutions"
      LOCATION: "Cities, countries, addresses"
      DATE: "Dates and time expressions"
      MONEY: "Monetary values"
    allow_overlapping: false
    selection_mode: word
  - annotation_type: radio
    name: difficulty
    description: "How difficult was this document to annotate?"
    labels:
      - Easy
      - Medium
      - Hard
Best Practices
- Use distinct colors for easy visual differentiation
- Provide clear tooltips with examples for each entity type
- Enable keyboard shortcuts for faster annotation
- Use word-level selection unless character precision is needed
- Consider pre-annotation for faster correction workflows
- Test overlapping settings based on your annotation guidelines
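
For the pre-annotation practice in particular, it can help to sanity-check span offsets before loading a file for correction. The sketch below is one possible check, not something the tool provides: `check_spans` is a hypothetical helper, and it assumes the pre-annotation layout shown earlier on this page (a `text` field plus a `spans` list with half-open `start`/`end` character offsets).

```python
def check_spans(doc, allowed_labels):
    """Collect human-readable problems for the pre-annotated spans of one document."""
    problems = []
    text = doc["text"]
    for i, span in enumerate(doc.get("spans", [])):
        start, end, label = span["start"], span["end"], span["label"]
        # Offsets must describe a non-empty range inside the text.
        if not (0 <= start < end <= len(text)):
            problems.append(f"span {i}: offsets {start}-{end} out of range")
            continue
        if label not in allowed_labels:
            problems.append(f"span {i}: unknown label {label!r}")
        surface = text[start:end]
        # Leading or trailing whitespace usually signals an off-by-one offset.
        if surface != surface.strip():
            problems.append(f"span {i}: {surface!r} has leading/trailing whitespace")
    return problems


doc = {
    "id": "doc1",
    "text": "John Smith works at Microsoft in Seattle.",
    "spans": [
        {"start": 0, "end": 10, "label": "PERSON"},
        {"start": 20, "end": 29, "label": "ORGANIZATION"},
        {"start": 33, "end": 40, "label": "LOCATION"},
    ],
}
labels = {"PERSON", "ORGANIZATION", "LOCATION"}

print(check_spans(doc, labels))  # → []
print([doc["text"][s["start"]:s["end"]] for s in doc["spans"]])
# → ['John Smith', 'Microsoft', 'Seattle']
```

Printing the surface text for each span, as in the last line, is a quick way to spot offsets that were computed against a differently tokenized or differently encoded copy of the document.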