Span Annotation

Highlight and label text spans for named entity recognition and more.

Span annotation lets annotators select and label portions of text. It is commonly used for named entity recognition (NER), part-of-speech tagging, and text highlighting tasks.

Basic Configuration

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight named entities in the text"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION

Configuration Options

Entity Labels

Define the types of spans annotators can create:

yaml
labels:
  - PERSON
  - ORGANIZATION
  - LOCATION
  - DATE
  - EVENT

Label Colors

Customize colors for visual distinction:

yaml
label_colors:
  PERSON: "#3b82f6"
  ORGANIZATION: "#10b981"
  LOCATION: "#f59e0b"
  DATE: "#8b5cf6"
  EVENT: "#ec4899"

Colors can be hex (#ff0000) or RGB (rgb(255, 0, 0)).

Keyboard Shortcuts

Speed up annotation with keyboard bindings:

yaml
keyboard_shortcuts:
  PERSON: "1"
  ORGANIZATION: "2"
  LOCATION: "3"
  DATE: "4"

Tooltips

Provide guidance for each label:

yaml
tooltips:
  PERSON: "Names of people, characters, or personas"
  ORGANIZATION: "Companies, agencies, institutions"
  LOCATION: "Physical locations, addresses, geographic regions"

Overlapping Spans

Allow Overlapping

Enable spans that can overlap:

yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ROLE
  allow_overlapping: true

This is useful when the same text can carry more than one label (e.g., "Dr. Smith" can be labeled as both a PERSON and a ROLE).

Disable Overlapping (Default)

yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_overlapping: false  # Default behavior
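
To see which setting your guidelines need, it can help to check existing annotations for overlaps. The sketch below is not part of Potato; `find_overlaps` is a hypothetical helper, and it assumes spans are dictionaries with `start`, `end`, and `label` keys whose offsets form half-open ranges, as the examples on this page suggest. The "Dr. Smith" offsets are illustrative.

```python
from typing import Any, Dict, List, Tuple

Span = Dict[str, Any]  # {"start": int, "end": int, "label": str}


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of spans whose character ranges intersect.

    Offsets are treated as half-open ranges [start, end), which is an
    assumption based on the examples in this page.
    """
    ordered = sorted(spans, key=lambda s: (s["start"], s["end"]))
    overlaps = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            if right["start"] >= left["end"]:
                break  # sorted by start, so no later span can overlap `left`
            overlaps.append((left, right))
    return overlaps


# Illustrative data: "Dr. Smith" as PERSON and "Dr." as ROLE share characters.
spans = [
    {"start": 0, "end": 9, "label": "PERSON"},
    {"start": 0, "end": 3, "label": "ROLE"},
    {"start": 20, "end": 29, "label": "ORGANIZATION"},
]
pairs = find_overlaps(spans)
for left, right in pairs:
    print(left["label"], "overlaps", right["label"])
```

If a check like this reports pairs that your guidelines consider valid, `allow_overlapping: true` is likely the setting you want.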

Span Selection Modes

Word-Level Selection

Select complete words only:

yaml
- annotation_type: span
  name: entities
  selection_mode: word
  labels:
    - ENTITY

Character-Level Selection

Allow selection of partial words:

yaml
- annotation_type: span
  name: entities
  selection_mode: character
  labels:
    - ENTITY
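
The difference between the two modes can be pictured as a snapping step applied to the raw selection. The sketch below is only an illustration of that idea, not Potato's implementation: `snap_to_words` is a hypothetical helper, and it assumes words are separated by whitespace, so punctuation attached to a word is included.

```python
from typing import Tuple


def snap_to_words(text: str, start: int, end: int) -> Tuple[int, int]:
    """Expand the half-open range [start, end) to cover whole words.

    A "word" here is any run of non-whitespace characters (an assumption).
    """
    # Trim whitespace at the edges of the raw selection.
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    if start == end:
        return start, end  # the selection contained only whitespace
    # Grow outward until both edges sit on a whitespace boundary.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


text = "John Smith works at Microsoft in Seattle."
start, end = snap_to_words(text, 2, 7)  # raw selection: "hn Sm"
print(text[start:end])  # John Smith
```

With character-level selection the raw range (2, 7) would be kept as-is, which is why it suits tasks such as sub-word or morpheme labeling.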

Pre-Annotated Spans

Load existing annotations for review or correction:

json
{
  "id": "doc1",
  "text": "John Smith works at Microsoft in Seattle.",
  "spans": [
    {"start": 0, "end": 10, "label": "PERSON"},
    {"start": 20, "end": 29, "label": "ORGANIZATION"},
    {"start": 33, "end": 40, "label": "LOCATION"}
  ]
}

Configure to load pre-annotations:

yaml
- annotation_type: span
  name: entities
  load_pre_annotations: true
  pre_annotation_field: spans
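
Pre-annotations with wrong offsets are hard to spot once loaded, so it may be worth validating the data file first. The following is a minimal sketch, assuming the record layout shown above; `validate_spans` and its `text_field` and `span_field` parameters are hypothetical names, not part of Potato.

```python
from typing import Any, Dict, List


def validate_spans(
    record: Dict[str, Any],
    allowed_labels: List[str],
    text_field: str = "text",
    span_field: str = "spans",
) -> List[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    text = record[text_field]
    for i, span in enumerate(record.get(span_field, [])):
        start, end, label = span["start"], span["end"], span["label"]
        if not (0 <= start < end <= len(text)):
            problems.append(f"span {i}: offsets {start}-{end} fall outside the text")
        elif text[start:end] != text[start:end].strip():
            problems.append(f"span {i}: {text[start:end]!r} has leading or trailing whitespace")
        if label not in allowed_labels:
            problems.append(f"span {i}: unknown label {label!r}")
    return problems


record = {
    "id": "doc1",
    "text": "John Smith works at Microsoft in Seattle.",
    "spans": [
        {"start": 0, "end": 10, "label": "PERSON"},
        {"start": 20, "end": 29, "label": "ORGANIZATION"},
        {"start": 33, "end": 40, "label": "LOCATION"},
    ],
}
labels = ["PERSON", "ORGANIZATION", "LOCATION"]
print(validate_spans(record, labels))  # []
```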

Common NER Configurations

Standard NER (4 Types)

yaml
- annotation_type: span
  name: ner
  description: "Label named entities"
  labels:
    - PER    # Person
    - ORG    # Organization
    - LOC    # Location
    - MISC   # Miscellaneous
  label_colors:
    PER: "#3b82f6"
    ORG: "#10b981"
    LOC: "#f59e0b"
    MISC: "#6b7280"
  keyboard_shortcuts:
    PER: "1"
    ORG: "2"
    LOC: "3"
    MISC: "4"

Extended NER (OntoNotes Style)

yaml
- annotation_type: span
  name: ner_extended
  labels:
    - PERSON
    - NORP        # Nationalities, religious/political groups
    - FAC         # Facilities
    - ORG
    - GPE         # Geopolitical entities
    - LOC
    - PRODUCT
    - EVENT
    - WORK_OF_ART
    - LAW
    - LANGUAGE
    - DATE
    - TIME
    - PERCENT
    - MONEY
    - QUANTITY
    - ORDINAL
    - CARDINAL

Biomedical NER

yaml
- annotation_type: span
  name: bio_ner
  labels:
    - GENE
    - PROTEIN
    - DISEASE
    - DRUG
    - SPECIES
  label_colors:
    GENE: "#22c55e"
    PROTEIN: "#3b82f6"
    DISEASE: "#ef4444"
    DRUG: "#f59e0b"
    SPECIES: "#8b5cf6"

Social Media NER

yaml
- annotation_type: span
  name: social_ner
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - PRODUCT
    - CREATIVE_WORK
    - GROUP

Span with Attributes

Add attributes to spans for richer annotation:

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - PERSON
      - ORGANIZATION
 
  - annotation_type: radio
    name: entity_type
    description: "What type of entity is this?"
    show_for_span: entities
    labels:
      - Named
      - Nominal
      - Pronominal

Multiple Span Schemes

Annotate different aspects separately:

yaml
annotation_schemes:
  # Named entities
  - annotation_type: span
    name: entities
    description: "Label named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
 
  # Sentiment expressions
  - annotation_type: span
    name: sentiment_spans
    description: "Highlight sentiment expressions"
    labels:
      - POSITIVE
      - NEGATIVE
    label_colors:
      POSITIVE: "#22c55e"
      NEGATIVE: "#ef4444"

Multi-Field Span Annotation

New in v2.1.0

Span annotation can target specific text fields in multi-field data using the target_field option. This is useful when your data contains multiple text fields and you want to annotate spans in a particular one.

Configuration

yaml
annotation_schemes:
  - annotation_type: span
    name: source_entities
    description: "Label entities in the source text"
    target_field: "source_text"
    labels:
      - PERSON
      - ORGANIZATION
 
  - annotation_type: span
    name: summary_entities
    description: "Label entities in the summary"
    target_field: "summary"
    labels:
      - PERSON
      - ORGANIZATION

Multi-Field Data Format

Your data should include the separate text fields:

json
{
  "id": "doc1",
  "source_text": "John Smith works at Microsoft in Seattle.",
  "summary": "Smith is employed by Microsoft."
}

Output Format

When using target_field, annotations are keyed by field:

json
{
  "id": "doc1",
  "source_entities": {
    "source_text": [
      {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
      {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"}
    ]
  },
  "summary_entities": {
    "summary": [
      {"start": 0, "end": 5, "text": "Smith", "label": "PERSON"},
      {"start": 21, "end": 30, "text": "Microsoft", "label": "ORGANIZATION"}
    ]
  }
}

For a complete working example, see project-hub/simple_examples/simple-multi-span/ in the Potato repository.
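
Because each scheme's offsets refer to its own target field, a quick consistency check is to slice the matching field and compare it with the saved `text`. This is a sketch under the assumption that the output has the shape shown above; `check_field_spans` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def check_field_spans(
    item: Dict[str, Any], annotation: Dict[str, Any]
) -> List[Tuple[str, str, str]]:
    """Return (scheme, field, text) for every span that does not match its field."""
    mismatches = []
    for scheme, by_field in annotation.items():
        if not isinstance(by_field, dict):
            continue  # skip plain values such as "id"
        for field, spans in by_field.items():
            source = item[field]
            for span in spans:
                if source[span["start"]:span["end"]] != span["text"]:
                    mismatches.append((scheme, field, span["text"]))
    return mismatches


item = {
    "id": "doc1",
    "source_text": "John Smith works at Microsoft in Seattle.",
    "summary": "Smith is employed by Microsoft.",
}
annotation = {
    "id": "doc1",
    "source_entities": {
        "source_text": [
            {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
            {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"},
        ]
    },
    "summary_entities": {
        "summary": [
            {"start": 0, "end": 5, "text": "Smith", "label": "PERSON"},
            {"start": 21, "end": 30, "text": "Microsoft", "label": "ORGANIZATION"},
        ]
    },
}
print(check_field_spans(item, annotation))  # []
```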

Display Options

Show Label in Span

Display the label text within highlighted spans:

yaml
- annotation_type: span
  name: entities
  show_label_in_span: true

Underline Style

Use underlines instead of background highlighting:

yaml
- annotation_type: span
  name: entities
  display_style: underline

Output Format

Span annotations are saved with character offsets:

json
{
  "id": "doc1",
  "entities": [
    {
      "start": 0,
      "end": 10,
      "text": "John Smith",
      "label": "PERSON"
    },
    {
      "start": 20,
      "end": 29,
      "text": "Microsoft",
      "label": "ORGANIZATION"
    }
  ]
}
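
Many NER training pipelines expect token-level tags rather than character offsets. The sketch below shows one way to convert the output above into BIO tags. It is not part of Potato: `spans_to_bio` is a hypothetical helper that assumes whitespace tokenization, half-open offsets, and non-overlapping spans.

```python
import re
from typing import Any, Dict, List, Tuple


def spans_to_bio(text: str, spans: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Tag each whitespace-separated token as B-<label>, I-<label>, or O."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for span in sorted(spans, key=lambda s: s["start"]):
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the span if their character ranges intersect.
            if tok_start < span["end"] and tok_end > span["start"]:
                tags[i] = ("I-" if inside else "B-") + span["label"]
                inside = True
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


text = "John Smith works at Microsoft in Seattle."
entities = [
    {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
    {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"},
]
tagged = spans_to_bio(text, entities)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

A production pipeline would normally use the tokenizer of the target model instead of whitespace splitting, so treat the tokenization here as a placeholder.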

Full Example: NER Task

yaml
task_name: "Named Entity Recognition"
 
data_files:
  - path: data/documents.json
    text_field: text
 
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight and label all named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
      - DATE
      - MONEY
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MONEY: "#ec4899"
    keyboard_shortcuts:
      PERSON: "1"
      ORGANIZATION: "2"
      LOCATION: "3"
      DATE: "4"
      MONEY: "5"
    tooltips:
      PERSON: "Names of people"
      ORGANIZATION: "Companies, agencies, institutions"
      LOCATION: "Cities, countries, addresses"
      DATE: "Dates and time expressions"
      MONEY: "Monetary values"
    allow_overlapping: false
    selection_mode: word
 
  - annotation_type: radio
    name: difficulty
    description: "How difficult was this document to annotate?"
    labels:
      - Easy
      - Medium
      - Hard
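
With several per-label options in one scheme, it is easy to misspell a label in one of them or reuse a shortcut key. The sketch below checks a scheme for both problems. It assumes the scheme has already been parsed from YAML into a Python dictionary; `check_span_scheme` is a hypothetical helper, not a Potato function.

```python
from typing import Any, Dict, List


def check_span_scheme(scheme: Dict[str, Any]) -> List[str]:
    """Report per-label options that name unknown labels or reuse a key."""
    problems = []
    labels = set(scheme.get("labels", []))
    for option in ("label_colors", "keyboard_shortcuts", "tooltips"):
        for key in scheme.get(option, {}):
            if key not in labels:
                problems.append(f"{option}: {key!r} is not a declared label")
    seen: Dict[str, str] = {}
    for label, key in scheme.get("keyboard_shortcuts", {}).items():
        if key in seen:
            problems.append(
                f"keyboard_shortcuts: {key!r} is bound to both {seen[key]!r} and {label!r}"
            )
        seen[key] = label
    return problems


scheme = {
    "annotation_type": "span",
    "name": "entities",
    "labels": ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "MONEY"],
    "label_colors": {"PERSON": "#3b82f6", "ORGANIZATION": "#10b981"},
    "keyboard_shortcuts": {"PERSON": "1", "ORGANIZATION": "2", "LOCATION": "3"},
    "tooltips": {"PERSON": "Names of people"},
}
print(check_span_scheme(scheme))  # []
```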

Discontinuous Spans

New in v2.2.0

Enable non-contiguous text spans with the allow_discontinuous parameter. This allows annotators to select multiple non-adjacent text segments as a single span annotation, useful for discontinuous entities or split expressions.

yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_discontinuous: true

When enabled, annotators can hold a modifier key while selecting additional text segments to add them to the current span. The output then contains one start/end pair per segment.
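
The exact output layout for discontinuous spans is not shown here, so the following sketch only illustrates the idea of rebuilding a span's surface text from several start/end pairs. The `(start, end)` tuple list and the `join_segments` helper are assumptions made for illustration, not Potato's actual format, and the sentence is an invented example.

```python
from typing import List, Tuple


def join_segments(text: str, segments: List[Tuple[int, int]], sep: str = " ") -> str:
    """Concatenate the text of each half-open [start, end) segment in order.

    The list-of-pairs input is a hypothetical representation of one
    discontinuous span.
    """
    ordered = sorted(segments)
    return sep.join(text[start:end] for start, end in ordered)


text = "The left and right ventricles were enlarged."
# One discontinuous entity made of "left" and "ventricles".
segments = [(4, 8), (19, 29)]
print(join_segments(text, segments))  # left ventricles
```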

Entity Linking Integration

New in v2.2.0

Span annotations can be linked to external knowledge bases (Wikidata, UMLS, or custom REST APIs) by adding an entity_linking configuration block to the span schema:

yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
  entity_linking:
    enabled: true
    knowledge_bases:
      - name: wikidata
        type: wikidata
        language: en

When entity linking is enabled, a link icon appears on each span's control bar. Clicking it opens a search modal to find and link matching KB entities. See the Entity Linking documentation for full details.

Best Practices

  1. Use distinct colors for easy visual differentiation
  2. Provide clear tooltips with examples for each entity type
  3. Enable keyboard shortcuts for faster annotation
  4. Use word-level selection unless character precision is needed
  5. Consider pre-annotation for faster correction workflows
  6. Test overlapping settings based on your annotation guidelines

Further Reading

For implementation details, see the source documentation.