Span Annotation

Highlight and label text spans for named entity recognition and more.

Span annotation lets annotators select and label portions of text. It is commonly used for named entity recognition (NER), part-of-speech tagging, and text highlighting tasks.

Basic Configuration

annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight named entities in the text"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION

Configuration Options

Entity Labels

Define the types of spans annotators can create:

labels:
  - PERSON
  - ORGANIZATION
  - LOCATION
  - DATE
  - EVENT

Label Colors

Customize colors for visual distinction:

label_colors:
  PERSON: "#3b82f6"
  ORGANIZATION: "#10b981"
  LOCATION: "#f59e0b"
  DATE: "#8b5cf6"
  EVENT: "#ec4899"

Colors can be hex (#ff0000) or RGB (rgb(255, 0, 0)).
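A color string can be checked before it goes into the config. The sketch below is a standalone illustration, not part of the tool: `is_valid_color` is a hypothetical helper, and it assumes only the two formats shown above (six-digit hex and `rgb(r, g, b)`) are accepted. The tool itself may accept other formats.

```python
import re

# Assumption: only six-digit hex and rgb(r, g, b) are valid, as in the examples above.
HEX_RE = re.compile(r"#[0-9a-fA-F]{6}")
RGB_RE = re.compile(r"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)")

def is_valid_color(value):
    """Return True if value looks like '#rrggbb' or 'rgb(r, g, b)' with channels 0-255."""
    if HEX_RE.fullmatch(value):
        return True
    match = RGB_RE.fullmatch(value)
    return bool(match) and all(int(channel) <= 255 for channel in match.groups())

for color in ["#3b82f6", "rgb(255, 0, 0)", "rgb(256, 0, 0)", "blue"]:
    print(color, is_valid_color(color))
```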

Keyboard Shortcuts

Speed up annotation with keyboard bindings:

keyboard_shortcuts:
  PERSON: "1"
  ORGANIZATION: "2"
  LOCATION: "3"
  DATE: "4"

Tooltips

Provide guidance for each label:

tooltips:
  PERSON: "Names of people, characters, or personas"
  ORGANIZATION: "Companies, agencies, institutions"
  LOCATION: "Physical locations, addresses, geographic regions"

Overlapping Spans

Allow Overlapping

Enable spans that can overlap:

- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ROLE
  allow_overlapping: true

This is useful when the same text can carry multiple labels (e.g., "Dr. Smith" can be labeled both PERSON and ROLE).
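When working with exported annotations, it can help to know which spans actually overlap. The following is a minimal sketch, assuming spans are dictionaries with `start`, `end`, and `label` keys and that `end` is exclusive. `find_overlaps` is a hypothetical helper written for this illustration, not a function provided by the tool.

```python
def find_overlaps(spans):
    """Return pairs of spans whose [start, end) character ranges intersect."""
    ordered = sorted(spans, key=lambda s: (s["start"], s["end"]))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Spans are sorted by start, so nothing later can overlap `first`
            # once a span begins at or after first's end.
            if second["start"] >= first["end"]:
                break
            overlaps.append((first, second))
    return overlaps

# Hypothetical annotations on the text "Dr. Smith"
spans = [
    {"start": 0, "end": 9, "label": "PERSON"},
    {"start": 0, "end": 9, "label": "ROLE"},
]
for first, second in find_overlaps(spans):
    print(first["label"], "overlaps", second["label"])
```

Running a check like this over data collected with `allow_overlapping: false` should report no pairs.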

Disable Overlapping (Default)

- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_overlapping: false  # Default behavior

Span Selection Modes

Word-Level Selection

Select complete words only:

- annotation_type: span
  name: entities
  selection_mode: word
  labels:
    - ENTITY

Character-Level Selection

Allow selection of partial words:

- annotation_type: span
  name: entities
  selection_mode: character
  labels:
    - ENTITY
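The difference between the two modes can be pictured as a snapping step. The sketch below is only an illustration of the idea, not the tool's implementation: `snap_to_words` is a hypothetical function, and it assumes words are delimited by whitespace, so attached punctuation stays part of the word.

```python
def snap_to_words(text, start, end):
    """Expand the [start, end) selection outward to the nearest whitespace boundaries."""
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end

text = "John Smith works at Microsoft in Seattle."

# A character-level selection covering only part of "Microsoft"
partial = (22, 26)
print(repr(text[partial[0]:partial[1]]))

# The same selection expanded to whole words
start, end = snap_to_words(text, *partial)
print(repr(text[start:end]))
```

With character-level selection, the partial range would be stored as-is; with word-level selection, the stored range would cover the whole word.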

Pre-Annotated Spans

Load existing annotations for review or correction by including them in the input data:

{
  "id": "doc1",
  "text": "John Smith works at Microsoft in Seattle.",
  "spans": [
    {"start": 0, "end": 10, "label": "PERSON"},
    {"start": 20, "end": 29, "label": "ORGANIZATION"},
    {"start": 33, "end": 40, "label": "LOCATION"}
  ]
}
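Pre-annotation offsets are easy to get wrong by one character, so it is worth checking them before loading the file. The following is a minimal sketch, assuming the record layout shown above with an exclusive `end` offset. `check_spans` is a hypothetical helper, not part of the tool.

```python
import json

def check_spans(doc):
    """Return a list of problems found in a document's pre-annotated spans."""
    text = doc["text"]
    problems = []
    for i, span in enumerate(doc.get("spans", [])):
        start, end = span["start"], span["end"]
        if not (0 <= start < end <= len(text)):
            problems.append(f"span {i}: offsets {start}-{end} out of range")
        elif text[start:end] != text[start:end].strip():
            problems.append(f"span {i}: covers leading or trailing whitespace")
    return problems

doc = json.loads("""
{
  "id": "doc1",
  "text": "John Smith works at Microsoft in Seattle.",
  "spans": [
    {"start": 0, "end": 10, "label": "PERSON"},
    {"start": 20, "end": 29, "label": "ORGANIZATION"},
    {"start": 33, "end": 40, "label": "LOCATION"}
  ]
}
""")

# Print the text each span covers so it can be checked by eye
for span in doc["spans"]:
    print(span["label"], repr(doc["text"][span["start"]:span["end"]]))
print(check_spans(doc))
```

For the record above, the three spans cover "John Smith", "Microsoft", and "Seattle", and the problem list is empty.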

Configure the scheme to load pre-annotations from that field:

- annotation_type: span
  name: entities
  load_pre_annotations: true
  pre_annotation_field: spans

Common NER Configurations

Standard NER (4 Types)

- annotation_type: span
  name: ner
  description: "Label named entities"
  labels:
    - PER    # Person
    - ORG    # Organization
    - LOC    # Location
    - MISC   # Miscellaneous
  label_colors:
    PER: "#3b82f6"
    ORG: "#10b981"
    LOC: "#f59e0b"
    MISC: "#6b7280"
  keyboard_shortcuts:
    PER: "1"
    ORG: "2"
    LOC: "3"
    MISC: "4"

Extended NER (OntoNotes Style)

- annotation_type: span
  name: ner_extended
  labels:
    - PERSON
    - NORP        # Nationalities, religious/political groups
    - FAC         # Facilities
    - ORG
    - GPE         # Geopolitical entities
    - LOC
    - PRODUCT
    - EVENT
    - WORK_OF_ART
    - LAW
    - LANGUAGE
    - DATE
    - TIME
    - PERCENT
    - MONEY
    - QUANTITY
    - ORDINAL
    - CARDINAL

Biomedical NER

- annotation_type: span
  name: bio_ner
  labels:
    - GENE
    - PROTEIN
    - DISEASE
    - DRUG
    - SPECIES
  label_colors:
    GENE: "#22c55e"
    PROTEIN: "#3b82f6"
    DISEASE: "#ef4444"
    DRUG: "#f59e0b"
    SPECIES: "#8b5cf6"

Social Media NER

- annotation_type: span
  name: social_ner
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - PRODUCT
    - CREATIVE_WORK
    - GROUP

Span with Attributes

Add attributes to spans for richer annotation by linking a second scheme to the span scheme with show_for_span:

annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - PERSON
      - ORGANIZATION
 
  - annotation_type: radio
    name: entity_type
    description: "What type of entity is this?"
    show_for_span: entities
    labels:
      - Named
      - Nominal
      - Pronominal

Multiple Span Schemes

Annotate different aspects separately:

annotation_schemes:
  # Named entities
  - annotation_type: span
    name: entities
    description: "Label named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
 
  # Sentiment expressions
  - annotation_type: span
    name: sentiment_spans
    description: "Highlight sentiment expressions"
    labels:
      - POSITIVE
      - NEGATIVE
    label_colors:
      POSITIVE: "#22c55e"
      NEGATIVE: "#ef4444"

Display Options

Show Label in Span

Display the label text within highlighted spans:

- annotation_type: span
  name: entities
  show_label_in_span: true

Underline Style

Use underlines instead of background highlighting:

- annotation_type: span
  name: entities
  display_style: underline

Output Format

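Span annotations are saved with character offsets. In the example below, offsets are zero-based and end is exclusive, so "John Smith" has start 0 and end 10: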

{
  "id": "doc1",
  "entities": [
    {
      "start": 0,
      "end": 10,
      "text": "John Smith",
      "label": "PERSON"
    },
    {
      "start": 20,
      "end": 29,
      "text": "Microsoft",
      "label": "ORGANIZATION"
    }
  ]
}
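Many NER training pipelines expect token-level BIO tags rather than character offsets. The sketch below shows one way to convert the output above, under several assumptions: the original document text is available from the input file, tokens are split on whitespace, spans do not overlap, and `end` is exclusive. `to_bio` is a hypothetical helper, not something the tool exports.

```python
import re

def to_bio(text, spans):
    """Convert character-offset spans to (token, BIO tag) pairs using whitespace tokens."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for span in spans:
            # A token is tagged only if it lies entirely inside the span
            if match.start() >= span["start"] and match.end() <= span["end"]:
                prefix = "B" if match.start() == span["start"] else "I"
                tag = f"{prefix}-{span['label']}"
                break
        tagged.append((match.group(), tag))
    return tagged

text = "John Smith works at Microsoft in Seattle."
entities = [
    {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
    {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"},
]
for token, tag in to_bio(text, entities):
    print(f"{token}\t{tag}")
```

Because tokens here keep attached punctuation, a span that ends just before a period will not cover the final token. A real pipeline would typically use the tokenizer of the target model instead of whitespace splitting.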

Full Example: NER Task

task_name: "Named Entity Recognition"
 
data_files:
  - path: data/documents.json
    text_field: text
 
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight and label all named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
      - DATE
      - MONEY
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MONEY: "#ec4899"
    keyboard_shortcuts:
      PERSON: "1"
      ORGANIZATION: "2"
      LOCATION: "3"
      DATE: "4"
      MONEY: "5"
    tooltips:
      PERSON: "Names of people"
      ORGANIZATION: "Companies, agencies, institutions"
      LOCATION: "Cities, countries, addresses"
      DATE: "Dates and time expressions"
      MONEY: "Monetary values"
    allow_overlapping: false
    selection_mode: word
 
  - annotation_type: radio
    name: difficulty
    description: "How difficult was this document to annotate?"
    labels:
      - Easy
      - Medium
      - Hard

Best Practices

  1. Use distinct colors for easy visual differentiation
  2. Provide clear tooltips with examples for each entity type
  3. Enable keyboard shortcuts for faster annotation
  4. Use word-level selection unless character precision is needed
  5. Consider pre-annotation for faster correction workflows
  6. Test overlapping settings based on your annotation guidelines
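Some of these practices can be checked mechanically before a task is launched. The following is a minimal sketch, assuming the scheme has already been parsed into a Python dictionary with the keys used on this page. `lint_span_scheme` is a hypothetical helper, not part of the tool.

```python
def lint_span_scheme(scheme):
    """Return warnings about mismatched labels and duplicate colors or shortcuts."""
    warnings = []
    labels = scheme.get("labels", [])
    # Every key in these mappings should be a declared label
    for key in ("label_colors", "keyboard_shortcuts", "tooltips"):
        unknown = sorted(set(scheme.get(key, {})) - set(labels))
        if unknown:
            warnings.append(f"{key}: unknown labels {unknown}")
    # Colors and shortcuts should be unique across labels
    for key in ("label_colors", "keyboard_shortcuts"):
        values = list(scheme.get(key, {}).values())
        dupes = sorted({v for v in values if values.count(v) > 1})
        if dupes:
            warnings.append(f"{key}: duplicate values {dupes}")
    return warnings

# Hypothetical scheme with deliberate mistakes
scheme = {
    "annotation_type": "span",
    "name": "entities",
    "labels": ["PERSON", "ORGANIZATION"],
    "label_colors": {"PERSON": "#3b82f6", "ORGANIZATION": "#3b82f6"},
    "keyboard_shortcuts": {"PERSON": "1", "ORGANIZATION": "1", "DATE": "3"},
}
for warning in lint_span_scheme(scheme):
    print(warning)
```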