
Span Annotation

A complete guide to span annotation: highlighting regions of text, overlapping and nested spans, label colors, BIO/IOB tagging, and building span tasks in Potato.

Span annotation means marking a region inside an item rather than labeling the whole item. The annotator highlights a stretch of text (or a segment of audio) and assigns it a label. It is the foundation for named entity recognition, error marking, extractive question answering, and audio event detection: all of these are span tasks with different label sets.

A span is a labeled sub-sequence: a start position, an end position, and a category. In machine learning this is usually framed as sequence labeling, where every token gets a tag.
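To make the definition concrete, here is a minimal sketch of a span as a data structure. It assumes character offsets with an exclusive end (the same convention as Python slicing); the `Span` class is hypothetical and is not Potato's internal or export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled sub-sequence of a text (illustrative structure, not Potato's format).

    Assumption: `start` and `end` are character offsets, with `end` exclusive.
    """
    start: int
    end: int
    label: str

    def __post_init__(self) -> None:
        # Reject empty or reversed ranges.
        if not 0 <= self.start < self.end:
            raise ValueError(f"invalid span offsets: {self.start}..{self.end}")

    def covered_text(self, text: str) -> str:
        """Return the substring this span highlights."""
        return text[self.start:self.end]


text = "Barack Obama visited Paris"
spans = [Span(0, 12, "PERSON"), Span(21, 26, "LOCATION")]
for span in spans:
    print(span.label, repr(span.covered_text(text)))
# PERSON 'Barack Obama'
# LOCATION 'Paris'
```

Storing offsets rather than the highlighted string keeps two identical mentions in the same text distinct.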

A basic span task

Define the labels and let annotators highlight. Colors and tooltips make the interface fast and self-explanatory:

yaml

annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight each named entity and choose its type."
    labels: [PERSON, ORGANIZATION, LOCATION, DATE, MISC]
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MISC: "#6b7280"
    sequential_key_binding: true
    allow_overlapping: false

The named entity recognition design is exactly this task, ready to run.

Overlapping and nested spans

By default a character belongs to at most one span. Some tasks need more:

Overlapping spans: two annotations cover some of the same text, e.g. a sentiment span over an entity span.
Nested spans: one span sits inside another, e.g. "[University of [Michigan]]" where the location is nested in the organization.

Set allow_overlapping: true when your guidelines call for it. Decide this early, because it changes how annotators think about boundaries.
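The two cases can be told apart with simple offset comparisons. A minimal sketch, assuming spans are `(start, end, label)` tuples over character offsets with an exclusive end; the helper names are hypothetical and this is not how Potato enforces the setting internally. Note that nesting is a special case of overlap, so a check that rejects overlaps also rejects nested spans.

```python
from itertools import combinations

# Spans are (start, end, label) tuples over character offsets, end exclusive.
# This representation is an assumption for illustration.


def overlaps(a, b):
    """True if spans a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def is_nested(inner, outer):
    """True if `inner` lies inside `outer` without covering exactly the same range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and inner[:2] != outer[:2]


def overlap_conflicts(spans):
    """Pairs of spans that would be disallowed when overlapping is turned off."""
    return [(a, b) for a, b in combinations(sorted(spans), 2) if overlaps(a, b)]


text = "She studied at the University of Michigan."
org = (19, 41, "ORGANIZATION")
loc = (33, 41, "LOCATION")
print(text[org[0]:org[1]], "|", text[loc[0]:loc[1]])  # University of Michigan | Michigan
print(overlaps(org, loc), is_nested(loc, org))  # True True
print(overlap_conflicts([org, loc]))
```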

BIO/IOB tagging: what your export looks like

Span annotations are usually exported for training as token tags in the BIO scheme (often called IOB; strictly, BIO is the IOB2 variant): B- marks the first token of an entity, I- marks each following token of the same entity, and O marks tokens outside any entity.

text

Barack    B-PERSON
Obama     I-PERSON
visited   O
Paris     B-LOCATION
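The conversion from character spans to these tags can be sketched as follows. This is a minimal illustration under stated assumptions, not Potato's exporter: tokens are runs of non-whitespace characters (a real exporter would use a proper tokenizer, so "Paris." would be split), spans are `(start, end, label)` tuples, and the `spans_to_bio` name is hypothetical. Flat BIO cannot represent overlapping or nested spans, so the sketch assumes they do not occur.

```python
import re


def spans_to_bio(text, spans):
    """Convert character-offset spans to token-level BIO tags.

    Assumptions (for illustration only): tokens are runs of non-whitespace
    characters, spans are (start, end_exclusive, label) tuples, and spans do
    not overlap, since flat BIO cannot represent overlapping or nested spans.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in sorted(spans):
        # Indices of tokens that share at least one character with the span.
        covered = [i for i, (_, t_start, t_end) in enumerate(tokens)
                   if t_start < end and start < t_end]
        for position, i in enumerate(covered):
            tags[i] = ("B-" if position == 0 else "I-") + label
    return [(token, tag) for (token, _, _), tag in zip(tokens, tags)]


text = "Barack Obama visited Paris"
spans = [(0, 12, "PERSON"), (21, 26, "LOCATION")]
for token, tag in spans_to_bio(text, spans):
    print(f"{token:<10}{tag}")
```

Running it prints the four tagged lines shown above.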

Potato can export spans to CoNLL and spaCy formats; the CoNLL export uses this tagging directly. See Exporting Annotations for ML.
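Code that consumes exported tags usually has to rebuild spans from the tag sequence. A token-per-line file like the example above carries no character offsets, so this minimal sketch returns token indices `(start, end_exclusive, label)`. The `bio_to_spans` name is hypothetical, and treating a stray I- tag as the start of a new span is one common lenient convention, assumed here.

```python
def bio_to_spans(tags):
    """Decode a list of BIO tags into (start_token, end_token_exclusive, label) spans.

    Assumption: an I- tag that does not continue a span of the same label
    starts a new span (lenient decoding).
    """
    spans = []
    start = label = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a span that runs to the end
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if label is not None and not continues:
            spans.append((start, i, label))
            start = label = None
        if prefix in ("B", "I") and not continues:
            start, label = i, tag_label
    return spans


tags = ["B-PERSON", "I-PERSON", "O", "B-LOCATION"]
print(bio_to_spans(tags))  # [(0, 2, 'PERSON'), (3, 4, 'LOCATION')]
```

Splitting on the first hyphen only keeps labels such as `WORK-OF-ART` intact.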

Getting boundaries right

The hardest part of span work is agreeing where a span starts and ends. A few rules that help:

Decide whether to include surrounding punctuation, titles ("Dr."), and trailing possessives, and write it down.
Measure agreement at the span level, not just the document level, so boundary disagreements show up. See Inter-Annotator Agreement.
Use tooltips to keep the boundary rule in front of the annotator.
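One common span-level measure is exact-match F1 between two annotators' span sets, where a span only counts as agreed if its start, end, and label all match. The sketch below is a hypothetical `span_f1` helper, not necessarily the metric Potato reports; the example sentence and offsets are made up to show how a single title-boundary disagreement ("Dr.") costs a third of the agreement even though both annotators found the same person.

```python
def span_f1(reference, candidate):
    """Exact-match span agreement between two annotators.

    Each argument is a set of (start, end, label) tuples. A span is matched
    only if start, end, and label are all identical, so boundary
    disagreements count against agreement.
    """
    if not reference and not candidate:
        return 1.0  # both annotators marked nothing
    matched = len(reference & candidate)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


text = "Dr. Ada Lovelace met Charles Babbage in London."
annotator_a = {(0, 16, "PERSON"), (21, 36, "PERSON"), (40, 46, "LOCATION")}  # includes "Dr."
annotator_b = {(4, 16, "PERSON"), (21, 36, "PERSON"), (40, 46, "LOCATION")}  # excludes "Dr."
print(round(span_f1(annotator_a, annotator_b), 3))  # 0.667
```

F1 is symmetric in its two arguments, so it does not matter which annotator is treated as the reference.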

Span tasks beyond NER

The same mechanism powers many tasks:

Error spans: mark mistakes in a translation or a model output, MQM-style. See Detecting Hallucinations.
Extractive QA: highlight the answer to a question in a passage.
Audio event detection: mark when a sound occurs on a waveform. See Audio Annotation.
Relations and coreference: link spans together. See Relation and Event Extraction and Coreference Resolution.