# Span Annotation

Source: https://www.potatoannotator.com/docs/guides/span-annotation

**Span annotation means marking a region *inside* an item rather than labeling the whole item. The annotator highlights a stretch of text (or a segment of audio) and assigns it a label.** It is the foundation for named entity recognition, error marking, extractive question answering, and audio event detection: all four are span tasks with different label sets.

A span is a labeled sub-sequence: a start position, an end position, and a category. In machine learning this is usually framed as [sequence labeling](https://en.wikipedia.org/wiki/Sequence_labeling), where every token gets a tag.
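As a concrete illustration, here is a minimal Python sketch of that triple. It assumes character offsets with an exclusive end, which is a common convention; Potato's own export format may differ. The `Span` class and its `text_of` method are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled region of a text: characters [start, end) carry `label`."""
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    label: str

    def text_of(self, document: str) -> str:
        """Return the highlighted substring of `document`."""
        return document[self.start:self.end]


doc = "Barack Obama visited Paris"
spans = [Span(0, 12, "PERSON"), Span(21, 26, "LOCATION")]
for span in spans:
    print(span.label, repr(span.text_of(doc)))
# PERSON 'Barack Obama'
# LOCATION 'Paris'
```

An exclusive end keeps the arithmetic simple. A span's length is `end - start`, and two adjacent spans can share a boundary offset without overlapping.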

## A basic span task

Define the labels and let annotators highlight. Colors and tooltips make the interface fast and self-explanatory:

```yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight each named entity and choose its type."
    labels: [PERSON, ORGANIZATION, LOCATION, DATE, MISC]
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MISC: "#6b7280"
    sequential_key_binding: true
    allow_overlapping: false
```

The [named entity recognition](/showcase/named-entity-recognition) design is exactly this task, ready to run.

## Overlapping and nested spans

By default a character belongs to at most one span. Some tasks need more:

- **Overlapping spans**: two annotations cover some of the same text, e.g. a sentiment span over an entity span.
- **Nested spans**: one span sits inside another, e.g. "[University of [Michigan]]" where the location is nested in the organization.

Set `allow_overlapping: true` when your guidelines call for it. Decide this early, because it changes how annotators think about boundaries.
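The sketch below makes the distinction concrete. It classifies a pair of spans as disjoint, nested, or overlapping, and it lists the pairs that would break the default one-span-per-character rule. It assumes `(start, end, label)` tuples with exclusive character ends. The function names are hypothetical, and this is not Potato's validation code.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label), character offsets


def relation(a: Span, b: Span) -> str:
    """Classify two spans as 'disjoint', 'nested', or 'overlapping'."""
    a_start, a_end, _ = a
    b_start, b_end, _ = b
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"  # no character belongs to both spans
    if (a_start <= b_start and b_end <= a_end) or (b_start <= a_start and a_end <= b_end):
        return "nested"  # one span lies entirely inside the other
    return "overlapping"  # they share characters, but neither contains the other


def find_conflicts(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of spans that share at least one character.

    Under the default one-span-per-character rule this list should be empty.
    """
    conflicts = []
    for i, a in enumerate(spans):
        for b in spans[i + 1:]:
            if relation(a, b) != "disjoint":
                conflicts.append((a, b))
    return conflicts


text = "University of Michigan"
org = (0, 22, "ORGANIZATION")
loc = (14, 22, "LOCATION")  # text[14:22] == "Michigan"

print(relation(org, loc))                                  # nested
print(relation((0, 10, "SENTIMENT"), (5, 15, "PERSON")))   # overlapping
print(len(find_conflicts([org, loc])))                     # 1
```

This sketch checks every pair of spans, so the cost grows quadratically with the number of spans. That is fine for the handful of spans in a typical item.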

## BIO/IOB tagging: what your export looks like

Span annotations are usually exported for training as token tags in the **BIO** scheme (also called [IOB](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)), strictly its IOB2 variant): `B-` marks the first token of an entity, `I-` marks each later token inside the same entity, and `O` marks tokens outside any entity.

```
Barack    B-PERSON
Obama     I-PERSON
visited   O
Paris     B-LOCATION
```
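The conversion in both directions takes only a few lines of Python. This sketch assumes the text is already tokenized and that spans are given as token indices with an exclusive end. `spans_to_bio` and `bio_to_spans` are hypothetical helpers, not part of Potato's exporter. Printing the tags for the example sentence reproduces the table above.

```python
from typing import List, Optional, Tuple

TokenSpan = Tuple[int, int, str]  # (first_token, end_token_exclusive, label)


def spans_to_bio(num_tokens: int, spans: List[TokenSpan]) -> List[str]:
    """Encode non-overlapping token spans as BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # every later token inside it
    return tags


def bio_to_spans(tags: List[str]) -> List[TokenSpan]:
    """Decode BIO tags back into token spans."""
    spans: List[TokenSpan] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # a trailing "O" closes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and label is not None and not continues:
            spans.append((start, i, label))  # the open span ended at token i - 1
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and not continues):
            start, label = i, tag[2:]  # a stray I- with no matching B- starts a span
    return spans


tokens = ["Barack", "Obama", "visited", "Paris"]
tags = spans_to_bio(len(tokens), [(0, 2, "PERSON"), (3, 4, "LOCATION")])
for token, tag in zip(tokens, tags):
    print(f"{token:<9} {tag}")
print(bio_to_spans(tags))  # [(0, 2, 'PERSON'), (3, 4, 'LOCATION')]
```

Flat BIO gives each token exactly one tag, so it cannot represent nested or overlapping spans. In this sketch, a later span simply overwrites an earlier one. One common workaround is to export character-offset spans instead, or to produce a separate BIO layer for each label.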

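Potato can export spans to [CoNLL](https://en.wikipedia.org/wiki/CoNLL) and spaCy formats. CoNLL-style files store these tags directly, one token per line. spaCy stores entities as character-offset spans and can convert them to BILUO tags, a finer-grained variant of BIO that also marks the last token (`L-`) and single-token entities (`U-`). See [Exporting Annotations for ML](/docs/guides/exporting-annotations-for-ml).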

## Getting boundaries right

The hardest part of span work is agreeing where a span starts and ends. A few rules that help:

- Decide whether to include surrounding punctuation, titles ("Dr."), and trailing possessives, and write it down.
- Measure agreement at the span level, not just the document level, so boundary disagreements show up. See [Inter-Annotator Agreement](/docs/guides/inter-annotator-agreement).
- Use tooltips to keep the boundary rule in front of the annotator.
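One simple way to measure span-level agreement is an F1 score between two annotators, computed twice. The first pass requires exact boundaries. The second pass accepts any overlap, as long as the labels match. The gap between the two numbers isolates boundary disagreements. The sketch below assumes `(start, end, label)` character-offset tuples. The functions are illustrative, and other agreement measures exist.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label), character offsets


def exact_f1(a: List[Span], b: List[Span]) -> float:
    """F1 where two spans agree only if start, end, and label all match."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # neither annotator marked anything
    matched = len(set_a & set_b)
    # With precision = matched/|b| and recall = matched/|a|, F1 simplifies to this.
    return 2 * matched / (len(set_a) + len(set_b))


def overlap_f1(a: List[Span], b: List[Span]) -> float:
    """Lenient F1 where two spans agree if labels match and they share a character."""
    if not a and not b:
        return 1.0

    def agree(x: Span, y: Span) -> bool:
        return x[2] == y[2] and x[0] < y[1] and y[0] < x[1]

    recall = sum(any(agree(x, y) for y in b) for x in a) / len(a) if a else 0.0
    precision = sum(any(agree(y, x) for x in a) for y in b) / len(b) if b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "Dr. Smith visited Paris." Annotator A includes the title, annotator B does not.
ann_a = [(0, 9, "PERSON"), (18, 23, "LOCATION")]
ann_b = [(4, 9, "PERSON"), (18, 23, "LOCATION")]
print(exact_f1(ann_a, ann_b))    # 0.5
print(overlap_f1(ann_a, ann_b))  # 1.0
```

Here the annotators disagree only on whether the title "Dr." belongs inside the PERSON span. Exact-match F1 drops to 0.5, while the lenient score stays at 1.0.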

## Span tasks beyond NER

The same mechanism powers many tasks:

- **Error spans**: mark mistakes in a translation or a model output, in the style of MQM (Multidimensional Quality Metrics). See [Detecting Hallucinations](/docs/guides/detecting-hallucinations).
- **Extractive QA**: highlight the answer to a question in a passage.
- **Audio event detection**: mark *when* a sound occurs on a waveform. See [Audio Annotation](/docs/guides/audio-annotation).
- **Relations and coreference**: link spans together. See [Relation and Event Extraction](/docs/guides/relation-and-event-extraction) and [Coreference Resolution](/docs/guides/coreference-resolution).
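The sketch below shows the structure these tasks share. It assumes a single `(start, end, label)` record whose offsets count characters for text and seconds for audio, with relations stored as links between span indices. The labels `ANSWER`, `DOG_BARK`, and `VISITED`, and the index-based linking, are hypothetical choices for illustration, not Potato's data model.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Offset = Union[int, float]  # character offset for text, seconds for audio


@dataclass(frozen=True)
class Span:
    start: Offset  # inclusive
    end: Offset    # exclusive
    label: str


# Extractive QA: the answer is one span inside the passage.
passage = "The Eiffel Tower is in Paris."
answer_start = passage.index("Paris")
answer = Span(answer_start, answer_start + len("Paris"), "ANSWER")

# Audio event detection: the same structure, measured in seconds.
bark = Span(3.2, 4.7, "DOG_BARK")

# Relations: links between spans, referenced here by list index.
sentence = "Barack Obama visited Paris"
entities: List[Span] = [Span(0, 12, "PERSON"), Span(21, 26, "LOCATION")]
relations: List[Tuple[int, int, str]] = [(0, 1, "VISITED")]  # (head, tail, type)

for head, tail, rel_type in relations:
    h, t = entities[head], entities[tail]
    print(sentence[h.start:h.end], rel_type, sentence[t.start:t.end])
# Barack Obama VISITED Paris
```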

## Further reading

- [Named Entity Recognition](/docs/guides/named-entity-recognition)
- [Text Annotation](/docs/guides/text-annotation)
- [Annotation Schemes reference](/docs/core-concepts/annotation-schemes)
