# Span Annotation

Source: https://www.potatoannotator.com/docs/annotation-types/span-annotation

Span annotation lets annotators select and label portions of text. It is commonly used for named entity recognition (NER), part-of-speech tagging, and text highlighting tasks.

![Span annotation interface with entity highlighting](/images/docs/span-annotation.png "Span annotation with color-coded entity labels in Potato")

## Basic Configuration

```yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight named entities in the text"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
```

## Configuration Options

### Entity Labels

Define the types of spans annotators can create:

```yaml
labels:
  - PERSON
  - ORGANIZATION
  - LOCATION
  - DATE
  - EVENT
```

### Colors

Customize colors for visual distinction:

```yaml
colors:
  PERSON: "#3b82f6"
  ORGANIZATION: "#10b981"
  LOCATION: "#f59e0b"
  DATE: "#8b5cf6"
  EVENT: "#ec4899"
```

Colors can be hex (`#ff0000`) or RGB (`rgb(255, 0, 0)`).

### Keyboard Shortcuts

Speed up annotation with keyboard bindings:

```yaml
keyboard_shortcuts:
  PERSON: "1"
  ORGANIZATION: "2"
  LOCATION: "3"
  DATE: "4"
```

### Tooltips

Provide guidance for each label:

```yaml
tooltips:
  PERSON: "Names of people, characters, or personas"
  ORGANIZATION: "Companies, agencies, institutions"
  LOCATION: "Physical locations, addresses, geographic regions"
```

## Overlapping Spans

### Allow Overlapping

Enable spans that can overlap:

```yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ROLE
  allow_overlapping: true
```

This is useful when the same text can carry more than one label (e.g., "Dr. Smith" can be labeled as both a PERSON and a ROLE).

### Disable Overlapping (Default)

```yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_overlapping: false  # Default behavior
```
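
When `allow_overlapping` is `false`, it can help to confirm that pre-annotations or exported spans contain no overlaps. The following is a minimal Python sketch and not part of Potato: `find_overlaps` is a hypothetical helper, and it assumes spans are dicts with `start`/`end` character offsets where `end` is exclusive, as in the examples on this page.

```python
from typing import Dict, List, Tuple

Span = Dict[str, object]


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of spans whose [start, end) ranges intersect."""
    ordered = sorted(spans, key=lambda s: (s["start"], s["end"]))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["start"] >= first["end"]:
                break  # sorted by start, so no later span can overlap `first`
            overlaps.append((first, second))
    return overlaps


# Offsets refer to the text "Dr. Smith joined Microsoft."
example_spans = [
    {"start": 0, "end": 9, "label": "PERSON"},          # "Dr. Smith"
    {"start": 0, "end": 3, "label": "ROLE"},            # "Dr."
    {"start": 17, "end": 26, "label": "ORGANIZATION"},  # "Microsoft"
]

pairs = find_overlaps(example_spans)
print([(a["label"], b["label"]) for a, b in pairs])  # -> [('ROLE', 'PERSON')]
```

An empty result means the spans are compatible with `allow_overlapping: false`.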

## Span Selection Modes

### Word-Level Selection

Select complete words only:

```yaml
- annotation_type: span
  name: entities
  selection_mode: word
  labels:
    - ENTITY
```

### Character-Level Selection

Allow selection of partial words:

```yaml
- annotation_type: span
  name: entities
  selection_mode: character
  labels:
    - ENTITY
```
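
To illustrate the difference between the two modes, the sketch below expands a character range to whole-word boundaries. It is only an illustration in Python: `snap_to_words` is a hypothetical helper, the `\w+` word definition is an assumption, and Potato's own boundary logic may differ.

```python
import re
from typing import Tuple


def snap_to_words(text: str, start: int, end: int) -> Tuple[int, int]:
    """Expand [start, end) so it covers every word it touches."""
    touched = [
        m.span()
        for m in re.finditer(r"\w+", text)
        if m.start() < end and m.end() > start
    ]
    if not touched:
        return start, end  # selection touches no word; leave it unchanged
    return touched[0][0], touched[-1][1]


text = "John Smith works at Microsoft in Seattle."

# A character-level drag from inside "John" to inside "Smith" ...
partial = (2, 7)
# ... becomes the whole phrase under word-level snapping.
snapped = snap_to_words(text, *partial)
print(repr(text[partial[0]:partial[1]]), "->", repr(text[snapped[0]:snapped[1]]))
```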

## Pre-Annotated Spans

Load existing annotations for review or correction:

```json
{
  "id": "doc1",
  "text": "John Smith works at Microsoft in Seattle.",
  "spans": [
    {"start": 0, "end": 10, "label": "PERSON"},
    {"start": 20, "end": 29, "label": "ORGANIZATION"},
    {"start": 33, "end": 40, "label": "LOCATION"}
  ]
}
```

Configure the scheme to load pre-annotations from that field:

```yaml
- annotation_type: span
  name: entities
  load_pre_annotations: true
  pre_annotation_field: spans
```
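
Off-by-one offsets are a common problem in pre-annotated data, so a quick check before loading can save review time. The sketch below is a hedged example in Python, not a Potato utility: `check_spans` is a hypothetical helper, and it assumes `end` is exclusive, as the offsets in the example above suggest.

```python
from typing import Dict, List, Optional


def check_spans(instance: Dict, text_field: str = "text",
                span_field: str = "spans",
                labels: Optional[List[str]] = None) -> List[str]:
    """Return a list of problems found in an instance's pre-annotated spans."""
    text = instance[text_field]
    problems = []
    for i, span in enumerate(instance.get(span_field, [])):
        start, end = span["start"], span["end"]
        if not (0 <= start < end <= len(text)):
            problems.append(f"span {i}: offsets {start}-{end} fall outside the text")
        elif text[start:end] != text[start:end].strip():
            problems.append(f"span {i}: {text[start:end]!r} includes surrounding whitespace")
        if labels is not None and span["label"] not in labels:
            problems.append(f"span {i}: unknown label {span['label']!r}")
    return problems


doc = {
    "id": "doc1",
    "text": "John Smith works at Microsoft in Seattle.",
    "spans": [
        {"start": 0, "end": 10, "label": "PERSON"},
        {"start": 20, "end": 29, "label": "ORGANIZATION"},
        {"start": 33, "end": 40, "label": "LOCATION"},
    ],
}

for problem in check_spans(doc, labels=["PERSON", "ORGANIZATION", "LOCATION"]):
    print(problem)  # nothing is printed when every span is consistent
```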

## Common NER Configurations

### Standard NER (4 Types)

```yaml
- annotation_type: span
  name: ner
  description: "Label named entities"
  labels:
    - PER    # Person
    - ORG    # Organization
    - LOC    # Location
    - MISC   # Miscellaneous
  colors:
    PER: "#3b82f6"
    ORG: "#10b981"
    LOC: "#f59e0b"
    MISC: "#6b7280"
  keyboard_shortcuts:
    PER: "1"
    ORG: "2"
    LOC: "3"
    MISC: "4"
```

### Extended NER (OntoNotes Style)

```yaml
- annotation_type: span
  name: ner_extended
  labels:
    - PERSON
    - NORP        # Nationalities, religious/political groups
    - FAC         # Facilities
    - ORG
    - GPE         # Geopolitical entities
    - LOC
    - PRODUCT
    - EVENT
    - WORK_OF_ART
    - LAW
    - LANGUAGE
    - DATE
    - TIME
    - PERCENT
    - MONEY
    - QUANTITY
    - ORDINAL
    - CARDINAL
```

### Biomedical NER

```yaml
- annotation_type: span
  name: bio_ner
  labels:
    - GENE
    - PROTEIN
    - DISEASE
    - DRUG
    - SPECIES
  colors:
    GENE: "#22c55e"
    PROTEIN: "#3b82f6"
    DISEASE: "#ef4444"
    DRUG: "#f59e0b"
    SPECIES: "#8b5cf6"
```

### Social Media NER

```yaml
- annotation_type: span
  name: social_ner
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - PRODUCT
    - CREATIVE_WORK
    - GROUP
```

## Span with Attributes

Add attributes to spans for richer annotation by pairing the span scheme with a second scheme that references it through `show_for_span`:

```yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - PERSON
      - ORGANIZATION

  - annotation_type: radio
    name: entity_type
    description: "What type of entity is this?"
    show_for_span: entities
    labels:
      - Named
      - Nominal
      - Pronominal
```

## Multiple Span Schemes

Annotate different aspects separately:

```yaml
annotation_schemes:
  # Named entities
  - annotation_type: span
    name: entities
    description: "Label named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION

  # Sentiment expressions
  - annotation_type: span
    name: sentiment_spans
    description: "Highlight sentiment expressions"
    labels:
      - POSITIVE
      - NEGATIVE
    colors:
      POSITIVE: "#22c55e"
      NEGATIVE: "#ef4444"
```

## Multi-Field Span Annotation

*New in v2.1.0*

Span annotation can target specific text fields in multi-field data using the `target_field` option. This is useful when your data contains multiple text fields and you want to annotate spans in a particular one.

### Configuration

```yaml
annotation_schemes:
  - annotation_type: span
    name: source_entities
    description: "Label entities in the source text"
    target_field: "source_text"
    labels:
      - PERSON
      - ORGANIZATION

  - annotation_type: span
    name: summary_entities
    description: "Label entities in the summary"
    target_field: "summary"
    labels:
      - PERSON
      - ORGANIZATION
```

### Multi-Field Data Format

Your data should include the separate text fields:

```json
{
  "id": "doc1",
  "source_text": "John Smith works at Microsoft in Seattle.",
  "summary": "Smith is employed by Microsoft."
}
```

### Output Format

When using `target_field`, annotations are keyed by field:

```json
{
  "id": "doc1",
  "source_entities": {
    "source_text": [
      {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
      {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"}
    ]
  },
  "summary_entities": {
    "summary": [
      {"start": 0, "end": 5, "text": "Smith", "label": "PERSON"},
      {"start": 21, "end": 30, "text": "Microsoft", "label": "ORGANIZATION"}
    ]
  }
}
```
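
Because each scheme's offsets are relative to its own target field, a useful consistency check is to slice the matching field and compare the result with the stored `text`. The following Python sketch assumes the output layout shown above (scheme name, then field name, then a list of spans) and an exclusive `end` offset; `mismatched_spans` is a hypothetical helper, not part of Potato.

```python
from typing import Dict, List, Tuple


def mismatched_spans(instance: Dict, output: Dict) -> List[Tuple[str, str, Dict]]:
    """Return (scheme, field, span) for spans whose text does not match their offsets."""
    bad = []
    for scheme, by_field in output.items():
        if not isinstance(by_field, dict):
            continue  # skip "id" and other non-scheme keys
        for field, spans in by_field.items():
            source = instance[field]
            for span in spans:
                if source[span["start"]:span["end"]] != span["text"]:
                    bad.append((scheme, field, span))
    return bad


instance = {
    "id": "doc1",
    "source_text": "John Smith works at Microsoft in Seattle.",
    "summary": "Smith is employed by Microsoft.",
}
output = {
    "id": "doc1",
    "source_entities": {"source_text": [
        {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
        {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"},
    ]},
    "summary_entities": {"summary": [
        {"start": 0, "end": 5, "text": "Smith", "label": "PERSON"},
        {"start": 21, "end": 30, "text": "Microsoft", "label": "ORGANIZATION"},
    ]},
}

print(mismatched_spans(instance, output))  # -> []
```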

For a complete working example, see `project-hub/simple_examples/simple-multi-span/` in the Potato repository.

## Display Options

### Show Label in Span

Display the label text within highlighted spans:

```yaml
- annotation_type: span
  name: entities
  show_label_in_span: true
```

### Underline Style

Use underlines instead of background highlighting:

```yaml
- annotation_type: span
  name: entities
  display_style: underline
```

## Output Format

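Span annotations are saved with character offsets. In the example below, `start` is zero-based and `end` is exclusive, so "John Smith" covers offsets 0 to 10: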

```json
{
  "id": "doc1",
  "entities": [
    {
      "start": 0,
      "end": 10,
      "text": "John Smith",
      "label": "PERSON"
    },
    {
      "start": 20,
      "end": 29,
      "text": "Microsoft",
      "label": "ORGANIZATION"
    }
  ]
}
```
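
Many NER training pipelines expect token-level BIO tags rather than character offsets. The sketch below shows one possible conversion in Python. It is not part of Potato: `spans_to_bio` is a hypothetical helper, the regex tokenizer is an assumption, and the function assumes non-overlapping spans with an exclusive `end`.

```python
import re
from typing import Dict, List, Tuple


def spans_to_bio(text: str, spans: List[Dict]) -> List[Tuple[str, str]]:
    """Convert character-offset spans into (token, BIO tag) pairs."""
    # Simple tokenizer: runs of word characters, or single punctuation marks.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+|[^\w\s]", text)]
    tags = ["O"] * len(tokens)
    for span in spans:
        # Tokens that lie entirely inside the span's character range.
        inside = [i for i, (_, s, e) in enumerate(tokens)
                  if s >= span["start"] and e <= span["end"]]
        for n, i in enumerate(inside):
            tags[i] = ("B-" if n == 0 else "I-") + span["label"]
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


text = "John Smith works at Microsoft in Seattle."
entities = [
    {"start": 0, "end": 10, "text": "John Smith", "label": "PERSON"},
    {"start": 20, "end": 29, "text": "Microsoft", "label": "ORGANIZATION"},
]

for token, tag in spans_to_bio(text, entities):
    print(f"{token}\t{tag}")
```

Tokens that only partially fall inside a span stay tagged `O` here; a real pipeline would need to decide how to handle such cases, particularly with `selection_mode: character`.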

## Full Example: NER Task

```yaml
annotation_task_name: "Named Entity Recognition"

data_files:
  - path: data/documents.json
    text_field: text

annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight and label all named entities"
    labels:
      - PERSON
      - ORGANIZATION
      - LOCATION
      - DATE
      - MONEY
    colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MONEY: "#ec4899"
    keyboard_shortcuts:
      PERSON: "1"
      ORGANIZATION: "2"
      LOCATION: "3"
      DATE: "4"
      MONEY: "5"
    tooltips:
      PERSON: "Names of people"
      ORGANIZATION: "Companies, agencies, institutions"
      LOCATION: "Cities, countries, addresses"
      DATE: "Dates and time expressions"
      MONEY: "Monetary values"
    allow_overlapping: false
    selection_mode: word

  - annotation_type: radio
    name: difficulty
    description: "How difficult was this document to annotate?"
    labels:
      - Easy
      - Medium
      - Hard
```
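
With several per-label mappings in one scheme, typos in label names and duplicate key bindings are easy to miss. The following Python sketch lints a scheme that has already been parsed into a dict. It is an illustration only: `lint_span_scheme` is a hypothetical helper, and this page does not say whether Potato performs similar checks itself.

```python
from collections import Counter
from typing import Dict, List


def lint_span_scheme(scheme: Dict) -> List[str]:
    """Report unknown labels in per-label mappings and duplicate shortcut keys."""
    labels = set(scheme.get("labels", []))
    problems = []
    for option in ("colors", "keyboard_shortcuts", "tooltips"):
        unknown = sorted(k for k in scheme.get(option, {}) if k not in labels)
        if unknown:
            problems.append(f"{option}: not in labels: {unknown}")
    counts = Counter(scheme.get("keyboard_shortcuts", {}).values())
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"keyboard_shortcuts: bound more than once: {duplicates}")
    return problems


# A deliberately broken scheme: misspelled color key and a reused shortcut.
scheme = {
    "annotation_type": "span",
    "name": "entities",
    "labels": ["PERSON", "ORGANIZATION", "LOCATION"],
    "colors": {"PERSN": "#3b82f6", "ORGANIZATION": "#10b981"},
    "keyboard_shortcuts": {"PERSON": "1", "ORGANIZATION": "1", "LOCATION": "3"},
}

for problem in lint_span_scheme(scheme):
    print(problem)
```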

## Discontinuous Spans

*New in v2.2.0*

Enable non-contiguous text spans with the `allow_discontinuous` parameter. This allows annotators to select multiple non-adjacent text segments as a single span annotation, useful for discontinuous entities or split expressions.

```yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
  allow_discontinuous: true
```

When enabled, annotators can hold a modifier key while selecting additional text segments to add them to the current span. The output then records one start/end pair per segment.
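
As a rough illustration of what a discontinuous span represents, the Python sketch below rebuilds the surface text from several start/end pairs. The `segments` key, the `SYMPTOM_SITE` label, and the `segment_text` helper are all hypothetical; the exact output layout is not shown on this page, so check a real output file before relying on this shape.

```python
from typing import List, Tuple


def segment_text(text: str, segments: List[Tuple[int, int]], joiner: str = " ") -> str:
    """Join the text of each (start, end) segment in document order."""
    return joiner.join(text[start:end] for start, end in sorted(segments))


text = "Pain in the left and right knees."

# Hypothetical layout: one annotation holding two non-adjacent segments.
annotation = {"label": "SYMPTOM_SITE", "segments": [(12, 16), (27, 32)]}

print(segment_text(text, annotation["segments"]))  # -> left knees
```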

## Entity Linking Integration

*New in v2.2.0*

Span annotations can be linked to external knowledge bases (Wikidata, UMLS, or custom REST APIs) by adding an `entity_linking` configuration block to the span scheme:

```yaml
- annotation_type: span
  name: entities
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
  entity_linking:
    enabled: true
    knowledge_bases:
      - name: wikidata
        type: wikidata
        language: en
```

When entity linking is enabled, a link icon appears on each span's control bar. Clicking it opens a search modal to find and link matching KB entities. See the [Entity Linking](/docs/annotation-types/entity-linking) documentation for full details.

## Best Practices

1. **Use distinct colors** for easy visual differentiation
2. **Provide clear tooltips** with examples for each entity type
3. **Enable keyboard shortcuts** for faster annotation
4. **Use word-level selection** unless character precision is needed
5. **Consider pre-annotation** for faster correction workflows
6. **Test overlapping settings** based on your annotation guidelines

## Further Reading

- [Entity Linking](/docs/annotation-types/entity-linking) - Link spans to knowledge bases
- [Coreference Chains](/docs/annotation-types/coreference) - Group coreferring mentions
- [Event Annotation](/docs/annotation-types/event-annotation) - N-ary event structures with span arguments
- [Span Linking](/docs/annotation-types/span-linking) - Create relationships between spans
- [Instance Display](/docs/core-concepts/instance-display) - Multi-field content display with span targets
- [UI Configuration](/docs/core-concepts/ui-configuration) - Customize span colors
- [Productivity Features](/docs/features/productivity) - Keyboard shortcuts

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/main/docs/span_annotation.md).
