Coreference Chains

Group text spans that refer to the same entity for coreference resolution tasks.

Coreference annotation allows annotators to group text spans that refer to the same entity. This is essential for entity resolution, pronoun resolution, and discourse analysis.

Overview

A coreference chain is a collection of mentions (text spans) that all refer to the same real-world entity. For example:

"Marie Curie was a physicist. She won the Nobel Prize. The scientist changed her field forever."

The spans "Marie Curie", "She", "The scientist", and "her" all refer to the same person and form a single coreference chain.
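
To make the idea concrete, the chain above can be written as character offsets into the sentence. This Python sketch is illustrative only: `locate_mentions` is a hypothetical helper, and representing a mention as a `(start, end)` pair with an exclusive end offset is an assumption, not the tool's storage format.

```python
# Example sentence from the Overview.
text = ("Marie Curie was a physicist. She won the Nobel Prize. "
        "The scientist changed her field forever.")

# Surface forms of the four mentions in the chain, in document order.
surface_forms = ["Marie Curie", "She", "The scientist", "her"]


def locate_mentions(text, forms):
    """Return (start, end) offsets (end exclusive) for each form.

    Searches left to right from the end of the previous match, so a
    string that occurs more than once resolves to the occurrence that
    follows the previous mention. Raises ValueError if a form is missing.
    """
    spans = []
    cursor = 0
    for form in forms:
        start = text.index(form, cursor)
        end = start + len(form)
        spans.append((start, end))
        cursor = end
    return spans


# The chain is simply the list of mention spans that share one entity.
chain = locate_mentions(text, surface_forms)
for start, end in chain:
    print(start, end, text[start:end])
```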

Quick Start

Coreference annotation requires two schema components:

  1. A span schema for creating mentions
  2. A coreference schema for grouping mentions into chains

```yaml
annotation_schemes:
  - annotation_type: span
    name: mentions
    description: Highlight all entity mentions
    labels:
      - name: MENTION
        tooltip: "Any reference to an entity"
    sequential_key_binding: true

  - annotation_type: coreference
    name: coref_chains
    description: Group mentions that refer to the same entity
    span_schema: mentions
    allow_singletons: true
```

Configuration Options

| Field | Type | Default | Description |
|---|---|---|---|
| `annotation_type` | string | Required | Must be `"coreference"` |
| `name` | string | Required | Unique identifier for this schema |
| `description` | string | Required | Instructions displayed to annotators |
| `span_schema` | string | Required | Name of the span schema providing mentions |
| `entity_types` | list | `[]` | List of entity type categories |
| `allow_singletons` | boolean | `true` | Allow chains with only one mention |
| `visual_display.highlight_mode` | string | `"background"` | Visual style: `"background"`, `"bracket"`, or `"underline"` |
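
The table can be read as a small set of validation rules. The Python sketch below checks the required fields and the allowed `highlight_mode` values, then fills in the documented defaults. `normalize_coref_scheme` is a hypothetical helper, not part of the tool, and the check that `span_schema` names a span scheme in the same config is our assumption about how that reference is resolved.

```python
# Hypothetical validator mirroring the configuration table.
REQUIRED_FIELDS = ("annotation_type", "name", "description", "span_schema")
HIGHLIGHT_MODES = {"background", "bracket", "underline"}


def normalize_coref_scheme(scheme, span_schema_names):
    """Validate a coreference scheme dict and return a copy with defaults.

    span_schema_names: names of the span schemes defined in the same config
    (assumption: span_schema must refer to one of them).
    """
    missing = [field for field in REQUIRED_FIELDS if field not in scheme]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if scheme["annotation_type"] != "coreference":
        raise ValueError('annotation_type must be "coreference"')
    if scheme["span_schema"] not in span_schema_names:
        raise ValueError(f"unknown span_schema: {scheme['span_schema']!r}")

    result = dict(scheme)
    # Defaults taken from the table.
    result.setdefault("entity_types", [])
    result.setdefault("allow_singletons", True)
    display = dict(result.get("visual_display", {}))
    display.setdefault("highlight_mode", "background")
    if display["highlight_mode"] not in HIGHLIGHT_MODES:
        raise ValueError(f"invalid highlight_mode: {display['highlight_mode']!r}")
    result["visual_display"] = display
    return result


# Usage with the Quick Start scheme.
example = {
    "annotation_type": "coreference",
    "name": "coref_chains",
    "description": "Group mentions that refer to the same entity",
    "span_schema": "mentions",
}
print(normalize_coref_scheme(example, {"mentions"}))
```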

Examples

With Entity Types

Classify chains by entity type:

```yaml
annotation_schemes:
  - annotation_type: span
    name: ner
    description: Mark named entities
    labels:
      - name: ENTITY
        tooltip: "Any named entity mention"

  - annotation_type: coreference
    name: coref
    description: Create coreference chains
    span_schema: ner
    entity_types:
      - name: PERSON
        color: "#6E56CF"
      - name: ORGANIZATION
        color: "#22C55E"
      - name: LOCATION
        color: "#3B82F6"
      - name: OTHER
        color: "#F59E0B"
```

Without Singletons

For tasks where every mention must link to at least one other mention:

```yaml
annotation_schemes:
  - annotation_type: span
    name: mentions
    description: Highlight co-referring mentions
    labels:
      - name: MENTION

  - annotation_type: coreference
    name: strict_coref
    description: All mentions must be part of a chain with at least 2 mentions
    span_schema: mentions
    allow_singletons: false
```
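
With `allow_singletons: false`, the rule stated in the scheme description is that every mention sits in a chain of at least two mentions. The Python sketch below shows how that rule could be checked after annotation, for example during review. `check_strict_coref` is a hypothetical helper, and chains are assumed to be lists of span IDs.

```python
def check_strict_coref(mention_ids, chains):
    """Check the 'no singletons' rule on finished annotations.

    mention_ids: all span IDs created with the span schema.
    chains: list of chains, each a list of span IDs.
    Returns (unchained_mentions, short_chains):
      - mentions that belong to no chain at all
      - chains with fewer than two mentions
    """
    chained = {mention for chain in chains for mention in chain}
    unchained = sorted(set(mention_ids) - chained)
    short_chains = [chain for chain in chains if len(chain) < 2]
    return unchained, short_chains


# m3 forms a singleton chain and m4 was never chained: both violate the rule.
unchained, short = check_strict_coref(
    ["m1", "m2", "m3", "m4"],
    [["m1", "m2"], ["m3"]],
)
print(unchained, short)
```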

Custom Visual Display

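Choose how chain members are highlighted in the text. This fragment assumes a span schema named `mentions` is defined elsewhere in the same config:

```yaml
annotation_schemes:
  - annotation_type: coreference
    name: coref
    description: Link coreference chains
    span_schema: mentions  # span schema that provides the mentions
    visual_display:
      highlight_mode: "underline"  # Options: background, bracket, underline
```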

User Interface

Creating Chains

  1. Create mentions: Use the span annotation tool to highlight all entity mentions
  2. Select mentions: Click on the highlighted spans you want to chain together
  3. Create chain: Click "New Chain" to group the selected mentions

Managing Chains

  • Add to Chain: Select additional mentions and click "Add to Chain"
  • Merge Chains: Select multiple chains and click "Merge Chains" to combine them
  • Remove Mention: Select a mention and click "Remove Mention" to remove it from its chain
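
The chain operations above can be modeled as updates to a partition of mentions, where each mention belongs to at most one chain. This Python sketch is an illustration, not the tool's implementation: `ChainStore` and its methods are hypothetical names, and two behaviors are our assumptions: adding a mention that already belongs to another chain moves it, and a chain left empty by a removal is deleted.

```python
class ChainStore:
    """Hypothetical in-memory model of New Chain / Add / Merge / Remove."""

    def __init__(self):
        self.chains = {}    # chain id -> set of span IDs
        self.chain_of = {}  # span ID -> chain id
        self._next_id = 0

    def new_chain(self, mentions):
        """New Chain: group the selected mentions into a fresh chain."""
        chain_id = self._next_id
        self._next_id += 1
        self.chains[chain_id] = set()
        self.add_to_chain(chain_id, mentions)
        return chain_id

    def add_to_chain(self, chain_id, mentions):
        """Add to Chain: assumption, a mention in another chain is moved."""
        for mention in mentions:
            old = self.chain_of.get(mention)
            if old is not None and old != chain_id:
                self.remove_mention(mention)
            self.chains[chain_id].add(mention)
            self.chain_of[mention] = chain_id

    def merge_chains(self, chain_ids):
        """Merge Chains: fold every chain into the first one selected."""
        target, *others = chain_ids
        for chain_id in others:
            if chain_id == target:
                continue
            for mention in self.chains.pop(chain_id):
                self.chains[target].add(mention)
                self.chain_of[mention] = target
        return target

    def remove_mention(self, mention):
        """Remove Mention: assumption, an emptied chain is deleted."""
        chain_id = self.chain_of.pop(mention)
        self.chains[chain_id].discard(mention)
        if not self.chains[chain_id]:
            del self.chains[chain_id]


store = ChainStore()
a = store.new_chain(["m1", "m2"])
b = store.new_chain(["m3"])
store.add_to_chain(b, ["m4"])
store.merge_chains([a, b])
store.remove_mention("m2")
print(store.chains)
```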

Color Coding

Each chain is automatically assigned a distinct color. Mentions in the same chain share the same color, making it easy to visually identify chain membership.
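
The docs do not say how chain colors are chosen. One common approach, sketched here in Python with that caveat, steps the hue by the golden-ratio conjugate so that consecutive chains land far apart on the color wheel, then converts a light HLS color to a hex string suitable for a background highlight. `chain_color` and its lightness and saturation values are hypothetical.

```python
import colorsys

GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def chain_color(index, lightness=0.85, saturation=0.6):
    """Return a hex color for the chain at position `index`.

    Hues advance by the golden-ratio conjugate, so each new chain gets a
    hue far from recently used ones without knowing the total count.
    """
    hue = (index * GOLDEN_RATIO_CONJUGATE) % 1.0
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02X}{:02X}{:02X}".format(round(r * 255), round(g * 255), round(b * 255))


print([chain_color(i) for i in range(5)])
```

Unlike dividing the hue circle evenly by the number of chains, this choice keeps each existing chain's color stable when new chains are created.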

Output Format

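Coreference annotations are saved as span links. Each chain becomes one entry in `span_links` that lists the IDs of its member spans together with the schema name, the link type, and the entity type: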

```json
{
  "span_links": [
    {
      "schema": "coref_chains",
      "link_type": "coreference",
      "span_ids": ["mentions_0_5_MENTION", "mentions_34_37_MENTION", "mentions_72_85_MENTION"],
      "entity_type": "PERSON"
    },
    {
      "schema": "coref_chains",
      "link_type": "coreference",
      "span_ids": ["mentions_15_23_MENTION", "mentions_95_97_MENTION"],
      "entity_type": "ORGANIZATION"
    }
  ]
}
```
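
To use the output downstream, you typically turn each span link back into a chain of mentions with offsets. The span IDs in the example appear to follow the pattern `<schema>_<start>_<end>_<label>`; that pattern is inferred from the sample, not documented, so treat `parse_span_id` and `chains_from_output` (both hypothetical helpers) as assumptions to verify against your own data. Splitting from the right keeps schema names that contain underscores intact, but it would misparse labels that contain underscores.

```python
# Sample record copied from the output example; in practice, load it
# from the saved annotation file with json.load.
record = {
    "span_links": [
        {
            "schema": "coref_chains",
            "link_type": "coreference",
            "span_ids": ["mentions_0_5_MENTION", "mentions_34_37_MENTION",
                         "mentions_72_85_MENTION"],
            "entity_type": "PERSON",
        },
        {
            "schema": "coref_chains",
            "link_type": "coreference",
            "span_ids": ["mentions_15_23_MENTION", "mentions_95_97_MENTION"],
            "entity_type": "ORGANIZATION",
        },
    ]
}


def parse_span_id(span_id):
    """Assumed format <schema>_<start>_<end>_<label>, split from the right."""
    schema, start, end, label = span_id.rsplit("_", 3)
    return schema, int(start), int(end), label


def chains_from_output(record, schema="coref_chains"):
    """Return one dict per coreference link, mentions sorted by start offset."""
    chains = []
    for link in record.get("span_links", []):
        if link.get("schema") != schema or link.get("link_type") != "coreference":
            continue
        mentions = sorted((parse_span_id(sid) for sid in link["span_ids"]),
                          key=lambda mention: mention[1])
        chains.append({"entity_type": link.get("entity_type"), "mentions": mentions})
    return chains


for chain in chains_from_output(record):
    print(chain["entity_type"], [(start, end) for _, start, end, _ in chain["mentions"]])
```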

Annotation Workflow

  1. First pass - Read through the text and highlight all entity mentions
  2. Second pass - Group mentions into coreference chains
  3. Review - Check that all mentions are correctly assigned and no chains are missing
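
Part of the review pass can be automated. Because a mention normally refers to exactly one entity, a span ID that shows up in two coreference links usually signals chains that should have been merged or a mention added to the wrong chain. This Python sketch scans a `span_links` list shaped like the output format and reports such span IDs; `mentions_in_multiple_chains` is a hypothetical helper.

```python
from collections import defaultdict


def mentions_in_multiple_chains(span_links, schema="coref_chains"):
    """Map each span ID found in more than one coreference link to the
    indices of those links in span_links."""
    seen = defaultdict(list)
    for index, link in enumerate(span_links):
        if link.get("schema") != schema or link.get("link_type") != "coreference":
            continue
        # set() so a span ID repeated inside one link is not double counted.
        for span_id in set(link["span_ids"]):
            seen[span_id].append(index)
    return {span_id: idxs for span_id, idxs in seen.items() if len(idxs) > 1}


links = [
    {"schema": "coref_chains", "link_type": "coreference", "span_ids": ["a", "b"]},
    {"schema": "coref_chains", "link_type": "coreference", "span_ids": ["b", "c"]},
]
print(mentions_in_multiple_chains(links))
```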

Best Practices

  1. Define clear mention boundaries - establish guidelines for what counts as a mention
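  2. Handle nested mentions - decide whether, in a phrase like "the CEO of Microsoft", both the full phrase and the embedded "Microsoft" are marked as mentions
  3. Consider generic references - decide whether non-specific references (for example, "scientists" in "Scientists value evidence") count as mentions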
  4. Train annotators - coreference is complex; provide examples and practice rounds
  5. Use entity types sparingly - too many can slow annotation without improving data quality

Further Reading

For implementation details, see the source documentation.