
Multilingual Coreference Resolution (CorefUD)

Multilingual coreference resolution using Universal Dependencies-style annotations. Annotators identify entity mentions (names, nominals, pronouns) and link them into coreference chains. Based on the CorefUD dataset and the CRAC 2023 Shared Task on Multilingual Coreference Resolution, which used CorefUD 1.1 (17 datasets covering 12 languages).


Configuration File: config.yaml

# Multilingual Coreference Resolution (CorefUD)
# Based on Zabokrtsky et al., CRAC@EMNLP 2023
# Paper: https://aclanthology.org/2023.crac-sharedtask.1/
# Dataset: https://ufal.mff.cuni.cz/corefud
#
# CorefUD 1.1, the edition used in the shared task, provides coreference
# annotations in 17 datasets covering 12 languages, harmonized to
# Universal Dependencies-style annotation guidelines:
# - Mention types: Name, Nominal, Pronoun
# - Mentions are linked into coreference chains
# - Supports entity, event, and bridging coreference
#
# Annotation process:
# 1. Identify all entity mentions (names, nominals, pronouns)
# 2. Classify each mention by its type
# 3. Link coreferent mentions into chains

annotation_task_name: "CorefUD Multilingual Coreference Resolution"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  # Step 1: Entity mention identification
  - annotation_type: span
    name: entity_mentions
    description: "Highlight all entity mentions in the text and classify their mention type."
    labels:
      - "Name"
      - "Nominal"
      - "Pronoun"
    label_colors:
      "Name": "#3b82f6"
      "Nominal": "#22c55e"
      "Pronoun": "#f59e0b"
    tooltips:
      "Name": "Proper noun or named entity (e.g., 'Marie Curie', 'Paris', 'the Sorbonne')"
      "Nominal": "Common noun phrase referring to an entity (e.g., 'the scientist', 'the committee', 'a new plan')"
      "Pronoun": "Pronominal reference (e.g., 'she', 'it', 'they', 'his', 'her')"
    keyboard_shortcuts:
      "Name": "n"
      "Nominal": "m"
      "Pronoun": "p"
    allow_overlapping: false

  # Step 2: Coreference chain linking
  - annotation_type: span_link
    name: coreference_chains
    description: "Link entity mentions that refer to the same real-world entity. Connect each mention to the nearest preceding mention in the same coreference chain."
    source_scheme: entity_mentions
    target_scheme: entity_mentions
    labels:
      - "Coreference"
    label_colors:
      "Coreference": "#6366f1"
    tooltips:
      "Coreference": "Both mentions refer to the same real-world entity (e.g., 'Marie Curie' and 'She' referring to the same person)"
    keyboard_shortcuts:
      "Coreference": "c"

html_layout: |
  <div style="margin-bottom: 10px;">
    <span style="background: #e0e7ff; padding: 2px 8px; border-radius: 3px; font-size: 13px;">
      Language: <strong>{{language}}</strong>
    </span>
  </div>
  <div style="margin-bottom: 10px;">
    <p style="line-height: 1.8; font-size: 15px;">{{text}}</p>
  </div>
  <div style="background: #f0f0f0; padding: 10px; border-radius: 5px; margin-top: 10px;">
    <strong>Instructions:</strong>
    <ol>
      <li>Highlight all entity mentions (names, nominals, pronouns) in the text.</li>
      <li>Link mentions that refer to the same entity by creating coreference links.</li>
      <li>Each mention should be linked to the nearest preceding coreferent mention to form a chain.</li>
    </ol>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
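
The instructions in html_layout ask annotators to link each mention to the nearest preceding coreferent mention, so a chain is stored only as a set of pairwise links. The sketch below shows one way to recover full chains from such links with a small union-find. The input format is an assumption made for illustration: mentions are (start, end) character offsets and links are pairs of mentions. This is not Potato's actual output format, and build_chains is a hypothetical helper, not part of Potato.

```python
from collections import defaultdict


def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links.

    mentions: list of (start, end) character offsets (assumed format).
    links: list of (mention_a, mention_b) pairs labeled "Coreference".
    Every linked mention is assumed to appear in `mentions`.
    """
    parent = {mention: mention for mention in mentions}

    def find(mention):
        # Walk up to the root of the set, shortening the path on the way.
        while parent[mention] != mention:
            parent[mention] = parent[parent[mention]]
            mention = parent[mention]
        return mention

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The earliest mention always stays the root of its chain.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    chains = defaultdict(list)
    for mention in sorted(mentions):
        chains[find(mention)].append(mention)
    return list(chains.values())


# Usage with a shortened version of the first sample item.
text = "Marie Curie was born in Warsaw in 1867. She moved to Paris."


def span(phrase):
    start = text.index(phrase)
    return (start, start + len(phrase))


mentions = [span("Marie Curie"), span("Warsaw"), span("She"), span("Paris")]
links = [(span("She"), span("Marie Curie"))]
chains = build_chains(mentions, links)
print([[text[s:e] for s, e in chain] for chain in chains])
# prints [['Marie Curie', 'She'], ['Warsaw'], ['Paris']]
```

Because links are merged transitively, it does not matter whether an annotator linked a pronoun to the nearest antecedent or to the first mention of the entity: both produce the same chain. Unlinked mentions come back as single-element chains.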

Sample Data: sample-data.json

[
  {
    "id": "corefud_001",
    "text": "Marie Curie was born in Warsaw in 1867. She moved to Paris to study at the Sorbonne, where the young scientist quickly distinguished herself. Curie's research on radioactivity earned her two Nobel Prizes. The physicist remains one of the most celebrated scientists in history.",
    "language": "English"
  },
  {
    "id": "corefud_002",
    "text": "Der Bundeskanzler hielt gestern eine Rede vor dem Parlament. Er betonte die Notwendigkeit einer Reform des Gesundheitssystems. Die Opposition kritisierte den Regierungschef fuer seinen Mangel an konkreten Vorschlaegen. Trotz der Kritik verteidigte er seinen Plan.",
    "language": "German"
  }
]

// ... and 8 more items
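
The configuration reads three fields from every item: id (id_key), text (text_key), and language (the {{language}} placeholder in html_layout). Before adding new items, it can help to check that each one carries all three. The following is a minimal sketch of such a check. The function names are hypothetical, and how Potato itself reacts to a missing field is not covered here.

```python
import json

# Fields the config refers to: id_key, text_key, and {{language}} in html_layout.
REQUIRED_KEYS = ("id", "text", "language")


def load_items(path):
    """Read a JSON array of items, such as sample-data.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_problems(items):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty {key!r}")
        item_id = item.get("id")
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems
```

Calling find_problems(load_items("sample-data.json")) from the task directory would list any items that need fixing before the task is started.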

Get This Design


Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/coreference/corefud-multilingual-coreference
potato start config.yaml

Details

Annotation Types

span, span_link

Domain

NLP, Multilingual, Coreference

Use Cases

Coreference Resolution, Entity Mention Detection, Multilingual NLP

Tags

coreference, multilingual, universal-dependencies, corefud, crac, entity-mentions, emnlp2023
