intermediate · text

Named Entity Disambiguation (AIDA-CoNLL)

Named entity disambiguation and linking to the Wikidata knowledge base, based on the AIDA-CoNLL dataset. Annotators identify named entity mentions in news text, classify them by type (PER, ORG, LOC, MISC), and link them to their corresponding Wikidata entities by QID. They also flag ambiguous references and NIL entities (real entities with no Wikidata entry).
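The core judgment annotators make is choosing among several candidate referents for one surface form. As an illustration, the sketch below scores candidates for the mention "Jordan" by bag-of-words overlap with the surrounding sentence. It is a deliberately simple stand-in and is not the disambiguation method of Hoffart et al. The candidate keyword sets and the helper names `tokenize` and `disambiguate` are hypothetical. A real system would draw candidates from Wikidata rather than from a hand-written dict.

```python
from __future__ import annotations

import re

# Hypothetical, hand-written keyword sets for two referents of "Jordan".
# A real pipeline would derive candidates and their context from Wikidata.
CANDIDATES: dict[str, set[str]] = {
    "Michael Jordan (basketball player)": {"nba", "bulls", "chicago", "basketball", "championships"},
    "Jordan (country)": {"amman", "king", "middle", "east", "border"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase alphabetic tokens from the mention's surrounding context."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(context: str, candidates: dict[str, set[str]]) -> str | None:
    """Pick the candidate whose keywords overlap most with the context.

    Returns None when nothing overlaps or the top two scores tie; an annotator
    would record such a case with entity_status "ambiguous".
    """
    words = tokenize(context)
    scores = {name: len(words & keywords) for name, keywords in candidates.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    if best_score == 0 or (len(ranked) > 1 and ranked[1][1] == best_score):
        return None
    return best_name


context = ("Michael Jordan announced his retirement from the Chicago Bulls in January 1999, "
           "ending an era that brought six NBA championships to the city of Chicago.")
print(disambiguate(context, CANDIDATES))  # Michael Jordan (basketball player)
```

The tie rule is the useful part. When context supports two readings equally, the sketch refuses to guess, which matches the guideline to mark such mentions as "ambiguous" rather than force a link.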


Configuration File: config.yaml

# Named Entity Disambiguation (AIDA-CoNLL)
# Based on Hoffart et al., EMNLP 2011
#
# This configuration supports entity mention detection and Wikidata
# entity linking for news text from the AIDA-CoNLL dataset.
#
# Entity Types (CoNLL scheme):
# - PER: Person names (individuals, fictional characters)
# - ORG: Organizations (companies, agencies, teams, institutions)
# - LOC: Locations (countries, cities, geographic features)
# - MISC: Miscellaneous named entities (events, products, nationalities, works)
#
# Annotation Guidelines:
# 1. Highlight all named entity mentions in the text
# 2. Classify each mention as PER, ORG, LOC, or MISC
# 3. For each mention, enter the Wikidata QID (e.g., Q5284 for Bill Gates)
# 4. If the entity has no Wikidata entry, mark as "nil-entity"
# 5. If the mention is ambiguous between multiple entities, mark as "ambiguous"
# 6. Use keyboard shortcuts 1-4 for fast entity type selection
#
# Disambiguation Tips:
# - Use surrounding context to resolve ambiguity (e.g., "Washington" could be
#   a person, city, or state)
# - "Paris" in a sports context likely refers to Paris Saint-Germain (ORG)
# - Consider the document topic and domain when disambiguating
# - When truly ambiguous, mark as "ambiguous" and add notes

annotation_task_name: "Named Entity Disambiguation (AIDA-CoNLL)"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  # Step 1: Span annotation for entity mentions
  - annotation_type: span
    name: entity_mentions
    description: "Highlight all named entity mentions in the text and classify by type."
    labels:
      - "PER"
      - "ORG"
      - "LOC"
      - "MISC"
    label_colors:
      "PER": "#ef4444"
      "ORG": "#3b82f6"
      "LOC": "#22c55e"
      "MISC": "#f59e0b"
    keyboard_shortcuts:
      "PER": "1"
      "ORG": "2"
      "LOC": "3"
      "MISC": "4"
    tooltips:
      "PER": "Person names: individuals, fictional characters (e.g., 'Barack Obama', 'Sherlock Holmes')"
      "ORG": "Organizations: companies, agencies, teams, institutions (e.g., 'Google', 'United Nations', 'FC Barcelona')"
      "LOC": "Locations: countries, cities, geographic features (e.g., 'France', 'Mount Everest', 'Amazon River')"
      "MISC": "Miscellaneous: events, products, nationalities, works of art (e.g., 'Nobel Prize', 'iPhone', 'French')"
    allow_overlapping: false

  # Step 2: Wikidata QID entry
  - annotation_type: text
    name: wikidata_qid
    description: "Enter the Wikidata QID for the highlighted entity (e.g., Q5284 for Bill Gates). Leave blank if unknown."

  # Step 3: Entity status
  - annotation_type: radio
    name: entity_status
    description: "What is the linking status of this entity mention?"
    labels:
      - "linkable"
      - "nil-entity"
      - "ambiguous"
      - "not-an-entity"
    tooltips:
      "linkable": "Entity can be unambiguously linked to a Wikidata entry"
      "nil-entity": "Entity is real but has no Wikidata entry (e.g., obscure local business)"
      "ambiguous": "Entity mention is genuinely ambiguous between multiple Wikidata entries"
      "not-an-entity": "Highlighted span is not actually a named entity upon closer inspection"

  # Step 4: Disambiguation notes
  - annotation_type: text
    name: disambiguation_notes
    description: "Optional: explain your disambiguation reasoning, especially for ambiguous or difficult cases."

annotation_instructions: |
  You are annotating news text for named entity disambiguation.
  For each text passage:
  1. Highlight all named entity mentions using the span tool (use keys 1-4 for PER/ORG/LOC/MISC)
  2. Enter the Wikidata QID for the most recently highlighted entity
  3. Indicate whether the entity is linkable, NIL, ambiguous, or not actually an entity
  4. Optionally add notes explaining your disambiguation reasoning
  Pay special attention to ambiguous mentions like "Washington", "Paris", or "Jordan".

html_layout: |
  <div style="padding: 15px; font-family: Georgia, serif;">
    <div style="margin-bottom: 8px; color: #6b7280; font-size: 13px;">
      <strong>Source:</strong> {{source}}
    </div>
    <div style="font-size: 16px; line-height: 1.8; background: #f9fafb; padding: 15px; border-left: 4px solid #22c55e; border-radius: 4px;">
      {{text}}
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
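The guidelines tie the free-text `wikidata_qid` field to the `entity_status` choice. A "linkable" mention should carry a QID, while "nil-entity" and "not-an-entity" mentions should not. Because the text field accepts anything, a post-hoc check can catch inconsistent or malformed entries. The sketch below assumes a hypothetical flat record keyed by the scheme names from this config. Potato's actual output JSON may nest these fields differently, so treat `check_annotation` as an illustration only. Wikidata item IDs take the form "Q" followed by a positive integer.

```python
from __future__ import annotations

import re

# Wikidata item identifiers: "Q" followed by a positive integer (no leading zeros).
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")

# Labels of the entity_status radio scheme in config.yaml.
STATUSES = {"linkable", "nil-entity", "ambiguous", "not-an-entity"}


def check_annotation(record: dict) -> list[str]:
    """Return the consistency problems found in one annotation record.

    Assumes a hypothetical flat dict with keys "wikidata_qid" and
    "entity_status"; adapt the lookups to the real output layout.
    """
    problems = []
    qid = (record.get("wikidata_qid") or "").strip()
    status = record.get("entity_status")
    if status not in STATUSES:
        problems.append(f"unknown entity_status: {status!r}")
    if qid and not QID_PATTERN.fullmatch(qid):
        problems.append(f"malformed QID: {qid!r}")
    if status == "linkable" and not qid:
        problems.append("linkable mention has no QID")
    if status in {"nil-entity", "not-an-entity"} and qid:
        problems.append(f"{status} mention should not carry a QID")
    # "ambiguous" may or may not carry a QID; the notes field explains it.
    return problems
```

For example, `check_annotation({"wikidata_qid": "Q5284", "entity_status": "linkable"})` returns an empty list. A bare number such as `"5284"` is reported as malformed.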

Sample Data: sample-data.json

[
  {
    "id": "aida_001",
    "text": "Michael Jordan announced his retirement from the Chicago Bulls in January 1999, ending an era that brought six NBA championships to the city of Chicago. The decision surprised fans across the United States.",
    "source": "Reuters"
  },
  {
    "id": "aida_002",
    "text": "The European Union imposed sanctions on Russia following the annexation of Crimea. German Chancellor Angela Merkel and French President François Hollande led the diplomatic efforts in Brussels.",
    "source": "Reuters"
  }
]

// ... and 8 more items
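Before running `potato start`, it can help to confirm that every data item carries the keys the config reads. These are `id` and `text` (from `item_properties`) and `source` (the `{{source}}` placeholder in `html_layout`). The stdlib sketch below checks for missing keys and duplicate ids. The helper name `check_items` is hypothetical, and `LAYOUT` is an abbreviated excerpt of the config's `html_layout`, not the full template.

```python
from __future__ import annotations

import json
import re

# Abbreviated excerpt of html_layout from config.yaml; placeholders are {{name}}.
LAYOUT = "<strong>Source:</strong> {{source}} ... {{text}}"

# id_key and text_key from item_properties, plus every layout placeholder.
REQUIRED = {"id", "text"} | set(re.findall(r"\{\{(\w+)\}\}", LAYOUT))


def check_items(raw: str) -> list[str]:
    """Report items that lack a required key or reuse an earlier id."""
    items = json.loads(raw)
    problems: list[str] = []
    seen: set = set()
    for i, item in enumerate(items):
        missing = REQUIRED - set(item)
        if missing:
            problems.append(f"item {i}: missing {sorted(missing)}")
        if item.get("id") in seen:
            problems.append(f"item {i}: duplicate id {item['id']!r}")
        seen.add(item.get("id"))
    return problems
```

To run it against the real file, read it with `open("sample-data.json", encoding="utf-8")` and pass the contents to `check_items`. An empty list means every item renders with both the text and its source.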

Get This Design


Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/entity-linking/aida-conll-entity-disambiguation
potato start config.yaml

Details

Annotation Types

radio, span, text

Domain

NLP, Entity Disambiguation, Knowledge Graphs

Use Cases

Entity Linking, Entity Disambiguation, Knowledge Base Population

Tags

entity-linking, entity-disambiguation, wikidata, aida, conll, emnlp2011
