Entity Linking

Link span annotations to external knowledge bases like Wikidata, UMLS, or custom APIs.

Entity linking enables annotators to connect span annotations to external knowledge bases (KBs) like Wikidata or UMLS. This creates semantic links between text mentions and canonical entities, valuable for named entity recognition, concept normalization, and knowledge graph construction.

How It Works

When entity linking is enabled for a span annotation schema:

  1. Annotators highlight text and assign a label (e.g., "PERSON", "ORGANIZATION")
  2. A link icon appears on the span's control bar
  3. Clicking the icon opens a search modal to find matching KB entities
  4. The selected entity ID is stored with the span annotation
  5. Linked spans display a filled icon and show entity details on hover

Quick Start

Enable entity linking by adding the entity_linking configuration to a span schema:

yaml
annotation_schemes:
  - annotation_type: span
    name: ner
    description: Named Entity Recognition with KB linking
    labels:
      - name: PERSON
        tooltip: "People's names"
      - name: ORGANIZATION
        tooltip: "Companies, agencies, institutions"
      - name: LOCATION
        tooltip: "Places, cities, countries"
    entity_linking:
      enabled: true
      knowledge_bases:
        - name: wikidata
          type: wikidata
          language: en

Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable entity linking for this schema |
| knowledge_bases | list | [] | List of KB configurations |
| auto_search | boolean | true | Automatically search when the modal opens |
| required | boolean | false | Require an entity link before saving a span |
| multi_select | boolean | false | Allow linking to multiple entities |

Knowledge Base Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| name | string | (required) | Unique identifier for this KB |
| type | string | (required) | KB type: wikidata, umls, or rest |
| api_key | string | null | API key for authenticated services |
| base_url | string | null | Base URL for REST APIs |
| language | string | "en" | Language code for search results |
| timeout | integer | 10 | Request timeout in seconds |
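
To make the defaults in the two tables concrete, here is a minimal Python sketch that models them as dataclasses and applies a few basic checks. The class and function names (`KnowledgeBaseConfig`, `EntityLinkingConfig`, `parse_entity_linking`) are hypothetical and are not the tool's own API; the input is assumed to be the dict a YAML loader would produce for the `entity_linking` key.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model of the documented options; not the tool's internal classes.
VALID_KB_TYPES = {"wikidata", "umls", "rest"}


@dataclass
class KnowledgeBaseConfig:
    name: str                        # required, unique per schema
    type: str                        # required: wikidata, umls, or rest
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    language: str = "en"
    timeout: int = 10                # seconds
    # extra_params appears in the custom REST example, not in the table.
    extra_params: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.type not in VALID_KB_TYPES:
            raise ValueError(f"unknown KB type: {self.type!r}")


@dataclass
class EntityLinkingConfig:
    enabled: bool = False
    knowledge_bases: List[KnowledgeBaseConfig] = field(default_factory=list)
    auto_search: bool = True
    required: bool = False
    multi_select: bool = False


def parse_entity_linking(raw: dict) -> EntityLinkingConfig:
    """Build a config object from an already-loaded `entity_linking` mapping."""
    raw = dict(raw)  # copy so the caller's dict is not modified
    kbs = [KnowledgeBaseConfig(**kb) for kb in raw.pop("knowledge_bases", [])]
    names = [kb.name for kb in kbs]
    if len(names) != len(set(names)):
        raise ValueError("knowledge base names must be unique")
    return EntityLinkingConfig(knowledge_bases=kbs, **raw)
```

Passing the Quick Start configuration through `parse_entity_linking` would leave `auto_search` at `true` and the Wikidata KB's `timeout` at 10 seconds, since neither is set explicitly.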

Supported Knowledge Bases

Wikidata

Free, open knowledge base with 100+ million entities. No API key required.

yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: wikidata
      type: wikidata
      language: en

Features multilingual labels, entity aliases (e.g., "NYC" finds "New York City"), and links to Wikipedia articles.
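
For a sense of what a Wikidata lookup involves, the sketch below queries Wikidata's public `wbsearchentities` API action using only the Python standard library. Whether the tool calls this same endpoint internally is an assumption; the function names and the `User-Agent` string are illustrative, and the returned keys simply mirror the `kb_id` and `kb_label` fields shown under Data Format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(query: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for the given search text."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def parse_search_results(payload: dict) -> list:
    """Reduce the API response to the fields an annotator would see."""
    return [
        {
            "kb_id": hit["id"],
            "kb_label": hit.get("label", ""),
            "description": hit.get("description", ""),
        }
        for hit in payload.get("search", [])
    ]


def search_wikidata(query: str, language: str = "en", timeout: int = 10) -> list:
    """Perform the search over the network (not called at import time)."""
    request = Request(
        build_search_url(query, language),
        headers={"User-Agent": "entity-linking-docs-example/0.1"},  # illustrative
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_search_results(json.load(response))
```

Calling `search_wikidata("NYC")` requires network access and returns a list of candidate entities, each with an ID, label, and description.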

UMLS

Comprehensive medical and biomedical terminology. Requires a free API key from UTS (UMLS Terminology Services).

yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: umls
      type: umls
      api_key: ${UMLS_API_KEY}

Includes medical concepts, drugs, diseases, procedures, and cross-references to 200+ source vocabularies (SNOMED CT, ICD-10, MeSH, RxNorm).
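
The `${UMLS_API_KEY}` placeholder in the configuration above suggests the key is read from an environment variable rather than stored in the file. How the tool resolves the placeholder is not specified here; the following is a minimal sketch of that kind of substitution, with `expand_env` as a hypothetical helper name.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace each ${NAME} in `value` with the matching environment variable.

    Raises KeyError if a referenced variable is not set, so a missing key
    fails loudly instead of being sent to the API as a literal placeholder.
    """
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(replace, value)
```

With this approach the key is set in the shell (for example `export UMLS_API_KEY=...`) before the annotation server starts, and the configuration file can be committed without exposing it.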

Custom REST APIs

Connect to any knowledge base with a REST API:

yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: internal_kb
      type: rest
      base_url: https://api.example.com
      api_key: optional_api_key
      extra_params:
        search_endpoint: /search
        entity_endpoint: /entity/{entity_id}
        search_query_param: q
        results_path: data.results
        entity_id_field: id
        label_field: name
        description_field: description
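
The `extra_params` keys describe how to read a custom API's responses. The sketch below shows one plausible interpretation: `results_path` is treated as a dotted path into the JSON response, and the `*_field` keys name the properties to copy from each result. The helper names (`dig`, `normalize_results`, `entity_url`) and the sample response shape are assumptions for illustration, not the tool's implementation.

```python
from functools import reduce


def dig(data: dict, dotted_path: str):
    """Follow a dotted path such as 'data.results' through nested dicts."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), data)


def normalize_results(response: dict, extra_params: dict) -> list:
    """Map a custom API's search response onto a common result shape."""
    hits = dig(response, extra_params["results_path"])
    return [
        {
            "kb_id": str(hit[extra_params["entity_id_field"]]),
            "kb_label": hit[extra_params["label_field"]],
            # Descriptions are treated as optional in this sketch.
            "description": hit.get(extra_params["description_field"], ""),
        }
        for hit in hits
    ]


def entity_url(base_url: str, extra_params: dict, entity_id) -> str:
    """Fill the {entity_id} placeholder in entity_endpoint."""
    path = extra_params["entity_endpoint"].format(entity_id=entity_id)
    return base_url.rstrip("/") + path
```

Under these assumptions, a response of the form `{"data": {"results": [{"id": ..., "name": ..., "description": ...}]}}` matches the configuration above, and `entity_url("https://api.example.com", extra_params, 42)` yields `https://api.example.com/entity/42`.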

Multiple Knowledge Bases

Configure multiple KBs to let annotators choose the most appropriate source:

yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: wikidata
      type: wikidata
      language: en
    - name: umls
      type: umls
      api_key: ${UMLS_API_KEY}
    - name: company_entities
      type: rest
      base_url: https://internal.company.com/api/entities

A dropdown in the search modal lets annotators switch between configured knowledge bases.

Multi-Select Mode

Enable multi-select to allow linking a span to multiple entities, useful for ambiguous mentions:

yaml
entity_linking:
  enabled: true
  multi_select: true
  knowledge_bases:
    - name: wikidata
      type: wikidata
      language: en

Data Format

Entity-linked spans include additional fields in the output:

json
{
  "id": "instance_001",
  "text": "Albert Einstein was born in Ulm, Germany in 1879.",
  "annotations": {
    "ner": {
      "spans": [
        {
          "text": "Albert Einstein",
          "start": 0,
          "end": 15,
          "label": "PERSON",
          "kb_id": "Q937",
          "kb_source": "wikidata",
          "kb_label": "Albert Einstein"
        },
        {
          "text": "Ulm",
          "start": 28,
          "end": 31,
          "label": "LOCATION",
          "kb_id": "Q3012",
          "kb_source": "wikidata",
          "kb_label": "Ulm"
        }
      ]
    }
  }
}
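
A short Python sketch of consuming this output: it checks that each span's offsets match its text and yields the spans that carry a KB link. The function name `linked_spans` is hypothetical, and treating a missing or null `kb_id` as "not linked" is an assumption, since the format of unlinked spans is not shown here.

```python
import json


def linked_spans(record: dict, schema: str = "ner"):
    """Yield (text, label, kb_source, kb_id) for each span that has a KB link."""
    text = record["text"]
    for span in record["annotations"][schema]["spans"]:
        surface = text[span["start"]:span["end"]]
        if surface != span["text"]:
            raise ValueError(f"offsets do not match text for span {span['text']!r}")
        # Assumption: unlinked spans either omit kb_id or set it to null.
        if span.get("kb_id"):
            yield surface, span["label"], span["kb_source"], span["kb_id"]


record = json.loads("""
{
  "id": "instance_001",
  "text": "Albert Einstein was born in Ulm, Germany in 1879.",
  "annotations": {"ner": {"spans": [
    {"text": "Albert Einstein", "start": 0, "end": 15, "label": "PERSON",
     "kb_id": "Q937", "kb_source": "wikidata", "kb_label": "Albert Einstein"},
    {"text": "Ulm", "start": 28, "end": 31, "label": "LOCATION",
     "kb_id": "Q3012", "kb_source": "wikidata", "kb_label": "Ulm"}
  ]}}
}
""")

for row in linked_spans(record):
    print(row)
# ('Albert Einstein', 'PERSON', 'wikidata', 'Q937')
# ('Ulm', 'LOCATION', 'wikidata', 'Q3012')
```

The `start` and `end` values are character offsets into `text`, with `end` exclusive, which is why a plain slice recovers the span text.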

Best Practices

  1. Enable auto-search for efficiency: it pre-populates the search with the span text.
  2. Don't require linking unless it is essential, so annotators are not blocked when no matching entity is found.
  3. Set timeouts that allow for slow networks; the default is 10 seconds.
  4. Match the KB to the entity type: use Wikidata for general entities, UMLS for biomedical terms, and custom APIs for domain-specific entities.
  5. Use multi-select for ambiguous mentions such as abbreviations, common names, and polysemous terms.

Further Reading

For implementation details, see the source documentation.