# Entity Linking

Source: https://www.potatoannotator.com/docs/annotation-types/entity-linking

Entity linking lets annotators connect span annotations to entries in external knowledge bases (KBs) such as Wikidata or UMLS. Each link ties a text mention to a canonical entity, which is useful for named entity recognition, concept normalization, and knowledge graph construction.

## How It Works

When entity linking is enabled for a span annotation schema:

1. Annotators highlight text and assign a label (e.g., "PERSON", "ORGANIZATION")
2. A link icon appears on the span's control bar
3. Clicking the icon opens a search modal to find matching KB entities
4. The selected entity ID is stored with the span annotation
5. Linked spans display a filled icon and show entity details on hover

## Quick Start

Enable entity linking by adding the `entity_linking` configuration to a span schema:

```yaml
annotation_schemes:
  - annotation_type: span
    name: ner
    description: Named Entity Recognition with KB linking
    labels:
      - name: PERSON
        tooltip: "People's names"
      - name: ORGANIZATION
        tooltip: "Companies, agencies, institutions"
      - name: LOCATION
        tooltip: "Places, cities, countries"
    entity_linking:
      enabled: true
      knowledge_bases:
        - name: wikidata
          type: wikidata
          language: en
```

## Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable entity linking for this schema |
| `knowledge_bases` | list | `[]` | List of KB configurations |
| `auto_search` | boolean | `true` | Automatically search when the modal opens |
| `required` | boolean | `false` | Require entity link before saving span |
| `multi_select` | boolean | `false` | Allow linking to multiple entities |

### Knowledge Base Configuration

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `name` | string | required | Unique identifier for this KB |
| `type` | string | required | KB type: `wikidata`, `umls`, or `rest` |
| `api_key` | string | `null` | API key for authenticated services |
| `base_url` | string | `null` | Base URL for REST APIs |
| `language` | string | `"en"` | Language code for search results |
| `timeout` | integer | `10` | Request timeout in seconds |
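The defaults in the two tables above can be summarized in code. The following is a minimal sketch, not Potato's own loader: `normalize_entity_linking` is a hypothetical helper that takes the `entity_linking` mapping as a Python dict (for example after parsing the YAML), fills in the documented defaults, and rejects KB entries that lack `name` or `type`. The duplicate-name check is an assumption based on `name` being described as a unique identifier.

```python
from copy import deepcopy

# Defaults copied from the option tables above.
SCHEMA_DEFAULTS = {
    "enabled": False,
    "knowledge_bases": [],
    "auto_search": True,
    "required": False,
    "multi_select": False,
}
KB_DEFAULTS = {"api_key": None, "base_url": None, "language": "en", "timeout": 10}
KB_TYPES = {"wikidata", "umls", "rest"}


def normalize_entity_linking(config: dict) -> dict:
    """Return a copy of an entity_linking mapping with defaults filled in.

    Hypothetical helper for illustration only; not part of Potato.
    """
    result = {**SCHEMA_DEFAULTS, **deepcopy(config)}
    seen_names = set()
    knowledge_bases = []
    for kb in result["knowledge_bases"]:
        # `name` and `type` are the only required KB options.
        for key in ("name", "type"):
            if key not in kb:
                raise ValueError(f"knowledge base is missing required option {key!r}")
        if kb["type"] not in KB_TYPES:
            raise ValueError(f"unknown KB type {kb['type']!r}")
        # Assumption: names must be unique within one schema.
        if kb["name"] in seen_names:
            raise ValueError(f"duplicate KB name {kb['name']!r}")
        seen_names.add(kb["name"])
        knowledge_bases.append({**KB_DEFAULTS, **kb})
    result["knowledge_bases"] = knowledge_bases
    return result
```

Passing the Quick Start configuration through this helper would yield a Wikidata entry with `timeout` set to `10` and `api_key` set to `None`, alongside `auto_search: true` at the schema level.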

## Supported Knowledge Bases

### Wikidata

Free, open knowledge base with 100+ million entities. No API key required.

```yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: wikidata
      type: wikidata
      language: en
```

Features multilingual labels, entity aliases (e.g., "NYC" finds "New York City"), and links to Wikipedia articles.
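To get a feel for what a Wikidata lookup returns, the sketch below queries Wikidata's public `wbsearchentities` API directly. Whether Potato calls this exact endpoint is an assumption; the function names and the `User-Agent` string are illustrative, and the output keys simply mirror the `kb_id` and `kb_label` fields used in the Data Format section.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(query: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for a span's text."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def parse_candidates(payload: dict) -> list:
    """Reduce a wbsearchentities response to id, label, and description."""
    return [
        {
            "kb_id": hit["id"],
            "kb_label": hit.get("label", ""),
            "description": hit.get("description", ""),
        }
        for hit in payload.get("search", [])
    ]


def search_wikidata(query: str, language: str = "en", timeout: int = 10) -> list:
    """Fetch candidates over the network (requires internet access)."""
    request = Request(
        build_search_url(query, language),
        headers={"User-Agent": "entity-linking-example/0.1"},  # illustrative value
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_candidates(json.load(response))
```

Calling `search_wikidata("NYC")` performs a live request, so results depend on the current state of Wikidata.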

### UMLS

Comprehensive medical and biomedical terminology. Requires a free API key from [UTS](https://uts.nlm.nih.gov/uts/).

```yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: umls
      type: umls
      api_key: ${UMLS_API_KEY}
```

Includes medical concepts, drugs, diseases, procedures, and cross-references to 200+ source vocabularies (SNOMED CT, ICD-10, MeSH, RxNorm).

### Custom REST APIs

Connect to any knowledge base that exposes a REST API. Endpoint paths and the names of the response fields are set under `extra_params`:

```yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: internal_kb
      type: rest
      base_url: https://api.example.com
      api_key: optional_api_key
      extra_params:
        search_endpoint: /search
        entity_endpoint: /entity/{entity_id}
        search_query_param: q
        results_path: data.results
        entity_id_field: id
        label_field: name
        description_field: description
```
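One plausible reading of these `extra_params` is sketched below: `results_path` is treated as a dotted path into the JSON response, and the three `*_field` options name the keys to read from each result. This interpretation, the helper names, and the sample response shape are assumptions made for illustration; check the source documentation for the exact semantics.

```python
# Values copied from the YAML example above.
EXTRA_PARAMS = {
    "search_endpoint": "/search",
    "entity_endpoint": "/entity/{entity_id}",
    "search_query_param": "q",
    "results_path": "data.results",
    "entity_id_field": "id",
    "label_field": "name",
    "description_field": "description",
}


def dig(payload: dict, dotted_path: str):
    """Follow a dotted path such as 'data.results' through nested dicts."""
    node = payload
    for key in dotted_path.split("."):
        node = node[key]
    return node


def extract_entities(payload: dict, params: dict = EXTRA_PARAMS) -> list:
    """Map a search response onto id, label, and description (assumed semantics)."""
    return [
        {
            "kb_id": item[params["entity_id_field"]],
            "kb_label": item[params["label_field"]],
            "description": item.get(params["description_field"], ""),
        }
        for item in dig(payload, params["results_path"])
    ]


def entity_url(base_url: str, entity_id: str, params: dict = EXTRA_PARAMS) -> str:
    """Fill the {entity_id} placeholder of entity_endpoint."""
    return base_url.rstrip("/") + params["entity_endpoint"].format(entity_id=entity_id)
```

Under these assumptions, a response of the form `{"data": {"results": [{"id": ..., "name": ..., "description": ...}]}}` is the shape the example configuration expects from the `/search` endpoint.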

## Multiple Knowledge Bases

Configure multiple KBs to let annotators choose the most appropriate source:

```yaml
entity_linking:
  enabled: true
  knowledge_bases:
    - name: wikidata
      type: wikidata
      language: en
    - name: umls
      type: umls
      api_key: ${UMLS_API_KEY}
    - name: company_entities
      type: rest
      base_url: https://internal.company.com/api/entities
```

A dropdown in the search modal lets annotators switch between configured knowledge bases.

## Multi-Select Mode

Set `multi_select: true` to let annotators link a span to more than one entity, which is useful for ambiguous mentions:

```yaml
entity_linking:
  enabled: true
  multi_select: true
  knowledge_bases:
    - name: wikidata
      type: wikidata
      language: en
```

## Data Format

Entity-linked spans include additional fields in the output (`kb_id`, `kb_source`, and `kb_label`):

```json
{
  "id": "instance_001",
  "text": "Albert Einstein was born in Ulm, Germany in 1879.",
  "annotations": {
    "ner": {
      "spans": [
        {
          "text": "Albert Einstein",
          "start": 0,
          "end": 15,
          "label": "PERSON",
          "kb_id": "Q937",
          "kb_source": "wikidata",
          "kb_label": "Albert Einstein"
        },
        {
          "text": "Ulm",
          "start": 28,
          "end": 31,
          "label": "LOCATION",
          "kb_id": "Q3012",
          "kb_source": "wikidata",
          "kb_label": "Ulm"
        }
      ]
    }
  }
}
```
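Downstream scripts can read these fields directly. The sketch below assumes each instance has already been loaded as a Python dict with the structure shown above; `linked_spans` is a hypothetical helper. It also checks that `start` and `end` reproduce the span text, treating `end` as exclusive, which matches the offsets in the example.

```python
EXAMPLE = {
    "id": "instance_001",
    "text": "Albert Einstein was born in Ulm, Germany in 1879.",
    "annotations": {
        "ner": {
            "spans": [
                {"text": "Albert Einstein", "start": 0, "end": 15, "label": "PERSON",
                 "kb_id": "Q937", "kb_source": "wikidata", "kb_label": "Albert Einstein"},
                {"text": "Ulm", "start": 28, "end": 31, "label": "LOCATION",
                 "kb_id": "Q3012", "kb_source": "wikidata", "kb_label": "Ulm"},
            ]
        }
    },
}


def linked_spans(instance: dict, schema: str = "ner") -> list:
    """Return (text, label, kb_source, kb_id) tuples for one schema's spans.

    Raises ValueError if a span's offsets do not match its text.
    Unlinked spans are kept, with None for the KB fields.
    """
    rows = []
    for span in instance["annotations"][schema]["spans"]:
        covered = instance["text"][span["start"]:span["end"]]
        if covered != span["text"]:
            raise ValueError(
                f"offsets {span['start']}-{span['end']} cover {covered!r}, "
                f"expected {span['text']!r}"
            )
        rows.append((span["text"], span["label"], span.get("kb_source"), span.get("kb_id")))
    return rows


for row in linked_spans(EXAMPLE):
    print(row)
```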

## Best Practices

1. **Enable auto-search** for efficiency: the search is pre-populated with the span text.
2. **Don't require linking** unless it is essential, so annotators are not blocked when no matching entity is found.
3. **Set appropriate timeouts**: raise `timeout` above the 10-second default on slow networks.
4. **Match the KB to the entity type**: use Wikidata for general entities, UMLS for biomedical terms, and custom APIs for domain-specific entities.
5. **Use multi-select for ambiguous mentions** such as abbreviations, common names, and polysemous terms.

## Further Reading

- [Span Annotation](/docs/annotation-types/span-annotation) - Basic span annotation setup
- [Coreference Chains](/docs/annotation-types/coreference) - Grouping entity mentions
- [Event Annotation](/docs/annotation-types/event-annotation) - N-ary event structures

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/main/docs/entity_linking.md).
