Named Entity Recognition
What named entity recognition (NER) is, common label sets, and how to build an NER annotation task in Potato with colored span labels and tooltips.
Named entity recognition (NER) is the task of finding and classifying named things in text: people, organizations, locations, dates, and more. It is a span annotation task with an entity-typed label set. NER is a building block for search, knowledge graphs, redaction, and information extraction.
See Named-entity recognition for background.
Choosing a label set
Start from a standard scheme and trim it to your domain:
- CoNLL-2003: PER, ORG, LOC, MISC. A good minimal default.
- OntoNotes: 18 types including dates, money, and percentages, for richer needs.
- Domain-specific: biomedical (genes, diseases), legal (statutes, parties), or finance.
Fewer, well-defined types give higher agreement. Add types only when a real downstream use needs them.
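One way to trim a larger scheme is to map its types onto your own label set and drop the rest. The sketch below is illustrative only: the mapping table and the helper names `trim_label` and `trim_spans` are assumptions made for this example, not part of Potato, and which OntoNotes types you merge or drop is a project decision.

```python
# Hypothetical mapping from OntoNotes types to the five labels used on this page.
# Types missing from the table (e.g. CARDINAL, MONEY) are dropped.
ONTONOTES_TO_PROJECT = {
    "PERSON": "PERSON",
    "ORG": "ORGANIZATION",
    "GPE": "LOCATION",
    "LOC": "LOCATION",
    "FAC": "LOCATION",
    "DATE": "DATE",
    "TIME": "DATE",
    "NORP": "MISC",
    "EVENT": "MISC",
    "PRODUCT": "MISC",
    "WORK_OF_ART": "MISC",
}


def trim_label(source_label):
    """Return the project label, or None if the source type is dropped."""
    return ONTONOTES_TO_PROJECT.get(source_label)


def trim_spans(spans):
    """Relabel (start, end, label) spans and drop types the project does not use."""
    trimmed = []
    for start, end, label in spans:
        new_label = trim_label(label)
        if new_label is not None:
            trimmed.append((start, end, new_label))
    return trimmed
```

Keeping the mapping in one table makes it easy to review with annotators and to change when a downstream use needs a new type.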
Building the task in Potato
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight each named entity and select its type."
    labels: [PERSON, ORGANIZATION, LOCATION, DATE, MISC]
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MISC: "#6b7280"
    tooltips:
      PERSON: "Names of people, e.g. 'Ada Lovelace'."
      ORGANIZATION: "Companies, agencies, teams, e.g. 'United Nations'."
      LOCATION: "Cities, countries, landmarks, e.g. 'Paris'."
      DATE: "Dates and time expressions, e.g. 'next Monday'."
      MISC: "Named entities that fit none of the above."
    allow_overlapping: false
    sequential_key_binding: true

The named entity recognition showcase runs this configuration with sample data.
Boundary rules that prevent disagreement
Most NER disagreement is about where an entity starts and ends, not what it is. Decide and document:
- Do titles count? ("Dr. Jane Smith" vs. "Jane Smith".)
- Do you include "the" in "the United Nations"?
- How do you tag nested entities like "Bank of England", where "England" is a location inside an organization name? If you need them, set allow_overlapping: true.
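To check whether boundaries really are the main source of disagreement in your data, you can compare two annotators' spans and count each kind of mismatch. This is a minimal sketch, assuming spans are available as `(start, end, label)` character offsets with an exclusive end. The function name `compare_spans` is hypothetical, and it only walks annotator A's spans, so spans that only B marked are not counted.

```python
def compare_spans(spans_a, spans_b):
    """Classify annotator A's (start, end, label) spans against annotator B's.

    exact:     same boundaries and same label
    type:      same boundaries, different label
    boundary:  overlaps a span from B, but the boundaries differ
    unmatched: overlaps nothing from B
    """
    counts = {"exact": 0, "type": 0, "boundary": 0, "unmatched": 0}
    labels_b = {(s, e): label for s, e, label in spans_b}
    for start, end, label in spans_a:
        if (start, end) in labels_b:
            key = "exact" if labels_b[(start, end)] == label else "type"
        elif any(start < e and s < end for s, e, _ in spans_b):
            key = "boundary"
        else:
            key = "unmatched"
        counts[key] += 1
    return counts


# "Dr. Jane Smith joined the United Nations."
annotator_a = [(0, 14, "PERSON"), (26, 40, "ORGANIZATION")]  # includes the title
annotator_b = [(4, 14, "PERSON"), (26, 40, "ORGANIZATION")]  # excludes the title
result = compare_spans(annotator_a, annotator_b)
```

Here `result` has one exact match and one boundary disagreement. A high boundary count relative to the type count suggests the guidelines need clearer rules about titles, articles, and nesting.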
From labels to a model
Export to CoNLL or spaCy format, which represent entities with BIO/IOB tags. See Exporting Annotations for ML.
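For readers unfamiliar with BIO tagging, the sketch below shows the idea: each token gets `B-TYPE` if it begins an entity, `I-TYPE` if it continues one, and `O` otherwise. It is not Potato's export code. The function name `spans_to_bio`, the `(start, end, label)` input shape, and whitespace tokenization are all assumptions made for illustration, and it expects non-overlapping spans.

```python
import re


def spans_to_bio(text, spans):
    """Convert character-offset (start, end, label) spans to token-level BIO tags.

    Tokens are whitespace-delimited, which is a simplification; a real export
    should use the same tokenizer as the downstream model.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in sorted(spans):
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start < end and start < tok_end:  # token overlaps the span
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


tagged = spans_to_bio(
    "Ada Lovelace visited Paris",
    [(0, 12, "PERSON"), (21, 26, "LOCATION")],
)
# tagged == [("Ada", "B-PERSON"), ("Lovelace", "I-PERSON"),
#            ("visited", "O"), ("Paris", "B-LOCATION")]
```

Because a token is tagged whenever it overlaps a span, punctuation attached to a word (such as "Paris.") is pulled into the entity, which is one reason to match the tokenizer to your model.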
Further reading
- Span Annotation
- Entity Linking, connecting entities to a knowledge base
- Relation and Event Extraction