Named Entity Recognition

What named entity recognition (NER) is, common label sets, and how to build an NER annotation task in Potato with colored span labels and tooltips.

Named entity recognition (NER) is the task of finding and classifying named things in text: people, organizations, locations, dates, and more. It is a span annotation task with an entity-typed label set. NER is a building block for search, knowledge graphs, redaction, and information extraction.

See Named-entity recognition for background.

Choosing a label set

Start from a standard scheme and trim it to your domain:

  • CoNLL-2003: PER, ORG, LOC, MISC. A good minimal default.
  • OntoNotes: 18 types including dates, money, and percentages, for richer needs.
  • Domain-specific: biomedical (genes, diseases), legal (statutes, parties), or finance.

Fewer, well-defined types tend to produce higher inter-annotator agreement. Add a type only when a concrete downstream use needs it.
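One way to trim a richer scheme is to collapse its types into a coarser set and drop the ones you do not need. The sketch below assumes spans are stored as (start, end, label) tuples. The mapping itself is a hypothetical choice for illustration, not a standard conversion between OntoNotes and CoNLL-2003.

```python
# Hypothetical mapping from OntoNotes-style types to a CoNLL-style set.
# Which types collapse into which is an illustrative choice; adapt it to
# your own guidelines. Types missing from the mapping are dropped.
ONTONOTES_TO_COARSE = {
    "PERSON": "PER",
    "ORG": "ORG",
    "GPE": "LOC",
    "LOC": "LOC",
    "FAC": "LOC",
    "NORP": "MISC",
    "PRODUCT": "MISC",
    "EVENT": "MISC",
    "WORK_OF_ART": "MISC",
    "LAW": "MISC",
    "LANGUAGE": "MISC",
}


def collapse_spans(spans):
    """Map each (start, end, label) span to the coarse set, dropping unmapped types."""
    collapsed = []
    for start, end, label in spans:
        coarse = ONTONOTES_TO_COARSE.get(label)
        if coarse is not None:
            collapsed.append((start, end, coarse))
    return collapsed


example = [(0, 12, "PERSON"), (23, 29, "GPE"), (33, 37, "DATE")]
print(collapse_spans(example))
```

Here the DATE span disappears because the coarse scheme has no slot for it, which is the trade-off to weigh before trimming.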

Building the task in Potato

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight each named entity and select its type."
    labels: [PERSON, ORGANIZATION, LOCATION, DATE, MISC]
    label_colors:
      PERSON: "#3b82f6"
      ORGANIZATION: "#10b981"
      LOCATION: "#f59e0b"
      DATE: "#8b5cf6"
      MISC: "#6b7280"
    tooltips:
      PERSON: "Names of people, e.g. 'Ada Lovelace'."
      ORGANIZATION: "Companies, agencies, teams, e.g. 'United Nations'."
      LOCATION: "Cities, countries, landmarks, e.g. 'Paris'."
      DATE: "Dates and time expressions, e.g. 'next Monday'."
      MISC: "Named entities that fit none of the above."
    allow_overlapping: false
    sequential_key_binding: true

The named entity recognition showcase runs this configuration with sample data.
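When a label set changes, the color and tooltip maps are easy to leave out of sync. The sketch below is a local sanity check, not something Potato is known to enforce: it mirrors the relevant fields of the configuration as a Python dict (so no YAML parser is needed) and reports labels that lack a color or tooltip. The function name `check_scheme` and the six-digit hex check are assumptions based on the style used in the example config.

```python
import re

# Mirror of the relevant fields from the YAML configuration above.
scheme = {
    "labels": ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "MISC"],
    "label_colors": {
        "PERSON": "#3b82f6",
        "ORGANIZATION": "#10b981",
        "LOCATION": "#f59e0b",
        "DATE": "#8b5cf6",
        "MISC": "#6b7280",
    },
    "tooltips": {
        "PERSON": "Names of people, e.g. 'Ada Lovelace'.",
        "ORGANIZATION": "Companies, agencies, teams, e.g. 'United Nations'.",
        "LOCATION": "Cities, countries, landmarks, e.g. 'Paris'.",
        "DATE": "Dates and time expressions, e.g. 'next Monday'.",
        "MISC": "Named entities that fit none of the above.",
    },
}

# Assumption: colors are written as six-digit hex, as in the example config.
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def check_scheme(scheme):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    labels = scheme["labels"]
    if len(set(labels)) != len(labels):
        problems.append("labels: duplicates found")
    for field in ("label_colors", "tooltips"):
        keys = set(scheme.get(field, {}))
        missing = [label for label in labels if label not in keys]
        extra = sorted(keys - set(labels))
        if missing:
            problems.append(f"{field}: missing {missing}")
        if extra:
            problems.append(f"{field}: unknown labels {extra}")
    for label, color in scheme.get("label_colors", {}).items():
        if not HEX_COLOR.match(color):
            problems.append(f"label_colors: {label} has invalid color {color!r}")
    return problems


print(check_scheme(scheme) or "scheme is consistent")
```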

Boundary rules that prevent disagreement

Most NER disagreement is about where an entity starts and ends, not what it is. Decide and document:

  • Do titles count? ("Dr. Jane Smith" vs. "Jane Smith".)
  • Do you include "the" in "the United Nations"?
  • How do you tag nested entities like "Bank of England", where a location ("England") sits inside an organization name? If you need to annotate both levels, set allow_overlapping: true.
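Once the rules are written down, they can also be applied mechanically when comparing annotators. The sketch below assumes one possible convention, that leading articles and titles are excluded from the span, and that spans are character offsets into the text. The prefix list and the function name `trim_span` are hypothetical. If your guidelines keep titles, remove them from the list.

```python
# Hypothetical convention: these leading words are not part of the entity.
# Each entry ends with a space so "the " cannot match inside "Theresa".
DROPPED_PREFIXES = ("the ", "dr. ", "mr. ", "ms. ", "prof. ")


def trim_span(text, start, end):
    """Shrink a character span so it excludes surrounding whitespace and
    any leading article or title. Returns the new (start, end)."""
    changed = True
    while changed:
        changed = False
        # Drop whitespace at both edges of the span.
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        # Drop one leading prefix, then loop again in case another follows.
        for prefix in DROPPED_PREFIXES:
            stop = start + len(prefix)
            if stop < end and text[start:stop].lower() == prefix:
                start = stop
                changed = True
                break
    return start, end


text = "Yesterday the United Nations met Dr. Jane Smith."
for phrase in ("the United Nations", "Dr. Jane Smith"):
    s = text.index(phrase)
    new_start, new_end = trim_span(text, s, s + len(phrase))
    print(repr(phrase), "->", repr(text[new_start:new_end]))
```

Normalizing both annotators' spans this way before comparing them separates genuine disagreement from differences the guidelines already settle.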

From labels to a model

Export to CoNLL or spaCy format, which represent entities with BIO/IOB tags. See Exporting Annotations for ML.
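To make the BIO scheme concrete, here is a minimal sketch that converts character-offset spans into per-token tags. It is not Potato's exporter: it assumes non-overlapping (start, end, label) spans and plain whitespace tokenization, and the function name `spans_to_bio` is hypothetical.

```python
import re


def spans_to_bio(text, spans):
    """Tag whitespace-separated tokens with B-/I-/O labels.

    spans: non-overlapping (start, end, label) character offsets.
    Returns a list of (token, tag) pairs.
    """
    tagged = []
    previous = None  # index of the span covering the previous token, if any
    for match in re.finditer(r"\S+", text):
        current = None
        for index, (start, end, _label) in enumerate(spans):
            # A token belongs to a span if their character ranges overlap.
            if match.start() < end and start < match.end():
                current = index
                break
        if current is None:
            tag = "O"
        else:
            # First token of a span gets B-, continuation tokens get I-.
            prefix = "I" if current == previous else "B"
            tag = f"{prefix}-{spans[current][2]}"
        tagged.append((match.group(), tag))
        previous = current
    return tagged


text = "Ada Lovelace visited Paris ."
spans = [(0, 12, "PERSON"), (21, 26, "LOCATION")]
for token, tag in spans_to_bio(text, spans):
    print(f"{token}\t{tag}")
```

With whitespace tokenization, punctuation attached to a word (as in "Paris.") is tagged together with it, so a real pipeline would normally use the tokenizer of the target model instead.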

Further reading