Named entity recognition (NER) is one of the most common NLP tasks. In this tutorial, you will build a complete NER annotation interface with span highlighting, keyboard shortcuts, and entity type selection.

What we're building

By the end of this tutorial, you will have an annotation interface where annotators can:

Highlight text spans by clicking and dragging
Assign entity types (Person, Organization, Location, etc.)
Use keyboard shortcuts for faster annotation
Edit or delete existing annotations

Prerequisites

Potato installed (pip install potato-annotation)
Basic familiarity with YAML
Text data to annotate

Step 1: Configure the annotation schema

Create a config.yaml file:

yaml

annotation_task_name: "Named Entity Recognition"
 
data_files:
  - data/sentences.json
 
item_properties:
  id_key: id
  text_key: text
 
# Enable span annotation
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight and label named entities in the text"
    labels:
      - name: PER
        description: "Person names"
        color: "#FF6B6B"
        keyboard_shortcut: "p"
      - name: ORG
        description: "Organizations"
        color: "#4ECDC4"
        keyboard_shortcut: "o"
      - name: LOC
        description: "Locations"
        color: "#45B7D1"
        keyboard_shortcut: "l"
      - name: DATE
        description: "Dates and times"
        color: "#96CEB4"
        keyboard_shortcut: "d"
      - name: MISC
        description: "Miscellaneous entities"
        color: "#FFEAA7"
        keyboard_shortcut: "m"
    min_spans: 0  # Allow sentences with no entities
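
Because every label carries its own shortcut and color, a typo such as a reused key is easy to miss. The following is a minimal sketch of a sanity check, assuming the labels are copied into plain Python data (the `LABELS` list and the `check_labels` helper are illustrative names, not part of Potato):

```python
import re
from collections import Counter

# Labels mirrored from config.yaml as plain Python data. This is an assumption
# made to keep the sketch dependency-free; a YAML parser could load them instead.
LABELS = [
    {"name": "PER", "color": "#FF6B6B", "keyboard_shortcut": "p"},
    {"name": "ORG", "color": "#4ECDC4", "keyboard_shortcut": "o"},
    {"name": "LOC", "color": "#45B7D1", "keyboard_shortcut": "l"},
    {"name": "DATE", "color": "#96CEB4", "keyboard_shortcut": "d"},
    {"name": "MISC", "color": "#FFEAA7", "keyboard_shortcut": "m"},
]

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def check_labels(labels):
    """Return a list of human-readable problems found in the label list."""
    problems = []
    # Names and shortcuts must each be unique across labels.
    for field in ("name", "keyboard_shortcut"):
        counts = Counter(label[field] for label in labels)
        for value, n in counts.items():
            if n > 1:
                problems.append(f"duplicate {field}: {value!r} used {n} times")
    # Colors are expected to be six-digit hex codes.
    for label in labels:
        if not HEX_COLOR.fullmatch(label["color"]):
            problems.append(f"invalid color for {label['name']}: {label['color']!r}")
    return problems


print(check_labels(LABELS))
```

An empty list means no duplicates or malformed colors were found; running the check again after adding a label is a cheap way to catch shortcut collisions before annotators do.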

Step 2: Prepare your data

Create data/sentences.json with your text data, one JSON object per line:

json

{"id": "1", "text": "Apple Inc. announced that CEO Tim Cook will visit Paris next Tuesday."}
{"id": "2", "text": "The United Nations headquarters in New York hosted delegates from Japan."}
{"id": "3", "text": "Dr. Sarah Johnson published her research at Stanford University in March 2024."}
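
If your sentences start out as a plain list of strings, a short script can produce a file in this shape. This is a sketch under the assumption that sequential string ids are acceptable; `to_jsonl_lines` and `write_sentences` are hypothetical helper names:

```python
import json
import os


def to_jsonl_lines(sentences):
    """Turn raw sentences into JSON strings with sequential string ids."""
    lines = []
    for i, text in enumerate(sentences, start=1):
        record = {"id": str(i), "text": text.strip()}
        # ensure_ascii=False keeps accented characters readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


def write_sentences(sentences, path):
    """Write one JSON object per line to `path`, creating the folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(sentences):
            f.write(line + "\n")
```

Calling `write_sentences(my_sentences, "data/sentences.json")` would then create the file referenced under `data_files` in the config.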

Step 3: Add annotation guidelines

Help your annotators with clear guidelines:

yaml

# Add to config.yaml
annotation_guidelines:
  title: "NER Annotation Guidelines"
  content: |
    ## Entity Types
 
    **PER (Person)**: Names of people, including fictional characters
    - Examples: "John Smith", "Dr. Johnson", "Batman"
 
    **ORG (Organization)**: Companies, institutions, agencies
    - Examples: "Apple Inc.", "United Nations", "Stanford University"
 
    **LOC (Location)**: Places, including countries, cities, landmarks
    - Examples: "Paris", "New York", "Mount Everest"
 
    **DATE**: Dates, times, and temporal expressions
    - Examples: "Tuesday", "March 2024", "next week"
 
    **MISC**: Other named entities not fitting above categories
    - Examples: "Nobel Prize", "iPhone", "COVID-19"
 
    ## Annotation Rules
    1. Include titles (Dr., Mr.) with person names
    2. For nested entities, annotate the largest meaningful span
    3. Don't include articles (the, a) in entity spans

Step 4: Start annotating

Launch your NER task:

bash

potato start config.yaml

Annotation workflow

Select text: Click and drag to highlight a span
Choose the entity type: Click a label button or use a keyboard shortcut
Edit annotations: Click an existing span to change or delete it
Submit: Press Enter or click Submit when you are done

Step 5: Review the output

Annotations are saved in JSONL format (one record per line; the record below is pretty-printed for readability):

json

{
  "id": "1",
  "text": "Apple Inc. announced that CEO Tim Cook will visit Paris next Tuesday.",
  "annotations": {
    "entities": [
      {"start": 0, "end": 10, "label": "ORG", "text": "Apple Inc."},
      {"start": 30, "end": 38, "label": "PER", "text": "Tim Cook"},
      {"start": 50, "end": 55, "label": "LOC", "text": "Paris"},
      {"start": 56, "end": 68, "label": "DATE", "text": "next Tuesday"}
    ]
  }
}
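
Character offsets are easy to get wrong when records are edited or merged by hand, so it can help to confirm that each span's offsets reproduce its stored text. The sketch below assumes the record layout shown above, with `start` inclusive and `end` exclusive; `find_misaligned_spans` is a hypothetical helper, not a Potato function:

```python
def find_misaligned_spans(record, scheme="entities"):
    """Return the spans whose offsets do not reproduce their stored text."""
    text = record["text"]
    bad = []
    for span in record["annotations"][scheme]:
        start, end = span["start"], span["end"]
        in_range = 0 <= start < end <= len(text)
        if not in_range or text[start:end] != span["text"]:
            bad.append(span)
    return bad


record = {
    "id": "2",
    "text": "The United Nations headquarters in New York hosted delegates from Japan.",
    "annotations": {
        "entities": [
            {"start": 4, "end": 18, "label": "ORG", "text": "United Nations"},
            {"start": 35, "end": 43, "label": "LOC", "text": "New York"},
            {"start": 66, "end": 71, "label": "LOC", "text": "Japan"},
        ]
    },
}
print(find_misaligned_spans(record))
```

Running this over every line of the output file before training gives an early warning if offsets and text have drifted apart.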

Tips for better NER annotation

Consistent guidelines: Clear rules reduce disagreements
Training examples: Show annotators edge cases before they start
Regular calibration: Discuss difficult cases as a team
Measure agreement: Use inter-annotator agreement to identify problems
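
One simple way to put a number on agreement for span tasks is exact-match F1 between two annotators on the same sentence. The sketch below is one possible measure among several (token-level kappa is another) and assumes each annotator's spans have been reduced to `(start, end, label)` tuples; `pairwise_f1` is an illustrative name:

```python
def pairwise_f1(spans_a, spans_b):
    """Exact-match F1 between two sets of (start, end, label) tuples."""
    if not spans_a and not spans_b:
        return 1.0  # both annotators agree the sentence has no entities
    overlap = len(spans_a & spans_b)
    # F1 = 2 * matches / (spans from A + spans from B)
    return 2 * overlap / (len(spans_a) + len(spans_b))


annotator_a = {(0, 10, "ORG"), (30, 38, "PER"), (50, 55, "LOC")}
annotator_b = {(0, 10, "ORG"), (30, 38, "PER"), (50, 55, "ORG")}
print(round(pairwise_f1(annotator_a, annotator_b), 3))
```

Here the two annotators agree on two of three spans and disagree on the label of the third, so the score falls below 1. Averaging the score over sentences and sorting by the lowest values points to the items most worth discussing in calibration.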

Next steps

Add a training phase to onboard annotators
Set up multiple annotators for redundancy
Export to Hugging Face format for model training
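
For the export step, token-classification pipelines commonly expect parallel lists of tokens and BIO tags rather than character spans. The following is a rough sketch of that conversion, assuming whitespace tokenization and the span layout from Step 5; `spans_to_bio` is a hypothetical helper, and a real export would likely use the tokenizer of the target model instead:

```python
import re


def spans_to_bio(text, spans):
    """Return (tokens, tags) using whitespace tokens and BIO tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for span in spans:
            # A token belongs to a span when the two character ranges overlap.
            if tok_start < span["end"] and tok_end > span["start"]:
                # The first overlapping token opens the entity, later ones continue it.
                prefix = "B" if tok_start <= span["start"] else "I"
                tag = f"{prefix}-{span['label']}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


text = "Tim Cook will visit Paris."
spans = [
    {"start": 0, "end": 8, "label": "PER", "text": "Tim Cook"},
    {"start": 20, "end": 25, "label": "LOC", "text": "Paris"},
]
tokens, tags = spans_to_bio(text, spans)
print(list(zip(tokens, tags)))
```

Note that whitespace tokens keep trailing punctuation attached (the last token here is "Paris."), which is one reason to switch to a proper tokenizer before training.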

Need help? See our span annotation documentation for more details.