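Named entity recognition (NER) is one of the most common NLP tasks. In this tutorial, you will learn how to build a complete NER annotation interface with span highlighting, keyboard shortcuts, and entity type selection.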

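What You'll Build

By the end of this tutorial, you will have an annotation interface that lets annotators:

Highlight text spans by clicking and dragging
Assign entity types (person, organization, location, and so on)
Annotate quickly with keyboard shortcuts
Edit or delete existing annotations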

Prerequisites

Potato installed (pip install potato-annotation)
Basic knowledge of YAML
Sample text data to annotate

Step 1: Configure the Annotation Scheme

Create a config.yaml file:

yaml

annotation_task_name: "Named Entity Recognition"
 
data_files:
  - data/sentences.json
 
item_properties:
  id_key: id
  text_key: text
 
# Enable span annotation
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight and label named entities in the text"
    labels:
      - name: PER
        description: "Person names"
        color: "#FF6B6B"
        keyboard_shortcut: "p"
      - name: ORG
        description: "Organizations"
        color: "#4ECDC4"
        keyboard_shortcut: "o"
      - name: LOC
        description: "Locations"
        color: "#45B7D1"
        keyboard_shortcut: "l"
      - name: DATE
        description: "Dates and times"
        color: "#96CEB4"
        keyboard_shortcut: "d"
      - name: MISC
        description: "Miscellaneous entities"
        color: "#FFEAA7"
        keyboard_shortcut: "m"
    min_spans: 0  # Allow sentences with no entities
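
Because every label gets its own shortcut and color, a typo here can leave two labels bound to the same key. The following is a minimal sketch of a sanity check, not part of Potato itself: the LABELS list is a hand-copied mirror of the config above, and check_labels is a hypothetical helper name.

```python
import re
from collections import Counter

# Hand-copied mirror of the labels in config.yaml (an assumption of this
# sketch; a real script would load the YAML file instead).
LABELS = [
    {"name": "PER", "color": "#FF6B6B", "keyboard_shortcut": "p"},
    {"name": "ORG", "color": "#4ECDC4", "keyboard_shortcut": "o"},
    {"name": "LOC", "color": "#45B7D1", "keyboard_shortcut": "l"},
    {"name": "DATE", "color": "#96CEB4", "keyboard_shortcut": "d"},
    {"name": "MISC", "color": "#FFEAA7", "keyboard_shortcut": "m"},
]

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def check_labels(labels):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Label names and shortcuts must each be unique.
    for field in ("name", "keyboard_shortcut"):
        counts = Counter(label[field] for label in labels)
        for value, n in counts.items():
            if n > 1:
                problems.append(f"duplicate {field}: {value!r}")
    # Colors are expected as six-digit hex strings such as "#FF6B6B".
    for label in labels:
        if not HEX_COLOR.match(label["color"]):
            problems.append(f"bad color for {label['name']}: {label['color']!r}")
    return problems


print(check_labels(LABELS))  # prints []
```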

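Step 2: Prepare the Data

Create data/sentences.json to hold the text data, with one JSON object per line. Each object carries the id and text keys named in item_properties: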

json

{"id": "1", "text": "Apple Inc. announced that CEO Tim Cook will visit Paris next Tuesday."}
{"id": "2", "text": "The United Nations headquarters in New York hosted delegates from Japan."}
{"id": "3", "text": "Dr. Sarah Johnson published her research at Stanford University in March 2024."}
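
If the sentences come from another source, it may help to generate and check this file with a script. The sketch below assumes the one-object-per-line layout shown above and the id and text keys from item_properties; to_jsonl and load_jsonl are hypothetical helper names, not Potato functions.

```python
import json

SENTENCES = [
    {"id": "1", "text": "Apple Inc. announced that CEO Tim Cook will visit Paris next Tuesday."},
    {"id": "2", "text": "The United Nations headquarters in New York hosted delegates from Japan."},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


def load_jsonl(text, id_key="id", text_key="text"):
    """Parse JSON Lines and check for required keys and unique ids."""
    records, seen = [], set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if id_key not in record or text_key not in record:
            raise ValueError(f"line {line_no}: missing {id_key!r} or {text_key!r}")
        if record[id_key] in seen:
            raise ValueError(f"line {line_no}: duplicate id {record[id_key]!r}")
        seen.add(record[id_key])
        records.append(record)
    return records


payload = to_jsonl(SENTENCES)
print(len(load_jsonl(payload)))  # prints 2
```

Writing payload to data/sentences.json with UTF-8 encoding would then produce a file in the layout shown above.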

Step 3: Add Annotation Guidelines

Clear guidelines help annotators label consistently:

yaml

# Add to config.yaml
annotation_guidelines:
  title: "NER Annotation Guidelines"
  content: |
    ## Entity Types
 
    **PER (Person)**: Names of people, including fictional characters
    - Examples: "John Smith", "Dr. Johnson", "Batman"
 
    **ORG (Organization)**: Companies, institutions, agencies
    - Examples: "Apple Inc.", "United Nations", "Stanford University"
 
    **LOC (Location)**: Places, including countries, cities, landmarks
    - Examples: "Paris", "New York", "Mount Everest"
 
    **DATE**: Dates, times, and temporal expressions
    - Examples: "Tuesday", "March 2024", "next week"
 
    **MISC**: Other named entities not fitting above categories
    - Examples: "Nobel Prize", "iPhone", "COVID-19"
 
    ## Annotation Rules
    1. Include titles (Dr., Mr.) with person names
    2. For nested entities, annotate the largest meaningful span
    3. Don't include articles (the, a) in entity spans
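
Rules 2 and 3 can be spot-checked mechanically once annotations exist. The sketch below is one possible lint, assuming spans are dictionaries with the start, end, and text fields shown in the Step 5 output; lint_spans and ARTICLES are hypothetical names introduced for this illustration.

```python
ARTICLES = ("the", "a", "an")


def lint_spans(spans):
    """Return warnings for spans that appear to break rule 2 or rule 3."""
    warnings = []
    # Rule 3: a span should not begin with an article.
    for span in spans:
        first_word = span["text"].split(" ", 1)[0].lower()
        if first_word in ARTICLES:
            warnings.append(f"leading article in {span['text']!r}")
    # Rule 2: only the largest span of a nested entity should be kept, so
    # any overlap is suspicious. Track the span reaching furthest right.
    widest = None
    for cur in sorted(spans, key=lambda s: (s["start"], s["end"])):
        if widest is not None and cur["start"] < widest["end"]:
            warnings.append(f"overlap: {widest['text']!r} and {cur['text']!r}")
        if widest is None or cur["end"] > widest["end"]:
            widest = cur
    return warnings


print(lint_spans([{"start": 0, "end": 18, "text": "The United Nations"}]))
# prints ["leading article in 'The United Nations'"]
```

Warnings from a check like this are prompts for review rather than hard errors, since some names legitimately begin with an article.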

Step 4: Start Annotating

Launch the NER task:

bash

potato start config.yaml

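Annotation Workflow

Select text: click and drag to highlight a span
Choose an entity type: click a label button or use its keyboard shortcut
Edit an annotation: click an existing span to change or delete it
Submit: press Enter or click the submit button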

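Step 5: Review the Output

Annotations are saved in JSONL format. The record below is pretty-printed for readability; each span stores a zero-based start offset and an exclusive end offset: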

json

{
  "id": "1",
  "text": "Apple Inc. announced that CEO Tim Cook will visit Paris next Tuesday.",
  "annotations": {
    "entities": [
      {"start": 0, "end": 10, "label": "ORG", "text": "Apple Inc."},
      {"start": 30, "end": 38, "label": "PER", "text": "Tim Cook"},
      {"start": 50, "end": 55, "label": "LOC", "text": "Paris"},
      {"start": 56, "end": 68, "label": "DATE", "text": "next Tuesday"}
    ]
  }
}
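
Since each span carries both offsets and the covered text, a record can be checked for internal consistency before it is used for training. The following is a minimal sketch assuming the record layout shown above; find_bad_spans is a hypothetical helper, and the third span is deliberately shifted by one character to show what a failure looks like.

```python
RECORD = {
    "id": "1",
    "text": "Apple Inc. announced that CEO Tim Cook will visit Paris next Tuesday.",
    "annotations": {
        "entities": [
            {"start": 0, "end": 10, "label": "ORG", "text": "Apple Inc."},
            {"start": 30, "end": 38, "label": "PER", "text": "Tim Cook"},
            # Deliberately wrong offsets (correct values are 50 and 55).
            {"start": 51, "end": 56, "label": "LOC", "text": "Paris"},
        ]
    },
}


def find_bad_spans(record, scheme="entities"):
    """Return spans whose offsets do not reproduce their stored text."""
    text = record["text"]
    bad = []
    for span in record["annotations"][scheme]:
        start, end = span["start"], span["end"]
        in_range = 0 <= start < end <= len(text)
        if not in_range or text[start:end] != span["text"]:
            bad.append(span)
    return bad


print([span["label"] for span in find_bad_spans(RECORD)])  # prints ['LOC']
```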

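Tips for Better NER Annotation

Consistent guidelines: clear rules reduce disagreement
Training examples: show annotators edge cases before they start
Regular calibration: discuss difficult cases as a team
Measure agreement: use inter-annotator agreement to identify problems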
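
One simple way to quantify agreement on spans is an exact-match F1 score between two annotators. This is a sketch of that single metric, chosen here for illustration rather than prescribed by the tutorial; it assumes spans in the output format from Step 5, and pairwise_f1 is a hypothetical helper name.

```python
def span_set(spans):
    """Reduce spans to comparable (start, end, label) tuples."""
    return {(s["start"], s["end"], s["label"]) for s in spans}


def pairwise_f1(spans_a, spans_b):
    """Exact-match F1 between two annotators' spans for the same text."""
    a, b = span_set(spans_a), span_set(spans_b)
    if not a and not b:
        return 1.0  # both agree that the text has no entities
    return 2 * len(a & b) / (len(a) + len(b))


annotator_a = [
    {"start": 0, "end": 10, "label": "ORG"},
    {"start": 30, "end": 38, "label": "PER"},
    {"start": 50, "end": 55, "label": "LOC"},
]
annotator_b = [
    {"start": 0, "end": 10, "label": "ORG"},
    {"start": 30, "end": 38, "label": "PER"},
    {"start": 50, "end": 55, "label": "MISC"},  # same span, different label
]

print(round(pairwise_f1(annotator_a, annotator_b), 3))  # prints 0.667
```

Exact matching is strict: a span that differs by one character counts as a full disagreement, so low scores often point to boundary rules that need clarifying in the guidelines.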

Next Steps

Add a training phase for annotator onboarding
Set up multiple annotators for redundancy
Export to Hugging Face format for model training
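
Token-classification models are usually trained on per-token tags rather than character offsets, so an export step typically converts spans to BIO tags. The sketch below shows that conversion under simplifying assumptions: it splits on whitespace (so punctuation stays attached to words), expects non-overlapping spans in the Step 5 format, and uses the hypothetical helper name to_bio. A real pipeline would align spans to the model's own tokenizer instead.

```python
import re


def to_bio(text, spans):
    """Whitespace-tokenize text and derive BIO tags from character spans."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tokens.append(match.group())
        tag = "O"
        for span in spans:
            # A token belongs to a span if the two character ranges overlap.
            if match.start() < span["end"] and match.end() > span["start"]:
                # The first token of a span gets B-, later tokens get I-.
                prefix = "B" if match.start() <= span["start"] else "I"
                tag = f"{prefix}-{span['label']}"
                break
        tags.append(tag)
    return tokens, tags


text = "CEO Tim Cook will visit Paris."
spans = [
    {"start": 4, "end": 12, "label": "PER"},
    {"start": 24, "end": 29, "label": "LOC"},
]
tokens, tags = to_bio(text, spans)
print(list(zip(tokens, tags)))
```

The resulting token and tag lists can then be stored as two parallel columns, which is the general shape token-classification datasets take.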

Need help? See the span annotation documentation for details.