intermediate, text

MultiCoNER II: Multilingual Complex Named Entity Recognition

Complex and ambiguous named entity recognition across 12 languages. Annotators identify fine-grained entity types including creative works, groups, medical terms, and complex entities that require world knowledge. Based on the SemEval-2023 Task 2 shared task for multilingual complex NER.


Configuration file: config.yaml

# MultiCoNER II: Multilingual Complex Named Entity Recognition
# Based on SemEval-2023 Task 2 (Fetahu et al., SemEval@ACL 2023)
# Paper: https://aclanthology.org/2023.semeval-1.281/
# Dataset: https://multiconer.github.io/dataset
#
# Task: Fine-grained NER across 12 languages with complex entities
# Annotators identify named entities from a fine-grained taxonomy
# that includes creative works, groups, medical terms, and more.
#
# Key Challenges:
# - Complex entities requiring world knowledge (e.g., song titles)
# - Ambiguous entities that could belong to multiple types
# - Multilingual text with code-switching
# - Low-context short sentences from search queries and social media
#
# Entity Types (Fine-grained):
# - Person: Real and fictional people
# - Location: Places, facilities, geographical features
# - Organization: Companies, institutions, agencies
# - CreativeWork: Movies, books, songs, TV shows, games
# - Group: Bands, teams, political parties, movements
# - Medical: Diseases, symptoms, treatments, medications
# - Product: Consumer goods, vehicles, software, devices
# - Event: Named events, festivals, competitions
# - MISC: Other named entities not fitting above categories

annotation_task_name: "MultiCoNER II: Complex Named Entity Recognition"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight all named entities and assign their fine-grained type"
    labels:
      - "Person"
      - "Location"
      - "Organization"
      - "CreativeWork"
      - "Group"
      - "Medical"
      - "Product"
      - "Event"
      - "MISC"
    label_colors:
      "Person": "#3b82f6"
      "Location": "#22c55e"
      "Organization": "#8b5cf6"
      "CreativeWork": "#ec4899"
      "Group": "#f59e0b"
      "Medical": "#ef4444"
      "Product": "#06b6d4"
      "Event": "#f97316"
      "MISC": "#6b7280"
    tooltips:
      "Person": "Names of real or fictional people (e.g., Albert Einstein, Harry Potter)"
      "Location": "Places, facilities, geographical features (e.g., Tokyo, Central Park, Mount Everest)"
      "Organization": "Companies, institutions, government agencies (e.g., Google, WHO, MIT)"
      "CreativeWork": "Movies, books, songs, TV shows, video games, artworks (e.g., The Matrix, Bohemian Rhapsody)"
      "Group": "Bands, sports teams, political parties, social movements (e.g., Coldplay, FC Barcelona, Greenpeace)"
      "Medical": "Diseases, symptoms, treatments, medications, medical procedures (e.g., COVID-19, ibuprofen)"
      "Product": "Consumer goods, vehicles, software, devices (e.g., iPhone, Windows 11, Boeing 747)"
      "Event": "Named events, festivals, competitions, historical events (e.g., Olympics, Coachella, World War II)"
      "MISC": "Other named entities not fitting the above categories"
    allow_overlapping: false

html_layout: |
  <div style="margin-bottom: 10px; padding: 8px; background: #f0f4ff; border-radius: 6px;">
    <strong>Language:</strong> {{language}}
  </div>
  <div style="padding: 10px; border: 1px solid #ddd; border-radius: 6px; line-height: 1.8; font-size: 16px;">
    {{text}}
  </div>

allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
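
With `allow_overlapping: false`, no two spans on the same item should share a character. The check below is a minimal sketch of that constraint, not the tool's own implementation. It assumes spans are held as hypothetical `(start, end, label)` tuples with an exclusive end offset; the format actually written to `annotation_output/` may differ.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end_exclusive, label).
Span = Tuple[int, int, str]


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of spans that shares at least one character."""
    ordered = sorted(spans, key=lambda s: (s[0], s[1]))
    overlaps = []
    for i, current in enumerate(ordered):
        for other in ordered[i + 1:]:
            if other[0] >= current[1]:
                # Spans are sorted by start, so no later span can overlap `current`.
                break
            overlaps.append((current, other))
    return overlaps


# Offsets into the first sample sentence ("The Shawshank Redemption, directed by ...").
clean = [(0, 24, "CreativeWork"), (38, 52, "Person")]
nested = [(0, 24, "CreativeWork"), (4, 13, "Person")]

print(find_overlaps(clean))
print(find_overlaps(nested))
```

An empty result means the annotations respect the non-overlap setting. Adjacent spans, where one ends at the offset the next begins, are not counted as overlapping.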

Sample data: sample-data.json

[
  {
    "id": "mconer_001",
    "text": "The Shawshank Redemption, directed by Frank Darabont, was filmed at the Ohio State Reformatory in Mansfield.",
    "language": "English"
  },
  {
    "id": "mconer_002",
    "text": "Le groupe Daft Punk a annonce sa separation apres 28 ans de carriere, laissant des fans du monde entier en deuil.",
    "language": "French"
  }
]

// ... and 8 more items
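
Each item needs the keys named in `item_properties` (`id`, `text`) plus the `language` field that `html_layout` renders. The following is a minimal sketch of a pre-flight check on the data file. The function name `validate_items` and its parameters are illustrative assumptions and not part of Potato.

```python
import json
from typing import Any, Dict, List, Sequence


def validate_items(
    items: List[Dict[str, Any]],
    id_key: str = "id",
    text_key: str = "text",
    extra_keys: Sequence[str] = ("language",),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the data looks usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Every required key must be present and hold a non-empty string.
        for key in (id_key, text_key, *extra_keys):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # IDs must be unique so annotations can be matched back to items.
        item_id = item.get(id_key)
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# The two items shown above, inlined so the sketch runs without the file.
SAMPLE = json.loads("""
[
  {"id": "mconer_001",
   "text": "The Shawshank Redemption, directed by Frank Darabont, was filmed at the Ohio State Reformatory in Mansfield.",
   "language": "English"},
  {"id": "mconer_002",
   "text": "Le groupe Daft Punk a annonce sa separation apres 28 ans de carriere, laissant des fans du monde entier en deuil.",
   "language": "French"}
]
""")

print(validate_items(SAMPLE))
```

To check the real file, replace the inlined string with `json.load(open("sample-data.json", encoding="utf-8"))`.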

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/domain-specific/multiconerii-complex-ner
potato start config.yaml

Details

Annotation type

span

Domain

NLP, Named Entity Recognition, Multilingual

Use cases

Named Entity Recognition, Multilingual NLP, Knowledge Base Population

Tags

ner, multilingual, multiconer, semeval2023, complex-entities, fine-grained-ner

Found a problem or want to improve this design?

Submit an issue