MultiCoNER II: Multilingual Complex Named Entity Recognition
Complex and ambiguous named entity recognition across 12 languages. Annotators identify fine-grained entity types including creative works, groups, medical terms, and complex entities that require world knowledge. Based on the SemEval-2023 Task 2 shared task for multilingual complex NER.
Configuration file: config.yaml
# MultiCoNER II: Multilingual Complex Named Entity Recognition
# Based on SemEval-2023 Task 2 (Fetahu et al., SemEval@ACL 2023)
# Paper: https://aclanthology.org/2023.semeval-1.281/
# Dataset: https://multiconer.github.io/dataset
#
# Task: Fine-grained NER across 12 languages with complex entities
# Annotators identify named entities from a fine-grained taxonomy
# that includes creative works, groups, medical terms, and more.
#
# Key Challenges:
# - Complex entities requiring world knowledge (e.g., song titles)
# - Ambiguous entities that could belong to multiple types
# - Multilingual text with code-switching
# - Low-context short sentences from search queries and social media
#
# Entity Types (Fine-grained):
# - Person: Real and fictional people
# - Location: Places, facilities, geographical features
# - Organization: Companies, institutions, agencies
# - CreativeWork: Movies, books, songs, TV shows, games
# - Group: Bands, teams, political parties, movements
# - Medical: Diseases, symptoms, treatments, medications
# - Product: Consumer goods, vehicles, software, devices
# - Event: Named events, festivals, competitions
# - MISC: Other named entities not fitting above categories
annotation_task_name: "MultiCoNER II: Complex Named Entity Recognition"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight all named entities and assign their fine-grained type"
    labels:
      - "Person"
      - "Location"
      - "Organization"
      - "CreativeWork"
      - "Group"
      - "Medical"
      - "Product"
      - "Event"
      - "MISC"
    label_colors:
      "Person": "#3b82f6"
      "Location": "#22c55e"
      "Organization": "#8b5cf6"
      "CreativeWork": "#ec4899"
      "Group": "#f59e0b"
      "Medical": "#ef4444"
      "Product": "#06b6d4"
      "Event": "#f97316"
      "MISC": "#6b7280"
    tooltips:
      "Person": "Names of real or fictional people (e.g., Albert Einstein, Harry Potter)"
      "Location": "Places, facilities, geographical features (e.g., Tokyo, Central Park, Mount Everest)"
      "Organization": "Companies, institutions, government agencies (e.g., Google, WHO, MIT)"
      "CreativeWork": "Movies, books, songs, TV shows, video games, artworks (e.g., The Matrix, Bohemian Rhapsody)"
      "Group": "Bands, sports teams, political parties, social movements (e.g., Coldplay, FC Barcelona, Greenpeace)"
      "Medical": "Diseases, symptoms, treatments, medications, medical procedures (e.g., COVID-19, ibuprofen)"
      "Product": "Consumer goods, vehicles, software, devices (e.g., iPhone, Windows 11, Boeing 747)"
      "Event": "Named events, festivals, competitions, historical events (e.g., Olympics, Coachella, World War II)"
      "MISC": "Other named entities not fitting the above categories"
    allow_overlapping: false
html_layout: |
  <div style="margin-bottom: 10px; padding: 8px; background: #f0f4ff; border-radius: 6px;">
    <strong>Language:</strong> {{language}}
  </div>
  <div style="padding: 10px; border: 1px solid #ddd; border-radius: 6px; line-height: 1.8; font-size: 16px;">
    {{text}}
  </div>
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
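The span scheme in this config names its labels three times: under labels, label_colors, and tooltips. A minimal sketch of a consistency check is given below, assuming the scheme has already been loaded into a Python dict (for example with a YAML parser). The helper name check_scheme is hypothetical and is not part of Potato; the dict is an abbreviated copy with three of the nine labels.

```python
import re

# Six-digit hex colors such as "#3b82f6", the form used in the config.
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def check_scheme(scheme):
    """Return a list of problems found in one annotation scheme dict.

    Hypothetical helper for illustration; not a Potato API.
    """
    problems = []
    labels = scheme.get("labels", [])
    if len(labels) != len(set(labels)):
        problems.append("labels: duplicate entries")

    # Every label should have a color and a tooltip, and nothing extra.
    for key in ("label_colors", "tooltips"):
        mapping = scheme.get(key, {})
        missing = [label for label in labels if label not in mapping]
        extra = [name for name in mapping if name not in labels]
        if missing:
            problems.append(f"{key}: missing {missing}")
        if extra:
            problems.append(f"{key}: unknown {extra}")

    for label, color in scheme.get("label_colors", {}).items():
        if not HEX_COLOR.match(color):
            problems.append(f"label_colors: {label} has invalid color {color!r}")
    return problems


# Abbreviated copy of the scheme (assumption: loaded from config.yaml elsewhere).
scheme = {
    "annotation_type": "span",
    "name": "entities",
    "labels": ["Person", "Location", "Organization"],
    "label_colors": {
        "Person": "#3b82f6",
        "Location": "#22c55e",
        "Organization": "#8b5cf6",
    },
    "tooltips": {
        "Person": "Names of real or fictional people",
        "Location": "Places, facilities, geographical features",
        "Organization": "Companies, institutions, government agencies",
    },
}

print(check_scheme(scheme))  # an empty list means the three sections agree
```

A check like this is mainly useful when the taxonomy is edited later, since a label added to labels but not to tooltips would otherwise go unnoticed until annotation starts.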
Sample data: sample-data.json
[
  {
    "id": "mconer_001",
    "text": "The Shawshank Redemption, directed by Frank Darabont, was filmed at the Ohio State Reformatory in Mansfield.",
    "language": "English"
  },
  {
    "id": "mconer_002",
    "text": "Le groupe Daft Punk a annonce sa separation apres 28 ans de carriere, laissant des fans du monde entier en deuil.",
    "language": "French"
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/domain-specific/multiconerii-complex-ner
potato start config.yaml
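Before starting the server, it can help to confirm that every data item carries the fields the config refers to. The sketch below assumes that each {{...}} placeholder in html_layout is filled from the item field of the same name, as the language field in the sample data suggests; that behavior is an assumption here, and the function names required_fields and find_problems are hypothetical rather than part of Potato. The layout string and items are shortened copies of the ones shown on this page.

```python
import json
import re

# Matches template placeholders such as {{text}} or {{ language }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def required_fields(id_key, text_key, html_layout):
    """Collect the fields an item needs: both keys plus layout placeholders."""
    return {id_key, text_key} | set(PLACEHOLDER.findall(html_layout))


def find_problems(items, id_key, fields):
    """Report items with missing or empty fields and repeated ids."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        missing = sorted(f for f in fields if not str(item.get(f, "")).strip())
        if missing:
            problems.append(f"item {index}: missing {missing}")
        item_id = item.get(id_key)
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems


# Shortened layout from config.yaml (styling attributes omitted).
layout = "<div><strong>Language:</strong> {{language}}</div><div>{{text}}</div>"

# Inline copy of two sample items; in practice this would be read from
# sample-data.json with json.load.
items = json.loads("""
[
  {"id": "mconer_001",
   "text": "The Shawshank Redemption, directed by Frank Darabont, was filmed at the Ohio State Reformatory in Mansfield.",
   "language": "English"},
  {"id": "mconer_002",
   "text": "Le groupe Daft Punk a annonce sa separation apres 28 ans de carriere, laissant des fans du monde entier en deuil.",
   "language": "French"}
]
""")

fields = required_fields("id", "text", layout)
print(find_problems(items, "id", fields))  # an empty list means no problems found
```

The id_key and text_key arguments mirror the item_properties section of the config, so the same check can be reused if those keys are renamed.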
Related designs
Multilingual Coreference Resolution (CorefUD)
Multilingual coreference resolution across 17 languages using Universal Dependencies-style annotations. Annotators identify entity mentions (names, nominals, pronouns) and link them into coreference chains. Based on the CorefUD dataset and the CRAC 2023 Shared Task on Multilingual Coreference Resolution.
Multilingual Narrative Extraction
Multilingual narrative extraction task requiring annotators to identify narrative elements such as events, actors, and causal relations in news texts, and classify the narrative themes. Based on SemEval-2025 Task 10.
MultiTACRED: Multilingual TAC Relation Extraction
Multilingual version of the TACRED relation extraction dataset in 12 languages. Annotators identify subject/object entities and classify relations from 41 TAC relation types, with additional translation quality assessment.