WNUT-2017 - Emerging and Rare Entity Recognition
Named entity recognition for emerging and rare entities in noisy user-generated text, based on the W-NUT 2017 shared task (Derczynski et al., W-NUT@EMNLP 2017). Covers novel entity types in social media text from Twitter and Reddit.
Configuration file: config.yaml
# WNUT-2017 - Emerging and Rare Entity Recognition
# Based on Derczynski et al., W-NUT@EMNLP 2017
# Paper: https://aclanthology.org/W17-4418/
# Dataset: https://noisy-text.github.io/2017/emerging-rare-entities.html
#
# This task presents social media text (tweets, Reddit posts) for
# named entity recognition with a focus on emerging and rare entities.
# Annotators highlight entity spans and classify the overall entity
# composition of the text.
#
# Entity Types:
# - Person: Names of people
# - Location: Places, geographic locations
# - Corporation: Companies and organizations
# - Product: Commercial products, software, services
# - Creative Work: Movies, books, songs, games, etc.
# - Group: Sports teams, bands, political organizations
#
# Annotation Guidelines:
# 1. Read the social media text carefully
# 2. Highlight all entity mentions using the appropriate entity type
# 3. Include the full entity name (e.g., "New York City" not just "New York")
# 4. Classify whether the text contains novel, standard, or no entities
annotation_task_name: "WNUT-2017 - Emerging and Rare Entity Recognition"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  # Step 1: Highlight entity spans
  - annotation_type: span
    name: entity_spans
    description: "Highlight all named entities in the text"
    labels:
      - "Person"
      - "Location"
      - "Corporation"
      - "Product"
      - "Creative Work"
      - "Group"
    label_colors:
      "Person": "#3b82f6"
      "Location": "#ef4444"
      "Corporation": "#22c55e"
      "Product": "#f59e0b"
      "Creative Work": "#8b5cf6"
      "Group": "#06b6d4"
    keyboard_shortcuts:
      "Person": "1"
      "Location": "2"
      "Corporation": "3"
      "Product": "4"
      "Creative Work": "5"
      "Group": "6"
  # Step 2: Entity composition classification
  - annotation_type: radio
    name: entity_composition
    description: "What type of entities does this text contain?"
    labels:
      - "Contains Novel Entities"
      - "Standard Entities Only"
      - "No Entities"
    keyboard_shortcuts:
      "Contains Novel Entities": "7"
      "Standard Entities Only": "8"
      "No Entities": "9"
    tooltips:
      "Contains Novel Entities": "Text mentions emerging, recently created, or rarely seen entities"
      "Standard Entities Only": "Text mentions only well-known, established entities"
      "No Entities": "Text contains no named entities"
annotation_instructions: |
  You will be shown a social media post (from Twitter or Reddit). Your task is to:
  1. Highlight all named entity mentions in the text using the appropriate category.
  2. Classify whether the text contains novel/emerging entities, standard entities, or no entities.
  Entity categories:
  - Person: Names of individuals
  - Location: Geographic places, cities, countries
  - Corporation: Companies, organizations, institutions
  - Product: Software, devices, commercial products
  - Creative Work: Movies, books, songs, TV shows, games
  - Group: Sports teams, bands, political groups
  Pay special attention to emerging or novel entities that may not be well-known.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Social Media Text:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
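The two annotation schemes in config.yaml share one keyboard, so a shortcut reused across them, or a span label without a color, is easy to miss by eye. The following is a minimal sketch of such a consistency check, assuming the label, color, and shortcut values are copied by hand from the config (nothing is read from disk). The helper name `find_problems` is hypothetical and is not part of Potato.

```python
# Values copied by hand from config.yaml; this sketch does not parse the file.
SPAN_LABELS = ["Person", "Location", "Corporation", "Product", "Creative Work", "Group"]
SPAN_COLORS = {
    "Person": "#3b82f6",
    "Location": "#ef4444",
    "Corporation": "#22c55e",
    "Product": "#f59e0b",
    "Creative Work": "#8b5cf6",
    "Group": "#06b6d4",
}
SPAN_SHORTCUTS = {
    "Person": "1",
    "Location": "2",
    "Corporation": "3",
    "Product": "4",
    "Creative Work": "5",
    "Group": "6",
}
RADIO_SHORTCUTS = {
    "Contains Novel Entities": "7",
    "Standard Entities Only": "8",
    "No Entities": "9",
}


def find_problems(labels, colors, shortcut_maps):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    # Every span label should have a highlight color.
    for label in labels:
        if label not in colors:
            problems.append(f"no color for label {label!r}")
    # A key should be bound to at most one label across all schemes.
    seen = {}
    for shortcuts in shortcut_maps.values():
        for label, key in shortcuts.items():
            if key in seen:
                problems.append(f"key {key!r} bound to both {seen[key]!r} and {label!r}")
            else:
                seen[key] = label
    return problems


problems = find_problems(
    SPAN_LABELS,
    SPAN_COLORS,
    {"entity_spans": SPAN_SHORTCUTS, "entity_composition": RADIO_SHORTCUTS},
)
print(problems or "config looks consistent")
```

With the values shown in the config, the span scheme uses keys 1 to 6 and the radio scheme uses 7 to 9, so the check reports no collisions.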
Sample data: sample-data.json
[
  {
    "id": "wnut_001",
    "text": "Just saw the new Deadpool movie with @john_smith at AMC Theatres downtown. Absolutely hilarious!"
  },
  {
    "id": "wnut_002",
    "text": "Anyone tried the new Pixel 9 Pro? Thinking of switching from my Samsung Galaxy. Google really stepped up their game."
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/named-entity-recognition/wnut2017-emerging-entities
potato start config.yaml
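Before starting the server, it can help to confirm that every item in the data file carries the `id` and `text` keys that `item_properties` names in config.yaml. The sketch below is one possible check, not something Potato requires. It embeds only the two sample items shown on this page (the remaining eight are elided in the source and are not reproduced), and the helper name `check_items` is hypothetical.

```python
import json

# Only the two items shown on this page; the other eight are not reproduced.
SAMPLE_JSON = """
[
  {"id": "wnut_001", "text": "Just saw the new Deadpool movie with @john_smith at AMC Theatres downtown. Absolutely hilarious!"},
  {"id": "wnut_002", "text": "Anyone tried the new Pixel 9 Pro? Thinking of switching from my Samsung Galaxy. Google really stepped up their game."}
]
"""


def check_items(items, id_key="id", text_key="text"):
    """Hypothetical helper: return a list of problems found in the items."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Each item needs a non-empty, unique identifier.
        item_id = item.get(id_key)
        if not item_id:
            problems.append(f"item {index}: missing {id_key!r}")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id {item_id!r}")
        else:
            seen_ids.add(item_id)
        # Each item needs non-empty text to annotate.
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty {text_key!r}")
    return problems


items = json.loads(SAMPLE_JSON)
problems = check_items(items)
print(problems or f"{len(items)} items OK")
```

To check the real file instead of the embedded string, load it with `json.load` from an opened `sample-data.json` inside the cloned directory and pass the result to `check_items`.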
Related designs
Explainable Online Sexism Detection
Detection and fine-grained classification of online sexism with span-level evidence extraction. Categories include threats, derogation, animosity, and prejudiced discussion. Based on SemEval-2023 Task 10 (Kirk et al.).
OffensEval - Offensive Language Target Identification
Multi-step offensive language annotation combining offensiveness detection, target type classification, and offensive span identification, based on the SemEval 2020 OffensEval shared task (Zampieri et al., SemEval 2020).
Aspect-Based Sentiment Analysis
Identification of aspect terms in review text with sentiment polarity classification for each aspect. Based on SemEval-2016 Task 5 (ABSA).