Entity Linking in Tweets
Named entity recognition and entity linking in tweets, identifying entity mentions and mapping them to knowledge base URIs. Based on SemEval-2022 Task 12 (Agarwal et al.).
Configuration file: config.yaml
# Entity Linking in Tweets
# Based on Agarwal et al., SemEval 2022
# Paper: https://aclanthology.org/2022.semeval-1.196/
# Dataset: https://github.com/Agarwal-SemEval2022-Task12
#
# This task asks annotators to identify entity mentions in tweets and
# provide their corresponding knowledge base URIs (e.g., Wikipedia or
# Wikidata URLs). Entity mentions include persons, organizations,
# locations, events, and other named entities.
#
# Annotation Steps:
# 1. Highlight all entity mentions in the tweet text
# 2. For each entity, provide the corresponding knowledge base URI
annotation_task_name: "Entity Linking in Tweets"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: span
    name: entity_mentions
    description: "Highlight all named entity mentions in the tweet"
    labels:
      - "Entity Mention"
  - annotation_type: text
    name: entity_uri
    description: "Provide the knowledge base URI (e.g., Wikipedia URL) for each identified entity"
annotation_instructions: |
  You will see a tweet that may contain named entities (people, organizations, places, events, etc.).
  1. Highlight all named entity mentions in the text using the span annotation tool.
  2. For each entity, provide its corresponding knowledge base URI (Wikipedia or Wikidata URL).
  3. If multiple entities are present, list all URIs separated by semicolons.
  Note: Focus on proper nouns and named entities, not common nouns.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Tweet:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
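The annotation instructions ask for all URIs in the entity_uri field, separated by semicolons. A minimal Python sketch of how such an answer might be split and sanity-checked afterwards is shown here. It works on the raw answer string only and makes no assumption about how the annotation tool stores its output; the function names and the KNOWN_HOSTS allow-list are hypothetical, chosen because the instructions mention only Wikipedia and Wikidata.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the instructions name Wikipedia and Wikidata only.
KNOWN_HOSTS = ("wikipedia.org", "wikidata.org")


def split_entity_uris(raw: str) -> list[str]:
    """Split a semicolon-separated answer into trimmed, non-empty URIs."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def is_knowledge_base_uri(uri: str) -> bool:
    """Return True if the URI is http(s) and its host is a known knowledge base."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    return parts.scheme in ("http", "https") and any(
        host == known or host.endswith("." + known) for known in KNOWN_HOSTS
    )


def check_entity_uri_answer(raw: str) -> dict[str, list[str]]:
    """Group the URIs of one answer into 'valid' and 'invalid' lists."""
    uris = split_entity_uris(raw)
    return {
        "valid": [u for u in uris if is_knowledge_base_uri(u)],
        "invalid": [u for u in uris if not is_knowledge_base_uri(u)],
    }


if __name__ == "__main__":
    answer = (
        "https://en.wikipedia.org/wiki/Christopher_Nolan; "
        "https://en.wikipedia.org/wiki/Cillian_Murphy; Oppenheimer"
    )
    print(check_entity_uri_answer(answer))
```

The check only looks at the scheme and host, so it flags free-text answers such as a bare entity name but does not confirm that the linked page exists or refers to the highlighted mention.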
Sample data: sample-data.json
[
  {
    "id": "entity_link_001",
    "text": "Just watched Oppenheimer at the IMAX theater in Manhattan. Christopher Nolan really outdid himself with this one. Cillian Murphy deserves every award."
  },
  {
    "id": "entity_link_002",
    "text": "The Lakers are playing the Celtics at Staples Center tonight. LeBron James vs Jayson Tatum is going to be epic!"
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2022/task12-entity-linking-tweets
potato start config.yaml
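Before starting the server, it can help to confirm that every item in the data file carries the keys named under item_properties in config.yaml ("id" and "text"). The following Python sketch is one way to do that; it is a standalone check and not part of the annotation tool, and the function names are hypothetical.

```python
import json
from pathlib import Path


def find_item_problems(items, id_key="id", text_key="text"):
    """Return human-readable problems found in a list of data items."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: expected an object, got {type(item).__name__}")
            continue
        item_id = item.get(id_key)
        if item_id is None:
            problems.append(f"item {index}: missing '{id_key}'")
        elif not isinstance(item_id, (str, int)):
            problems.append(f"item {index}: '{id_key}' should be a string or integer")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate {id_key} '{item_id}'")
        else:
            seen_ids.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems


def check_data_file(path, id_key="id", text_key="text"):
    """Load a JSON array of items from disk and report any problems."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    return find_item_problems(items, id_key, text_key)
```

Calling check_data_file("sample-data.json") from the design directory returns an empty list when every item has a unique id and non-empty text. The key names default to the values in this config and can be overridden if item_properties changes.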
Related designs
Clickbait Spoiling
Classification and extraction of spoilers for clickbait posts, including spoiler type identification and span-level spoiler detection. Based on SemEval-2023 Task 5 (Hagen et al.).
MeasEval - Counts and Measurements
Extract and classify measurements, quantities, units, and measured entities from scientific text, based on SemEval-2021 Task 8 (Harper et al.). Annotators span-annotate measurement components and classify quantity types with normalized values.
NLPContributionGraph - Structured Extraction of NLP Contributions
Extract structured contribution information from NLP papers by annotating research problem, approach, model, dataset, metric, and result spans and forming contribution triples, based on SemEval-2021 Task 11 (D'Souza et al.).