Multilingual News Article Similarity
Rating the similarity between pairs of news articles on a 5-point scale, assessing whether they cover the same story across multiple languages. Based on SemEval-2022 Task 8 (Chen et al.).
Configuration file: config.yaml
# Multilingual News Article Similarity
# Based on Chen et al., SemEval 2022
# Paper: https://aclanthology.org/2022.semeval-1.155/
# Dataset: https://github.com/euagendas/semeval-2022-task8
#
# This task asks annotators to rate the similarity between two news
# articles on a 5-point scale, from completely different stories to
# identical stories. Articles may be in different languages.
#
# Scale:
# 1 - Completely Different: The articles cover unrelated topics
# 2 - Somewhat Related: Articles share a general topic but cover different events
# 3 - Related: Articles cover the same general event but with different angles
# 4 - Very Similar: Articles cover the same event with mostly the same details
# 5 - Identical Story: Articles are about the exact same event and details
annotation_task_name: "Multilingual News Article Similarity"
task_dir: "."
data_files:
- sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: likert
    name: similarity_rating
    description: "How similar are these two news articles?"
    min_label: "Completely Different"
    max_label: "Identical Story"
    size: 5
annotation_instructions: |
  You will see two news articles (possibly in different languages) and the language pair.
  Rate how similar the two articles are on a scale from 1 to 5.
  1: Completely different topics
  2: Somewhat related topic but different events
  3: Same general event but different angles or details
  4: Very similar coverage of the same event
  5: Identical story with the same key details
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #ecfdf5; border: 1px solid #a7f3d0; border-radius: 8px; padding: 12px; margin-bottom: 12px;">
      <strong style="color: #065f46;">Language Pair:</strong>
      <span style="font-size: 15px; margin-left: 8px;">{{language}}</span>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Article 1:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207;">Article 2:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{article_2}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
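With `annotation_per_instance: 3`, each article pair is meant to collect three independent ratings on the 1 to 5 scale. The Python sketch below shows one possible way to summarize them: the median rating, the nearest scale label, and the max-min spread as a rough disagreement signal. This is an assumption for illustration, not the official SemEval-2022 Task 8 scoring procedure, and the input shape (a mapping from item `id` to a list of integer ratings) is hypothetical; Potato's files under `annotation_output/` would first need to be parsed into it.

```python
from __future__ import annotations

from statistics import median

# Labels of the 5-point scale defined in the config.yaml header comments
SCALE = {
    1: "Completely Different",
    2: "Somewhat Related",
    3: "Related",
    4: "Very Similar",
    5: "Identical Story",
}


def aggregate(ratings_by_item: dict[str, list[int]]) -> dict[str, dict]:
    """Summarize per-item ratings given as {item_id: [rating, ...]} (hypothetical format).

    For each item, returns the median rating, the scale label nearest
    to it (an x.5 median is rounded up), and the max-min spread as a
    rough disagreement signal.
    """
    summary = {}
    for item_id, ratings in ratings_by_item.items():
        if not ratings:
            continue  # no ratings recorded, e.g. every annotator skipped it
        if any(r not in SCALE for r in ratings):
            raise ValueError(f"{item_id}: ratings must be 1-5, got {ratings}")
        mid = median(ratings)
        summary[item_id] = {
            "median": mid,
            "label": SCALE[int(mid + 0.5)],
            "spread": max(ratings) - min(ratings),
        }
    return summary
```

The median is used here rather than the mean because the scale is ordinal; with an even number of ratings (for example when one annotator skipped an item) it can fall halfway between two points, which is why the label lookup rounds.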
Sample data: sample-data.json
[
{
"id": "news_sim_001",
"text": "The European Central Bank raised interest rates by 0.5 percentage points on Thursday, marking the largest increase in over a decade as the institution battles soaring inflation across the eurozone.",
"article_2": "La Banque centrale européenne a relevé ses taux d'intérêt de 0,5 point de pourcentage jeudi, soit la plus forte hausse depuis plus de dix ans, dans un effort pour lutter contre l'inflation galopante dans la zone euro.",
"language": "English-French"
},
{
"id": "news_sim_002",
"text": "A powerful 7.2 magnitude earthquake struck southern Haiti early Saturday morning, causing widespread destruction and leaving thousands without shelter in the impoverished Caribbean nation.",
"article_2": "Rescuers pulled survivors from the rubble of a collapsed factory in Bangladesh after a structural failure caused the six-story building to pancake during morning work hours.",
"language": "English-English"
}
]
// ... and 8 more items
Download this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2022/task08-news-similarity
potato start config.yaml
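The `html_layout` in config.yaml pulls `{{text}}`, `{{article_2}}` and `{{language}}` from each data item, so an item missing one of those fields will not display correctly. The Python sketch below is a hypothetical pre-flight check, not part of Potato: it extracts the `{{...}}` placeholder names from a layout string, then reports items that lack those fields, have a missing or duplicate id, or use a language value outside the `"Language-Language"` form, which is an assumption inferred from the two sample items.

```python
from __future__ import annotations

import json
import re

# Matches html_layout placeholders such as {{text}} or {{article_2}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_items(layout: str, raw_json: str, id_key: str = "id") -> list[str]:
    """Hypothetical pre-flight check for a sample-data.json style file.

    Returns a list of human-readable problems (empty if none were found).
    """
    required = sorted(set(PLACEHOLDER.findall(layout)))
    problems = []
    seen_ids = set()
    for index, item in enumerate(json.loads(raw_json)):
        # Every field referenced by the layout must be present and non-empty
        missing = [key for key in required if not item.get(key)]
        if missing:
            problems.append(f"item {index}: missing or empty {missing}")
        # Ids must exist and be unique
        item_id = item.get(id_key)
        if item_id is None:
            problems.append(f"item {index}: no {id_key!r} field")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id {item_id!r}")
        else:
            seen_ids.add(item_id)
        # Assumed format, inferred from the samples: e.g. "English-French"
        parts = str(item.get("language", "")).split("-")
        if len(parts) != 2 or not all(parts):
            problems.append(f"item {index}: unexpected language pair {item.get('language')!r}")
    return problems
```

In practice the layout string could be read from config.yaml (for example with PyYAML's `yaml.safe_load`) and the second argument would be the contents of sample-data.json.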
Related Designs
Assessing Humor in Edited News Headlines
Rate the funniness of edited news headlines and compare humor between original and edited versions, based on SemEval-2020 Task 7 (Hossain et al.). Headlines are minimally edited by replacing a single word to create humorous effect.
Determining Sentiment Intensity of English and Arabic Phrases
Fine-grained sentiment intensity scoring of text phrases on an 11-point scale from most negative to most positive. Based on SemEval-2016 Task 7.
Fine-Grained Sentiment Analysis on Financial Microblogs and News
Graded sentiment analysis of financial text with topic classification, rating market sentiment from very bearish to very bullish on a 7-point scale. Based on SemEval-2017 Task 5.