HatEval - Multilingual Detection of Hate Speech Against Immigrants and Women
Detection and classification of hate speech targeting immigrants and women on Twitter, including fine-grained categorization of hate type. Based on SemEval-2019 Task 5 (HatEval).
Configuration file: config.yaml
# HatEval - Multilingual Detection of Hate Speech
# Based on Basile et al., SemEval 2019
# Paper: https://aclanthology.org/S19-2007/
# Dataset: https://competitions.codalab.org/competitions/19935
#
# This task asks annotators to first determine whether a tweet contains
# hate speech, and if so, to identify the types of hate expressed.
#
# Hate Speech Labels:
# - Hateful: The tweet contains hate speech targeting a group
# - Not Hateful: The tweet does not contain hate speech
#
# Hate Type Categories:
# - Racism: Targeting based on race or ethnicity
# - Sexism: Targeting based on gender
# - Xenophobia: Targeting based on national origin or immigration status
# - Other: Other forms of hate not covered above
annotation_task_name: "HatEval - Hate Speech Detection"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: radio
    name: hate_speech_detection
    description: "Does this tweet contain hate speech?"
    labels:
      - "Hateful"
      - "Not Hateful"
    keyboard_shortcuts:
      "Hateful": "1"
      "Not Hateful": "2"
    tooltips:
      "Hateful": "The tweet contains language that targets or demeans a group"
      "Not Hateful": "The tweet does not contain hate speech"
  - annotation_type: multiselect
    name: hate_type
    description: "If hateful, what types of hate are expressed? Select all that apply."
    labels:
      - "Racism"
      - "Sexism"
      - "Xenophobia"
      - "Other"
    tooltips:
      "Racism": "Hate speech targeting based on race or ethnicity"
      "Sexism": "Hate speech targeting based on gender"
      "Xenophobia": "Hate speech targeting based on national origin or immigration status"
      "Other": "Other forms of hate not covered by the above categories"
annotation_instructions: |
  You will be shown a tweet and its target group. Your task is to:
  1. Determine whether the tweet contains hate speech (hateful vs. not hateful).
  2. If hateful, select all applicable hate type categories.
  Consider the context and target group when making your judgment.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 12px; margin-bottom: 12px;">
      <strong style="color: #a16207;">Target Group:</strong>
      <span style="font-size: 15px;">{{target_group}}</span>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Tweet:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
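The configuration above presents the two questions independently, so nothing shown here prevents an annotator from selecting a hate type for a tweet marked "Not Hateful", or from marking a tweet "Hateful" without choosing any type. The following is a minimal post-hoc consistency check in Python. It assumes each annotation has already been loaded into a plain dict keyed by scheme name; that record shape and the function name `check_annotation` are hypothetical and are not taken from the tool's documented output format.

```python
# Label sets copied from the annotation_schemes in config.yaml.
HATE_LABELS = {"Hateful", "Not Hateful"}
HATE_TYPES = {"Racism", "Sexism", "Xenophobia", "Other"}


def check_annotation(record):
    """Return a list of consistency problems for one annotation record.

    Assumed (hypothetical) record shape:
        {"hate_speech_detection": "Hateful", "hate_type": ["Sexism"]}
    """
    problems = []
    label = record.get("hate_speech_detection")
    types = set(record.get("hate_type", []))

    if label not in HATE_LABELS:
        problems.append(f"unknown hate speech label: {label!r}")

    unknown = types - HATE_TYPES
    if unknown:
        problems.append(f"unknown hate types: {sorted(unknown)}")

    # Step 2 of the instructions only applies when step 1 is "Hateful".
    if label == "Not Hateful" and types:
        problems.append("hate types selected for a tweet marked Not Hateful")
    if label == "Hateful" and not types:
        problems.append("tweet marked Hateful but no hate type selected")

    return problems
```

An empty list means the record is consistent with the two-step instructions; any other result can be used to flag the record for review.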
Sample data: sample-data.json
[
  {
    "id": "hateval_001",
    "text": "These immigrants are ruining our country. They should all go back where they came from. #buildthewall",
    "target_group": "Immigrants"
  },
  {
    "id": "hateval_002",
    "text": "Just had the most amazing tacos from the new Mexican restaurant downtown. Highly recommend!",
    "target_group": "None"
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2019/task05-hateval
potato start config.yaml
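Before starting the server, it can help to confirm that every item in the data file carries the fields the configuration and layout refer to: `id` and `text` (from `item_properties`) and `target_group` (used in `html_layout`). The sketch below is one way to do that in Python with the standard library only. The helper names `validate_items` and `validate_file` are illustrative and are not part of the annotation tool.

```python
import json

# Fields referenced by config.yaml: id_key, text_key, and {{target_group}}.
REQUIRED_KEYS = ("id", "text", "target_group")


def validate_items(items):
    """Return a list of problems found in a list of item dicts."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty {key!r}")
        item_id = item.get("id")
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems


def validate_file(path="sample-data.json"):
    """Load a JSON array of items from disk and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_items(json.load(handle))
```

Calling `validate_file()` from the design directory returns an empty list when every item is well formed, and a list of human-readable messages otherwise.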
Related designs
ADMIRE - Multimodal Idiomaticity Recognition
Multimodal idiomaticity detection task requiring annotators to identify whether expressions are used idiomatically or literally, with supporting cue analysis. Based on SemEval-2025 Task 1 (ADMIRE).
Clickbait Detection (Webis Clickbait Corpus)
Classify headlines and social media posts as clickbait or non-clickbait based on the Webis Clickbait Corpus. Identify manipulative content designed to attract clicks through sensationalism, curiosity gaps, or misleading framing.
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.