HateXplain - Explainable Hate Speech Detection
Multi-task hate speech annotation with classification (hate/offensive/normal), target community identification, and rationale span highlighting. Based on the HateXplain benchmark (Mathew et al., AAAI 2021), the first benchmark dataset to cover classification, target identification, and rationale extraction together.
text annotation
Configuration File: config.yaml
# HateXplain - Explainable Hate Speech Detection
# Based on Mathew et al., AAAI 2021
# Paper: https://ojs.aaai.org/index.php/AAAI/article/view/17745
# Dataset: https://huggingface.co/datasets/hatexplain
#
# Three annotation tasks:
# 1. Classification: hate speech, offensive, or normal
# 2. Target community: which group is targeted (if hate/offensive)
# 3. Rationale spans: which words justify the classification
#
# Guidelines:
# - Hate speech: attacks or demeans a group based on identity
# - Offensive: rude/disrespectful but not targeting identity groups
# - Normal: neither hateful nor offensive
# - Rationale: highlight words that justify your classification (avg 5.5 tokens)
port: 8000
server_name: localhost
task_name: "HateXplain: Explainable Hate Speech Detection"
data_files:
  - sample-data.json
id_key: id
text_key: text
output_file: annotations.json
annotation_schemes:
  # Task 1: Classification
  - annotation_type: radio
    name: classification
    description: "Classify this text as hate speech, offensive, or normal"
    labels:
      - Hate Speech
      - Offensive
      - Normal
    keyboard_shortcuts:
      "Hate Speech": "h"
      "Offensive": "o"
      "Normal": "n"
    tooltips:
      "Hate Speech": "Content that attacks or demeans a group based on identity attributes (race, religion, gender, etc.)"
      "Offensive": "Rude, disrespectful, or profane content that does NOT target identity groups"
      "Normal": "Content that is neither hateful nor offensive"
  # Task 2: Target community (only for hate/offensive)
  - annotation_type: multiselect
    name: target_community
    description: "If hate/offensive, select the targeted community/communities"
    labels:
      - African
      - Arab
      - Asian
      - Caucasian
      - Hispanic
      - Jewish
      - LGBTQ
      - Islam
      - Women
      - Refugee
      - Other
      - None/Not Applicable
    tooltips:
      African: "People of African descent"
      Arab: "People of Arab descent or from Arab countries"
      Asian: "People of Asian descent"
      Caucasian: "People of European/white descent"
      Hispanic: "People of Hispanic/Latino descent"
      Jewish: "Jewish people (ethnic or religious)"
      LGBTQ: "Lesbian, gay, bisexual, transgender, queer individuals"
      Islam: "Muslims or Islamic religion"
      Women: "Women or girls"
      Refugee: "Refugees, immigrants, or asylum seekers"
      Other: "Other identity group not listed"
      "None/Not Applicable": "No specific group targeted (for Normal texts)"
  # Task 3: Rationale span annotation
  - annotation_type: span
    name: rationale
    description: "Highlight the words/phrases that justify your classification decision"
    labels:
      - Rationale
    label_colors:
      Rationale: "#ef4444"
    tooltips:
      Rationale: "Words or phrases that are the reason for classifying as hate/offensive (avg ~5.5 tokens per post)"
    allow_overlapping: false
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: true
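To make Task 3 concrete: a rationale highlight ultimately reduces to character offsets into the post. The sketch below shows one hypothetical highlight; the dict fields are illustrative only, not Potato's actual output format.

```python
text = "These people should go back to where they came from. They don't belong here."

# Hypothetical rationale span: the phrase an annotator might highlight
phrase = "go back to where they came from"
start = text.index(phrase)
annotation = {
    "label": "Rationale",
    "start": start,                # character offset where the span begins
    "end": start + len(phrase),    # exclusive end offset
}

# Recover the highlighted text from the offsets
print(text[annotation["start"]:annotation["end"]])  # prints the phrase above
```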
Sample Data: sample-data.json
[
{
"id": "hatex_001",
"text": "I can't believe how beautiful the sunset was today. Nature is truly amazing."
},
{
"id": "hatex_002",
"text": "These people should go back to where they came from. They don't belong here."
}
]
// ... and 10 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/hatexplain
potato start config.yaml
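After cloning, the sample data can be sanity-checked against the config's `id_key`/`text_key` with only the standard library. A minimal sketch, with the two items copied from sample-data.json above:

```python
import json

# config.yaml declares id_key: id and text_key: text, so every item
# must carry both keys before Potato can serve it.
items = json.loads("""
[
  {"id": "hatex_001",
   "text": "I can't believe how beautiful the sunset was today. Nature is truly amazing."},
  {"id": "hatex_002",
   "text": "These people should go back to where they came from. They don't belong here."}
]
""")

for item in items:
    missing = {"id", "text"} - item.keys()
    assert not missing, f"item {item.get('id', '?')} is missing {missing}"

print(f"{len(items)} items OK")  # prints "2 items OK"
```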
Found an issue or want to improve this design?
Open an Issue
Related Designs
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.
Toxic Spans Detection
Character-level toxic span annotation based on SemEval-2021 Task 5 (Pavlopoulos et al., 2021). Instead of binary toxicity classification, annotators identify the specific words/phrases that make a comment toxic, enabling more nuanced content moderation.
Political Discourse Analysis (AgoraSpeech)
Multi-task annotation of political speeches covering sentiment, polarization, populism, topic identification, and named entities. Based on AgoraSpeech (Sermpezis et al., 2025), featuring human-validated labels for comprehensive political discourse analysis.