Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Annotators first classify content as hateful or not; for hateful content, they then identify the hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and the targeted group(s).
text annotation
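For orientation, a hypothetical completed annotation for one item is sketched below as a Python dict. The keys mirror the scheme names defined in config.yaml (is_hateful, hate_type, target_groups, confidence); the exact record structure that Potato writes out may differ.

# Hypothetical annotation record for one item, keyed by the scheme names
# from config.yaml. Potato's actual on-disk format may differ.
example_annotation = {
    "id": "dhs_001",
    "is_hateful": "Hate",                      # Step 1: binary decision
    "hate_type": "Dehumanization",             # Step 2: only filled when Step 1 is "Hate"
    "target_groups": ["Immigrants/Refugees"],  # Step 3: multiselect, zero or more groups
    "confidence": 4,                           # Step 4: Likert scale, 1-5
}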
Configuration File: config.yaml
# Dynamic Hate Speech Detection
# Based on Vidgen et al., ACL 2021
# Paper: https://aclanthology.org/2021.acl-long.132/
# Dataset: https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset
#
# This task uses a hierarchical annotation scheme:
# 1. Binary classification: Hate vs Not Hate
# 2. Hate type (if hateful): 5 categories
# 3. Target group identification
#
# Hate Type Definitions:
# - Animosity: Expression of negative feelings, hostility, or opposition
# - Derogation: Insulting, demeaning, or belittling language
# - Dehumanization: Comparing groups to animals, diseases, or subhuman entities
# - Threatening: Direct or implicit threats of violence or harm
# - Support for Hateful Entities: Praising hateful groups, symbols, or ideologies
#
# Annotation Guidelines:
# 1. Consider the overall message and intent
# 2. Slurs alone may not be hateful (reclaimed language, discussion)
# 3. Criticism of ideas/behaviors differs from attacks on identity groups
# 4. Context matters - sarcasm, quotes, and counter-speech should be considered
# 5. When uncertain, consider how a member of the target group would perceive it
port: 8000
server_name: localhost
task_name: "Hate Speech Detection"
data_files:
- sample-data.json
id_key: id
text_key: text
output_file: annotations.json
annotation_schemes:
# Step 1: Binary hate classification
- annotation_type: radio
name: is_hateful
description: "Is this content hateful?"
labels:
- "Hate"
- "Not Hate"
tooltips:
"Hate": "Content that attacks, demeans, or threatens individuals or groups based on protected characteristics"
"Not Hate": "Content that is not hateful, including legitimate criticism, counter-speech, or neutral discussion"
# Step 2: Hate type classification (if hateful)
- annotation_type: radio
name: hate_type
description: "What type of hate is expressed? (Select the primary type)"
labels:
- "Animosity"
- "Derogation"
- "Dehumanization"
- "Threatening"
- "Support for Hateful Entities"
tooltips:
"Animosity": "Expression of negative feelings, hostility, antipathy, or opposition toward a group"
"Derogation": "Insulting, demeaning, belittling, or degrading language about a group"
"Dehumanization": "Comparing a group to animals, vermin, diseases, or subhuman entities"
"Threatening": "Direct or implicit threats of violence, harm, or calls for exclusion/extermination"
"Support for Hateful Entities": "Praising, supporting, or glorifying hateful groups, symbols, figures, or ideologies"
# Step 3: Target group identification
- annotation_type: multiselect
name: target_groups
description: "Which groups are targeted? (Select all that apply)"
labels:
- "Black people"
- "Asian people"
- "Hispanic/Latino people"
- "Arab/Middle Eastern people"
- "Indigenous people"
- "Jewish people"
- "Muslims"
- "Women"
- "LGBTQ+ people"
- "Trans people"
- "People with disabilities"
- "Immigrants/Refugees"
- "Other group"
label_colors:
"Black people": "#3b82f6"
"Asian people": "#06b6d4"
"Hispanic/Latino people": "#8b5cf6"
"Arab/Middle Eastern people": "#f59e0b"
"Indigenous people": "#22c55e"
"Jewish people": "#6366f1"
"Muslims": "#14b8a6"
"Women": "#ec4899"
"LGBTQ+ people": "#a855f7"
"Trans people": "#d946ef"
"People with disabilities": "#f97316"
"Immigrants/Refugees": "#0ea5e9"
"Other group": "#9ca3af"
min_selections: 0
max_selections: 13
# Step 4: Confidence
- annotation_type: likert
name: confidence
description: "How confident are you in your annotation?"
min_value: 1
max_value: 5
labels:
1: "Very uncertain"
2: "Somewhat uncertain"
3: "Moderately confident"
4: "Confident"
5: "Very confident"
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
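Since annotation_per_instance is 3, each item should end up with three independent judgments. Below is a minimal aggregation sketch that takes a majority vote on the binary label, assuming annotations.json holds a flat JSON list of records containing an id field plus the scheme names above; Potato's real output layout may differ and the loading step would need adapting.

import json
from collections import Counter, defaultdict

# Assumption: annotations.json is a flat list of records such as
# {"id": "dhs_001", "annotator": "a1", "is_hateful": "Hate", ...}.
# Adjust the loading step if Potato's actual output layout differs.
with open("annotations.json") as f:
    records = json.load(f)

by_item = defaultdict(list)
for rec in records:
    by_item[rec["id"]].append(rec)

for item_id, recs in sorted(by_item.items()):
    votes = Counter(r["is_hateful"] for r in recs)
    label, count = votes.most_common(1)[0]
    print(f"{item_id}: {label} ({count}/{len(recs)} annotators agree)")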
Sample Data: sample-data.json
[
{
"id": "dhs_001",
"text": "These people are like cockroaches, infesting our neighborhoods and spreading disease wherever they go."
},
{
"id": "dhs_002",
"text": "I respectfully disagree with the government's immigration policy. I think we need better border security."
}
]
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/dynamic-hate-speech
potato start config.yaml
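The bundled sample-data.json contains only a handful of items. To annotate more of the original data, the CSV released in the Vidgen et al. repository can be converted into the same {id, text} format. A sketch follows, assuming the CSV exposes "acl.id" and "text" columns and using a hypothetical local filename; verify the column names against the release you download, since they have varied across versions.

import csv
import json

# Assumption: the upstream CSV from the Vidgen et al. repository has
# "acl.id" and "text" columns. CSV_PATH is a hypothetical local filename.
CSV_PATH = "dynamically-generated-hate-speech.csv"

items = []
with open(CSV_PATH, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        items.append({"id": row["acl.id"], "text": row["text"]})

with open("sample-data.json", "w", encoding="utf-8") as f:
    json.dump(items, f, indent=2, ensure_ascii=False)

print(f"Wrote {len(items)} items to sample-data.json")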
Found an issue or want to improve this design?
Open an Issue
Related Designs
GoEmotions - Fine-Grained Emotion Classification
Multi-label emotion classification with 27 emotion categories plus neutral, based on the Google Research GoEmotions dataset (Demszky et al., ACL 2020). Taxonomy covers 12 positive, 11 negative, and 4 ambiguous emotions designed for Reddit comment analysis.
HateXplain - Explainable Hate Speech Detection
Multi-task hate speech annotation with classification (hate/offensive/normal), target community identification, and rationale span highlighting. Based on the HateXplain benchmark (Mathew et al., AAAI 2021) - the first dataset covering classification, target identification, and rationale extraction.
Social Bias Frames (SBIC)
Annotate social media posts for bias using structured frames. Based on Sap et al., ACL 2020. Identify offensiveness, intent, implied stereotypes, and targeted groups.