Implicit Hate Speech Detection
Detect and categorize implicit hate speech using a six-category taxonomy. Based on ElSherief et al., EMNLP 2021. Identifies grievance, incitement, stereotypes, inferiority, irony, and threats.
Configuration file: config.yaml
# Implicit Hate Speech Detection
# Based on ElSherief et al., EMNLP 2021 "Latent Hatred"
# Paper: https://aclanthology.org/2021.emnlp-main.29/
# Dataset: https://github.com/SALT-NLP/implicit-hate
#
# Implicit hate speech is harder to detect than explicit hate because it
# relies on stereotypes, coded language, and indirect expression.
#
# Six-Category Taxonomy:
# 1. Grievance (24.2%): Positions majority groups as unfairly disadvantaged
# 2. Incitement (20.0%): Promotes hate groups/ideologies, flaunts in-group power
# 3. Stereotypes (17.9%): Associates groups with negative attributes via euphemisms
# 4. Inferiority (13.6%): Implies some group is of lesser value
# 5. Irony (12.6%): Uses sarcasm, humor, satire to demean
# 6. Threats (10.5%): Indirect threats to safety, well-being, reputation
#
# Annotation Guidelines:
# 1. First determine if the text contains implicit hate (vs explicit or none)
# 2. Implicit hate lacks slurs but conveys hateful meaning through context
# 3. Consider: Would someone from the target group feel attacked?
# 4. Look for coded language, dog whistles, and stereotypical references
# 5. Irony/sarcasm requires understanding the speaker's actual intent
#
# Key Distinction from Explicit Hate:
# - Explicit: Uses slurs, direct insults, clear threats
# - Implicit: Uses stereotypes, innuendo, coded language, sarcasm
annotation_task_name: "Implicit Hate Speech Detection"
task_dir: "."
data_files:
- sample-data.json
item_properties:
id_key: "id"
text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
# Step 1: Hate classification
- annotation_type: radio
name: hate_type
description: "Does this text contain hate speech?"
labels:
- "Explicit hate"
- "Implicit hate"
- "Not hateful"
tooltips:
"Explicit hate": "Contains clear slurs, direct insults, or obvious threats"
"Implicit hate": "Conveys hate through stereotypes, coded language, or indirect means"
"Not hateful": "Does not contain hateful content"
# Step 2: Implicit hate category (if implicit)
- annotation_type: radio
name: implicit_category
description: "What type of implicit hate is expressed?"
labels:
- "Grievance"
- "Incitement"
- "Stereotypes"
- "Inferiority"
- "Irony"
- "Threats"
tooltips:
"Grievance": "Positions majority groups as unfairly disadvantaged (e.g., 'reverse racism', 'they get special treatment')"
"Incitement": "Promotes hate groups/ideologies, celebrates in-group power (e.g., praising extremist symbols)"
"Stereotypes": "Associates groups with negative attributes using euphemisms or coded language"
"Inferiority": "Implies some group or person is of lesser value or capability"
"Irony": "Uses sarcasm, humor, or satire to mock or demean a group"
"Threats": "Indirect threats to body, well-being, reputation, or rights"
# Step 3: Target group
- annotation_type: radio
name: target
description: "Which group is targeted?"
labels:
- "Racial/ethnic minority"
- "Religious group"
- "Women"
- "LGBTQ+ people"
- "Immigrants"
- "Other group"
- "No specific group"
tooltips:
"Racial/ethnic minority": "Black, Asian, Hispanic, Indigenous, or other racial/ethnic groups"
"Religious group": "Muslims, Jewish people, or other religious groups"
"Women": "Women or girls based on gender"
"LGBTQ+ people": "Gay, lesbian, bisexual, transgender, or other LGBTQ+ individuals"
"Immigrants": "Immigrants, refugees, or non-citizens"
"Other group": "Another identifiable group not listed"
"No specific group": "No clear target group identified"
# Step 4: Confidence
- annotation_type: likert
name: confidence
description: "How confident are you in your annotation?"
min_value: 1
max_value: 5
labels:
1: "Very uncertain"
2: "Somewhat uncertain"
3: "Moderately confident"
4: "Confident"
5: "Very confident"
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
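Since `annotation_per_instance` is 3, each item collects three independent labels. A minimal sketch of aggregating those labels by majority vote — the `(item_id, label)` pair format here is an assumption, not Potato's actual output schema, so adapt the loading step to the files under `annotation_output/`:

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label and whether it is a strict majority."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, n > len(labels) / 2

def aggregate(annotations):
    """Group per-annotator labels by item id and take the majority vote.

    `annotations` is a list of (item_id, label) pairs -- a simplifying
    assumption about the output format, not Potato's exact JSON shape.
    """
    by_item = {}
    for item_id, label in annotations:
        by_item.setdefault(item_id, []).append(label)
    return {item_id: majority_label(labels)
            for item_id, labels in by_item.items()}

# Three hate_type votes per item, as configured above
votes = [
    ("imp_001", "Implicit hate"),
    ("imp_001", "Implicit hate"),
    ("imp_001", "Not hateful"),
    ("imp_002", "Not hateful"),
    ("imp_002", "Not hateful"),
    ("imp_002", "Not hateful"),
]
print(aggregate(votes))
```

Items without a strict majority (the second element of the tuple is `False`) are good candidates for adjudication or an extra annotation round.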
Sample data: sample-data.json
[
{
"id": "imp_001",
"text": "It's funny how they get scholarships and jobs handed to them while the rest of us have to actually work for it."
},
{
"id": "imp_002",
"text": "The new employee seems very qualified. I'm excited to work with her on the project."
}
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/hate-speech-moderation/implicit-hate-speech
potato start config.yaml
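Before launching, the data file can be sanity-checked against the `id_key` and `text_key` declared under `item_properties`. A minimal sketch — `validate_items` is a hypothetical helper written for this page, not part of Potato:

```python
def validate_items(items, id_key="id", text_key="text"):
    """Flag items that violate the keys declared under item_properties:
    a missing or duplicate id, or an empty/missing text field."""
    errors = []
    seen = set()
    for i, item in enumerate(items):
        item_id = item.get(id_key)
        if item_id is None:
            errors.append(f"item {i}: missing '{id_key}'")
        elif item_id in seen:
            errors.append(f"item {i}: duplicate id {item_id!r}")
        else:
            seen.add(item_id)
        if not str(item.get(text_key, "")).strip():
            errors.append(f"item {i}: empty or missing '{text_key}'")
    return errors

# Inline stand-in for json.load(open("sample-data.json"))
items = [
    {"id": "imp_001", "text": "It's funny how they get scholarships..."},
    {"id": "imp_002", "text": ""},  # empty text should be flagged
]
print(validate_items(items))  # → ["item 1: empty or missing 'text'"]
```

Catching malformed items before `potato start` avoids discovering them mid-annotation, when fixing the data would invalidate work already done.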
Found a problem or want to improve this design? Create an issue.
Related designs
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.
Clickbait Detection (Webis Clickbait Corpus)
Classify headlines and social media posts as clickbait or non-clickbait based on the Webis Clickbait Corpus. Identify manipulative content designed to attract clicks through sensationalism, curiosity gaps, or misleading framing.
Rumor Stance Detection (PHEME)
Classify stance toward rumors in social media threads. Based on PHEME (Zubiaga et al.). Label replies as supporting, denying, querying, or commenting on rumorous claims.