intermediate · survey

Dynamic Hate Speech Detection

Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.
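The hierarchical scheme (binary decision first, then a type label only for hateful items) can be sketched in a few lines of Python. The label names come from this showcase's config; the helper function itself is only illustrative, not part of the dataset or the Potato tool:

```python
# Hate-type labels as listed in this showcase's config.
HATE_TYPES = [
    "Animosity",
    "Derogation",
    "Dehumanization",
    "Threatening",
    "Support for Hateful Entities",
]

def valid_annotation(is_hateful, hate_type):
    """A hate_type is only meaningful when the item is labeled 'Hate'."""
    if is_hateful == "Hate":
        return hate_type in HATE_TYPES
    return hate_type is None

print(valid_annotation("Hate", "Derogation"))  # True
print(valid_annotation("Not Hate", None))      # True
```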


Configuration file: config.yaml

# Dynamic Hate Speech Detection
# Based on Vidgen et al., ACL 2021
# Paper: https://aclanthology.org/2021.acl-long.132/
# Dataset: https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset
#
# This task uses a hierarchical annotation scheme:
# 1. Binary classification: Hate vs Not Hate
# 2. Hate type (if hateful): 5 categories
# 3. Target group identification
#
# Hate Type Definitions:
# - Animosity: Expression of negative feelings, hostility, or opposition
# - Derogation: Insulting, demeaning, or belittling language
# - Dehumanization: Comparing groups to animals, diseases, or subhuman entities
# - Threatening: Direct or implicit threats of violence or harm
# - Support for Hateful Entities: Praising hateful groups, symbols, or ideologies
#
# Annotation Guidelines:
# 1. Consider the overall message and intent
# 2. Slurs alone may not be hateful (reclaimed language, discussion)
# 3. Criticism of ideas/behaviors differs from attacks on identity groups
# 4. Context matters - sarcasm, quotes, and counter-speech should be considered
# 5. When uncertain, consider how a member of the target group would perceive it

annotation_task_name: "Hate Speech Detection"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  # Step 1: Binary hate classification
  - annotation_type: radio
    name: is_hateful
    description: "Is this content hateful?"
    labels:
      - "Hate"
      - "Not Hate"
    tooltips:
      "Hate": "Content that attacks, demeans, or threatens individuals or groups based on protected characteristics"
      "Not Hate": "Content that is not hateful, including legitimate criticism, counter-speech, or neutral discussion"

  # Step 2: Hate type classification (if hateful)
  - annotation_type: radio
    name: hate_type
    description: "What type of hate is expressed? (Select the primary type)"
    labels:
      - "Animosity"
      - "Derogation"
      - "Dehumanization"
      - "Threatening"
      - "Support for Hateful Entities"
    tooltips:
      "Animosity": "Expression of negative feelings, hostility, antipathy, or opposition toward a group"
      "Derogation": "Insulting, demeaning, belittling, or degrading language about a group"
      "Dehumanization": "Comparing a group to animals, vermin, diseases, or subhuman entities"
      "Threatening": "Direct or implicit threats of violence, harm, or calls for exclusion/extermination"
      "Support for Hateful Entities": "Praising, supporting, or glorifying hateful groups, symbols, figures, or ideologies"

  # Step 3: Target group identification
  - annotation_type: multiselect
    name: target_groups
    description: "Which groups are targeted? (Select all that apply)"
    labels:
      - "Black people"
      - "Asian people"
      - "Hispanic/Latino people"
      - "Arab/Middle Eastern people"
      - "Indigenous people"
      - "Jewish people"
      - "Muslims"
      - "Women"
      - "LGBTQ+ people"
      - "Trans people"
      - "People with disabilities"
      - "Immigrants/Refugees"
      - "Other group"
    label_colors:
      "Black people": "#3b82f6"
      "Asian people": "#06b6d4"
      "Hispanic/Latino people": "#8b5cf6"
      "Arab/Middle Eastern people": "#f59e0b"
      "Indigenous people": "#22c55e"
      "Jewish people": "#6366f1"
      "Muslims": "#14b8a6"
      "Women": "#ec4899"
      "LGBTQ+ people": "#a855f7"
      "Trans people": "#d946ef"
      "People with disabilities": "#f97316"
      "Immigrants/Refugees": "#0ea5e9"
      "Other group": "#9ca3af"
    min_selections: 0
    max_selections: 13

  # Step 4: Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your annotation?"
    min_value: 1
    max_value: 5
    labels:
      1: "Very uncertain"
      2: "Somewhat uncertain"
      3: "Moderately confident"
      4: "Confident"
      5: "Very confident"

allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
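With `annotation_per_instance: 3`, each item collects three independent labels that must later be aggregated. A minimal majority-vote sketch is shown below; the exact structure of Potato's output files in `annotation_output/` is not reproduced here, so treat the input list as an assumed per-item vote collection:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label and its share of the votes."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Hypothetical votes from three annotators on one item:
label, share = majority_vote(["Hate", "Hate", "Not Hate"])
print(label, share)  # "Hate" wins with two of three votes
```

With three annotators and two classes, a strict majority always exists for the binary step; ties are only possible for the multi-class `hate_type` step, where a tie-breaking rule (or adjudication) would be needed.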

Sample data: sample-data.json

[
  {
    "id": "dhs_001",
    "text": "These people are like cockroaches, infesting our neighborhoods and spreading disease wherever they go."
  },
  {
    "id": "dhs_002",
    "text": "I respectfully disagree with the government's immigration policy. I think we need better border security."
  }
]

// ... and 8 more items
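Before launching, it is worth checking that every item in the data file carries the keys named under `item_properties` (`id_key: "id"`, `text_key: "text"`). A small self-contained sketch, using the two items shown above:

```python
import json

# Inline copy of the first two sample items; the real file has more.
SAMPLE = """
[
  {"id": "dhs_001", "text": "These people are like cockroaches..."},
  {"id": "dhs_002", "text": "I respectfully disagree with the policy."}
]
"""

items = json.loads(SAMPLE)
for item in items:
    # Keys must match id_key / text_key in config.yaml.
    assert set(item) >= {"id", "text"}, f"missing keys in {item}"
    assert item["text"].strip(), f"empty text for {item['id']}"
print(f"{len(items)} items OK")
```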

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/hate-speech-moderation/dynamic-hate-speech
potato start config.yaml

Details

Annotation types

likert · multiselect · radio

Domain

NLP · Content Moderation · Social Media

Use cases

Hate Speech Detection · Content Moderation · Online Safety

Tags

hate-speech · content-moderation · social-media · acl2021 · target-groups

Found a problem, or want to improve this design?

Open an issue