Toxicity Detection
Multi-label classification for identifying various types of toxic content including hate speech, threats, and harassment.
text annotation
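Because the task is multi-label, one annotation can select any subset of the six categories. Downstream training code usually needs that subset as a fixed-length 0/1 ("multi-hot") vector. The sketch below shows that encoding, using the label order of the toxicity_labels scheme in the configuration. The function names and the cross-check against the overall_severity rating are hypothetical illustrations. The consistency rule ("Not Toxic" exactly when no category is selected) is an assumed project convention, not something Potato enforces.

```python
# Label order copied from the toxicity_labels scheme in config.yaml.
TOXICITY_LABELS = [
    "Toxic",
    "Severe Toxic",
    "Obscene",
    "Threat",
    "Insult",
    "Identity Hate",
]


def to_multi_hot(selected: list[str]) -> list[int]:
    """Map a set of selected labels to a 0/1 vector, one slot per label."""
    unknown = set(selected) - set(TOXICITY_LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    chosen = set(selected)
    return [1 if label in chosen else 0 for label in TOXICITY_LABELS]


def is_consistent(selected: list[str], severity: str) -> bool:
    """Hypothetical QA rule (not part of Potato):
    severity is "Not Toxic" exactly when no category is selected."""
    return (severity == "Not Toxic") == (len(selected) == 0)


print(to_multi_hot(["Toxic", "Insult"]))        # [1, 0, 0, 0, 1, 0]
print(is_consistent(["Toxic"], "Not Toxic"))    # False
```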
Configuration File: config.yaml
```yaml
# Toxicity Detection Configuration
# Generated by Potato Annotation Showcase
port: 8000
server_name: localhost
task_name: "Toxicity Detection"

# Data configuration
data_files:
  - data.json
id_key: id
text_key: text

# Output
output_file: annotations.json

# Annotation schemes
annotation_schemes:
  # Multi-label toxicity categories
  - annotation_type: multiselect
    name: toxicity_labels
    description: "Select ALL toxicity categories that apply to this text"
    labels:
      - Toxic
      - Severe Toxic
      - Obscene
      - Threat
      - Insult
      - Identity Hate
    keyboard_shortcuts:
      Toxic: "1"
      Severe Toxic: "2"
      Obscene: "3"
      Threat: "4"
      Insult: "5"
      Identity Hate: "6"
    tooltips:
      Toxic: "Rude, disrespectful, or unreasonable content likely to make someone leave a discussion"
      Severe Toxic: "Extremely hateful, aggressive, or disrespectful content"
      Obscene: "Lewd, indecent, or profane language"
      Threat: "Content that expresses intention to inflict harm"
      Insult: "Insulting, inflammatory, or provocative content directed at a person"
      Identity Hate: "Hateful content targeting someone's identity (race, religion, gender, etc.)"

  # Overall severity rating
  - annotation_type: radio
    name: overall_severity
    description: "Rate the overall severity of toxicity"
    labels:
      - Not Toxic
      - Mildly Toxic
      - Moderately Toxic
      - Severely Toxic
    keyboard_shortcuts:
      Not Toxic: "q"
      Mildly Toxic: "w"
      Moderately Toxic: "e"
      Severely Toxic: "r"

# User configuration
allow_all_users: true

# Task assignment
instances_per_annotator: 100
annotation_per_instance: 3

# Allow skipping difficult content
allow_skip: true
skip_reason_required: true
```
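With annotation_per_instance: 3, every item receives three independent judgments that must be merged into one gold label. One common approach, sketched below as an assumption rather than a built-in Potato feature, is a per-label strict majority vote for the multiselect scheme and the median rating on the ordinal severity scale. The code works on in-memory Python values and does not assume any particular layout of annotations.json. Because allow_skip is enabled, some items may end up with fewer than three annotations, so the majority is computed over the annotators who actually labeled the item.

```python
from statistics import median_low

# Label lists copied from config.yaml; severity labels are ordered mildest to most severe.
TOXICITY_LABELS = ["Toxic", "Severe Toxic", "Obscene", "Threat", "Insult", "Identity Hate"]
SEVERITY_SCALE = ["Not Toxic", "Mildly Toxic", "Moderately Toxic", "Severely Toxic"]


def aggregate_labels(label_sets: list[set[str]]) -> list[str]:
    """Keep a label if strictly more than half of the annotators selected it."""
    n = len(label_sets)
    return [
        label
        for label in TOXICITY_LABELS
        if sum(label in s for s in label_sets) * 2 > n
    ]


def aggregate_severity(ratings: list[str]) -> str:
    """Return the median rating by scale position (lower median for even counts)."""
    ranks = [SEVERITY_SCALE.index(r) for r in ratings]
    return SEVERITY_SCALE[median_low(ranks)]


# Hypothetical judgments from three annotators for a single item.
labels = [{"Toxic", "Insult"}, {"Toxic"}, {"Toxic", "Insult", "Obscene"}]
ratings = ["Mildly Toxic", "Moderately Toxic", "Moderately Toxic"]
print(aggregate_labels(labels))     # ['Toxic', 'Insult']
print(aggregate_severity(ratings))  # Moderately Toxic
```

The median is used instead of the mean because the severity labels are ordered categories, not evenly spaced numbers; median_low always returns one of the observed ratings.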
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
```bash
# Create your project folder
mkdir toxicity-detection
cd toxicity-detection
# Copy config.yaml from above, and place data.json (listed under data_files) in this folder
potato start config.yaml
```
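The configuration expects a data.json file whose records carry the fields named by id_key (id) and text_key (text). The sketch below writes a small hypothetical sample file so the quick start has something to serve. It assumes that Potato reads one JSON object per line; check this against the Potato documentation for your version. The sketch also estimates staffing: each item needs annotation_per_instance (3) judgments and each annotator handles up to instances_per_annotator (100) items, which gives a lower bound on the number of annotators. The sample texts and helper names are illustrative only.

```python
import json
from pathlib import Path

# Hypothetical sample items; keys match id_key: id and text_key: text in config.yaml.
SAMPLE_ITEMS = [
    {"id": "item_001", "text": "Thanks for the thoughtful reply, that clears it up."},
    {"id": "item_002", "text": "Nobody cares about your opinion, just leave."},
]


def write_jsonl(items: list[dict], path: Path) -> None:
    """Write one JSON object per line (assumed input format)."""
    with path.open("w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")


def min_annotators(num_items: int, per_instance: int = 3, per_annotator: int = 100) -> int:
    """Lower bound on annotators: enough total quota for every required judgment,
    and at least per_instance distinct people so no one labels an item twice."""
    needed = num_items * per_instance
    by_quota = (needed + per_annotator - 1) // per_annotator  # integer ceiling
    return max(per_instance, by_quota)


if __name__ == "__main__":
    write_jsonl(SAMPLE_ITEMS, Path("data.json"))
    print(min_annotators(1000))  # 30
```

For 1,000 items this gives 3,000 required judgments, so at least 30 annotators working through their full 100-item quota.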
Related Designs
Emotion Detection (SemEval-2018 Task 1)
Multi-label emotion classification with intensity ratings based on SemEval-2018 Task 1. Annotate text for emotions (anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust) with intensity scales.
Machine Translation Evaluation
Evaluate machine translation quality with adequacy and fluency ratings.
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.