Toxic Spans Detection
Character-level toxic span annotation based on SemEval-2021 Task 5 (Pavlopoulos et al., 2021). Instead of binary toxicity classification, annotators identify the specific words/phrases that make a comment toxic, enabling more nuanced content moderation.
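In the SemEval-2021 Task 5 format, each toxic span is represented as the list of character offsets it covers. A minimal sketch of that representation (the example text and the `span_to_offsets` helper below are illustrative, not part of the dataset or of Potato):

```python
def span_to_offsets(text, phrase):
    """Return the character offsets covered by the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        return []
    return list(range(start, start + len(phrase)))

text = "You're such an idiot if you believe this garbage."
print(span_to_offsets(text, "idiot"))  # -> [15, 16, 17, 18, 19]
```

Marking only "idiot" (rather than the whole sentence) is exactly the precision the guidelines below ask for.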
Configuration File (config.yaml)
# Toxic Spans Detection
# Based on SemEval-2021 Task 5 (Pavlopoulos et al., 2021)
# Paper: https://aclanthology.org/2021.semeval-1.6/
# Dataset: https://github.com/ipavlopoulos/toxic_spans
#
# Task: Identify the specific character sequences within comments that
# contribute to toxicity, rather than making binary judgments about
# entire comments.
#
# Guidelines:
# - Mark the exact words/phrases that make the text toxic
# - Focus on language that is abusive, offensive, or harmful
# - Be precise: highlight only the toxic portions, not surrounding context
# - Multiple spans can be marked in a single comment
# - Some comments may have no toxic spans (false positives in toxicity detection)
annotation_task_name: "Toxic Spans Detection"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  # First: determine if the text contains toxicity
  - annotation_type: radio
    name: contains_toxicity
    description: "Does this text contain any toxic content?"
    labels:
      - "Yes - contains toxic content"
      - "No - not toxic"
    keyboard_shortcuts:
      "Yes - contains toxic content": "y"
      "No - not toxic": "n"
    tooltips:
      "Yes - contains toxic content": "The text contains language that is abusive, offensive, or harmful"
      "No - not toxic": "The text does not contain toxic language (may be critical but not abusive)"
  # Then: highlight the specific toxic spans
  - annotation_type: span
    name: toxic_spans
    description: "Highlight the specific words or phrases that make this text toxic"
    labels:
      - Toxic
    label_colors:
      Toxic: "#ef4444"
    tooltips:
      Toxic: "Words or phrases that are abusive, offensive, threatening, or otherwise harmful"
    allow_overlapping: false
  # Optional: categorize the type of toxicity
  - annotation_type: multiselect
    name: toxicity_type
    description: "What type(s) of toxicity are present? (select all that apply)"
    labels:
      - Insult
      - Profanity
      - Threat
      - Identity Attack
      - Sexual Content
      - Other
    tooltips:
      Insult: "Personal attacks or demeaning language"
      Profanity: "Vulgar or obscene language"
      Threat: "Expressions of intent to harm"
      "Identity Attack": "Attacks based on identity (race, gender, religion, etc.)"
      "Sexual Content": "Sexually explicit or inappropriate content"
      Other: "Other forms of toxic content"
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
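With three annotators per instance, span annotations can be compared the way SemEval-2021 Task 5 scored systems: F1 over sets of character offsets. A hedged sketch of that metric (the `char_f1` function is an illustration of the published scoring idea, not code shipped with this design):

```python
def char_f1(pred, gold):
    """F1 between predicted and gold sets of toxic character offsets."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # both empty: perfect agreement on "no toxic span"
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# One annotator marked 5 characters, another 4 of the same span:
print(round(char_f1([15, 16, 17, 18, 19], [15, 16, 17, 18]), 3))  # -> 0.889
```

The same function works for inter-annotator agreement checks on the `toxic_spans` scheme above.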
Sample Data (sample-data.json)
[
  {
    "id": "toxic_001",
    "text": "This article is well-researched and presents a balanced view of the issue."
  },
  {
    "id": "toxic_002",
    "text": "You're such an idiot if you believe this garbage. Completely braindead take."
  }
]
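Each item only needs the keys named under `item_properties` (`id_key` and `text_key`). A quick sanity check before launching the server, assuming the data file sits in the current directory (the `validate_items` helper is hypothetical, not a Potato API):

```python
import json

def validate_items(path, id_key="id", text_key="text"):
    """Verify every item in a Potato data file carries the configured keys."""
    with open(path) as f:
        items = json.load(f)
    for i, item in enumerate(items):
        assert id_key in item, f"item {i} missing {id_key!r}"
        assert text_key in item, f"item {i} missing {text_key!r}"
    return len(items)

# validate_items("sample-data.json")
```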
// ... and 10 more items

Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/hate-speech-moderation/toxic-spans
potato start config.yaml
Related Designs
HateXplain - Explainable Hate Speech Detection
Multi-task hate speech annotation with classification (hate/offensive/normal), target community identification, and rationale span highlighting. Based on the HateXplain benchmark (Mathew et al., AAAI 2021) - the first dataset covering classification, target identification, and rationale extraction.
Food Hazard Detection
Food safety hazard detection task requiring annotators to identify hazards, products, and risk levels in food incident reports, and classify the type of contamination. Based on SemEval-2025 Task 9.
MediTOD Medical Dialogue Annotation
Medical history-taking dialogue annotation based on the MediTOD dataset. Annotators label dialogue acts, identify medical entities (symptoms, conditions, medications, tests), and assess doctor-patient communication quality across multi-turn clinical conversations.