Toxicity Detection
Multi-label classification for identifying various types of toxic content including hate speech, threats, and harassment.
Get this design
This design is available in our showcase. Copy the configuration below to get started.
Quick start:
# Create your project folder
mkdir toxicity-detection
cd toxicity-detection
# Copy config.yaml from above
potato start config.yaml
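The full configuration ships with the showcase entry; as a rough guide, a multi-label (multiselect) toxicity scheme in a Potato config.yaml looks something like the sketch below. The file name, data path, and label names here are illustrative assumptions — copy the actual config from the showcase rather than this sketch.

```yaml
# Hypothetical config.yaml sketch for a multi-label toxicity task.
# Key names follow Potato's annotation_schemes / multiselect convention;
# verify against the showcase configuration before use.
annotation_task_name: "Toxicity Detection"

data_files:
  - data/comments.csv        # hypothetical input file

output_annotation_dir: annotation_output/

annotation_schemes:
  - annotation_type: multiselect   # multi-label: several boxes may be checked
    name: toxicity_type
    description: "Which types of toxic content apply?"
    labels:
      - hate_speech
      - threat
      - harassment
      - none
```

A multiselect scheme lets annotators check any combination of labels, which is what makes the task multi-label rather than single-choice classification.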
Details
Annotation types
Domain
Use cases
Labels
Related designs
HateXplain - Explainable Hate Speech Detection
Multi-task hate speech annotation with classification (hate/offensive/normal), target community identification, and rationale span highlighting. Based on the HateXplain benchmark (Mathew et al., AAAI 2021) - the first dataset covering classification, target identification, and rationale extraction.
Toxic Spans Detection
Character-level toxic span annotation based on SemEval-2021 Task 5 (Pavlopoulos et al., 2021). Instead of binary toxicity classification, annotators identify the specific words/phrases that make a comment toxic, enabling more nuanced content moderation.
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.