RuSentiment - Social Media Sentiment
Five-class sentiment annotation for social media posts, based on RuSentiment (Rogers et al., COLING 2018). The categories are Positive, Negative, Neutral, Speech Act (greetings, thanks, congratulations), and Skip. The scheme is reported to reach a Fleiss' kappa of 0.654 at an annotation speed of 250-350 posts per hour.
text annotation
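For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of Fleiss' kappa for the five labels, assuming every post receives the same number of annotations (the configuration below requests three). The function name `fleiss_kappa` and the toy ratings are illustrative assumptions; they are not part of Potato or of the RuSentiment release.

```python
from collections import Counter

# Label set taken from the sentiment scheme on this page.
LABELS = ["Positive", "Negative", "Neutral", "Speech Act", "Skip"]


def fleiss_kappa(ratings, labels=LABELS):
    """Compute Fleiss' kappa.

    ratings: one list of labels per post; every post must have the same
    number of annotators (at least two).
    """
    if not ratings:
        raise ValueError("no ratings given")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per post")
    for item in ratings:
        if len(item) != n_raters:
            raise ValueError("every post needs the same number of annotators")
        if not set(item) <= set(labels):
            raise ValueError(f"unknown label in {item}")

    # counts[i][j] = number of annotators who gave post i the label j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[label] for label in labels])

    # Observed agreement: mean share of agreeing annotator pairs per post
    per_post = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_post) / len(ratings)

    # Chance agreement from the overall label proportions
    total = len(ratings) * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(labels))
    ]
    expected = sum(p * p for p in proportions)

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: three annotators, four posts.
toy = [
    ["Positive", "Positive", "Positive"],
    ["Negative", "Negative", "Neutral"],
    ["Speech Act", "Speech Act", "Speech Act"],
    ["Neutral", "Neutral", "Negative"],
]
print(round(fleiss_kappa(toy), 3))
```

A value of 1 means perfect agreement, and values near 0 mean agreement no better than chance. How the per-post label lists are extracted from the annotation output is left out here, since that depends on the output format.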
Configuration File: config.yaml
# RuSentiment - Social Media Sentiment Classification
# Based on Rogers et al., COLING 2018
# Paper: https://aclanthology.org/C18-1064/
# Dataset: https://github.com/text-machine-lab/rusentiment
#
# 5-class sentiment scheme designed for social media:
# - Positive: explicit or implicit positive sentiment
# - Negative: explicit or implicit negative sentiment
# - Neutral: no sentiment expressed
# - Speech Act: formulaic posts (greetings, thanks, congratulations)
# - Skip: unclear, noisy, or user-generated content like poems
#
# Guidelines:
# - Mixed sentiment: annotate based on dominant sentiment
# - Hashtags and emojis are NOT automatic sentiment labels
# - Speech Acts may not reflect sender's actual sentiment
# - Annotation speed target: 250-350 posts per hour
port: 8000
server_name: localhost
task_name: "RuSentiment: Social Media Sentiment Classification"
data_files:
  - sample-data.json
id_key: id
text_key: text
output_file: annotations.json
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "Classify the sentiment of this social media post"
    labels:
      - Positive
      - Negative
      - Neutral
      - Speech Act
      - Skip
    keyboard_shortcuts:
      Positive: "1"
      Negative: "2"
      Neutral: "3"
      "Speech Act": "4"
      Skip: "5"
    tooltips:
      Positive: "Post expresses positive emotion or favorable attitude (explicit or implicit)"
      Negative: "Post expresses negative emotion or unfavorable attitude (explicit or implicit)"
      Neutral: "Post contains no sentiment markers; purely informational"
      "Speech Act": "Formulaic posts: greetings, thank-yous, congratulations, wishes (may not reflect true sentiment)"
      Skip: "Unclear posts, excessive noise, user-generated content like poems or lyrics"
  # Optional: For mixed sentiment posts
  - annotation_type: radio
    name: mixed_sentiment
    description: "Does this post contain mixed sentiment?"
    labels:
      - "No - single sentiment"
      - "Yes - but positive dominant"
      - "Yes - but negative dominant"
      - "Yes - balanced/unclear"
    keyboard_shortcuts:
      "No - single sentiment": "n"
      "Yes - but positive dominant": "p"
      "Yes - but negative dominant": "g"
      "Yes - balanced/unclear": "b"
    tooltips:
      "No - single sentiment": "The post expresses only one type of sentiment"
      "Yes - but positive dominant": "Mixed, but overall more positive"
      "Yes - but negative dominant": "Mixed, but overall more negative"
      "Yes - balanced/unclear": "Cannot determine dominant sentiment"
allow_all_users: true
instances_per_annotator: 500
annotation_per_instance: 3
allow_skip: false
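When editing the schemes above, it is easy to add a label and forget its shortcut or tooltip. The following is a small sketch of a consistency check, assuming the configuration has already been parsed into plain Python dictionaries (for example by a YAML parser). The helper `check_scheme` is hypothetical and not part of Potato, and Potato itself may not require every label to have a shortcut or tooltip; the check only mirrors the convention this file follows.

```python
def check_scheme(scheme):
    """Return a list of human-readable problems found in one annotation scheme."""
    problems = []
    name = scheme.get("name", "<unnamed>")
    labels = scheme.get("labels", [])

    if len(set(labels)) != len(labels):
        problems.append(f"{name}: duplicate labels")

    # Every label should have a shortcut and a tooltip, and vice versa.
    for field in ("keyboard_shortcuts", "tooltips"):
        mapping = scheme.get(field, {})
        for label in labels:
            if label not in mapping:
                problems.append(f"{name}: {field} has no entry for '{label}'")
        for key in mapping:
            if key not in labels:
                problems.append(f"{name}: {field} entry '{key}' is not a label")

    # Two labels sharing one key would make the shortcut ambiguous.
    keys = list(scheme.get("keyboard_shortcuts", {}).values())
    for key in sorted(set(keys)):
        if keys.count(key) > 1:
            problems.append(f"{name}: shortcut '{key}' is assigned more than once")

    return problems


# The sentiment scheme from this page, written out as a dictionary.
sentiment = {
    "annotation_type": "radio",
    "name": "sentiment",
    "labels": ["Positive", "Negative", "Neutral", "Speech Act", "Skip"],
    "keyboard_shortcuts": {
        "Positive": "1",
        "Negative": "2",
        "Neutral": "3",
        "Speech Act": "4",
        "Skip": "5",
    },
    "tooltips": {
        "Positive": "...",
        "Negative": "...",
        "Neutral": "...",
        "Speech Act": "...",
        "Skip": "...",
    },
}

for problem in check_scheme(sentiment):
    print(problem)
```

The tooltip texts are abbreviated to "..." in the dictionary because only the keys matter for this check.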
Sample Data: sample-data.json
[
  {
    "id": "rusent_001",
    "text": "Just had the best coffee of my life! This cafe is amazing!"
  },
  {
    "id": "rusent_002",
    "text": "Happy birthday! Wishing you all the best on your special day!"
  }
]
// ... and 13 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/rusentiment
potato start config.yaml
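Before starting the server with your own data, it may be worth checking that every record carries the fields named by `id_key` and `text_key` in the configuration. The sketch below assumes the data file is a JSON array of objects, as in the sample shown on this page; `validate_items` is a hypothetical helper, not a Potato function.

```python
import json


def validate_items(items, id_key="id", text_key="text"):
    """Return a list of problems: missing or duplicate ids, empty texts."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not a JSON object")
            continue

        item_id = item.get(id_key)
        if not isinstance(item_id, (str, int)):
            problems.append(f"item {index}: missing or invalid '{id_key}'")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen_ids.add(item_id)

        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems


# The two records shown in the sample data on this page.
# For a real file, use json.load(open(path, encoding="utf-8")) instead.
sample = json.loads("""
[
  {"id": "rusent_001",
   "text": "Just had the best coffee of my life! This cafe is amazing!"},
  {"id": "rusent_002",
   "text": "Happy birthday! Wishing you all the best on your special day!"}
]
""")

for problem in validate_items(sample):
    print(problem)
```

The default key names match `id_key: id` and `text_key: text` from the configuration; pass different values if your data uses other field names.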
Related Designs
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.
GoEmotions - Fine-Grained Emotion Classification
Multi-label emotion classification with 27 emotion categories plus neutral, based on the Google Research GoEmotions dataset (Demszky et al., ACL 2020). Taxonomy covers 12 positive, 11 negative, and 4 ambiguous emotions designed for Reddit comment analysis.
Implicit Hate Speech Detection
Detect and categorize implicit hate speech using a six-category taxonomy. Based on ElSherief et al., EMNLP 2021. Identifies grievance, incitement, stereotypes, inferiority, irony, and threats.