RuSentiment - Social Media Sentiment

Five-class sentiment annotation for social media posts, based on RuSentiment (Rogers et al., COLING 2018). The classes are Positive, Negative, Neutral, Speech Act (greetings, thanks), and Skip. The scheme achieved a Fleiss' kappa of 0.654 at an annotation speed of 250-350 posts per hour.
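
For readers who want to check agreement on their own annotation runs, the following is a minimal sketch of Fleiss' kappa in plain Python. The function name `fleiss_kappa` and the toy ratings are illustrative assumptions, not part of the RuSentiment release or of Potato; the sketch assumes every post received the same number of ratings.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-post label lists.

    Assumes every post has the same number of ratings (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every post needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]
    categories = sorted({label for r in ratings for label in r})

    # Observed agreement: mean over posts of the share of agreeing rater pairs.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1 - p_expected)


# Toy ratings, invented for illustration: 4 posts, 3 annotators each.
toy = [
    ["Positive", "Positive", "Positive"],
    ["Negative", "Negative", "Negative"],
    ["Positive", "Positive", "Negative"],
    ["Neutral", "Neutral", "Neutral"],
]
print(round(fleiss_kappa(toy), 3))  # 0.745
```

On the toy data, observed agreement is 5/6 and chance agreement is 50/144, giving kappa = 35/47.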

Configuration file: config.yaml

# RuSentiment - Social Media Sentiment Classification
# Based on Rogers et al., COLING 2018
# Paper: https://aclanthology.org/C18-1064/
# Dataset: https://github.com/text-machine-lab/rusentiment
#
# 5-class sentiment scheme designed for social media:
# - Positive: explicit or implicit positive sentiment
# - Negative: explicit or implicit negative sentiment
# - Neutral: no sentiment expressed
# - Speech Act: formulaic posts (greetings, thanks, congratulations)
# - Skip: unclear, noisy, or user-generated content like poems
#
# Guidelines:
# - Mixed sentiment: annotate based on dominant sentiment
# - Hashtags and emojis are NOT automatic sentiment labels
# - Speech Acts may not reflect sender's actual sentiment
# - Annotation speed target: 250-350 posts per hour

annotation_task_name: "RuSentiment: Social Media Sentiment Classification"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "Classify the sentiment of this social media post"
    labels:
      - Positive
      - Negative
      - Neutral
      - Speech Act
      - Skip
    keyboard_shortcuts:
      Positive: "1"
      Negative: "2"
      Neutral: "3"
      "Speech Act": "4"
      Skip: "5"
    tooltips:
      Positive: "Post expresses positive emotion or favorable attitude (explicit or implicit)"
      Negative: "Post expresses negative emotion or unfavorable attitude (explicit or implicit)"
      Neutral: "Post contains no sentiment markers; purely informational"
      "Speech Act": "Formulaic posts: greetings, thank-yous, congratulations, wishes (may not reflect true sentiment)"
      Skip: "Unclear posts, excessive noise, user-generated content like poems or lyrics"

  # Optional: For mixed sentiment posts
  - annotation_type: radio
    name: mixed_sentiment
    description: "Does this post contain mixed sentiment?"
    labels:
      - "No - single sentiment"
      - "Yes - but positive dominant"
      - "Yes - but negative dominant"
      - "Yes - balanced/unclear"
    keyboard_shortcuts:
      "No - single sentiment": "n"
      "Yes - but positive dominant": "p"
      "Yes - but negative dominant": "g"
      "Yes - balanced/unclear": "b"
    tooltips:
      "No - single sentiment": "The post expresses only one type of sentiment"
      "Yes - but positive dominant": "Mixed, but overall more positive"
      "Yes - but negative dominant": "Mixed, but overall more negative"
      "Yes - balanced/unclear": "Cannot determine dominant sentiment"

allow_all_users: true
instances_per_annotator: 500
annotation_per_instance: 3
allow_skip: false
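
With `annotation_per_instance: 3`, each post ends up with three labels that usually need to be merged into one. The sketch below shows one possible rule, a simple majority vote; this rule is an assumption for illustration and is not stated in the config above. The `aggregate` helper, the third post id, and all labels in the example are hypothetical.

```python
from collections import Counter


def aggregate(labels, min_votes=2):
    """Return the label with at least `min_votes` votes, or None if there is none.

    With 3 annotations and min_votes=2, at most one label can qualify,
    so no tie-breaking is needed.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


# Hypothetical annotations for three posts, invented for illustration.
annotations = {
    "rusent_001": ["Positive", "Positive", "Neutral"],
    "rusent_002": ["Speech Act", "Speech Act", "Speech Act"],
    "rusent_003": ["Positive", "Negative", "Skip"],
}

gold = {post_id: aggregate(labels) for post_id, labels in annotations.items()}
print(gold)
```

Posts that come back as `None` have no majority and could be routed to adjudication or dropped, depending on the project's needs.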

Sample data: sample-data.json

[
  {
    "id": "rusent_001",
    "text": "Just had the best coffee of my life! This cafe is amazing!"
  },
  {
    "id": "rusent_002",
    "text": "Happy birthday! Wishing you all the best on your special day!"
  }
]

// ... and 13 more items
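
Before starting the server, it can help to confirm that every item carries the keys named under `item_properties` in the config (`id` and `text`). The following is a minimal sketch of such a check; `validate_items` is a hypothetical helper, not a Potato function, and it only inspects the two items shown above.

```python
import json


def validate_items(items, id_key="id", text_key="text"):
    """Return a list of human-readable problems; an empty list means the data looks fine."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        item_id = item.get(id_key)
        if item_id is None:
            problems.append(f"item {index}: missing '{id_key}'")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen_ids.add(item_id)

        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems


# The two items shown in the sample above, inlined so the sketch is self-contained.
sample = json.loads("""
[
  {"id": "rusent_001",
   "text": "Just had the best coffee of my life! This cafe is amazing!"},
  {"id": "rusent_002",
   "text": "Happy birthday! Wishing you all the best on your special day!"}
]
""")

print(validate_items(sample))  # []
```

To check the full file instead, the inline string could be replaced with `json.load(open("sample-data.json", encoding="utf-8"))`, assuming the file sits in the working directory.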

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/emotion-sentiment/rusentiment
potato start config.yaml

Details

Annotation types

radio

Domain

NLP, Social Media

Use cases

Sentiment Analysis, Social Media Analysis

Tags

sentiment, social-media, russian, speech-acts, coling2018

Found a problem or want to improve this design?

Open an issue

Related designs

AfriSenti - African Language Sentiment

Sentiment analysis for tweets in African languages, classifying text as positive, negative, or neutral. Covers 14 African languages including Amharic, Hausa, Igbo, Yoruba, and Swahili. Based on SemEval-2023 Task 12 (Muhammad et al.).

radio

Detecting Stance in Tweets

Classification of stance expressed in tweets toward specific targets as favor, against, or neither. Based on SemEval-2016 Task 6 (Stance Detection).

radio

Explainable Online Sexism Detection

Detection and fine-grained classification of online sexism with span-level evidence extraction. Categories include threats, derogation, animosity, and prejudiced discussion. Based on SemEval-2023 Task 10 (Kirk et al.).

radio, span