intermediate · audio

Speech Emotion Recognition

Classify emotional content in speech following IEMOCAP and CREMA-D annotation schemes.


audio annotation

Configuration File: config.yaml

task_name: "Speech Emotion Recognition"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]

# Data configuration
data_files:
  - path: data/speech_clips.json
    audio_field: audio_file
    text_field: transcript

# Annotation schemes
annotation_schemes:
  # Primary emotion category
  - annotation_type: radio
    name: emotion_category
    description: "What is the primary emotion expressed?"
    labels:
      - Angry
      - Happy
      - Sad
      - Neutral
      - Frustrated
      - Excited
      - Fearful
      - Surprised
      - Disgusted
      - Other
    keyboard_shortcuts:
      "Angry": "a"
      "Happy": "h"
      "Sad": "s"
      "Neutral": "n"
      "Frustrated": "f"
      "Excited": "e"
      "Fearful": "r"
      "Surprised": "u"
      "Disgusted": "d"
      "Other": "o"

  # Emotion intensity
  - annotation_type: likert
    name: intensity
    description: "How intense is the emotional expression?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"

  # Valence (positive-negative)
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative is the emotion?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"

  # Arousal (activation level)
  - annotation_type: likert
    name: arousal
    description: "Arousal: How activated/energetic is the speaker?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"

  # Speaking style
  - annotation_type: radio
    name: speaking_style
    description: "Does this seem acted or natural?"
    labels:
      - Clearly acted/performed
      - Somewhat theatrical
      - Natural/spontaneous
      - Cannot determine

  # Audio quality impact
  - annotation_type: radio
    name: quality_impact
    description: "Does audio quality affect emotion perception?"
    labels:
      - No impact (clear audio)
      - Minor impact
      - Significant impact
      - Cannot judge emotion due to quality

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Confidence in your emotion judgment"
    size: 5
    min_label: "Uncertain"
    max_label: "Very certain"

# User settings
allow_all_users: true
instances_per_annotator: 150

# Output
output:
  path: annotations/
  format: json
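
For reference, here is a minimal sketch of what data/speech_clips.json could look like, assuming each record carries the audio_file and transcript fields named under data_files plus an id. The id field, file names, and transcript text are illustrative assumptions; check your Potato version for the exact fields it expects.

[
  {
    "id": "clip_0001",
    "audio_file": "audio/clip_0001.wav",
    "transcript": "I can't believe you did that again."
  },
  {
    "id": "clip_0002",
    "audio_file": "audio/clip_0002.wav",
    "transcript": "That's the best news I've heard all week!"
  }
]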

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speech-emotion-recognition
cd speech-emotion-recognition
# Copy config.yaml from above
potato start config.yaml
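
If your clips sit in a local folder, a short Python sketch like the one below can assemble data/speech_clips.json in the shape shown above. The audio/ directory, the transcripts.csv file, and its filename/transcript columns are assumptions about your own data layout, not part of this design.

import csv
import json
from pathlib import Path

AUDIO_DIR = Path("audio")              # assumed folder containing your .wav clips
TRANSCRIPTS = Path("transcripts.csv")  # assumed CSV with 'filename' and 'transcript' columns

# Map each clip's filename to its transcript.
with TRANSCRIPTS.open(newline="", encoding="utf-8") as f:
    transcripts = {row["filename"]: row["transcript"] for row in csv.DictReader(f)}

records = []
for i, wav in enumerate(sorted(AUDIO_DIR.glob("*.wav")), start=1):
    records.append({
        "id": f"clip_{i:04d}",                        # illustrative id scheme
        "audio_file": str(wav),                       # matches audio_field in config.yaml
        "transcript": transcripts.get(wav.name, ""),  # matches text_field in config.yaml
    })

Path("data").mkdir(exist_ok=True)
Path("data/speech_clips.json").write_text(json.dumps(records, indent=2), encoding="utf-8")
print(f"Wrote {len(records)} records to data/speech_clips.json")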

Details

Annotation Types

radio, likert

Domain

Audio, Speech

Use Cases

emotion recognition, affective computing, speech analysis

Tags

audio, emotion, speech, iemocap, affective computing