intermediate · audio

Speech Emotion Recognition

Classify emotional content in speech following IEMOCAP and CREMA-D annotation schemes.


audio annotation

Configuration File: config.yaml

task_name: "Speech Emotion Recognition"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]

# Data configuration
data_files:
  - path: data/speech_clips.json
    audio_field: audio_file
    text_field: transcript

# Annotation schemes
annotation_schemes:
  # Primary emotion category
  - annotation_type: radio
    name: emotion_category
    description: "What is the primary emotion expressed?"
    labels:
      - Angry
      - Happy
      - Sad
      - Neutral
      - Frustrated
      - Excited
      - Fearful
      - Surprised
      - Disgusted
      - Other
    keyboard_shortcuts:
      "Angry": "a"
      "Happy": "h"
      "Sad": "s"
      "Neutral": "n"
      "Frustrated": "f"
      "Excited": "e"
      "Fearful": "r"
      "Surprised": "u"
      "Disgusted": "d"
      "Other": "o"

  # Emotion intensity
  - annotation_type: likert
    name: intensity
    description: "How intense is the emotional expression?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"

  # Valence (positive-negative)
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative is the emotion?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"

  # Arousal (activation level)
  - annotation_type: likert
    name: arousal
    description: "Arousal: How activated/energetic is the speaker?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"

  # Speaking style
  - annotation_type: radio
    name: speaking_style
    description: "Does this seem acted or natural?"
    labels:
      - Clearly acted/performed
      - Somewhat theatrical
      - Natural/spontaneous
      - Cannot determine

  # Audio quality impact
  - annotation_type: radio
    name: quality_impact
    description: "Does audio quality affect emotion perception?"
    labels:
      - No impact (clear audio)
      - Minor impact
      - Significant impact
      - Cannot judge emotion due to quality

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Confidence in your emotion judgment"
    size: 5
    min_label: "Uncertain"
    max_label: "Very certain"

# User settings
allow_all_users: true
instances_per_annotator: 150

# Output
output:
  path: annotations/
  format: json
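
For reference, here is a minimal sketch of what data/speech_clips.json could look like, assuming each record carries the audio_file and transcript fields named under data_files plus an id. The id field, file names, and transcript text are illustrative assumptions; check your Potato version for the exact fields it expects.

[
  {
    "id": "clip_0001",
    "audio_file": "audio/clip_0001.wav",
    "transcript": "I can't believe you did that again."
  },
  {
    "id": "clip_0002",
    "audio_file": "audio/clip_0002.wav",
    "transcript": "That's the best news I've heard all week!"
  }
]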

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speech-emotion-recognition
cd speech-emotion-recognition
# Copy config.yaml from above
potato start config.yaml
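
If your clips sit in a local folder, a short Python sketch like the one below can assemble data/speech_clips.json in the shape shown above. The audio/ directory, the transcripts.csv file, and its filename/transcript columns are assumptions about your own data layout, not part of this design.

import csv
import json
from pathlib import Path

AUDIO_DIR = Path("audio")              # assumed folder containing your .wav clips
TRANSCRIPTS = Path("transcripts.csv")  # assumed CSV with 'filename' and 'transcript' columns

# Map each clip's filename to its transcript.
with TRANSCRIPTS.open(newline="", encoding="utf-8") as f:
    transcripts = {row["filename"]: row["transcript"] for row in csv.DictReader(f)}

records = []
for i, wav in enumerate(sorted(AUDIO_DIR.glob("*.wav")), start=1):
    records.append({
        "id": f"clip_{i:04d}",                        # illustrative id scheme
        "audio_file": str(wav),                       # matches audio_field in config.yaml
        "transcript": transcripts.get(wav.name, ""),  # matches text_field in config.yaml
    })

Path("data").mkdir(exist_ok=True)
Path("data/speech_clips.json").write_text(json.dumps(records, indent=2), encoding="utf-8")
print(f"Wrote {len(records)} records to data/speech_clips.json")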

Details

Annotation Types

radio, likert

Domain

Audio, Speech

Use Cases

emotion recognition, affective computing, speech analysis

Tags

audio, emotion, speech, iemocap, affective computing