Showcase/Speech Emotion Recognition
intermediate · audio

Speech Emotion Recognition

Classify emotional content in speech following IEMOCAP and CREMA-D annotation schemes.


Configuration File (config.yaml)

annotation_task_name: "Speech Emotion Recognition"

port: 8000

# Data configuration
data_files:
  - "data/speech_clips.json"

item_properties:
  id_key: id
  text_key: transcript

# Annotation schemes
annotation_schemes:
  # Primary emotion category
  - annotation_type: radio
    name: emotion_category
    description: "What is the primary emotion expressed?"
    labels:
      - name: Angry
        key_value: "a"
      - name: Happy
        key_value: "h"
      - name: Sad
        key_value: "s"
      - name: Neutral
        key_value: "n"
      - name: Frustrated
        key_value: "f"
      - name: Excited
        key_value: "e"
      - name: Fearful
        key_value: "r"
      - name: Surprised
        key_value: "u"
      - name: Disgusted
        key_value: "d"
      - name: Other
        key_value: "o"
    sequential_key_binding: true

  # Emotion intensity
  - annotation_type: likert
    name: intensity
    description: "How intense is the emotional expression?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"

  # Valence (positive-negative)
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative is the emotion?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"

  # Arousal (activation level)
  - annotation_type: likert
    name: arousal
    description: "Arousal: How activated/energetic is the speaker?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"

  # Speaking style
  - annotation_type: radio
    name: speaking_style
    description: "Does this seem acted or natural?"
    labels:
      - Clearly acted/performed
      - Somewhat theatrical
      - Natural/spontaneous
      - Cannot determine

  # Audio quality impact
  - annotation_type: radio
    name: quality_impact
    description: "Does audio quality affect emotion perception?"
    labels:
      - No impact (clear audio)
      - Minor impact
      - Significant impact
      - Cannot judge emotion due to quality

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Confidence in your emotion judgment"
    size: 5
    min_label: "Uncertain"
    max_label: "Very certain"

# User settings
allow_all_users: true
instances_per_annotator: 150

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
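The config points at data/speech_clips.json, and item_properties tells Potato to read the id field and the transcript field from each record. A minimal sketch of generating such a file is shown below; the clip ids and transcripts are made-up placeholders, and any field beyond id and transcript is an illustrative assumption, not something the config requires.

```python
import json

# Hypothetical entries for data/speech_clips.json.
# The config's item_properties names only "id" and "transcript",
# so those are the only required keys per record.
clips = [
    {"id": "clip_0001", "transcript": "I can't believe you did that!"},
    {"id": "clip_0002", "transcript": "Everything turned out fine in the end."},
]

# Write the data file the config expects.
with open("speech_clips.json", "w") as f:
    json.dump(clips, f, indent=2)
```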

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speech-emotion-recognition
cd speech-emotion-recognition
# Copy config.yaml from above
potato start config.yaml
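Once annotation is done, you will typically want to aggregate each annotator's judgments per clip, e.g. a majority emotion label and a mean valence score. The sketch below assumes a simplified flat record shape ({"id": ..., "emotion_category": ..., "valence": ...}); the actual files Potato writes to annotation_output/ vary by version, so inspect them and adapt the loading step before reusing this.

```python
from collections import Counter
from statistics import mean

def summarize(annotations):
    """Aggregate per-clip labels across annotators.

    `annotations` is a list of simplified records, one per
    (annotator, clip) pair -- a stand-in for whatever your
    Potato version actually writes to annotation_output/.
    """
    by_item = {}
    for rec in annotations:
        by_item.setdefault(rec["id"], []).append(rec)

    summary = {}
    for item_id, recs in by_item.items():
        label_counts = Counter(r["emotion_category"] for r in recs)
        summary[item_id] = {
            "majority_emotion": label_counts.most_common(1)[0][0],
            "mean_valence": mean(r["valence"] for r in recs),
        }
    return summary

# Illustrative data: three annotators judged the same clip.
example = [
    {"id": "clip_0001", "emotion_category": "Angry", "valence": 2},
    {"id": "clip_0001", "emotion_category": "Angry", "valence": 1},
    {"id": "clip_0001", "emotion_category": "Frustrated", "valence": 2},
]
print(summarize(example))
```

For a production analysis you would also want an inter-annotator agreement statistic (e.g. Krippendorff's alpha for the ordinal likert scales) rather than only the majority vote.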

Details

Annotation Types

radio · likert

Domain

Audio · Speech

Use Cases

emotion recognition · affective computing · speech analysis

Tags

audio · emotion · speech · iemocap · affective computing