intermediate, audio
Speech Emotion Recognition
Classify emotional content in speech following IEMOCAP and CREMA-D annotation schemes.
audio annotation
Configuration File: config.yaml
task_name: "Speech Emotion Recognition"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]

# Data configuration
data_files:
  - path: data/speech_clips.json
    audio_field: audio_file
    text_field: transcript

# Annotation schemes
annotation_schemes:
  # Primary emotion category
  - annotation_type: radio
    name: emotion_category
    description: "What is the primary emotion expressed?"
    labels:
      - Angry
      - Happy
      - Sad
      - Neutral
      - Frustrated
      - Excited
      - Fearful
      - Surprised
      - Disgusted
      - Other
    keyboard_shortcuts:
      "Angry": "a"
      "Happy": "h"
      "Sad": "s"
      "Neutral": "n"
      "Frustrated": "f"
      "Excited": "e"
      "Fearful": "r"
      "Surprised": "u"
      "Disgusted": "d"
      "Other": "o"

  # Emotion intensity
  - annotation_type: likert
    name: intensity
    description: "How intense is the emotional expression?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"

  # Valence (positive-negative)
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative is the emotion?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"

  # Arousal (activation level)
  - annotation_type: likert
    name: arousal
    description: "Arousal: How activated/energetic is the speaker?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"

  # Speaking style
  - annotation_type: radio
    name: speaking_style
    description: "Does this seem acted or natural?"
    labels:
      - Clearly acted/performed
      - Somewhat theatrical
      - Natural/spontaneous
      - Cannot determine

  # Audio quality impact
  - annotation_type: radio
    name: quality_impact
    description: "Does audio quality affect emotion perception?"
    labels:
      - No impact (clear audio)
      - Minor impact
      - Significant impact
      - Cannot judge emotion due to quality

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Confidence in your emotion judgment"
    size: 5
    min_label: "Uncertain"
    max_label: "Very certain"

# User settings
allow_all_users: true
instances_per_annotator: 150

# Output
output:
  path: annotations/
  format: json
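The keyboard_shortcuts mapping in the emotion_category scheme is easy to break when labels are added or renamed. The following is a minimal sketch of a sanity check, assuming you copy the labels and shortcuts out of config.yaml by hand; the function name check_shortcuts is hypothetical and not part of Potato.

```python
# Labels and shortcuts copied by hand from the emotion_category scheme.
LABELS = [
    "Angry", "Happy", "Sad", "Neutral", "Frustrated",
    "Excited", "Fearful", "Surprised", "Disgusted", "Other",
]
SHORTCUTS = {
    "Angry": "a", "Happy": "h", "Sad": "s", "Neutral": "n",
    "Frustrated": "f", "Excited": "e", "Fearful": "r",
    "Surprised": "u", "Disgusted": "d", "Other": "o",
}


def check_shortcuts(labels, shortcuts):
    """Return a list of problems; an empty list means the mapping is consistent."""
    problems = []
    # Every label should have a shortcut.
    for label in labels:
        if label not in shortcuts:
            problems.append(f"no shortcut for label {label!r}")
    # Every shortcut should point at a known label.
    for label in shortcuts:
        if label not in labels:
            problems.append(f"shortcut defined for unknown label {label!r}")
    # No key should be assigned to two labels.
    seen = {}
    for label, key in shortcuts.items():
        if key in seen:
            problems.append(
                f"key {key!r} used by both {seen[key]!r} and {label!r}"
            )
        else:
            seen[key] = label
    return problems


if __name__ == "__main__":
    issues = check_shortcuts(LABELS, SHORTCUTS)
    print("\n".join(issues) if issues else "shortcuts OK")
```

The check only covers the single scheme that defines shortcuts here; the Likert schemes have none to validate.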
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir speech-emotion-recognition
cd speech-emotion-recognition

# Copy config.yaml from above
potato start config.yaml
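The config points at data/speech_clips.json and reads each clip through audio_field: audio_file and text_field: transcript, so the project folder also needs a data file before the server has anything to show. Below is a rough sketch of building such records, assuming a JSON array with one object per clip. The id key, the array layout, the example file names, and the helper name build_records are all assumptions for illustration; check the Potato documentation for the exact data format it expects.

```python
import json


def build_records(clips):
    """Turn (audio path, transcript) pairs into records.

    The audio_file and transcript keys match audio_field and text_field
    in config.yaml. The id key is an assumption, not taken from the config.
    """
    return [
        {"id": f"clip_{i:04d}", "audio_file": audio, "transcript": text}
        for i, (audio, text) in enumerate(clips, start=1)
    ]


if __name__ == "__main__":
    # Placeholder clips; replace with your own audio paths and transcripts.
    clips = [
        ("audio/example_001.wav", "placeholder transcript one"),
        ("audio/example_002.wav", "placeholder transcript two"),
    ]
    # Print the JSON so it can be redirected into data/speech_clips.json.
    print(json.dumps(build_records(clips), indent=2))
```

Saving the printed output as data/speech_clips.json inside the project folder would line it up with the path in data_files.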
Details
Annotation Types
radio, likert
Domain
Audio, Speech
Use Cases
emotion recognition, affective computing, speech analysis
Tags
audio, emotion, speech, iemocap, affective computing
Related Designs
Audio-Visual Sentiment Analysis
Rate sentiment in speech segments following CMU-MOSI and CMU-MOSEI multimodal annotation protocols.
likert, radio
Speech Intelligibility Rating
Rate speech intelligibility for pathological speech following TORGO database annotation protocols.
likert, radio
Speech Quality MOS Rating
Rate speech quality using Mean Opinion Score following ITU-T P.800 and Blizzard Challenge protocols.
likert, radio