Speech Emotion Classification
Create audio emotion classification tasks with waveform display, playback speed control, and Likert scales.
Potato Team
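Speech emotion recognition (SER) powers virtual assistants, mental health apps, and customer service analytics. This tutorial shows how to build annotation interfaces for categorical emotions, dimensional ratings, and mixed-emotion detection.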
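Emotion Annotation Approaches
There are several ways to annotate emotion in speech:
- Categorical: discrete labels (happy, sad, angry)
- Dimensional: continuous scales (valence, arousal, dominance)
- Mixed: multiple emotions with intensity ratings
- Segment-based: different emotions at different timestamps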
Categorical Emotion Classification
Basic Configuration
```yaml
annotation_task_name: "Speech Emotion Recognition"
data_files:
  - data/utterances.json
item_properties:
  id_key: id
  audio_key: audio_path
  text_key: transcript  # Optional transcript
audio:
  enabled: true
  display: waveform
  waveform_color: "#8B5CF6"
  progress_color: "#A78BFA"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]
annotation_schemes:
  - annotation_type: radio
    name: emotion
    description: "What emotion is expressed in this speech?"
    labels:
      - name: Happy
        description: "Joy, excitement, amusement"
        keyboard_shortcut: "h"
      - name: Sad
        description: "Sorrow, disappointment, grief"
        keyboard_shortcut: "s"
      - name: Angry
        description: "Frustration, irritation, rage"
        keyboard_shortcut: "a"
      - name: Fearful
        description: "Anxiety, worry, terror"
        keyboard_shortcut: "f"
      - name: Surprised
        description: "Astonishment, shock"
        keyboard_shortcut: "u"
      - name: Disgusted
        description: "Revulsion, distaste"
        keyboard_shortcut: "d"
      - name: Neutral
        description: "No clear emotion"
        keyboard_shortcut: "n"
    required: true
```
Adding Intensity Ratings
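An intensity rating only makes sense when an emotion is actually present, which is why the configuration below hides the intensity scale when Neutral is selected. If you also want to enforce that rule when checking exported annotations, a minimal Python sketch could look like this. It assumes each record is a dict holding the `emotion` and `intensity` fields defined below, that a hidden scale leaves `intensity` absent or `None`, and that a `size: 5` Likert scale is stored as the integers 1 to 5; `check_intensity_rule` is a hypothetical helper, not part of Potato.

```python
from typing import Any, Optional

# Assumption: a `size: 5` Likert scale is exported as integers 1..5.
INTENSITY_MIN, INTENSITY_MAX = 1, 5


def check_intensity_rule(record: dict[str, Any]) -> list[str]:
    """List problems with one record's `emotion` / `intensity` pair.

    Hypothetical helper; assumes `intensity` is absent or None when the
    scale was hidden.
    """
    problems: list[str] = []
    emotion: Optional[str] = record.get("emotion")
    intensity: Optional[int] = record.get("intensity")
    if emotion is None:
        # `emotion` is marked `required: true`, so it should always be set.
        problems.append("missing required emotion")
    elif emotion == "Neutral" and intensity is not None:
        # The scale is hidden for Neutral, so a value here is suspicious.
        problems.append("intensity given for Neutral")
    elif intensity is not None and not INTENSITY_MIN <= intensity <= INTENSITY_MAX:
        problems.append(f"intensity {intensity} outside {INTENSITY_MIN}-{INTENSITY_MAX}")
    return problems
```

For example, `check_intensity_rule({"emotion": "Neutral", "intensity": 2})` returns `["intensity given for Neutral"]`, while `{"emotion": "Happy", "intensity": 4}` yields an empty list. The sketch does not flag a missing intensity on non-neutral items, because the scale is not marked `required` in the configuration.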
```yaml
annotation_schemes:
  - annotation_type: radio
    name: emotion
    labels: [Happy, Sad, Angry, Fearful, Surprised, Disgusted, Neutral]
    required: true
  - annotation_type: likert
    name: intensity
    description: "How intense is this emotion?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"
    conditional:
      depends_on: emotion
      hide_when: ["Neutral"]
```
Dimensional Emotion Annotation
The VAD (Valence-Arousal-Dominance) model:
```yaml
annotation_task_name: "Dimensional Emotion Rating"
annotation_schemes:
  # Valence: negative to positive
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"
  # Arousal: calm to excited
  - annotation_type: likert
    name: arousal
    description: "Arousal: How calm or excited?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"
  # Dominance: submissive to dominant
  - annotation_type: likert
    name: dominance
    description: "Dominance: How submissive or dominant?"
    size: 7
    min_label: "Very submissive"
    max_label: "Very dominant"
```
Visual Scale (SAM)
Self-Assessment Manikin (SAM) style:
```yaml
annotation_schemes:
  - annotation_type: image_scale
    name: valence
    description: "Select the figure that matches the emotional valence"
    images:
      - path: /images/sam_valence_1.png
        value: 1
      - path: /images/sam_valence_2.png
        value: 2
      # ... etc
    size: 9
```
Mixed Emotion Detection
For speech that contains more than one emotion:
```yaml
annotation_schemes:
  - annotation_type: multiselect
    name: emotions_present
    description: "Select ALL emotions you detect (can be multiple)"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
    min_selections: 1
  - annotation_type: radio
    name: primary_emotion
    description: "Which emotion is MOST prominent?"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
      - Mixed (no dominant)
```
Comprehensive Emotion Annotation
```yaml
annotation_task_name: "Comprehensive Speech Emotion Annotation"
data_files:
  - data/speech_samples.json
item_properties:
  id_key: id
  audio_key: audio_url
  text_key: transcript
audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  height: 120
  speed_control: true
  speed_options: [0.5, 0.75, 1.0, 1.25]
  show_duration: true
  autoplay: false
# Show transcript if available
display:
  show_text: true
  text_field: transcript
  text_label: "Transcript (for reference)"
annotation_schemes:
  # Primary categorical emotion
  - annotation_type: radio
    name: primary_emotion
    description: "Primary emotion expressed"
    labels:
      - name: Happiness
        color: "#FCD34D"
        keyboard_shortcut: "1"
      - name: Sadness
        color: "#60A5FA"
        keyboard_shortcut: "2"
      - name: Anger
        color: "#F87171"
        keyboard_shortcut: "3"
      - name: Fear
        color: "#A78BFA"
        keyboard_shortcut: "4"
      - name: Surprise
        color: "#34D399"
        keyboard_shortcut: "5"
      - name: Disgust
        color: "#FB923C"
        keyboard_shortcut: "6"
      - name: Neutral
        color: "#9CA3AF"
        keyboard_shortcut: "7"
    required: true
  # Emotional intensity
  - annotation_type: likert
    name: intensity
    description: "Emotional intensity"
    size: 5
    min_label: "Very mild"
    max_label: "Very intense"
    required: true
  # Dimensional ratings
  - annotation_type: likert
    name: valence
    description: "Valence (negative to positive)"
    size: 7
    min_label: "Negative"
    max_label: "Positive"
  - annotation_type: likert
    name: arousal
    description: "Arousal (calm to excited)"
    size: 7
    min_label: "Calm"
    max_label: "Excited"
  # Voice quality
  - annotation_type: multiselect
    name: voice_qualities
    description: "Voice characteristics (select all that apply)"
    labels:
      - Trembling voice
      - Raised pitch
      - Lowered pitch
      - Loud/shouting
      - Soft/whisper
      - Fast speech rate
      - Slow speech rate
      - Breathy
      - Tense/strained
      - Crying
      - Laughing
  # Genuineness
  - annotation_type: radio
    name: authenticity
    description: "Does the emotion seem genuine?"
    labels:
      - Clearly genuine
      - Likely genuine
      - Uncertain
      - Likely acted/fake
      - Clearly acted/fake
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your annotation?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
annotation_guidelines:
  title: "Emotion Annotation Guidelines"
  content: |
    ## Listening Instructions
    1. Listen to the entire clip before annotating
    2. You may replay as many times as needed
    3. Focus on the VOICE, not just the words
    ## Emotion Categories
    - **Happiness**: Joy, amusement, contentment
    - **Sadness**: Sorrow, disappointment, melancholy
    - **Anger**: Frustration, irritation, rage
    - **Fear**: Anxiety, nervousness, terror
    - **Surprise**: Astonishment, startle
    - **Disgust**: Revulsion, contempt
    - **Neutral**: Calm, matter-of-fact
    ## Tips
    - Consider tone, pitch, speaking rate
    - The transcript may not match the emotion
    - When unsure between two emotions, choose the stronger one
    - Use the intensity scale for unclear cases
output_annotation_dir: annotations/
output_annotation_format: jsonl
```
Output Format
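Because the configuration above sets `output_annotation_format: jsonl`, each line of the output is expected to be one JSON object, shaped like the (pretty-printed) record below. A minimal sketch for collapsing several annotators' labels into one label per item by majority vote; it assumes the `id` / `annotations` layout shown in the record, and `majority_labels` is a hypothetical helper. The exact file name Potato writes under `annotations/` is not given here, so check your own output directory.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable


def majority_labels(lines: Iterable[str], field: str = "primary_emotion") -> dict[str, str]:
    """Majority-vote one categorical field per item id from JSONL lines."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        # Assumed layout: {"id": ..., "annotations": {field: label, ...}}
        label = record.get("annotations", {}).get(field)
        if label is not None:
            votes[record["id"]][label] += 1
    # most_common() keeps first-seen order among equal counts, so a tie
    # goes to the label that appeared first in the input.
    return {item_id: counts.most_common(1)[0][0] for item_id, counts in votes.items()}


if __name__ == "__main__":
    # Toy records for illustration only.
    sample = [
        '{"id": "utt_001", "annotator": "rater_01", "annotations": {"primary_emotion": "Surprise"}}',
        '{"id": "utt_001", "annotator": "rater_02", "annotations": {"primary_emotion": "Fear"}}',
        '{"id": "utt_001", "annotator": "rater_03", "annotations": {"primary_emotion": "Surprise"}}',
    ]
    print(majority_labels(sample))  # {'utt_001': 'Surprise'}
```

Since the function accepts any iterable of strings, an open file object can be passed directly. First-seen tie-breaking is only a placeholder; for emotion data you may prefer to send ties back for adjudication.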
```json
{
  "id": "utt_001",
  "audio_url": "/audio/sample_001.wav",
  "transcript": "I can't believe this happened!",
  "annotations": {
    "primary_emotion": "Surprise",
    "intensity": 4,
    "valence": 2,
    "arousal": 6,
    "voice_qualities": ["Raised pitch", "Fast speech rate"],
    "authenticity": "Clearly genuine",
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-05T10:30:00Z"
}
```
Segment-Level Emotion Annotation
For longer audio in which the emotion shifts over time:
```yaml
annotation_schemes:
  - annotation_type: audio_segments
    name: emotion_segments
    description: "Mark time segments with different emotions"
    labels:
      - name: Happy
        color: "#FCD34D"
      - name: Sad
        color: "#60A5FA"
      - name: Angry
        color: "#F87171"
      - name: Neutral
        color: "#9CA3AF"
    segment_attributes:
      - name: intensity
        type: likert
        size: 5
```
Quality Control
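In the gold items below, an expected value is either a single answer (`primary_emotion: "Anger"`) or a list of acceptable answers (`intensity: [4, 5]`). The sketch below shows how that matching rule could be scored against an annotator's answers. It illustrates the rule only and is not Potato's internal implementation; the helper names are hypothetical, and list-valued answers such as `multiselect` fields would need a different comparison.

```python
from typing import Any


def matches_expected(answer: Any, expected: Any) -> bool:
    """A list in the gold spec means 'any of these'; otherwise require equality."""
    if isinstance(expected, list):
        # Note: not suitable for list-valued answers (e.g. multiselect fields).
        return answer in expected
    return answer == expected


def gold_item_passed(annotation: dict[str, Any], expected: dict[str, Any]) -> bool:
    """Pass only if every field listed under `expected` matches."""
    return all(
        matches_expected(annotation.get(field), value)
        for field, value in expected.items()
    )


def gold_accuracy(pairs: list[tuple[dict[str, Any], dict[str, Any]]]) -> float:
    """Share of (annotation, expected) pairs that pass; 0.0 for no pairs."""
    if not pairs:
        return 0.0
    return sum(gold_item_passed(a, e) for a, e in pairs) / len(pairs)
```

With the `clearly_happy.wav` spec, an answer of Happiness with intensity 5 passes and one with intensity 3 fails. Fields that the gold spec leaves out (such as intensity for `clearly_angry.wav`) are not checked.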
```yaml
quality_control:
  attention_checks:
    enabled: true
    gold_items:
      - audio: "/audio/gold/clearly_happy.wav"
        expected:
          primary_emotion: "Happiness"
          intensity: [4, 5]  # Accept 4 or 5
      - audio: "/audio/gold/clearly_angry.wav"
        expected:
          primary_emotion: "Anger"
```
Emotion Annotation Tips
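- Listen fully: always listen to the entire clip
- Focus on the voice: the emotional information lies in how something is said, not just in the words
- Cultural awareness: norms for expressing emotion vary across cultures
- Manage fatigue: take regular breaks, because emotion annotation is mentally draining
- Calibrate: hold regular team discussions to improve consistency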
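One way to check whether calibration discussions are paying off is to measure agreement between annotators on the same items. The sketch below computes Cohen's kappa for two annotators' categorical labels (for example `primary_emotion`). Kappa is a standard agreement statistic chosen here for illustration, not something this tutorial prescribes; with more than two annotators you would need a statistic such as Fleiss' kappa or Krippendorff's alpha instead.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items where both chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both used one identical label for every item; kappa is undefined,
        # and this sketch treats it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, if two raters label four clips as `["Happy", "Happy", "Sad", "Sad"]` and `["Happy", "Sad", "Sad", "Sad"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5. Tracking this number before and after a calibration session gives the team something concrete to discuss.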