# Classifying emotions in speech

Build an audio emotion classification task with waveform display, playback speed controls, and Likert rating scales.
Speech emotion recognition (SER) powers virtual assistants, mental-health applications, and customer-service analytics. This tutorial shows how to build annotation interfaces for categorical emotions, dimensional ratings, and mixed-emotion detection.
## Emotion annotation approaches

There are several ways to annotate emotions in speech:

- **Categorical**: discrete labels (happy, sad, angry)
- **Dimensional**: continuous scales (valence, arousal, dominance)
- **Mixed**: multiple emotions with intensity ratings
- **Per-segment**: different emotions at different timestamps
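Whichever approach you choose, the task reads its items from a JSON data file. As a hedged sketch (the field names `id`, `audio_path`, and `transcript` are assumptions matching the `item_properties` mappings used in the configurations in this tutorial), a minimal input file could be produced like this:

```python
import json

# Hypothetical input items: one utterance per record. The key names
# (id, audio_path, transcript) are assumptions chosen to match the
# item_properties mappings in this tutorial's configs.
utterances = [
    {
        "id": "utt_001",
        "audio_path": "/audio/utt_001.wav",
        "transcript": "I can't believe this happened!",
    },
    {
        "id": "utt_002",
        "audio_path": "/audio/utt_002.wav",
        "transcript": "Everything is fine, really.",
    },
]

# In the configs below this file lives at data/utterances.json.
with open("utterances.json", "w", encoding="utf-8") as f:
    json.dump(utterances, f, indent=2)

# Sanity check: every item carries the keys the config maps via item_properties.
with open("utterances.json", encoding="utf-8") as f:
    items = json.load(f)
for item in items:
    assert {"id", "audio_path", "transcript"} <= item.keys()
```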
## Categorical emotion classification

### Basic configuration

```yaml
annotation_task_name: "Speech Emotion Recognition"

data_files:
  - data/utterances.json

item_properties:
  id_key: id
  audio_key: audio_path
  text_key: transcript  # Optional transcript

audio:
  enabled: true
  display: waveform
  waveform_color: "#8B5CF6"
  progress_color: "#A78BFA"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]

annotation_schemes:
  - annotation_type: radio
    name: emotion
    description: "What emotion is expressed in this speech?"
    labels:
      - name: Happy
        description: "Joy, excitement, amusement"
        keyboard_shortcut: "h"
      - name: Sad
        description: "Sorrow, disappointment, grief"
        keyboard_shortcut: "s"
      - name: Angry
        description: "Frustration, irritation, rage"
        keyboard_shortcut: "a"
      - name: Fearful
        description: "Anxiety, worry, terror"
        keyboard_shortcut: "f"
      - name: Surprised
        description: "Astonishment, shock"
        keyboard_shortcut: "u"
      - name: Disgusted
        description: "Revulsion, distaste"
        keyboard_shortcut: "d"
      - name: Neutral
        description: "No clear emotion"
        keyboard_shortcut: "n"
    required: true
```

### Adding intensity
```yaml
annotation_schemes:
  - annotation_type: radio
    name: emotion
    labels: [Happy, Sad, Angry, Fearful, Surprised, Disgusted, Neutral]
    required: true

  - annotation_type: likert
    name: intensity
    description: "How intense is this emotion?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"
    conditional:
      depends_on: emotion
      hide_when: ["Neutral"]
```

## Dimensional emotion annotation
The VAD (Valence-Arousal-Dominance) model rates each clip on three continuous dimensions:

```yaml
annotation_task_name: "Dimensional Emotion Rating"

annotation_schemes:
  # Valence: negative to positive
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"

  # Arousal: calm to excited
  - annotation_type: likert
    name: arousal
    description: "Arousal: How calm or excited?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"

  # Dominance: submissive to dominant
  - annotation_type: likert
    name: dominance
    description: "Dominance: How submissive or dominant?"
    size: 7
    min_label: "Very submissive"
    max_label: "Very dominant"
```

### Visual scales (SAM)
Self-Assessment Manikin (SAM) style, using pictorial anchors:

```yaml
annotation_schemes:
  - annotation_type: image_scale
    name: valence
    description: "Select the figure that matches the emotional valence"
    images:
      - path: /images/sam_valence_1.png
        value: 1
      - path: /images/sam_valence_2.png
        value: 2
      # ... etc
    size: 9
```

## Mixed-emotion detection
For speech containing multiple emotions:

```yaml
annotation_schemes:
  - annotation_type: multiselect
    name: emotions_present
    description: "Select ALL emotions you detect (can be multiple)"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
    min_selections: 1

  - annotation_type: radio
    name: primary_emotion
    description: "Which emotion is MOST prominent?"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
      - Mixed (no dominant)
```
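Downstream, it is worth verifying that the multiselect and the primary-emotion radio above agree: the primary emotion should normally be among the emotions marked present, unless the annotator chose the "Mixed" escape option. A minimal sketch of that check, assuming annotations arrive as a `{scheme_name: value}` dict (the shape is an assumption for illustration):

```python
def is_consistent(annotation: dict) -> bool:
    """Check that primary_emotion was also selected in emotions_present.

    "Mixed (no dominant)" is exempt, since it is not an emotion label
    itself. The annotation dict shape is an assumption, not the tool's
    documented output format.
    """
    primary = annotation["primary_emotion"]
    if primary == "Mixed (no dominant)":
        return True
    return primary in annotation["emotions_present"]

print(is_consistent({"emotions_present": ["Happy", "Surprised"],
                     "primary_emotion": "Surprised"}))   # True
print(is_consistent({"emotions_present": ["Sad"],
                     "primary_emotion": "Angry"}))       # False
```

Flagged items can be queued for re-annotation rather than silently dropped.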
## Complete emotion annotation

```yaml
annotation_task_name: "Comprehensive Speech Emotion Annotation"

data_files:
  - data/speech_samples.json

item_properties:
  id_key: id
  audio_key: audio_url
  text_key: transcript

audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  height: 120
  speed_control: true
  speed_options: [0.5, 0.75, 1.0, 1.25]
  show_duration: true
  autoplay: false

# Show transcript if available
display:
  show_text: true
  text_field: transcript
  text_label: "Transcript (for reference)"

annotation_schemes:
  # Primary categorical emotion
  - annotation_type: radio
    name: primary_emotion
    description: "Primary emotion expressed"
    labels:
      - name: Happiness
        color: "#FCD34D"
        keyboard_shortcut: "1"
      - name: Sadness
        color: "#60A5FA"
        keyboard_shortcut: "2"
      - name: Anger
        color: "#F87171"
        keyboard_shortcut: "3"
      - name: Fear
        color: "#A78BFA"
        keyboard_shortcut: "4"
      - name: Surprise
        color: "#34D399"
        keyboard_shortcut: "5"
      - name: Disgust
        color: "#FB923C"
        keyboard_shortcut: "6"
      - name: Neutral
        color: "#9CA3AF"
        keyboard_shortcut: "7"
    required: true

  # Emotional intensity
  - annotation_type: likert
    name: intensity
    description: "Emotional intensity"
    size: 5
    min_label: "Very mild"
    max_label: "Very intense"
    required: true

  # Dimensional ratings
  - annotation_type: likert
    name: valence
    description: "Valence (negative to positive)"
    size: 7
    min_label: "Negative"
    max_label: "Positive"

  - annotation_type: likert
    name: arousal
    description: "Arousal (calm to excited)"
    size: 7
    min_label: "Calm"
    max_label: "Excited"

  # Voice quality
  - annotation_type: multiselect
    name: voice_qualities
    description: "Voice characteristics (select all that apply)"
    labels:
      - Trembling voice
      - Raised pitch
      - Lowered pitch
      - Loud/shouting
      - Soft/whisper
      - Fast speech rate
      - Slow speech rate
      - Breathy
      - Tense/strained
      - Crying
      - Laughing

  # Genuineness
  - annotation_type: radio
    name: authenticity
    description: "Does the emotion seem genuine?"
    labels:
      - Clearly genuine
      - Likely genuine
      - Uncertain
      - Likely acted/fake
      - Clearly acted/fake

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your annotation?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"

annotation_guidelines:
  title: "Emotion Annotation Guidelines"
  content: |
    ## Listening Instructions

    1. Listen to the entire clip before annotating
    2. You may replay as many times as needed
    3. Focus on the VOICE, not just the words

    ## Emotion Categories

    - **Happiness**: Joy, amusement, contentment
    - **Sadness**: Sorrow, disappointment, melancholy
    - **Anger**: Frustration, irritation, rage
    - **Fear**: Anxiety, nervousness, terror
    - **Surprise**: Astonishment, startle
    - **Disgust**: Revulsion, contempt
    - **Neutral**: Calm, matter-of-fact

    ## Tips

    - Consider tone, pitch, speaking rate
    - The transcript may not match the emotion
    - When unsure between two emotions, choose the stronger one
    - Use the intensity scale for unclear cases

output_annotation_dir: annotations/
output_annotation_format: jsonl
```

## Output format
```json
{
  "id": "utt_001",
  "audio_url": "/audio/sample_001.wav",
  "transcript": "I can't believe this happened!",
  "annotations": {
    "primary_emotion": "Surprise",
    "intensity": 4,
    "valence": 2,
    "arousal": 6,
    "voice_qualities": ["Raised pitch", "Fast speech rate"],
    "authenticity": "Clearly genuine",
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-05T10:30:00Z"
}
```
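Because each record in the JSONL output is self-contained, downstream analysis takes only a few lines. A sketch that tallies the primary-emotion distribution (field names follow the example record above; the exact output filename is not specified here, so the sketch reads an in-memory stand-in with the same shape):

```python
import json
from collections import Counter
from io import StringIO

# In practice: open the .jsonl file written to annotations/.
# The filename is an assumption, so we use an in-memory stand-in
# with the same one-JSON-object-per-line shape as the example above.
jsonl_output = StringIO(
    '{"id": "utt_001", "annotations": {"primary_emotion": "Surprise", "intensity": 4}}\n'
    '{"id": "utt_002", "annotations": {"primary_emotion": "Surprise", "intensity": 2}}\n'
    '{"id": "utt_003", "annotations": {"primary_emotion": "Sadness", "intensity": 5}}\n'
)

counts = Counter()
for line in jsonl_output:
    record = json.loads(line)
    counts[record["annotations"]["primary_emotion"]] += 1

print(counts.most_common())  # [('Surprise', 2), ('Sadness', 1)]
```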
## Segment-level emotion

For longer audio files where the emotion changes over time:

```yaml
annotation_schemes:
  - annotation_type: audio_segments
    name: emotion_segments
    description: "Mark time segments with different emotions"
    labels:
      - name: Happy
        color: "#FCD34D"
      - name: Sad
        color: "#60A5FA"
      - name: Angry
        color: "#F87171"
      - name: Neutral
        color: "#9CA3AF"
    segment_attributes:
      - name: intensity
        type: likert
        size: 5
```
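Free-form time segments benefit from validation before a submission is accepted: each segment should lie within the clip, have positive duration, and, for a one-label-per-moment scheme, not overlap its neighbours. A sketch of such a check, assuming segments as `(start, end, label)` tuples in seconds (the tuple shape is an assumption, not the tool's format):

```python
def validate_segments(segments, clip_duration):
    """Return a list of problems found in annotated emotion segments.

    segments: list of (start, end, label) tuples in seconds -- an
    illustrative shape, not the tool's documented segment format.
    """
    problems = []
    ordered = sorted(segments, key=lambda s: s[0])
    for start, end, label in ordered:
        if start < 0 or end > clip_duration:
            problems.append(f"{label}: [{start}, {end}] outside clip")
        if end <= start:
            problems.append(f"{label}: empty or inverted segment")
    # Adjacent segments in start order must not overlap
    for (s1, e1, l1), (s2, e2, l2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            problems.append(f"{l1} overlaps {l2}")
    return problems

segs = [(0.0, 4.2, "Neutral"), (4.0, 9.5, "Angry")]
print(validate_segments(segs, clip_duration=10.0))  # ['Neutral overlaps Angry']
```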
## Quality control

```yaml
quality_control:
  attention_checks:
    enabled: true
    gold_items:
      - audio: "/audio/gold/clearly_happy.wav"
        expected:
          primary_emotion: "Happiness"
          intensity: [4, 5]  # Accept 4 or 5
      - audio: "/audio/gold/clearly_angry.wav"
        expected:
          primary_emotion: "Anger"
```
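The `expected` block above mixes exact values with accepted sets (`intensity: [4, 5]`). A sketch of how a submission could be scored against such a gold spec; the matching rule used here (a list means "any of these values is accepted") is an assumption for illustration:

```python
def passes_gold(annotation: dict, expected: dict) -> bool:
    """Check a submitted annotation against a gold-item spec.

    A scalar expected value must match exactly; a list means any of the
    listed values is accepted (e.g. intensity: [4, 5]). This matching
    rule is an assumed convention, not documented tool behaviour.
    """
    for field, spec in expected.items():
        value = annotation.get(field)
        if isinstance(spec, list):
            if value not in spec:
                return False
        elif value != spec:
            return False
    return True

gold = {"primary_emotion": "Happiness", "intensity": [4, 5]}
print(passes_gold({"primary_emotion": "Happiness", "intensity": 5}, gold))  # True
print(passes_gold({"primary_emotion": "Happiness", "intensity": 2}, gold))  # False
```

Annotators who repeatedly fail gold items can then be flagged for retraining or exclusion.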
## Tips for emotion annotation

- **Listen in full**: always play the entire clip before deciding
- **Focus on the voice**: the emotional signal is in HOW things are said, not just the words
- **Cultural sensitivity**: norms of emotional expression vary across cultures
- **Manage fatigue**: take breaks; emotion annotation is demanding work
- **Calibration**: regular team discussions improve consistency
## Next steps

- Add speaker diarization for multi-speaker emotion tracking
- Set up crowdsourcing for large-scale collection
- Compute inter-annotator agreement for emotion tasks
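For the categorical schemes, Cohen's kappa is a common starting point for the agreement computation mentioned above. A self-contained sketch for two annotators:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' categorical labels."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled the same
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both pick the same label independently
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

a = ["Happy", "Sad", "Happy", "Angry", "Neutral", "Happy"]
b = ["Happy", "Sad", "Sad",   "Angry", "Neutral", "Happy"]
print(round(cohens_kappa(a, b), 3))  # 0.769
```

For more than two annotators, or for the ordinal Likert scales, Fleiss' kappa or Krippendorff's alpha are more appropriate choices.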
Documentation at /docs/features/audio-annotation.