Tutorials · 6 min read
Speech Emotion Classification
Build an audio emotion classification task with waveform visualization, playback speed controls, and Likert scales.
Potato Team
Speech emotion recognition (SER) powers virtual assistants, mental-health applications, and customer-service analytics. This tutorial shows how to build annotation interfaces for categorical emotions, dimensional ratings, and mixed-emotion detection.
Emotion Annotation Approaches
There are several ways to annotate emotion in speech:
- Categorical: discrete labels (happy, sad, angry)
- Dimensional: continuous scales (valence, arousal, dominance)
- Mixed: multiple emotions with intensity ratings
- Segment-based: different emotions at different timestamps
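Whichever approach you choose, the tool reads items from a JSON data file whose keys are mapped by `item_properties` in the configs that follow. A minimal sketch of preparing such a file; the ids, audio paths, and transcripts here are illustrative:

```python
import json
import os

# Each item carries the keys referenced by item_properties in the config:
# an id (id_key), a path to the audio clip (audio_key), and an optional
# transcript (text_key). All values below are made up for illustration.
utterances = [
    {
        "id": "utt_001",
        "audio_path": "/audio/sample_001.wav",
        "transcript": "I can't believe this happened!",
    },
    {
        "id": "utt_002",
        "audio_path": "/audio/sample_002.wav",
        "transcript": "Well, that went better than expected.",
    },
]

os.makedirs("data", exist_ok=True)
with open("data/utterances.json", "w") as f:
    json.dump(utterances, f, indent=2)
```

One record per utterance keeps the file easy to shard across annotators later.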
Categorical Emotion Classification
Basic Setup
```yaml
annotation_task_name: "Speech Emotion Recognition"

data_files:
  - data/utterances.json

item_properties:
  id_key: id
  audio_key: audio_path
  text_key: transcript  # Optional transcript

audio:
  enabled: true
  display: waveform
  waveform_color: "#8B5CF6"
  progress_color: "#A78BFA"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]

annotation_schemes:
  - annotation_type: radio
    name: emotion
    description: "What emotion is expressed in this speech?"
    labels:
      - name: Happy
        description: "Joy, excitement, amusement"
        keyboard_shortcut: "h"
      - name: Sad
        description: "Sorrow, disappointment, grief"
        keyboard_shortcut: "s"
      - name: Angry
        description: "Frustration, irritation, rage"
        keyboard_shortcut: "a"
      - name: Fearful
        description: "Anxiety, worry, terror"
        keyboard_shortcut: "f"
      - name: Surprised
        description: "Astonishment, shock"
        keyboard_shortcut: "u"
      - name: Disgusted
        description: "Revulsion, distaste"
        keyboard_shortcut: "d"
      - name: Neutral
        description: "No clear emotion"
        keyboard_shortcut: "n"
    required: true
```

Adding Intensity
```yaml
annotation_schemes:
  - annotation_type: radio
    name: emotion
    labels: [Happy, Sad, Angry, Fearful, Surprised, Disgusted, Neutral]
    required: true

  - annotation_type: likert
    name: intensity
    description: "How intense is this emotion?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"
    conditional:
      depends_on: emotion
      hide_when: ["Neutral"]
```

Dimensional Emotion Annotation
The VAD (Valence-Arousal-Dominance) model:
```yaml
annotation_task_name: "Dimensional Emotion Rating"

annotation_schemes:
  # Valence: negative to positive
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"

  # Arousal: calm to excited
  - annotation_type: likert
    name: arousal
    description: "Arousal: How calm or excited?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"

  # Dominance: submissive to dominant
  - annotation_type: likert
    name: dominance
    description: "Dominance: How submissive or dominant?"
    size: 7
    min_label: "Very submissive"
    max_label: "Very dominant"
```

Visual Scales (SAM)
Self-Assessment Manikin (SAM) style:
```yaml
annotation_schemes:
  - annotation_type: image_scale
    name: valence
    description: "Select the figure that matches the emotional valence"
    images:
      - path: /images/sam_valence_1.png
        value: 1
      - path: /images/sam_valence_2.png
        value: 2
      # ... etc
    size: 9
```

Mixed Emotion Detection
For speech that contains multiple emotions:
```yaml
annotation_schemes:
  - annotation_type: multiselect
    name: emotions_present
    description: "Select ALL emotions you detect (can be multiple)"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
    min_selections: 1

  - annotation_type: radio
    name: primary_emotion
    description: "Which emotion is MOST prominent?"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
      - Mixed (no dominant)
```

Comprehensive Emotion Annotation
```yaml
annotation_task_name: "Comprehensive Speech Emotion Annotation"

data_files:
  - data/speech_samples.json

item_properties:
  id_key: id
  audio_key: audio_url
  text_key: transcript

audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  height: 120
  speed_control: true
  speed_options: [0.5, 0.75, 1.0, 1.25]
  show_duration: true
  autoplay: false

# Show transcript if available
display:
  show_text: true
  text_field: transcript
  text_label: "Transcript (for reference)"

annotation_schemes:
  # Primary categorical emotion
  - annotation_type: radio
    name: primary_emotion
    description: "Primary emotion expressed"
    labels:
      - name: Happiness
        color: "#FCD34D"
        keyboard_shortcut: "1"
      - name: Sadness
        color: "#60A5FA"
        keyboard_shortcut: "2"
      - name: Anger
        color: "#F87171"
        keyboard_shortcut: "3"
      - name: Fear
        color: "#A78BFA"
        keyboard_shortcut: "4"
      - name: Surprise
        color: "#34D399"
        keyboard_shortcut: "5"
      - name: Disgust
        color: "#FB923C"
        keyboard_shortcut: "6"
      - name: Neutral
        color: "#9CA3AF"
        keyboard_shortcut: "7"
    required: true

  # Emotional intensity
  - annotation_type: likert
    name: intensity
    description: "Emotional intensity"
    size: 5
    min_label: "Very mild"
    max_label: "Very intense"
    required: true

  # Dimensional ratings
  - annotation_type: likert
    name: valence
    description: "Valence (negative to positive)"
    size: 7
    min_label: "Negative"
    max_label: "Positive"

  - annotation_type: likert
    name: arousal
    description: "Arousal (calm to excited)"
    size: 7
    min_label: "Calm"
    max_label: "Excited"

  # Voice quality
  - annotation_type: multiselect
    name: voice_qualities
    description: "Voice characteristics (select all that apply)"
    labels:
      - Trembling voice
      - Raised pitch
      - Lowered pitch
      - Loud/shouting
      - Soft/whisper
      - Fast speech rate
      - Slow speech rate
      - Breathy
      - Tense/strained
      - Crying
      - Laughing

  # Genuineness
  - annotation_type: radio
    name: authenticity
    description: "Does the emotion seem genuine?"
    labels:
      - Clearly genuine
      - Likely genuine
      - Uncertain
      - Likely acted/fake
      - Clearly acted/fake

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your annotation?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"

annotation_guidelines:
  title: "Emotion Annotation Guidelines"
  content: |
    ## Listening Instructions
    1. Listen to the entire clip before annotating
    2. You may replay as many times as needed
    3. Focus on the VOICE, not just the words

    ## Emotion Categories
    - **Happiness**: Joy, amusement, contentment
    - **Sadness**: Sorrow, disappointment, melancholy
    - **Anger**: Frustration, irritation, rage
    - **Fear**: Anxiety, nervousness, terror
    - **Surprise**: Astonishment, startle
    - **Disgust**: Revulsion, contempt
    - **Neutral**: Calm, matter-of-fact

    ## Tips
    - Consider tone, pitch, speaking rate
    - The transcript may not match the emotion
    - When unsure between two emotions, choose the stronger one
    - Use the intensity scale for unclear cases

output_annotation_dir: annotations/
output_annotation_format: jsonl
```

Output Format
```json
{
  "id": "utt_001",
  "audio_url": "/audio/sample_001.wav",
  "transcript": "I can't believe this happened!",
  "annotations": {
    "primary_emotion": "Surprise",
    "intensity": 4,
    "valence": 2,
    "arousal": 6,
    "voice_qualities": ["Raised pitch", "Fast speech rate"],
    "authenticity": "Clearly genuine",
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-05T10:30:00Z"
}
```

Segment-Level Emotion
For longer audio with changing emotions:
```yaml
annotation_schemes:
  - annotation_type: audio_segments
    name: emotion_segments
    description: "Mark time segments with different emotions"
    labels:
      - name: Happy
        color: "#FCD34D"
      - name: Sad
        color: "#60A5FA"
      - name: Angry
        color: "#F87171"
      - name: Neutral
        color: "#9CA3AF"
    segment_attributes:
      - name: intensity
        type: likert
        size: 5
```

Quality Control
```yaml
quality_control:
  attention_checks:
    enabled: true
    gold_items:
      - audio: "/audio/gold/clearly_happy.wav"
        expected:
          primary_emotion: "Happiness"
          intensity: [4, 5]  # Accept 4 or 5
      - audio: "/audio/gold/clearly_angry.wav"
        expected:
          primary_emotion: "Anger"
```

Tips for Emotion Annotation
- Listen fully: always play the entire clip
- Focus on the voice: the emotional information is in HOW things are said
- Cultural awareness: expression norms vary across cultures
- Fatigue management: take breaks; emotion annotation is demanding
- Calibration: regular team discussions improve consistency
Next Steps
- Add speaker diarization for multi-speaker emotion tracking
- Set up crowdsourcing for large-scale collection
- Compute inter-annotator agreement for emotion tasks
Documentation at /docs/features/audio-annotation.
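For the agreement step, a minimal self-contained sketch of Cohen's kappa over two annotators' categorical labels (such as `primary_emotion`); the rater names and label sequences below are made up:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on
    categorical labels, e.g. the primary_emotion field."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n)
        for c in set(freq_a) | set(freq_b)
    )
    if expected == 1.0:  # degenerate case: a single shared label
        return 1.0
    return (observed - expected) / (1 - expected)

rater_01 = ["Happiness", "Anger", "Sadness", "Anger", "Neutral"]
rater_02 = ["Happiness", "Anger", "Fear", "Anger", "Neutral"]
print(round(cohens_kappa(rater_01, rater_02), 3))  # 0.737
```

Kappa near 0 means chance-level agreement; emotion tasks often land in the moderate range, which is why calibration sessions matter.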