Tutorials · 6 min read

Speech Emotion Classification

Build an audio emotion classification task with waveform visualization, playback speed controls, and Likert scales.

Potato Team


Speech emotion recognition (SER) powers virtual assistants, mental-health applications, and customer-service analytics. This tutorial shows how to build annotation interfaces for categorical emotions, dimensional ratings, and mixed-emotion detection.

Emotion Annotation Approaches

There are several ways to annotate emotion in speech:

  1. Categorical: Discrete labels (happy, sad, angry)
  2. Dimensional: Continuous scales (valence, arousal, dominance)
  3. Mixed: Multiple emotions with intensity ratings
  4. Segment-based: Different emotions at different timestamps

Categorical Emotion Classification

Basic Setup

yaml
annotation_task_name: "Speech Emotion Recognition"
 
data_files:
  - data/utterances.json
 
item_properties:
  id_key: id
  audio_key: audio_path
  text_key: transcript  # Optional transcript
 
audio:
  enabled: true
  display: waveform
  waveform_color: "#8B5CF6"
  progress_color: "#A78BFA"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]
 
annotation_schemes:
  - annotation_type: radio
    name: emotion
    description: "What emotion is expressed in this speech?"
    labels:
      - name: Happy
        description: "Joy, excitement, amusement"
        keyboard_shortcut: "h"
      - name: Sad
        description: "Sorrow, disappointment, grief"
        keyboard_shortcut: "s"
      - name: Angry
        description: "Frustration, irritation, rage"
        keyboard_shortcut: "a"
      - name: Fearful
        description: "Anxiety, worry, terror"
        keyboard_shortcut: "f"
      - name: Surprised
        description: "Astonishment, shock"
        keyboard_shortcut: "u"
      - name: Disgusted
        description: "Revulsion, distaste"
        keyboard_shortcut: "d"
      - name: Neutral
        description: "No clear emotion"
        keyboard_shortcut: "n"
    required: true
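
The config above expects each record in data/utterances.json to carry the keys mapped by item_properties. A minimal sketch of what such a file might contain (the records and file location are illustrative, not prescribed by Potato):

```python
import json

# Illustrative records matching the id_key / audio_key / text_key mapping above.
utterances = [
    {"id": "utt_001", "audio_path": "audio/utt_001.wav",
     "transcript": "I can't believe this happened!"},
    {"id": "utt_002", "audio_path": "audio/utt_002.wav",
     "transcript": "Everything is fine, really."},
]

with open("utterances.json", "w") as f:
    json.dump(utterances, f, indent=2)

# Sanity-check that every record provides the configured keys.
for item in utterances:
    assert {"id", "audio_path", "transcript"} <= item.keys()
```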

Adding Intensity

yaml
annotation_schemes:
  - annotation_type: radio
    name: emotion
    labels: [Happy, Sad, Angry, Fearful, Surprised, Disgusted, Neutral]
    required: true
 
  - annotation_type: likert
    name: intensity
    description: "How intense is this emotion?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"
    conditional:
      depends_on: emotion
      hide_when: ["Neutral"]
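
With this conditional, the intensity scale is hidden when Neutral is selected. A plain-Python sketch of how a completed annotation could be validated against the two schemes (the validation rule is my illustration, not a Potato API):

```python
EMOTIONS = {"Happy", "Sad", "Angry", "Fearful",
            "Surprised", "Disgusted", "Neutral"}

def validate(annotation: dict) -> bool:
    """Check an annotation against the two schemes above:
    emotion is required; intensity (1-5) is expected unless it
    was hidden because the emotion is Neutral."""
    if annotation.get("emotion") not in EMOTIONS:
        return False
    if annotation["emotion"] == "Neutral":
        return "intensity" not in annotation
    return annotation.get("intensity") in {1, 2, 3, 4, 5}
```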

Dimensional Emotion Annotation

The VAD (Valence-Arousal-Dominance) model:

yaml
annotation_task_name: "Dimensional Emotion Rating"
 
annotation_schemes:
  # Valence: negative to positive
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"
 
  # Arousal: calm to excited
  - annotation_type: likert
    name: arousal
    description: "Arousal: How calm or excited?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"
 
  # Dominance: submissive to dominant
  - annotation_type: likert
    name: dominance
    description: "Dominance: How submissive or dominant?"
    size: 7
    min_label: "Very submissive"
    max_label: "Very dominant"
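
Dimensional ratings from several annotators are typically averaged per item before modeling. A small sketch (function name and rating shape are mine):

```python
from statistics import mean

def mean_vad(ratings):
    """Average per-dimension VAD ratings (1-7 Likert) across annotators.
    Each rating is a dict with valence / arousal / dominance keys."""
    return {dim: mean(r[dim] for r in ratings)
            for dim in ("valence", "arousal", "dominance")}

ratings = [
    {"valence": 6, "arousal": 5, "dominance": 4},
    {"valence": 7, "arousal": 5, "dominance": 5},
]
# mean_vad(ratings) -> {"valence": 6.5, "arousal": 5.0, "dominance": 4.5}
```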

Visual Scales (SAM)

Self-Assessment Manikin style:

yaml
annotation_schemes:
  - annotation_type: image_scale
    name: valence
    description: "Select the figure that matches the emotional valence"
    images:
      - path: /images/sam_valence_1.png
        value: 1
      - path: /images/sam_valence_2.png
        value: 2
      # ... etc
    size: 9

Mixed Emotion Detection

For speech containing multiple emotions:

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: emotions_present
    description: "Select ALL emotions you detect (can be multiple)"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
    min_selections: 1
 
  - annotation_type: radio
    name: primary_emotion
    description: "Which emotion is MOST prominent?"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
      - Mixed (no dominant)
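
A useful post-hoc sanity check on exports from this pair of schemes: the primary emotion should normally appear among the emotions marked present, unless the annotator chose "Mixed (no dominant)". This rule is my suggestion, not something Potato enforces:

```python
def consistent(annotation: dict) -> bool:
    """Flag annotations whose primary_emotion contradicts the
    multiselect (a sketch of an analyst-side consistency check)."""
    present = annotation.get("emotions_present", [])
    primary = annotation.get("primary_emotion")
    if not present:                        # violates min_selections: 1
        return False
    if primary == "Mixed (no dominant)":
        return len(present) >= 2           # "mixed" implies several emotions
    return primary in present
```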

Comprehensive Emotion Annotation

yaml
annotation_task_name: "Comprehensive Speech Emotion Annotation"
 
data_files:
  - data/speech_samples.json
 
item_properties:
  id_key: id
  audio_key: audio_url
  text_key: transcript
 
audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  height: 120
  speed_control: true
  speed_options: [0.5, 0.75, 1.0, 1.25]
  show_duration: true
  autoplay: false
 
# Show transcript if available
display:
  show_text: true
  text_field: transcript
  text_label: "Transcript (for reference)"
 
annotation_schemes:
  # Primary categorical emotion
  - annotation_type: radio
    name: primary_emotion
    description: "Primary emotion expressed"
    labels:
      - name: Happiness
        color: "#FCD34D"
        keyboard_shortcut: "1"
      - name: Sadness
        color: "#60A5FA"
        keyboard_shortcut: "2"
      - name: Anger
        color: "#F87171"
        keyboard_shortcut: "3"
      - name: Fear
        color: "#A78BFA"
        keyboard_shortcut: "4"
      - name: Surprise
        color: "#34D399"
        keyboard_shortcut: "5"
      - name: Disgust
        color: "#FB923C"
        keyboard_shortcut: "6"
      - name: Neutral
        color: "#9CA3AF"
        keyboard_shortcut: "7"
    required: true
 
  # Emotional intensity
  - annotation_type: likert
    name: intensity
    description: "Emotional intensity"
    size: 5
    min_label: "Very mild"
    max_label: "Very intense"
    required: true
 
  # Dimensional ratings
  - annotation_type: likert
    name: valence
    description: "Valence (negative to positive)"
    size: 7
    min_label: "Negative"
    max_label: "Positive"
 
  - annotation_type: likert
    name: arousal
    description: "Arousal (calm to excited)"
    size: 7
    min_label: "Calm"
    max_label: "Excited"
 
  # Voice quality
  - annotation_type: multiselect
    name: voice_qualities
    description: "Voice characteristics (select all that apply)"
    labels:
      - Trembling voice
      - Raised pitch
      - Lowered pitch
      - Loud/shouting
      - Soft/whisper
      - Fast speech rate
      - Slow speech rate
      - Breathy
      - Tense/strained
      - Crying
      - Laughing
 
  # Genuineness
  - annotation_type: radio
    name: authenticity
    description: "Does the emotion seem genuine?"
    labels:
      - Clearly genuine
      - Likely genuine
      - Uncertain
      - Likely acted/fake
      - Clearly acted/fake
 
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your annotation?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
 
annotation_guidelines:
  title: "Emotion Annotation Guidelines"
  content: |
    ## Listening Instructions
    1. Listen to the entire clip before annotating
    2. You may replay as many times as needed
    3. Focus on the VOICE, not just the words
 
    ## Emotion Categories
    - **Happiness**: Joy, amusement, contentment
    - **Sadness**: Sorrow, disappointment, melancholy
    - **Anger**: Frustration, irritation, rage
    - **Fear**: Anxiety, nervousness, terror
    - **Surprise**: Astonishment, startle
    - **Disgust**: Revulsion, contempt
    - **Neutral**: Calm, matter-of-fact
 
    ## Tips
    - Consider tone, pitch, speaking rate
    - The transcript may not match the emotion
    - When unsure between two emotions, choose the stronger one
    - Use the intensity scale for unclear cases
 
output_annotation_dir: annotations/
output_annotation_format: jsonl

Output Format

json
{
  "id": "utt_001",
  "audio_url": "/audio/sample_001.wav",
  "transcript": "I can't believe this happened!",
  "annotations": {
    "primary_emotion": "Surprise",
    "intensity": 4,
    "valence": 2,
    "arousal": 6,
    "voice_qualities": ["Raised pitch", "Fast speech rate"],
    "authenticity": "Clearly genuine",
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-05T10:30:00Z"
}
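
With output_annotation_format: jsonl, the export holds one such object per line. A sketch of loading the file and tallying primary emotions (the path is illustrative):

```python
import json
from collections import Counter

def emotion_counts(path):
    """Count primary_emotion labels across a JSONL export
    shaped like the record above."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            counts[record["annotations"]["primary_emotion"]] += 1
    return counts
```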

Segment-Level Emotion

For longer audio with shifting emotions:

yaml
annotation_schemes:
  - annotation_type: audio_segments
    name: emotion_segments
    description: "Mark time segments with different emotions"
    labels:
      - name: Happy
        color: "#FCD34D"
      - name: Sad
        color: "#60A5FA"
      - name: Angry
        color: "#F87171"
      - name: Neutral
        color: "#9CA3AF"
 
    segment_attributes:
      - name: intensity
        type: likert
        size: 5
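
Segment annotations export labeled time spans; a common downstream summary is total time per emotion. A sketch, assuming each exported segment carries start, end (seconds), and label fields (the exact export shape may differ):

```python
from collections import defaultdict

def seconds_per_emotion(segments):
    """Sum segment durations per emotion label.
    Assumes dicts with start / end in seconds and a label key."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)

segments = [
    {"start": 0.0, "end": 4.5, "label": "Neutral"},
    {"start": 4.5, "end": 9.0, "label": "Angry"},
    {"start": 9.0, "end": 10.0, "label": "Angry"},
]
# seconds_per_emotion(segments) -> {"Neutral": 4.5, "Angry": 5.5}
```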

Quality Control

yaml
quality_control:
  attention_checks:
    enabled: true
    gold_items:
      - audio: "/audio/gold/clearly_happy.wav"
        expected:
          primary_emotion: "Happiness"
          intensity: [4, 5]  # Accept 4 or 5
      - audio: "/audio/gold/clearly_angry.wav"
        expected:
          primary_emotion: "Anger"
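
The gold items above mix exact expectations with a list of acceptable values (intensity: [4, 5]). Checking a submission against them could be sketched like this (the function is my illustration of the matching logic, not Potato's internal check):

```python
def passes_gold(submitted: dict, expected: dict) -> bool:
    """Return True if the submission matches every gold expectation;
    a list value (e.g. intensity: [4, 5]) accepts any of its members."""
    for field, want in expected.items():
        got = submitted.get(field)
        if isinstance(want, list):
            if got not in want:
                return False
        elif got != want:
            return False
    return True
```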

Tips for Emotion Annotation

  1. Listen fully: Always listen to the entire clip
  2. Focus on the voice: Emotional information lies in HOW things are said
  3. Cultural awareness: Display norms vary across cultures
  4. Fatigue management: Take breaks — emotion annotation is demanding
  5. Calibration: Regular team discussions improve consistency

Next Steps


Documentation at /docs/features/audio-annotation.