Tutorials · 6 min read

Speech Emotion Classification

Build an audio emotion classification task with waveform visualization, playback speed controls, and Likert scales.

Potato Team


Speech emotion recognition (SER) powers virtual assistants, mental-health applications, and customer-service analytics. This tutorial shows how to build annotation interfaces for categorical emotions, dimensional ratings, and mixed-emotion detection.

Emotion Annotation Approaches

There are several ways to annotate emotion in speech:

  1. Categorical: Discrete labels (happy, sad, angry)
  2. Dimensional: Continuous scales (valence, arousal, dominance)
  3. Mixed: Multiple emotions with intensity ratings
  4. Segment-based: Different emotions at different timestamps

Categorical Emotion Classification

Basic Setup

yaml
annotation_task_name: "Speech Emotion Recognition"
 
data_files:
  - data/utterances.json
 
item_properties:
  id_key: id
  audio_key: audio_path
  text_key: transcript  # Optional transcript
 
audio:
  enabled: true
  display: waveform
  waveform_color: "#8B5CF6"
  progress_color: "#A78BFA"
  speed_control: true
  speed_options: [0.75, 1.0, 1.25]
 
annotation_schemes:
  - annotation_type: radio
    name: emotion
    description: "What emotion is expressed in this speech?"
    labels:
      - name: Happy
        description: "Joy, excitement, amusement"
        keyboard_shortcut: "h"
      - name: Sad
        description: "Sorrow, disappointment, grief"
        keyboard_shortcut: "s"
      - name: Angry
        description: "Frustration, irritation, rage"
        keyboard_shortcut: "a"
      - name: Fearful
        description: "Anxiety, worry, terror"
        keyboard_shortcut: "f"
      - name: Surprised
        description: "Astonishment, shock"
        keyboard_shortcut: "u"
      - name: Disgusted
        description: "Revulsion, distaste"
        keyboard_shortcut: "d"
      - name: Neutral
        description: "No clear emotion"
        keyboard_shortcut: "n"
    required: true
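
The config above expects each record in data/utterances.json to carry the keys mapped by item_properties. A minimal sketch of what such a file might contain (the records and file location are illustrative, not prescribed by Potato):

```python
import json

# Illustrative records matching the id_key / audio_key / text_key mapping above.
utterances = [
    {"id": "utt_001", "audio_path": "audio/utt_001.wav",
     "transcript": "I can't believe this happened!"},
    {"id": "utt_002", "audio_path": "audio/utt_002.wav",
     "transcript": "Everything is fine, really."},
]

with open("utterances.json", "w") as f:
    json.dump(utterances, f, indent=2)

# Sanity-check that every record provides the configured keys.
for item in utterances:
    assert {"id", "audio_path", "transcript"} <= item.keys()
```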

Adding Intensity

yaml
annotation_schemes:
  - annotation_type: radio
    name: emotion
    labels: [Happy, Sad, Angry, Fearful, Surprised, Disgusted, Neutral]
    required: true
 
  - annotation_type: likert
    name: intensity
    description: "How intense is this emotion?"
    size: 5
    min_label: "Very weak"
    max_label: "Very strong"
    conditional:
      depends_on: emotion
      hide_when: ["Neutral"]
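
With this conditional, the intensity scale is hidden when Neutral is selected. A plain-Python sketch of how a completed annotation could be validated against the two schemes (the validation rule is my illustration, not a Potato API):

```python
EMOTIONS = {"Happy", "Sad", "Angry", "Fearful",
            "Surprised", "Disgusted", "Neutral"}

def validate(annotation: dict) -> bool:
    """Check an annotation against the two schemes above:
    emotion is required; intensity (1-5) is expected unless it
    was hidden because the emotion is Neutral."""
    if annotation.get("emotion") not in EMOTIONS:
        return False
    if annotation["emotion"] == "Neutral":
        return "intensity" not in annotation
    return annotation.get("intensity") in {1, 2, 3, 4, 5}
```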

Dimensional Emotion Annotation

The VAD (Valence-Arousal-Dominance) model:

yaml
annotation_task_name: "Dimensional Emotion Rating"
 
annotation_schemes:
  # Valence: negative to positive
  - annotation_type: likert
    name: valence
    description: "Valence: How positive or negative?"
    size: 7
    min_label: "Very negative"
    max_label: "Very positive"
 
  # Arousal: calm to excited
  - annotation_type: likert
    name: arousal
    description: "Arousal: How calm or excited?"
    size: 7
    min_label: "Very calm"
    max_label: "Very excited"
 
  # Dominance: submissive to dominant
  - annotation_type: likert
    name: dominance
    description: "Dominance: How submissive or dominant?"
    size: 7
    min_label: "Very submissive"
    max_label: "Very dominant"
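
Dimensional ratings from several annotators are typically averaged per item before modeling. A small sketch (function name and rating shape are mine):

```python
from statistics import mean

def mean_vad(ratings):
    """Average per-dimension VAD ratings (1-7 Likert) across annotators.
    Each rating is a dict with valence / arousal / dominance keys."""
    return {dim: mean(r[dim] for r in ratings)
            for dim in ("valence", "arousal", "dominance")}

ratings = [
    {"valence": 6, "arousal": 5, "dominance": 4},
    {"valence": 7, "arousal": 5, "dominance": 5},
]
# mean_vad(ratings) -> {"valence": 6.5, "arousal": 5.0, "dominance": 4.5}
```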

Visual Scales (SAM)

Self-Assessment Manikin style:

yaml
annotation_schemes:
  - annotation_type: image_scale
    name: valence
    description: "Select the figure that matches the emotional valence"
    images:
      - path: /images/sam_valence_1.png
        value: 1
      - path: /images/sam_valence_2.png
        value: 2
      # ... etc
    size: 9

Mixed Emotion Detection

For speech containing multiple emotions:

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: emotions_present
    description: "Select ALL emotions you detect (can be multiple)"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
    min_selections: 1
 
  - annotation_type: radio
    name: primary_emotion
    description: "Which emotion is MOST prominent?"
    labels:
      - Happy
      - Sad
      - Angry
      - Fearful
      - Surprised
      - Disgusted
      - Contempt
      - Mixed (no dominant)
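
A useful post-hoc sanity check on exports from this pair of schemes: the primary emotion should normally appear among the emotions marked present, unless the annotator chose "Mixed (no dominant)". This rule is my suggestion, not something Potato enforces:

```python
def consistent(annotation: dict) -> bool:
    """Flag annotations whose primary_emotion contradicts the
    multiselect (a sketch of an analyst-side consistency check)."""
    present = annotation.get("emotions_present", [])
    primary = annotation.get("primary_emotion")
    if not present:                        # violates min_selections: 1
        return False
    if primary == "Mixed (no dominant)":
        return len(present) >= 2           # "mixed" implies several emotions
    return primary in present
```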

Comprehensive Emotion Annotation

yaml
annotation_task_name: "Comprehensive Speech Emotion Annotation"
 
data_files:
  - data/speech_samples.json
 
item_properties:
  id_key: id
  audio_key: audio_url
  text_key: transcript
 
audio:
  enabled: true
  display: waveform
  waveform_color: "#EC4899"
  progress_color: "#F472B6"
  height: 120
  speed_control: true
  speed_options: [0.5, 0.75, 1.0, 1.25]
  show_duration: true
  autoplay: false
 
# Show transcript if available
display:
  show_text: true
  text_field: transcript
  text_label: "Transcript (for reference)"
 
annotation_schemes:
  # Primary categorical emotion
  - annotation_type: radio
    name: primary_emotion
    description: "Primary emotion expressed"
    labels:
      - name: Happiness
        color: "#FCD34D"
        keyboard_shortcut: "1"
      - name: Sadness
        color: "#60A5FA"
        keyboard_shortcut: "2"
      - name: Anger
        color: "#F87171"
        keyboard_shortcut: "3"
      - name: Fear
        color: "#A78BFA"
        keyboard_shortcut: "4"
      - name: Surprise
        color: "#34D399"
        keyboard_shortcut: "5"
      - name: Disgust
        color: "#FB923C"
        keyboard_shortcut: "6"
      - name: Neutral
        color: "#9CA3AF"
        keyboard_shortcut: "7"
    required: true
 
  # Emotional intensity
  - annotation_type: likert
    name: intensity
    description: "Emotional intensity"
    size: 5
    min_label: "Very mild"
    max_label: "Very intense"
    required: true
 
  # Dimensional ratings
  - annotation_type: likert
    name: valence
    description: "Valence (negative to positive)"
    size: 7
    min_label: "Negative"
    max_label: "Positive"
 
  - annotation_type: likert
    name: arousal
    description: "Arousal (calm to excited)"
    size: 7
    min_label: "Calm"
    max_label: "Excited"
 
  # Voice quality
  - annotation_type: multiselect
    name: voice_qualities
    description: "Voice characteristics (select all that apply)"
    labels:
      - Trembling voice
      - Raised pitch
      - Lowered pitch
      - Loud/shouting
      - Soft/whisper
      - Fast speech rate
      - Slow speech rate
      - Breathy
      - Tense/strained
      - Crying
      - Laughing
 
  # Genuineness
  - annotation_type: radio
    name: authenticity
    description: "Does the emotion seem genuine?"
    labels:
      - Clearly genuine
      - Likely genuine
      - Uncertain
      - Likely acted/fake
      - Clearly acted/fake
 
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your annotation?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
 
annotation_guidelines:
  title: "Emotion Annotation Guidelines"
  content: |
    ## Listening Instructions
    1. Listen to the entire clip before annotating
    2. You may replay as many times as needed
    3. Focus on the VOICE, not just the words
 
    ## Emotion Categories
    - **Happiness**: Joy, amusement, contentment
    - **Sadness**: Sorrow, disappointment, melancholy
    - **Anger**: Frustration, irritation, rage
    - **Fear**: Anxiety, nervousness, terror
    - **Surprise**: Astonishment, startle
    - **Disgust**: Revulsion, contempt
    - **Neutral**: Calm, matter-of-fact
 
    ## Tips
    - Consider tone, pitch, speaking rate
    - The transcript may not match the emotion
    - When unsure between two emotions, choose the stronger one
    - Use the intensity scale for unclear cases
 
output_annotation_dir: annotations/
output_annotation_format: jsonl

Output Format

json
{
  "id": "utt_001",
  "audio_url": "/audio/sample_001.wav",
  "transcript": "I can't believe this happened!",
  "annotations": {
    "primary_emotion": "Surprise",
    "intensity": 4,
    "valence": 2,
    "arousal": 6,
    "voice_qualities": ["Raised pitch", "Fast speech rate"],
    "authenticity": "Clearly genuine",
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-05T10:30:00Z"
}
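
With output_annotation_format: jsonl, the export holds one such object per line. A sketch of loading the file and tallying primary emotions (the path is illustrative):

```python
import json
from collections import Counter

def emotion_counts(path):
    """Count primary_emotion labels across a JSONL export
    shaped like the record above."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            counts[record["annotations"]["primary_emotion"]] += 1
    return counts
```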

Segment-Level Emotion

For longer audio with shifting emotions:

yaml
annotation_schemes:
  - annotation_type: audio_segments
    name: emotion_segments
    description: "Mark time segments with different emotions"
    labels:
      - name: Happy
        color: "#FCD34D"
      - name: Sad
        color: "#60A5FA"
      - name: Angry
        color: "#F87171"
      - name: Neutral
        color: "#9CA3AF"
 
    segment_attributes:
      - name: intensity
        type: likert
        size: 5
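
Segment annotations export labeled time spans; a common downstream summary is total time per emotion. A sketch, assuming each exported segment carries start, end (seconds), and label fields (the exact export shape may differ):

```python
from collections import defaultdict

def seconds_per_emotion(segments):
    """Sum segment durations per emotion label.
    Assumes dicts with start / end in seconds and a label key."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)

segments = [
    {"start": 0.0, "end": 4.5, "label": "Neutral"},
    {"start": 4.5, "end": 9.0, "label": "Angry"},
    {"start": 9.0, "end": 10.0, "label": "Angry"},
]
# seconds_per_emotion(segments) -> {"Neutral": 4.5, "Angry": 5.5}
```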

Quality Control

yaml
quality_control:
  attention_checks:
    enabled: true
    gold_items:
      - audio: "/audio/gold/clearly_happy.wav"
        expected:
          primary_emotion: "Happiness"
          intensity: [4, 5]  # Accept 4 or 5
      - audio: "/audio/gold/clearly_angry.wav"
        expected:
          primary_emotion: "Anger"
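
The gold items above mix exact expectations with a list of acceptable values (intensity: [4, 5]). Checking a submission against them could be sketched like this (the function is my illustration of the matching logic, not Potato's internal check):

```python
def passes_gold(submitted: dict, expected: dict) -> bool:
    """Return True if the submission matches every gold expectation;
    a list value (e.g. intensity: [4, 5]) accepts any of its members."""
    for field, want in expected.items():
        got = submitted.get(field)
        if isinstance(want, list):
            if got not in want:
                return False
        elif got != want:
            return False
    return True
```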

Tips for Emotion Annotation

  1. Listen fully: Always listen to the entire clip
  2. Focus on the voice: Emotional information lies in HOW things are said
  3. Cultural awareness: Display norms vary across cultures
  4. Fatigue management: Take breaks — emotion annotation is demanding
  5. Calibration: Regular team discussions improve consistency

Next Steps


Documentation at /docs/features/audio-annotation.