Showcase/Speech Emotion Recognition
intermediate · audio

Speech Emotion Recognition

Classify emotional states from speech audio, including happiness, sadness, anger, fear, and more.

🎧 audio annotation

Configuration File: config.yaml

# Speech Emotion Recognition Configuration
# Classify emotional states from speech audio

annotation_task_name: "Speech Emotion Recognition"

data_files:
  - "data/speech_clips.json"

item_properties:
  id_key: "id"
  audio_key: "audio_url"
  text_display_key: "transcript"

user_config:
  allow_all_users: true

annotation_schemes:
  - annotation_type: "radio"
    name: "primary_emotion"
    description: "What is the primary emotion expressed?"
    labels:
      - name: "Neutral"
        tooltip: "No strong emotion detected"
        key_value: "n"
      - name: "Happy"
        tooltip: "Joy, excitement, contentment"
        key_value: "h"
      - name: "Sad"
        tooltip: "Sorrow, disappointment, grief"
        key_value: "s"
      - name: "Angry"
        tooltip: "Frustration, irritation, rage"
        key_value: "a"
      - name: "Fearful"
        tooltip: "Anxiety, worry, terror"
        key_value: "f"
      - name: "Disgusted"
        tooltip: "Revulsion, distaste"
        key_value: "d"
      - name: "Surprised"
        tooltip: "Shock, amazement"
        key_value: "u"

  - annotation_type: "likert"
    name: "emotion_intensity"
    description: "How intense is the emotion?"
    size: 5
    min_label: "Very mild"
    max_label: "Very intense"

  - annotation_type: "likert"
    name: "confidence"
    description: "How confident are you in your judgment?"
    size: 5
    min_label: "Not confident"
    max_label: "Very confident"

  - annotation_type: "multiselect"
    name: "secondary_emotions"
    description: "Any secondary emotions present?"
    labels:
      - name: "Contempt"
      - name: "Boredom"
      - name: "Interest"
      - name: "Confusion"
      - name: "Embarrassment"
      - name: "Pride"

  - annotation_type: "radio"
    name: "valence"
    description: "Overall emotional valence"
    labels:
      - name: "Positive"
      - name: "Neutral"
      - name: "Negative"

audio_display:
  show_waveform: true
  playback_controls: true
  allow_speed_control: true

output: "annotation_output/"
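
Because each radio label above defines a key_value keyboard shortcut, it is worth checking that no scheme reuses a shortcut before launching. The following is a minimal sketch of such a check, not part of Potato itself; it assumes the config is saved as config.yaml and that PyYAML is installed.

# Hypothetical sanity check (not part of Potato): verify that each
# annotation scheme's key_value shortcuts are unique within that scheme.
import yaml  # PyYAML, assumed to be installed

with open("config.yaml") as f:
    config = yaml.safe_load(f)

for scheme in config.get("annotation_schemes", []):
    keys = [label["key_value"]
            for label in scheme.get("labels", [])
            if isinstance(label, dict) and "key_value" in label]
    duplicates = {k for k in keys if keys.count(k) > 1}
    if duplicates:
        print(f"{scheme['name']}: duplicate shortcuts {duplicates}")
    else:
        print(f"{scheme['name']}: shortcuts OK ({len(keys)} defined)")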

Sample Data: sample-data.json

[
  {
    "id": "emo_001",
    "audio_url": "https://example.com/audio/emotion_sample1.wav",
    "transcript": "I can't believe this is happening! This is the best day ever!",
    "duration": 3.2
  },
  {
    "id": "emo_002",
    "audio_url": "https://example.com/audio/emotion_sample2.wav",
    "transcript": "I'm so disappointed. I really thought things would be different.",
    "duration": 4.1
  }
]

// ... and 1 more item
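
To annotate your own clips, data/speech_clips.json (the path named under data_files) needs the same shape, with the keys listed under item_properties: id, audio_url, and transcript. Below is a minimal sketch for building that file; the input CSV name and its columns (id, url, transcript) are assumptions for illustration, only the output keys come from the config above.

# Hypothetical helper: build data/speech_clips.json from a CSV of clips.
# The CSV name and column names are assumptions; the output keys match
# the item_properties section of config.yaml.
import csv
import json
from pathlib import Path

items = []
with open("clips.csv", newline="") as f:
    for row in csv.DictReader(f):
        items.append({
            "id": row["id"],
            "audio_url": row["url"],
            "transcript": row["transcript"],
        })

Path("data").mkdir(exist_ok=True)
with open("data/speech_clips.json", "w") as f:
    json.dump(items, f, indent=2)

print(f"Wrote {len(items)} items to data/speech_clips.json")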

Get This Design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/emotion-recognition
potato start config.yaml
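
Since each audio_url is fetched by the annotator's browser, it can save time to confirm the URLs resolve before starting the server. A small pre-flight sketch using only the standard library is shown below; the HEAD-request approach is an assumption, and some hosts may only answer GET requests.

# Hypothetical pre-flight check: confirm each audio_url in the data file
# responds before launching the annotation server.
import json
import urllib.request

with open("data/speech_clips.json") as f:
    items = json.load(f)

for item in items:
    url = item["audio_url"]
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"{item['id']}: {resp.status} {url}")
    except Exception as exc:  # report any failure and keep going
        print(f"{item['id']}: FAILED {url} ({exc})")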

Details

Annotation Types

radio · likert · multiselect

Domain

speech · affective-computing

Use Cases

emotion-detection · sentiment

Tags

audio · emotion · speech · affective · sentiment

Found an issue or want to improve this design?

Open an Issue