intermediate, audio
EmoBox - Multilingual Speech Emotion Recognition
Multilingual speech emotion recognition across multiple languages and corpora. Annotators classify emotional states in speech clips and rate emotional intensity, based on the EmoBox toolkit and benchmark (Ma et al., INTERSPEECH 2024).
Configuration file: config.yaml
# EmoBox - Multilingual Speech Emotion Recognition
# Based on Ma et al., INTERSPEECH 2024
# Paper: https://www.isca-archive.org/interspeech_2024/ma24_interspeech.html
# Dataset: https://github.com/emo-box/EmoBox
#
# Task: Classify emotional states in multilingual speech clips and rate intensity.
# Covers multiple languages and corpora for cross-lingual emotion recognition.
#
# Guidelines:
# - Listen to the full audio clip before making a judgment
# - Focus on vocal cues (tone, pitch, rhythm) rather than transcript content alone
# - Rate emotional intensity on a 1-5 scale
# - Consider arousal level (energy/activation) separately from emotion category
annotation_task_name: "EmoBox: Multilingual Speech Emotion Recognition"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "audio_url"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  - annotation_type: radio
    name: primary_emotion
    description: "What is the primary emotion expressed in this speech clip?"
    labels:
      - name: "Happy"
        tooltip: "Joy, excitement, contentment, amusement"
        key_value: "h"
      - name: "Sad"
        tooltip: "Sorrow, disappointment, grief, melancholy"
        key_value: "s"
      - name: "Angry"
        tooltip: "Frustration, irritation, rage, hostility"
        key_value: "a"
      - name: "Fearful"
        tooltip: "Anxiety, worry, terror, nervousness"
        key_value: "f"
      - name: "Disgusted"
        tooltip: "Revulsion, distaste, contempt"
        key_value: "d"
      - name: "Surprised"
        tooltip: "Shock, amazement, astonishment"
        key_value: "u"
      - name: "Neutral"
        tooltip: "No strong emotion detected, calm speech"
        key_value: "n"
  - annotation_type: likert
    name: emotion_intensity
    description: "How intense is the expressed emotion? (1 = Very mild, 5 = Very intense)"
    size: 5
    min_label: "Very mild"
    max_label: "Very intense"
  - annotation_type: radio
    name: arousal_level
    description: "What is the arousal/energy level of the speaker?"
    labels:
      - name: "Low"
        tooltip: "Calm, subdued, low-energy speech"
        key_value: "l"
      - name: "Medium"
        tooltip: "Moderate energy, conversational tone"
        key_value: "m"
      - name: "High"
        tooltip: "Excited, agitated, high-energy speech"
        key_value: "i"
audio_display:
  show_waveform: true
  playback_controls: true
  allow_speed_control: true
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
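Two properties of this config are easy to sanity-check by hand: that no `key_value` shortcut is reused across labels, and how many individual judgments the assignment settings imply. The Python sketch below is a minimal illustration under stated assumptions. The shortcuts are copied manually from the config rather than parsed from YAML, the helper names `duplicate_keys` and `total_judgments` are hypothetical and not part of Potato, and it assumes `annotation_per_instance: 3` means each item is labeled by three annotators. The source does not say whether Potato itself rejects duplicate shortcuts.

```python
from collections import Counter

# Shortcuts copied by hand from config.yaml (scheme name -> {label: key_value}).
SHORTCUTS = {
    "primary_emotion": {
        "Happy": "h", "Sad": "s", "Angry": "a", "Fearful": "f",
        "Disgusted": "d", "Surprised": "u", "Neutral": "n",
    },
    "arousal_level": {"Low": "l", "Medium": "m", "High": "i"},
}


def duplicate_keys(shortcuts):
    """Return the sorted list of shortcuts assigned to more than one label."""
    counts = Counter(
        key for labels in shortcuts.values() for key in labels.values()
    )
    return sorted(key for key, n in counts.items() if n > 1)


def total_judgments(num_items, annotation_per_instance):
    """Total number of individual annotations collected for the task."""
    return num_items * annotation_per_instance


if __name__ == "__main__":
    print("duplicate shortcuts:", duplicate_keys(SHORTCUTS))
    # 10 items = the 2 listed in the sample file plus the 8 it elides;
    # 3 = annotation_per_instance from the config.
    print("total judgments:", total_judgments(10, 3))
```

The check treats shortcuts as global across schemes, which is the stricter reading. If shortcuts only need to be unique within a scheme, the same function can be called on one scheme at a time.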
Sample data: sample-data.json
[
  {
    "id": "emobox_001",
    "audio_url": "https://example.com/audio/emobox/en_happy_001.wav",
    "language": "English",
    "duration": 3.4,
    "transcript": "Oh my gosh, I just got the promotion! I can't believe it, this is amazing!"
  },
  {
    "id": "emobox_002",
    "audio_url": "https://example.com/audio/emobox/zh_sad_001.wav",
    "language": "Mandarin Chinese",
    "duration": 4.1,
    "transcript": "我真的很想念她,已经好几个月没见面了。"
  }
]
// ... and 8 more items
Get this design
View on GitHub
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/audio/emobox-multilingual-speech-emotion
potato start config.yaml
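Before starting the server, it can help to confirm that every data item carries the fields named under `item_properties` in the config (`id_key: "id"`, `text_key: "audio_url"`) and that ids are unique. The Python sketch below shows one way to do that. It is an illustration only: `find_problems` is a hypothetical helper, not a Potato function, and the inline sample reproduces just the two items shown in `sample-data.json` with the extra fields omitted.

```python
import json

ID_KEY = "id"            # item_properties.id_key in config.yaml
TEXT_KEY = "audio_url"   # item_properties.text_key in config.yaml


def find_problems(items):
    """Return human-readable problems found in a list of data items."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Both configured keys must be present and non-empty.
        for key in (ID_KEY, TEXT_KEY):
            if not item.get(key):
                problems.append(f"item {index}: missing or empty '{key}'")
        # Ids must not repeat across items.
        item_id = item.get(ID_KEY)
        if item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        elif item_id:
            seen_ids.add(item_id)
    return problems


SAMPLE = json.loads("""
[
  {"id": "emobox_001",
   "audio_url": "https://example.com/audio/emobox/en_happy_001.wav"},
  {"id": "emobox_002",
   "audio_url": "https://example.com/audio/emobox/zh_sad_001.wav"}
]
""")

if __name__ == "__main__":
    print(find_problems(SAMPLE) or "no problems found")
```

To check the real file, replace `SAMPLE` with `json.load(open("sample-data.json", encoding="utf-8"))` run from the task directory.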
Details
Annotation types
radio, likert
Domain
Speech Processing, Affective Computing
Use cases
Emotion Detection, Multilingual Speech Analysis
Tags
audio, emotion, speech, multilingual, affective-computing, interspeech2024
Found a problem or want to improve this design?
Submit an issue
Related designs
Empathetic Dialogue Annotation
Annotate emotional situations and empathetic responses in conversations. Based on EmpatheticDialogues (Rashkin et al., ACL 2019). Classify the emotional context and evaluate response empathy.
radio, likert
News Headline Emotion Roles (GoodNewsEveryone)
Annotate emotions in news headlines with semantic roles. Based on Bostan et al., LREC 2020. Identify emotion, experiencer, cause, target, and textual cue.
likert, radio
Acoustic Scene Classification
Classify audio recordings by acoustic environment following the TUT/DCASE dataset format.
radio, likert