intermediate, audio
VoiceMOS Challenge 2024 - Speech Quality Assessment
Speech quality assessment using Mean Opinion Score (MOS). Annotators rate synthesized or processed speech on naturalness, intelligibility, and overall quality on 1-5 scales (Cooper et al., INTERSPEECH 2024).
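A MOS is the arithmetic mean of the individual 1-5 ratings collected for a clip or system. The sketch below shows that aggregation per system and per rating dimension. The record layout in `ratings` and the helper `mos_by_system` are hypothetical and chosen for illustration; they are not Potato's output format, and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rating records, one per (annotator, clip). This layout is an
# assumption for illustration, not the format Potato writes to disk.
ratings = [
    {"system_id": "system_a", "naturalness": 4, "intelligibility": 5, "overall_quality": 4},
    {"system_id": "system_a", "naturalness": 3, "intelligibility": 4, "overall_quality": 4},
    {"system_id": "system_b", "naturalness": 2, "intelligibility": 4, "overall_quality": 3},
    {"system_id": "system_b", "naturalness": 3, "intelligibility": 5, "overall_quality": 3},
]

# The three likert scales defined in the configuration.
DIMENSIONS = ("naturalness", "intelligibility", "overall_quality")


def mos_by_system(records):
    """Return {system_id: {dimension: mean of all ratings for that system}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for dim in DIMENSIONS:
            grouped[rec["system_id"]][dim].append(rec[dim])
    return {
        system: {dim: mean(scores) for dim, scores in dims.items()}
        for system, dims in grouped.items()
    }


mos = mos_by_system(ratings)
for system, scores in sorted(mos.items()):
    print(system, scores)
```

Averaging per system rather than per clip is a design choice for comparing TTS systems; grouping by clip `id` instead would give per-utterance scores.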
Configuration file: config.yaml
# VoiceMOS Challenge 2024 - Speech Quality Assessment
# Based on Cooper et al., INTERSPEECH 2024
# Paper: https://www.isca-archive.org/interspeech_2024/cooper24_interspeech.html
# Dataset: https://voicemos-challenge-2024.github.io/
#
# Task: Rate synthesized/processed speech quality using Mean Opinion Score (MOS).
# Evaluate naturalness, intelligibility, and overall quality on 1-5 scales.
#
# Guidelines:
# - Listen to each clip fully before rating
# - Rate naturalness: How close to natural human speech?
# - Rate intelligibility: How easy is it to understand the words?
# - Rate overall quality: General impression of the speech quality
# - Use the full 1-5 scale; avoid always rating in the middle
annotation_task_name: "VoiceMOS Challenge 2024: Speech Quality Assessment"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "audio_url"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  - annotation_type: likert
    name: naturalness
    description: "How natural does the speech sound? (1 = Very unnatural, 5 = Completely natural)"
    size: 5
    min_label: "Very unnatural"
    max_label: "Completely natural"
    labels:
      - "1 - Very unnatural"
      - "2 - Somewhat unnatural"
      - "3 - Moderately natural"
      - "4 - Mostly natural"
      - "5 - Completely natural"
  - annotation_type: likert
    name: intelligibility
    description: "How easy is it to understand the speech? (1 = Unintelligible, 5 = Perfectly clear)"
    size: 5
    min_label: "Unintelligible"
    max_label: "Perfectly clear"
    labels:
      - "1 - Unintelligible"
      - "2 - Mostly unintelligible"
      - "3 - Somewhat intelligible"
      - "4 - Mostly intelligible"
      - "5 - Perfectly clear"
  - annotation_type: likert
    name: overall_quality
    description: "What is the overall quality of this speech? (1 = Bad, 5 = Excellent)"
    size: 5
    min_label: "Bad"
    max_label: "Excellent"
    labels:
      - "1 - Bad"
      - "2 - Poor"
      - "3 - Fair"
      - "4 - Good"
      - "5 - Excellent"
audio_display:
  show_waveform: true
  playback_controls: true
  allow_speed_control: true
allow_all_users: true
instances_per_annotator: 200
annotation_per_instance: 5
allow_skip: true
skip_reason_required: false
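The two assignment settings above imply a lower bound on how many annotators a study needs. The sketch below works through that arithmetic, assuming `annotation_per_instance` means each clip must be rated by that many distinct annotators and `instances_per_annotator` caps how many clips one annotator sees. Both readings are assumptions based on the key names, and `min_annotators` is a hypothetical helper, not part of Potato.

```python
import math


def min_annotators(num_items, annotation_per_instance=5, instances_per_annotator=200):
    """Lower bound on the number of annotators needed to cover a dataset.

    Assumes an annotator rates any given item at most once and sees at most
    `instances_per_annotator` items in total.
    """
    total_ratings = num_items * annotation_per_instance
    # Annotators needed so that the total capacity covers every rating.
    by_capacity = math.ceil(total_ratings / instances_per_annotator)
    # Each item needs `annotation_per_instance` distinct raters regardless.
    return max(annotation_per_instance, by_capacity)


# The sample file lists 10 items (2 shown plus 8 more).
print(min_annotators(10))
# A hypothetical larger study with 1000 clips.
print(min_annotators(1000))
```

Because `allow_skip` is enabled, skipped clips would push the real requirement above this bound.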
Sample data: sample-data.json
[
  {
    "id": "voicemos_001",
    "audio_url": "https://example.com/audio/voicemos/tts_system_a_001.wav",
    "system_id": "system_a",
    "duration": 4.2,
    "text_content": "The weather forecast for tomorrow predicts clear skies and mild temperatures."
  },
  {
    "id": "voicemos_002",
    "audio_url": "https://example.com/audio/voicemos/tts_system_b_001.wav",
    "system_id": "system_b",
    "duration": 3.8,
    "text_content": "Please remember to submit your report by the end of the business day."
  }
]
// ... and 8 more items
Get this design
View on GitHub
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/audio/voicemos-quality-assessment
potato start config.yaml
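Before running `potato start`, it can help to confirm that every data item carries the fields named by `id_key` and `text_key` in config.yaml. The following is a minimal sketch of such a check; `find_problems` is a hypothetical helper, and the inline sample mirrors two entries of sample-data.json rather than reading the file.

```python
import json

# Keys referenced by item_properties in config.yaml (id_key and text_key).
REQUIRED_KEYS = ("id", "audio_url")


def find_problems(items):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            if not item.get(key):
                problems.append(f"item {index}: missing or empty '{key}'")
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# Inline sample mirroring two entries of sample-data.json.
sample = json.loads("""
[
  {"id": "voicemos_001",
   "audio_url": "https://example.com/audio/voicemos/tts_system_a_001.wav",
   "system_id": "system_a", "duration": 4.2},
  {"id": "voicemos_002",
   "audio_url": "https://example.com/audio/voicemos/tts_system_b_001.wav",
   "system_id": "system_b", "duration": 3.8}
]
""")

print(find_problems(sample) or "no problems found")
```

To check the real file, replace the inline string with `json.load(open("sample-data.json"))`, assuming the script runs from the design directory.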
Details
Annotation types
likert
Domain
Speech Processing, Quality Assessment
Use cases
Speech Quality Rating, TTS Evaluation, Voice Synthesis Assessment
Tags
audio, speech-quality, mos, tts, voice-synthesis, interspeech2024
Found an issue or want to improve this design?
Open an issue
Related designs
EmoBox - Multilingual Speech Emotion Recognition
Multilingual speech emotion recognition across multiple languages and corpora. Annotators classify emotional states in speech clips and rate emotional intensity, based on the EmoBox toolkit and benchmark (Ma et al., INTERSPEECH 2024).
radio, likert
Acoustic Scene Classification
Classify audio recordings by acoustic environment following the TUT/DCASE dataset format.
radio, likert
Audio Transcription Review
Review and correct automatic speech recognition transcriptions with waveform visualization.
likert, multiselect