beginner · audio
Speech Quality MOS Rating
Rate speech quality on the Mean Opinion Score (MOS) scale, following ITU-T P.800 and Blizzard Challenge protocols.
audio annotation
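For reference when reading the scales in this design: under ITU-T P.800 absolute category rating, the MOS of a stimulus is the arithmetic mean of its N listener ratings on the five-point scale. A 95% confidence interval is commonly reported alongside it. The interval below uses the normal approximation as a simple illustration; with only a handful of listeners, a Student-t quantile in place of 1.96 is the safer choice.

```latex
\mathrm{MOS} = \frac{1}{N}\sum_{n=1}^{N} r_n, \qquad r_n \in \{1, 2, 3, 4, 5\}

s = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}\left(r_n - \mathrm{MOS}\right)^2}, \qquad
\mathrm{CI}_{95} = \mathrm{MOS} \pm 1.96\,\frac{s}{\sqrt{N}}
```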
Configuration File: config.yaml
task_name: "Speech Quality MOS Rating"
# Server configuration
server:
  port: 8000
# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#3B82F6"
  progress_color: "#60A5FA"
  speed_control: false  # Important: evaluate at normal speed
# Data configuration
data_files:
  - path: data/speech_samples.json
    audio_field: audio_file
    text_field: transcript
# Annotation schemes
annotation_schemes:
  # Overall MOS (ITU-T P.800 scale)
  - annotation_type: likert
    name: mos_quality
    description: "Overall speech quality (ITU-T MOS scale)"
    size: 5
    labels:
      - "1 - Bad"
      - "2 - Poor"
      - "3 - Fair"
      - "4 - Good"
      - "5 - Excellent"
    keyboard_shortcuts: true
  # Naturalness (Blizzard Challenge)
  - annotation_type: likert
    name: naturalness
    description: "How natural does the speech sound?"
    size: 5
    labels:
      - "1 - Completely unnatural"
      - "2 - Mostly unnatural"
      - "3 - Equally natural/unnatural"
      - "4 - Mostly natural"
      - "5 - Completely natural"
  # Intelligibility
  - annotation_type: likert
    name: intelligibility
    description: "How easy is it to understand what is being said?"
    size: 5
    min_label: "Very difficult"
    max_label: "Very easy"
  # Prosody quality
  - annotation_type: likert
    name: prosody
    description: "Rate the rhythm, stress, and intonation"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
  # Artifacts
  - annotation_type: multiselect
    name: artifacts
    description: "What artifacts or issues did you notice? (Select all that apply)"
    labels:
      - Robotic/mechanical sound
      - Unnatural pauses
      - Pronunciation errors
      - Breathing artifacts
      - Background noise
      - Clipping/distortion
      - Monotone delivery
      - None
  # Sample type (for analysis)
  - annotation_type: radio
    name: sample_type
    description: "What type of speech is this? (If known)"
    labels:
      - Natural human speech
      - Text-to-speech (synthetic)
      - Voice conversion
      - Enhanced/processed
      - Unknown
  # Would use in application
  - annotation_type: radio
    name: usability
    description: "Would this quality be acceptable for a voice assistant?"
    labels:
      - Yes, definitely
      - Probably yes
      - Uncertain
      - Probably not
      - Definitely not
# User settings
allow_all_users: true
instances_per_annotator: 100
# Output
output:
  path: annotations/
  format: json
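Once ratings are collected, each sample's MOS is the mean of its mos_quality ratings, and the artifacts checklist can be tallied alongside it. The sketch below is a minimal illustration under a loudly stated assumption: each exported record is taken to be a dict with hypothetical keys instance_id, mos_quality (either a number or a label such as "4 - Good") and artifacts (a list of label strings). Potato's actual JSON output layout may differ, so adapt the field access before using it. The confidence interval uses the normal approximation.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, stdev


def parse_score(value):
    """Turn a rating such as 4, "4" or "4 - Good" into the integer 4."""
    if isinstance(value, int):
        return value
    return int(str(value).split("-", 1)[0].strip())


def mos_by_instance(records, field="mos_quality"):
    """Return {instance_id: (mos, ci95_half_width, n_ratings)}.

    Assumes each record is a dict like
    {"instance_id": "s1", "mos_quality": "4 - Good", "artifacts": [...]}
    (a hypothetical export layout, not a documented Potato format).
    """
    scores = defaultdict(list)
    for rec in records:
        scores[rec["instance_id"]].append(parse_score(rec[field]))
    summary = {}
    for inst, vals in scores.items():
        n = len(vals)
        # Normal-approximation 95% CI half-width; needs at least two ratings.
        half = 1.96 * stdev(vals) / math.sqrt(n) if n > 1 else float("nan")
        summary[inst] = (mean(vals), half, n)
    return summary


def artifact_counts(records):
    """Count how many ratings flagged each artifact label (hypothetical 'artifacts' key)."""
    counts = Counter()
    for rec in records:
        # set() guards against the same label appearing twice in one record
        counts.update(set(rec.get("artifacts", [])))
    return counts
```

For example, three ratings of 4, 5 and 3 for one sample give a MOS of 4 with a half-width of 1.96/√3 ≈ 1.13, which shows why small listener panels produce wide intervals. Grouping by the sample_type answer instead of instance_id would give a per-system comparison in the Blizzard Challenge style.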
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir speech-quality-mos
cd speech-quality-mos
# Copy config.yaml from above
potato start config.yaml
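The config also expects a data file at data/speech_samples.json whose items carry an audio_file field and a transcript field. The sketch below creates such a file from (audio path, transcript) pairs. Only those two field names come from config.yaml; the list-of-objects layout and the id key are assumptions, so check Potato's data-format documentation for whether it expects a JSON list or JSON Lines and which id key it requires. The helper names build_items and write_samples are hypothetical.

```python
import json
from pathlib import Path


def build_items(samples):
    """Turn (audio_path, transcript) pairs into item dicts.

    The "id" key and the list-of-objects layout are assumptions; only the
    audio_file and transcript field names come from config.yaml.
    """
    return [
        {"id": f"sample_{i:03d}", "audio_file": audio, "transcript": text}
        for i, (audio, text) in enumerate(samples, start=1)
    ]


def write_samples(samples, out_path="data/speech_samples.json"):
    """Write the items as a JSON list, creating the data/ folder if needed."""
    items = build_items(samples)
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(items, indent=2), encoding="utf-8")
    return items
```

Run from the project folder before starting the server, for example write_samples([("audio/clip_001.wav", "Please set a timer for ten minutes.")]), with your own clip paths and transcripts in place of these placeholders.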
Details
Annotation Types
likert, radio, multiselect
Domain
Audio, Speech
Use Cases
speech quality, MOS rating, TTS evaluation
Tags
audio, speech quality, mos, tts, evaluation
Related Designs
Audio-Visual Sentiment Analysis
Rate sentiment in speech segments following CMU-MOSI and CMU-MOSEI multimodal annotation protocols.
likert, radio
Speech Emotion Recognition
Classify emotional content in speech following IEMOCAP and CREMA-D annotation schemes.
radio, likert
Speech Intelligibility Rating
Rate speech intelligibility for pathological speech following TORGO database annotation protocols.
likert, radio