beginner, audio
Speech Quality MOS Rating
Rate speech quality using Mean Opinion Score following ITU-T P.800 and Blizzard Challenge protocols.

Configuration File: config.yaml
annotation_task_name: "Speech Quality MOS Rating"
task_dir: "."
port: 8000

# Data configuration
data_files:
  - "data/speech_samples.json"
item_properties:
  id_key: id
  text_key: transcript

# Annotation schemes
annotation_schemes:
  # Overall MOS (ITU-T P.800 scale)
  - annotation_type: likert
    name: mos_quality
    description: "Overall speech quality (ITU-T MOS scale)"
    size: 5
    labels:
      - "1 - Bad"
      - "2 - Poor"
      - "3 - Fair"
      - "4 - Good"
      - "5 - Excellent"
    sequential_key_binding: true

  # Naturalness (Blizzard Challenge)
  - annotation_type: likert
    name: naturalness
    description: "How natural does the speech sound?"
    size: 5
    labels:
      - "1 - Completely unnatural"
      - "2 - Mostly unnatural"
      - "3 - Equally natural/unnatural"
      - "4 - Mostly natural"
      - "5 - Completely natural"

  # Intelligibility
  - annotation_type: likert
    name: intelligibility
    description: "How easy is it to understand what is being said?"
    size: 5
    min_label: "Very difficult"
    max_label: "Very easy"

  # Prosody quality
  - annotation_type: likert
    name: prosody
    description: "Rate the rhythm, stress, and intonation"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Artifacts
  - annotation_type: multiselect
    name: artifacts
    description: "What artifacts or issues did you notice? (Select all)"
    labels:
      - Robotic/mechanical sound
      - Unnatural pauses
      - Pronunciation errors
      - Breathing artifacts
      - Background noise
      - Clipping/distortion
      - Monotone delivery
      - None

  # Sample type (for analysis)
  - annotation_type: radio
    name: sample_type
    description: "What type of speech is this? (If known)"
    labels:
      - Natural human speech
      - Text-to-speech (synthetic)
      - Voice conversion
      - Enhanced/processed
      - Unknown

  # Would use in application
  - annotation_type: radio
    name: usability
    description: "Would this quality be acceptable for a voice assistant?"
    labels:
      - Yes, definitely
      - Probably yes
      - Uncertain
      - Probably not
      - Definitely not

# User settings
require_password: false

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
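The configuration expects every item in data/speech_samples.json to carry an `id` field and a `transcript` field (the `id_key` and `text_key` settings). The following is a minimal sketch of a pre-flight check for that file. The helper names `load_items` and `find_problems` are hypothetical, and the file layout (a JSON array or one JSON object per line) is an assumption, since the page does not state which one is used.

```python
import json
from pathlib import Path

# Keys named by id_key and text_key in config.yaml.
REQUIRED_KEYS = ("id", "transcript")


def load_items(path):
    """Load items from a JSON array file or a JSON-lines file (assumed layouts)."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def find_problems(items):
    """Return readable messages for missing keys and duplicate ids."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            if key not in item:
                problems.append(f"item {index}: missing '{key}'")
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems
```

Calling `find_problems(load_items("data/speech_samples.json"))` before starting the server should return an empty list when every item has both keys and all ids are unique.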
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir speech-quality-mos
cd speech-quality-mos

# Copy config.yaml from above
potato start config.yaml
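Once ratings are collected, the Mean Opinion Score for a sample is the arithmetic mean of its 1 to 5 ratings. The sketch below shows that calculation for labels written in the style of the `mos_quality` scheme (for example "4 - Good"). The function names are hypothetical, the 95% interval uses a normal approximation as a simplifying assumption, and reading the files in `annotation_output/` is left out because the output layout is not described on this page.

```python
import math
import statistics


def score_from_label(label):
    """Extract the numeric score from a label such as '4 - Good'."""
    return int(label.split(" - ", 1)[0])


def mean_opinion_score(scores):
    """Return (mos, ci_half_width) for a list of 1-5 ratings.

    The interval is a normal approximation (z = 1.96). This is an
    assumption; a t-distribution is more appropriate for few raters.
    """
    if not scores:
        raise ValueError("at least one rating is required")
    mos = statistics.mean(scores)
    if len(scores) < 2:
        # A single rating has no sample standard deviation.
        return mos, 0.0
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, half_width


# Illustrative ratings for one sample from four hypothetical annotators.
labels = ["4 - Good", "5 - Excellent", "3 - Fair", "4 - Good"]
ratings = [score_from_label(label) for label in labels]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # prints "MOS = 4.00 +/- 0.80"
```

The same function can be applied to the `naturalness`, `intelligibility`, and `prosody` scales, since each of them is also a 5-point Likert item.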
Details
Annotation Types
likert, radio
Domain
Audio, Speech
Use Cases
speech quality, MOS rating, TTS evaluation
Tags
audio, speech quality, mos, tts, evaluation