beginner, audio

Speech Quality MOS Rating

Rate speech quality on Mean Opinion Score (MOS) scales, following the ITU-T P.800 and Blizzard Challenge protocols.

audio annotation

Configuration File: config.yaml

task_name: "Speech Quality MOS Rating"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#3B82F6"
  progress_color: "#60A5FA"
  speed_control: false  # Important: evaluate at normal speed

# Data configuration
data_files:
  - path: data/speech_samples.json
    audio_field: audio_file
    text_field: transcript

# Annotation schemes
annotation_schemes:
  # Overall MOS (ITU-T P.800 scale)
  - annotation_type: likert
    name: mos_quality
    description: "Overall speech quality (ITU-T MOS scale)"
    size: 5
    labels:
      - "1 - Bad"
      - "2 - Poor"
      - "3 - Fair"
      - "4 - Good"
      - "5 - Excellent"
    keyboard_shortcuts: true

  # Naturalness (Blizzard Challenge)
  - annotation_type: likert
    name: naturalness
    description: "How natural does the speech sound?"
    size: 5
    labels:
      - "1 - Completely unnatural"
      - "2 - Mostly unnatural"
      - "3 - Equally natural/unnatural"
      - "4 - Mostly natural"
      - "5 - Completely natural"

  # Intelligibility
  - annotation_type: likert
    name: intelligibility
    description: "How easy is it to understand what is being said?"
    size: 5
    min_label: "Very difficult"
    max_label: "Very easy"

  # Prosody quality
  - annotation_type: likert
    name: prosody
    description: "Rate the rhythm, stress, and intonation"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Artifacts
  - annotation_type: multiselect
    name: artifacts
    description: "What artifacts or issues did you notice? (Select all)"
    labels:
      - Robotic/mechanical sound
      - Unnatural pauses
      - Pronunciation errors
      - Breathing artifacts
      - Background noise
      - Clipping/distortion
      - Monotone delivery
      - None

  # Sample type (for analysis)
  - annotation_type: radio
    name: sample_type
    description: "What type of speech is this? (If known)"
    labels:
      - Natural human speech
      - Text-to-speech (synthetic)
      - Voice conversion
      - Enhanced/processed
      - Unknown

  # Would use in application
  - annotation_type: radio
    name: usability
    description: "Would this quality be acceptable for a voice assistant?"
    labels:
      - Yes, definitely
      - Probably yes
      - Uncertain
      - Probably not
      - Definitely not

# User settings
allow_all_users: true
instances_per_annotator: 100

# Output
output:
  path: annotations/
  format: json
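
The config above reads samples from data/speech_samples.json and looks up the audio path under audio_file and the text under transcript. A minimal sketch of how such a file might be generated follows. The top-level JSON array layout, the id key, and the example file names are assumptions for illustration, not something the config confirms.

```python
import json
from pathlib import Path


def build_samples(audio_paths, transcripts):
    """Pair each audio path with its transcript using the config's field names."""
    if len(audio_paths) != len(transcripts):
        raise ValueError("audio_paths and transcripts must have the same length")
    return [
        {
            "id": f"sample_{i:03d}",  # assumed identifier key (not in config.yaml)
            "audio_file": path,       # matches audio_field in config.yaml
            "transcript": text,       # matches text_field in config.yaml
        }
        for i, (path, text) in enumerate(zip(audio_paths, transcripts), start=1)
    ]


def write_samples(samples, out_path="data/speech_samples.json"):
    """Write the samples as a JSON array (assumed layout), creating data/ if needed."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(samples, indent=2), encoding="utf-8")
    return out


# Hypothetical file names and transcripts, for illustration only.
samples = build_samples(
    ["audio/utt_a.wav", "audio/utt_b.wav"],
    ["The quick brown fox.", "She sells sea shells."],
)

if __name__ == "__main__":
    print(write_samples(samples))
```

Keeping the key names in one helper makes it easy to stay in sync if audio_field or text_field is renamed in the config.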

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speech-quality-mos
cd speech-quality-mos
# Copy config.yaml from above
potato start config.yaml
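
Once ratings are collected, a MOS is conventionally reported as the mean of the 1 to 5 scores, usually with a 95% confidence interval. The sketch below shows one way to compute that per system. It assumes the ratings have already been extracted into flat records; the record layout and the system key are hypothetical and do not reflect the tool's actual output format. The interval uses a normal approximation (z = 1.96), which is a simplification for small rater counts.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def mos_summary(scores, z=1.96):
    """Return rating count, mean opinion score, and 95% CI half-width."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    # Sample standard deviation needs n >= 2; otherwise the CI is undefined.
    half_width = z * stdev(scores) / math.sqrt(n) if n > 1 else float("nan")
    return {"n": n, "mos": mean(scores), "ci95": half_width}


def mos_by_group(records, group_key="system", score_key="mos_quality"):
    """Group flat rating records and summarize each group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append(record[score_key])
    return {group: mos_summary(scores) for group, scores in grouped.items()}


# Hypothetical records: "system" is an assumed key; "mos_quality" is the
# scheme name from config.yaml.
records = [
    {"system": "tts_a", "mos_quality": 4},
    {"system": "tts_a", "mos_quality": 5},
    {"system": "tts_a", "mos_quality": 3},
    {"system": "tts_a", "mos_quality": 4},
    {"system": "tts_b", "mos_quality": 2},
    {"system": "tts_b", "mos_quality": 3},
    {"system": "tts_b", "mos_quality": 2},
]

result = mos_by_group(records)

if __name__ == "__main__":
    for system, summary in sorted(result.items()):
        print(f"{system}: MOS {summary['mos']:.2f} ± {summary['ci95']:.2f} (n={summary['n']})")
```

The same function can summarize naturalness, intelligibility, or prosody by passing a different score_key.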

Details

Annotation Types

likert, multiselect, radio

Domain

Audio, Speech

Use Cases

speech quality, MOS rating, TTS evaluation

Tags

audio, speech quality, mos, tts, evaluation