beginner | audio

Speech Quality MOS Rating

Rate speech quality using Mean Opinion Score following ITU-T P.800 and Blizzard Challenge protocols.
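
A stimulus's MOS is the arithmetic mean of its 1-5 ratings, and it is commonly reported with a confidence interval. The sketch below computes both for one sample. The function name `mos_with_ci` is hypothetical, and the interval uses a normal approximation (z = 1.96). That is an assumption on our part; with only a handful of ratings, a Student-t quantile gives a more honest interval.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    MOS is the arithmetic mean of 1-5 ratings. The interval uses a normal
    approximation (z = 1.96), which is an assumption; a Student-t quantile
    is more appropriate for small samples.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a CI")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    half_width = z * sd / math.sqrt(len(ratings))
    return mos, half_width


# Five raters scoring the same sample on the "mos_quality" scale.
ratings = [4, 5, 3, 4, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")  # prints "MOS = 4.00 ± 0.62"
```

Averaging only makes sense per sample (or per system) across raters. Mixing ratings from different samples into one mean hides exactly the differences a MOS test is meant to reveal.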


Configuration File: config.yaml

annotation_task_name: "Speech Quality MOS Rating"
task_dir: "."

port: 8000

# Data configuration
data_files:
  - "data/speech_samples.json"

item_properties:
  id_key: id
  text_key: transcript

# Annotation schemes
annotation_schemes:
  # Overall MOS (ITU-T P.800 scale)
  - annotation_type: likert
    name: mos_quality
    description: "Overall speech quality (ITU-T MOS scale)"
    size: 5
    labels:
      - "1 - Bad"
      - "2 - Poor"
      - "3 - Fair"
      - "4 - Good"
      - "5 - Excellent"
    sequential_key_binding: true

  # Naturalness (Blizzard Challenge)
  - annotation_type: likert
    name: naturalness
    description: "How natural does the speech sound?"
    size: 5
    labels:
      - "1 - Completely unnatural"
      - "2 - Mostly unnatural"
      - "3 - Equally natural/unnatural"
      - "4 - Mostly natural"
      - "5 - Completely natural"

  # Intelligibility
  - annotation_type: likert
    name: intelligibility
    description: "How easy is it to understand what is being said?"
    size: 5
    min_label: "Very difficult"
    max_label: "Very easy"

  # Prosody quality
  - annotation_type: likert
    name: prosody
    description: "Rate the rhythm, stress, and intonation"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Artifacts
  - annotation_type: multiselect
    name: artifacts
    description: "What artifacts or issues did you notice? (Select all that apply)"
    labels:
      - Robotic/mechanical sound
      - Unnatural pauses
      - Pronunciation errors
      - Breathing artifacts
      - Background noise
      - Clipping/distortion
      - Monotone delivery
      - None

  # Sample type (for analysis)
  - annotation_type: radio
    name: sample_type
    description: "What type of speech is this? (If known)"
    labels:
      - Natural human speech
      - Text-to-speech (synthetic)
      - Voice conversion
      - Enhanced/processed
      - Unknown

  # Would use in application
  - annotation_type: radio
    name: usability
    description: "Would this quality be acceptable for a voice assistant?"
    labels:
      - Yes, definitely
      - Probably yes
      - Uncertain
      - Probably not
      - Definitely not

# User settings
require_password: false

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
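
Once several raters have annotated a sample, the schemes above can be aggregated separately. The per-dimension scores (MOS, naturalness, intelligibility, prosody) stay distinct, artifact selections become frequencies, and the usability question becomes an acceptance rate. The sketch below assumes a hypothetical in-memory record shape: one dict per rater, keyed by the scheme names in config.yaml. This is not Potato's documented output format, so adapt the field access to what your `annotation_output/` files actually contain. The helper names `to_score` and `summarize`, and the choice to count "Yes, definitely" and "Probably yes" as acceptable, are illustrative assumptions.

```python
from collections import Counter
from statistics import mean

# Scheme names taken from config.yaml.
LIKERT_SCHEMES = ["mos_quality", "naturalness", "intelligibility", "prosody"]
# Hypothetical choice: treat the two "yes" answers as acceptable.
ACCEPTABLE = {"Yes, definitely", "Probably yes"}


def to_score(value):
    """Convert an int (4) or a label string ("4 - Good") to an int score.

    Assumption: stored likert values are either the numeric rating or the
    full label text with a leading digit; adjust if your output differs.
    """
    if isinstance(value, int):
        return value
    return int(str(value).split("-", 1)[0].strip())


def summarize(records):
    """Aggregate several raters' annotations of ONE speech sample.

    Hypothetical record shape: one dict per rater, keyed by scheme name.
    """
    summary = {}
    # Mean score per likert dimension (kept separate, not averaged together).
    for scheme in LIKERT_SCHEMES:
        scores = [to_score(r[scheme]) for r in records if scheme in r]
        summary[scheme] = mean(scores) if scores else None
    # How often each artifact label was selected across raters.
    selections = [r.get("artifacts", []) for r in records]
    summary["artifacts"] = dict(Counter(a for sel in selections for a in sel))
    # "None" should not be ticked together with a real artifact; count such cases.
    summary["inconsistent_none"] = sum(
        1 for sel in selections if "None" in sel and len(sel) > 1
    )
    # Share of raters who found the quality acceptable for a voice assistant.
    votes = [r["usability"] for r in records if "usability" in r]
    summary["acceptable_rate"] = (
        sum(v in ACCEPTABLE for v in votes) / len(votes) if votes else None
    )
    return summary


# Three raters' annotations of one sample (illustrative values only).
records = [
    {"mos_quality": "4 - Good", "naturalness": 4, "intelligibility": 5,
     "prosody": 4, "artifacts": ["None"], "usability": "Probably yes"},
    {"mos_quality": "3 - Fair", "naturalness": 3, "intelligibility": 4,
     "prosody": 3, "artifacts": ["Robotic/mechanical sound", "Monotone delivery"],
     "usability": "Uncertain"},
    {"mos_quality": 5, "naturalness": 4, "intelligibility": 5,
     "prosody": 4, "artifacts": ["None", "Background noise"],
     "usability": "Yes, definitely"},
]
print(summarize(records))
```

A multiselect scheme cannot stop a rater from ticking "None" alongside a real artifact, so a post-hoc check like `inconsistent_none` is one way to catch contradictory selections before reporting artifact rates.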

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speech-quality-mos
cd speech-quality-mos
# Copy config.yaml from above
potato start config.yaml

Details

Annotation Types

likert, multiselect, radio

Domain

Audio, Speech

Use Cases

speech quality, MOS rating, TTS evaluation

Tags

audio, speech quality, mos, tts, evaluation