Speech Quality MOS Rating

Rate speech quality using Mean Opinion Score following ITU-T P.800 and Blizzard Challenge protocols.

Configuration File: config.yaml

annotation_task_name: "Speech Quality MOS Rating"
task_dir: "."

port: 8000

# Data configuration
data_files:
  - "data/speech_samples.json"

item_properties:
  id_key: id
  text_key: transcript

# Annotation schemes
annotation_schemes:
  # Overall MOS (ITU-T P.800 scale)
  - annotation_type: likert
    name: mos_quality
    description: "Overall speech quality (ITU-T MOS scale)"
    size: 5
    labels:
      - "1 - Bad"
      - "2 - Poor"
      - "3 - Fair"
      - "4 - Good"
      - "5 - Excellent"
    sequential_key_binding: true

  # Naturalness (Blizzard Challenge)
  - annotation_type: likert
    name: naturalness
    description: "How natural does the speech sound?"
    size: 5
    labels:
      - "1 - Completely unnatural"
      - "2 - Mostly unnatural"
      - "3 - Equally natural/unnatural"
      - "4 - Mostly natural"
      - "5 - Completely natural"

  # Intelligibility
  - annotation_type: likert
    name: intelligibility
    description: "How easy is it to understand what is being said?"
    size: 5
    min_label: "Very difficult"
    max_label: "Very easy"

  # Prosody quality
  - annotation_type: likert
    name: prosody
    description: "Rate the rhythm, stress, and intonation"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Artifacts
  - annotation_type: multiselect
    name: artifacts
    description: "What artifacts or issues did you notice? (Select all that apply)"
    labels:
      - Robotic/mechanical sound
      - Unnatural pauses
      - Pronunciation errors
      - Breathing artifacts
      - Background noise
      - Clipping/distortion
      - Monotone delivery
      - None

  # Sample type (for analysis)
  - annotation_type: radio
    name: sample_type
    description: "What type of speech is this? (If known)"
    labels:
      - Natural human speech
      - Text-to-speech (synthetic)
      - Voice conversion
      - Enhanced/processed
      - Unknown

  # Would use in application
  - annotation_type: radio
    name: usability
    description: "Would this quality be acceptable for a voice assistant?"
    labels:
      - Yes, definitely
      - Probably yes
      - Uncertain
      - Probably not
      - Definitely not

# User settings
require_password: false

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
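
Once ratings are collected, the mos_quality scores for a sample are usually summarized as a mean with a confidence interval. The following is a minimal sketch of that calculation, not part of the showcase design: it assumes the ratings have already been extracted from the annotation output into a plain list of integers (the output layout is not shown here), and the function name mos_with_ci and the example ratings are hypothetical. The interval uses a normal approximation, which is only a rough guide for small listener panels.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for 1-5 ratings.

    half_width is the half-width of a normal-approximation 95% confidence
    interval (z = 1.96). With a single rating the interval is undefined,
    so NaN is returned for it.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mean, math.nan
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical mos_quality ratings for one sample from five listeners.
ratings = [4, 5, 3, 4, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The same function can be applied to the naturalness, intelligibility, and prosody scales, since all four use a 5-point range.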

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speech-quality-mos
cd speech-quality-mos
# Copy config.yaml from above
potato start config.yaml
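
The config reads items from data/speech_samples.json, which the quick start does not create. As a rough sketch, the script below writes a small placeholder file using the two keys the config names (id and transcript). It assumes the file holds one JSON object per line; check the Potato documentation for the exact format expected. The sample ids, transcripts, and the write_samples helper are hypothetical, and the config shown does not say how an item points to its audio, so no audio field is included.

```python
import json
from pathlib import Path

# Hypothetical items: only the `id` and `transcript` keys come from the config.
samples = [
    {"id": "sample_001", "transcript": "The quick brown fox jumps over the lazy dog."},
    {"id": "sample_002", "transcript": "Please call Stella and ask her to bring these things."},
]


def write_samples(path, items):
    """Write items as one JSON object per line (assumed format)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    # Run from inside the project folder so the path matches data_files.
    write_samples("data/speech_samples.json", samples)
```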

Details

Annotation Types

likert, radio

Domain

Audio, Speech

Use Cases

speech quality, MOS rating, TTS evaluation

Related Designs

Audio Transcription Review

Review and correct automatic speech recognition transcriptions with waveform visualization.

likert, multiselect

Audio-Visual Sentiment Analysis

Rate sentiment in speech segments following CMU-MOSI and CMU-MOSEI multimodal annotation protocols.