Speech Intelligibility Rating

Rate speech intelligibility for pathological speech following TORGO database annotation protocols.

Configuration File: config.yaml

annotation_task_name: "Speech Intelligibility Rating"

port: 8000

# Data configuration
data_files:
  - "data/speech_samples.json"

item_properties:
  id_key: "id"
  text_key: "target_text"

# Annotation schemes
annotation_schemes:
  # Transcription (orthographic)
  - annotation_type: text
    name: transcription
    description: "Write exactly what you hear (orthographic transcription)"
    textarea: false
    placeholder: "Type what you hear..."

  # Overall intelligibility
  - annotation_type: likert
    name: intelligibility
    description: "Overall speech intelligibility"
    size: 5
    labels:
      - "1: Unintelligible (cannot understand)"
      - "2: Mostly unintelligible"
      - "3: Partially intelligible"
      - "4: Mostly intelligible"
      - "5: Fully intelligible (clear)"

  # Severity rating (clinical scale)
  - annotation_type: radio
    name: severity
    description: "Speech disorder severity (if applicable)"
    labels:
      - Normal (no apparent disorder)
      - Mild (noticeable but easily understood)
      - Moderate (requires effort to understand)
      - Severe (very difficult to understand)
      - Profound (essentially unintelligible)

  # Articulation issues
  - annotation_type: multiselect
    name: articulation_issues
    description: "What articulation issues are present? (Select all)"
    labels:
      - Slurred consonants
      - Vowel distortions
      - Sound substitutions
      - Sound omissions
      - Hypernasality
      - Breathy voice
      - Strained voice
      - Monopitch (flat intonation)
      - Slow rate
      - Fast/rushed rate
      - Irregular rhythm
      - None apparent

  # Speech rate
  - annotation_type: radio
    name: speech_rate
    description: "How would you characterize the speech rate?"
    labels:
      - Much too slow
      - Somewhat slow
      - Normal rate
      - Somewhat fast
      - Much too fast/rushed

  # Effort to understand
  - annotation_type: likert
    name: listener_effort
    description: "How much effort was required to understand?"
    size: 5
    min_label: "No effort"
    max_label: "Extreme effort"

  # Speaker consistency
  - annotation_type: radio
    name: consistency
    description: "Was intelligibility consistent throughout?"
    labels:
      - Yes, consistent throughout
      - Variable (some parts clearer than others)
      - Progressively worse
      - Progressively better

  # Audio quality impact
  - annotation_type: radio
    name: quality_impact
    description: "Did recording quality affect your rating?"
    labels:
      - No (good quality recording)
      - Minor impact
      - Significant impact
      - Cannot rate due to poor quality

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Confidence in your intelligibility rating"
    size: 5
    min_label: "Low"
    max_label: "High"

# User settings
allow_all_users: true
instances_per_annotator: 100

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
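
The config collects both a free-text `transcription` and a reference `target_text`, so the two can be compared once annotation is done. The sketch below is one possible way to do that, not part of the design itself: it scores the fraction of target words the listener reproduced in order, using a longest-common-subsequence match. The function names `normalize` and `words_correct` are hypothetical, and how you load transcriptions from `annotation_output/` depends on your Potato version, so that step is left out.

```python
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def words_correct(target_text, transcription):
    """Fraction of target words reproduced in order by the transcription.

    Assumption: word matches are exact after normalization, and order matters
    (longest common subsequence over words). Returns 0.0 for an empty target.
    """
    target = normalize(target_text)
    heard = normalize(transcription)
    if not target:
        return 0.0
    # Dynamic-programming LCS, keeping only the previous row of the table.
    prev = [0] * (len(heard) + 1)
    for t_word in target:
        curr = [0]
        for j, h_word in enumerate(heard, start=1):
            if t_word == h_word:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(target)


# Illustrative pair only; not taken from any real annotation output.
score = words_correct(
    "She sells sea shells by the sea shore.",
    "she sells shells by the shore",
)
print(f"words correct: {score:.2f}")
```

A score like this complements the Likert `intelligibility` rating rather than replacing it: the rating captures listener impression, while the word match gives a number that can be compared across annotators.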

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speech-intelligibility-rating
cd speech-intelligibility-rating
# Copy config.yaml from above
potato start config.yaml
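
The quick start also needs the data file named under `data_files`. The following is a minimal sketch of generating `data/speech_samples.json` with the `id` and `target_text` keys that `item_properties` expects. It assumes one JSON object per line; if your Potato version expects a single JSON array, write the list with `json.dump` instead. The sample IDs and sentences are placeholders, and the helpers `build_items` and `write_items` are hypothetical. The config above does not name a field for the audio file itself, so none is included here.

```python
import json
from pathlib import Path


def build_items(prompts):
    """Turn (id, target_text) pairs into records matching item_properties."""
    return [{"id": item_id, "target_text": text} for item_id, text in prompts]


def write_items(items, path):
    """Write one JSON object per line, creating parent folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")


# Placeholder prompts; replace with your own sample IDs and target sentences.
items = build_items([
    ("sample_001", "The quick brown fox jumps over the lazy dog."),
    ("sample_002", "She sells sea shells by the sea shore."),
])

if __name__ == "__main__":
    # Path matches the data_files entry in config.yaml.
    write_items(items, "data/speech_samples.json")
```

Run the script from inside the project folder before `potato start config.yaml` so the relative path resolves.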

Details

Annotation Types

likert, radio, text

Domain

Audio, Speech, Medical

Use Cases

speech pathology, intelligibility rating, dysarthria assessment

Related Designs

Audio Transcription Review

Review and correct automatic speech recognition transcriptions with waveform visualization.

likert, multiselect

Clotho Audio Captioning

Audio captioning and quality assessment based on the Clotho dataset (Drossos et al., ICASSP 2020). Annotators write natural language captions for audio clips, rate caption accuracy on a Likert scale, and classify the audio environment.

text, likert

CoVoST 2 - Speech Translation Evaluation

Speech translation quality evaluation based on the CoVoST 2 dataset (Wang et al., arXiv 2020). Annotators listen to source audio, review translations, label audio segments, and rate overall translation quality.

text, radio