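Pronunciation Assessment Annotation

Build a language-learning annotation task with audio playback and Likert scales for rating pronunciation quality.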

Potato Team

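Pronunciation assessment is essential for language-learning applications, speech therapy, and accent coaching. This tutorial covers building interfaces for rating pronunciation quality, identifying specific errors, and giving detailed feedback.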

Basic Pronunciation Rating

yaml
annotation_task_name: "Pronunciation Assessment"
 
data_files:
  - data/learner_recordings.json
 
item_properties:
  audio_path: audio_path
  text_field: target_text
 
# Show what they should have said
display:
  show_text: true
  text_field: target_text
  text_label: "Target Sentence"
  text_style: large
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#06B6D4"
    progress_color: "#22D3EE"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    name: overall_quality
    description: "Overall pronunciation quality"
    size: 5
    labels:
      - "1: Very poor"
      - "2: Poor"
      - "3: Fair"
      - "4: Good"
      - "5: Excellent"
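
The config above maps `audio_path` and `target_text` from each item in data/learner_recordings.json, but the data file itself is not shown. As a rough sketch in Python, assuming each item is a JSON object carrying an `id` (as in the output example later in this tutorial) plus the two mapped fields, a pre-flight check could flag incomplete items before annotators see them. The helper name and the sample values are hypothetical.

```python
# Fields the basic config relies on; "id" is an assumption based on the output example.
REQUIRED_FIELDS = ("id", "audio_path", "target_text")


def find_invalid_items(items):
    """Return (index, missing_fields) for every item lacking a non-empty required field."""
    problems = []
    for index, item in enumerate(items):
        missing = [field for field in REQUIRED_FIELDS if not item.get(field)]
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical sample items; the second one has an empty audio path.
sample_items = [
    {"id": "rec_001", "audio_path": "audio/learner_001.wav",
     "target_text": "The weather is really nice today."},
    {"id": "rec_002", "audio_path": "",
     "target_text": "I thought it was Thursday."},
]

for index, missing in find_invalid_items(sample_items):
    print(f"item {index} is missing: {', '.join(missing)}")
```

Running a check like this before launching the task avoids showing annotators a target sentence with no playable recording.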

Detailed Assessment Configuration

yaml
annotation_task_name: "Detailed Pronunciation Assessment"
 
data_files:
  - data/recordings.json
 
item_properties:
  audio_path: learner_audio
  reference_audio_field: native_audio
  text_field: sentence
  metadata_fields: [learner_id, native_language]
 
display:
  show_text: true
  text_field: sentence
  text_label: "Target Text"
  show_metadata: true
  metadata_fields:
    - label: "Native Language"
      field: native_language
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    height: 100
    dual_audio: true
    primary_label: "Learner"
    secondary_label: "Native Reference"
    primary_color: "#6366F1"
    secondary_color: "#22C55E"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    loop: true
 
  # Overall ratings
  - annotation_type: likert
    name: overall
    description: "Overall pronunciation quality"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
    required: true
 
  - annotation_type: likert
    name: intelligibility
    description: "How understandable is the speaker?"
    size: 5
    min_label: "Unintelligible"
    max_label: "Perfectly clear"
    required: true
 
  - annotation_type: likert
    name: accent_strength
    description: "How strong is the foreign accent?"
    size: 5
    min_label: "No accent"
    max_label: "Very strong"
 
  # Specific aspects
  - annotation_type: likert
    name: intonation
    description: "Intonation and rhythm"
    size: 5
    min_label: "Unnatural"
    max_label: "Native-like"
 
  - annotation_type: likert
    name: stress
    description: "Word and sentence stress"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: fluency
    description: "Fluency (smoothness, pauses)"
    size: 5
    min_label: "Very disfluent"
    max_label: "Very fluent"
 
  # Error identification
  - annotation_type: multiselect
    name: error_types
    description: "What errors are present? (select all)"
    labels:
      - Vowel errors
      - Consonant errors
      - Word stress errors
      - Sentence intonation errors
      - Rhythm/timing errors
      - Mispronounced words
      - Added sounds
      - Deleted sounds
      - Substituted sounds
      - No significant errors
 
  # Specific problem words
  - annotation_type: text
    name: problem_words
    description: "List any mispronounced words"
    placeholder: "e.g., 'thought' pronounced as 'taught'"
    required: false
 
  # Comparison to native (if reference provided)
  - annotation_type: radio
    name: native_comparison
    description: "Compared to native reference"
    labels:
      - Very different
      - Somewhat different
      - Fairly similar
      - Very similar
      - Indistinguishable
    conditional:
      show_when_field: native_audio
      is_present: true
 
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Your confidence in this assessment"
    size: 5
    min_label: "Low"
    max_label: "High"

Word-Level Assessment

For detailed phoneme- and word-level feedback:

yaml
annotation_schemes:
  # Overall score
  - annotation_type: likert
    name: overall
    size: 5
 
  # Word-by-word rating
  - annotation_type: span_rating
    name: word_ratings
    source_field: sentence
    unit: word
    rating_scale:
      size: 3
      labels:
        - Incorrect
        - Acceptable
        - Correct
    allow_skip: true
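
Word-by-word ratings are easiest to use once they are folded back into a per-sentence summary. The sketch below assumes the ratings come back as one label per whitespace-separated word, with `None` for skipped words. That output shape is an assumption, not something this tutorial specifies, and the function name is hypothetical.

```python
def summarize_word_ratings(sentence, ratings):
    """Pair each word with its rating and summarize the rated words."""
    words = sentence.split()
    if len(words) != len(ratings):
        raise ValueError("expected one rating (or None for a skipped word) per word")
    # Drop skipped words so they do not count toward the share of correct words.
    rated = [(word, rating) for word, rating in zip(words, ratings) if rating is not None]
    problem_words = [word for word, rating in rated if rating == "Incorrect"]
    correct = sum(1 for _, rating in rated if rating == "Correct")
    share_correct = correct / len(rated) if rated else None
    return {"problem_words": problem_words, "share_correct": share_correct}


summary = summarize_word_ratings(
    "The weather is really nice today.",
    ["Correct", "Incorrect", "Correct", "Acceptable", None, "Correct"],
)
print(summary)
```

Keeping skipped words out of the denominator means `allow_skip: true` does not silently lower a learner's score.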

Phoneme-Level Annotation

For phonetics research:

yaml
data_files:
  - data/phoneme_data.json
 
item_properties:
  audio_path: audio
  phoneme_field: expected_phonemes
 
annotation_schemes:
  - annotation_type: phoneme_assessment
    name: phonemes
    source_field: expected_phonemes
    labels:
      - name: correct
        color: "#22C55E"
      - name: substitution
        color: "#F59E0B"
      - name: deletion
        color: "#EF4444"
      - name: insertion
        color: "#8B5CF6"
 
    attributes:
      - name: actual_phoneme
        type: text
        show_when: substitution
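
The four labels above correspond to the steps of a standard edit-distance alignment between the expected phonemes and what the learner actually produced. As an illustration only (this is not part of Potato), the sketch below assumes a transcription of the produced phonemes is available, for example from an annotator, and derives the labels from a Levenshtein alignment. The function name is hypothetical.

```python
def align_phonemes(expected, actual):
    """Align two phoneme lists and label each step.

    Returns (expected_phoneme, actual_phoneme, label) tuples, where label is
    one of: correct, substitution, deletion, insertion.
    """
    n, m = len(expected), len(actual)
    # cost[i][j] = minimum number of edits turning expected[:i] into actual[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if expected[i - 1] == actual[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + diff,  # match or substitution
                             cost[i - 1][j] + 1,         # deletion
                             cost[i][j - 1] + 1)         # insertion
    # Walk back from the end of both sequences to recover the alignment.
    steps = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if expected[i - 1] == actual[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                label = "correct" if diff == 0 else "substitution"
                steps.append((expected[i - 1], actual[j - 1], label))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            steps.append((expected[i - 1], None, "deletion"))
            i -= 1
        else:
            steps.append((None, actual[j - 1], "insertion"))
            j -= 1
    steps.reverse()
    return steps


# 'thought' produced as 'taught': the first phoneme is substituted.
for step in align_phonemes(["θ", "ɔ", "t"], ["t", "ɔ", "t"]):
    print(step)
```

For a substitution, the second element of the tuple is the value an annotator would enter in the `actual_phoneme` attribute.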

Complete Configuration

yaml
annotation_task_name: "L2 English Pronunciation Assessment"
 
data_files:
  - data/esl_recordings.json
 
item_properties:
  audio_path: audio_url
  text_field: target_sentence
  metadata_fields:
    - learner_id
    - native_language
    - proficiency_level
 
display:
  show_text: true
  text_field: target_sentence
  text_label: "The learner was asked to say:"
  text_style: quote
 
  show_metadata: true
  metadata_layout: inline
  metadata_fields:
    - label: "L1"
      field: native_language
    - label: "Level"
      field: proficiency_level
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#0EA5E9"
    progress_color: "#38BDF8"
    height: 120
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    default_speed: 1.0
    loop: true
    volume_control: true
 
  # Global scores
  - annotation_type: likert
    name: overall_score
    description: "Overall pronunciation score"
    size: 9
    min_label: "1 (Very poor)"
    max_label: "9 (Native-like)"
    required: true
 
  - annotation_type: likert
    name: comprehensibility
    description: "How easy is it to understand?"
    size: 9
    min_label: "1 (Very hard)"
    max_label: "9 (Very easy)"
    required: true
 
  - annotation_type: likert
    name: accentedness
    description: "How strong is the foreign accent?"
    size: 9
    min_label: "1 (No accent)"
    max_label: "9 (Very strong)"
 
  # Segmental features
  - annotation_type: likert
    name: vowels
    description: "Vowel pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  - annotation_type: likert
    name: consonants
    description: "Consonant pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  # Suprasegmental features
  - annotation_type: likert
    name: word_stress
    description: "Word stress patterns"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: sentence_intonation
    description: "Sentence intonation"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: rhythm
    description: "Speech rhythm and timing"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: fluency
    description: "Speaking fluency"
    size: 5
    min_label: "Disfluent"
    max_label: "Fluent"
 
  # Error analysis
  - annotation_type: multiselect
    name: common_errors
    description: "Check all that apply"
    labels:
      - Th-sounds (think/this)
      - R-sounds
      - L-sounds
      - Vowel length
      - Final consonants dropped
      - Consonant clusters simplified
      - Word stress on wrong syllable
      - Flat intonation
      - No errors detected
    required: false
 
  # Specific feedback
  - annotation_type: text
    name: feedback
    description: "Specific words or sounds that need work"
    textarea: true
    rows: 2
    placeholder: "e.g., 'world' - /r/ needs work, 'through' - th sound"
    required: false
 
  # Rater confidence
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your ratings?"
    labels:
      - Not confident
      - Somewhat confident
      - Very confident
 
annotation_guidelines:
  title: "Pronunciation Assessment Guidelines"
  content: |
    ## Rating Scales
 
    **Overall Score (1-9)**
    - 1-3: Significant difficulty, hard to understand
    - 4-6: Noticeable accent, but understandable
    - 7-9: Minor issues to native-like
 
    **Comprehensibility**
    - How much effort to understand?
    - Ignore accent if message is clear
 
    **Accentedness**
    - Degree of foreign accent
    - Independent of intelligibility
 
    ## What to Listen For
    - Individual sounds (phonemes)
    - Word stress patterns
    - Sentence intonation
    - Speaking rate and pauses
 
    ## Tips
    - Listen at least twice
    - Use slow playback for details
    - Be consistent across recordings
 
quality_control:
  training_phase:
    enabled: true
    items: 5
    feedback: true

Output Format

json
{
  "id": "rec_001",
  "audio_url": "/audio/learner_001_sent_05.wav",
  "target_sentence": "The weather is really nice today.",
  "learner_id": "L001",
  "native_language": "Mandarin",
  "annotations": {
    "overall_score": 6,
    "comprehensibility": 7,
    "accentedness": 5,
    "vowels": 4,
    "consonants": 3,
    "word_stress": 4,
    "sentence_intonation": 3,
    "rhythm": 3,
    "fluency": 4,
    "common_errors": ["Th-sounds (think/this)", "R-sounds"],
    "feedback": "'weather' - th sound; 'really' - r sound",
    "confidence": "Very confident"
  }
}
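
Records in this shape are straightforward to post-process. As a minimal sketch, the Python below parses the annotations from the example record and averages the segmental and suprasegmental scales, grouped the same way as the comments in the complete configuration. The grouping constants and the function name are illustrative choices, not part of the tool.

```python
import json
from statistics import mean

# Scale groupings follow the "Segmental" / "Suprasegmental" comments in the config.
SEGMENTAL = ("vowels", "consonants")
SUPRASEGMENTAL = ("word_stress", "sentence_intonation", "rhythm", "fluency")


def summarize_record(record):
    """Condense one annotated record into per-group mean scores."""
    ann = record["annotations"]
    return {
        "id": record["id"],
        "overall_score": ann["overall_score"],
        "segmental_mean": mean(ann[name] for name in SEGMENTAL),
        "suprasegmental_mean": mean(ann[name] for name in SUPRASEGMENTAL),
        "flagged_errors": [e for e in ann.get("common_errors", [])
                           if e != "No errors detected"],
    }


# Subset of the example record above (metadata fields omitted for brevity).
record = json.loads("""
{
  "id": "rec_001",
  "annotations": {
    "overall_score": 6, "comprehensibility": 7, "accentedness": 5,
    "vowels": 4, "consonants": 3,
    "word_stress": 4, "sentence_intonation": 3, "rhythm": 3, "fluency": 4,
    "common_errors": ["Th-sounds (think/this)", "R-sounds"],
    "confidence": "Very confident"
  }
}
""")

summary = summarize_record(record)
print(summary)
```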

Tips for Pronunciation Assessment

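  1. Calibration: rate sample recordings together before starting
  2. Consistency: use reference recordings as anchors
  3. Multiple listens: listen first for the overall impression, then for details
  4. Audio quality: good headphones are essential
  5. Rater training: a background in phonetics helps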
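
One way to tell whether calibration is working is to compare two raters' scores on the same sample recordings. The sketch below computes exact agreement and agreement within one scale point. The scores are made-up illustration data, and the one-point tolerance is an assumption rather than a recommendation from this tutorial.

```python
def agreement_rates(rater_a, rater_b, tolerance=1):
    """Return (exact, adjacent) agreement rates for two equal-length score lists."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of items")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical overall scores (1-9 scale) from two raters on five calibration items.
scores_a = [6, 7, 5, 3, 8]
scores_b = [6, 6, 5, 5, 8]

exact, adjacent = agreement_rates(scores_a, scores_b)
print(f"exact agreement: {exact:.0%}, within one point: {adjacent:.0%}")
```

Items where the raters differ by more than the tolerance are good candidates for discussion in the next calibration session.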

Next Steps


For the full audio documentation, see /docs/features/audio-annotation