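Pronunciation Assessment Annotation
Build a language-learning annotation task with audio playback and Likert scales for rating pronunciation quality.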
Potato Team
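Pronunciation assessment is essential for language-learning applications, speech therapy, and accent coaching. This tutorial covers building an interface for rating pronunciation quality, identifying specific errors, and providing detailed feedback.
Basic Pronunciation Rating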
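The configuration in this section reads `data/learner_recordings.json` and looks up the `audio_path` and `target_text` fields on each item. As a minimal sketch of preparing such a file, the Python below builds the item list from (audio path, target sentence) pairs. The `id` key, the JSON-array layout, and the sample paths are assumptions made for illustration; they are not specified in this tutorial.

```python
import json
from pathlib import Path


def build_items(recordings):
    """Turn (audio_path, target_text) pairs into item dicts.

    Only `audio_path` and `target_text` come from the config; the `id`
    key is an assumption added so each item can be told apart.
    """
    items = []
    for index, (audio_path, target_text) in enumerate(recordings, start=1):
        items.append(
            {
                "id": f"rec_{index:03d}",
                "audio_path": audio_path,
                "target_text": target_text,
            }
        )
    return items


def write_items(items, out_path):
    """Write the items as a JSON array (assumed layout), creating folders as needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(
        json.dumps(items, ensure_ascii=False, indent=2), encoding="utf-8"
    )


# Hypothetical sample recordings, for illustration only.
sample_recordings = [
    ("audio/learner_001_sent_01.wav", "The weather is really nice today."),
    ("audio/learner_001_sent_02.wav", "I thought about it for a while."),
]
items = build_items(sample_recordings)
```

Calling `write_items(items, "data/learner_recordings.json")` would then produce a file at the path the configuration expects.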
```yaml
annotation_task_name: "Pronunciation Assessment"

data_files:
  - data/learner_recordings.json

item_properties:
  audio_path: audio_path
  text_field: target_text

# Show what they should have said
display:
  show_text: true
  text_field: target_text
  text_label: "Target Sentence"
  text_style: large

annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#06B6D4"
    progress_color: "#22D3EE"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    name: overall_quality
    description: "Overall pronunciation quality"
    size: 5
    labels:
      - "1: Very poor"
      - "2: Poor"
      - "3: Fair"
      - "4: Good"
      - "5: Excellent"
```
Detailed Assessment Configuration
```yaml
annotation_task_name: "Detailed Pronunciation Assessment"

data_files:
  - data/recordings.json

item_properties:
  audio_path: learner_audio
  reference_audio_field: native_audio
  text_field: sentence
  metadata_fields: [learner_id, native_language]

display:
  show_text: true
  text_field: sentence
  text_label: "Target Text"
  show_metadata: true
  metadata_fields:
    - label: "Native Language"
      field: native_language

annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    height: 100
    dual_audio: true
    primary_label: "Learner"
    secondary_label: "Native Reference"
    primary_color: "#6366F1"
    secondary_color: "#22C55E"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    loop: true

  # Overall ratings
  - annotation_type: likert
    name: overall
    description: "Overall pronunciation quality"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
    required: true

  - annotation_type: likert
    name: intelligibility
    description: "How understandable is the speaker?"
    size: 5
    min_label: "Unintelligible"
    max_label: "Perfectly clear"
    required: true

  - annotation_type: likert
    name: accent_strength
    description: "How strong is the foreign accent?"
    size: 5
    min_label: "No accent"
    max_label: "Very strong"

  # Specific aspects
  - annotation_type: likert
    name: intonation
    description: "Intonation and rhythm"
    size: 5
    min_label: "Unnatural"
    max_label: "Native-like"

  - annotation_type: likert
    name: stress
    description: "Word and sentence stress"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"

  - annotation_type: likert
    name: fluency
    description: "Fluency (smoothness, pauses)"
    size: 5
    min_label: "Very disfluent"
    max_label: "Very fluent"

  # Error identification
  - annotation_type: multiselect
    name: error_types
    description: "What errors are present? (select all)"
    labels:
      - Vowel errors
      - Consonant errors
      - Word stress errors
      - Sentence intonation errors
      - Rhythm/timing errors
      - Mispronounced words
      - Added sounds
      - Deleted sounds
      - Substituted sounds
      - No significant errors

  # Specific problem words
  - annotation_type: text
    name: problem_words
    description: "List any mispronounced words"
    placeholder: "e.g., 'thought' pronounced as 'taught'"
    required: false

  # Comparison to native (if reference provided)
  - annotation_type: radio
    name: native_comparison
    description: "Compared to native reference"
    labels:
      - Very different
      - Somewhat different
      - Fairly similar
      - Very similar
      - Indistinguishable
    conditional:
      show_when_field: native_audio
      is_present: true

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Your confidence in this assessment"
    size: 5
    min_label: "Low"
    max_label: "High"
```
Word-Level Assessment
For detailed phoneme- or word-level feedback:
```yaml
annotation_schemes:
  # Overall score
  - annotation_type: likert
    name: overall
    size: 5

  # Word-by-word rating
  - annotation_type: span_rating
    name: word_ratings
    source_field: sentence
    unit: word
    rating_scale:
      size: 3
      labels:
        - Incorrect
        - Acceptable
        - Correct
    allow_skip: true
```
Phoneme-Level Annotation
For phonetics research:
```yaml
data_files:
  - data/phoneme_data.json

item_properties:
  audio_path: audio
  phoneme_field: expected_phonemes

annotation_schemes:
  - annotation_type: phoneme_assessment
    name: phonemes
    source_field: expected_phonemes
    labels:
      - name: correct
        color: "#22C55E"
      - name: substitution
        color: "#F59E0B"
      - name: deletion
        color: "#EF4444"
      - name: insertion
        color: "#8B5CF6"
    attributes:
      - name: actual_phoneme
        type: text
        show_when: substitution
```
Complete Configuration
```yaml
annotation_task_name: "L2 English Pronunciation Assessment"

data_files:
  - data/esl_recordings.json

item_properties:
  audio_path: audio_url
  text_field: target_sentence
  metadata_fields:
    - learner_id
    - native_language
    - proficiency_level

display:
  show_text: true
  text_field: target_sentence
  text_label: "The learner was asked to say:"
  text_style: quote
  show_metadata: true
  metadata_layout: inline
  metadata_fields:
    - label: "L1"
      field: native_language
    - label: "Level"
      field: proficiency_level

annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#0EA5E9"
    progress_color: "#38BDF8"
    height: 120
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    default_speed: 1.0
    loop: true
    volume_control: true

  # Global scores
  - annotation_type: likert
    name: overall_score
    description: "Overall pronunciation score"
    size: 9
    min_label: "1 (Very poor)"
    max_label: "9 (Native-like)"
    required: true

  - annotation_type: likert
    name: comprehensibility
    description: "How easy is it to understand?"
    size: 9
    min_label: "1 (Very hard)"
    max_label: "9 (Very easy)"
    required: true

  - annotation_type: likert
    name: accentedness
    description: "How strong is the foreign accent?"
    size: 9
    min_label: "1 (No accent)"
    max_label: "9 (Very strong)"

  # Segmental features
  - annotation_type: likert
    name: vowels
    description: "Vowel pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"

  - annotation_type: likert
    name: consonants
    description: "Consonant pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"

  # Suprasegmental features
  - annotation_type: likert
    name: word_stress
    description: "Word stress patterns"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"

  - annotation_type: likert
    name: sentence_intonation
    description: "Sentence intonation"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"

  - annotation_type: likert
    name: rhythm
    description: "Speech rhythm and timing"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"

  - annotation_type: likert
    name: fluency
    description: "Speaking fluency"
    size: 5
    min_label: "Disfluent"
    max_label: "Fluent"

  # Error analysis
  - annotation_type: multiselect
    name: common_errors
    description: "Check all that apply"
    labels:
      - Th-sounds (think/this)
      - R-sounds
      - L-sounds
      - Vowel length
      - Final consonants dropped
      - Consonant clusters simplified
      - Word stress on wrong syllable
      - Flat intonation
      - No errors detected
    required: false

  # Specific feedback
  - annotation_type: text
    name: feedback
    description: "Specific words or sounds that need work"
    textarea: true
    rows: 2
    placeholder: "e.g., 'world' - /r/ needs work, 'through' - th sound"
    required: false

  # Rater confidence
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your ratings?"
    labels:
      - Not confident
      - Somewhat confident
      - Very confident

annotation_guidelines:
  title: "Pronunciation Assessment Guidelines"
  content: |
    ## Rating Scales

    **Overall Score (1-9)**
    - 1-3: Significant difficulty, hard to understand
    - 4-6: Noticeable accent, but understandable
    - 7-9: Minor issues to native-like

    **Comprehensibility**
    - How much effort to understand?
    - Ignore accent if message is clear

    **Accentedness**
    - Degree of foreign accent
    - Independent of intelligibility

    ## What to Listen For
    - Individual sounds (phonemes)
    - Word stress patterns
    - Sentence intonation
    - Speaking rate and pauses

    ## Tips
    - Listen at least twice
    - Use slow playback for details
    - Be consistent across recordings

quality_control:
  training_phase:
    enabled: true
    items: 5
    feedback: true
```
Output Format
```json
{
  "id": "rec_001",
  "audio_url": "/audio/learner_001_sent_05.wav",
  "target_sentence": "The weather is really nice today.",
  "learner_id": "L001",
  "native_language": "Mandarin",
  "annotations": {
    "overall_score": 6,
    "comprehensibility": 7,
    "accentedness": 5,
    "vowels": 4,
    "consonants": 3,
    "word_stress": 4,
    "sentence_intonation": 3,
    "rhythm": 3,
    "fluency": 4,
    "common_errors": ["Th-sounds (think/this)", "R-sounds"],
    "feedback": "'weather' - th sound; 'really' - r sound",
    "confidence": "Very confident"
  }
}
```
Tips for Pronunciation Assessment
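- Calibration: rate a set of sample recordings together before working independently
- Consistency: use reference recordings as anchors for each point on the scale
- Repeated listening: listen once for the overall impression, then again for details
- Audio quality: good headphones are essential
- Rater training: a background in phonetics helps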
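Calibration and consistency are easier to monitor when ratings from several raters are compared item by item. The sketch below groups records shaped like the output format shown earlier by `id`, checks each score against the scale sizes from the complete configuration, and flags items where raters disagree widely. It assumes that each rater's record for an item has already been collected into one list, and `max_spread` is a hypothetical threshold; neither comes from this tutorial.

```python
from collections import defaultdict
from statistics import mean

# Likert `size` values copied from the complete configuration.
SCALE_SIZES = {
    "overall_score": 9,
    "comprehensibility": 9,
    "accentedness": 9,
    "vowels": 5,
    "consonants": 5,
    "word_stress": 5,
    "sentence_intonation": 5,
    "rhythm": 5,
    "fluency": 5,
}


def summarize(records, max_spread=2):
    """Report the mean and spread (max - min) of each score per item.

    An item is marked `needs_review` when raters differ by more than
    `max_spread` points on any scale (a hypothetical threshold).
    """
    by_item = defaultdict(list)
    for record in records:
        by_item[record["id"]].append(record["annotations"])

    summary = {}
    for item_id, annotations in by_item.items():
        scores = {}
        flagged = False
        for name, size in SCALE_SIZES.items():
            values = [a[name] for a in annotations if name in a]
            if not values:
                continue  # nobody rated this scale for this item
            if any(not 1 <= value <= size for value in values):
                raise ValueError(f"{item_id}: {name} outside 1-{size}")
            spread = max(values) - min(values)
            scores[name] = {"mean": mean(values), "spread": spread}
            flagged = flagged or spread > max_spread
        summary[item_id] = {"scores": scores, "needs_review": flagged}
    return summary


# Hypothetical ratings of the same recording by two raters.
records = [
    {"id": "rec_001", "annotations": {"overall_score": 6, "comprehensibility": 7, "vowels": 4}},
    {"id": "rec_001", "annotations": {"overall_score": 3, "comprehensibility": 6, "vowels": 4}},
]
summary = summarize(records)
```

Here the two raters differ by 3 points on `overall_score`, so `rec_001` is marked for review; items like this are good candidates for a joint calibration session.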
Next Steps
For the complete audio documentation, see /docs/features/audio-annotation.