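Pronunciation Assessment Annotation
Build a pronunciation-quality annotation task in Potato with audio playback, waveform visualization, Likert rating scales, and a free-text feedback field for each recording.
Potato Team
Pronunciation assessment comes up in language-learning apps, speech therapy, and accent coaching. This tutorial walks through building an interface for rating pronunciation quality, flagging specific errors, and writing detailed feedback. For the underlying audio features, see the audio annotation documentation.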
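Each configuration in this tutorial reads its items from a JSON data file. As a rough sketch of preparing one for the basic task, the Python below builds records with the `audio_path` and `target_text` fields named in the basic config's `item_properties`. The helper names (`build_records`, `write_json_lines`) are hypothetical, and the one-object-per-line layout is an assumption; check the data format your Potato version expects before relying on it.

```python
import json
from pathlib import Path


def build_records(items):
    """Turn (recording id, audio path, target sentence) triples into item dicts."""
    records = []
    for rec_id, audio_path, target_text in items:
        # An empty target sentence would leave raters with nothing to compare against.
        if not target_text.strip():
            raise ValueError(f"{rec_id}: target text is empty")
        records.append(
            {"id": rec_id, "audio_path": audio_path, "target_text": target_text}
        )
    return records


def write_json_lines(records, out_path):
    """Write one JSON object per line (assumed layout; adjust if yours differs)."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps non-English target sentences readable.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling `write_json_lines(build_records(items), "data/learner_recordings.json")` would produce the file that the basic configuration points at.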
Basic Pronunciation Assessment
```yaml
annotation_task_name: "Pronunciation Assessment"
data_files:
  - data/learner_recordings.json
item_properties:
  audio_path: audio_path
  text_field: target_text
# Show what they should have said
display:
  show_text: true
  text_field: target_text
  text_label: "Target Sentence"
  text_style: large
annotation_schemes:
  # Audio player with waveform
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#06B6D4"
    progress_color: "#22D3EE"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
  # Overall rating
  - annotation_type: likert
    name: overall_quality
    description: "Overall pronunciation quality"
    size: 5
    labels:
      - "1: Very poor"
      - "2: Poor"
      - "3: Fair"
      - "4: Good"
      - "5: Excellent"
```
Detailed Assessment Configuration
```yaml
annotation_task_name: "Detailed Pronunciation Assessment"
data_files:
  - data/recordings.json
item_properties:
  audio_path: learner_audio
  reference_audio_field: native_audio
  text_field: sentence
  metadata_fields: [learner_id, native_language]
display:
  show_text: true
  text_field: sentence
  text_label: "Target Text"
  show_metadata: true
  metadata_fields:
    - label: "Native Language"
      field: native_language
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    height: 100
    dual_audio: true
    primary_label: "Learner"
    secondary_label: "Native Reference"
    primary_color: "#6366F1"
    secondary_color: "#22C55E"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    loop: true
  # Overall ratings
  - annotation_type: likert
    name: overall
    description: "Overall pronunciation quality"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
    required: true
  - annotation_type: likert
    name: intelligibility
    description: "How understandable is the speaker?"
    size: 5
    min_label: "Unintelligible"
    max_label: "Perfectly clear"
    required: true
  - annotation_type: likert
    name: accent_strength
    description: "How strong is the foreign accent?"
    size: 5
    min_label: "No accent"
    max_label: "Very strong"
  # Specific aspects
  - annotation_type: likert
    name: intonation
    description: "Intonation and rhythm"
    size: 5
    min_label: "Unnatural"
    max_label: "Native-like"
  - annotation_type: likert
    name: stress
    description: "Word and sentence stress"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
  - annotation_type: likert
    name: fluency
    description: "Fluency (smoothness, pauses)"
    size: 5
    min_label: "Very disfluent"
    max_label: "Very fluent"
  # Error identification
  - annotation_type: multiselect
    name: error_types
    description: "What errors are present? (select all)"
    labels:
      - Vowel errors
      - Consonant errors
      - Word stress errors
      - Sentence intonation errors
      - Rhythm/timing errors
      - Mispronounced words
      - Added sounds
      - Deleted sounds
      - Substituted sounds
      - No significant errors
  # Specific problem words
  - annotation_type: text
    name: problem_words
    description: "List any mispronounced words"
    placeholder: "e.g., 'thought' pronounced as 'taught'"
    required: false
  # Comparison to native (if reference provided)
  - annotation_type: radio
    name: native_comparison
    description: "Compared to native reference"
    labels:
      - Very different
      - Somewhat different
      - Fairly similar
      - Very similar
      - Indistinguishable
    conditional:
      show_when_field: native_audio
      is_present: true
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Your confidence in this assessment"
    size: 5
    min_label: "Low"
    max_label: "High"
```
Word-Level Assessment
For detailed word-by-word feedback:
```yaml
annotation_schemes:
  # Overall score
  - annotation_type: likert
    name: overall
    size: 5
  # Word-by-word rating
  - annotation_type: span_rating
    name: word_ratings
    source_field: sentence
    unit: word
    rating_scale:
      size: 3
      labels:
        - Incorrect
        - Acceptable
        - Correct
    allow_skip: true
```
Phoneme-Level Annotation
For phonetics research:
```yaml
data_files:
  - data/phoneme_data.json
item_properties:
  audio_path: audio
  phoneme_field: expected_phonemes
annotation_schemes:
  - annotation_type: phoneme_assessment
    name: phonemes
    source_field: expected_phonemes
    labels:
      - name: correct
        color: "#22C55E"
      - name: substitution
        color: "#F59E0B"
      - name: deletion
        color: "#EF4444"
      - name: insertion
        color: "#8B5CF6"
    attributes:
      - name: actual_phoneme
        type: text
        show_when: substitution
```
Complete Configuration
```yaml
annotation_task_name: "L2 English Pronunciation Assessment"
data_files:
  - data/esl_recordings.json
item_properties:
  audio_path: audio_url
  text_field: target_sentence
  metadata_fields:
    - learner_id
    - native_language
    - proficiency_level
display:
  show_text: true
  text_field: target_sentence
  text_label: "The learner was asked to say:"
  text_style: quote
  show_metadata: true
  metadata_layout: inline
  metadata_fields:
    - label: "L1"
      field: native_language
    - label: "Level"
      field: proficiency_level
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#0EA5E9"
    progress_color: "#38BDF8"
    height: 120
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    default_speed: 1.0
    loop: true
    volume_control: true
  # Global scores
  - annotation_type: likert
    name: overall_score
    description: "Overall pronunciation score"
    size: 9
    min_label: "1 (Very poor)"
    max_label: "9 (Native-like)"
    required: true
  - annotation_type: likert
    name: comprehensibility
    description: "How easy is it to understand?"
    size: 9
    min_label: "1 (Very hard)"
    max_label: "9 (Very easy)"
    required: true
  - annotation_type: likert
    name: accentedness
    description: "How strong is the foreign accent?"
    size: 9
    min_label: "1 (No accent)"
    max_label: "9 (Very strong)"
  # Segmental features
  - annotation_type: likert
    name: vowels
    description: "Vowel pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
  - annotation_type: likert
    name: consonants
    description: "Consonant pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
  # Suprasegmental features
  - annotation_type: likert
    name: word_stress
    description: "Word stress patterns"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
  - annotation_type: likert
    name: sentence_intonation
    description: "Sentence intonation"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
  - annotation_type: likert
    name: rhythm
    description: "Speech rhythm and timing"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
  - annotation_type: likert
    name: fluency
    description: "Speaking fluency"
    size: 5
    min_label: "Disfluent"
    max_label: "Fluent"
  # Error analysis
  - annotation_type: multiselect
    name: common_errors
    description: "Check all that apply"
    labels:
      - Th-sounds (think/this)
      - R-sounds
      - L-sounds
      - Vowel length
      - Final consonants dropped
      - Consonant clusters simplified
      - Word stress on wrong syllable
      - Flat intonation
      - No errors detected
    required: false
  # Specific feedback
  - annotation_type: text
    name: feedback
    description: "Specific words or sounds that need work"
    multiline: true
    rows: 2
    placeholder: "e.g., 'world' - /r/ needs work, 'through' - th sound"
    required: false
  # Rater confidence
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your ratings?"
    labels:
      - Not confident
      - Somewhat confident
      - Very confident
annotation_guidelines:
  title: "Pronunciation Assessment Guidelines"
  content: |
    ## Rating Scales

    **Overall Score (1-9)**
    - 1-3: Significant difficulty, hard to understand
    - 4-6: Noticeable accent, but understandable
    - 7-9: Minor issues to native-like

    **Comprehensibility**
    - How much effort to understand?
    - Ignore accent if message is clear

    **Accentedness**
    - Degree of foreign accent
    - Independent of intelligibility

    ## What to Listen For
    - Individual sounds (phonemes)
    - Word stress patterns
    - Sentence intonation
    - Speaking rate and pauses

    ## Tips
    - Listen at least twice
    - Use slow playback for details
    - Be consistent across recordings
quality_control:
  training_phase:
    enabled: true
    items: 5
    feedback: true
```
Output Format
```json
{
  "id": "rec_001",
  "audio_url": "/audio/learner_001_sent_05.wav",
  "target_sentence": "The weather is really nice today.",
  "learner_id": "L001",
  "native_language": "Mandarin",
  "annotations": {
    "overall_score": 6,
    "comprehensibility": 7,
    "accentedness": 5,
    "vowels": 4,
    "consonants": 3,
    "word_stress": 4,
    "sentence_intonation": 3,
    "rhythm": 3,
    "fluency": 4,
    "common_errors": ["Th-sounds (think/this)", "R-sounds"],
    "feedback": "'weather' - th sound; 'really' - r sound",
    "confidence": "Very confident"
  }
}
```
Tips for Pronunciation Assessment
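Before you start, rate a few sample recordings together as a group so that everyone is calibrated to the same standard. Keep reference recordings close at hand to anchor your ratings. Listen at least twice: once for the overall impression and once more for the details. Use good headphones, because phone speakers lose detail. Raters with a background in phonetics tend to pick this up faster.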
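One way to check whether a group calibration round worked is to look at how far raters diverge on the same recording. The sketch below assumes the exported results are a list of records shaped like the output sample in this tutorial, with one record per recording and rater; that layout, the function names, and the two-point threshold are illustrative assumptions, not part of Potato.

```python
from collections import defaultdict
from statistics import mean


def score_spread(records, field="overall_score"):
    """Group ratings by recording id and return {id: (mean, max - min)}."""
    by_item = defaultdict(list)
    for rec in records:
        by_item[rec["id"]].append(rec["annotations"][field])
    return {
        item_id: (mean(scores), max(scores) - min(scores))
        for item_id, scores in by_item.items()
    }


def needs_discussion(records, field="overall_score", max_spread=2):
    """Return ids whose raters disagree by more than max_spread scale points."""
    spreads = score_spread(records, field)
    return sorted(i for i, (_, spread) in spreads.items() if spread > max_spread)


# Hypothetical calibration round: three raters on rec_001, two on rec_002.
calibration = [
    {"id": "rec_001", "annotations": {"overall_score": 6}},
    {"id": "rec_001", "annotations": {"overall_score": 7}},
    {"id": "rec_001", "annotations": {"overall_score": 3}},
    {"id": "rec_002", "annotations": {"overall_score": 5}},
    {"id": "rec_002", "annotations": {"overall_score": 6}},
]
flagged = needs_discussion(calibration)
```

Here `flagged` contains only `rec_001`, where the ratings range from 3 to 7. Recordings flagged this way are good candidates to replay and discuss as a group before the main annotation round.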
Next Steps
The full audio documentation is at /docs/features/audio-annotation.