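Pronunciation assessment comes up in language learning apps, speech therapy, and accent coaching. This tutorial walks through building an interface for rating pronunciation quality, flagging specific errors, and writing detailed feedback. For the underlying audio features, see the audio annotation documentation.

Basic Pronunciation Assessment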

yaml

annotation_task_name: "Pronunciation Assessment"
 
data_files:
  - data/learner_recordings.json
 
item_properties:
  audio_path: audio_path
  text_field: target_text
 
# Show what they should have said
display:
  show_text: true
  text_field: target_text
  text_label: "Target Sentence"
  text_style: large
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#06B6D4"
    progress_color: "#22D3EE"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    name: overall_quality
    description: "Overall pronunciation quality"
    size: 5
    labels:
      - "1: Very poor"
      - "2: Poor"
      - "3: Fair"
      - "4: Good"
      - "5: Excellent"
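
The config above maps `audio_path` and `target_text` from each item in `data/learner_recordings.json`, but the tutorial does not show the data file itself. As a rough sketch, assuming the file holds a list of JSON objects (the layout, the `id` values, and the paths below are all assumptions made for illustration), a pre-flight check for missing fields might look like this:

```python
import json

# Field names taken from the item_properties block above.
REQUIRED_FIELDS = ("audio_path", "target_text")

def find_invalid_records(records, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for each record with a missing or empty required field."""
    problems = []
    for index, record in enumerate(records):
        missing = [
            field for field in required
            if not str(record.get(field) or "").strip()
        ]
        if missing:
            problems.append((index, missing))
    return problems

# Hypothetical sample records; the real file layout may differ.
sample_records = [
    {"id": "rec_001", "audio_path": "audio/rec_001.wav",
     "target_text": "The weather is really nice today."},
    {"id": "rec_002", "audio_path": "audio/rec_002.wav", "target_text": ""},
]

print(json.dumps(sample_records[0], indent=2))
print(find_invalid_records(sample_records))  # [(1, ['target_text'])]
```

Running a check like this before annotation starts avoids showing raters an item with no target sentence to compare against.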

Detailed Assessment Configuration

yaml

annotation_task_name: "Detailed Pronunciation Assessment"
 
data_files:
  - data/recordings.json
 
item_properties:
  audio_path: learner_audio
  reference_audio_field: native_audio
  text_field: sentence
  metadata_fields: [learner_id, native_language]
 
display:
  show_text: true
  text_field: sentence
  text_label: "Target Text"
  show_metadata: true
  metadata_fields:
    - label: "Native Language"
      field: native_language
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    height: 100
    dual_audio: true
    primary_label: "Learner"
    secondary_label: "Native Reference"
    primary_color: "#6366F1"
    secondary_color: "#22C55E"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    loop: true
 
  # Overall ratings
  - annotation_type: likert
    name: overall
    description: "Overall pronunciation quality"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
    required: true
 
  - annotation_type: likert
    name: intelligibility
    description: "How understandable is the speaker?"
    size: 5
    min_label: "Unintelligible"
    max_label: "Perfectly clear"
    required: true
 
  - annotation_type: likert
    name: accent_strength
    description: "How strong is the foreign accent?"
    size: 5
    min_label: "No accent"
    max_label: "Very strong"
 
  # Specific aspects
  - annotation_type: likert
    name: intonation
    description: "Intonation and rhythm"
    size: 5
    min_label: "Unnatural"
    max_label: "Native-like"
 
  - annotation_type: likert
    name: stress
    description: "Word and sentence stress"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: fluency
    description: "Fluency (smoothness, pauses)"
    size: 5
    min_label: "Very disfluent"
    max_label: "Very fluent"
 
  # Error identification
  - annotation_type: multiselect
    name: error_types
    description: "What errors are present? (select all)"
    labels:
      - Vowel errors
      - Consonant errors
      - Word stress errors
      - Sentence intonation errors
      - Rhythm/timing errors
      - Mispronounced words
      - Added sounds
      - Deleted sounds
      - Substituted sounds
      - No significant errors
 
  # Specific problem words
  - annotation_type: text
    name: problem_words
    description: "List any mispronounced words"
    placeholder: "e.g., 'thought' pronounced as 'taught'"
    required: false
 
  # Comparison to native (if reference provided)
  - annotation_type: radio
    name: native_comparison
    description: "Compared to native reference"
    labels:
      - Very different
      - Somewhat different
      - Fairly similar
      - Very similar
      - Indistinguishable
    conditional:
      show_when_field: native_audio
      is_present: true
 
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Your confidence in this assessment"
    size: 5
    min_label: "Low"
    max_label: "High"

Word-Level Assessment

For detailed phoneme- or word-level feedback:

yaml

annotation_schemes:
  # Overall score
  - annotation_type: likert
    name: overall
    size: 5
 
  # Word-by-word rating
  - annotation_type: span_rating
    name: word_ratings
    source_field: sentence
    unit: word
    rating_scale:
      size: 3
      labels:
        - Incorrect
        - Acceptable
        - Correct
    allow_skip: true
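
To make the word-by-word scheme concrete, here is a small sketch of how one sentence breaks into rating units and how the resulting ratings could be summarized. The tokenization rule and the shape of the ratings list are assumptions; the tutorial does not specify how `unit: word` splits text or how `span_rating` output is stored.

```python
import re

# Labels copied from rating_scale.labels in the config above.
RATING_LABELS = ("Incorrect", "Acceptable", "Correct")

def split_into_words(sentence):
    """Split a sentence into word units, keeping internal apostrophes and hyphens."""
    return re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", sentence)

def summarize_word_ratings(sentence, ratings):
    """Pair each word with its rating; None marks a skipped word (allow_skip: true)."""
    words = split_into_words(sentence)
    if len(words) != len(ratings):
        raise ValueError(f"expected {len(words)} ratings, got {len(ratings)}")
    for rating in ratings:
        if rating is not None and rating not in RATING_LABELS:
            raise ValueError(f"unknown rating: {rating!r}")
    return {
        "pairs": list(zip(words, ratings)),
        "problem_words": [w for w, r in zip(words, ratings) if r == "Incorrect"],
        "skipped": sum(r is None for r in ratings),
    }

summary = summarize_word_ratings(
    "The weather is really nice today.",
    ["Correct", "Incorrect", None, "Acceptable", "Correct", "Correct"],
)
print(summary["problem_words"])  # ['weather']
print(summary["skipped"])        # 1
```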

Phoneme-Level Annotation

For phonetics research:

yaml

data_files:
  - data/phoneme_data.json
 
item_properties:
  audio_path: audio
  phoneme_field: expected_phonemes
 
annotation_schemes:
  - annotation_type: phoneme_assessment
    name: phonemes
    source_field: expected_phonemes
    labels:
      - name: correct
        color: "#22C55E"
      - name: substitution
        color: "#F59E0B"
      - name: deletion
        color: "#EF4444"
      - name: insertion
        color: "#8B5CF6"
 
    attributes:
      - name: actual_phoneme
        type: text
        show_when: substitution
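
The four labels above correspond to the edit operations between the expected phoneme sequence and what the learner actually produced. If a transcription of the produced phonemes is available (an assumption; this tutorial does not provide one), a sketch using Python's `difflib` can pre-label each position for an annotator to confirm. The ARPAbet-style symbols and the output dictionary shape are illustrative only, not the tool's format.

```python
from difflib import SequenceMatcher

def label_phonemes(expected, actual):
    """Align two phoneme lists and label each position with one of the four config labels."""
    labels = []
    matcher = SequenceMatcher(a=expected, b=actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        exp, act = expected[i1:i2], actual[j1:j2]
        if tag == "equal":
            labels += [{"expected": p, "label": "correct"} for p in exp]
            continue
        # For replace/delete/insert: pair phonemes one-to-one as substitutions,
        # then treat any leftovers as deletions (expected only) or insertions (actual only).
        paired = min(len(exp), len(act))
        for e, a in zip(exp[:paired], act[:paired]):
            labels.append({"expected": e, "label": "substitution", "actual_phoneme": a})
        labels += [{"expected": e, "label": "deletion"} for e in exp[paired:]]
        labels += [{"expected": None, "label": "insertion", "actual_phoneme": a}
                   for a in act[paired:]]
    return labels

# 'thought' pronounced as 'taught' (hypothetical transcriptions).
result = label_phonemes(["TH", "AO", "T"], ["T", "AO", "T"])
print([item["label"] for item in result])  # ['substitution', 'correct', 'correct']
print(result[0]["actual_phoneme"])         # T
```

The alignment is only a heuristic starting point. A human rater still decides whether each proposed substitution, deletion, or insertion is real.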

Full Configuration

yaml

annotation_task_name: "L2 English Pronunciation Assessment"
 
data_files:
  - data/esl_recordings.json
 
item_properties:
  audio_path: audio_url
  text_field: target_sentence
  metadata_fields:
    - learner_id
    - native_language
    - proficiency_level
 
display:
  show_text: true
  text_field: target_sentence
  text_label: "The learner was asked to say:"
  text_style: quote
 
  show_metadata: true
  metadata_layout: inline
  metadata_fields:
    - label: "L1"
      field: native_language
    - label: "Level"
      field: proficiency_level
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#0EA5E9"
    progress_color: "#38BDF8"
    height: 120
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    default_speed: 1.0
    loop: true
    volume_control: true
 
  # Global scores
  - annotation_type: likert
    name: overall_score
    description: "Overall pronunciation score"
    size: 9
    min_label: "1 (Very poor)"
    max_label: "9 (Native-like)"
    required: true
 
  - annotation_type: likert
    name: comprehensibility
    description: "How easy is it to understand?"
    size: 9
    min_label: "1 (Very hard)"
    max_label: "9 (Very easy)"
    required: true
 
  - annotation_type: likert
    name: accentedness
    description: "How strong is the foreign accent?"
    size: 9
    min_label: "1 (No accent)"
    max_label: "9 (Very strong)"
 
  # Segmental features
  - annotation_type: likert
    name: vowels
    description: "Vowel pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  - annotation_type: likert
    name: consonants
    description: "Consonant pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  # Suprasegmental features
  - annotation_type: likert
    name: word_stress
    description: "Word stress patterns"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: sentence_intonation
    description: "Sentence intonation"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: rhythm
    description: "Speech rhythm and timing"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: fluency
    description: "Speaking fluency"
    size: 5
    min_label: "Disfluent"
    max_label: "Fluent"
 
  # Error analysis
  - annotation_type: multiselect
    name: common_errors
    description: "Check all that apply"
    labels:
      - Th-sounds (think/this)
      - R-sounds
      - L-sounds
      - Vowel length
      - Final consonants dropped
      - Consonant clusters simplified
      - Word stress on wrong syllable
      - Flat intonation
      - No errors detected
    required: false
 
  # Specific feedback
  - annotation_type: text
    name: feedback
    description: "Specific words or sounds that need work"
    multiline: true
    rows: 2
    placeholder: "e.g., 'world' - /r/ needs work, 'through' - th sound"
    required: false
 
  # Rater confidence
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your ratings?"
    labels:
      - Not confident
      - Somewhat confident
      - Very confident
 
annotation_guidelines:
  title: "Pronunciation Assessment Guidelines"
  content: |
    ## Rating Scales
 
    **Overall Score (1-9)**
    - 1-3: Significant difficulty, hard to understand
    - 4-6: Noticeable accent, but understandable
    - 7-9: Minor issues to native-like
 
    **Comprehensibility**
    - How much effort to understand?
    - Ignore accent if message is clear
 
    **Accentedness**
    - Degree of foreign accent
    - Independent of intelligibility
 
    ## What to Listen For
    - Individual sounds (phonemes)
    - Word stress patterns
    - Sentence intonation
    - Speaking rate and pauses
 
    ## Tips
    - Listen at least twice
    - Use slow playback for details
    - Be consistent across recordings
 
quality_control:
  training_phase:
    enabled: true
    items: 5
    feedback: true
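
When reporting results, the three bands from the guidelines above can be applied to the 9-point overall score. The helper below is a hypothetical convenience for downstream analysis, not part of the annotation tool:

```python
def score_band(score):
    """Map a 1-9 overall score to the band wording used in the rating guidelines."""
    if not isinstance(score, int) or not 1 <= score <= 9:
        raise ValueError(f"overall score must be an integer from 1 to 9, got {score!r}")
    if score <= 3:
        return "Significant difficulty, hard to understand"
    if score <= 6:
        return "Noticeable accent, but understandable"
    return "Minor issues to native-like"

print(score_band(6))  # Noticeable accent, but understandable
```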

Output Format

json

{
  "id": "rec_001",
  "audio_url": "/audio/learner_001_sent_05.wav",
  "target_sentence": "The weather is really nice today.",
  "learner_id": "L001",
  "native_language": "Mandarin",
  "annotations": {
    "overall_score": 6,
    "comprehensibility": 7,
    "accentedness": 5,
    "vowels": 4,
    "consonants": 3,
    "word_stress": 4,
    "sentence_intonation": 3,
    "rhythm": 3,
    "fluency": 4,
    "common_errors": ["Th-sounds (think/this)", "R-sounds"],
    "feedback": "'weather' - th sound; 'really' - r sound",
    "confidence": "Very confident"
  }
}
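
Once annotations are exported, a record like the one above can be reduced to a short learner profile. This sketch assumes each record has already been parsed into a dictionary with the same keys as the sample. The grouping into segmental and suprasegmental scores follows the comments in the config, and the `weak_threshold` cutoff is an arbitrary illustrative choice.

```python
from statistics import mean

SEGMENTAL = ("vowels", "consonants")
SUPRASEGMENTAL = ("word_stress", "sentence_intonation", "rhythm", "fluency")

def summarize(record, weak_threshold=3):
    """Average the 5-point scales by group and list dimensions at or below the threshold."""
    ann = record["annotations"]
    return {
        "id": record["id"],
        "segmental_mean": mean(ann[key] for key in SEGMENTAL),
        "suprasegmental_mean": mean(ann[key] for key in SUPRASEGMENTAL),
        "weak_areas": [key for key in SEGMENTAL + SUPRASEGMENTAL
                       if ann[key] <= weak_threshold],
    }

# Values copied from the sample output record above.
record = {
    "id": "rec_001",
    "annotations": {
        "overall_score": 6, "comprehensibility": 7, "accentedness": 5,
        "vowels": 4, "consonants": 3, "word_stress": 4,
        "sentence_intonation": 3, "rhythm": 3, "fluency": 4,
    },
}

profile = summarize(record)
print(profile["segmental_mean"])  # 3.5
print(profile["weak_areas"])      # ['consonants', 'sentence_intonation', 'rhythm']
```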

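Tips for Pronunciation Assessment

Before you start, rate a few sample recordings together as a group so that everyone is calibrated to the same standard. Keep reference recordings on hand to anchor your ratings. Listen at least twice: once for the overall impression and again for the details. Use good headphones, because phone speakers put you at a disadvantage. Raters with a phonetics background tend to pick this up faster.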
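
One way to check whether a calibration round worked is to compare two raters' scores on the same recordings. The sketch below computes exact agreement, agreement within one scale point, and mean absolute difference. The rater scores are invented for illustration, and these simple statistics stand in for a formal reliability measure.

```python
def agreement(ratings_a, ratings_b, tolerance=1):
    """Compare two raters' scores on the same items, given in the same order."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("both raters need the same non-empty set of items")
    diffs = [abs(a - b) for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_tolerance": sum(d <= tolerance for d in diffs) / n,
        "mean_abs_diff": sum(diffs) / n,
    }

# Hypothetical 9-point overall scores from two raters on five shared recordings.
rater_a = [6, 7, 4, 8, 5]
rater_b = [6, 5, 4, 7, 5]

stats = agreement(rater_a, rater_b)
print(stats)  # {'exact': 0.6, 'within_tolerance': 0.8, 'mean_abs_diff': 0.6}
```

If exact agreement is low but within-tolerance agreement is high, the raters broadly share a standard and differ only in how they place the scale boundaries. That gap is usually easier to close by discussing the reference recordings.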

Next Steps

Add transcription of what was actually said
Set up inter-rater reliability checks
Learn about crowdsourcing pronunciation assessment

The full audio documentation is at /docs/features/audio-annotation.