Tutorials · 5 min read

Pronunciation Assessment Annotation

Build a language-learning annotation task that rates pronunciation quality using audio playback and Likert scales.

Potato Team

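Pronunciation assessment matters for language-learning apps, speech therapy, and accent coaching. This tutorial shows how to build interfaces for rating pronunciation quality, identifying specific errors, and giving detailed feedback.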

Basic Pronunciation Assessment

yaml
annotation_task_name: "Pronunciation Assessment"
 
data_files:
  - data/learner_recordings.json
 
item_properties:
  audio_path: audio_path
  text_field: target_text
 
# Show what they should have said
display:
  show_text: true
  text_field: target_text
  text_label: "Target Sentence"
  text_style: large
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#06B6D4"
    progress_color: "#22D3EE"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    name: overall_quality
    description: "Overall pronunciation quality"
    size: 5
    labels:
      - "1: Very poor"
      - "2: Poor"
      - "3: Fair"
      - "4: Good"
      - "5: Excellent"
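
The configuration above reads its items from data/learner_recordings.json and looks up the audio_path and target_text keys on each item. As a minimal sketch, the Python below builds such a file. The one-JSON-object-per-line layout, the id key, and the example file names are assumptions made for illustration, so check the data format your Potato version expects before relying on it.

```python
import json
from pathlib import Path


def build_items(recordings):
    """Turn (audio file, target sentence) pairs into annotation items."""
    items = []
    for index, (audio_file, sentence) in enumerate(recordings, start=1):
        items.append({
            "id": f"rec_{index:03d}",   # assumed unique item key
            "audio_path": audio_file,   # key named in item_properties.audio_path
            "target_text": sentence,    # key named in item_properties.text_field
        })
    return items


def write_items(items, path):
    """Write one JSON object per line (assumed layout, not confirmed by the docs)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Hypothetical recordings; replace with your own files and sentences.
    demo = [
        ("audio/learner_001.wav", "The weather is really nice today."),
        ("audio/learner_002.wav", "I thought about it for a while."),
    ]
    write_items(build_items(demo), "data/learner_recordings.json")
```

Keeping the key names identical to the ones listed under item_properties is the point of the sketch; everything else can be adapted freely.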

Detailed Assessment Configuration

yaml
annotation_task_name: "Detailed Pronunciation Assessment"
 
data_files:
  - data/recordings.json
 
item_properties:
  audio_path: learner_audio
  reference_audio_field: native_audio
  text_field: sentence
  metadata_fields: [learner_id, native_language]
 
display:
  show_text: true
  text_field: sentence
  text_label: "Target Text"
  show_metadata: true
  metadata_fields:
    - label: "Native Language"
      field: native_language
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    height: 100
    dual_audio: true
    primary_label: "Learner"
    secondary_label: "Native Reference"
    primary_color: "#6366F1"
    secondary_color: "#22C55E"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    loop: true
 
  # Overall ratings
  - annotation_type: likert
    name: overall
    description: "Overall pronunciation quality"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
    required: true
 
  - annotation_type: likert
    name: intelligibility
    description: "How understandable is the speaker?"
    size: 5
    min_label: "Unintelligible"
    max_label: "Perfectly clear"
    required: true
 
  - annotation_type: likert
    name: accent_strength
    description: "How strong is the foreign accent?"
    size: 5
    min_label: "No accent"
    max_label: "Very strong"
 
  # Specific aspects
  - annotation_type: likert
    name: intonation
    description: "Intonation and rhythm"
    size: 5
    min_label: "Unnatural"
    max_label: "Native-like"
 
  - annotation_type: likert
    name: stress
    description: "Word and sentence stress"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: fluency
    description: "Fluency (smoothness, pauses)"
    size: 5
    min_label: "Very disfluent"
    max_label: "Very fluent"
 
  # Error identification
  - annotation_type: multiselect
    name: error_types
    description: "What errors are present? (select all)"
    labels:
      - Vowel errors
      - Consonant errors
      - Word stress errors
      - Sentence intonation errors
      - Rhythm/timing errors
      - Mispronounced words
      - Added sounds
      - Deleted sounds
      - Substituted sounds
      - No significant errors
 
  # Specific problem words
  - annotation_type: text
    name: problem_words
    description: "List any mispronounced words"
    placeholder: "e.g., 'thought' pronounced as 'taught'"
    required: false
 
  # Comparison to native (if reference provided)
  - annotation_type: radio
    name: native_comparison
    description: "Compared to native reference"
    labels:
      - Very different
      - Somewhat different
      - Fairly similar
      - Very similar
      - Indistinguishable
    conditional:
      show_when_field: native_audio
      is_present: true
 
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Your confidence in this assessment"
    size: 5
    min_label: "Low"
    max_label: "High"

Word-Level Assessment

For detailed phoneme- and word-level feedback:

yaml
annotation_schemes:
  # Overall score
  - annotation_type: likert
    name: overall
    size: 5
 
  # Word-by-word rating
  - annotation_type: span_rating
    name: word_ratings
    source_field: sentence
    unit: word
    rating_scale:
      size: 3
      labels:
        - Incorrect
        - Acceptable
        - Correct
    allow_skip: true
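
Word-by-word ratings are easiest to use once they are summarized per sentence. The sketch below assumes the ratings arrive as a list aligned with the whitespace-separated words of the sentence, with None for skipped words; that export shape is an assumption, not something this tutorial specifies.

```python
def summarize_word_ratings(sentence, ratings):
    """Return (share of rated words marked Correct, list of Incorrect words)."""
    words = sentence.split()
    if len(words) != len(ratings):
        raise ValueError("expected exactly one rating (or None) per word")

    # Skipped words (None) are left out of the denominator.
    rated = [(word, rating) for word, rating in zip(words, ratings)
             if rating is not None]
    problem_words = [word for word, rating in rated if rating == "Incorrect"]
    correct = sum(1 for _, rating in rated if rating == "Correct")
    share_correct = correct / len(rated) if rated else None
    return share_correct, problem_words


if __name__ == "__main__":
    share, problems = summarize_word_ratings(
        "The weather is really nice today.",
        ["Correct", "Incorrect", "Correct", "Acceptable", None, "Correct"],
    )
    print(share, problems)
```

Treating "Acceptable" as not-correct keeps the share conservative; count it as correct instead if your rubric is more lenient.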

Phoneme-Level Annotation

For phonetics research:

yaml
data_files:
  - data/phoneme_data.json
 
item_properties:
  audio_path: audio
  phoneme_field: expected_phonemes
 
annotation_schemes:
  - annotation_type: phoneme_assessment
    name: phonemes
    source_field: expected_phonemes
    labels:
      - name: correct
        color: "#22C55E"
      - name: substitution
        color: "#F59E0B"
      - name: deletion
        color: "#EF4444"
      - name: insertion
        color: "#8B5CF6"
 
    attributes:
      - name: actual_phoneme
        type: text
        show_when: substitution
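
The four labels above (correct, substitution, deletion, insertion) are the ingredients of a phoneme error rate: errors divided by the number of expected phonemes. A minimal sketch follows, assuming the annotations for one recording can be flattened into a list of label strings; how the labels are actually exported is not covered here.

```python
from collections import Counter

LABELS = ("correct", "substitution", "deletion", "insertion")


def phoneme_error_rate(labels):
    """Compute (substitutions + deletions + insertions) / expected phonemes."""
    counts = Counter(labels)
    unknown = sorted(set(counts) - set(LABELS))
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")

    # Insertions are extra sounds, so they do not count as expected phonemes.
    expected = counts["correct"] + counts["substitution"] + counts["deletion"]
    if expected == 0:
        raise ValueError("no expected phonemes in this recording")
    errors = counts["substitution"] + counts["deletion"] + counts["insertion"]
    return errors / expected


if __name__ == "__main__":
    demo = ["correct"] * 6 + ["substitution", "deletion", "insertion"]
    print(phoneme_error_rate(demo))
```

Because insertions add errors without adding expected phonemes, the rate can exceed 1.0 for very inaccurate recordings.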

Complete Configuration

yaml
annotation_task_name: "L2 English Pronunciation Assessment"
 
data_files:
  - data/esl_recordings.json
 
item_properties:
  audio_path: audio_url
  text_field: target_sentence
  metadata_fields:
    - learner_id
    - native_language
    - proficiency_level
 
display:
  show_text: true
  text_field: target_sentence
  text_label: "The learner was asked to say:"
  text_style: quote
 
  show_metadata: true
  metadata_layout: inline
  metadata_fields:
    - label: "L1"
      field: native_language
    - label: "Level"
      field: proficiency_level
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#0EA5E9"
    progress_color: "#38BDF8"
    height: 120
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    default_speed: 1.0
    loop: true
    volume_control: true
 
  # Global scores
  - annotation_type: likert
    name: overall_score
    description: "Overall pronunciation score"
    size: 9
    min_label: "1 (Very poor)"
    max_label: "9 (Native-like)"
    required: true
 
  - annotation_type: likert
    name: comprehensibility
    description: "How easy is it to understand?"
    size: 9
    min_label: "1 (Very hard)"
    max_label: "9 (Very easy)"
    required: true
 
  - annotation_type: likert
    name: accentedness
    description: "How strong is the foreign accent?"
    size: 9
    min_label: "1 (No accent)"
    max_label: "9 (Very strong)"
 
  # Segmental features
  - annotation_type: likert
    name: vowels
    description: "Vowel pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  - annotation_type: likert
    name: consonants
    description: "Consonant pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  # Suprasegmental features
  - annotation_type: likert
    name: word_stress
    description: "Word stress patterns"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: sentence_intonation
    description: "Sentence intonation"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: rhythm
    description: "Speech rhythm and timing"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: fluency
    description: "Speaking fluency"
    size: 5
    min_label: "Disfluent"
    max_label: "Fluent"
 
  # Error analysis
  - annotation_type: multiselect
    name: common_errors
    description: "Check all that apply"
    labels:
      - Th-sounds (think/this)
      - R-sounds
      - L-sounds
      - Vowel length
      - Final consonants dropped
      - Consonant clusters simplified
      - Word stress on wrong syllable
      - Flat intonation
      - No errors detected
    required: false
 
  # Specific feedback
  - annotation_type: text
    name: feedback
    description: "Specific words or sounds that need work"
    textarea: true
    rows: 2
    placeholder: "e.g., 'world' - /r/ needs work, 'through' - th sound"
    required: false
 
  # Rater confidence
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your ratings?"
    labels:
      - Not confident
      - Somewhat confident
      - Very confident
 
annotation_guidelines:
  title: "Pronunciation Assessment Guidelines"
  content: |
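    ## Rating Scales
 
    **Overall score (1-9)**
    - 1-3: Significant difficulty, hard to understand
    - 4-6: Noticeable accent but understandable
    - 7-9: Minor issues to native-like
 
    **Comprehensibility**
    - How much effort does it take to understand the speaker?
    - Disregard the accent if the message is clear
 
    **Accentedness**
    - Degree of foreign accent
    - Independent of intelligibility
 
    ## What to Listen For
    - Individual sounds (phonemes)
    - Word stress patterns
    - Sentence intonation
    - Speech rate and pauses
 
    ## Tips
    - Listen at least twice
    - Use slow playback for details
    - Stay consistent across recordings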
 
quality_control:
  training_phase:
    enabled: true
    items: 5
    feedback: true
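
The guidelines in this configuration group the 9-point overall score into three bands (1-3, 4-6, 7-9). When reporting results it can help to map scores back to those bands. The helper below is a hypothetical sketch; the English band descriptions paraphrase the guideline text.

```python
# Score ranges and descriptions follow the annotation guidelines.
BANDS = (
    (range(1, 4), "Significant difficulty, hard to understand"),
    (range(4, 7), "Noticeable accent but understandable"),
    (range(7, 10), "Minor issues to native-like"),
)


def score_band(score):
    """Return the guideline band description for an overall score of 1-9."""
    for scores, description in BANDS:
        if score in scores:
            return description
    raise ValueError(f"overall score must be an integer from 1 to 9, got {score!r}")


if __name__ == "__main__":
    for value in (2, 6, 9):
        print(value, "->", score_band(value))
```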

Output Format

json
{
  "id": "rec_001",
  "audio_url": "/audio/learner_001_sent_05.wav",
  "target_sentence": "The weather is really nice today.",
  "learner_id": "L001",
  "native_language": "Mandarin",
  "annotations": {
    "overall_score": 6,
    "comprehensibility": 7,
    "accentedness": 5,
    "vowels": 4,
    "consonants": 3,
    "word_stress": 4,
    "sentence_intonation": 3,
    "rhythm": 3,
    "fluency": 4,
    "common_errors": ["Th-sounds (think/this)", "R-sounds"],
    "feedback": "'weather' - th sound; 'really' - r sound",
    "confidence": "Very confident"
  }
}
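
Records shaped like the one above are straightforward to aggregate. The sketch below averages every numeric rating and counts how often each common_errors label was selected. It assumes the records have already been loaded into a list of dictionaries; the loading step depends on how your output files are laid out and is left out.

```python
from collections import Counter
from statistics import mean


def summarize(records):
    """Return (mean of each numeric rating, Counter of selected error labels)."""
    numeric = {}
    errors = Counter()
    for record in records:
        annotations = record["annotations"]
        for key, value in annotations.items():
            # Keep numeric ratings only; skip text, lists, and booleans.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                numeric.setdefault(key, []).append(value)
        errors.update(annotations.get("common_errors", []))
    means = {key: mean(values) for key, values in numeric.items()}
    return means, errors


if __name__ == "__main__":
    demo = [
        {"id": "rec_001", "annotations": {
            "overall_score": 6, "fluency": 4,
            "common_errors": ["Th-sounds (think/this)", "R-sounds"],
            "confidence": "Very confident"}},
        {"id": "rec_002", "annotations": {
            "overall_score": 8, "fluency": 5,
            "common_errors": ["R-sounds"],
            "confidence": "Somewhat confident"}},
    ]
    means, errors = summarize(demo)
    print(means)
    print(errors.most_common())
```

Averages over ordinal ratings are a convenience for dashboards; for formal analysis, medians or full distributions are often the safer choice.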

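Tips for Pronunciation Assessment

  1. Calibration: Rate sample recordings together first
  2. Consistency: Use reference recordings for anchoring
  3. Multiple listens: First for the overall impression, second for details
  4. High-quality audio: Good headphones are essential
  5. Rater training: A background in phonetics helps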
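
Calibration and consistency can be checked with a simple agreement report after two raters score the same recordings. The sketch below computes exact agreement, agreement within a tolerance of one scale point, and the mean absolute difference. The function name and the choice of these three measures are illustrative assumptions; a chance-corrected statistic such as weighted kappa may suit formal studies better.

```python
def rating_agreement(rater_a, rater_b, tolerance=1):
    """Compare two raters' scores for the same recordings, in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")

    diffs = [abs(a - b) for a, b in zip(rater_a, rater_b)]
    count = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / count,
        "within_tolerance": sum(d <= tolerance for d in diffs) / count,
        "mean_abs_diff": sum(diffs) / count,
    }


if __name__ == "__main__":
    # Hypothetical overall scores from two raters on four recordings.
    print(rating_agreement([6, 7, 5, 3], [6, 8, 2, 3]))
```

Low exact agreement combined with high within-tolerance agreement usually points to a calibration offset between raters rather than to real disagreement.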

Next Steps

For the complete audio documentation, see /docs/features/audio-annotation.