Pronunciation Assessment Annotation

Build a language-learning annotation task for assessing pronunciation quality, with audio playback and Likert scales.

Potato Team

Pronunciation assessment is crucial for language-learning applications, speech therapy, and accent coaching. This tutorial covers building interfaces for rating pronunciation quality, identifying specific errors, and providing detailed feedback.

Basic Pronunciation Rating

yaml
annotation_task_name: "Pronunciation Assessment"
 
data_files:
  - data/learner_recordings.json
 
item_properties:
  audio_path: audio_path
  text_field: target_text
 
# Show what they should have said
display:
  show_text: true
  text_field: target_text
  text_label: "Target Sentence"
  text_style: large
 
annotation_schemes:
  # Audio player with waveform display and slowed-down playback
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#06B6D4"
    progress_color: "#22D3EE"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]

  # 5-point overall rating, declared as its own scheme as in the later configs
  - annotation_type: likert
    name: overall_quality
    description: "Overall pronunciation quality"
    size: 5
    labels:
      - "1: Very poor"
      - "2: Poor"
      - "3: Fair"
      - "4: Good"
      - "5: Excellent"
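
The config above reads items from data/learner_recordings.json and maps the `audio_path` and `target_text` fields. The following is a minimal Python sketch of preparing and checking such a file. The list-of-objects layout, the `id` field, and the sample file names are assumptions for illustration, not a documented Potato requirement, so adjust them to the data format your installation expects.

```python
import json

# Fields the basic config maps via item_properties
REQUIRED_FIELDS = ("audio_path", "target_text")


def build_items(recordings):
    """Turn (id, audio_path, target_text) tuples into item dicts."""
    return [
        {"id": rec_id, "audio_path": audio_path, "target_text": target_text}
        for rec_id, audio_path, target_text in recordings
    ]


def missing_fields(item):
    """Return the required fields that are absent or empty in an item."""
    return [field for field in REQUIRED_FIELDS if not item.get(field)]


# Hypothetical sample recordings
items = build_items([
    ("rec_001", "audio/learner_001.wav", "The weather is really nice today."),
    ("rec_002", "audio/learner_002.wav", "I thought about it all week."),
])

# Collect items that would render without audio or without a target sentence
problems = {}
for item in items:
    missing = missing_fields(item)
    if missing:
        problems[item["id"]] = missing

# Text that could be written to data/learner_recordings.json
serialized = json.dumps(items, indent=2)
```

Running the check before launching the task catches items that would show an empty player or a blank target sentence to annotators.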

Detailed Assessment Configuration

yaml
annotation_task_name: "Detailed Pronunciation Assessment"
 
data_files:
  - data/recordings.json
 
item_properties:
  audio_path: learner_audio
  reference_audio_field: native_audio
  text_field: sentence
  metadata_fields: [learner_id, native_language]
 
display:
  show_text: true
  text_field: sentence
  text_label: "Target Text"
  show_metadata: true
  metadata_fields:
    - label: "Native Language"
      field: native_language
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    height: 100
    dual_audio: true
    primary_label: "Learner"
    secondary_label: "Native Reference"
    primary_color: "#6366F1"
    secondary_color: "#22C55E"
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    loop: true
 
  # Overall ratings
  - annotation_type: likert
    name: overall
    description: "Overall pronunciation quality"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
    required: true
 
  - annotation_type: likert
    name: intelligibility
    description: "How understandable is the speaker?"
    size: 5
    min_label: "Unintelligible"
    max_label: "Perfectly clear"
    required: true
 
  - annotation_type: likert
    name: accent_strength
    description: "How strong is the foreign accent?"
    size: 5
    min_label: "No accent"
    max_label: "Very strong"
 
  # Specific aspects
  - annotation_type: likert
    name: intonation
    description: "Intonation and rhythm"
    size: 5
    min_label: "Unnatural"
    max_label: "Native-like"
 
  - annotation_type: likert
    name: stress
    description: "Word and sentence stress"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: fluency
    description: "Fluency (smoothness, pauses)"
    size: 5
    min_label: "Very disfluent"
    max_label: "Very fluent"
 
  # Error identification
  - annotation_type: multiselect
    name: error_types
    description: "What errors are present? (select all)"
    labels:
      - Vowel errors
      - Consonant errors
      - Word stress errors
      - Sentence intonation errors
      - Rhythm/timing errors
      - Mispronounced words
      - Added sounds
      - Deleted sounds
      - Substituted sounds
      - No significant errors
 
  # Specific problem words
  - annotation_type: text
    name: problem_words
    description: "List any mispronounced words"
    placeholder: "e.g., 'thought' pronounced as 'taught'"
    required: false
 
  # Comparison to native (if reference provided)
  - annotation_type: radio
    name: native_comparison
    description: "Compared to native reference"
    labels:
      - Very different
      - Somewhat different
      - Fairly similar
      - Very similar
      - Indistinguishable
    conditional:
      show_when_field: native_audio
      is_present: true
 
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Your confidence in this assessment"
    size: 5
    min_label: "Low"
    max_label: "High"

Word-Level Assessment

For detailed feedback at the word level:

yaml
annotation_schemes:
  # Overall score
  - annotation_type: likert
    name: overall
    size: 5
 
  # Word-by-word rating
  - annotation_type: span_rating
    name: word_ratings
    source_field: sentence
    unit: word
    rating_scale:
      size: 3
      labels:
        - Incorrect
        - Acceptable
        - Correct
    allow_skip: true
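
Word-by-word ratings are most useful once they are summarized per sentence. The sketch below assumes the ratings are exported as a list aligned with the whitespace-separated words of the sentence, with `None` for skipped words (`allow_skip: true`). That export shape is hypothetical, so adapt the parsing to whatever your output actually contains.

```python
def summarize_word_ratings(sentence, ratings):
    """Summarize per-word labels (Incorrect / Acceptable / Correct) for one sentence.

    `ratings` is assumed to hold one label per whitespace-separated word,
    or None where the annotator skipped the word.
    """
    words = sentence.split()
    if len(words) != len(ratings):
        raise ValueError("expected exactly one rating (or None) per word")

    # Keep only the words that were actually rated
    rated = [(word, label) for word, label in zip(words, ratings) if label is not None]
    incorrect = [word for word, label in rated if label == "Incorrect"]
    correct = sum(1 for _, label in rated if label == "Correct")

    return {
        "rated_words": len(rated),
        "incorrect_words": incorrect,
        # Share of rated words labelled Correct; None when everything was skipped
        "correct_share": correct / len(rated) if rated else None,
    }


summary = summarize_word_ratings(
    "The weather is really nice today.",
    ["Correct", "Incorrect", "Correct", "Acceptable", None, "Correct"],
)
```

In this example five of six words were rated, "weather" is flagged as incorrect, and three of the five rated words are correct.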

Phoneme-Level Annotation

For speech research:

yaml
data_files:
  - data/phoneme_data.json
 
item_properties:
  audio_path: audio
  phoneme_field: expected_phonemes
 
annotation_schemes:
  - annotation_type: phoneme_assessment
    name: phonemes
    source_field: expected_phonemes
    labels:
      - name: correct
        color: "#22C55E"
      - name: substitution
        color: "#F59E0B"
      - name: deletion
        color: "#EF4444"
      - name: insertion
        color: "#8B5CF6"
 
    attributes:
      - name: actual_phoneme
        type: text
        show_when: substitution

Complete Configuration

yaml
annotation_task_name: "L2 English Pronunciation Assessment"
 
data_files:
  - data/esl_recordings.json
 
item_properties:
  audio_path: audio_url
  text_field: target_sentence
  metadata_fields:
    - learner_id
    - native_language
    - proficiency_level
 
display:
  show_text: true
  text_field: target_sentence
  text_label: "The learner was asked to say:"
  text_style: quote
 
  show_metadata: true
  metadata_layout: inline
  metadata_fields:
    - label: "L1"
      field: native_language
    - label: "Level"
      field: proficiency_level
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#0EA5E9"
    progress_color: "#38BDF8"
    height: 120
    speed_control: true
    speed_options: [0.5, 0.75, 1.0]
    default_speed: 1.0
    loop: true
    volume_control: true
 
  # Global scores
  - annotation_type: likert
    name: overall_score
    description: "Overall pronunciation score"
    size: 9
    min_label: "1 (Very poor)"
    max_label: "9 (Native-like)"
    required: true
 
  - annotation_type: likert
    name: comprehensibility
    description: "How easy is it to understand?"
    size: 9
    min_label: "1 (Very hard)"
    max_label: "9 (Very easy)"
    required: true
 
  - annotation_type: likert
    name: accentedness
    description: "How strong is the foreign accent?"
    size: 9
    min_label: "1 (No accent)"
    max_label: "9 (Very strong)"
 
  # Segmental features
  - annotation_type: likert
    name: vowels
    description: "Vowel pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  - annotation_type: likert
    name: consonants
    description: "Consonant pronunciation accuracy"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  # Suprasegmental features
  - annotation_type: likert
    name: word_stress
    description: "Word stress patterns"
    size: 5
    min_label: "Incorrect"
    max_label: "Correct"
 
  - annotation_type: likert
    name: sentence_intonation
    description: "Sentence intonation"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: rhythm
    description: "Speech rhythm and timing"
    size: 5
    min_label: "Unnatural"
    max_label: "Natural"
 
  - annotation_type: likert
    name: fluency
    description: "Speaking fluency"
    size: 5
    min_label: "Disfluent"
    max_label: "Fluent"
 
  # Error analysis
  - annotation_type: multiselect
    name: common_errors
    description: "Check all that apply"
    labels:
      - Th-sounds (think/this)
      - R-sounds
      - L-sounds
      - Vowel length
      - Final consonants dropped
      - Consonant clusters simplified
      - Word stress on wrong syllable
      - Flat intonation
      - No errors detected
    required: false
 
  # Specific feedback
  - annotation_type: text
    name: feedback
    description: "Specific words or sounds that need work"
    textarea: true
    rows: 2
    placeholder: "e.g., 'world' - /r/ needs work, 'through' - th sound"
    required: false
 
  # Rater confidence
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your ratings?"
    labels:
      - Not confident
      - Somewhat confident
      - Very confident
 
annotation_guidelines:
  title: "Pronunciation Assessment Guidelines"
  content: |
    ## Rating Scales

    **Overall score (1-9)**
    - 1-3: Significant difficulty, hard to understand
    - 4-6: Noticeable accent, but understandable
    - 7-9: Minor issues to native-like pronunciation

    **Comprehensibility**
    - How much effort does it take to understand?
    - Ignore the accent if the message is clear

    **Accentedness**
    - Degree of foreign accent
    - Independent of intelligibility

    ## What to Listen For
    - Individual sounds (phonemes)
    - Stress patterns
    - Sentence intonation
    - Speech rate and pauses

    ## Tips
    - Listen at least twice
    - Use slowed playback for details
    - Be consistent across recordings
 
quality_control:
  training_phase:
    enabled: true
    items: 5
    feedback: true

Output Format

json
{
  "id": "rec_001",
  "audio_url": "/audio/learner_001_sent_05.wav",
  "target_sentence": "The weather is really nice today.",
  "learner_id": "L001",
  "native_language": "Mandarin",
  "annotations": {
    "overall_score": 6,
    "comprehensibility": 7,
    "accentedness": 5,
    "vowels": 4,
    "consonants": 3,
    "word_stress": 4,
    "sentence_intonation": 3,
    "rhythm": 3,
    "fluency": 4,
    "common_errors": ["Th-sounds (think/this)", "R-sounds"],
    "feedback": "'weather' - th sound; 'really' - r sound",
    "confidence": "Very confident"
  }
}
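
Records in this shape are straightforward to aggregate. The following Python sketch averages each numeric rating across records and maps the 1-9 overall score to the bands described in the guidelines. The second sample record and all of its values are invented for illustration, and the sketch assumes one record per annotated recording.

```python
from statistics import mean

# Numeric Likert schemes from the complete configuration
SCORE_FIELDS = (
    "overall_score", "comprehensibility", "accentedness",
    "vowels", "consonants", "word_stress",
    "sentence_intonation", "rhythm", "fluency",
)


def mean_scores(records):
    """Average each numeric rating across the records that contain it."""
    averages = {}
    for field in SCORE_FIELDS:
        values = [
            record["annotations"][field]
            for record in records
            if field in record.get("annotations", {})
        ]
        if values:
            averages[field] = mean(values)
    return averages


def overall_band(score):
    """Map a 1-9 overall score to the band used in the rating guidelines."""
    if not 1 <= score <= 9:
        raise ValueError("overall score must be between 1 and 9")
    if score <= 3:
        return "1-3"
    if score <= 6:
        return "4-6"
    return "7-9"


# Trimmed sample records; rec_002 is made up for this example
records = [
    {"id": "rec_001", "annotations": {"overall_score": 6, "comprehensibility": 7, "accentedness": 5}},
    {"id": "rec_002", "annotations": {"overall_score": 8, "comprehensibility": 8, "accentedness": 3}},
]
summary = mean_scores(records)
band = overall_band(records[0]["annotations"]["overall_score"])
```

Fields that no record contains are simply left out of the summary, so partially annotated exports do not raise errors.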

Tips for Pronunciation Assessment

  1. Calibration: rate sample recordings together first
  2. Consistency: use reference recordings as anchors
  3. Multiple listens: first for the overall impression, then for details
  4. Audio quality: good headphones are essential
  5. Rater training: a background in phonetics helps
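
One way to check whether calibration is working is to have two raters score the same recordings and compare their ratings. The sketch below is a simple illustration rather than a built-in Potato feature: it reports exact agreement, agreement within one scale point, and the mean absolute difference. The scores in the example are invented.

```python
def rater_agreement(scores_a, scores_b, tolerance=1):
    """Compare two raters' scores for the same recordings, in the same order."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters need the same non-zero number of scores")

    diffs = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return {
        # Share of recordings where both raters gave the identical score
        "exact": sum(1 for d in diffs if d == 0) / n,
        # Share of recordings where scores differ by at most `tolerance` points
        "adjacent": sum(1 for d in diffs if d <= tolerance) / n,
        "mean_abs_diff": sum(diffs) / n,
    }


# Hypothetical overall scores (1-9) from two raters on four recordings
result = rater_agreement([6, 7, 4, 8], [6, 5, 5, 8])
```

Low adjacent agreement after a calibration session suggests the raters interpret the scale differently and would benefit from more shared reference recordings.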

Next Steps

Full audio documentation is available at /docs/features/audio-annotation.