
Setting Up Audio Transcription Review Tasks

Configure waveform visualization, playback controls, and a text-correction interface for audio transcription tasks.

Potato Team


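Transcription review is essential for producing high-quality ASR training data. This tutorial shows how to build an interface where annotators can listen to audio, view its waveform, and correct machine-generated transcripts.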

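What We'll Build

An interface with the following features:

  • Waveform visualization
  • Playback controls (play, pause, speed adjustment)
  • Editable transcript
  • Audio quality rating
  • Confidence markers for uncertain segments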

Basic Configuration

```yaml
annotation_task_name: "Transcription Review"

data_files:
  - "data/transcripts.json"

item_properties:
  id_key: id
  text_key: asr_transcript

annotation_schemes:
  # Audio playback
  - annotation_type: audio_annotation
    name: audio_player
    audio_key: audio_path

  # Corrected transcript
  - annotation_type: text
    name: corrected_transcript
    description: "Edit the transcript to match what you hear"
    textarea: true
    placeholder: "Type the corrected transcript..."
    required: true

  # Quality rating
  - annotation_type: radio
    name: audio_quality
    description: "Rate the audio quality"
    labels:
      - Clear
      - Slightly noisy
      - Very noisy
      - Unintelligible
```

Sample Data Format

Create data/transcripts.json with one JSON object per line:

```json
{"id": "audio_001", "audio_path": "/audio/recording_001.wav", "asr_transcript": "Hello how are you doing today"}
{"id": "audio_002", "audio_path": "/audio/recording_002.wav", "asr_transcript": "The weather is nice outside"}
{"id": "audio_003", "audio_path": "/audio/recording_003.wav", "asr_transcript": "Please call me back when your free"}
```
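Because every line of the data file is a standalone JSON object, a quick pre-flight check can catch missing keys or duplicate IDs before you launch the task. The sketch below assumes the key names from the basic configuration above (id, asr_transcript, audio_path); validate_jsonl is a hypothetical helper, not part of Potato.

```python
import json

# Key names from the basic configuration: id_key, text_key, and audio_key.
REQUIRED_KEYS = ("id", "asr_transcript", "audio_path")


def validate_jsonl(text: str) -> list[str]:
    """Return the problems found in JSON Lines data (an empty list means OK)."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"line {lineno}: missing {', '.join(missing)}")
        item_id = record.get("id")
        # Only scalar IDs are tracked for duplicate detection.
        if isinstance(item_id, (str, int)):
            if item_id in seen_ids:
                problems.append(f"line {lineno}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems
```

Pass it the file contents, for example `validate_jsonl(open("data/transcripts.json", encoding="utf-8").read())`; the three sample lines above produce an empty list.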

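Audio Annotation Setup

Audio annotation in Potato uses the audio_annotation type in the annotation schemes. The audio player provides waveform visualization and playback controls automatically.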

```yaml
annotation_schemes:
  - annotation_type: audio_annotation
    name: audio_player
    audio_key: audio_path
    description: "Listen to the audio recording"
```

The audio player has built-in play/pause, seeking, and speed controls.

Comprehensive Transcription Interface

```yaml
annotation_task_name: "ASR Correction and Annotation"

data_files:
  - "data/asr_output.json"

item_properties:
  id_key: id
  text_key: hypothesis

annotation_schemes:
  # Audio player
  - annotation_type: audio_annotation
    name: audio_player
    audio_key: audio_url

  # Main transcript correction
  - annotation_type: text
    name: transcript
    description: "Correct the transcript below"
    textarea: true
    rows: 4
    required: true

  # Speaker identification
  - annotation_type: radio
    name: num_speakers
    description: "How many speakers are in this recording?"
    labels:
      - "1 speaker"
      - "2 speakers"
      - "3+ speakers"
      - "Cannot determine"

  # Audio quality
  - annotation_type: radio
    name: quality
    description: "Overall audio quality"
    labels:
      - name: Excellent
        description: "Crystal clear, studio quality"
      - name: Good
        description: "Clear speech, minor background noise"
      - name: Fair
        description: "Understandable but noisy"
      - name: Poor
        description: "Very difficult to understand"
      - name: Unusable
        description: "Cannot transcribe accurately"

  # Issues checklist
  - annotation_type: multiselect
    name: issues
    description: "Select all issues present (if any)"
    labels:
      - Background noise
      - Overlapping speech
      - Accented speech
      - Fast speech
      - Mumbling/unclear
      - Technical audio issues
      - Non-English words
      - Profanity present
      - None

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your transcription?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"

annotation_guidelines:
  title: "Transcription Guidelines"
  content: |
    ## Your Task
    Listen to the audio and correct the ASR transcript.

    ## Transcription Rules
    - Transcribe exactly what is said
    - Include filler words (um, uh, like)
    - Use proper punctuation and capitalization
    - Mark unintelligible sections with [unintelligible]
    - Mark uncertain words with [word?]

    ## Special Notations
    - [unintelligible] - Cannot understand
    - [word?] - Uncertain about word
    - [crosstalk] - Overlapping speech
    - [noise] - Non-speech sound
    - [pause] - Significant silence
```
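Because the guidelines define a small, fixed set of notations, exported transcripts can be checked for misspelled or invented tags. A minimal sketch, assuming tags must match the guideline spelling exactly (case-sensitive) and that [word?] wraps a single token; find_notation_problems is a hypothetical helper, not a Potato feature.

```python
import re

# Fixed tags listed under "Special Notations" in the guidelines.
FIXED_TAGS = {"unintelligible", "crosstalk", "noise", "pause"}
# "[word?]" marks an uncertain word: one token followed by a question mark.
UNCERTAIN_WORD = re.compile(r"[^\s?\[\]]+\?")
# Any bracketed span without nested brackets.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def find_notation_problems(transcript: str) -> list[str]:
    """Return bracketed tags that do not follow the guideline notation."""
    problems = []
    if transcript.count("[") != transcript.count("]"):
        problems.append("unbalanced brackets")
    for match in BRACKETED.finditer(transcript):
        inner = match.group(1)
        if inner in FIXED_TAGS or UNCERTAIN_WORD.fullmatch(inner):
            continue
        problems.append(match.group(0))
    return problems
```

For example, "I [inaudible] heard [Noise]" is flagged twice: the first tag is not in the guidelines and the second uses the wrong case.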

Word-Level Annotation

For detailed word-level corrections, you can add a span annotation alongside the text field:

```yaml
annotation_schemes:
  - annotation_type: audio_annotation
    name: audio_player
    audio_key: audio_path

  - annotation_type: text
    name: transcript
    textarea: true

  - annotation_type: span
    name: word_corrections
    description: "Mark words that needed correction"
    source_field: transcript
    labels:
      - name: corrected
        color: "#FCD34D"
        description: "Word was changed"
      - name: inserted
        color: "#4ADE80"
        description: "Word was added"
      - name: uncertain
        color: "#F87171"
        description: "Still not sure"
```
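To pre-fill or audit the corrected and inserted spans, a word-level alignment between the ASR hypothesis and the corrected text gives a starting point. This sketch uses Python's difflib rather than any Potato mechanism; whitespace tokenization and the mapping of alignment operations onto the span labels above are assumptions, and the uncertain label still has to come from the annotator.

```python
from difflib import SequenceMatcher


def classify_word_changes(asr: str, corrected: str) -> list[tuple[str, str]]:
    """Label each word of the corrected text as 'same', 'corrected', or 'inserted'."""
    asr_words = asr.split()
    new_words = corrected.split()
    labels = []
    matcher = SequenceMatcher(a=asr_words, b=new_words, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            label = "same"
        elif tag == "replace":
            label = "corrected"
        elif tag == "insert":
            label = "inserted"
        else:
            # "delete": words removed from the ASR output have no span
            # in the corrected text, so nothing is labeled.
            continue
        labels.extend((word, label) for word in new_words[j1:j2])
    return labels
```

Because tokens are compared verbatim, punctuation or capitalization edits (for example "today" becoming "today?") also count as corrections; normalize both strings first if you only want to mark word changes.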

Segmented Transcription

For long audio files, you can prepare the data as segments with timing information:

```yaml
data_files:
  - "data/segments.json"

item_properties:
  id_key: id
  text_key: asr_text

annotation_schemes:
  - annotation_type: audio_annotation
    name: audio_player
    audio_key: audio_path

  - annotation_type: text
    name: transcript
    textarea: true
    description: "Correct the transcript for this segment"
```

Data format with segment timing:

```json
{
  "id": "seg_001",
  "audio_path": "/audio/long_recording.wav",
  "start_time": 0.0,
  "end_time": 5.5,
  "asr_text": "Welcome to today's presentation"
}
```
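If your ASR system emits segment timestamps, the segment items can be generated instead of written by hand. The sketch below assumes segments arrive as (start, end, text) tuples in seconds and writes one JSON object per line using the field names from the example; make_segment_records and the sequential seg_NNN ID scheme are illustrative choices. It only produces data records and does not cut the audio.

```python
import json


def make_segment_records(audio_path, segments, id_prefix="seg"):
    """Turn (start, end, asr_text) tuples into JSON Lines text, one segment per line."""
    lines = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Reject negative starts and empty or reversed time ranges.
        if not 0 <= start < end:
            raise ValueError(f"segment {index}: invalid times {start}-{end}")
        record = {
            "id": f"{id_prefix}_{index:03d}",
            "audio_path": audio_path,
            "start_time": start,
            "end_time": end,
            "asr_text": text,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)
```

Write the returned string to data/segments.json to match the segmented configuration above.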

Output Format

```json
{
  "id": "audio_001",
  "audio_path": "/audio/recording_001.wav",
  "original_transcript": "Hello how are you doing today",
  "annotations": {
    "transcript": "Hello, how are you doing today?",
    "num_speakers": "1 speaker",
    "quality": "Good",
    "issues": ["None"],
    "confidence": 5
  },
  "annotator": "transcriber_01",
  "time_spent_seconds": 45
}
```
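Because each output record keeps both the original and the corrected transcript, you can measure how much the ASR output needed fixing. A sketch assuming the record layout shown above: it computes word error rate (WER) as a word-level edit distance divided by the reference length, treating the annotator's transcript as the reference. Ignoring case and punctuation is a normalization choice made here, not something Potato does, and bracketed notations such as [noise] still count as words.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and drop punctuation so only word changes are counted."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        # Convention for an empty reference: 0.0 if both are empty, else 1.0.
        return 0.0 if not hyp else 1.0
    # prev[j] = edits to turn the previous reference prefix into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def asr_wer(record: dict) -> float:
    """WER of the original ASR output against the annotator's corrected transcript."""
    return word_error_rate(record["annotations"]["transcript"],
                           record["original_transcript"])
```

For the sample record above, the only edits are punctuation and capitalization, so asr_wer returns 0.0 under this normalization.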

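Quality Control

Potato tracks annotation time automatically. For quality control, consider including attention-check items in your data file: items with known correct answers that let you verify each annotator's accuracy.

You can configure the output settings to control where annotations are saved and in which format: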

```yaml
output_annotation_dir: "annotation_output"
output_annotation_format: "json"
```
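Once annotations are exported, attention-check items can be scored offline. A minimal sketch, assuming output records shaped like the Output Format example and a hand-made gold dictionary that maps each check item's ID to its expected answers; attention_check_accuracy and the check_01 ID are hypothetical. Exact matching suits the radio and Likert fields; for free-text fields such as transcript it is very strict, so a tolerance-based comparison may fit better.

```python
from collections import defaultdict


def attention_check_accuracy(records, gold):
    """Per-annotator fraction of attention-check answers that match the gold answers.

    records: output records with "id", "annotator", and "annotations" keys.
    gold: {item_id: {field_name: expected_value}} for the attention-check items.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        expected = gold.get(record["id"])
        if expected is None:
            continue  # regular item, not an attention check
        answers = record["annotations"]
        for field, value in expected.items():
            total[record["annotator"]] += 1
            if answers.get(field) == value:
                passed[record["annotator"]] += 1
    return {annotator: passed[annotator] / total[annotator] for annotator in total}
```

An annotator whose score falls well below their peers is a candidate for retraining or for re-review of their items.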

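Tips for Transcription Tasks

  1. Good headphones: essential for accuracy
  2. A quiet environment: reduces fatigue
  3. Speed control: slow down for difficult passages
  4. Multiple passes: listen once, transcribe, then verify
  5. Regular breaks: transcription is mentally demanding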

Next Steps


For the complete audio documentation, see /docs/features/audio-annotation