Audio annotation
Annotate audio files with waveform visualization and playback controls.
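Potato 2.0 supports audio annotation with waveform visualization powered by Peaks.js, segment labeling, and keyboard shortcuts for playback and labeling.
Use cases
- Speech transcription and review
- Speaker diarization
- Music analysis
- Audio event detection
- Speech emotion recognition
- Call center quality assurance
Enabling audio support
Add an audio scheme to the annotation_schemes section of your configuration: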
yaml
annotation_schemes:
  - annotation_type: audio
    name: audio_segments
    description: "Segment and label the audio"
    labels:
      - Speech
      - Music
      - Silence
      - Noise
Operating modes
Potato supports three audio annotation modes:
Label mode
Segment the audio and assign a category label to each segment:
yaml
annotation_schemes:
  - annotation_type: audio
    name: speaker_diarization
    mode: label
    description: "Identify speakers in the audio"
    labels:
      - Speaker A
      - Speaker B
      - Overlap
    label_colors:
      "Speaker A": "#3b82f6"
      "Speaker B": "#10b981"
      "Overlap": "#f59e0b"
Question mode
Attach annotation questions to each segment:
yaml
annotation_schemes:
  - annotation_type: audio
    name: speech_quality
    mode: questions
    description: "Evaluate speech segments"
    segment_questions:
      - name: clarity
        type: likert
        size: 5
        min_label: "Unclear"
        max_label: "Very clear"
      - name: emotion
        type: radio
        labels: [Neutral, Happy, Sad, Angry]
Combined mode
Combine labels with per-segment questions:
yaml
annotation_schemes:
  - annotation_type: audio
    name: full_analysis
    mode: both
    description: "Label and analyze audio segments"
    labels:
      - Speech
      - Music
      - Noise
    segment_questions:
      - name: quality
        type: likert
        size: 5
Configuration options
Basic settings
yaml
annotation_schemes:
  - annotation_type: audio
    name: segments
    description: "Create audio segments"
    labels:
      - Label A
      - Label B
    # Optional constraints
    min_segments: 1
    max_segments: 50
Keyboard shortcuts
Labels can be assigned with the number keys 1-9, following the order in which they are listed:
yaml
annotation_schemes:
  - annotation_type: audio
    name: speakers
    labels:
      - Speaker A  # Press 1
      - Speaker B  # Press 2
      - Overlap    # Press 3
Label colors
Customize segment colors:
yaml
annotation_schemes:
  - annotation_type: audio
    name: segments
    labels:
      - Speech
      - Music
      - Silence
    label_colors:
      "Speech": "#3b82f6"
      "Music": "#10b981"
      "Silence": "#6b7280"
Waveform performance
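For best performance with long audio files, install the BBC audiowaveform tool:
bash
# macOS
brew install audiowaveform
# Ubuntu/Debian (if apt cannot find the package, check the project README for package sources)
sudo apt-get install audiowaveform
# Or build from source
# https://github.com/bbc/audiowaveform
This enables server-side waveform generation. Without it, Potato falls back to client-side generation, which is suitable for files up to 30 minutes long.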
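The choice between server-side and client-side generation can be pictured as a small decision function. The sketch below is illustrative only and is not Potato's actual implementation: `choose_waveform_backend` and its return values are hypothetical names, and the 1800-second limit simply restates the 30-minute figure given for client-side generation.

```python
import shutil
from typing import Optional

# Assumption: restates the 30-minute client-side limit described in the text.
CLIENT_MAX_SECONDS = 1800


def choose_waveform_backend(
    duration_seconds: float,
    tool_path: Optional[str] = None,
    max_client_seconds: float = CLIENT_MAX_SECONDS,
) -> str:
    """Return "server", "client", or "unsupported" for one audio file.

    tool_path is the location of the audiowaveform binary, or None when it
    is not installed. Hypothetical helper, not part of Potato's API.
    """
    if tool_path:
        return "server"
    if duration_seconds <= max_client_seconds:
        return "client"
    return "unsupported"


if __name__ == "__main__":
    # shutil.which returns None when the binary is not on PATH.
    tool = shutil.which("audiowaveform")
    for seconds in (90, 1800, 7200):
        print(seconds, choose_waveform_backend(seconds, tool))
```

Running the script on a machine without audiowaveform shows which durations would still be covered by the client-side fallback under these assumptions.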
Waveform cache
Configure caching for better performance:
yaml
audio_config:
  cache_dir: "audio_cache/"
  precompute_depth: 100  # Pre-generate waveforms for first N items
  client_fallback_max_duration: 1800  # 30 minutes in seconds
Data format
Simple audio reference
json
[
  {"id": "1", "audio_path": "audio/recording_001.wav"},
  {"id": "2", "audio_path": "audio/recording_002.wav"}
]
yaml
data_files:
  - "data/audio_data.json"
item_properties:
  id_key: id
  audio_key: audio_path
With transcripts
json
[
  {
    "id": "1",
    "audio_path": "audio/call_001.wav",
    "transcript": "Hello, how can I help you today?"
  }
]
Output format
Annotations are saved together with the segment timestamps:
json
{
  "id": "audio_1",
  "annotations": {
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "label": "Speaker A",
        "questions": {
          "clarity": 4,
          "emotion": "Neutral"
        }
      },
      {
        "start": 2.5,
        "end": 5.2,
        "label": "Speaker B"
      }
    ]
  }
}
Keyboard shortcuts
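Potato provides keyboard shortcuts for efficient annotation:
| Shortcut | Action |
|---|---|
| Space | Play/pause |
| [ | Set the segment start at the current position |
| ] | Set the segment end at the current position |
| 1-9 | Assign a label to the current segment |
| Delete | Delete the current segment |
| Left Arrow | Skip back 5 seconds |
| Right Arrow | Skip forward 5 seconds |
| Up Arrow | Zoom in |
| Down Arrow | Zoom out |
| Home | Jump to the start |
| End | Jump to the end |
| + | Increase playback speed |
| - | Decrease playback speed |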
Example configurations
Speaker diarization
yaml
task_name: "Speaker Diarization"
task_dir: "."
port: 8000
data_files:
  - "data/recordings.json"
item_properties:
  id_key: id
  audio_key: audio_path
annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    description: "Identify who is speaking"
    labels:
      - Speaker 1
      - Speaker 2
      - Speaker 3
      - Overlap
      - Silence
    label_colors:
      "Speaker 1": "#3b82f6"
      "Speaker 2": "#10b981"
      "Speaker 3": "#f59e0b"
      "Overlap": "#ef4444"
      "Silence": "#6b7280"
    min_segments: 1
audio_config:
  cache_dir: "audio_cache/"
  precompute_depth: 50
output_annotation_dir: "output/"
output_annotation_format: "json"
allow_all_users: true
Transcription review
yaml
task_name: "Transcription Quality Review"
task_dir: "."
port: 8000
data_files:
  - "data/transcripts.json"
item_properties:
  id_key: id
  text_key: transcript
  audio_key: audio_path
annotation_schemes:
  - annotation_type: audio
    name: errors
    mode: questions
    description: "Mark transcription errors"
    segment_questions:
      - name: error_type
        type: radio
        labels:
          - Missing word
          - Wrong word
          - Extra word
          - Spelling error
      - name: severity
        type: likert
        size: 3
        min_label: "Minor"
        max_label: "Major"
  - annotation_type: radio
    name: overall_accuracy
    description: "Overall transcript accuracy"
    labels:
      - Accurate
      - Minor errors
      - Major errors
      - Unusable
output_annotation_dir: "output/"
output_annotation_format: "json"
Call center quality assurance
yaml
task_name: "Call Center Quality Assurance"
task_dir: "."
port: 8000
data_files:
  - "data/calls.json"
item_properties:
  id_key: call_id
  audio_key: recording_path
annotation_schemes:
  # Segment-level annotation
  - annotation_type: audio
    name: conversation
    mode: both
    description: "Segment the conversation"
    labels:
      - Agent
      - Customer
      - Hold
      - Silence
    segment_questions:
      - name: sentiment
        type: radio
        labels: [Positive, Neutral, Negative, Frustrated]
  # Call-level assessment
  - annotation_type: likert
    name: professionalism
    description: "Agent professionalism"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
  - annotation_type: likert
    name: resolution
    description: "Issue resolution"
    size: 5
    min_label: "Unresolved"
    max_label: "Fully resolved"
  - annotation_type: multiselect
    name: issues
    description: "Select any issues observed"
    labels:
      - Long hold time
      - Agent interrupted
      - Incorrect information
      - Missing greeting
      - Unprofessional language
  - annotation_type: text
    name: notes
    description: "Additional observations"
    textarea: true
output_annotation_dir: "output/"
output_annotation_format: "json"
Supported audio formats
- WAV (recommended, best quality)
- MP3
- OGG
- FLAC
- M4A
- WebM
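Performance tips
- Install audiowaveform: essential for long audio files
- Enable caching: use cache_dir to store pre-generated waveforms
- Use WAV for quality: lossy compressed formats such as MP3 may introduce artifacts
- Preprocess audio: normalize the volume and trim unnecessary silence
- Watch file size: large files slow down loading
- Use precomputation: pre-generate waveforms for the first items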
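The preprocessing tip can be handled outside Potato, before the data file is built. As a minimal sketch, assuming ffmpeg is installed (it is not mentioned in this document and is only one possible tool), the helper below builds a command that trims leading silence with ffmpeg's `silenceremove` filter and normalizes loudness with `loudnorm`. The function names, the -50 dB threshold, and the file paths are hypothetical and should be tuned per dataset.

```python
import subprocess
from typing import List


def build_preprocess_command(
    src: str, dst: str, silence_threshold_db: float = -50.0
) -> List[str]:
    """Build an ffmpeg command that trims leading silence, then normalizes."""
    filters = (
        f"silenceremove=start_periods=1:start_threshold={silence_threshold_db}dB,"
        "loudnorm"
    )
    # -y overwrites dst if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]


def preprocess(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_preprocess_command(src, dst), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    command = build_preprocess_command("raw/call_001.wav", "audio/call_001.wav")
    print(" ".join(command))
```

Writing the cleaned files to a separate directory keeps the original recordings untouched in case the threshold removes too much.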
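Troubleshooting
Waveform not loading
- Check that the audio file path is correct
- Verify that the file format is supported
- Install audiowaveform for long files
- Check the browser console for error messages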
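The first two checks can be scripted before starting the server. The sketch below is an assumption-laden helper, not a Potato feature: `find_audio_problems` is a hypothetical name, it assumes audio paths are local files resolved relative to `base_dir` (Potato's own path resolution may differ, and URLs would be reported as missing), and the extension set is copied from the supported formats list in this document.

```python
import json
from pathlib import Path
from typing import Dict, List

# Extensions taken from the supported formats list in this document.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a", ".webm"}


def find_audio_problems(
    items: List[Dict], base_dir: str = ".", audio_key: str = "audio_path"
) -> List[str]:
    """Return one message per item whose audio file looks unusable."""
    problems = []
    for item in items:
        item_id = item.get("id", "<no id>")
        raw_path = item.get(audio_key)
        if not raw_path:
            problems.append(f"{item_id}: missing '{audio_key}'")
            continue
        path = Path(base_dir) / raw_path
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"{item_id}: unsupported format '{path.suffix}'")
        elif not path.is_file():
            problems.append(f"{item_id}: file not found at {path}")
    return problems


if __name__ == "__main__":
    # Path taken from the data format example; adjust to your project.
    data_file = Path("data/audio_data.json")
    if data_file.is_file():
        items = json.loads(data_file.read_text(encoding="utf-8"))
        for message in find_audio_problems(items):
            print(message)
```

An empty result only means the files exist and have a listed extension; decoding errors would still show up in the browser console.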
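Slow performance
- Install the audiowaveform tool
- Enable waveform caching
- Reduce the audio file size
- Use the precompute_depth setting
Segments not saved
- Make sure the output directory is writable
- Check the annotation format configuration
- Verify that each segment has a start and an end time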
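The last check can also be run on saved output. The following sketch assumes a record shaped like the example in the output format section (`annotations.segments` with `start`, `end`, and an optional `label`); the on-disk layout may differ between Potato versions, and both function names are hypothetical.

```python
from typing import Dict, List


def find_invalid_segments(record: Dict) -> List[str]:
    """Return a message for each segment lacking a valid start/end pair."""
    issues = []
    segments = record.get("annotations", {}).get("segments", [])
    for index, segment in enumerate(segments):
        start, end = segment.get("start"), segment.get("end")
        if start is None or end is None:
            issues.append(f"segment {index}: missing start or end")
        elif end <= start:
            issues.append(f"segment {index}: end {end} is not after start {start}")
    return issues


def duration_per_label(record: Dict) -> Dict[str, float]:
    """Sum segment lengths per label, skipping segments without both times."""
    totals: Dict[str, float] = {}
    for segment in record.get("annotations", {}).get("segments", []):
        start, end = segment.get("start"), segment.get("end")
        if start is None or end is None or end <= start:
            continue
        # Segments from question-only schemes may carry no label.
        label = segment.get("label", "<unlabeled>")
        totals[label] = totals.get(label, 0.0) + (end - start)
    return totals


if __name__ == "__main__":
    record = {
        "id": "audio_1",
        "annotations": {
            "segments": [
                {"start": 0.0, "end": 2.5, "label": "Speaker A"},
                {"start": 2.5, "end": 5.2, "label": "Speaker B"},
            ]
        },
    }
    print(find_invalid_segments(record))
    print(duration_per_label(record))
```

Per-label totals double as a quick sanity check: a label with no accumulated time usually means its segments were never closed or never saved.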