Audio Annotation
Segment audio files and assign labels to time regions using waveform visualization.
Potato's audio annotation tool lets annotators segment audio files and assign labels to time regions through a waveform-based interface.
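Features
- Waveform visualization
- Time-based segment creation
- Label assignment for segments
- Playback controls with variable speed
- Zoom and scroll navigation
- Keyboard shortcuts
- Server-side waveform caching
Basic Configuration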
```yaml
annotation_schemes:
  - name: "speakers"
    description: "Mark when each speaker is talking"
    annotation_type: "audio_annotation"
    labels:
      - name: "Speaker 1"
        color: "#3B82F6"
      - name: "Speaker 2"
        color: "#10B981"
```

Configuration Options
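| Field | Type | Default | Description |
|---|---|---|---|
| name | string | required | Unique identifier for the annotation |
| description | string | required | Instructions shown to annotators |
| annotation_type | string | required | Must be "audio_annotation" |
| mode | string | "label" | Annotation mode: "label", "questions", or "both" |
| labels | list | conditional | Required in label or both mode |
| segment_schemes | list | conditional | Required in questions or both mode |
| min_segments | integer | 0 | Minimum number of segments required |
| max_segments | integer | null | Maximum number of segments allowed (null = unlimited) |
| zoom_enabled | boolean | true | Enables zoom controls |
| playback_rate_control | boolean | false | Shows a playback speed selector |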
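The mode-dependent requirements in this table can be checked before a task is launched. The following is a minimal Python sketch, assuming each scheme has already been loaded from YAML into a plain dict; `check_audio_scheme` is a hypothetical helper written for illustration, not part of Potato.

```python
from typing import Any

VALID_MODES = ("label", "questions", "both")


def check_audio_scheme(scheme: dict[str, Any]) -> list[str]:
    """Return problems found in one audio_annotation scheme (hypothetical helper)."""
    problems: list[str] = []

    # Fields marked "required" in the options table.
    for key in ("name", "description", "annotation_type"):
        if key not in scheme:
            problems.append(f"missing required field: {key}")
    if "annotation_type" in scheme and scheme["annotation_type"] != "audio_annotation":
        problems.append('annotation_type must be "audio_annotation"')

    # "Conditional" fields depend on the mode, which defaults to "label".
    mode = scheme.get("mode", "label")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    if mode in ("label", "both") and not scheme.get("labels"):
        problems.append(f"labels is required in {mode!r} mode")
    if mode in ("questions", "both") and not scheme.get("segment_schemes"):
        problems.append(f"segment_schemes is required in {mode!r} mode")

    # A max_segments of None (null in YAML) means unlimited.
    min_segments = scheme.get("min_segments", 0)
    max_segments = scheme.get("max_segments")
    if max_segments is not None and min_segments > max_segments:
        problems.append("min_segments must not exceed max_segments")
    return problems
```

For example, a questions-mode scheme with its `segment_schemes` removed would be reported as missing that field, while the basic "speakers" scheme above returns an empty list.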
Label Configuration
```yaml
labels:
  - name: "speech"
    color: "#3B82F6"
    key_value: "1"
  - name: "music"
    color: "#10B981"
    key_value: "2"
  - name: "silence"
    color: "#64748B"
    key_value: "3"
```

Annotation Modes
Label Mode (Default)
Each segment receives a categorical label:
```yaml
annotation_schemes:
  - name: "emotion"
    description: "Label the emotion in each segment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "happy"
        color: "#22C55E"
      - name: "sad"
        color: "#3B82F6"
      - name: "angry"
        color: "#EF4444"
      - name: "neutral"
        color: "#64748B"
```

Questions Mode
Annotators answer a dedicated set of questions for each segment:
```yaml
annotation_schemes:
  - name: "transcription"
    description: "Transcribe each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter the transcription"
      - name: "confidence"
        annotation_type: "likert"
        description: "How confident are you?"
        size: 5
```

Both Mode
Combine labels with per-segment questions:
```yaml
annotation_schemes:
  - name: "detailed_diarization"
    description: "Label speakers and add notes"
    annotation_type: "audio_annotation"
    mode: "both"
    labels:
      - name: "Speaker A"
        color: "#3B82F6"
      - name: "Speaker B"
        color: "#10B981"
    segment_schemes:
      - name: "notes"
        annotation_type: "text"
        description: "Any notes about this segment?"
```

Global Audio Configuration
Configure waveform processing in your config file:
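```yaml
audio_annotation:
  waveform_cache_dir: "waveform_cache/"
  waveform_look_ahead: 5
  waveform_cache_max_size: 1000
  client_fallback_max_duration: 1800
```

| Field | Description |
|---|---|
| waveform_cache_dir | Directory where waveform data is cached |
| waveform_look_ahead | Number of upcoming instances whose waveforms are precomputed |
| waveform_cache_max_size | Maximum number of cached waveform files |
| client_fallback_max_duration | Maximum audio length, in seconds, for falling back to browser-side waveform generation (default: 1800) |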
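To illustrate how `waveform_look_ahead` and `waveform_cache_max_size` could interact, here is a small Python sketch of a bounded cache that precomputes waveforms for the next few instances. It is hypothetical: `WaveformCache` and its `compute` callback are illustrative names, and least-recently-used eviction is an assumption rather than documented Potato behavior.

```python
from collections import OrderedDict
from typing import Callable


class WaveformCache:
    """Hypothetical bounded waveform cache with look-ahead prefetching.

    Eviction is least-recently-used here; that policy is an assumption.
    """

    def __init__(self, compute: Callable[[str], list[float]],
                 max_size: int = 1000, look_ahead: int = 5) -> None:
        self.compute = compute        # decodes an instance's audio into peaks
        self.max_size = max_size      # cf. waveform_cache_max_size
        self.look_ahead = look_ahead  # cf. waveform_look_ahead
        self._store: OrderedDict = OrderedDict()

    def get(self, instance_id: str) -> list[float]:
        if instance_id in self._store:
            self._store.move_to_end(instance_id)  # mark as recently used
            return self._store[instance_id]
        peaks = self.compute(instance_id)
        self._store[instance_id] = peaks
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # drop the least recently used entry
        return peaks

    def prefetch(self, instance_ids: list[str], current: int) -> list[str]:
        """Compute waveforms for the instances after position `current`."""
        upcoming = instance_ids[current + 1 : current + 1 + self.look_ahead]
        for instance_id in upcoming:
            self.get(instance_id)
        return upcoming
```

Calling `prefetch(ids, i)` when an annotator opens instance `i` would warm the cache for the following `look_ahead` instances, so their waveforms are ready before they are requested.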
Examples
Speaker Diarization
```yaml
annotation_schemes:
  - name: "diarization"
    description: "Identify who is speaking at each moment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "Interviewer"
        color: "#8B5CF6"
        key_value: "1"
      - name: "Guest"
        color: "#EC4899"
        key_value: "2"
      - name: "Overlap"
        color: "#F59E0B"
        key_value: "3"
    zoom_enabled: true
    playback_rate_control: true
```

Sound Event Detection
```yaml
annotation_schemes:
  - name: "sound_events"
    description: "Mark all sound events"
    annotation_type: "audio_annotation"
    labels:
      - name: "speech"
        color: "#3B82F6"
      - name: "music"
        color: "#10B981"
      - name: "applause"
        color: "#F59E0B"
      - name: "laughter"
        color: "#EC4899"
      - name: "silence"
        color: "#64748B"
    min_segments: 1
```

Transcription Review
```yaml
annotation_schemes:
  - name: "transcription_review"
    description: "Review and correct the transcription for each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter or correct the transcription"
        textarea: true
      - name: "quality"
        annotation_type: "radio"
        description: "Audio quality"
        labels:
          - "Clear"
          - "Noisy"
          - "Unintelligible"
```

Keyboard Shortcuts
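| Key | Action |
|---|---|
| Space | Play/pause |
| ← / → | Seek backward/forward |
| [ | Mark segment start |
| ] | Mark segment end |
| Enter | Create segment |
| Delete | Delete selected segment |
| 1-9 | Select label |
| + / - | Zoom in/out |
| 0 | Fit to view |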
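The segment-creation shortcuts form a small workflow: mark a start with `[`, mark an end with `]`, pick a label with a number key, and press Enter. The Python sketch below models that flow under stated assumptions; `SegmentEditor` is a hypothetical name, and three behaviors are guesses rather than documented Potato behavior: a number key selects the label whose `key_value` matches (or, when `key_value` is absent, the label at that 1-based position), marks set out of order are swapped, and Delete removes the most recently created segment in place of a user-selected one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    start: float
    end: float
    label: str


@dataclass
class SegmentEditor:
    """Hypothetical model of the [, ], 1-9, Enter and Delete shortcuts."""

    labels: list[dict]  # label entries as written in the YAML config
    segments: list[Segment] = field(default_factory=list)
    start: Optional[float] = None
    end: Optional[float] = None
    label: Optional[str] = None

    def press(self, key: str, playhead: float = 0.0) -> None:
        if key == "[":
            self.start = playhead  # mark segment start at the playhead
        elif key == "]":
            self.end = playhead  # mark segment end at the playhead
        elif len(key) == 1 and key in "123456789":
            self.label = self._label_for(key)
        elif key == "Enter":
            self._create()
        elif key == "Delete" and self.segments:
            self.segments.pop()  # assumption: most recent segment = selected

    def _label_for(self, key: str) -> Optional[str]:
        # Each label answers to its key_value, or to its 1-based position
        # when key_value is absent (assumption).
        for position, entry in enumerate(self.labels, start=1):
            if str(entry.get("key_value", position)) == key:
                return entry["name"]
        return self.label  # unmapped key: keep the current label

    def _create(self) -> None:
        if self.start is None or self.end is None or self.label is None:
            return  # both marks and a label are needed
        lo, hi = sorted((self.start, self.end))  # tolerate marks set out of order
        if hi > lo:
            self.segments.append(Segment(lo, hi, self.label))
        self.start = self.end = None  # ready for the next segment
```

With the diarization labels above, pressing `[` at 0.0 s, `]` at 5.5 s, `1`, and Enter would yield a segment from 0.0 to 5.5 labeled "Interviewer", matching the first entry of the output example below.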
Data Formats
Input Data
Your data file should contain audio file paths or URLs:
```json
[
  {
    "id": "audio_001",
    "audio_url": "https://example.com/audio/recording1.mp3"
  },
  {
    "id": "audio_002",
    "audio_url": "/data/audio/recording2.wav"
  }
]
```

Configure the audio field:
```yaml
item_properties:
  id_key: id
  text_key: audio_url
```

Output Format
```json
{
  "id": "audio_001",
  "annotations": {
    "diarization": [
      {
        "start": 0.0,
        "end": 5.5,
        "label": "Interviewer"
      },
      {
        "start": 5.5,
        "end": 12.3,
        "label": "Guest"
      },
      {
        "start": 12.3,
        "end": 14.0,
        "label": "Overlap"
      }
    ]
  }
}
```

In questions mode, each segment contains the nested responses to its segment questions, keyed by question name:
```json
{
  "start": 0.0,
  "end": 5.5,
  "transcript": "Hello and welcome to the show.",
  "quality": "Clear"
}
```

Supported Audio Formats
- MP3 (recommended)
- WAV
- OGG
- M4A
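Because only these four formats are listed, it can help to screen a data file before loading it. Below is a minimal Python sketch, assuming items use the `audio_url` key from the input example; `audio_extension` and `split_by_format` are hypothetical names. It checks file extensions only, so a mislabeled or corrupt file will still pass.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a"}


def audio_extension(location: str) -> str:
    """Lower-cased extension of a URL or local path; URL query strings are ignored."""
    path = urlparse(location).path if "://" in location else location
    return PurePosixPath(path).suffix.lower()


def split_by_format(items: list[dict], key: str = "audio_url") -> tuple[list[dict], list[dict]]:
    """Split data items into (supported, unsupported) by file extension."""
    supported: list[dict] = []
    unsupported: list[dict] = []
    for item in items:
        if audio_extension(item[key]) in SUPPORTED_EXTENSIONS:
            supported.append(item)
        else:
            unsupported.append(item)
    return supported, unsupported
```

Both entries in the input example would land in the supported list; an item pointing at, say, a `.flac` file would be returned in the unsupported list for conversion or removal.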
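Best Practices
- Pre-cache waveforms - use server-side caching for large datasets
- Enable playback speed control - variable speed (playback_rate_control) helps with precise segmentation
- Use keyboard shortcuts - they are much faster than clicking
- Define clear boundaries - specify what counts as the start and end of a segment
- Choose the right mode - use "label" for classification and "questions" for detailed annotation
- Set segment limits - use min_segments to ensure coverage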
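To check coverage after annotation, output records in the format shown earlier can be summarized per instance. This Python sketch is hypothetical: `covered_seconds` and `coverage_report` are illustrative names, and because the output does not record clip length, the audio duration must be supplied separately (for example from your own metadata).

```python
def covered_seconds(segments: list[dict]) -> float:
    """Seconds covered by the union of segments; overlaps are counted once."""
    total = 0.0
    run_start = run_end = None
    for seg in sorted(segments, key=lambda s: s["start"]):
        if run_end is None or seg["start"] > run_end:
            if run_end is not None:
                total += run_end - run_start  # close the previous run
            run_start, run_end = seg["start"], seg["end"]
        else:
            run_end = max(run_end, seg["end"])  # extend the current run
    if run_end is not None:
        total += run_end - run_start
    return total


def coverage_report(record: dict, scheme: str, duration: float,
                    min_segments: int = 0) -> dict:
    """Summarize one output record; `duration` (seconds) must come from elsewhere."""
    segments = record["annotations"].get(scheme, [])
    return {
        "segments": len(segments),
        "meets_min_segments": len(segments) >= min_segments,
        "coverage": covered_seconds(segments) / duration if duration > 0 else 0.0,
    }
```

Only `start` and `end` are read, so the same check works for label, questions, and both modes. For the diarization output example, the three segments cover 0.0 to 14.0 s, so a 20-second clip would report 70% coverage.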