Tutorials3 min read
说话人分离标注
构建一个带有音频波形、时间戳标记和说话人标签分配的说话人识别任务。
Potato Team·
说话人分离标注
说话人分离回答了"谁在什么时候说话?"这个问题。本教程涵盖构建标注说话人轮次的界面、纠正自动分离结果以及处理多说话人对话。
什么是说话人分离?
说话人分离将音频分割为说话人同质的区域。应用场景包括:
- 会议转录
- 客服中心分析
- 播客制作
- 访谈处理
- 法庭/法律录音
基本分离设置
yaml
annotation_task_name: "Speaker Diarization"
data_files:
- "data/conversations.json"
annotation_schemes:
- annotation_type: audio_annotation
name: speakers
description: "Mark when each speaker talks"
labels:
- name: Speaker 1
color: "#FF6B6B"
keyboard_shortcut: "1"
- name: Speaker 2
color: "#4ECDC4"
keyboard_shortcut: "2"
- name: Speaker 3
color: "#45B7D1"
keyboard_shortcut: "3"
- name: Overlap
color: "#FFEAA7"
keyboard_shortcut: "o"
- name: Silence
color: "#9CA3AF"
keyboard_shortcut: "s"创建说话人片段
工作流
- 播放音频或点击波形进行导航
- 在波形上点击并拖动以选择时间范围
- 按数字键或点击说话人标签
- 该片段将被着色和标记
- 通过拖动边缘调整边界
- 继续直到整个音频都被分段
键盘控制
Potato 为音频播放控制提供内置键盘快捷键,包括播放/暂停和导航。
预标注分离纠正
通常您需要纠正自动分离的结果:
yaml
data_files:
- "data/auto_diarized.json"数据格式:
json
{
"id": "meeting_001",
"audio_path": "/audio/meeting_001.wav",
"auto_segments": [
{"start": 0.0, "end": 3.5, "speaker": "Speaker 1"},
{"start": 3.5, "end": 8.2, "speaker": "Speaker 2"},
{"start": 8.2, "end": 12.0, "speaker": "Speaker 1"}
]
}详细说话人信息
捕获额外的说话人元数据:
yaml
annotation_schemes:
- annotation_type: audio_annotation
name: speakers
labels:
- name: Speaker A
color: "#FF6B6B"
- name: Speaker B
color: "#4ECDC4"
- name: Speaker C
color: "#45B7D1"
- name: Unknown
color: "#9CA3AF"
# Speaker characteristics
- annotation_type: radio
name: speaker_a_gender
description: "Speaker A Gender"
labels:
- Male
- Female
- Unknown
- annotation_type: text
name: speaker_a_role
description: "Speaker A Role (if identifiable)"
- annotation_type: radio
name: speaker_b_gender
description: "Speaker B Gender"
labels:
- Male
- Female
- Unknown处理重叠语音
yaml
annotation_schemes:
- annotation_type: audio_annotation
name: speakers
labels:
- name: Speaker 1
color: "#FF6B6B"
- name: Speaker 2
color: "#4ECDC4"
- name: Overlap
color: "#FFEAA7"会议/访谈分离
yaml
annotation_task_name: "Meeting Diarization"
data_files:
- "data/meetings.json"
annotation_schemes:
# Speaker turns
- annotation_type: audio_annotation
name: turns
description: "Mark each speaker turn"
labels:
- name: Moderator
color: "#EF4444"
keyboard_shortcut: "m"
- name: Participant 1
color: "#3B82F6"
keyboard_shortcut: "1"
- name: Participant 2
color: "#10B981"
keyboard_shortcut: "2"
- name: Participant 3
color: "#F59E0B"
keyboard_shortcut: "3"
- name: Participant 4
color: "#8B5CF6"
keyboard_shortcut: "4"
- name: Unknown
color: "#6B7280"
keyboard_shortcut: "u"
- name: Overlap
color: "#FCD34D"
keyboard_shortcut: "o"
- name: Silence/Noise
color: "#D1D5DB"
keyboard_shortcut: "s"
# Speech type annotation
- annotation_type: radio
name: speech_type
description: "Type of speech"
labels:
- Statement
- Question
- Response
- Interruption
- Backchannel
# Overall quality
- annotation_type: radio
name: recording_quality
description: "Overall recording quality"
labels:
- Excellent - All speakers clear
- Good - Most speech understandable
- Fair - Some difficulty
- Poor - Significant issues输出格式
json
{
"id": "meeting_001",
"audio_path": "/audio/meeting_001.wav",
"annotations": {
"turns": [
{
"start": 0.0,
"end": 5.2,
"label": "Moderator",
"attributes": {
"speech_type": "Statement"
}
},
{
"start": 5.2,
"end": 12.8,
"label": "Participant 1",
"attributes": {
"speech_type": "Response"
}
},
{
"start": 11.5,
"end": 12.8,
"label": "Overlap"
}
],
"recording_quality": "Good - Most speech understandable"
}
}分离标注技巧
- 先听一遍:在标注前先熟悉说话人
- 记录说话人特征:音高、口音、说话风格
- 一致地处理重叠:预先确定处理策略
- 使用速度控制:对困难片段放慢速度
- 标记不确定性:需要时可以使用"未知"
下一步
完整音频文档请见 /docs/features/audio-annotation。