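Speaker diarization answers the question "who spoke when?" This tutorial covers how to build an interface for annotating speaker turns, correcting automatic diarization output, and handling conversations with multiple speakers. For the full audio schema reference, see the original documentation.

What is speaker diarization?

Speaker diarization splits audio into segments that each belong to a single speaker. Typical uses include:

Meeting transcription
Call center analytics
Podcast production
Interview processing
Courtroom and legal recordings

Basic diarization setup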

yaml

annotation_task_name: "Speaker Diarization"
 
data_files:
  - "data/conversations.json"
 
annotation_schemes:
  - annotation_type: audio_annotation
    name: speakers
    description: "Mark when each speaker talks"
    labels:
      - name: Speaker 1
        color: "#FF6B6B"
        keyboard_shortcut: "1"
      - name: Speaker 2
        color: "#4ECDC4"
        keyboard_shortcut: "2"
      - name: Speaker 3
        color: "#45B7D1"
        keyboard_shortcut: "3"
      - name: Overlap
        color: "#FFEAA7"
        keyboard_shortcut: "o"
      - name: Silence
        color: "#9CA3AF"
        keyboard_shortcut: "s"

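Creating speaker segments

Workflow

Play the audio, or click the waveform to jump to a position
Click and drag on the waveform to select a time range
Press a number key or click a speaker label
The segment is colored and labeled accordingly
Drag the segment edges to adjust its boundaries
Repeat until the entire audio is divided into segments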
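
The last step, covering the whole recording, is easy to get wrong by hand. Below is a minimal sketch of a coverage check that could be run on exported segments. It assumes each segment is a dict with `start` and `end` in seconds and that the total duration is known; `find_gaps` and its `tolerance` parameter are hypothetical helpers for illustration, not part of Potato.

```python
def find_gaps(segments, duration, tolerance=0.01):
    """Return (start, end) ranges of the audio that no segment covers.

    segments  : list of dicts with "start" and "end" in seconds (assumed shape)
    duration  : total length of the audio in seconds
    tolerance : gaps shorter than this many seconds are ignored
    """
    gaps = []
    cursor = 0.0  # end of the region covered so far
    for seg in sorted(segments, key=lambda s: s["start"]):
        if seg["start"] - cursor > tolerance:
            gaps.append((cursor, seg["start"]))
        cursor = max(cursor, seg["end"])
    if duration - cursor > tolerance:
        gaps.append((cursor, duration))
    return gaps


segments = [
    {"start": 0.0, "end": 3.5},
    {"start": 3.5, "end": 8.2},
    {"start": 9.0, "end": 12.0},  # leaves 8.2 to 9.0 unlabeled
]
for gap_start, gap_end in find_gaps(segments, duration=12.0):
    print(f"unlabeled audio from {gap_start}s to {gap_end}s")
```

If silence is labeled explicitly, as with the Silence label in the basic setup, any gap reported here points to audio that was skipped.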

Keyboard controls

Potato ships with built-in keyboard shortcuts for audio playback, including play/pause and navigation.

Correcting pre-annotated diarization

In many cases you are not labeling from scratch but correcting the output of an automatic diarizer.

yaml

data_files:
  - "data/auto_diarized.json"

Data format:

json

{
  "id": "meeting_001",
  "audio_path": "/audio/meeting_001.wav",
  "auto_segments": [
    {"start": 0.0, "end": 3.5, "speaker": "Speaker 1"},
    {"start": 3.5, "end": 8.2, "speaker": "Speaker 2"},
    {"start": 8.2, "end": 12.0, "speaker": "Speaker 1"}
  ]
}
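
Before loading automatic output for correction, it can help to check that the segments are well formed. The sketch below assumes the record shape shown above (`auto_segments` entries with `start`, `end`, and `speaker`). `validate_segments` is a hypothetical helper, not a Potato function, and whether overlapping automatic segments count as an error depends on your diarizer.

```python
def validate_segments(segments):
    """Return a list of problems found in a list of segment dicts.

    Each segment is assumed to have "start", "end", and "speaker" keys,
    as in the data format shown above.
    """
    problems = []
    ordered = sorted(segments, key=lambda s: s["start"])
    # A segment must have positive length.
    for seg in ordered:
        if seg["end"] <= seg["start"]:
            problems.append(f"empty or reversed segment: {seg}")
    # Consecutive segments may touch but are flagged if they overlap.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["start"] < prev["end"]:
            problems.append(f"segments overlap: {prev} and {cur}")
    return problems


record = {
    "id": "meeting_001",
    "audio_path": "/audio/meeting_001.wav",
    "auto_segments": [
        {"start": 0.0, "end": 3.5, "speaker": "Speaker 1"},
        {"start": 3.5, "end": 8.2, "speaker": "Speaker 2"},
        {"start": 8.2, "end": 12.0, "speaker": "Speaker 1"},
    ],
}
for problem in validate_segments(record["auto_segments"]):
    print(problem)
```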

Detailed speaker information

Collect additional metadata about each speaker.

yaml

annotation_schemes:
  - annotation_type: audio_annotation
    name: speakers
    labels:
      - name: Speaker A
        color: "#FF6B6B"
      - name: Speaker B
        color: "#4ECDC4"
      - name: Speaker C
        color: "#45B7D1"
      - name: Unknown
        color: "#9CA3AF"
 
  # Speaker characteristics
  - annotation_type: radio
    name: speaker_a_gender
    description: "Speaker A Gender"
    labels:
      - Male
      - Female
      - Unknown
 
  - annotation_type: text
    name: speaker_a_role
    description: "Speaker A Role (if identifiable)"
 
  - annotation_type: radio
    name: speaker_b_gender
    description: "Speaker B Gender"
    labels:
      - Male
      - Female
      - Unknown

Handling overlapping speech

yaml

annotation_schemes:
  - annotation_type: audio_annotation
    name: speakers
    labels:
      - name: Speaker 1
        color: "#FF6B6B"
      - name: Speaker 2
        color: "#4ECDC4"
      - name: Overlap
        color: "#FFEAA7"

Meeting/interview diarization

yaml

annotation_task_name: "Meeting Diarization"
 
data_files:
  - "data/meetings.json"
 
annotation_schemes:
  # Speaker turns
  - annotation_type: audio_annotation
    name: turns
    description: "Mark each speaker turn"
    labels:
      - name: Moderator
        color: "#EF4444"
        keyboard_shortcut: "m"
      - name: Participant 1
        color: "#3B82F6"
        keyboard_shortcut: "1"
      - name: Participant 2
        color: "#10B981"
        keyboard_shortcut: "2"
      - name: Participant 3
        color: "#F59E0B"
        keyboard_shortcut: "3"
      - name: Participant 4
        color: "#8B5CF6"
        keyboard_shortcut: "4"
      - name: Unknown
        color: "#6B7280"
        keyboard_shortcut: "u"
      - name: Overlap
        color: "#FCD34D"
        keyboard_shortcut: "o"
      - name: Silence/Noise
        color: "#D1D5DB"
        keyboard_shortcut: "s"
 
  # Speech type annotation
  - annotation_type: radio
    name: speech_type
    description: "Type of speech"
    labels:
      - Statement
      - Question
      - Response
      - Interruption
      - Backchannel
 
  # Overall quality
  - annotation_type: radio
    name: recording_quality
    description: "Overall recording quality"
    labels:
      - Excellent - All speakers clear
      - Good - Most speech understandable
      - Fair - Some difficulty
      - Poor - Significant issues

Output format

json

{
  "id": "meeting_001",
  "audio_path": "/audio/meeting_001.wav",
  "annotations": {
    "turns": [
      {
        "start": 0.0,
        "end": 5.2,
        "label": "Moderator",
        "attributes": {
          "speech_type": "Statement"
        }
      },
      {
        "start": 5.2,
        "end": 12.8,
        "label": "Participant 1",
        "attributes": {
          "speech_type": "Response"
        }
      },
      {
        "start": 11.5,
        "end": 12.8,
        "label": "Overlap"
      }
    ],
    "recording_quality": "Good - Most speech understandable"
  }
}
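
As an illustration of working with this output, the sketch below totals the time assigned to each label. It assumes the record shape shown above, with segments stored under `annotations` and the scheme name `turns`. `talk_time_by_label` is a hypothetical helper, not part of Potato.

```python
from collections import defaultdict


def talk_time_by_label(record, scheme="turns"):
    """Sum segment durations in seconds for each label in one scheme."""
    totals = defaultdict(float)
    for seg in record["annotations"][scheme]:
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)


record = {
    "id": "meeting_001",
    "audio_path": "/audio/meeting_001.wav",
    "annotations": {
        "turns": [
            {"start": 0.0, "end": 5.2, "label": "Moderator",
             "attributes": {"speech_type": "Statement"}},
            {"start": 5.2, "end": 12.8, "label": "Participant 1",
             "attributes": {"speech_type": "Response"}},
            {"start": 11.5, "end": 12.8, "label": "Overlap"},
        ],
        "recording_quality": "Good - Most speech understandable",
    },
}
for label, seconds in talk_time_by_label(record).items():
    print(f"{label}: {seconds:.1f}s")
```

Note that in this example the Overlap segment covers time that is also counted for Participant 1, so the totals can add up to more than the audio duration.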

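Tips for speaker diarization

Listen first: get familiar with the speakers before you annotate
Note speaker characteristics: pitch, accent, speaking style
Handle overlap consistently: decide on a strategy at the start
Use speed control: slow playback down for difficult passages
Mark uncertainty: it is fine to use "Unknown" when needed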

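Next steps

Combine diarization with transcription to produce complete meeting records
Add emotion detection per speaker
Set up quality control based on inter-annotator agreement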
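
As a starting point for the agreement check, here is a minimal sketch that compares two annotators by sampling the audio at fixed time steps and counting how often their labels match. It assumes segments are dicts with `start`, `end`, and `label`; `label_at` and `frame_agreement` are hypothetical helpers, and the result is raw agreement, not a chance-corrected measure.

```python
def label_at(segments, t):
    """Return the label of the first segment covering time t, or None."""
    for seg in segments:
        if seg["start"] <= t < seg["end"]:
            return seg["label"]
    return None


def frame_agreement(a, b, duration, step=0.1):
    """Fraction of sampled time points where both annotators agree.

    Points are sampled at the midpoint of each step-sized frame.
    """
    n_frames = int(duration / step)
    if n_frames == 0:
        return 1.0
    matches = sum(
        label_at(a, (i + 0.5) * step) == label_at(b, (i + 0.5) * step)
        for i in range(n_frames)
    )
    return matches / n_frames


annotator_a = [
    {"start": 0.0, "end": 5.0, "label": "Speaker 1"},
    {"start": 5.0, "end": 10.0, "label": "Speaker 2"},
]
annotator_b = [
    {"start": 0.0, "end": 6.0, "label": "Speaker 1"},
    {"start": 6.0, "end": 10.0, "label": "Speaker 2"},
]
print(frame_agreement(annotator_a, annotator_b, duration=10.0, step=1.0))
```

This comparison only works if both annotators use the same label names for the same people, which is worth fixing in the annotation guidelines.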

For the full audio documentation, see /docs/features/audio-annotation.