Audio Annotation

Potato's audio annotation tool lets annotators segment audio files and assign labels to time regions through a waveform-based interface. It displays an interactive waveform with playback controls, speed adjustment, and time-boundary markers.

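Features

Waveform visualization
Time-based segment creation
Label assignment for segments
Variable-speed playback controls
Zoom and scroll navigation
Keyboard shortcuts
Server-side waveform caching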

Basic Configuration

yaml

annotation_schemes:
  - name: "speakers"
    description: "Mark when each speaker is talking"
    annotation_type: "audio_annotation"
    labels:
      - name: "Speaker 1"
        color: "#3B82F6"
      - name: "Speaker 2"
        color: "#10B981"

Configuration Options

Field	Type	Default	Description
`name`	string	Required	Unique identifier for the annotation
`description`	string	Required	Instructions shown to annotators
`annotation_type`	string	Required	Must be `"audio_annotation"`
`mode`	string	`"label"`	Annotation mode: `"label"`, `"questions"`, or `"both"`
`labels`	list	Conditional	Required in `label` or `both` mode
`segment_schemes`	list	Conditional	Required in `questions` or `both` mode
`min_segments`	integer	0	Minimum number of segments required
`max_segments`	integer	null	Maximum number of segments allowed (null = unlimited)
`zoom_enabled`	boolean	true	Enable zoom controls
`playback_rate_control`	boolean	false	Show the playback speed selector
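
The conditional requirements in the table above can be checked before launching a task. The following is a minimal sketch in Python, not Potato's own validator: `check_audio_scheme` is a hypothetical helper that takes one entry of `annotation_schemes` as a plain dictionary and applies only the rules listed in the table.

```python
from typing import Any

VALID_MODES = {"label", "questions", "both"}


def check_audio_scheme(scheme: dict[str, Any]) -> list[str]:
    """Return the problems found in one scheme (hypothetical helper, not a Potato API)."""
    problems = []

    # name, description, and annotation_type are marked as required.
    for field in ("name", "description", "annotation_type"):
        if not scheme.get(field):
            problems.append(f"missing required field: {field}")

    if scheme.get("annotation_type") not in (None, "audio_annotation"):
        problems.append('annotation_type must be "audio_annotation"')

    # mode defaults to "label" when omitted.
    mode = scheme.get("mode", "label")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")

    # labels and segment_schemes are conditionally required, depending on mode.
    if mode in ("label", "both") and not scheme.get("labels"):
        problems.append(f"labels is required in {mode} mode")
    if mode in ("questions", "both") and not scheme.get("segment_schemes"):
        problems.append(f"segment_schemes is required in {mode} mode")

    # max_segments of None (YAML null) means unlimited.
    min_segments = scheme.get("min_segments", 0)
    max_segments = scheme.get("max_segments")
    if max_segments is not None and min_segments > max_segments:
        problems.append("min_segments exceeds max_segments")

    return problems
```

An empty list means the scheme satisfies the rules in the table; it says nothing about options the table does not cover.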

Label Configuration

yaml

labels:
  - name: "speech"
    color: "#3B82F6"
    key_value: "1"
  - name: "music"
    color: "#10B981"
    key_value: "2"
  - name: "silence"
    color: "#64748B"
    key_value: "3"

Annotation Modes

Label Mode (Default)

Each segment is assigned a category label.

yaml

annotation_schemes:
  - name: "emotion"
    description: "Label the emotion in each segment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "happy"
        color: "#22C55E"
      - name: "sad"
        color: "#3B82F6"
      - name: "angry"
        color: "#EF4444"
      - name: "neutral"
        color: "#64748B"

Questions Mode

Annotators answer a dedicated set of questions for each segment.

yaml

annotation_schemes:
  - name: "transcription"
    description: "Transcribe each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter the transcription"
      - name: "confidence"
        annotation_type: "likert"
        description: "How confident are you?"
        size: 5

Both Mode

Combines labeling with per-segment questions.

yaml

annotation_schemes:
  - name: "detailed_diarization"
    description: "Label speakers and add notes"
    annotation_type: "audio_annotation"
    mode: "both"
    labels:
      - name: "Speaker A"
        color: "#3B82F6"
      - name: "Speaker B"
        color: "#10B981"
    segment_schemes:
      - name: "notes"
        annotation_type: "text"
        description: "Any notes about this segment?"

Global Audio Configuration

Configure waveform handling in the configuration file.

yaml

audio_annotation:
  waveform_cache_dir: "waveform_cache/"
  waveform_look_ahead: 5
  waveform_cache_max_size: 1000
  client_fallback_max_duration: 1800

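Field	Description
`waveform_cache_dir`	Directory for storing cached waveform data
`waveform_look_ahead`	Number of upcoming instances to precompute
`waveform_cache_max_size`	Maximum number of cached waveform files
`client_fallback_max_duration`	Maximum duration in seconds for browser-side waveform generation (default: 1800)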

Examples

Speaker Diarization

yaml

annotation_schemes:
  - name: "diarization"
    description: "Identify who is speaking at each moment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "Interviewer"
        color: "#8B5CF6"
        key_value: "1"
      - name: "Guest"
        color: "#EC4899"
        key_value: "2"
      - name: "Overlap"
        color: "#F59E0B"
        key_value: "3"
    zoom_enabled: true
    playback_rate_control: true

Sound Event Detection

yaml

annotation_schemes:
  - name: "sound_events"
    description: "Mark all sound events"
    annotation_type: "audio_annotation"
    labels:
      - name: "speech"
        color: "#3B82F6"
      - name: "music"
        color: "#10B981"
      - name: "applause"
        color: "#F59E0B"
      - name: "laughter"
        color: "#EC4899"
      - name: "silence"
        color: "#64748B"
    min_segments: 1

Transcription Review

yaml

annotation_schemes:
  - name: "transcription_review"
    description: "Review and correct the transcription for each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter or correct the transcription"
        multiline: true
      - name: "quality"
        annotation_type: "radio"
        description: "Audio quality"
        labels:
          - "Clear"
          - "Noisy"
          - "Unintelligible"

Keyboard Shortcuts

Key	Action
`Space`	Play/pause
`←` / `→`	Seek backward/forward
`[`	Mark segment start
`]`	Mark segment end
`Enter`	Create segment
`Delete`	Remove selected segment
`1-9`	Select label
`+` / `-`	Zoom in/out
`0`	Fit to view

Data Format

Input Data

The data file must contain audio file paths or URLs.

json

[
  {
    "id": "audio_001",
    "audio_url": "https://example.com/audio/recording1.mp3"
  },
  {
    "id": "audio_002",
    "audio_url": "/data/audio/recording2.wav"
  }
]

Configure which field holds the audio:

yaml

item_properties:
  id_key: id
  text_key: audio_url

Output Format

json

{
  "id": "audio_001",
  "annotations": {
    "diarization": [
      {
        "start": 0.0,
        "end": 5.5,
        "label": "Interviewer"
      },
      {
        "start": 5.5,
        "end": 12.3,
        "label": "Guest"
      },
      {
        "start": 12.3,
        "end": 14.0,
        "label": "Overlap"
      }
    ]
  }
}
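
Output in this shape is straightforward to post-process. As a rough sketch, assuming each record has the `annotations` structure shown above, the hypothetical helper `seconds_per_label` sums segment durations per label for one scheme.

```python
from collections import defaultdict


def seconds_per_label(record: dict, scheme_name: str) -> dict[str, float]:
    """Sum (end - start) per label for one scheme in one output record (hypothetical helper)."""
    totals: dict[str, float] = defaultdict(float)
    for segment in record["annotations"].get(scheme_name, []):
        totals[segment["label"]] += segment["end"] - segment["start"]
    # Round to milliseconds to hide floating-point noise from the subtraction.
    return {label: round(total, 3) for label, total in totals.items()}
```

Overlapping segments are counted once per label, so the totals can exceed the length of the recording when segments overlap.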

In questions mode, each segment also contains the responses to its questions.

json

{
  "start": 0.0,
  "end": 5.5,
  "transcript": "Hello and welcome to the show.",
  "quality": "Clear"
}
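
Segments of this kind can be turned into a readable, time-ordered transcript. The sketch below assumes the text response is stored under the key `transcript`, as in the example scheme; `format_transcript` is a hypothetical helper, not something Potato provides.

```python
def format_transcript(segments: list[dict], text_key: str = "transcript") -> str:
    """Render segments as '[start-end] text' lines in time order (hypothetical helper)."""
    lines = []
    for segment in sorted(segments, key=lambda s: s["start"]):
        text = segment.get(text_key, "").strip()
        if text:  # skip segments without a transcription
            lines.append(f"[{segment['start']:.2f}-{segment['end']:.2f}] {text}")
    return "\n".join(lines)
```

Other responses stored on the segment, such as `quality`, are ignored here and would need their own handling.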

Supported Audio Formats

MP3 (recommended)
WAV
OGG
M4A

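Best Practices

Pre-cache waveforms - use server-side caching for large datasets
Enable playback controls - variable speed helps with precise segmentation
Use keyboard shortcuts - they are much faster than clicking
Define clear boundaries - specify what counts as the start and end of a segment
Choose the appropriate mode - use "label" for classification and "questions" for detailed annotation
Set segment limits - use min_segments to ensure coverage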
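
Segment limits and boundaries can also be re-checked on exported annotations as a quality-control step. The following is a minimal sketch under the assumption that segments carry `start` and `end` in seconds, as in the output format above; `check_segments` is a hypothetical helper and does not describe how Potato enforces `min_segments` or `max_segments` in the interface.

```python
from typing import Optional


def check_segments(
    segments: list[dict],
    min_segments: int = 0,
    max_segments: Optional[int] = None,
) -> list[str]:
    """Return problems with segment count and boundaries (hypothetical helper)."""
    problems = []
    count = len(segments)

    if count < min_segments:
        problems.append(f"expected at least {min_segments} segment(s), found {count}")
    # max_segments of None means unlimited.
    if max_segments is not None and count > max_segments:
        problems.append(f"expected at most {max_segments} segment(s), found {count}")

    for index, segment in enumerate(segments):
        # A segment must start at or after 0 and end after it starts.
        if segment["start"] < 0 or segment["end"] <= segment["start"]:
            problems.append(
                f"segment {index} has invalid bounds: {segment['start']} to {segment['end']}"
            )
    return problems
```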