Audio Annotation

Segment audio files and label time regions using waveform visualization.

Potato's audio annotation tool lets annotators segment audio files and assign labels to time regions through a waveform-based interface.

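Features

Waveform visualization
Time-based segment creation
Label assignment to segments
Playback controls with variable speed
Zoom and scroll navigation
Keyboard shortcuts
Server-side waveform caching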

Basic Configuration

yaml

annotation_schemes:
  - name: "speakers"
    description: "Mark when each speaker is talking"
    annotation_type: "audio_annotation"
    labels:
      - name: "Speaker 1"
        color: "#3B82F6"
      - name: "Speaker 2"
        color: "#10B981"

Configuration Options

Field	Type	Default	Description
`name`	string	required	Unique identifier for the annotation
`description`	string	required	Instructions shown to annotators
`annotation_type`	string	required	Must be `"audio_annotation"`
`mode`	string	`"label"`	Annotation mode: `"label"`, `"questions"`, or `"both"`
`labels`	list	conditional	Required in `label` or `both` mode
`segment_schemes`	list	conditional	Required in `questions` or `both` mode
`min_segments`	integer	0	Minimum number of segments required
`max_segments`	integer	null	Maximum number of segments allowed (null = unlimited)
`zoom_enabled`	boolean	true	Enables the zoom controls
`playback_rate_control`	boolean	false	Shows the playback speed selector
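
The conditional rules in the table can be checked before launching a task. The following is a minimal sketch, not Potato's own validation: `check_audio_scheme` is a hypothetical helper that takes one scheme as an already-parsed dictionary and returns a list of problems. The `max_segments` versus `min_segments` comparison is an added sanity check that the table does not state.

```python
from typing import Any

VALID_MODES = {"label", "questions", "both"}


def check_audio_scheme(scheme: dict[str, Any]) -> list[str]:
    """Return the problems found in one audio_annotation scheme (hypothetical helper)."""
    problems: list[str] = []

    # Fields the table marks as required.
    for field in ("name", "description", "annotation_type"):
        if not scheme.get(field):
            problems.append(f"missing required field: {field}")

    annotation_type = scheme.get("annotation_type")
    if annotation_type and annotation_type != "audio_annotation":
        problems.append('annotation_type must be "audio_annotation"')

    # mode defaults to "label" when omitted.
    mode = scheme.get("mode", "label")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode}")
    if mode in ("label", "both") and not scheme.get("labels"):
        problems.append("labels is required in label or both mode")
    if mode in ("questions", "both") and not scheme.get("segment_schemes"):
        problems.append("segment_schemes is required in questions or both mode")

    # Assumption: a maximum below the minimum can never be satisfied.
    min_segments = scheme.get("min_segments", 0)
    max_segments = scheme.get("max_segments")
    if max_segments is not None and max_segments < min_segments:
        problems.append("max_segments is smaller than min_segments")

    return problems
```

An empty list means the scheme satisfies the rules above; it does not guarantee that Potato accepts the configuration.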

Label Configuration

yaml

labels:
  - name: "speech"
    color: "#3B82F6"
    key_value: "1"
  - name: "music"
    color: "#10B981"
    key_value: "2"
  - name: "silence"
    color: "#64748B"
    key_value: "3"

Annotation Modes

Label Mode (Default)

Assign a category label to each segment:

yaml

annotation_schemes:
  - name: "emotion"
    description: "Label the emotion in each segment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "happy"
        color: "#22C55E"
      - name: "sad"
        color: "#3B82F6"
      - name: "angry"
        color: "#EF4444"
      - name: "neutral"
        color: "#64748B"

Questions Mode

Answer a dedicated set of questions for each segment:

yaml

annotation_schemes:
  - name: "transcription"
    description: "Transcribe each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter the transcription"
      - name: "confidence"
        annotation_type: "likert"
        description: "How confident are you?"
        size: 5

Both Mode

Combine labeling with a per-segment questionnaire:

yaml

annotation_schemes:
  - name: "detailed_diarization"
    description: "Label speakers and add notes"
    annotation_type: "audio_annotation"
    mode: "both"
    labels:
      - name: "Speaker A"
        color: "#3B82F6"
      - name: "Speaker B"
        color: "#10B981"
    segment_schemes:
      - name: "notes"
        annotation_type: "text"
        description: "Any notes about this segment?"

Global Audio Settings

Configure waveform processing in the configuration file:

yaml

audio_annotation:
  waveform_cache_dir: "waveform_cache/"
  waveform_look_ahead: 5
  waveform_cache_max_size: 1000
  client_fallback_max_duration: 1800

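Field	Description
`waveform_cache_dir`	Directory for cached waveform data
`waveform_look_ahead`	Number of upcoming instances whose waveforms are precomputed
`waveform_cache_max_size`	Maximum number of cached waveform files
`client_fallback_max_duration`	Maximum audio duration, in seconds, for browser-side waveform generation (default: 1800)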

Examples

Speaker Diarization

yaml

annotation_schemes:
  - name: "diarization"
    description: "Identify who is speaking at each moment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "Interviewer"
        color: "#8B5CF6"
        key_value: "1"
      - name: "Guest"
        color: "#EC4899"
        key_value: "2"
      - name: "Overlap"
        color: "#F59E0B"
        key_value: "3"
    zoom_enabled: true
    playback_rate_control: true

Sound Event Detection

yaml

annotation_schemes:
  - name: "sound_events"
    description: "Mark all sound events"
    annotation_type: "audio_annotation"
    labels:
      - name: "speech"
        color: "#3B82F6"
      - name: "music"
        color: "#10B981"
      - name: "applause"
        color: "#F59E0B"
      - name: "laughter"
        color: "#EC4899"
      - name: "silence"
        color: "#64748B"
    min_segments: 1

Transcription Review

yaml

annotation_schemes:
  - name: "transcription_review"
    description: "Review and correct the transcription for each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter or correct the transcription"
        textarea: true
      - name: "quality"
        annotation_type: "radio"
        description: "Audio quality"
        labels:
          - "Clear"
          - "Noisy"
          - "Unintelligible"

Keyboard Shortcuts

Key	Action
`Space`	Play/pause
`←` / `→`	Seek backward/forward
`[`	Mark segment start
`]`	Mark segment end
`Enter`	Create segment
`Delete`	Delete the selected segment
`1-9`	Select label
`+` / `-`	Zoom in/out
`0`	Fit view

Data Format

Input Data

Each entry in the data file contains the path or URL of an audio file:

json

[
  {
    "id": "audio_001",
    "audio_url": "https://example.com/audio/recording1.mp3"
  },
  {
    "id": "audio_002",
    "audio_url": "/data/audio/recording2.wav"
  }
]

Configure the audio field:

yaml

item_properties:
  id_key: id
  text_key: audio_url
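
A data file in this shape can be generated from a folder of recordings. The sketch below is only an illustration: `build_data_file` is a hypothetical helper, the file stem is used as `id` by assumption, and the extension set mirrors the supported formats listed in this document.

```python
import json
from pathlib import Path

# Extensions matching the supported formats listed in this document.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a"}


def build_data_file(audio_dir: str, output_path: str) -> list[dict[str, str]]:
    """Write a JSON list of {"id", "audio_url"} items for the audio files in audio_dir."""
    items: list[dict[str, str]] = []
    for path in sorted(Path(audio_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            # Assumption: file stems are unique, so they can serve as item ids.
            items.append({"id": path.stem, "audio_url": str(path)})
    Path(output_path).write_text(json.dumps(items, indent=2), encoding="utf-8")
    return items
```

If two files share a stem (for example `a.mp3` and `a.wav`), the ids collide, so a real pipeline would need a different id scheme.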

Output Format

json

{
  "id": "audio_001",
  "annotations": {
    "diarization": [
      {
        "start": 0.0,
        "end": 5.5,
        "label": "Interviewer"
      },
      {
        "start": 5.5,
        "end": 12.3,
        "label": "Guest"
      },
      {
        "start": 12.3,
        "end": 14.0,
        "label": "Overlap"
      }
    ]
  }
}
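
Downstream analysis often starts by totaling time per label. The sketch below assumes a record shaped exactly like the example above, with segments listed under the scheme name; `label_durations` is a hypothetical helper and not part of Potato.

```python
from collections import defaultdict


def label_durations(record: dict, scheme_name: str) -> dict[str, float]:
    """Sum segment lengths in seconds per label for one annotated record."""
    totals: dict[str, float] = defaultdict(float)
    for segment in record["annotations"][scheme_name]:
        totals[segment["label"]] += segment["end"] - segment["start"]
    # Round to milliseconds to hide floating-point noise from the subtraction.
    return {label: round(total, 3) for label, total in totals.items()}


record = {
    "id": "audio_001",
    "annotations": {
        "diarization": [
            {"start": 0.0, "end": 5.5, "label": "Interviewer"},
            {"start": 5.5, "end": 12.3, "label": "Guest"},
            {"start": 12.3, "end": 14.0, "label": "Overlap"},
        ]
    },
}
durations = label_durations(record, "diarization")
```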

In questions mode, each segment carries the answers to its questions alongside its start and end times:

json

{
  "start": 0.0,
  "end": 5.5,
  "transcript": "Hello and welcome to the show.",
  "quality": "Clear"
}
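
Segments in this shape can be turned into a timestamped transcript. This is a minimal sketch under the assumption that answers are keyed by the `segment_schemes` names, as in the example above; `format_timestamp` and `transcript_lines` are hypothetical helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def transcript_lines(segments: list[dict], text_key: str = "transcript") -> list[str]:
    """Return one '[start - end] text' line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda segment: segment["start"])
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s.get(text_key, '')}"
        for s in ordered
    ]
```

Segments without an answer under `text_key` produce a line with empty text rather than raising an error.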

Supported Audio Formats

MP3 (recommended)
WAV
OGG
M4A

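Best Practices

Pre-cache waveforms - use server-side caching for large datasets
Enable playback controls - variable speed helps with precise segmentation
Use keyboard shortcuts - they are much faster than clicking
Define clear boundaries - specify the criteria for where a segment starts and ends
Choose the appropriate mode - use "label" for classification and "questions" for detailed annotation
Set segment limits - use min_segments to ensure coverage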