Audio Annotation

Annotate audio files with waveform visualization and playback controls.

Potato 2.0 supports audio annotation with waveform visualization powered by Peaks.js, segment labeling, and extensive keyboard shortcuts.

Use Cases

Speech transcription and review
Speaker diarization
Music analysis
Audio event detection
Emotion recognition in speech
Call center quality assurance

Enabling Audio Support

Add a scheme with annotation_type: audio to the annotation_schemes section of your configuration:

yaml

annotation_schemes:
  - annotation_type: audio
    name: audio_segments
    description: "Segment and label the audio"
    labels:
      - Speech
      - Music
      - Silence
      - Noise

Operating Modes

Potato supports three audio annotation modes:

Label Mode

Segment the audio and assign a category label to each segment:

yaml

annotation_schemes:
  - annotation_type: audio
    name: speaker_diarization
    mode: label
    description: "Identify speakers in the audio"
    labels:
      - Speaker A
      - Speaker B
      - Overlap
    label_colors:
      "Speaker A": "#3b82f6"
      "Speaker B": "#10b981"
      "Overlap": "#f59e0b"

Questions Mode

Attach questions to each segment:

yaml

annotation_schemes:
  - annotation_type: audio
    name: speech_quality
    mode: questions
    description: "Evaluate speech segments"
    segment_questions:
      - name: clarity
        type: likert
        size: 5
        min_label: "Unclear"
        max_label: "Very clear"
      - name: emotion
        type: radio
        labels: [Neutral, Happy, Sad, Angry]

Combined Mode

Combine labeling with per-segment questions:

yaml

annotation_schemes:
  - annotation_type: audio
    name: full_analysis
    mode: both
    description: "Label and analyze audio segments"
    labels:
      - Speech
      - Music
      - Noise
    segment_questions:
      - name: quality
        type: likert
        size: 5

Configuration Options

Basic Setup

yaml

annotation_schemes:
  - annotation_type: audio
    name: segments
    description: "Create audio segments"
    labels:
      - Label A
      - Label B
 
    # Optional constraints
    min_segments: 1
    max_segments: 50
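
The min_segments and max_segments constraints bound how many segments an item may carry. The source does not describe how Potato enforces them, so the following is only a minimal sketch for checking exported segment lists yourself. It assumes both bounds are inclusive, and check_segment_count is a hypothetical helper, not part of Potato.

```python
def check_segment_count(segments, min_segments=None, max_segments=None):
    """Return an error message if the segment count is out of bounds, else None.

    Assumption: both bounds are inclusive; a bound of None means "no limit".
    """
    count = len(segments)
    if min_segments is not None and count < min_segments:
        return f"expected at least {min_segments} segment(s), got {count}"
    if max_segments is not None and count > max_segments:
        return f"expected at most {max_segments} segment(s), got {count}"
    return None


if __name__ == "__main__":
    # An item without segments violates min_segments: 1.
    print(check_segment_count([], min_segments=1, max_segments=50))
```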

Keyboard Shortcuts

Labels can be assigned with the number keys 1–9, in the order they are listed:

yaml

annotation_schemes:
  - annotation_type: audio
    name: speakers
    labels:
      - Speaker A  # Press 1
      - Speaker B  # Press 2
      - Overlap    # Press 3
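
As the comments in the configuration indicate, number keys follow the order of the labels list. A small Python sketch of that mapping, assuming labels beyond the ninth receive no number key (shortcut_map is a hypothetical helper for illustration, not a Potato function):

```python
def shortcut_map(labels):
    """Map the keys "1" through "9" to labels in list order (illustrative only)."""
    return {str(key): label for key, label in enumerate(labels[:9], start=1)}


if __name__ == "__main__":
    print(shortcut_map(["Speaker A", "Speaker B", "Overlap"]))
```

Ordering the most frequent labels first keeps them on the easiest keys to reach.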

Label Colors

Customize segment colors:

yaml

annotation_schemes:
  - annotation_type: audio
    name: segments
    labels:
      - Speech
      - Music
      - Silence
    label_colors:
      "Speech": "#3b82f6"
      "Music": "#10b981"
      "Silence": "#6b7280"

Waveform Performance

For best performance with long audio files, install the BBC audiowaveform tool:

bash

# macOS
brew install audiowaveform
 
# Ubuntu/Debian
sudo apt-get install audiowaveform
 
# Or build from source
# https://github.com/bbc/audiowaveform

This enables server-side waveform generation. Without the tool, Potato falls back to client-side generation, which is suitable for files shorter than 30 minutes.
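
Before launching a project, it can help to check which of the two paths a recording would take. The sketch below is a pre-flight check under stated assumptions, not Potato's internal logic: it looks for an audiowaveform executable on PATH and applies the 30-minute limit quoted above. The function name and its return values are hypothetical.

```python
import shutil

CLIENT_FALLBACK_MAX_SECONDS = 1800  # 30 minutes, the limit quoted in the text


def waveform_strategy(duration_seconds, tool_available=None):
    """Suggest how a waveform would be generated for a file of the given length.

    tool_available may be passed explicitly; when None, PATH is searched for
    an executable named "audiowaveform".
    """
    if tool_available is None:
        tool_available = shutil.which("audiowaveform") is not None
    if tool_available:
        return "server"
    if duration_seconds < CLIENT_FALLBACK_MAX_SECONDS:
        return "client"
    return "install-audiowaveform"


if __name__ == "__main__":
    # A two-hour recording on the current machine.
    print(waveform_strategy(2 * 60 * 60))
```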

Waveform Caching

Configure caching for better performance:

yaml

audio_config:
  cache_dir: "audio_cache/"
  precompute_depth: 100  # Pre-generate waveforms for first N items
  client_fallback_max_duration: 1800  # 30 minutes in seconds

Data Format

Simple Audio Reference

json

[
  {"id": "1", "audio_path": "audio/recording_001.wav"},
  {"id": "2", "audio_path": "audio/recording_002.wav"}
]

yaml

data_files:
  - "data/audio_data.json"
 
item_properties:
  id_key: id
  audio_key: audio_path
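
For larger collections, the data file can be generated rather than written by hand. The following sketch builds records with the id and audio_path keys used above. The sequential numbering scheme and the build_items helper are assumptions made for illustration; any unique id works.

```python
import json
from pathlib import Path


def build_items(audio_paths):
    """Return one {"id", "audio_path"} record per path, numbered in sorted order."""
    ordered = sorted(Path(p).as_posix() for p in audio_paths)
    return [
        {"id": str(number), "audio_path": path}
        for number, path in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Literal paths for the example; Path("audio").glob("*.wav") would work as well.
    items = build_items(["audio/recording_002.wav", "audio/recording_001.wav"])
    print(json.dumps(items, indent=2))
```

Writing the printed JSON to data/audio_data.json yields a file that matches the data_files entry above.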

With Transcripts

json

[
  {
    "id": "1",
    "audio_path": "audio/call_001.wav",
    "transcript": "Hello, how can I help you today?"
  }
]

Output Format

Annotations are saved with segment timestamps:

json

{
  "id": "audio_1",
  "annotations": {
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "label": "Speaker A",
        "questions": {
          "clarity": 4,
          "emotion": "Neutral"
        }
      },
      {
        "start": 2.5,
        "end": 5.2,
        "label": "Speaker B"
      }
    ]
  }
}
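
A record in this shape is straightforward to post-process. As a sketch, the function below totals the annotated time per label for one record. It assumes start and end are in seconds and groups segments without a label (as in questions mode) under a placeholder key. seconds_per_label is a hypothetical helper.

```python
from collections import defaultdict


def seconds_per_label(record):
    """Sum segment durations per label for one annotation record."""
    totals = defaultdict(float)
    for segment in record["annotations"]["segments"]:
        label = segment.get("label", "(unlabeled)")
        totals[label] += segment["end"] - segment["start"]
    return dict(totals)


if __name__ == "__main__":
    record = {
        "id": "audio_1",
        "annotations": {
            "segments": [
                {"start": 0.0, "end": 2.5, "label": "Speaker A"},
                {"start": 2.5, "end": 5.2, "label": "Speaker B"},
            ]
        },
    }
    for label, seconds in seconds_per_label(record).items():
        print(f"{label}: {seconds:.1f} s")
```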

Keyboard Shortcuts

Potato provides extensive keyboard shortcuts for efficient annotation:

Shortcut	Action
`Space`	Play/pause
`[`	Set segment start at the current position
`]`	Set segment end at the current position
`1–9`	Assign a label to the current segment
`Delete`	Remove the current segment
`Left Arrow`	Seek back 5 seconds
`Right Arrow`	Seek forward 5 seconds
`Up Arrow`	Zoom in
`Down Arrow`	Zoom out
`Home`	Jump to the start
`End`	Jump to the end
`+`	Increase playback speed
`-`	Decrease playback speed

Example Configurations

Speaker Diarization

yaml

task_name: "Speaker Diarization"
task_dir: "."
port: 8000
 
data_files:
  - "data/recordings.json"
 
item_properties:
  id_key: id
  audio_key: audio_path
 
annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    description: "Identify who is speaking"
    labels:
      - Speaker 1
      - Speaker 2
      - Speaker 3
      - Overlap
      - Silence
    label_colors:
      "Speaker 1": "#3b82f6"
      "Speaker 2": "#10b981"
      "Speaker 3": "#f59e0b"
      "Overlap": "#ef4444"
      "Silence": "#6b7280"
    min_segments: 1
 
audio_config:
  cache_dir: "audio_cache/"
  precompute_depth: 50
 
output_annotation_dir: "output/"
output_annotation_format: "json"
allow_all_users: true

Transcription Review

yaml

task_name: "Transcription Quality Review"
task_dir: "."
port: 8000
 
data_files:
  - "data/transcripts.json"
 
item_properties:
  id_key: id
  text_key: transcript
  audio_key: audio_path
 
annotation_schemes:
  - annotation_type: audio
    name: errors
    mode: questions
    description: "Mark transcription errors"
    segment_questions:
      - name: error_type
        type: radio
        labels:
          - Missing word
          - Wrong word
          - Extra word
          - Spelling error
      - name: severity
        type: likert
        size: 3
        min_label: "Minor"
        max_label: "Major"
 
  - annotation_type: radio
    name: overall_accuracy
    description: "Overall transcript accuracy"
    labels:
      - Accurate
      - Minor errors
      - Major errors
      - Unusable
 
output_annotation_dir: "output/"
output_annotation_format: "json"

Call Center Quality Assurance

yaml

task_name: "Call Center Quality Assurance"
task_dir: "."
port: 8000
 
data_files:
  - "data/calls.json"
 
item_properties:
  id_key: call_id
  audio_key: recording_path
 
annotation_schemes:
  # Segment-level annotation
  - annotation_type: audio
    name: conversation
    mode: both
    description: "Segment the conversation"
    labels:
      - Agent
      - Customer
      - Hold
      - Silence
    segment_questions:
      - name: sentiment
        type: radio
        labels: [Positive, Neutral, Negative, Frustrated]
 
  # Call-level assessment
  - annotation_type: likert
    name: professionalism
    description: "Agent professionalism"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
 
  - annotation_type: likert
    name: resolution
    description: "Issue resolution"
    size: 5
    min_label: "Unresolved"
    max_label: "Fully resolved"
 
  - annotation_type: multiselect
    name: issues
    description: "Select any issues observed"
    labels:
      - Long hold time
      - Agent interrupted
      - Incorrect information
      - Missing greeting
      - Unprofessional language
 
  - annotation_type: text
    name: notes
    description: "Additional observations"
    textarea: true
 
output_annotation_dir: "output/"
output_annotation_format: "json"

Supported Audio Formats

WAV (recommended for best quality)
MP3
OGG
FLAC
M4A
WebM

Performance Tips

Install audiowaveform: essential for long audio files
Enable caching: use cache_dir to store precomputed waveforms
Use WAV for quality: compressed formats can introduce artifacts
Preprocess audio: normalize levels and remove unnecessary silence
Watch file sizes: large files slow down loading
Use precomputation: pre-generate waveforms for the first items

Troubleshooting

Waveform Does Not Load

Check that the audio file path is correct
Make sure the file format is supported
Install audiowaveform for long files
Check the browser console for errors

Slow Performance

Install the audiowaveform tool
Enable waveform caching
Reduce audio file sizes
Set precompute_depth to pre-generate waveforms

Segments Are Not Saved

Make sure the output directory is writable
Check the annotation format configuration
Make sure each segment has both a start and an end time
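
The last check can be automated against exported or intercepted segment lists. This is a minimal sketch, assuming segments use the start and end keys shown in the output format and that a valid segment satisfies 0 <= start < end. invalid_segments is a hypothetical helper.

```python
def invalid_segments(segments):
    """Return the indices of segments that lack a start or end, or have end <= start."""
    bad = []
    for index, segment in enumerate(segments):
        start = segment.get("start")
        end = segment.get("end")
        if start is None or end is None or not (0 <= start < end):
            bad.append(index)
    return bad


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 2.5, "label": "Speaker A"},
        {"start": 2.5, "label": "Speaker B"},  # missing end time
    ]
    print(invalid_segments(segments))
```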