Audio Annotation
Annotate audio files with waveform visualization and playback controls.
Potato 2.0 supports audio annotation with waveform visualization powered by Peaks.js, segment labeling, and a full set of keyboard shortcuts.
Use Cases
- Speech transcription and review
- Speaker diarization
- Music analysis
- Audio event detection
- Emotion recognition in speech
- Call center quality assurance
Enabling Audio Support
Add a scheme with annotation_type: audio under annotation_schemes in your configuration:
annotation_schemes:
  - annotation_type: audio
    name: audio_segments
    description: "Segment and label the audio"
    labels:
      - Speech
      - Music
      - Silence
      - Noise
Operational Modes
Potato supports three audio annotation modes:
Label Mode
Segment the audio and assign a category label to each segment:
annotation_schemes:
  - annotation_type: audio
    name: speaker_diarization
    mode: label
    description: "Identify speakers in the audio"
    labels:
      - Speaker A
      - Speaker B
      - Overlap
    label_colors:
      "Speaker A": "#3b82f6"
      "Speaker B": "#10b981"
      "Overlap": "#f59e0b"
Questions Mode
Attach annotation questions to each segment:
annotation_schemes:
  - annotation_type: audio
    name: speech_quality
    mode: questions
    description: "Evaluate speech segments"
    segment_questions:
      - name: clarity
        type: likert
        size: 5
        min_label: "Unclear"
        max_label: "Very clear"
      - name: emotion
        type: radio
        labels: [Neutral, Happy, Sad, Angry]
Both Mode
Combine segment labeling with per-segment questions:
annotation_schemes:
  - annotation_type: audio
    name: full_analysis
    mode: both
    description: "Label and analyze audio segments"
    labels:
      - Speech
      - Music
      - Noise
    segment_questions:
      - name: quality
        type: likert
        size: 5
Configuration Options
Basic Setup
annotation_schemes:
  - annotation_type: audio
    name: segments
    description: "Create audio segments"
    labels:
      - Label A
      - Label B
    # Optional constraints
    min_segments: 1
    max_segments: 50
Label Keyboard Shortcuts
Labels can be assigned with the number keys 1-9, in the order they are listed:
annotation_schemes:
  - annotation_type: audio
    name: speakers
    labels:
      - Speaker A  # Press 1
      - Speaker B  # Press 2
      - Overlap    # Press 3
Label Colors
Customize segment colors:
annotation_schemes:
  - annotation_type: audio
    name: segments
    labels:
      - Speech
      - Music
      - Silence
    label_colors:
      "Speech": "#3b82f6"
      "Music": "#10b981"
      "Silence": "#6b7280"
Waveform Performance
For best performance with long audio files, install the BBC audiowaveform tool:
# macOS
brew install audiowaveform
# Ubuntu/Debian
sudo apt-get install audiowaveform
# Or build from source
# https://github.com/bbc/audiowaveform
This enables server-side waveform generation. Without it, Potato falls back to client-side generation, which is suitable for files shorter than 30 minutes.
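The fallback described above can be sketched as a small decision helper. This is not Potato's code. The function names are hypothetical, the 1800-second limit mirrors the 30-minute figure, and the duration is read with Python's standard wave module, so this sketch handles PCM WAV only. The documentation does not say what happens to longer files when audiowaveform is missing, so the sketch only flags them.

```python
import shutil
import wave
from typing import Optional

# Hypothetical constant: the 30-minute client-side limit, in seconds
CLIENT_FALLBACK_MAX_DURATION = 1800


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds, using only the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_waveform_source(duration_s: float,
                           has_audiowaveform: Optional[bool] = None) -> str:
    """Hypothetical helper: decide where a waveform would be generated.

    Returns "server" when audiowaveform is available, "client" for files
    within the client-side limit, and "client-long" to flag longer files.
    """
    if has_audiowaveform is None:
        # Look for the BBC audiowaveform binary on PATH
        has_audiowaveform = shutil.which("audiowaveform") is not None
    if has_audiowaveform:
        return "server"
    if duration_s <= CLIENT_FALLBACK_MAX_DURATION:
        return "client"
    return "client-long"
```

For example, choose_waveform_source(wav_duration_seconds("audio/recording_001.wav")) checks PATH and the file length in one call. Passing has_audiowaveform explicitly keeps the helper easy to test.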
Waveform Caching
Configure caching to improve performance:
audio_config:
  cache_dir: "audio_cache/"
  precompute_depth: 100  # Pre-generate waveforms for the first N items
  client_fallback_max_duration: 1800  # 30 minutes, in seconds
Data Format
Simple Audio Reference
[
  {"id": "1", "audio_path": "audio/recording_001.wav"},
  {"id": "2", "audio_path": "audio/recording_002.wav"}
]
Point the configuration at the data file and map the item keys:
data_files:
  - "data/audio_data.json"
item_properties:
  id_key: id
  audio_key: audio_path
With Transcripts
[
  {
    "id": "1",
    "audio_path": "audio/call_001.wav",
    "transcript": "Hello, how can I help you today?"
  }
]
Output Format
Annotations are saved with the start and end timestamps of each segment:
{
  "id": "audio_1",
  "annotations": {
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "label": "Speaker A",
        "questions": {
          "clarity": 4,
          "emotion": "Neutral"
        }
      },
      {
        "start": 2.5,
        "end": 5.2,
        "label": "Speaker B"
      }
    ]
  }
}
Keyboard Shortcuts
Potato provides keyboard shortcuts for efficient annotation:
| Shortcut | Action |
|---|---|
| Space | Play/Pause |
| [ | Set segment start at the current position |
| ] | Set segment end at the current position |
| 1-9 | Assign a label to the current segment |
| Delete | Delete the current segment |
| Left Arrow | Seek back 5 seconds |
| Right Arrow | Seek forward 5 seconds |
| Up Arrow | Zoom in |
| Down Arrow | Zoom out |
| Home | Jump to the start |
| End | Jump to the end |
| + | Increase playback speed |
| - | Decrease playback speed |
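The 1-9 shortcut follows the order in which labels are listed in the scheme, as the "Press 1", "Press 2" comments in the label configuration indicate. Below is a minimal sketch of that mapping. The function names are hypothetical. Dropping number keys for labels after the ninth is an assumption, not documented behavior.

```python
from typing import Dict, List, Optional


def number_key_bindings(labels: List[str]) -> Dict[str, str]:
    """Map keys "1".."9" to labels in configuration order.

    Assumption: labels after the ninth have no number key.
    """
    return {str(i): label for i, label in enumerate(labels[:9], start=1)}


def label_for_key(labels: List[str], key: str) -> Optional[str]:
    """Return the label a number key would assign, or None if the key is unbound."""
    return number_key_bindings(labels).get(key)


speakers = ["Speaker A", "Speaker B", "Overlap"]
print(number_key_bindings(speakers))
# {'1': 'Speaker A', '2': 'Speaker B', '3': 'Overlap'}
print(label_for_key(speakers, "4"))  # None
```

Because the binding depends only on list position, reordering labels in the YAML also changes which key assigns which label.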
Example Configurations
Speaker Diarization
task_name: "Speaker Diarization"
task_dir: "."
port: 8000
data_files:
  - "data/recordings.json"
item_properties:
  id_key: id
  audio_key: audio_path
annotation_schemes:
  - annotation_type: audio
    name: speakers
    mode: label
    description: "Identify who is speaking"
    labels:
      - Speaker 1
      - Speaker 2
      - Speaker 3
      - Overlap
      - Silence
    label_colors:
      "Speaker 1": "#3b82f6"
      "Speaker 2": "#10b981"
      "Speaker 3": "#f59e0b"
      "Overlap": "#ef4444"
      "Silence": "#6b7280"
    min_segments: 1
audio_config:
  cache_dir: "audio_cache/"
  precompute_depth: 50
output_annotation_dir: "output/"
output_annotation_format: "json"
allow_all_users: true
Transcription Review
task_name: "Transcription Quality Review"
task_dir: "."
port: 8000
data_files:
  - "data/transcripts.json"
item_properties:
  id_key: id
  text_key: transcript
  audio_key: audio_path
annotation_schemes:
  - annotation_type: audio
    name: errors
    mode: questions
    description: "Mark transcription errors"
    segment_questions:
      - name: error_type
        type: radio
        labels:
          - Missing word
          - Wrong word
          - Extra word
          - Spelling error
      - name: severity
        type: likert
        size: 3
        min_label: "Minor"
        max_label: "Major"
  - annotation_type: radio
    name: overall_accuracy
    description: "Overall transcript accuracy"
    labels:
      - Accurate
      - Minor errors
      - Major errors
      - Unusable
output_annotation_dir: "output/"
output_annotation_format: "json"
Call Center QA
task_name: "Call Center Quality Assurance"
task_dir: "."
port: 8000
data_files:
  - "data/calls.json"
item_properties:
  id_key: call_id
  audio_key: recording_path
annotation_schemes:
  # Segment-level annotation
  - annotation_type: audio
    name: conversation
    mode: both
    description: "Segment the conversation"
    labels:
      - Agent
      - Customer
      - Hold
      - Silence
    segment_questions:
      - name: sentiment
        type: radio
        labels: [Positive, Neutral, Negative, Frustrated]
  # Call-level assessment
  - annotation_type: likert
    name: professionalism
    description: "Agent professionalism"
    size: 5
    min_label: "Poor"
    max_label: "Excellent"
  - annotation_type: likert
    name: resolution
    description: "Issue resolution"
    size: 5
    min_label: "Unresolved"
    max_label: "Fully resolved"
  - annotation_type: multiselect
    name: issues
    description: "Select any issues observed"
    labels:
      - Long hold time
      - Agent interrupted
      - Incorrect information
      - Missing greeting
      - Unprofessional language
  - annotation_type: text
    name: notes
    description: "Additional observations"
    textarea: true
output_annotation_dir: "output/"
output_annotation_format: "json"
Supported Audio Formats
- WAV (recommended for best quality)
- MP3
- OGG
- FLAC
- M4A
- WebM
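To build a data file in the simple audio reference format from the Data Format section, a short script can scan a folder for the extensions listed above. This is a sketch only. build_audio_items and write_data_file are hypothetical names, and assigning sequential string IDs is an assumption. Paths are written relative to the directory the script runs from, on the assumption that Potato resolves audio_path the same way relative to your task_dir.

```python
import json
from pathlib import Path
from typing import Dict, List

# Extensions from the supported-formats list (compared case-insensitively)
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a", ".webm"}


def build_audio_items(audio_dir: str) -> List[Dict[str, str]]:
    """Return one {"id", "audio_path"} item per supported file, sorted by path."""
    files = sorted(
        p for p in Path(audio_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
    return [
        {"id": str(i), "audio_path": p.as_posix()}
        for i, p in enumerate(files, start=1)
    ]


def write_data_file(audio_dir: str, out_path: str) -> int:
    """Write the items as a JSON list and return how many were written."""
    items = build_audio_items(audio_dir)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=2)
    return len(items)
```

Running write_data_file("audio", "data/audio_data.json") from the task directory would produce entries like {"id": "1", "audio_path": "audio/recording_001.wav"}, which matches the data_files and item_properties settings shown earlier.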
Performance Tips
- Install audiowaveform - essential for long audio files
- Enable caching - use cache_dir to store pre-generated waveforms
- Use WAV for quality - compressed formats can introduce artifacts
- Preprocess audio - normalize levels and trim unnecessary silence
- Mind file sizes - large files load more slowly
- Use precompute - pre-generate waveforms for the first instances with precompute_depth
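For the normalization tip, here is a minimal peak-normalization sketch that uses only the Python standard library. It assumes 16-bit PCM WAV input. The function name and the 0.9 default target are illustrative choices, and the sketch does not trim silence.

```python
import array
import sys
import wave


def peak_normalize_wav(src_path: str, dst_path: str, target_peak: float = 0.9) -> float:
    """Scale a 16-bit PCM WAV so its loudest sample reaches target_peak of full scale.

    Returns the gain that was applied.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frames = src.readframes(params.nframes)

    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    gain = 1.0 if peak == 0 else (target_peak * 32767) / peak  # silent file: leave as-is

    # Apply the gain and clip to the int16 range
    scaled = array.array(
        "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
    )
    if sys.byteorder == "big":
        scaled.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(scaled.tobytes())
    return gain
```

Peak normalization only makes levels consistent across files. It does not equalize perceived loudness, which may matter for tasks like emotion labeling.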
Troubleshooting
Waveform Not Loading
- Check that the audio file path is correct
- Verify that the file format is supported
- Install audiowaveform for long files
- Check the browser console for errors
Slow Performance
- Install the audiowaveform tool
- Enable waveform caching
- Reduce audio file sizes
- Use the precompute_depth setting
Segments Not Saving
- Make sure the output directory is writable
- Check the annotation format configuration
- Verify that each segment has both a start and an end time
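The last two checks can be scripted against saved output. This sketch assumes the record layout shown under Output Format, where record["annotations"]["segments"] holds a list of segments with start and end keys. The key is a parameter because this page does not say whether it always equals "segments" or follows the scheme name. The function names are hypothetical. Treating start >= end as an error is an extra assumption beyond the checklist above.

```python
import os
from typing import Any, Dict, List


def find_segment_problems(record: Dict[str, Any], key: str = "segments") -> List[str]:
    """List problems with the segments in one output record.

    Assumes record["annotations"][key] is a list of dicts with "start" and "end".
    """
    problems = []
    segments = record.get("annotations", {}).get(key, [])
    for i, seg in enumerate(segments):
        if "start" not in seg or "end" not in seg:
            problems.append(f"segment {i}: missing start or end time")
        elif seg["start"] >= seg["end"]:
            problems.append(
                f"segment {i}: start {seg['start']} is not before end {seg['end']}"
            )
    return problems


def output_dir_writable(path: str) -> bool:
    """True if the output directory exists and the current user can write to it."""
    return os.path.isdir(path) and os.access(path, os.W_OK)
```

An empty list from find_segment_problems means every segment in the record has both timestamps in increasing order.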