advanced, audio
Sound Event Detection
Temporal sound event annotation with strong labels following DCASE Challenge protocols.
audio annotation
Configuration File: config.yaml
task_name: "Sound Event Detection"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#F59E0B"
  progress_color: "#FBBF24"
  speed_control: true
  speed_options: [0.5, 0.75, 1.0]
  keyboard_controls:
    play_pause: "space"
    rewind_1s: ","
    forward_1s: "."

# Data configuration
data_files:
  - path: data/audio_recordings.json
    audio_field: audio_file

# Annotation schemes
annotation_schemes:
  # Event annotations (temporal spans)
  # Format: describe events with timestamps
  - annotation_type: text
    name: event_annotations
    description: "List all sound events with start/end times. Format: 'start-end: event_class'"
    textarea: true
    placeholder: "0.0-2.5: dog_bark\n3.1-4.0: car_horn\n2.0-5.5: speech\n..."

  # Events detected (for quick summary)
  - annotation_type: multiselect
    name: events_present
    description: "Which sound events are present in this clip? (Select all)"
    labels:
      - Speech
      - Dog bark
      - Cat meow
      - Car horn
      - Car passing
      - Siren
      - Alarm
      - Door/knock
      - Footsteps
      - Music
      - Bird sounds
      - Rain
      - Wind
      - Construction/drilling
      - Glass breaking
      - Gunshot
      - Scream
      - Baby cry
      - Applause
      - Laughter

  # Number of distinct events
  - annotation_type: radio
    name: event_count
    description: "How many distinct sound events did you annotate?"
    labels:
      - "0 (silence/background only)"
      - "1-2 events"
      - "3-5 events"
      - "6-10 events"
      - "More than 10 events"

  # Event overlap
  - annotation_type: radio
    name: event_overlap
    description: "Are there overlapping sound events?"
    labels:
      - No overlap (events are sequential)
      - Some overlap
      - Heavy overlap (many simultaneous sounds)

  # Annotation difficulty
  - annotation_type: likert
    name: difficulty
    description: "How difficult was it to determine event boundaries?"
    size: 5
    min_label: "Very easy (clear boundaries)"
    max_label: "Very difficult (ambiguous)"

  # Background noise level
  - annotation_type: likert
    name: noise_level
    description: "How much background noise is present?"
    size: 5
    min_label: "Silent/clean"
    max_label: "Very noisy"

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Confidence in your temporal boundaries"
    size: 5
    min_label: "Low"
    max_label: "High"

# User settings
allow_all_users: true
instances_per_annotator: 75

# Output
output:
  path: annotations/
  format: json
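Because the event_annotations field in the configuration above collects events as free text in the 'start-end: event_class' format, the saved answers usually need to be parsed and validated before they can be used as strong labels. The following is a minimal Python sketch of one way to do that. It is not part of Potato: the names SoundEvent, parse_events, and has_overlap are hypothetical, and it assumes times are plain decimal seconds, one event per line.

```python
import re
from typing import List, NamedTuple


class SoundEvent(NamedTuple):
    start: float  # onset in seconds
    end: float    # offset in seconds
    label: str    # event class, e.g. "dog_bark"


# One event per line: "<start>-<end>: <event_class>" (assumed format)
_LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*:\s*(\S.*?)\s*$"
)


def parse_events(text: str) -> List[SoundEvent]:
    """Parse the free-text event list; raise ValueError on malformed lines."""
    events = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        match = _LINE_RE.match(line)
        if match is None:
            raise ValueError(
                f"line {lineno}: expected 'start-end: event_class', got {line!r}"
            )
        start, end = float(match.group(1)), float(match.group(2))
        if end <= start:
            raise ValueError(f"line {lineno}: end {end} must be after start {start}")
        events.append(SoundEvent(start, end, match.group(3)))
    return sorted(events)  # ordered by onset, then offset, then label


def has_overlap(events: List[SoundEvent]) -> bool:
    """Return True if any two events are active at the same time."""
    latest_end = float("-inf")
    for event in sorted(events):
        if event.start < latest_end:
            return True
        latest_end = max(latest_end, event.end)
    return False


events = parse_events("0.0-2.5: dog_bark\n3.1-4.0: car_horn\n2.0-5.5: speech")
for event in events:
    print(f"{event.start:.1f}-{event.end:.1f}: {event.label}")
print("overlap:", has_overlap(events))
```

The parsed list can also be used to cross-check the annotator's event_count and event_overlap answers against what was actually written in the text box. Events that merely touch (one ends exactly when the next starts) are treated as sequential here, which is a design choice rather than a DCASE requirement.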
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir sound-event-detection
cd sound-event-detection

# Copy config.yaml from above
potato start config.yaml
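The configuration also points at data/audio_recordings.json and reads each clip's location from the audio_file field, so that file has to exist before the server is started. The Python sketch below shows one way to generate such a file's contents. Only the audio_file key comes from the configuration; the id key, the list-of-records layout, and the audio/clip_001.wav paths are assumptions made for illustration and should be checked against the Potato data format documentation.

```python
import json
from pathlib import PurePosixPath
from typing import Dict, List


def build_records(audio_paths: List[str]) -> List[Dict[str, str]]:
    """Build one record per clip, using the file stem as an identifier."""
    records = []
    for path in audio_paths:
        records.append({
            "id": PurePosixPath(path).stem,  # hypothetical identifier key
            "audio_file": path,              # matches audio_field in config.yaml
        })
    return records


# Hypothetical clip paths; replace with your own recordings.
records = build_records(["audio/clip_001.wav", "audio/clip_002.wav"])
payload = json.dumps(records, indent=2)
print(payload)  # save this output as data/audio_recordings.json
```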
Details
Annotation Types
span, multiselect
Domain
Audio
Use Cases
sound event detection, temporal annotation, acoustic monitoring
Tags
audio, event detection, temporal, dcase, segmentation
Related Designs
AudioSet Event Classification
Multi-label audio event tagging following the AudioSet ontology for weak supervision.
multiselect
HateXplain - Explainable Hate Speech Detection
Multi-task hate speech annotation with classification (hate/offensive/normal), target community identification, and rationale span highlighting. Based on the HateXplain benchmark (Mathew et al., AAAI 2021) - the first dataset covering classification, target identification, and rationale extraction.
radio, multiselect
Music Tagging
Multi-label music tagging following MagnaTagATune dataset format for instrument and genre annotation.
multiselect, likert