Sound Event Detection

Temporal sound event annotation with strong labels following DCASE Challenge protocols.

Configuration File (config.yaml)

annotation_task_name: "Sound Event Detection"

port: 8000

# Data configuration
data_files:
  - "data/audio_recordings.json"

item_properties:
  id_key: "id"
  text_key: "text"

# Annotation schemes
annotation_schemes:
  # Event annotations (temporal spans)
  # Format: describe events with timestamps
  - annotation_type: text
    name: event_annotations
    description: "List all sound events with start/end times. Format: 'start-end: event_class'"
    textarea: true
    placeholder: "0.0-2.5: dog_bark\n3.1-4.0: car_horn\n2.0-5.5: speech\n..."

  # Events detected (for quick summary)
  - annotation_type: multiselect
    name: events_present
    description: "Which sound events are present in this clip? (Select all)"
    labels:
      - Speech
      - Dog bark
      - Cat meow
      - Car horn
      - Car passing
      - Siren
      - Alarm
      - Door/knock
      - Footsteps
      - Music
      - Bird sounds
      - Rain
      - Wind
      - Construction/drilling
      - Glass breaking
      - Gunshot
      - Scream
      - Baby cry
      - Applause
      - Laughter

  # Number of distinct events
  - annotation_type: radio
    name: event_count
    description: "How many distinct sound events did you annotate?"
    labels:
      - "0 (silence/background only)"
      - "1-2 events"
      - "3-5 events"
      - "6-10 events"
      - "More than 10 events"

  # Event overlap
  - annotation_type: radio
    name: event_overlap
    description: "Are there overlapping sound events?"
    labels:
      - No overlap (events are sequential)
      - Some overlap
      - Heavy overlap (many simultaneous sounds)

  # Annotation difficulty
  - annotation_type: likert
    name: difficulty
    description: "How difficult was it to determine event boundaries?"
    size: 5
    min_label: "Very easy (clear boundaries)"
    max_label: "Very difficult (ambiguous)"

  # Background noise level
  - annotation_type: likert
    name: noise_level
    description: "How much background noise is present?"
    size: 5
    min_label: "Silent/clean"
    max_label: "Very noisy"

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "Confidence in your temporal boundaries"
    size: 5
    min_label: "Low"
    max_label: "High"

# User settings
allow_all_users: true
instances_per_annotator: 75

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
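
Because `event_annotations` is a free-text field, its contents need to be parsed before they can be used as strong labels. The following is a minimal sketch of that step, assuming each non-empty line follows the `start-end: event_class` format from the description, with times in seconds. The `Event` tuple and the `parse_events` and `has_overlap` helpers are hypothetical names for illustration and are not part of Potato. The overlap check could be used to cross-check an annotator's `event_overlap` answer.

```python
import re
from typing import List, NamedTuple


class Event(NamedTuple):
    start: float  # onset in seconds (assumed unit)
    end: float    # offset in seconds (assumed unit)
    label: str


# One annotation line, e.g. "0.0-2.5: dog_bark"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*:\s*(\S.*?)\s*$")


def parse_events(text: str) -> List[Event]:
    """Parse 'start-end: event_class' lines into Event tuples sorted by onset."""
    events = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(
                f"line {lineno}: expected 'start-end: event_class', got {line!r}"
            )
        start, end = float(match.group(1)), float(match.group(2))
        if end <= start:
            raise ValueError(f"line {lineno}: end time must be after start time")
        events.append(Event(start, end, match.group(3)))
    return sorted(events)


def has_overlap(events: List[Event]) -> bool:
    """Return True if any two events are active at the same time."""
    latest_end = float("-inf")
    for event in sorted(events):
        if event.start < latest_end:
            return True
        latest_end = max(latest_end, event.end)
    return False


sample = "0.0-2.5: dog_bark\n3.1-4.0: car_horn\n2.0-5.5: speech"
events = parse_events(sample)
print(len(events), has_overlap(events))  # prints: 3 True
```

Lines that do not match the format, such as a literal `...` copied from the placeholder, raise a `ValueError` rather than being silently dropped, so malformed annotations surface during post-processing.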

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir sound-event-detection
cd sound-event-detection
# Copy config.yaml from above
potato start config.yaml
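
The config expects `data/audio_recordings.json` to exist before `potato start` is run. The sketch below writes a small placeholder file from inside the project folder. Only the `id` and `text` keys come from `item_properties` in the config. The record values, the use of `text` to hold an audio path, and the one-JSON-object-per-line layout are all assumptions, so check them against the Potato documentation for your version.

```python
import json
from pathlib import Path

# Hypothetical records: only the "id" and "text" keys are taken from the config.
# The values, and storing an audio path under "text", are placeholders.
records = [
    {"id": "clip_001", "text": "audio/clip_001.wav"},
    {"id": "clip_002", "text": "audio/clip_002.wav"},
]

data_path = Path("data/audio_recordings.json")
data_path.parent.mkdir(parents=True, exist_ok=True)

# Assumption: one JSON object per line (line-delimited JSON).
with data_path.open("w", encoding="utf-8") as handle:
    for record in records:
        handle.write(json.dumps(record) + "\n")
```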

Details

Annotation Types

span, multiselect

Domain

Audio

Use Cases

sound event detection, temporal annotation, acoustic monitoring

Related Designs

Audio Transcription Review

Review and correct automatic speech recognition transcriptions with waveform visualization.

likert, multiselect

AudioSet Event Classification

Multi-label audio event tagging following the AudioSet ontology for weak supervision.

multiselect

Detecting Persuasion Techniques in News

Identification of propaganda and persuasion techniques in news articles through both multi-label classification and span-level detection. Based on SemEval-2023 Task 3 (Piskorski et al.).

multiselect, span