AudioSet Event Classification

Multi-label audio event tagging that follows the AudioSet ontology and produces clip-level labels for weak supervision.

Configuration File: config.yaml

annotation_task_name: "AudioSet Event Classification"

port: 8000

# Data configuration
data_files:
  - "data/audio_clips.json"

item_properties:
  id_key: id
  text_key: text

# Annotation schemes
annotation_schemes:
  # Human sounds
  - annotation_type: multiselect
    name: human_sounds
    description: "Human sounds present (select all that apply)"
    labels:
      - Speech
      - Singing
      - Shout
      - Whisper
      - Laughter
      - Crying/sobbing
      - Cough
      - Sneeze
      - Breathing
      - Footsteps
      - Clapping
      - None

  # Animal sounds
  - annotation_type: multiselect
    name: animal_sounds
    description: "Animal sounds present (select all that apply)"
    labels:
      - Dog bark
      - Cat meow
      - Bird chirp/song
      - Rooster crow
      - Insect buzz
      - Horse neigh
      - Cow moo
      - None

  # Music and instruments
  - annotation_type: multiselect
    name: music_sounds
    description: "Music and instruments present (select all that apply)"
    labels:
      - Music
      - Guitar
      - Piano
      - Drums
      - Violin
      - Singing (musical)
      - Electronic music
      - None

  # Environmental sounds
  - annotation_type: multiselect
    name: environment_sounds
    description: "Environmental sounds present (select all that apply)"
    labels:
      - Wind
      - Rain
      - Thunder
      - Water (stream/river)
      - Fire crackling
      - Traffic noise
      - Siren
      - Bell
      - Door slam
      - None

  # Mechanical sounds
  - annotation_type: multiselect
    name: mechanical_sounds
    description: "Mechanical/vehicle sounds present (select all that apply)"
    labels:
      - Car engine
      - Motorcycle
      - Train
      - Aircraft
      - Power tools
      - Keyboard typing
      - Phone ringing
      - None

  # Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your labels?"
    size: 5
    min_label: "Not confident"
    max_label: "Very confident"

  # Notes
  - annotation_type: text
    name: notes
    description: "Additional sounds or notes (optional)"
    textarea: false
    required: false

# User settings
allow_all_users: true
instances_per_annotator: 200

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
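
Each of the five multiselect schemes above offers a "None" option alongside its event labels. For weak-label training it is often convenient to flatten one annotator's selections into a single multi-hot vector and to flag selections where "None" was ticked together with a real label. The sketch below is one possible way to do that in Python. The label lists are copied from the config; the shape of the `selections` dictionary (scheme name mapped to a list of chosen labels) is an assumption for illustration and is not taken from the tool's actual output format, so adapt the loading step to whatever the files in annotation_output/ contain.

```python
from typing import Dict, List

# Label lists copied from the multiselect schemes in config.yaml.
SCHEMES: Dict[str, List[str]] = {
    "human_sounds": ["Speech", "Singing", "Shout", "Whisper", "Laughter",
                     "Crying/sobbing", "Cough", "Sneeze", "Breathing",
                     "Footsteps", "Clapping", "None"],
    "animal_sounds": ["Dog bark", "Cat meow", "Bird chirp/song", "Rooster crow",
                      "Insect buzz", "Horse neigh", "Cow moo", "None"],
    "music_sounds": ["Music", "Guitar", "Piano", "Drums", "Violin",
                     "Singing (musical)", "Electronic music", "None"],
    "environment_sounds": ["Wind", "Rain", "Thunder", "Water (stream/river)",
                           "Fire crackling", "Traffic noise", "Siren", "Bell",
                           "Door slam", "None"],
    "mechanical_sounds": ["Car engine", "Motorcycle", "Train", "Aircraft",
                          "Power tools", "Keyboard typing", "Phone ringing",
                          "None"],
}


def validate(scheme: str, selected: List[str]) -> List[str]:
    """Return a list of problems with one scheme's selection (empty if fine)."""
    problems = []
    allowed = set(SCHEMES[scheme])
    unknown = [label for label in selected if label not in allowed]
    if unknown:
        problems.append(f"{scheme}: unknown labels {unknown}")
    if "None" in selected and len(selected) > 1:
        problems.append(f"{scheme}: 'None' combined with other labels")
    return problems


def to_multi_hot(selections: Dict[str, List[str]]) -> Dict[str, int]:
    """Flatten per-scheme selections into {"scheme/label": 0 or 1}.

    'None' is skipped: it marks the absence of events, not an event class.
    The input shape (scheme name -> chosen labels) is an assumed format.
    """
    vector: Dict[str, int] = {}
    for scheme, labels in SCHEMES.items():
        chosen = set(selections.get(scheme, []))
        for label in labels:
            if label == "None":
                continue
            vector[f"{scheme}/{label}"] = int(label in chosen)
    return vector


# Hypothetical selections for a single clip.
example = {"human_sounds": ["Speech", "Laughter"], "animal_sounds": ["None"]}
vector = to_multi_hot(example)
```

Prefixing each key with the scheme name keeps labels that could otherwise be confused apart, for example "Singing" under human_sounds and "Singing (musical)" under music_sounds.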

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir audioset-event-classification
cd audioset-event-classification
# Copy config.yaml from above
potato start config.yaml
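
The config reads items from data/audio_clips.json and expects each item to carry an `id` field and a `text` field (per `id_key` and `text_key`). The quick start does not create that file, so the following Python sketch shows one way to write a small placeholder. Two things here are assumptions rather than facts from this page: that the file is a JSON array of objects, and that `text` holds a path or URL to the audio clip. The clip names are hypothetical; check the tool's data-format documentation before relying on this layout.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_items(count: int) -> List[Dict[str, str]]:
    """Build placeholder items with the 'id' and 'text' keys named in config.yaml.

    The audio paths are hypothetical; 'text' pointing at an audio file is an
    assumption about how the clips are referenced.
    """
    return [
        {"id": f"clip_{i:03d}", "text": f"audio/clip_{i:03d}.wav"}
        for i in range(1, count + 1)
    ]


def write_items(items: List[Dict[str, str]], path: str) -> None:
    """Write items as a JSON array (assumed layout), creating parent folders."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(items, indent=2), encoding="utf-8")
```

Calling `write_items(build_items(3), "data/audio_clips.json")` from the project folder would produce a three-item file at the path the config points to.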

Details

Annotation Types

multiselect, likert, text

Domain

Audio, Speech

Use Cases

audio classification, sound event detection, weak labeling

Related Designs

Audio Transcription Review

Review and correct automatic speech recognition transcriptions with waveform visualization.

likert, multiselect

Music Tagging

Multi-label music tagging following MagnaTagATune dataset format for instrument and genre annotation.

multiselect, likert

Respiratory Sound Classification

Classify lung and respiratory sounds for medical diagnosis following ICBHI 2017 Challenge format.

radio, multiselect