Acoustic Scene Classification

Classify audio recordings by acoustic environment following the TUT/DCASE dataset format.

Configuration File: config.yaml

annotation_task_name: "Acoustic Scene Classification"

port: 8000

# Data configuration
data_files:
  - "data/scenes.json"

item_properties:
  id_key: id
  text_key: text

# Annotation schemes
annotation_schemes:
  # Primary scene category
  - annotation_type: radio
    name: scene_category
    description: "Select the acoustic scene/environment"
    labels:
      - name: Airport
        key_value: "1"
      - name: Bus
        key_value: "2"
      - name: Metro station
        key_value: "3"
      - name: Metro (inside train)
        key_value: "4"
      - name: Park
        key_value: "5"
      - name: Public square
        key_value: "6"
      - name: Shopping mall
        key_value: "7"
      - name: Street (pedestrian)
        key_value: "8"
      - name: Street (traffic)
        key_value: "9"
      - name: Tram
        key_value: "0"
    sequential_key_binding: true

  # Indoor/outdoor
  - annotation_type: radio
    name: environment_type
    description: "Is this primarily indoor or outdoor?"
    labels:
      - Indoor
      - Outdoor
      - Mixed/transitional
      - Cannot determine

  # Scene clarity
  - annotation_type: radio
    name: scene_clarity
    description: "How clearly identifiable is the scene?"
    labels:
      - Very clear (unmistakable)
      - Clear (confident identification)
      - Ambiguous (could be multiple scenes)
      - Unclear (cannot identify)

  # Secondary sounds
  - annotation_type: multiselect
    name: prominent_sounds
    description: "What sounds are most prominent? (Select up to 3)"
    labels:
      - Human speech/chatter
      - Vehicle noise
      - Footsteps
      - Announcements/PA system
      - Nature sounds (birds, wind)
      - Music
      - Machinery/equipment
      - Silence/quiet

  # Confidence rating
  - annotation_type: likert
    name: confidence
    description: "Confidence in your scene classification"
    size: 5
    min_label: "Low"
    max_label: "High"

# User settings
allow_all_users: true
instances_per_annotator: 200

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
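The configuration points at data/scenes.json and names the id and text keys, but does not show what a record looks like. The following is a minimal sketch of one way to generate such a file. It assumes one JSON object per line and that the text field carries the path or URL of the audio clip; neither is confirmed by the configuration above. The helper names and clip paths are hypothetical.

```python
import json
from pathlib import Path


def build_items(audio_paths):
    """Return one record per clip, using the keys named in item_properties."""
    return [
        # "id" matches id_key and "text" matches text_key in config.yaml.
        # Storing the clip path under "text" is an assumption.
        {"id": f"scene_{i:04d}", "text": str(path)}
        for i, path in enumerate(audio_paths, start=1)
    ]


def write_json_lines(items, out_path):
    """Write the records as one JSON object per line (assumed layout)."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        for item in items:
            fh.write(json.dumps(item) + "\n")


if __name__ == "__main__":
    # Hypothetical clip paths, for illustration only.
    clips = ["audio/airport-001.wav", "audio/tram-002.wav"]
    write_json_lines(build_items(clips), "data/scenes.json")
```

If the tool expects a different layout (for example a single JSON array), only write_json_lines would need to change.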

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir acoustic-scene-classification
cd acoustic-scene-classification
# Copy config.yaml from above
potato start config.yaml
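In the configuration above, the "Select up to 3" limit for prominent_sounds appears only in the description text, and whether the tool enforces it is not stated. A post-hoc check on the collected annotations is one way to catch violations. The sketch below assumes each record has already been loaded into a dictionary with an id and a list of selected labels under prominent_sounds. That record layout is hypothetical and is not the tool's documented output format.

```python
MAX_PROMINENT_SOUNDS = 3  # limit stated in the prominent_sounds description


def over_limit(records, limit=MAX_PROMINENT_SOUNDS):
    """Return the ids of records that select more than `limit` sounds."""
    return [
        record["id"]
        for record in records
        # A record with no prominent_sounds entry counts as zero selections.
        if len(record.get("prominent_sounds", [])) > limit
    ]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    sample = [
        {"id": "scene_0001", "prominent_sounds": ["Music", "Footsteps"]},
        {
            "id": "scene_0002",
            "prominent_sounds": [
                "Human speech/chatter",
                "Vehicle noise",
                "Footsteps",
                "Music",
            ],
        },
    ]
    print(over_limit(sample))
```

Flagged items can then be sent back for re-annotation or trimmed according to whatever policy the project adopts.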

Details

Annotation Types

radio, likert

Domain

Audio

Use Cases

scene classification, acoustic environment, context awareness

Related Designs

Audio Transcription Review

Review and correct automatic speech recognition transcriptions with waveform visualization.

likert, multiselect

Audio-Visual Sentiment Analysis

Rate sentiment in speech segments following CMU-MOSI and CMU-MOSEI multimodal annotation protocols.

likert, radio

Clotho Audio Captioning

Audio captioning and quality assessment based on the Clotho dataset (Drossos et al., ICASSP 2020). Annotators write natural language captions for audio clips, rate caption accuracy on a Likert scale, and classify the audio environment.

text, likert