beginner, audio

WavCaps - Audio Captioning

Audio captioning: annotators listen to audio clips and write detailed natural-language captions describing all sounds, events, and acoustic scenes (Mei et al., IEEE TASLP 2024).


Configuration file: config.yaml

# WavCaps - Audio Captioning
# Based on Mei et al., IEEE TASLP 2024
# Paper: https://ieeexplore.ieee.org/document/10637816
# Dataset: https://github.com/XinhaoMei/WavCaps
#
# Task: Write natural language descriptions of audio content.
# Listen to audio clips and write detailed captions describing all sounds,
# events, and acoustic scenes.
#
# Guidelines:
# - Listen to the full audio clip before writing a caption
# - Describe all notable sounds, events, and background ambience
# - Use clear, concise natural language
# - Include temporal information if relevant (e.g., "A dog barks, then a door slams")
# - List individual sound events separately in the sound events field

annotation_task_name: "WavCaps: Audio Captioning"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "audio_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - annotation_type: text
    name: caption
    description: "Write a detailed natural language caption describing the audio content. Include all sounds, events, and acoustic scenes you hear."
    min_length: 10
    max_length: 500
    placeholder: "Describe what you hear in this audio clip..."

  - annotation_type: text
    name: sound_events
    description: "List the individual sound events heard in the clip, separated by commas (e.g., 'dog barking, car engine, wind blowing')"
    min_length: 3
    max_length: 300
    placeholder: "List sound events separated by commas..."

audio_display:
  show_waveform: true
  playback_controls: true
  allow_speed_control: true

allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
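
The two text schemes above set length bounds, and the sound_events field expects a comma-separated list. As a minimal sketch of how collected answers could be checked after export, the Python below mirrors those bounds. The helper names (check_length, split_sound_events) are hypothetical and not part of Potato, and counting length in characters after trimming whitespace is an assumption; the source does not say how min_length and max_length are measured.

```python
from typing import List

# Limits copied from the min_length / max_length values in config.yaml.
# Assumption: lengths are measured in characters.
SCHEME_LIMITS = {
    "caption": (10, 500),
    "sound_events": (3, 300),
}


def check_length(field: str, text: str) -> bool:
    """Return True if the trimmed text length is within the field's limits."""
    low, high = SCHEME_LIMITS[field]
    return low <= len(text.strip()) <= high


def split_sound_events(text: str) -> List[str]:
    """Split a comma-separated sound_events answer into trimmed, non-empty items."""
    return [part.strip() for part in text.split(",") if part.strip()]


if __name__ == "__main__":
    print(check_length("caption", "A dog barks, then a door slams"))
    print(split_sound_events("dog barking, car engine, wind blowing"))
```

Splitting on commas keeps the sound_events answers comparable across annotators, which may help when reconciling the three annotations collected per clip.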

Sample data: sample-data.json

[
  {
    "id": "wavcaps_001",
    "audio_url": "https://example.com/audio/wavcaps/urban_street_001.wav",
    "duration": 10,
    "source": "Freesound"
  },
  {
    "id": "wavcaps_002",
    "audio_url": "https://example.com/audio/wavcaps/kitchen_cooking_001.wav",
    "duration": 8.5,
    "source": "AudioSet"
  }
]

// ... and 8 more items
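
Before starting the server, it can be useful to confirm that every item carries the keys named under item_properties in config.yaml. The sketch below is one possible check, not part of Potato; validate_items is a hypothetical helper, and only the two items shown above are included because the remaining items are elided in the listing.

```python
import json

# Key names taken from item_properties in config.yaml.
ID_KEY = "id"
AUDIO_KEY = "audio_url"


def validate_items(items):
    """Return a list of human-readable problems found in the item list."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Each item needs a non-empty id and audio URL.
        for key in (ID_KEY, AUDIO_KEY):
            if not item.get(key):
                problems.append(f"item {index}: missing '{key}'")
        # Ids must be unique so annotations can be matched back to clips.
        item_id = item.get(ID_KEY)
        if item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        elif item_id:
            seen_ids.add(item_id)
    return problems


# The two items shown in the listing above.
sample = json.loads("""
[
  {"id": "wavcaps_001",
   "audio_url": "https://example.com/audio/wavcaps/urban_street_001.wav",
   "duration": 10, "source": "Freesound"},
  {"id": "wavcaps_002",
   "audio_url": "https://example.com/audio/wavcaps/kitchen_cooking_001.wav",
   "duration": 8.5, "source": "AudioSet"}
]
""")

if __name__ == "__main__":
    print(validate_items(sample))
```

To check the full file instead, the inline string could be replaced by loading sample-data.json with json.load.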

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/audio/wavcaps-audio-captioning
potato start config.yaml

Details

Annotation types

text

Domain

Audio Understanding, Captioning

Use cases

Audio Captioning, Sound Event Description, Audio-Language Research

Tags

audio, captioning, sound-events, audio-language, environmental-audio, taslp2024
