WavCaps - Audio Captioning
Audio captioning: annotators listen to audio clips and write detailed natural language captions describing all sounds, events, and acoustic scenes (Mei et al., IEEE TASLP 2024).
Configuration file: config.yaml
# WavCaps - Audio Captioning
# Based on Mei et al., IEEE TASLP 2024
# Paper: https://ieeexplore.ieee.org/document/10637816
# Dataset: https://github.com/XinhaoMei/WavCaps
#
# Task: Write natural language descriptions of audio content.
# Listen to audio clips and write detailed captions describing all sounds,
# events, and acoustic scenes.
#
# Guidelines:
# - Listen to the full audio clip before writing a caption
# - Describe all notable sounds, events, and background ambience
# - Use clear, concise natural language
# - Include temporal information if relevant (e.g., "A dog barks, then a door slams")
# - List individual sound events separately in the sound events field
annotation_task_name: "WavCaps: Audio Captioning"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "audio_url"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  - annotation_type: text
    name: caption
    description: "Write a detailed natural language caption describing the audio content. Include all sounds, events, and acoustic scenes you hear."
    min_length: 10
    max_length: 500
    placeholder: "Describe what you hear in this audio clip..."
  - annotation_type: text
    name: sound_events
    description: "List the individual sound events heard in the clip, separated by commas (e.g., 'dog barking, car engine, wind blowing')"
    min_length: 3
    max_length: 300
    placeholder: "List sound events separated by commas..."
audio_display:
  show_waveform: true
  playback_controls: true
  allow_speed_control: true
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
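The length limits on the two text fields can be mirrored in a small check run over collected answers. What follows is a minimal sketch, not Potato's own validation logic: the names `validate_field` and `split_sound_events` are hypothetical, and the sketch assumes that `min_length` and `max_length` count characters, which the config itself does not state.

```python
# Hypothetical helpers; the limits are copied from annotation_schemes in config.yaml.
# Assumption: min_length / max_length are character counts.
LIMITS = {
    "caption": (10, 500),
    "sound_events": (3, 300),
}


def validate_field(name: str, text: str) -> bool:
    """Return True if the stripped text fits the length limits for this field."""
    lo, hi = LIMITS[name]
    return lo <= len(text.strip()) <= hi


def split_sound_events(text: str) -> list[str]:
    """Split a comma-separated sound_events answer into clean event labels."""
    return [part.strip() for part in text.split(",") if part.strip()]


if __name__ == "__main__":
    caption = "A dog barks, then a door slams"
    events = "dog barking, door slamming"
    print(validate_field("caption", caption))
    print(split_sound_events(events))
```

Splitting on commas follows the instruction in the `sound_events` description; empty fragments such as a trailing comma are dropped rather than treated as events.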
Sample data: sample-data.json
[
  {
    "id": "wavcaps_001",
    "audio_url": "https://example.com/audio/wavcaps/urban_street_001.wav",
    "duration": 10,
    "source": "Freesound"
  },
  {
    "id": "wavcaps_002",
    "audio_url": "https://example.com/audio/wavcaps/kitchen_cooking_001.wav",
    "duration": 8.5,
    "source": "AudioSet"
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/audio/wavcaps-audio-captioning
potato start config.yaml
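Before starting the server, it can be useful to confirm that every record in the data file carries the keys named under `item_properties` (`id` and `audio_url`). This is a minimal sketch under stated assumptions: the helper `check_items` is hypothetical and not part of Potato, and the two records are inlined from the sample shown earlier rather than read from disk.

```python
import json

# Keys taken from item_properties in config.yaml (id_key and text_key).
REQUIRED_KEYS = ("id", "audio_url")


def check_items(items: list[dict]) -> list[str]:
    """Return human-readable problems found in the records (empty if none)."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Every record needs a non-empty value for each required key.
        for key in REQUIRED_KEYS:
            if not item.get(key):
                problems.append(f"item {index}: missing or empty '{key}'")
        # Ids should be unique so annotations can be matched back to clips.
        item_id = item.get("id")
        if item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        elif item_id:
            seen_ids.add(item_id)
    return problems


# The two records shown in sample-data.json, inlined for illustration.
SAMPLE = json.loads("""
[
  {"id": "wavcaps_001", "audio_url": "https://example.com/audio/wavcaps/urban_street_001.wav", "duration": 10, "source": "Freesound"},
  {"id": "wavcaps_002", "audio_url": "https://example.com/audio/wavcaps/kitchen_cooking_001.wav", "duration": 8.5, "source": "AudioSet"}
]
""")

if __name__ == "__main__":
    for problem in check_items(SAMPLE):
        print(problem)
```

To check the real file, the inlined `SAMPLE` could be replaced with the result of `json.load` on an opened `sample-data.json`; an empty result means no problems were found.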
Related designs
Audio Transcription Review
Review and correct automatic speech recognition transcriptions with waveform visualization.
Clotho Audio Captioning
Audio captioning and quality assessment based on the Clotho dataset (Drossos et al., ICASSP 2020). Annotators write natural language captions for audio clips, rate caption accuracy on a Likert scale, and classify the audio environment.
CoVoST 2 - Speech Translation Evaluation
Speech translation quality evaluation based on the CoVoST 2 dataset (Wang et al., arXiv 2020). Annotators listen to source audio, review translations, label audio segments, and rate overall translation quality.