LibriSpeech - Audio Transcription and Quality Rating
Audio transcription quality assessment based on the LibriSpeech corpus (Panayotov et al., ICASSP 2015). Annotators rate audio quality using a slider and classify audio segments by content type, supporting speech recognition research and dataset curation.
Configuration file: config.yaml
# LibriSpeech - Audio Transcription and Quality Rating
# Based on Panayotov et al., ICASSP 2015
# Paper: https://ieeexplore.ieee.org/document/7178964
# Dataset: https://www.openslr.org/12
#
# LibriSpeech is a corpus of approximately 1000 hours of English
# read speech derived from public domain audiobooks. This task
# involves two annotation components:
#
# 1. Audio quality rating (slider): Rate the recording quality on
# a scale from 0 (very poor) to 10 (excellent)
#
# 2. Audio content classification: Label what type of audio content
# is present in the segment
#
# Quality considerations:
# - Background noise level
# - Speaker clarity and articulation
# - Recording volume and consistency
# - Presence of artifacts or distortion
annotation_task_name: "LibriSpeech: Audio Transcription and Quality Rating"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: slider
    name: audio_quality
    description: "Rate the overall audio quality (0 = very poor, 10 = excellent)"
    min_value: 0
    max_value: 10
    starting_value: 5
  - annotation_type: audio_annotation
    name: content_type
    description: "Classify the type of audio content in this segment"
    mode: "label"
    labels:
      - "Clean Speech"
      - "Noisy Speech"
      - "Music"
      - "Silence"
      - "Other"
annotation_instructions: |
  You will evaluate audio segments from the LibriSpeech corpus:
  1. Listen to the audio clip and read the provided transcript.
  2. Use the slider to rate the audio quality from 0 (very poor) to 10 (excellent).
  3. Classify the audio content type (clean speech, noisy speech, music, silence, or other).
  4. Consider background noise, speaker clarity, and recording quality in your rating.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fef3c7; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207;">Audio Sample:</strong>
      <p style="margin: 8px 0 0 0;"><audio controls src="{{audio_url}}" style="width: 100%;"></audio></p>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Reference Transcript:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0; font-style: italic;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
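The assignment settings above (`instances_per_annotator: 100`, `annotation_per_instance: 3`) imply a lower bound on how many annotators a dataset needs. The following is a minimal sketch of that arithmetic, not part of Potato itself: `min_annotators` is a hypothetical helper, and it assumes that each annotator labels a given item at most once and that `instances_per_annotator` acts as a hard cap.

```python
import math


def min_annotators(n_items: int,
                   annotation_per_instance: int,
                   instances_per_annotator: int) -> int:
    """Lower bound on the annotator pool needed to cover every item.

    Assumption (not confirmed by the config): an annotator never labels
    the same item twice, so at least `annotation_per_instance` distinct
    people are needed regardless of dataset size.
    """
    if n_items == 0:
        return 0
    total_annotations = n_items * annotation_per_instance
    # Pool size needed if every annotator works up to the per-person cap.
    by_capacity = math.ceil(total_annotations / instances_per_annotator)
    return max(annotation_per_instance, by_capacity)


# The sample data holds 10 items: 30 annotations in total, and the
# distinct-annotator requirement dominates.
print(min_annotators(10, 3, 100))    # 3
# For a larger, hypothetical 1000-item set the per-person cap dominates.
print(min_annotators(1000, 3, 100))  # 30
```

For small pilots such as the bundled sample, the replication factor sets the pool size; the per-annotator cap only starts to matter once the dataset exceeds roughly 100 items.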
Sample data: sample-data.json
[
  {
    "id": "libri_001",
    "text": "He had never seen such a beautiful sunrise over the mountains, and he stood there for a long time watching the colors change from deep purple to gold.",
    "audio_url": "audio/sample_001.wav"
  },
  {
    "id": "libri_002",
    "text": "The old professor adjusted his spectacles and began to read aloud from the yellowed pages of the ancient manuscript that had been discovered in the library's basement.",
    "audio_url": "audio/sample_002.wav"
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/audio/librispeech-transcription
potato start config.yaml
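Before starting the server, it can help to confirm that every record in the data file carries the fields the configuration relies on: `id` and `text` (named by `id_key` and `text_key`) and `audio_url` (referenced in `html_layout`). The following is a minimal sketch of such a check; `find_problems` is a hypothetical helper rather than a Potato function, and the second record in the demo is deliberately broken for illustration.

```python
REQUIRED_KEYS = ("id", "text", "audio_url")


def find_problems(items):
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        item_id = item.get("id")
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


demo = [
    {
        "id": "libri_001",
        "text": "He had never seen such a beautiful sunrise over the mountains, "
                "and he stood there for a long time watching the colors change "
                "from deep purple to gold.",
        "audio_url": "audio/sample_001.wav",
    },
    # Hypothetical broken record: reuses an id and has no audio_url.
    {"id": "libri_001", "text": "placeholder transcript"},
]

for problem in find_problems(demo):
    print(problem)

# To check the real file instead (run from the design's directory):
#   import json
#   with open("sample-data.json", encoding="utf-8") as handle:
#       print(find_problems(json.load(handle)))
```

The check only inspects the JSON fields; it does not verify that the referenced audio files exist or are playable.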
Related designs
Continuous Emotion Rating
Rate emotional dimensions (valence, arousal, dominance) continuously following MSP-IMPROV protocol.
Speech Commands - Keyword Recognition
Speech command keyword recognition and quality assessment based on the Speech Commands dataset (Warden, arXiv 2018). Annotators listen to audio clips, classify the spoken command word, and assess the audio quality.
CoVoST 2 - Speech Translation Evaluation
Speech translation quality evaluation based on the CoVoST 2 dataset (Wang et al., arXiv 2020). Annotators listen to source audio, review translations, label audio segments, and rate overall translation quality.