beginner, audio
Keyword Spotting
Classify spoken commands and keywords following the Google Speech Commands dataset format.
Configuration File: config.yaml
annotation_task_name: "Keyword Spotting"
port: 8000
# Data configuration
data_files:
- "data/commands.json"
item_properties:
  id_key: "id"
  text_key: "text"
# Annotation schemes
annotation_schemes:
  # Primary command classification
  - annotation_type: radio
    name: command
    description: "What command/keyword is spoken?"
    labels:
      # Core commands
      - name: "yes"
        key_value: "y"
      - name: "no"
        key_value: "n"
      - name: "up"
        key_value: "u"
      - name: "down"
        key_value: "d"
      - "left"
      - "right"
      - "on"
      - "off"
      - name: "stop"
        key_value: "s"
      - name: "go"
        key_value: "g"
      # Numbers
      - "zero"
      - "one"
      - "two"
      - "three"
      - "four"
      - "five"
      - "six"
      - "seven"
      - "eight"
      - "nine"
      # Special categories
      - name: "unknown_word"
        key_value: "w"
      - name: "silence"
        key_value: "i"
      - name: "noise"
        key_value: "x"
    sequential_key_binding: true
  # Clarity rating
  - annotation_type: radio
    name: clarity
    description: "How clearly is the command spoken?"
    labels:
      - Very clear (unmistakable)
      - Clear (confident identification)
      - Somewhat unclear (some ambiguity)
      - Unclear (difficult to identify)
      - Cannot determine
  # Speaker characteristics
  - annotation_type: radio
    name: speaker_type
    description: "Speaker characteristics (if discernible)"
    labels:
      - Adult male
      - Adult female
      - Child
      - Cannot determine
  # Audio quality
  - annotation_type: radio
    name: audio_quality
    description: "Recording quality"
    labels:
      - Clean (no noise)
      - Light background noise
      - Moderate noise
      - Heavy noise
      - Distorted/clipped
  # Accent/pronunciation
  - annotation_type: radio
    name: pronunciation
    description: "Pronunciation characteristics"
    labels:
      - Standard/neutral
      - Regional accent (still clear)
      - Strong accent (affects clarity)
      - Non-native speaker
      - Cannot assess
  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your classification?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
# User settings
allow_all_users: true
instances_per_annotator: 500 # short clips make a larger per-annotator batch practical
# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
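The command scheme mixes bare labels with `name`/`key_value` mappings and depends on quoting. Under YAML 1.1 rules, which PyYAML follows, an unquoted `yes`, `no`, `on`, or `off` loads as a boolean rather than a string. A minimal pre-flight check is sketched below. It is a hypothetical helper, not part of Potato, and it assumes the config has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`). `check_scheme` and `check_config` flag two things: label names that did not load as strings, and `key_value` shortcuts that are longer than one character or reused within a scheme.

```python
from typing import Any


def label_name(label: Any) -> Any:
    """Return a label's name, whether it is a bare entry or a {name, key_value} mapping."""
    if isinstance(label, dict):
        return label.get("name")
    return label


def check_scheme(scheme: dict) -> list[str]:
    """Report non-string label names and duplicate or malformed key_value shortcuts."""
    problems: list[str] = []
    seen_keys: dict[str, str] = {}
    scheme_name = scheme.get("name", "<unnamed>")
    for label in scheme.get("labels", []):
        name = label_name(label)
        # Under YAML 1.1 rules (PyYAML's default), unquoted yes/no/on/off load as booleans.
        if not isinstance(name, str):
            problems.append(f"{scheme_name}: label {name!r} is not a string; quote it")
            continue
        if isinstance(label, dict) and "key_value" in label:
            key = str(label["key_value"])
            if len(key) != 1:
                problems.append(f"{scheme_name}: key {key!r} for {name!r} is not one character")
            elif key in seen_keys:
                problems.append(
                    f"{scheme_name}: key {key!r} used by both {seen_keys[key]!r} and {name!r}"
                )
            else:
                seen_keys[key] = name
    return problems


def check_config(config: dict) -> list[str]:
    """Run check_scheme over every entry in annotation_schemes."""
    problems: list[str] = []
    for scheme in config.get("annotation_schemes", []):
        problems.extend(check_scheme(scheme))
    return problems


if __name__ == "__main__":
    # A trimmed stand-in for the parsed config (e.g. the result of yaml.safe_load).
    config = {
        "annotation_schemes": [
            {
                "annotation_type": "radio",
                "name": "command",
                "labels": [
                    {"name": "yes", "key_value": "y"},
                    {"name": "no", "key_value": "n"},
                    "on",
                    {"name": "stop", "key_value": "s"},
                    {"name": "silence", "key_value": "i"},
                ],
            },
            # What an unquoted `- on` plus a reused "s" shortcut would look like after loading.
            {
                "annotation_type": "radio",
                "name": "broken_example",
                "labels": [
                    True,
                    {"name": "stop", "key_value": "s"},
                    {"name": "silence", "key_value": "s"},
                ],
            },
        ]
    }
    for problem in check_config(config):
        print(problem)
```

Run on the demo config, the script prints two problems, both for the deliberately broken `broken_example` scheme. The check stays per scheme because it makes no assumption about how Potato resolves shortcut conflicts across schemes or with `sequential_key_binding`.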
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir keyword-spotting
cd keyword-spotting
# Copy config.yaml from above
potato start config.yaml
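Before `potato start` can serve anything, `data/commands.json` has to exist. `data_files` points at it, and `item_properties` expects each item to carry `id` and `text` keys. The sketch below builds that file from a local copy of Speech Commands, whose release stores clips as `<word>/<clip>.wav`. Two of its assumptions should be checked against the Potato documentation. The first is that a `.json` data file is read as one JSON object per line. The second is that putting the clip's relative path in `text` is how audio reaches annotators in your deployment. `build_items` and `write_jsonl` are hypothetical names.

```python
import json
from pathlib import Path


def build_items(dataset_root: Path) -> list[dict]:
    """Return one {"id", "text"} item per clip found at dataset_root/<word>/*.wav.

    Folders whose names start with "_" (such as _background_noise_) are skipped.
    """
    clips = sorted(
        path
        for path in dataset_root.glob("*/*.wav")
        if not path.parent.name.startswith("_")
    )
    return [
        # Assumption: text holds the clip path relative to dataset_root.
        {"id": f"clip_{index:05d}", "text": clip.relative_to(dataset_root).as_posix()}
        for index, clip in enumerate(clips)
    ]


def write_jsonl(items: list[dict], out_path: Path) -> None:
    """Write one JSON object per line (assumed format for Potato .json data files)."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item) + "\n")
```

For example, `write_jsonl(build_items(Path("speech_commands")), Path("data/commands.json"))`, where `speech_commands` is a hypothetical local path. Underscore folders are skipped because the release's `_background_noise_` directory holds long noise recordings rather than short command clips. The relative path includes the word folder, which is the dataset's own label. If annotators could see the path, consider serving renamed copies so they are not primed.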
Details
Annotation Types
radio, likert
Domain
Audio, Speech
Use Cases
keyword spotting, voice commands, wake word detection
Tags
audio, speech commands, keyword spotting, voice control, wake word
Related Designs
Audio Transcription Review
Review and correct automatic speech recognition transcriptions with waveform visualization.
likert, multiselect
Audio-Visual Sentiment Analysis
Rate sentiment in speech segments following CMU-MOSI and CMU-MOSEI multimodal annotation protocols.
likert, radio
Speech Commands - Keyword Recognition
Speech command keyword recognition and quality assessment based on the Speech Commands dataset (Warden, arXiv 2018). Annotators listen to audio clips, classify the spoken command word, and assess the audio quality.
radio, audio_annotation