beginner, audio
Keyword Spotting
Classify spoken commands and keywords following the Google Speech Commands dataset format.
audio annotation
Configuration File: config.yaml
task_name: "Keyword Spotting"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#22C55E"
  progress_color: "#4ADE80"
  speed_control: false  # Short clips, normal speed only

# Data configuration
data_files:
  - path: data/commands.json
    audio_field: audio_file

# Annotation schemes
annotation_schemes:
  # Primary command classification
  - annotation_type: radio
    name: command
    description: "What command/keyword is spoken?"
    labels:
      # Core commands
      - "yes"
      - "no"
      - "up"
      - "down"
      - "left"
      - "right"
      - "on"
      - "off"
      - "stop"
      - "go"
      # Numbers
      - "zero"
      - "one"
      - "two"
      - "three"
      - "four"
      - "five"
      - "six"
      - "seven"
      - "eight"
      - "nine"
      # Special categories
      - "unknown_word"  # A word, but not in the command set
      - "silence"       # No speech
      - "noise"         # Background noise only
    keyboard_shortcuts:
      "yes": "y"
      "no": "n"
      "up": "u"
      "down": "d"
      "stop": "s"
      "go": "g"
      "unknown_word": "w"
      "silence": "i"
      "noise": "x"

  # Clarity rating
  - annotation_type: radio
    name: clarity
    description: "How clearly is the command spoken?"
    labels:
      - Very clear (unmistakable)
      - Clear (confident identification)
      - Somewhat unclear (some ambiguity)
      - Unclear (difficult to identify)
      - Cannot determine

  # Speaker characteristics
  - annotation_type: radio
    name: speaker_type
    description: "Speaker characteristics (if discernible)"
    labels:
      - Adult male
      - Adult female
      - Child
      - Cannot determine

  # Audio quality
  - annotation_type: radio
    name: audio_quality
    description: "Recording quality"
    labels:
      - Clean (no noise)
      - Light background noise
      - Moderate noise
      - Heavy noise
      - Distorted/clipped

  # Accent/pronunciation
  - annotation_type: radio
    name: pronunciation
    description: "Pronunciation characteristics"
    labels:
      - Standard/neutral
      - Regional accent (still clear)
      - Strong accent (affects clarity)
      - Non-native speaker
      - Cannot assess

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your classification?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"

# User settings
allow_all_users: true
instances_per_annotator: 500  # Short clips allow more

# Output
output:
  path: annotations/
  format: json
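Only nine of the 23 command labels in the config above have a keyboard shortcut, so it can be worth sanity-checking the mapping before launching. The following is a minimal Python sketch and not part of Potato: the `check_shortcuts` helper is hypothetical, and the labels and shortcuts are copied by hand from the config rather than parsed from the file.

```python
# Labels and shortcuts copied by hand from config.yaml (assumption: kept in sync manually).
LABELS = [
    "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go",
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "unknown_word", "silence", "noise",
]
SHORTCUTS = {
    "yes": "y", "no": "n", "up": "u", "down": "d", "stop": "s", "go": "g",
    "unknown_word": "w", "silence": "i", "noise": "x",
}


def check_shortcuts(labels, shortcuts):
    """Return a list of problems: undeclared labels or keys bound more than once."""
    problems = []
    seen = {}  # key -> first label that claimed it
    for label, key in shortcuts.items():
        if label not in labels:
            problems.append(f"shortcut for undeclared label: {label}")
        if key in seen:
            problems.append(f"key {key!r} bound to both {seen[key]} and {label}")
        seen.setdefault(key, label)
    return problems


if __name__ == "__main__":
    issues = check_shortcuts(LABELS, SHORTCUTS)
    print("\n".join(issues) if issues else "shortcuts OK")
```

Labels without a shortcut (for example "left", "right", and the digits) are not reported as problems here; they would simply be selected with the mouse.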
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir keyword-spotting
cd keyword-spotting

# Copy config.yaml from above
potato start config.yaml
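The config expects a data file at data/commands.json whose records carry an audio_file field, which the quick start does not create. The sketch below shows one way to generate it from a folder of short .wav clips. It rests on assumptions: the `build_manifest` helper is hypothetical, and the record layout (a JSON array of objects with `id` and `audio_file`) is a guess at the expected schema, so check the Potato documentation before relying on it.

```python
import json
from pathlib import Path


def build_manifest(audio_root, out_path):
    """Write one record per .wav file under audio_root and return the records."""
    audio_root = Path(audio_root)
    records = []
    for wav in sorted(audio_root.rglob("*.wav")):
        rel = wav.relative_to(audio_root)
        records.append({
            # Assumed schema: an "id" plus the field named by audio_field in config.yaml.
            "id": "_".join(rel.with_suffix("").parts),
            "audio_file": wav.as_posix(),
        })
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

For example, `build_manifest("data/audio", "data/commands.json")` would index every clip under a hypothetical data/audio folder. If the clips are organized in folders named after the spoken word, the generated ids include that word, so consider flattening or renaming files first to avoid revealing the label to annotators.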
Details
Annotation Types
radio, likert
Domain
Audio, Speech
Use Cases
keyword spotting, voice commands, wake word detection
Tags
audio, speech commands, keyword spotting, voice control, wake word
Related Designs
Audio-Visual Sentiment Analysis
Rate sentiment in speech segments following CMU-MOSI and CMU-MOSEI multimodal annotation protocols.
likert, radio
Speech Emotion Recognition
Classify emotional content in speech following IEMOCAP and CREMA-D annotation schemes.
radio, likert
Speech Intelligibility Rating
Rate speech intelligibility for pathological speech following TORGO database annotation protocols.
likert, radio