beginner, audio

Keyword Spotting

Classify short audio clips of spoken commands and keywords, using a label set modeled on the Google Speech Commands dataset.

Configuration File: config.yaml

task_name: "Keyword Spotting"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#22C55E"
  progress_color: "#4ADE80"
  speed_control: false  # Short clips, normal speed only

# Data configuration
data_files:
  - path: data/commands.json
    audio_field: audio_file

# Annotation schemes
annotation_schemes:
  # Primary command classification
  - annotation_type: radio
    name: command
    description: "What command/keyword is spoken?"
    labels:
      # Core commands
      - "yes"
      - "no"
      - "up"
      - "down"
      - "left"
      - "right"
      - "on"
      - "off"
      - "stop"
      - "go"
      # Numbers
      - "zero"
      - "one"
      - "two"
      - "three"
      - "four"
      - "five"
      - "six"
      - "seven"
      - "eight"
      - "nine"
      # Special categories
      - "unknown_word"  # A word but not in command set
      - "silence"       # No speech
      - "noise"         # Background noise only
    keyboard_shortcuts:
      "yes": "y"
      "no": "n"
      "up": "u"
      "down": "d"
      "stop": "s"
      "go": "g"
      "unknown_word": "w"
      "silence": "i"
      "noise": "x"

  # Clarity rating
  - annotation_type: radio
    name: clarity
    description: "How clearly is the command spoken?"
    labels:
      - Very clear (unmistakable)
      - Clear (confident identification)
      - Somewhat unclear (some ambiguity)
      - Unclear (difficult to identify)
      - Cannot determine

  # Speaker characteristics
  - annotation_type: radio
    name: speaker_type
    description: "Speaker characteristics (if discernible)"
    labels:
      - Adult male
      - Adult female
      - Child
      - Cannot determine

  # Audio quality
  - annotation_type: radio
    name: audio_quality
    description: "Recording quality"
    labels:
      - Clean (no noise)
      - Light background noise
      - Moderate noise
      - Heavy noise
      - Distorted/clipped

  # Accent/pronunciation
  - annotation_type: radio
    name: pronunciation
    description: "Pronunciation characteristics"
    labels:
      - Standard/neutral
      - Regional accent (still clear)
      - Strong accent (affects clarity)
      - Non-native speaker
      - Cannot assess

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your classification?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"

# User settings
allow_all_users: true
instances_per_annotator: 500  # Clips are short, so each annotator can handle more items

# Output
output:
  path: annotations/
  format: json
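
With 23 command labels and only some of them bound to keys, it is easy to mistype a label name or bind the same key twice. The sketch below is a small standalone Python check, not part of Potato. It copies the label list and shortcut map from the `command` scheme above and reports shortcut entries that name no label, keys bound more than once, and labels that have no shortcut. The function name `check_shortcuts` is made up for illustration.

```python
# Labels and shortcuts copied from the `command` scheme in config.yaml.
LABELS = [
    "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go",
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "unknown_word", "silence", "noise",
]

SHORTCUTS = {
    "yes": "y", "no": "n", "up": "u", "down": "d", "stop": "s", "go": "g",
    "unknown_word": "w", "silence": "i", "noise": "x",
}


def check_shortcuts(labels, shortcuts):
    """Return (unknown, duplicates, missing) for a label list and shortcut map.

    unknown:    shortcut entries whose name is not a label
    duplicates: (key, first_label, second_label) for keys bound twice
    missing:    labels that have no shortcut at all
    """
    unknown = [name for name in shortcuts if name not in labels]

    seen = {}          # key -> first label that claimed it
    duplicates = []
    for name, key in shortcuts.items():
        if key in seen:
            duplicates.append((key, seen[key], name))
        else:
            seen[key] = name

    missing = [name for name in labels if name not in shortcuts]
    return unknown, duplicates, missing


unknown, duplicates, missing = check_shortcuts(LABELS, SHORTCUTS)
print("Shortcuts naming no label:", unknown)
print("Keys bound more than once:", duplicates)
print("Labels without a shortcut:", missing)
```

For the configuration above, the check finds no unknown names and no duplicate keys, and lists 14 labels without a shortcut (left, right, on, off, and the ten digits), which annotators would select with the mouse.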

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir keyword-spotting
cd keyword-spotting
# Copy config.yaml from above
potato start config.yaml
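
The configuration reads its items from data/commands.json and looks up each clip through the `audio_file` field, so that file has to exist before the server starts. The Python sketch below is one way to build it from a local copy of the Speech Commands data, where each word has its own folder of .wav files. It rests on several assumptions that are not confirmed by this page: that the data file is a JSON array of objects, that each object needs an `id` field, and that folders starting with an underscore (such as `_background_noise_`) should be skipped. The helper names `build_items` and `write_items` are hypothetical.

```python
import json
from pathlib import Path


def build_items(dataset_root):
    """Collect one item per .wav file found under <dataset_root>/<word>/."""
    root = Path(dataset_root)
    items = []
    for wav in sorted(root.glob("*/*.wav")):
        word_folder = wav.parent.name
        if word_folder.startswith("_"):
            # Assumption: underscore folders hold long noise recordings, not clips.
            continue
        items.append({
            # Assumed field; the folder is included because file names
            # may repeat across word folders.
            "id": f"{word_folder}/{wav.stem}",
            # Field name taken from `audio_field: audio_file` in config.yaml.
            "audio_file": wav.as_posix(),
        })
    return items


def write_items(items, out_path):
    """Write the items as a JSON array, creating parent folders if needed."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(items, indent=2), encoding="utf-8")
```

A typical call would be `write_items(build_items("speech_commands"), "data/commands.json")`, where `speech_commands` is a placeholder for wherever the clips were unpacked. Note that the folder name in each path is the dataset's own label, so hide or rename it if annotators should not see the expected answer.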

Details

Annotation Types

radio, likert

Domain

Audio, Speech

Use Cases

keyword spotting, voice commands, wake word detection

Tags

audio, speech commands, keyword spotting, voice control, wake word