beginner, audio

Keyword Spotting

Classify spoken commands and keywords following the Google Speech Commands dataset format.

Configuration File: config.yaml

annotation_task_name: "Keyword Spotting"

port: 8000

# Data configuration
data_files:
  - "data/commands.json"

item_properties:
  id_key: "id"
  text_key: "text"

# Annotation schemes
annotation_schemes:
  # Primary command classification
  - annotation_type: radio
    name: command
    description: "What command/keyword is spoken?"
    labels:
      # Core commands
      - name: "yes"
        key_value: "y"
      - name: "no"
        key_value: "n"
      - name: "up"
        key_value: "u"
      - name: "down"
        key_value: "d"
      - "left"
      - "right"
      - "on"
      - "off"
      - name: "stop"
        key_value: "s"
      - name: "go"
        key_value: "g"
      # Numbers
      - "zero"
      - "one"
      - "two"
      - "three"
      - "four"
      - "five"
      - "six"
      - "seven"
      - "eight"
      - "nine"
      # Special categories
      - name: "unknown_word"
        key_value: "w"
      - name: "silence"
        key_value: "i"
      - name: "noise"
        key_value: "x"
    sequential_key_binding: true

  # Clarity rating
  - annotation_type: radio
    name: clarity
    description: "How clearly is the command spoken?"
    labels:
      - Very clear (unmistakable)
      - Clear (confident identification)
      - Somewhat unclear (some ambiguity)
      - Unclear (difficult to identify)
      - Cannot determine

  # Speaker characteristics
  - annotation_type: radio
    name: speaker_type
    description: "Speaker characteristics (if discernible)"
    labels:
      - Adult male
      - Adult female
      - Child
      - Cannot determine

  # Audio quality
  - annotation_type: radio
    name: audio_quality
    description: "Recording quality"
    labels:
      - Clean (no noise)
      - Light background noise
      - Moderate noise
      - Heavy noise
      - Distorted/clipped

  # Accent/pronunciation
  - annotation_type: radio
    name: pronunciation
    description: "Pronunciation characteristics"
    labels:
      - Standard/neutral
      - Regional accent (still clear)
      - Strong accent (affects clarity)
      - Non-native speaker
      - Cannot assess

  # Confidence
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your classification?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"

# User settings
allow_all_users: true
instances_per_annotator: 500  # Clips are short, so each annotator can take a larger quota

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
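
Several labels in the `command` scheme above carry an explicit `key_value`, while others are plain strings. When you add or rename labels, a small check like the sketch below can confirm that no two labels share a shortcut. It is plain Python with the label list copied by hand from the config; the helper name `find_duplicate_keys` is hypothetical and not part of Potato.

```python
from collections import defaultdict

# Labels copied by hand from the `command` scheme above. Dict entries carry
# an explicit key_value; plain strings define no shortcut of their own.
COMMAND_LABELS = [
    {"name": "yes", "key_value": "y"},
    {"name": "no", "key_value": "n"},
    {"name": "up", "key_value": "u"},
    {"name": "down", "key_value": "d"},
    "left", "right", "on", "off",
    {"name": "stop", "key_value": "s"},
    {"name": "go", "key_value": "g"},
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
    {"name": "unknown_word", "key_value": "w"},
    {"name": "silence", "key_value": "i"},
    {"name": "noise", "key_value": "x"},
]


def find_duplicate_keys(labels):
    """Return {key: [label names]} for every key bound to more than one label.

    Hypothetical helper for illustration; not part of Potato.
    """
    names_by_key = defaultdict(list)
    for label in labels:
        # Only dict-style labels can declare a shortcut.
        if isinstance(label, dict) and "key_value" in label:
            names_by_key[label["key_value"]].append(label["name"])
    return {key: names for key, names in names_by_key.items() if len(names) > 1}


duplicates = find_duplicate_keys(COMMAND_LABELS)
print(duplicates or "no shortcut collisions")
```

The check only covers explicit `key_value` entries; it makes no claim about how `sequential_key_binding` assigns keys to the remaining labels.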

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir keyword-spotting
cd keyword-spotting
# Copy config.yaml from above
potato start config.yaml
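
The config expects a data file at `data/commands.json` whose items expose an `id` field and a `text` field (per `item_properties`). The sketch below writes such a file from a folder of clips. It rests on two assumptions that this page does not confirm: that the file holds one JSON object per line, and that `text` carries the path to the audio clip. The `audio/` folder and the helpers `build_items` and `write_items` are hypothetical; check the Potato documentation for the data format your version expects.

```python
import json
from pathlib import Path


def build_items(audio_dir):
    """Build one item per .wav file under audio_dir (hypothetical helper)."""
    root = Path(audio_dir)
    items = []
    for wav in sorted(root.rglob("*.wav")):
        items.append({
            # Relative path without suffix keeps ids unique when the same
            # file name appears in several sub-folders. Matches id_key: "id".
            "id": wav.relative_to(root).with_suffix("").as_posix(),
            # Assumption: text_key: "text" points at the clip to play.
            "text": wav.as_posix(),
        })
    return items


def write_items(items, out_path):
    """Write items as one JSON object per line (assumed file layout)."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")
```

From the project folder, calling `write_items(build_items("audio"), "data/commands.json")` before `potato start config.yaml` would produce the file the config points to, under the assumptions stated above.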

Details

Annotation Types

radio, likert

Domain

Audio, Speech

Use Cases

keyword spotting, voice commands, wake word detection

Tags

audio, speech commands, keyword spotting, voice control, wake word