advanced, audio

Speaker Diarization

Segment and label speakers in multi-party conversations following AMI Meeting Corpus guidelines.


audio annotation

Configuration file: config.yaml

task_name: "Speaker Diarization"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#8B5CF6"
  progress_color: "#A78BFA"
  speed_control: true
  speed_options: [0.5, 0.75, 1.0, 1.5]
  keyboard_controls:
    play_pause: "space"
    rewind_5s: "left"
    forward_5s: "right"

# Data configuration
data_files:
  - path: data/meeting_segments.json
    audio_field: audio_file

# Annotation schemes
annotation_schemes:
  # Number of speakers
  - annotation_type: radio
    name: num_speakers
    description: "How many distinct speakers are in this segment?"
    labels:
      - "1 speaker"
      - "2 speakers"
      - "3 speakers"
      - "4 speakers"
      - "5+ speakers"

  # Speaker turns: a span annotation over the waveform would be ideal here;
  # for now, turns are entered as free text, one turn per line
  - annotation_type: text
    name: speaker_turns
    description: "List speaker turns with timestamps, one per line (e.g., '0:00-0:15 Speaker A')"
    textarea: true
    placeholder: "0:00-0:15 Speaker A\n0:15-0:30 Speaker B\n..."

  # Overlapping speech
  - annotation_type: radio
    name: overlap_present
    description: "Is there overlapping speech (multiple speakers at once)?"
    labels:
      - No overlap
      - Minor overlap (brief interruptions)
      - Moderate overlap
      - Significant overlap (hard to distinguish)

  # Speaker characteristics
  - annotation_type: multiselect
    name: speaker_types
    description: "What types of speakers are present? (select all that apply)"
    labels:
      - Male adult
      - Female adult
      - Child
      - Elderly
      - Non-native speaker
      - Cannot determine

  # Audio events
  - annotation_type: multiselect
    name: audio_events
    description: "Which non-speech events are present? (select all that apply)"
    labels:
      - Laughter
      - Coughing
      - Throat clearing
      - Paper rustling
      - Typing
      - Door sounds
      - Phone ringing
      - Background music
      - Extended silence
      - None

  # Recording quality
  - annotation_type: likert
    name: recording_quality
    description: "How clear is the recording for speaker identification?"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Difficulty
  - annotation_type: likert
    name: difficulty
    description: "How difficult was it to identify speaker turns?"
    size: 5
    min_label: "Very easy"
    max_label: "Very difficult"

# User settings
allow_all_users: true
instances_per_annotator: 50

# Output
output:
  path: annotations/
  format: json
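The speaker_turns field is free text. Its turns have to be parsed before they can be compared with a diarization system's output. The following is a minimal sketch under stated assumptions, not part of Potato:

- Turns follow the placeholder's `M:SS-M:SS Speaker name` pattern (`H:MM:SS` is also accepted).
- Turns are separated by newlines or commas.
- The helper names `parse_turns`, `to_seconds` and `to_rttm` are hypothetical.

The sketch converts each turn to a `(start, end, speaker)` tuple in seconds. It then renders RTTM `SPEAKER` lines, a plain-text format commonly read by diarization scoring tools.

```python
import re
from typing import List, Tuple

Turn = Tuple[float, float, str]  # (start_seconds, end_seconds, speaker)

# Timestamps look like "M:SS" or "H:MM:SS", as in the config's placeholder.
_TS = r"\d+(?::\d{2}){1,2}"
_TURN_RE = re.compile(rf"^({_TS})\s*-\s*({_TS})\s+(\S.*)$")


def to_seconds(ts: str) -> float:
    """Convert "M:SS" or "H:MM:SS" to seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_turns(text: str) -> List[Turn]:
    """Parse free-text turns separated by newlines or commas (hypothetical helper)."""
    turns: List[Turn] = []
    for raw in re.split(r"[\n,]", text):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines and trailing commas
        m = _TURN_RE.match(line)
        if m is None:
            raise ValueError(f"cannot parse turn: {line!r}")
        start, end = to_seconds(m.group(1)), to_seconds(m.group(2))
        if end <= start:
            raise ValueError(f"turn ends before it starts: {line!r}")
        turns.append((start, end, m.group(3).strip()))
    return sorted(turns)


def to_rttm(turns: List[Turn], file_id: str) -> str:
    """Render turns as RTTM SPEAKER lines (onset and duration in seconds)."""
    rows = [
        # RTTM fields are whitespace-separated, so spaces in names become "_".
        f"SPEAKER {file_id} 1 {s:.3f} {e - s:.3f} <NA> <NA> {spk.replace(' ', '_')} <NA> <NA>"
        for s, e, spk in turns
    ]
    return "\n".join(rows)


if __name__ == "__main__":
    example = "0:00-0:15 Speaker A\n0:15-0:30 Speaker B"
    print(to_rttm(parse_turns(example), "meeting_001"))
```

Because the sketch splits on commas, speaker names must not contain commas. Malformed lines raise `ValueError`, so a reviewer can send those annotations back rather than silently dropping turns.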
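The overlap_present question is a subjective judgment. When speaker_turns is filled in, however, the overlapped time can also be measured directly and used to spot-check that label. This is a minimal sketch under two assumptions:

- Turns are already available as hypothetical `(start_seconds, end_seconds, speaker)` tuples.
- No single speaker's turns overlap each other.

`overlap_seconds` is an illustrative helper, not a Potato function. It uses a sweep over turn boundaries that counts how many turns are open at each moment.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]  # (start_seconds, end_seconds, speaker)


def overlap_seconds(turns: List[Turn]) -> float:
    """Total time during which two or more turns are active at once."""
    events = []
    for start, end, _speaker in turns:
        events.append((start, 1))   # a turn opens
        events.append((end, -1))    # a turn closes
    # At equal timestamps, closes (-1) sort before opens (+1), so back-to-back
    # turns such as 0:00-0:15 and 0:15-0:30 do not count as overlap.
    events.sort()
    active, total, prev_t = 0, 0.0, None
    for t, delta in events:
        # Time since the previous boundary counts only if 2+ turns were open.
        if prev_t is not None and active >= 2:
            total += t - prev_t
        active += delta
        prev_t = t
    return total


if __name__ == "__main__":
    print(overlap_seconds([(0.0, 15.0, "Speaker A"), (10.0, 20.0, "Speaker B")]))
```

The page does not say how many seconds of overlap count as minor, moderate, or significant. Any thresholds that map this number onto the four overlap_present labels would be a project decision.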
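num_speakers and speaker_turns describe the same segment, so a reviewer may want to flag annotations where the two disagree. The sketch below makes these assumptions:

- Speaker names are compared case-insensitively after trimming whitespace.
- `expected_num_speakers_label` and `is_consistent` are hypothetical helpers.
- The label strings are copied from the num_speakers scheme in config.yaml.

```python
from typing import Iterable

# Labels copied from the num_speakers scheme in config.yaml.
NUM_SPEAKER_LABELS = ["1 speaker", "2 speakers", "3 speakers", "4 speakers", "5+ speakers"]


def expected_num_speakers_label(speakers: Iterable[str]) -> str:
    """Map the distinct speaker names in a segment to a num_speakers label."""
    # Normalise case and whitespace so "Speaker A" and "speaker a" count once.
    distinct = {s.strip().lower() for s in speakers if s.strip()}
    if not distinct:
        raise ValueError("no speakers given")
    # Counts of five or more all fall into the last bucket.
    return NUM_SPEAKER_LABELS[min(len(distinct), 5) - 1]


def is_consistent(selected_label: str, speakers: Iterable[str]) -> bool:
    """True if the annotator's radio choice matches the speakers they listed."""
    return selected_label == expected_num_speakers_label(speakers)
```

A mismatch is not necessarily an error. An annotator may hear a speaker who never holds the floor long enough to get a turn, so flagged items are better reviewed than rejected automatically.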

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

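# Create your project folder
mkdir speaker-diarization
cd speaker-diarization
# Copy config.yaml from above, then add your segments file at
# data/meeting_segments.json (the path set under data_files)
mkdir -p data
potato start config.yaml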
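The quick start assumes data/meeting_segments.json already exists, but the page does not show its contents. The sketch below writes a toy file under two assumptions, both worth checking against the Potato documentation for your version:

- The file holds one JSON object per line (JSON Lines).
- A hypothetical `id` key is stored next to the `audio_file` key that `audio_field` refers to.

The clip IDs and audio paths are placeholders.

```python
import json
from pathlib import Path
from typing import Sequence, Tuple

# Placeholder clips: (hypothetical id, path or URL to the audio file).
CLIPS = [
    ("meeting_001_seg01", "audio/meeting_001_seg01.wav"),
    ("meeting_001_seg02", "audio/meeting_001_seg02.wav"),
]


def write_segments(path: Path, clips: Sequence[Tuple[str, str]]) -> int:
    """Write one JSON object per line and return the number of records.

    Assumes JSON Lines and a hypothetical "id" key; only "audio_file"
    comes from config.yaml (audio_field: audio_file).
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for clip_id, audio in clips:
            f.write(json.dumps({"id": clip_id, "audio_file": audio}) + "\n")
    return len(clips)


if __name__ == "__main__":
    count = write_segments(Path("data/meeting_segments.json"), CLIPS)
    print(f"wrote {count} records to data/meeting_segments.json")
```

Run it from the project folder before `potato start` so the path matches the one under data_files.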

Details

Annotation Types

radio, text, multiselect, likert

Domain

Audio, Speech

Use Cases

speaker diarization, meeting transcription, conversation analysis

Tags

audio, speaker diarization, meeting, conversation, segmentation