Speaker Diarization

Segment and label speakers in multi-party conversations following AMI Meeting Corpus guidelines.

Configuration File: config.yaml

annotation_task_name: "Speaker Diarization"

port: 8000

# Data configuration
data_files:
  - "data/meeting_segments.json"

item_properties:
  id_key: id
  text_key: text

# Annotation schemes
annotation_schemes:
  # Number of speakers
  - annotation_type: radio
    name: num_speakers
    description: "How many distinct speakers are in this segment?"
    labels:
      - "1 speaker"
      - "2 speakers"
      - "3 speakers"
      - "4 speakers"
      - "5+ speakers"

  # Speaker turns (span annotation would be ideal here)
  # For now, collected as free text in a fixed "start-end speaker" format
  - annotation_type: text
    name: speaker_turns
    description: "List speaker turns with timestamps (e.g., '0:00-0:15 Speaker A, 0:15-0:30 Speaker B')"
    textarea: true
    placeholder: "0:00-0:15 Speaker A\n0:15-0:30 Speaker B\n..."

  # Overlapping speech
  - annotation_type: radio
    name: overlap_present
    description: "Is there overlapping speech (multiple speakers at once)?"
    labels:
      - No overlap
      - Minor overlap (brief interruptions)
      - Moderate overlap
      - Significant overlap (hard to distinguish)

  # Speaker characteristics
  - annotation_type: multiselect
    name: speaker_types
    description: "What types of speakers are present? (Select all)"
    labels:
      - Male adult
      - Female adult
      - Child
      - Elderly
      - Non-native speaker
      - Cannot determine

  # Audio events
  - annotation_type: multiselect
    name: audio_events
    description: "Non-speech events present (select all)"
    labels:
      - Laughter
      - Coughing
      - Throat clearing
      - Paper rustling
      - Typing
      - Door sounds
      - Phone ringing
      - Background music
      - Extended silence
      - None

  # Recording quality
  - annotation_type: likert
    name: recording_quality
    description: "How clear is the recording for speaker identification?"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Difficulty
  - annotation_type: likert
    name: difficulty
    description: "How difficult was it to identify speaker turns?"
    size: 5
    min_label: "Very easy"
    max_label: "Very difficult"

# User settings
allow_all_users: true
instances_per_annotator: 50

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
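
Because speaker_turns is collected as free text, the answers need to be parsed before they can be used as diarization segments. The following is a minimal Python sketch of such a parser, assuming annotators follow the "0:00-0:15 Speaker A" pattern from the field description. The names Turn and parse_speaker_turns are hypothetical helpers for illustration and are not part of Potato.

```python
import re
from typing import NamedTuple


class Turn(NamedTuple):
    start: int    # seconds from the start of the segment
    end: int      # seconds from the start of the segment
    speaker: str  # label as typed by the annotator, e.g. "Speaker A"


# Matches one entry such as "0:00-0:15 Speaker A".
_TURN_RE = re.compile(r"^\s*(\d+):(\d{2})\s*-\s*(\d+):(\d{2})\s+(.+?)\s*$")


def parse_speaker_turns(raw: str) -> list[Turn]:
    """Parse a free-text speaker_turns answer into structured turns.

    Entries may be separated by newlines or commas. Entries that do not
    match the expected pattern (including a literal "...") are skipped,
    as are intervals whose end is not after their start.
    """
    turns = []
    for entry in re.split(r"[\n,]+", raw):
        match = _TURN_RE.match(entry)
        if match is None:
            continue
        m1, s1, m2, s2, speaker = match.groups()
        start = int(m1) * 60 + int(s1)
        end = int(m2) * 60 + int(s2)
        if end <= start:
            continue
        turns.append(Turn(start, end, speaker))
    return turns
```

Calling parse_speaker_turns on an annotator's answer returns a list of (start, end, speaker) tuples in seconds. Silently skipping malformed entries is a design choice for this sketch; a stricter pipeline might log or reject them instead.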

Get This Design

This design is available in our showcase. Copy the configuration above to get started.

Quick start:

# Create your project folder
mkdir speaker-diarization
cd speaker-diarization
# Copy config.yaml from above
potato start config.yaml
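
The configuration reads items from data/meeting_segments.json, which the quick start does not create. The Python sketch below writes a small placeholder file with the id and text fields named under item_properties. The one-JSON-object-per-line layout and the placeholder values are assumptions; check the Potato documentation for the expected data format and for how audio segments should be referenced.

```python
import json
from pathlib import Path


def write_sample_data(path: str = "data/meeting_segments.json") -> list[dict]:
    """Write placeholder items with the "id" and "text" keys from config.yaml."""
    # Hypothetical placeholder items; replace them with real meeting segments.
    items = [
        {"id": "segment_001", "text": "PLACEHOLDER: first meeting segment"},
        {"id": "segment_002", "text": "PLACEHOLDER: second meeting segment"},
    ]
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: one JSON object per line.
    with out.open("w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")
    return items
```

Running write_sample_data() from the project folder, before starting the server, creates the data directory and the file at the path listed under data_files.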

Details

Annotation Types

span, radio

Domain

Audio, Speech

Use Cases

speaker diarization, meeting transcription, conversation analysis

Related Designs

DISPLACE 2024 - Speaker and Language Diarization

Speaker and language diarization in multilingual conversational audio. Annotators mark speaker turn boundaries, identify speakers, and label the language of each segment in conversational environments (Kundu et al., INTERSPEECH 2024).

radio, span

ToBI Prosodic Annotation

Multi-tier prosodic annotation following the Tones and Break Indices (ToBI) framework. Annotators label pitch accents, phrase accents, boundary tones, and break indices on speech utterances, producing a layered prosodic transcription aligned to the audio timeline (Silverman et al., Speech Communication 1992).

span, radio

Adverse Drug Event Extraction (CADEC)

Named entity recognition for adverse drug events from patient-reported experiences, based on the CADEC corpus (Karimi et al., 2015). Annotates drugs, adverse effects, symptoms, diseases, and findings from colloquial health forum posts with mapping to medical vocabularies (SNOMED-CT, MedDRA).

span, radio
