Speaker Diarization
Segment and label speakers in multi-party conversations following AMI Meeting Corpus guidelines.
Configuration File: config.yaml
annotation_task_name: "Speaker Diarization"
port: 8000

# Data configuration
data_files:
  - "data/meeting_segments.json"

item_properties:
  id_key: id
  text_key: text

# Annotation schemes
annotation_schemes:
  # Number of speakers
  - annotation_type: radio
    name: num_speakers
    description: "How many distinct speakers are in this segment?"
    labels:
      - "1 speaker"
      - "2 speakers"
      - "3 speakers"
      - "4 speakers"
      - "5+ speakers"

  # Speaker turns (span annotation would be ideal here;
  # for now, a structured free-text field)
  - annotation_type: text
    name: speaker_turns
    description: "List speaker turns with timestamps (e.g., '0:00-0:15 Speaker A, 0:15-0:30 Speaker B')"
    textarea: true
    placeholder: "0:00-0:15 Speaker A\n0:15-0:30 Speaker B\n..."

  # Overlapping speech
  - annotation_type: radio
    name: overlap_present
    description: "Is there overlapping speech (multiple speakers at once)?"
    labels:
      - No overlap
      - Minor overlap (brief interruptions)
      - Moderate overlap
      - Significant overlap (hard to distinguish)

  # Speaker characteristics
  - annotation_type: multiselect
    name: speaker_types
    description: "What types of speakers are present? (Select all)"
    labels:
      - Male adult
      - Female adult
      - Child
      - Elderly
      - Non-native speaker
      - Cannot determine

  # Audio events
  - annotation_type: multiselect
    name: audio_events
    description: "Non-speech events present (select all)"
    labels:
      - Laughter
      - Coughing
      - Throat clearing
      - Paper rustling
      - Typing
      - Door sounds
      - Phone ringing
      - Background music
      - Extended silence
      - None

  # Recording quality
  - annotation_type: likert
    name: recording_quality
    description: "How clear is the recording for speaker identification?"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Difficulty
  - annotation_type: likert
    name: difficulty
    description: "How difficult was it to identify speaker turns?"
    size: 5
    min_label: "Very easy"
    max_label: "Very difficult"

# User settings
allow_all_users: true
instances_per_annotator: 50

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir speaker-diarization
cd speaker-diarization
# Copy config.yaml from above, then launch
potato start config.yaml
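Because speaker_turns is collected as free text, downstream processing has to parse the '0:00-0:15 Speaker A' line format itself. One possible sketch (the function name and regex are mine, not part of Potato):

```python
import re

# Matches lines like "0:00-0:15 Speaker A" (minutes:seconds ranges).
TURN_RE = re.compile(r"(\d+):(\d{2})-(\d+):(\d{2})\s+(.+)")

def parse_turns(raw: str):
    """Parse annotator input into (start_seconds, end_seconds, label) tuples."""
    turns = []
    for line in raw.strip().splitlines():
        m = TURN_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't follow the expected format
        sm, ss, em, es, label = m.groups()
        turns.append((int(sm) * 60 + int(ss), int(em) * 60 + int(es), label))
    return turns

print(parse_turns("0:00-0:15 Speaker A\n0:15-0:30 Speaker B"))
# → [(0, 15, 'Speaker A'), (15, 30, 'Speaker B')]
```

Validating annotator input this way also makes it easy to flag segments where the turn list is malformed before computing agreement.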
Related Designs
DISPLACE 2024 - Speaker and Language Diarization
Speaker and language diarization in multilingual conversational audio. Annotators mark speaker turn boundaries, identify speakers, and label the language of each segment in conversational environments (Kundu et al., INTERSPEECH 2024).
ToBI Prosodic Annotation
Multi-tier prosodic annotation following the Tones and Break Indices (ToBI) framework. Annotators label pitch accents, phrase accents, boundary tones, and break indices on speech utterances, producing a layered prosodic transcription aligned to the audio timeline (Silverman et al., Speech Communication 1992).
Adverse Drug Event Extraction (CADEC)
Named entity recognition for adverse drug events from patient-reported experiences, based on the CADEC corpus (Karimi et al., 2015). Annotates drugs, adverse effects, symptoms, diseases, and findings from colloquial health forum posts with mapping to medical vocabularies (SNOMED-CT, MedDRA).