Speaker Diarization
Segment and label speakers in multi-party conversations following AMI Meeting Corpus guidelines.
Configuration File: config.yaml
task_name: "Speaker Diarization"

# Server configuration
server:
  port: 8000

# Audio settings
audio:
  enabled: true
  display: waveform
  waveform_color: "#8B5CF6"
  progress_color: "#A78BFA"
  speed_control: true
  speed_options: [0.5, 0.75, 1.0, 1.5]
  keyboard_controls:
    play_pause: "space"
    rewind_5s: "left"
    forward_5s: "right"

# Data configuration
data_files:
  - path: data/meeting_segments.json
    audio_field: audio_file

# Annotation schemes
annotation_schemes:
  # Number of speakers
  - annotation_type: radio
    name: num_speakers
    description: "How many distinct speakers are in this segment?"
    labels:
      - "1 speaker"
      - "2 speakers"
      - "3 speakers"
      - "4 speakers"
      - "5+ speakers"

  # Speaker turns (span annotation would be ideal here)
  # For now using structured fields
  - annotation_type: text
    name: speaker_turns
    description: "List speaker turns with timestamps (e.g., '0:00-0:15 Speaker A, 0:15-0:30 Speaker B')"
    textarea: true
    placeholder: "0:00-0:15 Speaker A\n0:15-0:30 Speaker B\n..."

  # Overlapping speech
  - annotation_type: radio
    name: overlap_present
    description: "Is there overlapping speech (multiple speakers at once)?"
    labels:
      - No overlap
      - Minor overlap (brief interruptions)
      - Moderate overlap
      - Significant overlap (hard to distinguish)

  # Speaker characteristics
  - annotation_type: multiselect
    name: speaker_types
    description: "What types of speakers are present? (Select all)"
    labels:
      - Male adult
      - Female adult
      - Child
      - Elderly
      - Non-native speaker
      - Cannot determine

  # Audio events
  - annotation_type: multiselect
    name: audio_events
    description: "Non-speech events present (select all)"
    labels:
      - Laughter
      - Coughing
      - Throat clearing
      - Paper rustling
      - Typing
      - Door sounds
      - Phone ringing
      - Background music
      - Extended silence
      - None

  # Recording quality
  - annotation_type: likert
    name: recording_quality
    description: "How clear is the recording for speaker identification?"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"

  # Difficulty
  - annotation_type: likert
    name: difficulty
    description: "How difficult was it to identify speaker turns?"
    size: 5
    min_label: "Very easy"
    max_label: "Very difficult"

# User settings
allow_all_users: true
instances_per_annotator: 50

# Output
output:
  path: annotations/
  format: json
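The config expects instances in data/meeting_segments.json, with the audio path stored in the field named by audio_field. A minimal sketch of generating such a file follows; note that the "id" and "text" field names and the example paths are assumptions for illustration, and only "audio_file" is fixed by the config above.

```python
import json

# Hypothetical instances for data/meeting_segments.json. Only the
# "audio_file" key is dictated by the config's audio_field setting;
# "id", "text", and the file paths here are illustrative assumptions.
instances = [
    {
        "id": "meeting_001_seg_01",
        "text": "Segment 1 of meeting 001",
        "audio_file": "audio/meeting_001_seg_01.wav",
    },
    {
        "id": "meeting_001_seg_02",
        "text": "Segment 2 of meeting 001",
        "audio_file": "audio/meeting_001_seg_02.wav",
    },
]

with open("meeting_segments.json", "w") as f:
    json.dump(instances, f, indent=2)
```

Check your Potato version's documentation for the exact instance schema it expects; some setups use JSON lines rather than a single JSON array.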
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir speaker-diarization
cd speaker-diarization

# Copy config.yaml from above, then launch
potato start config.yaml
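Because speaker_turns is collected as free text in the "M:SS-M:SS Speaker X" format shown in the placeholder, downstream analysis needs a small parsing step. The helper below is a post-processing sketch (parse_turns is not part of Potato) that converts one turn per line into (start_seconds, end_seconds, speaker) tuples:

```python
import re

# Matches one turn line like "0:00-0:15 Speaker A"
TURN_RE = re.compile(r"(\d+):(\d{2})-(\d+):(\d{2})\s+(.+)")

def parse_turns(raw: str):
    """Parse placeholder-style turn lines into (start_s, end_s, speaker).

    Hypothetical post-processing helper; assumes one turn per line in
    M:SS-M:SS format, as in the textarea placeholder above.
    """
    turns = []
    for line in raw.strip().splitlines():
        m = TURN_RE.match(line.strip())
        if m:
            start = int(m.group(1)) * 60 + int(m.group(2))
            end = int(m.group(3)) * 60 + int(m.group(4))
            turns.append((start, end, m.group(5).strip().rstrip(",")))
    return turns

print(parse_turns("0:00-0:15 Speaker A\n0:15-0:30 Speaker B"))
# [(0, 15, 'Speaker A'), (15, 30, 'Speaker B')]
```

A stricter validator (rejecting malformed lines instead of skipping them) would help catch annotator typos early.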
Related Designs
Adverse Drug Event Extraction (CADEC)
Named entity recognition for adverse drug events from patient-reported experiences, based on the CADEC corpus (Karimi et al., 2015). Annotates drugs, adverse effects, symptoms, diseases, and findings from colloquial health forum posts with mapping to medical vocabularies (SNOMED-CT, MedDRA).
Chemical-Disease Relation Extraction (BC5CDR)
Extract chemical-disease relations from biomedical literature. Based on BioCreative V CDR task. Identify chemical and disease entities, then annotate causal relationships between them (chemical induces disease).
Coreference Resolution (OntoNotes)
Link pronouns and noun phrases to the entities they refer to in text. Based on the OntoNotes coreference annotation guidelines and CoNLL shared tasks. Identify mention spans and cluster coreferent mentions together.