advanced, audio

DISPLACE 2024 - Speaker and Language Diarization

Speaker and language diarization for multilingual conversational audio. Annotators mark speaker turn boundaries, assign a consistent label to each speaker, and label the language of each segment (Kundu et al., INTERSPEECH 2024).


Configuration File: config.yaml

# DISPLACE 2024 - Speaker and Language Diarization
# Based on Kundu et al., INTERSPEECH 2024
# Paper: https://www.isca-archive.org/interspeech_2024/kundu24_interspeech.html
# Dataset: https://displace2024.github.io/
#
# Task: Speaker and language diarization in multilingual conversational audio.
# Mark speaker turn boundaries, identify speakers, and label language per segment.
#
# Guidelines:
# - Listen to the full conversation to identify distinct speakers
# - Mark temporal boundaries where speaker changes occur
# - Assign consistent speaker labels throughout the conversation
# - Identify the language spoken in each segment
# - Note any overlapping speech or code-switching

annotation_task_name: "DISPLACE 2024: Speaker and Language Diarization"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "audio_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - annotation_type: span
    name: speaker_segments
    description: "Mark temporal speaker segments with start and end times (in seconds)"
    span_mode: temporal
    labels:
      - name: "Speaker 1"
        color: "#3B82F6"
        tooltip: "First identified speaker"
      - name: "Speaker 2"
        color: "#EF4444"
        tooltip: "Second identified speaker"
      - name: "Speaker 3"
        color: "#10B981"
        tooltip: "Third identified speaker"
      - name: "Speaker 4"
        color: "#F59E0B"
        tooltip: "Fourth identified speaker"
      - name: "Speaker 5"
        color: "#8B5CF6"
        tooltip: "Fifth identified speaker"
      - name: "Overlap"
        color: "#6B7280"
        tooltip: "Multiple speakers talking simultaneously"

  - annotation_type: radio
    name: speaker_identity
    description: "Identify the current speaker for the selected segment"
    labels:
      - name: "Speaker 1"
        tooltip: "First/primary speaker in the conversation"
        key_value: "1"
      - name: "Speaker 2"
        tooltip: "Second speaker in the conversation"
        key_value: "2"
      - name: "Speaker 3"
        tooltip: "Third speaker (if present)"
        key_value: "3"
      - name: "Speaker 4"
        tooltip: "Fourth speaker (if present)"
        key_value: "4"
      - name: "Speaker 5"
        tooltip: "Fifth speaker (if present)"
        key_value: "5"
      - name: "Unknown"
        tooltip: "Cannot determine the speaker"
        key_value: "0"

  - annotation_type: radio
    name: language
    description: "What language is spoken in this segment?"
    labels:
      - name: "Hindi"
        tooltip: "Speaker is using Hindi"
        key_value: "h"
      - name: "English"
        tooltip: "Speaker is using English"
        key_value: "e"
      - name: "Tamil"
        tooltip: "Speaker is using Tamil"
        key_value: "t"
      - name: "Telugu"
        tooltip: "Speaker is using Telugu"
        key_value: "l"
      - name: "Kannada"
        tooltip: "Speaker is using Kannada"
        key_value: "k"
      - name: "Code-switched"
        tooltip: "Speaker switches between languages within this segment"
        key_value: "c"
      - name: "Other"
        tooltip: "Language not listed above"
        key_value: "o"

  - annotation_type: radio
    name: audio_quality
    description: "Rate the audio quality for this segment"
    labels:
      - name: "Clear"
        tooltip: "Audio is clear and easy to understand"
      - name: "Acceptable"
        tooltip: "Some noise but speech is understandable"
      - name: "Poor"
        tooltip: "Significant noise or distortion"
      - name: "Unintelligible"
        tooltip: "Cannot understand the speech"

audio_display:
  show_waveform: true
  playback_controls: true
  allow_speed_control: true
  show_spectrogram: true

allow_all_users: true
instances_per_annotator: 30
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
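
The language scheme in the config binds each label to a one-letter shortcut through key_value, and Telugu takes "l" because "t" is already used by Tamil. A minimal sketch of a check for such collisions follows. The helper name find_duplicate_keys is hypothetical and not part of Potato, and the label list is copied by hand from the config rather than parsed from the YAML file. How Potato itself reacts to a duplicated shortcut is not stated in the source, so treat this only as a pre-flight sanity check.

```python
from collections import defaultdict

# Shortcut keys copied by hand from the `language` scheme in config.yaml.
LANGUAGE_LABELS = [
    {"name": "Hindi", "key_value": "h"},
    {"name": "English", "key_value": "e"},
    {"name": "Tamil", "key_value": "t"},
    {"name": "Telugu", "key_value": "l"},
    {"name": "Kannada", "key_value": "k"},
    {"name": "Code-switched", "key_value": "c"},
    {"name": "Other", "key_value": "o"},
]


def find_duplicate_keys(labels):
    """Return {key_value: [label names]} for shortcuts used more than once.

    Hypothetical helper for illustration; labels without a key_value
    (such as those in the audio_quality scheme) are ignored.
    """
    names_by_key = defaultdict(list)
    for label in labels:
        key = label.get("key_value")
        if key is not None:
            names_by_key[key].append(label["name"])
    return {key: names for key, names in names_by_key.items() if len(names) > 1}


duplicates = find_duplicate_keys(LANGUAGE_LABELS)
print(duplicates or "no duplicate shortcuts")
```

The same function can be run on the speaker_identity labels, which use the digits "0" to "5" as shortcuts.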

Sample Data: sample-data.json

[
  {
    "id": "displace_001",
    "audio_url": "https://example.com/audio/displace/conversation_001.wav",
    "duration": 45.2,
    "num_speakers": 3,
    "languages": [
      "Hindi",
      "English"
    ]
  },
  {
    "id": "displace_002",
    "audio_url": "https://example.com/audio/displace/conversation_002.wav",
    "duration": 62.8,
    "num_speakers": 2,
    "languages": [
      "Tamil",
      "English"
    ]
  }
]

// ... and 8 more items
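
The config names "id" as id_key and "audio_url" as text_key, so every item in sample-data.json needs both fields. The sketch below shows one way to check that before starting the server. It is an illustration under assumptions: check_items is a hypothetical helper, not a Potato function, and it runs on the two items shown above, embedded inline, rather than on the full file.

```python
import json

ID_KEY = "id"            # item_properties.id_key in config.yaml
TEXT_KEY = "audio_url"   # item_properties.text_key in config.yaml

# The two items shown in the sample data, embedded so the sketch is self-contained.
SAMPLE = json.loads("""
[
  {"id": "displace_001",
   "audio_url": "https://example.com/audio/displace/conversation_001.wav",
   "duration": 45.2, "num_speakers": 3, "languages": ["Hindi", "English"]},
  {"id": "displace_002",
   "audio_url": "https://example.com/audio/displace/conversation_002.wav",
   "duration": 62.8, "num_speakers": 2, "languages": ["Tamil", "English"]}
]
""")


def check_items(items, id_key=ID_KEY, text_key=TEXT_KEY):
    """Return a list of problems: missing required keys or repeated ids.

    Hypothetical helper for illustration only.
    """
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in (id_key, text_key):
            if not item.get(key):
                problems.append(f"item {index}: missing '{key}'")
        item_id = item.get(id_key)
        if item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        elif item_id:
            seen_ids.add(item_id)
    return problems


for problem in check_items(SAMPLE):
    print(problem)
```

To check the real file, replace SAMPLE with the result of json.load on an open handle to sample-data.json.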

Get This Design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/audio/displace-speaker-diarization
potato start config.yaml

Details

Annotation Types

radio, span

Domain

Speech Processing, Speaker Diarization

Use Cases

Speaker Identification, Language Identification, Conversation Analysis

Tags

audio, diarization, speaker, language-identification, multilingual, interspeech2024
