音频事件检测用于识别录音中的特定声音——从语音和音乐到环境声和声学事件。本教程涵盖了基于时间戳的标注方法，用于训练声音识别模型。

音频事件标注类型

片段级标记：标注整个音频片段
时间检测：标记事件的开始/结束时间
强标注：每个事件的精确时间戳
弱标注：仅标记有/无，不含时间戳

片段级声音标记

对于包含单个事件的短片段：

yaml

annotation_task_name: "Sound Event Classification"
 
data_files:
  - data/audio_clips.json
 
item_properties:
  audio_path: audio_path
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#10B981"
    progress_color: "#34D399"
    name: sound_class
    description: "What sound is in this clip?"
    labels:
      - Dog bark
      - Car horn
      - Siren
      - Music
      - Speech
      - Footsteps
      - Door knock
      - Glass breaking
      - Gunshot
      - Baby cry
      - Other
      - Silence/noise only

时间维度声音事件检测

标记事件发生的时间：

yaml

annotation_task_name: "Sound Event Detection"
 
data_files:
  - data/recordings.json
 
item_properties:
  audio_path: audio_path
 
annotation_schemes:
  - annotation_type: audio_annotation
    audio_display: waveform
    height: 150
    waveform_color: "#6366F1"
    progress_color: "#A5B4FC"
    show_timestamps: true
    enable_regions: true
    speed_control: true
    name: events
    description: "Mark all sound events with timestamps"
    labels:
      - name: speech
        color: "#3B82F6"
      - name: music
        color: "#8B5CF6"
      - name: vehicle
        color: "#EF4444"
      - name: animal
        color: "#F59E0B"
      - name: nature
        color: "#10B981"
      - name: mechanical
        color: "#6B7280"
    allow_overlap: true
    min_duration: 0.1

完整音频事件配置

yaml

annotation_task_name: "AudioSet-Style Event Detection"
 
data_files:
  - data/audio_10sec.json
 
item_properties:
  audio_path: audio_url
 
annotation_schemes:
  # Temporal event marking with audio playback
  - annotation_type: audio_annotation
    audio_display: waveform
    waveform_color: "#059669"
    progress_color: "#34D399"
    cursor_color: "#F59E0B"
    height: 128
    show_timestamps: true
    time_format: "ss.ms"
    show_duration: true
    speed_control: true
    speed_options: [0.5, 0.75, 1.0, 1.5]
    enable_regions: true
    region_snap: 0.05
    name: sound_events
    description: "Mark all distinct sound events"
    labels:
      # Human sounds
      - name: Speech
        color: "#3B82F6"
        keyboard_shortcut: "1"
        category: human
      - name: Singing
        color: "#8B5CF6"
        keyboard_shortcut: "2"
        category: human
      - name: Laughter
        color: "#EC4899"
        category: human
      - name: Cough/Sneeze
        color: "#F472B6"
        category: human
 
      # Music
      - name: Music
        color: "#A855F7"
        keyboard_shortcut: "m"
        category: music
      - name: Musical instrument
        color: "#7C3AED"
        category: music
 
      # Animals
      - name: Dog
        color: "#F59E0B"
        keyboard_shortcut: "d"
        category: animal
      - name: Cat
        color: "#FBBF24"
        category: animal
      - name: Bird
        color: "#FCD34D"
        category: animal
 
      # Vehicles
      - name: Car
        color: "#EF4444"
        keyboard_shortcut: "c"
        category: vehicle
      - name: Motorcycle
        color: "#DC2626"
        category: vehicle
      - name: Siren
        color: "#B91C1C"
        category: vehicle
      - name: Aircraft
        color: "#991B1B"
        category: vehicle
 
      # Environment
      - name: Rain
        color: "#06B6D4"
        category: nature
      - name: Thunder
        color: "#0891B2"
        category: nature
      - name: Wind
        color: "#0E7490"
        category: nature
      - name: Water
        color: "#0D9488"
        category: nature
 
      # Domestic
      - name: Door
        color: "#84CC16"
        category: domestic
      - name: Alarm
        color: "#65A30D"
        category: domestic
      - name: Appliance
        color: "#4D7C0F"
        category: domestic
 
      # Other
      - name: Noise/Unknown
        color: "#6B7280"
        keyboard_shortcut: "n"
        category: other
 
    allow_overlap: true
    min_duration: 0.1
    show_labels_on_waveform: true
 
    # Segment attributes
    segment_attributes:
      - name: confidence
        type: radio
        options: [Clear, Moderate, Faint]
      - name: foreground
        type: checkbox
        description: "Is this the main/foreground sound?"
 
  # Clip-level tags (weak labels)
  - annotation_type: multiselect
    name: clip_tags
    description: "What sounds are present anywhere in this clip?"
    labels:
      - Speech
      - Music
      - Vehicle sounds
      - Animal sounds
      - Nature sounds
      - Domestic sounds
      - Silence
    min_selections: 1
 
  # Audio quality
  - annotation_type: radio
    name: quality
    description: "Recording quality"
    labels:
      - Clean (clear sounds)
      - Moderate noise
      - Very noisy
      - Distorted/clipped
 
annotation_guidelines:
  title: "Sound Event Detection Guide"
  content: |
    ## Your Task
    Mark the START and END times of each distinct sound event.
 
    ## Event Detection Rules
    - Mark sounds that are clearly audible
    - Include overlapping sounds (use multiple labels)
    - Short sounds (<100ms) may be a single point
 
    ## Segment Boundaries
    - Start: When sound becomes audible
    - End: When sound fades or stops
 
    ## Confidence Levels
    - Clear: Easily identifiable
    - Moderate: Reasonably sure
    - Faint: Background, hard to identify
 
    ## Foreground vs Background
    - Foreground: Main focus of audio
    - Background: Ambient sounds

输出格式

json

{
  "id": "clip_001",
  "audio_url": "/audio/street_scene.wav",
  "duration": 10.0,
  "annotations": {
    "sound_events": [
      {
        "label": "Speech",
        "start": 0.5,
        "end": 3.2,
        "attributes": {
          "confidence": "Clear",
          "foreground": true
        }
      },
      {
        "label": "Car",
        "start": 1.8,
        "end": 4.5,
        "attributes": {
          "confidence": "Moderate",
          "foreground": false
        }
      },
      {
        "label": "Dog",
        "start": 6.1,
        "end": 6.8,
        "attributes": {
          "confidence": "Clear",
          "foreground": true
        }
      }
    ],
    "clip_tags": ["Speech", "Vehicle sounds", "Animal sounds"],
    "quality": "Moderate noise"
  }
}

使用检测器进行预标注

使用模型预测作为起点：

yaml

pre_annotation:
  enabled: true
  field: detected_events
  show_confidence: true
  confidence_threshold: 0.3
  allow_modification: true

音频事件标注技巧

好的耳机：检测细微声音的必备工具
安静的环境：背景噪音会影响感知
多次聆听：第一遍识别，第二遍精确时间戳
慢速播放：使用0.5倍速精确定位边界
一致的标准：清晰定义"可听见"的阈值

下一步

添加音乐分类用于音乐内容
学习说话人分割用于语音
设置质量控制用于事件检测

完整音频文档请参阅 /docs/features/audio-annotation。