Audio Annotation

Segment audio files and assign labels to time regions with waveform visualization.

Potato's audio annotation tool lets annotators segment audio files and assign labels to time regions through a waveform-based interface.

Features

Waveform visualization
Time-based segment creation
Label assignment for segments
Playback controls with variable speed
Zoom and scroll navigation
Keyboard shortcuts
Server-side waveform caching

Basic Configuration

yaml

annotation_schemes:
  - name: "speakers"
    description: "Mark when each speaker is talking"
    annotation_type: "audio_annotation"
    labels:
      - name: "Speaker 1"
        color: "#3B82F6"
      - name: "Speaker 2"
        color: "#10B981"

Configuration Options

Field	Type	Default	Description
`name`	string	Required	Unique identifier for the annotation scheme
`description`	string	Required	Instructions shown to annotators
`annotation_type`	string	Required	Must be `"audio_annotation"`
`mode`	string	`"label"`	Annotation mode: `"label"`, `"questions"`, or `"both"`
`labels`	list	Conditional	Required when `mode` is `label` or `both`
`segment_schemes`	list	Conditional	Required when `mode` is `questions` or `both`
`min_segments`	integer	0	Minimum number of segments required
`max_segments`	integer	null	Maximum number of segments allowed (null = unlimited)
`zoom_enabled`	boolean	true	Enable zoom controls
`playback_rate_control`	boolean	false	Show the playback speed selector
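
The required and conditional rules in the table can be checked before launching a task. The following is a minimal sketch, not part of Potato itself: `validate_audio_scheme` is a hypothetical helper, and it assumes the scheme has already been loaded from YAML into a plain Python dict.

```python
from typing import Any

VALID_MODES = {"label", "questions", "both"}


def validate_audio_scheme(scheme: dict[str, Any]) -> list[str]:
    """Return the problems found in one audio_annotation scheme (empty if none).

    Hypothetical helper: it only encodes the rules stated in the options table.
    """
    problems = []

    # name, description and annotation_type are listed as required.
    for field in ("name", "description", "annotation_type"):
        if not scheme.get(field):
            problems.append(f"missing required field: {field}")
    if scheme.get("annotation_type") not in (None, "audio_annotation"):
        problems.append('annotation_type must be "audio_annotation"')

    # mode defaults to "label" when omitted.
    mode = scheme.get("mode", "label")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")

    # labels / segment_schemes are conditionally required, depending on mode.
    if mode in ("label", "both") and not scheme.get("labels"):
        problems.append(f"labels is required for mode {mode!r}")
    if mode in ("questions", "both") and not scheme.get("segment_schemes"):
        problems.append(f"segment_schemes is required for mode {mode!r}")

    # min_segments defaults to 0; max_segments defaults to None (unlimited).
    min_seg = scheme.get("min_segments", 0)
    max_seg = scheme.get("max_segments")
    if min_seg < 0:
        problems.append("min_segments must be >= 0")
    if max_seg is not None and max_seg < min_seg:
        problems.append("max_segments must be >= min_segments")

    return problems
```

Returning a list of messages rather than raising on the first error lets a config author see every problem in one pass.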

Label Configuration

yaml

labels:
  - name: "speech"
    color: "#3B82F6"
    key_value: "1"
  - name: "music"
    color: "#10B981"
    key_value: "2"
  - name: "silence"
    color: "#64748B"
    key_value: "3"
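
This page does not spell out what `key_value` does. Assuming it binds a label to the number key used by the `1-9` shortcut, two labels sharing a key would be ambiguous. The sketch below uses a hypothetical helper, `build_key_map`, to show one way to catch that in a loaded label list.

```python
def build_key_map(labels: list[dict]) -> dict[str, str]:
    """Map each key_value to its label name, rejecting duplicate keys.

    Hypothetical helper; assumes key_value is the keyboard shortcut for a label.
    """
    key_map: dict[str, str] = {}
    for label in labels:
        key = label.get("key_value")
        if key is None:
            continue  # labels without a key_value get no entry in the map
        key = str(key)  # YAML parses an unquoted 1 as an int, so normalize
        if key in key_map:
            raise ValueError(
                f"key_value {key!r} is used by both "
                f"{key_map[key]!r} and {label['name']!r}"
            )
        key_map[key] = label["name"]
    return key_map


labels = [
    {"name": "speech", "color": "#3B82F6", "key_value": "1"},
    {"name": "music", "color": "#10B981", "key_value": "2"},
    {"name": "silence", "color": "#64748B", "key_value": "3"},
]
print(build_key_map(labels))  # {'1': 'speech', '2': 'music', '3': 'silence'}
```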

Annotation Modes

Label Mode (Default)

Each segment receives a categorical label:

yaml

annotation_schemes:
  - name: "emotion"
    description: "Label the emotion in each segment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "happy"
        color: "#22C55E"
      - name: "sad"
        color: "#3B82F6"
      - name: "angry"
        color: "#EF4444"
      - name: "neutral"
        color: "#64748B"

Questions Mode

Annotators answer custom questions for each segment:

yaml

annotation_schemes:
  - name: "transcription"
    description: "Transcribe each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter the transcription"
      - name: "confidence"
        annotation_type: "likert"
        description: "How confident are you?"
        size: 5

Both Mode

Combines labeling with per-segment questions:

yaml

annotation_schemes:
  - name: "detailed_diarization"
    description: "Label speakers and add notes"
    annotation_type: "audio_annotation"
    mode: "both"
    labels:
      - name: "Speaker A"
        color: "#3B82F6"
      - name: "Speaker B"
        color: "#10B981"
    segment_schemes:
      - name: "notes"
        annotation_type: "text"
        description: "Any notes about this segment?"

Global Audio Settings

Configure waveform processing in the configuration file:

yaml

audio_annotation:
  waveform_cache_dir: "waveform_cache/"
  waveform_look_ahead: 5
  waveform_cache_max_size: 1000
  client_fallback_max_duration: 1800

Field	Description
`waveform_cache_dir`	Directory for cached waveform data
`waveform_look_ahead`	Number of upcoming items to pre-compute
`waveform_cache_max_size`	Maximum number of cached waveform files
`client_fallback_max_duration`	Maximum duration, in seconds, for browser-side waveform generation (default: 1800)

Examples

Speaker Diarization

yaml

annotation_schemes:
  - name: "diarization"
    description: "Identify who is speaking at each moment"
    annotation_type: "audio_annotation"
    mode: "label"
    labels:
      - name: "Interviewer"
        color: "#8B5CF6"
        key_value: "1"
      - name: "Guest"
        color: "#EC4899"
        key_value: "2"
      - name: "Overlap"
        color: "#F59E0B"
        key_value: "3"
    zoom_enabled: true
    playback_rate_control: true

Sound Event Detection

yaml

annotation_schemes:
  - name: "sound_events"
    description: "Mark all sound events"
    annotation_type: "audio_annotation"
    labels:
      - name: "speech"
        color: "#3B82F6"
      - name: "music"
        color: "#10B981"
      - name: "applause"
        color: "#F59E0B"
      - name: "laughter"
        color: "#EC4899"
      - name: "silence"
        color: "#64748B"
    min_segments: 1

Transcription Review

yaml

annotation_schemes:
  - name: "transcription_review"
    description: "Review and correct the transcription for each segment"
    annotation_type: "audio_annotation"
    mode: "questions"
    segment_schemes:
      - name: "transcript"
        annotation_type: "text"
        description: "Enter or correct the transcription"
        textarea: true
      - name: "quality"
        annotation_type: "radio"
        description: "Audio quality"
        labels:
          - "Clear"
          - "Noisy"
          - "Unintelligible"

Keyboard Shortcuts

Key	Action
`Space`	Play/pause
`←` / `→`	Seek backward/forward
`[`	Set segment start
`]`	Set segment end
`Enter`	Create segment
`Delete`	Delete selected segment
`1-9`	Select label
`+` / `-`	Zoom in/out
`0`	Fit to view

Data Format

Input Data

Your data file should contain the path or URL of each audio file:

json

[
  {
    "id": "audio_001",
    "audio_url": "https://example.com/audio/recording1.mp3"
  },
  {
    "id": "audio_002",
    "audio_url": "/data/audio/recording2.wav"
  }
]

Point the configuration at the audio field:

yaml

item_properties:
  id_key: id
  text_key: audio_url
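
For a folder of recordings, the input file can be generated instead of written by hand. This is a rough sketch under stated assumptions: `build_items` and `write_data_file` are hypothetical helpers, the `id` is taken from the file name without its extension, and the extension filter mirrors the supported formats listed on this page.

```python
import json
from pathlib import Path

# Extensions taken from the supported-formats list in this document.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a"}


def build_items(audio_dir: str) -> list[dict]:
    """Create one {"id", "audio_url"} record per supported audio file."""
    items = []
    for path in sorted(Path(audio_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            # Assumption: the file stem is unique enough to serve as the id.
            items.append({"id": path.stem, "audio_url": str(path)})
    return items


def write_data_file(audio_dir: str, out_path: str) -> int:
    """Write the records as a JSON array and return how many were written."""
    items = build_items(audio_dir)
    Path(out_path).write_text(json.dumps(items, indent=2), encoding="utf-8")
    return len(items)
```

Two files that differ only by extension (for example `a.mp3` and `a.wav`) would get the same `id` here, so a real script should check for collisions first.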

Output Format

json

{
  "id": "audio_001",
  "annotations": {
    "diarization": [
      {
        "start": 0.0,
        "end": 5.5,
        "label": "Interviewer"
      },
      {
        "start": 5.5,
        "end": 12.3,
        "label": "Guest"
      },
      {
        "start": 12.3,
        "end": 14.0,
        "label": "Overlap"
      }
    ]
  }
}
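
Because each segment carries `start`, `end`, and `label`, downstream statistics are straightforward. As an illustration, the sketch below totals the labeled time per label for one record shaped like the example above; `seconds_per_label` is a hypothetical helper, not a Potato function.

```python
from collections import defaultdict


def seconds_per_label(record: dict, scheme_name: str) -> dict[str, float]:
    """Sum segment durations (end - start) per label in one output record."""
    totals: dict[str, float] = defaultdict(float)
    for segment in record["annotations"].get(scheme_name, []):
        totals[segment["label"]] += segment["end"] - segment["start"]
    return dict(totals)


record = {
    "id": "audio_001",
    "annotations": {
        "diarization": [
            {"start": 0.0, "end": 5.5, "label": "Interviewer"},
            {"start": 5.5, "end": 12.3, "label": "Guest"},
            {"start": 12.3, "end": 14.0, "label": "Overlap"},
        ]
    },
}

for label, seconds in seconds_per_label(record, "diarization").items():
    print(f"{label}: {seconds:.1f}s")
# Interviewer: 5.5s
# Guest: 6.8s
# Overlap: 1.7s
```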

In questions mode, each segment also contains the responses to its questions:

json

{
  "start": 0.0,
  "end": 5.5,
  "transcript": "Hello and welcome to the show.",
  "quality": "Clear"
}
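
Segments in this shape can be stitched into a readable transcript. The following sketch assumes the segment scheme is named `transcript`, as in the example above; `format_timestamp` and `build_transcript` are hypothetical helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.s, e.g. 65.5 -> '01:05.5'."""
    tenths = round(seconds * 10)  # work in integer tenths to avoid '60.0' seconds
    minutes, rest = divmod(tenths, 600)
    return f"{minutes:02d}:{rest // 10:02d}.{rest % 10}"


def build_transcript(segments: list[dict]) -> str:
    """Join per-segment transcripts in time order, one line per segment."""
    lines = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        text = seg.get("transcript", "").strip()
        if text:  # skip segments that have no transcription
            span = f"{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}"
            lines.append(f"[{span}] {text}")
    return "\n".join(lines)


segments = [
    {
        "start": 0.0,
        "end": 5.5,
        "transcript": "Hello and welcome to the show.",
        "quality": "Clear",
    }
]
print(build_transcript(segments))
# [00:00.0 - 00:05.5] Hello and welcome to the show.
```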

Supported Audio Formats

MP3 (recommended)
WAV
OGG
M4A

Best Practices

Pre-cache waveforms - use server-side caching for large datasets
Enable playback rate control - variable speed helps with precise segmentation
Use keyboard shortcuts - much faster than clicking
Define clear boundaries - specify what counts as the start and end of a segment
Choose the appropriate mode - use "label" for classification and "questions" for detailed annotation
Set segment limits - use min_segments to ensure coverage
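
The boundary and segment-limit advice above can also be spot-checked on exported annotations. The sketch below is an assumed post-hoc quality check, not something Potato is known to run: `check_segments` is a hypothetical helper, and treating overlapping segments as a problem is a project-specific choice that may not fit every guideline.

```python
from typing import Optional


def check_segments(
    segments: list[dict],
    min_segments: int = 0,
    max_segments: Optional[int] = None,
) -> list[str]:
    """Return human-readable problems for one item's segments (empty if none)."""
    problems = []

    # Segment-count limits, mirroring min_segments / max_segments.
    if len(segments) < min_segments:
        problems.append(
            f"expected at least {min_segments} segment(s), found {len(segments)}"
        )
    if max_segments is not None and len(segments) > max_segments:
        problems.append(
            f"expected at most {max_segments} segment(s), found {len(segments)}"
        )

    ordered = sorted(segments, key=lambda s: (s["start"], s["end"]))
    latest_end = None  # furthest end time seen so far
    for seg in ordered:
        if seg["end"] <= seg["start"]:
            problems.append(f"segment at {seg['start']} has no positive duration")
        # Assumption: overlaps are unwanted; touching boundaries are fine.
        if latest_end is not None and seg["start"] < latest_end:
            problems.append(f"segment at {seg['start']} overlaps an earlier segment")
        latest_end = seg["end"] if latest_end is None else max(latest_end, seg["end"])

    return problems
```

Tracking the furthest end time, rather than comparing only neighbouring segments, also catches a short segment that sits inside a much longer earlier one.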