Quality Control

Attention checks, gold standards, and inter-annotator agreement metrics.

Potato provides a set of quality-control features for keeping annotations reliable: attention checks, gold standards, pre-annotation support, and real-time agreement metrics.

Four quality controls, one reliable dataset

Overview

Quality control in Potato consists of four main features:

Attention checks - verify annotator engagement using items with known answers
Gold standards - track accuracy against expert-labeled items
Pre-annotation support - pre-fill forms with model predictions
Agreement metrics - compute inter-annotator agreement in real time

Attention Checks

Attention checks are items with known correct answers that verify annotators are paying attention and not clicking at random.

Configuration

yaml

attention_checks:
  enabled: true
  items_file: "attention_checks.json"
 
  # How often to inject attention checks (set ONE of the two options)
  frequency: 10              # Insert one every 10 items
  # OR
  # probability: 0.1         # 10% chance per item
 
  # Optional: flag suspiciously fast responses
  min_response_time: 3.0     # Flag if answered in < 3 seconds
 
  # Failure handling
  failure_handling:
    warn_threshold: 2        # Show warning after 2 failures
    warn_message: "Please read items carefully before answering."
    block_threshold: 5       # Block user after 5 failures
    block_message: "You have been blocked due to too many incorrect responses."
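
As a rough illustration of how these settings could interact, the sketch below interleaves attention checks at a fixed `frequency`, maps a failure count to a warn/block status, and flags fast responses. It is not Potato's implementation: the function names `interleave_checks`, `failure_status`, and `too_fast` are hypothetical, and treating the thresholds as inclusive is an assumption made for the example.

```python
from itertools import cycle


def interleave_checks(items, checks, frequency):
    """Return a new list with one attention check after every `frequency` regular items."""
    if frequency < 1:
        raise ValueError("frequency must be >= 1")
    check_iter = cycle(checks)  # reuse checks when there are fewer checks than slots
    queue = []
    for index, item in enumerate(items, start=1):
        queue.append(item)
        if checks and index % frequency == 0:
            queue.append(next(check_iter))
    return queue


def failure_status(failures, warn_threshold=2, block_threshold=5):
    """Map a failure count to 'ok', 'warn', or 'block' (thresholds assumed inclusive)."""
    if failures >= block_threshold:
        return "block"
    if failures >= warn_threshold:
        return "warn"
    return "ok"


def too_fast(response_seconds, min_response_time=3.0):
    """True when a response arrived faster than min_response_time seconds."""
    return response_seconds < min_response_time
```

With `frequency=10` and 20 regular items, this sketch produces a queue of 22 entries, with checks in the 11th and 22nd positions.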

Attention Check Items File

json

[
  {
    "id": "attn_001",
    "text": "Please select 'Positive' for this item to verify you are reading carefully.",
    "expected_answer": {
      "sentiment": "positive"
    }
  }
]
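
To make the role of `expected_answer` concrete, here is a minimal sketch of comparing an annotator's response against it. The helper name `passes_attention_check` and the shape of the response (a dict mapping scheme name to the chosen label) are assumptions for illustration; how Potato stores responses internally is not described in this section.

```python
import json


def passes_attention_check(check_item, response):
    """True if every scheme listed in expected_answer matches the response."""
    expected = check_item["expected_answer"]
    return all(response.get(scheme) == label for scheme, label in expected.items())


# Same structure as the items file shown above.
checks = json.loads("""
[
  {
    "id": "attn_001",
    "text": "Please select 'Positive' for this item to verify you are reading carefully.",
    "expected_answer": {"sentiment": "positive"}
  }
]
""")

print(passes_attention_check(checks[0], {"sentiment": "positive"}))  # True
print(passes_attention_check(checks[0], {"sentiment": "neutral"}))   # False
```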

Gold Standards

Gold standards are expert-labeled items used to measure annotator accuracy. By default, gold standards are silent: results are recorded for admin review, but annotators see no feedback.

Configuration

yaml

gold_standards:
  enabled: true
  items_file: "gold_standards.json"
 
  # How to use gold standards
  mode: "mixed"              # Options: training, mixed, separate
  frequency: 20              # Insert one every 20 items
 
  # Accuracy requirements
  accuracy:
    min_threshold: 0.7       # Minimum required accuracy (70%)
    evaluation_count: 10     # Evaluate after this many gold items
 
  # Feedback settings (disabled by default)
  feedback:
    show_correct_answer: false
    show_explanation: false
 
  # Auto-promotion from high-agreement items
  auto_promote:
    enabled: true
    min_annotators: 3
    agreement_threshold: 1.0   # 1.0 = unanimous
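
The `accuracy` settings can be read as "wait for `evaluation_count` gold items, then compare accuracy with `min_threshold`". The sketch below spells out that reading. It is an interpretation of the comments in the config, not Potato's code; the function names are hypothetical and the inclusive comparison is an assumption.

```python
def gold_accuracy(results):
    """Fraction of gold items answered correctly; `results` is a list of booleans."""
    if not results:
        return None
    return sum(results) / len(results)


def meets_accuracy_requirement(results, min_threshold=0.7, evaluation_count=10):
    """Return None until enough gold items were seen, then True or False."""
    if len(results) < evaluation_count:
        return None  # not enough gold items answered yet
    return gold_accuracy(results) >= min_threshold
```

An annotator with 7 correct answers out of 10 gold items would just meet a `min_threshold` of 0.7 under this reading, while one with only 5 gold items answered would not be evaluated yet.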

Gold Standard Items File

json

[
  {
    "id": "gold_001",
    "text": "The service was absolutely terrible and I will never return.",
    "gold_label": {
      "sentiment": "negative"
    },
    "explanation": "Strong negative language clearly indicates negative sentiment.",
    "difficulty": "easy"
  }
]

Auto-Promotion

Items can become gold standards automatically when enough annotators agree on the same label:

yaml

gold_standards:
  auto_promote:
    enabled: true
    min_annotators: 3          # Wait for at least 3 annotators
    agreement_threshold: 1.0   # 100% must agree (unanimous)
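
One plausible reading of these two settings is sketched below: an item qualifies once at least `min_annotators` people have labeled it and the share choosing the most common label reaches `agreement_threshold`. Defining agreement as the majority share is an assumption, and `promotion_candidate` is a hypothetical helper, not a Potato function.

```python
from collections import Counter


def promotion_candidate(labels, min_annotators=3, agreement_threshold=1.0):
    """Return the majority label if the item qualifies for promotion, else None."""
    if len(labels) < min_annotators:
        return None  # still waiting for more annotators
    label, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= agreement_threshold:
        return label
    return None


print(promotion_candidate(["negative", "negative", "negative"]))  # negative
print(promotion_candidate(["negative", "negative", "neutral"]))   # None
```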

Pre-annotation Support

Pre-annotation pre-fills annotation forms with model predictions, which is useful for active-learning and correction workflows.

Configuration

yaml

pre_annotation:
  enabled: true
  field: "predictions"        # Field in data containing predictions
  allow_modification: true    # Can annotators change pre-filled values?
  show_confidence: true
  highlight_low_confidence: 0.7

Data Format

Include predictions in your data items:

json

{
  "id": "item_001",
  "text": "I love this product!",
  "predictions": {
    "sentiment": "positive",
    "confidence": 0.92
  }
}
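
The sketch below shows how such an item could be split into pre-filled labels and a low-confidence flag. It assumes that `highlight_low_confidence` is a cutoff below which a prediction gets highlighted, which this section does not state explicitly; the helper name `prefill` and its return shape are hypothetical.

```python
def prefill(item, field="predictions", low_confidence=0.7):
    """Split an item's predictions into pre-filled labels and a highlight flag."""
    predictions = dict(item.get(field, {}))  # copy so the item is not modified
    confidence = predictions.pop("confidence", None)
    return {
        "labels": predictions,
        "confidence": confidence,
        "highlight": confidence is not None and confidence < low_confidence,
    }


item = {
    "id": "item_001",
    "text": "I love this product!",
    "predictions": {"sentiment": "positive", "confidence": 0.92},
}
print(prefill(item))
```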

Agreement Metrics

Real-time inter-annotator agreement metrics, computed with Krippendorff's alpha, are available in the admin dashboard.

Configuration

yaml

agreement_metrics:
  enabled: true
  min_overlap: 2             # Minimum annotators per item
  auto_refresh: true
  refresh_interval: 60       # Seconds between updates
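
For readers who want to see what the metric measures, here is a self-contained sketch of Krippendorff's alpha for nominal labels, using the standard coincidence-matrix formulation. It only illustrates the statistic and the effect of `min_overlap`; the input format (item id mapped to a list of labels) is an assumption, and Potato's own computation may differ in detail.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units, min_overlap=2):
    """units: dict of item id -> list of labels, one per annotator."""
    min_overlap = max(min_overlap, 2)  # an item needs at least one pair of labels
    coincidences = Counter()
    for labels in units.values():
        m = len(labels)
        if m < min_overlap:
            continue  # item ignored, mirroring the min_overlap setting
        for a, b in permutations(labels, 2):  # ordered pairs of distinct annotators
            coincidences[(a, b)] += 1 / (m - 1)
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return None  # no item had enough annotators
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return None  # only one label was ever used; alpha is undefined
    return 1 - observed / expected
```

Returning `None` when no item reaches the overlap requirement corresponds loosely to the "No items with N+ annotators" situation covered in the troubleshooting notes.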

Interpreting Krippendorff's Alpha

Alpha value	Interpretation
α ≥ 0.8	Good agreement - reliable for most purposes
0.67 ≤ α < 0.8	Tentative agreement - draw tentative conclusions only
0.33 ≤ α < 0.67	Low agreement - review the guidelines
α < 0.33	Poor agreement - major problems
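
The bands in the table can be expressed as a small lookup. This sketch treats each band's lower bound as inclusive so that every value falls into exactly one band; the function name `interpret_alpha` and the returned strings are illustrative, not the dashboard's actual wording.

```python
def interpret_alpha(alpha):
    """Map a Krippendorff's alpha value to the interpretation bands in the table."""
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.67:
        return "tentative"
    if alpha >= 0.33:
        return "low"
    return "poor"


for value in (0.85, 0.8, 0.7, 0.5, 0.1):
    print(value, interpret_alpha(value))
```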

Admin Dashboard Integration

View quality-control metrics in the admin dashboard at /admin:

Attention checks: overall pass/fail rates, per-annotator statistics
Gold standards: per-annotator accuracy, per-item difficulty analysis
Agreement: Krippendorff's alpha per scheme, with interpretation
Auto-promoted items: list of items promoted because of high agreement

API Endpoints

Quality Control Metrics

http

GET /admin/api/quality_control

Returns statistics for attention checks and gold standards.

Agreement Metrics

http

GET /admin/api/agreement

Returns Krippendorff's alpha per scheme, with interpretation.
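
A minimal client sketch for the two endpoints, using only the Python standard library. Several things here are assumptions not confirmed by this section: the base URL `http://localhost:8000`, that the responses are JSON, and that any required authentication can be passed as request headers. The helper names `admin_url` and `fetch_metrics` are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to where your server runs


def admin_url(endpoint, base_url=BASE_URL):
    """Build the URL for an admin API endpoint such as 'agreement'."""
    return urljoin(base_url.rstrip("/") + "/", "admin/api/" + endpoint)


def fetch_metrics(endpoint, base_url=BASE_URL, headers=None, timeout=10):
    """GET an admin API endpoint and decode the body as JSON (assumed format)."""
    request = Request(admin_url(endpoint, base_url), headers=headers or {}, method="GET")
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_metrics("quality_control")` or `fetch_metrics("agreement")` against a running server would then return the decoded payload, whose exact fields are not documented here.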

Complete Example

yaml

annotation_task_name: "Sentiment Analysis with Quality Control"
 
annotation_schemes:
  - name: sentiment
    annotation_type: radio
    labels: [positive, negative, neutral]
    description: "Select the sentiment of the text"
 
attention_checks:
  enabled: true
  items_file: "data/attention_checks.json"
  frequency: 15
  failure_handling:
    warn_threshold: 2
    block_threshold: 5
 
gold_standards:
  enabled: true
  items_file: "data/gold_standards.json"
  mode: mixed
  frequency: 25
  accuracy:
    min_threshold: 0.7
    evaluation_count: 5
 
agreement_metrics:
  enabled: true
  min_overlap: 2
  refresh_interval: 60

Troubleshooting

Attention checks do not appear

Check that the items_file path is correct (relative to the task directory)
Check that items contain the required fields (id, expected_answer)
Make sure frequency or probability is set
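
The three checks above can be automated with a short script. The sketch below takes the `attention_checks` section as an already-parsed dict plus the task directory; `diagnose_attention_checks` is a hypothetical helper written for this page, not part of Potato.

```python
import json
from pathlib import Path


def diagnose_attention_checks(config, task_dir):
    """Return a list of human-readable problems found in an attention_checks config."""
    problems = []
    if "frequency" not in config and "probability" not in config:
        problems.append("neither frequency nor probability is set")
    # items_file is resolved relative to the task directory
    items_path = Path(task_dir) / config.get("items_file", "")
    if not items_path.is_file():
        problems.append(f"items_file not found: {items_path}")
        return problems
    items = json.loads(items_path.read_text(encoding="utf-8"))
    for position, item in enumerate(items):
        missing = [key for key in ("id", "expected_answer") if key not in item]
        if missing:
            problems.append(f"item {position} is missing: {', '.join(missing)}")
    return problems
```

An empty list means none of the three listed causes applies, so the problem probably lies elsewhere.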

Agreement metrics show "No items with N+ annotators"

Make sure items have been annotated by multiple users
Lower min_overlap if needed
Check that annotations are being saved correctly

Further Reading

Training Phase - qualifying annotators
Admin Dashboard - monitoring metrics
Task Assignment - controlling annotation distribution

For more implementation details, see the source documentation.