Quality Control

Attention checks, gold standards, and inter-annotator agreement metrics.

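Potato provides quality control features to help ensure high-quality annotations: attention checks, gold standards, pre-annotation support, and real-time agreement metrics.

Overview

Quality control in Potato consists of four key features:

  1. Attention checks - verify annotator engagement using items with known answers
  2. Gold standards - track accuracy against expert-labeled items
  3. Pre-annotation support - pre-fill forms with model predictions
  4. Agreement metrics - compute inter-annotator agreement in real time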

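Attention Checks

Attention checks are items with known correct answers, used to verify that annotators are working carefully rather than clicking at random.

Configuration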

yaml
attention_checks:
  enabled: true
  items_file: "attention_checks.json"
 
  # How often to inject attention checks
  frequency: 10              # Insert one every 10 items
  # OR
  probability: 0.1           # 10% chance per item
 
  # Optional: flag suspiciously fast responses
  min_response_time: 3.0     # Flag if answered in < 3 seconds
 
  # Failure handling
  failure_handling:
    warn_threshold: 2        # Show warning after 2 failures
    warn_message: "Please read items carefully before answering."
    block_threshold: 5       # Block user after 5 failures
    block_message: "You have been blocked due to too many incorrect responses."
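
To make the injection and failure-handling options concrete, here is a minimal Python sketch of how settings like these could behave. It is not Potato's implementation: the function names `should_inject` and `failure_action` are hypothetical, and the exact semantics (a check after every N served items, and thresholds that trigger once the failure count reaches them) are assumptions.

```python
import random


def should_inject(items_served, frequency=None, probability=None, rng=random):
    """Decide whether the next item should be an attention check.

    Assumption: `frequency` means one check after every N served items,
    and `probability` means an independent chance for each item.
    """
    if frequency is not None:
        return items_served > 0 and items_served % frequency == 0
    if probability is not None:
        return rng.random() < probability
    return False  # neither option set: no checks are injected


def failure_action(failures, warn_threshold=2, block_threshold=5):
    """Map a failure count to 'ok', 'warn', or 'block' (assumed >= semantics)."""
    if failures >= block_threshold:
        return "block"
    if failures >= warn_threshold:
        return "warn"
    return "ok"
```

In this sketch `frequency` takes precedence when both options are present; how Potato resolves that case is not stated here, so setting only one of the two is the safer choice.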

Attention Check Items File

json
[
  {
    "id": "attn_001",
    "text": "Please select 'Positive' for this item to verify you are reading carefully.",
    "expected_answer": {
      "sentiment": "positive"
    }
  }
]
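
Before starting a task, it can help to confirm that every entry in the items file carries the `id` and `expected_answer` fields this page names as required. The following is a small standalone sketch; `find_invalid_items` is a hypothetical helper, not part of Potato.

```python
import json

# Fields this page names as required for attention check items.
REQUIRED_FIELDS = ("id", "expected_answer")


def find_invalid_items(items):
    """Return (index, missing_fields) for every item lacking a required field."""
    problems = []
    for index, item in enumerate(items):
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            problems.append((index, missing))
    return problems


# Inline sample data; in practice the list would come from json.load() on the file.
sample = json.loads("""
[
  {"id": "attn_001", "text": "Please select 'Positive'.",
   "expected_answer": {"sentiment": "positive"}},
  {"id": "attn_002", "text": "This entry is missing its answer key."}
]
""")
print(find_invalid_items(sample))
```

An empty result means every entry has both fields; any tuple in the result identifies the position of an entry to fix.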

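Gold Standards

Gold standards are expert-labeled items used to measure annotator accuracy. By default, gold standards are silent: results are recorded for administrator review, but annotators see no feedback.

Configuration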

yaml
gold_standards:
  enabled: true
  items_file: "gold_standards.json"
 
  # How to use gold standards
  mode: "mixed"              # Options: training, mixed, separate
  frequency: 20              # Insert one every 20 items
 
  # Accuracy requirements
  accuracy:
    min_threshold: 0.7       # Minimum required accuracy (70%)
    evaluation_count: 10     # Evaluate after this many gold items
 
  # Feedback settings (disabled by default)
  feedback:
    show_correct_answer: false
    show_explanation: false
 
  # Auto-promotion from high-agreement items
  auto_promote:
    enabled: true
    min_annotators: 3
    agreement_threshold: 1.0   # 1.0 = unanimous

Gold Standard Items File

json
[
  {
    "id": "gold_001",
    "text": "The service was absolutely terrible and I will never return.",
    "gold_label": {
      "sentiment": "negative"
    },
    "explanation": "Strong negative language clearly indicates negative sentiment.",
    "difficulty": "easy"
  }
]
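
The `min_threshold` and `evaluation_count` settings suggest a simple rule: once an annotator has answered enough gold items, compare their accuracy to the threshold. The sketch below illustrates that reading. It is an assumption about the behavior, not Potato's code, and `gold_accuracy` and `meets_requirement` are hypothetical names. Responses are assumed to be a mapping from item id to the submitted label dictionary.

```python
def gold_accuracy(responses, gold_items):
    """Return (accuracy, answered_count) over the gold items an annotator answered."""
    gold_by_id = {item["id"]: item["gold_label"] for item in gold_items}
    scored = [
        (item_id, answer)
        for item_id, answer in responses.items()
        if item_id in gold_by_id  # ignore responses to non-gold items
    ]
    if not scored:
        return None, 0
    correct = sum(1 for item_id, answer in scored if answer == gold_by_id[item_id])
    return correct / len(scored), len(scored)


def meets_requirement(responses, gold_items, min_threshold=0.7, evaluation_count=10):
    """Return None until enough gold items are answered, then True or False."""
    accuracy, answered = gold_accuracy(responses, gold_items)
    if answered < evaluation_count:
        return None  # too early to evaluate
    return accuracy >= min_threshold
```

Returning `None` before the evaluation point keeps "not yet evaluated" distinct from "failed", which matters if the result drives any automatic action.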

Auto-Promotion

When enough annotators agree on an item, it can automatically become a gold standard:

yaml
gold_standards:
  auto_promote:
    enabled: true
    min_annotators: 3          # Wait for at least 3 annotators
    agreement_threshold: 1.0   # 100% must agree (unanimous)
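
One plausible reading of these two settings is: wait until at least `min_annotators` people have labeled an item, then promote it if the share of annotators choosing the most common label reaches `agreement_threshold`. The sketch below encodes that reading. It is an illustration under that assumption, and `promotion_candidate` is a hypothetical name.

```python
from collections import Counter


def promotion_candidate(labels, min_annotators=3, agreement_threshold=1.0):
    """Return the label to promote as gold, or None if the item does not qualify."""
    if len(labels) < min_annotators:
        return None  # not enough annotators yet
    label, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= agreement_threshold:
        return label
    return None
```

With the default threshold of 1.0, a single dissenting annotator is enough to prevent promotion.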

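Pre-Annotation Support

Pre-annotation lets you pre-fill annotation forms with model predictions, which is useful for active learning and correction workflows.

Configuration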

yaml
pre_annotation:
  enabled: true
  field: "predictions"        # Field in data containing predictions
  allow_modification: true    # Can annotators change pre-filled values?
  show_confidence: true
  highlight_low_confidence: 0.7

Data Format

Include the predictions in each data item:

json
{
  "id": "item_001",
  "text": "I love this product!",
  "predictions": {
    "sentiment": "positive",
    "confidence": 0.92
  }
}
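
As a rough illustration of how a data item in this shape could map onto a pre-filled form, the sketch below separates the predicted values from the confidence score and flags low-confidence predictions. It assumes that `highlight_low_confidence` is a cutoff below which a prediction is highlighted; `prefill` is a hypothetical helper, not a Potato function.

```python
def prefill(item, field="predictions", low_confidence=0.7):
    """Split an item's predictions into form values and a highlight flag."""
    predictions = dict(item.get(field, {}))  # copy so the item is not mutated
    confidence = predictions.pop("confidence", None)
    return {
        "values": predictions,
        "confidence": confidence,
        # Assumed rule: highlight when confidence is below the cutoff.
        "highlight": confidence is not None and confidence < low_confidence,
    }


item = {
    "id": "item_001",
    "text": "I love this product!",
    "predictions": {"sentiment": "positive", "confidence": 0.92},
}
print(prefill(item))
```

An item without a `predictions` field yields empty values and no highlight, so annotators would simply see a blank form.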

Agreement Metrics

The admin dashboard provides real-time inter-annotator agreement metrics based on Krippendorff's alpha.

Configuration

yaml
agreement_metrics:
  enabled: true
  min_overlap: 2             # Minimum annotators per item
  auto_refresh: true
  refresh_interval: 60       # Seconds between updates
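
For intuition about what the dashboard figure measures, Krippendorff's alpha for nominal labels can be computed in a few lines of Python. This is a minimal sketch of the standard coincidence-matrix formulation, not Potato's own code. The function name and input shape are hypothetical, and `min_overlap` is assumed to mean that items with fewer annotators are ignored.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units, min_overlap=2):
    """Compute nominal Krippendorff's alpha.

    `units` maps an item id to the list of labels it received, one per annotator.
    Returns None when alpha is undefined (too little data or only one label used).
    """
    coincidences = Counter()
    for labels in units.values():
        m = len(labels)
        if m < max(min_overlap, 2):
            continue  # items need at least two labels to form pairs
        # Every ordered pair of labels within an item adds 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return None

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return None  # only one label was ever used
    return 1 - observed / expected
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance. Raising `min_overlap` trades coverage for estimates based on more annotators per item.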

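Interpreting Krippendorff's Alpha

Alpha value        Interpretation
α ≥ 0.8            Good agreement - reliable for most purposes
0.67 ≤ α < 0.8     Tentative agreement - tentative conclusions can be drawn
0.33 ≤ α < 0.67    Low agreement - review the guidelines
α < 0.33           Poor agreement - significant problems exist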
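
The bands above translate directly into a lookup function, which can be handy when post-processing exported metrics. This is a small sketch with a hypothetical function name; it assumes each band's lower bound is inclusive.

```python
def interpret_alpha(alpha):
    """Map a Krippendorff's alpha value to the interpretation bands above."""
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.67:
        return "tentative"
    if alpha >= 0.33:
        return "low"
    return "poor"
```

Alpha can be negative when annotators disagree more than chance would predict; this sketch reports such values as "poor".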

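Admin Dashboard Integration

View quality control metrics in the admin dashboard at /admin:

  • Attention checks: overall pass/fail rates, with per-annotator statistics
  • Gold standards: per-annotator accuracy, with analysis by item difficulty
  • Agreement: Krippendorff's alpha per annotation schema, with interpretation
  • Auto-promoted items: list of items promoted because of high agreement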

API Endpoints

Quality Control Metrics

http
GET /admin/api/quality_control

Returns attention check and gold standard statistics.

Agreement Metrics

http
GET /admin/api/agreement

Returns Krippendorff's alpha per annotation schema, with interpretation.
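
Both endpoints can be polled from a script using only Python's standard library. The sketch below makes several assumptions that this page does not confirm: the server address is a placeholder you supply, the responses are JSON, and no extra authentication headers are needed (an admin session or key may well be required in practice). The helper names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Endpoint paths as listed on this page.
ENDPOINTS = {
    "quality_control": "/admin/api/quality_control",
    "agreement": "/admin/api/agreement",
}


def endpoint_url(base_url, name):
    """Build the full URL for one of the admin API endpoints."""
    return urljoin(base_url, ENDPOINTS[name])


def fetch_metrics(base_url, name, timeout=10):
    """GET an endpoint and parse the body, assuming it is JSON."""
    with urlopen(endpoint_url(base_url, name), timeout=timeout) as response:
        return json.load(response)
```

Against a running server, `fetch_metrics("http://localhost:8000", "agreement")` would return the parsed payload, where the host and port are placeholders for your own deployment. The structure of that payload is not documented here, so inspect it before relying on specific keys.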

Complete Example

yaml
annotation_task_name: "Sentiment Analysis with Quality Control"
 
annotation_schemes:
  - name: sentiment
    annotation_type: radio
    labels: [positive, negative, neutral]
    description: "Select the sentiment of the text"
 
attention_checks:
  enabled: true
  items_file: "data/attention_checks.json"
  frequency: 15
  failure_handling:
    warn_threshold: 2
    block_threshold: 5
 
gold_standards:
  enabled: true
  items_file: "data/gold_standards.json"
  mode: mixed
  frequency: 25
  accuracy:
    min_threshold: 0.7
    evaluation_count: 5
 
agreement_metrics:
  enabled: true
  min_overlap: 2
  refresh_interval: 60

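Troubleshooting

Attention checks not appearing

  1. Verify that the items_file path is correct (relative to the task directory)
  2. Check that each item has the required fields (id, expected_answer)
  3. Make sure frequency or probability is set

Agreement metrics show "No items with N+ annotators"

  1. Make sure items have been annotated by multiple users
  2. Lower min_overlap if needed
  3. Check that annotations are being saved correctly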

Further Reading

For implementation details, see the source code documentation.