# Quality Control

Attention checks, gold standards, and inter-annotator agreement metrics.

Potato provides comprehensive quality control features to ensure high-quality annotations, including attention checks, gold standards, pre-annotation support, and real-time agreement metrics.
## Overview
Quality control in Potato consists of four key features:
- **Attention Checks** - Verify annotator engagement with known-answer items
- **Gold Standards** - Track accuracy against expert-labeled items
- **Pre-annotation Support** - Pre-fill forms with model predictions
- **Agreement Metrics** - Calculate inter-annotator agreement in real time
## Attention Checks
Attention checks are items with known correct answers that verify annotators are paying attention and not randomly clicking.
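The injection and scoring logic implied by the options below can be sketched roughly as follows. This is an illustrative sketch, not Potato's actual implementation; the function names and item shapes are hypothetical.

```python
import random

def should_inject(item_index, frequency=None, probability=None):
    """Decide whether to show an attention check at this position.

    Mirrors the two config options: a fixed frequency (every N items)
    or an independent per-item probability.
    """
    if frequency is not None:
        return item_index > 0 and item_index % frequency == 0
    if probability is not None:
        return random.random() < probability
    return False

def check_response(check_item, response, elapsed_seconds, min_response_time=3.0):
    """Return (passed, too_fast) for one attention-check response."""
    passed = all(response.get(key) == value
                 for key, value in check_item["expected_answer"].items())
    too_fast = elapsed_seconds < min_response_time
    return passed, too_fast

# Example: item answered correctly but suspiciously quickly
check = {"id": "attn_001", "expected_answer": {"sentiment": "positive"}}
print(check_response(check, {"sentiment": "positive"}, elapsed_seconds=1.2))
# (True, True) -> correct answer, but flagged as too fast
```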
### Configuration
```yaml
attention_checks:
  enabled: true
  items_file: "attention_checks.json"

  # How often to inject attention checks
  frequency: 10       # Insert one every 10 items
  # OR
  probability: 0.1    # 10% chance per item

  # Optional: flag suspiciously fast responses
  min_response_time: 3.0  # Flag if answered in < 3 seconds

  # Failure handling
  failure_handling:
    warn_threshold: 2   # Show warning after 2 failures
    warn_message: "Please read items carefully before answering."
    block_threshold: 5  # Block user after 5 failures
    block_message: "You have been blocked due to too many incorrect responses."
```

### Attention Check Items File
```json
[
  {
    "id": "attn_001",
    "text": "Please select 'Positive' for this item to verify you are reading carefully.",
    "expected_answer": {
      "sentiment": "positive"
    }
  }
]
```

## Gold Standards
Gold standards are expert-labeled items used to measure annotator accuracy. By default, gold standards are silent - results are recorded for admin review, but annotators don't see feedback.
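For example, per-annotator accuracy against the gold labels might be computed like this. This is a sketch under assumed data shapes, not Potato's API; `gold_accuracy` is a hypothetical helper.

```python
def gold_accuracy(annotations, gold_items, evaluation_count=10, min_threshold=0.7):
    """Score one annotator's responses against gold labels.

    `annotations` maps item id -> the annotator's label dict;
    `gold_items` maps item id -> {"gold_label": ...} as in the items file.
    Returns (accuracy, meets_threshold); accuracy is None until the
    annotator has seen `evaluation_count` gold items.
    """
    scored = [iid for iid in annotations if iid in gold_items]
    if len(scored) < evaluation_count:
        return None, True  # not enough gold items seen yet
    correct = sum(annotations[iid] == gold_items[iid]["gold_label"]
                  for iid in scored)
    accuracy = correct / len(scored)
    return accuracy, accuracy >= min_threshold
```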
### Configuration
```yaml
gold_standards:
  enabled: true
  items_file: "gold_standards.json"

  # How to use gold standards
  mode: "mixed"   # Options: training, mixed, separate
  frequency: 20   # Insert one every 20 items

  # Accuracy requirements
  accuracy:
    min_threshold: 0.7    # Minimum required accuracy (70%)
    evaluation_count: 10  # Evaluate after this many gold items

  # Feedback settings (disabled by default)
  feedback:
    show_correct_answer: false
    show_explanation: false

  # Auto-promotion from high-agreement items
  auto_promote:
    enabled: true
    min_annotators: 3
    agreement_threshold: 1.0  # 1.0 = unanimous
```

### Gold Standard Items File
```json
[
  {
    "id": "gold_001",
    "text": "The service was absolutely terrible and I will never return.",
    "gold_label": {
      "sentiment": "negative"
    },
    "explanation": "Strong negative language clearly indicates negative sentiment.",
    "difficulty": "easy"
  }
]
```

### Auto-Promotion
Items can automatically become gold standards when multiple annotators agree:
```yaml
gold_standards:
  auto_promote:
    enabled: true
    min_annotators: 3         # Wait for at least 3 annotators
    agreement_threshold: 1.0  # 100% must agree (unanimous)
```

## Pre-annotation Support
Pre-annotation allows you to pre-fill annotation forms with model predictions, useful for active learning and correction workflows.
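The client-side behavior implied by the options below might look like this. This is a minimal sketch under assumed data shapes; `prefill_state` is a hypothetical helper, not part of Potato.

```python
def prefill_state(item, field="predictions", highlight_below=0.7):
    """Compute the initial form state for one item.

    Returns the predicted labels plus a flag telling the UI to
    highlight the prediction when its confidence is low.
    """
    preds = item.get(field)
    if not preds:
        return {"values": {}, "highlight": False}
    confidence = preds.get("confidence", 1.0)
    values = {k: v for k, v in preds.items() if k != "confidence"}
    return {"values": values, "highlight": confidence < highlight_below}

item = {"id": "item_001", "text": "I love this product!",
        "predictions": {"sentiment": "positive", "confidence": 0.92}}
print(prefill_state(item))
# {'values': {'sentiment': 'positive'}, 'highlight': False}
```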
### Configuration
```yaml
pre_annotation:
  enabled: true
  field: "predictions"      # Field in data containing predictions
  allow_modification: true  # Can annotators change pre-filled values?
  show_confidence: true
  highlight_low_confidence: 0.7  # Highlight predictions below this confidence
```

### Data Format
Include predictions in your data items:
```json
{
  "id": "item_001",
  "text": "I love this product!",
  "predictions": {
    "sentiment": "positive",
    "confidence": 0.92
  }
}
```

## Agreement Metrics
Real-time inter-annotator agreement metrics using Krippendorff's alpha are available in the admin dashboard.
### Configuration
```yaml
agreement_metrics:
  enabled: true
  min_overlap: 2        # Minimum annotators per item
  auto_refresh: true
  refresh_interval: 60  # Seconds between updates
```

### Interpreting Krippendorff's Alpha
| Alpha Value | Interpretation |
|---|---|
| α ≥ 0.8 | Good agreement - reliable for most purposes |
| 0.67 ≤ α < 0.8 | Tentative agreement - draw only tentative conclusions |
| 0.33 ≤ α < 0.67 | Low agreement - review annotation guidelines |
| α < 0.33 | Poor agreement - significant issues |
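As a reference point, Krippendorff's alpha for nominal labels can be computed from a coincidence matrix as sketched below. This is an illustrative implementation, not Potato's internal code; for production use, prefer a vetted library (e.g. the `krippendorff` package on PyPI).

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of label lists, one per item, each holding the
    labels assigned by different annotators. Items with fewer than two
    labels are ignored (cf. min_overlap).
    """
    units = [u for u in units if len(u) >= 2]
    o = Counter()  # coincidence matrix o[(c, k)]
    for labels in units:
        m = len(labels)
        counts = Counter(labels)
        for c in counts:
            for k in counts:
                pairs = counts[c] * (counts[k] - (1 if c == k else 0))
                o[(c, k)] += pairs / (m - 1)
    n_c = Counter()  # marginal totals per label
    for (c, _k), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    disagree_obs = sum(v for (c, k), v in o.items() if c != k)
    disagree_exp = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if disagree_exp == 0:
        return 1.0  # only one label in use; no variation to disagree on
    return 1.0 - (n - 1) * disagree_obs / disagree_exp

# Perfect agreement on two items -> alpha = 1.0
print(krippendorff_alpha_nominal([["a", "a"], ["b", "b"]]))
```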
## Admin Dashboard Integration
View quality control metrics in the admin dashboard at `/admin`:
- **Attention Checks**: Overall pass/fail rates, per-annotator statistics
- **Gold Standards**: Per-annotator accuracy, per-item difficulty analysis
- **Agreement**: Per-schema Krippendorff's alpha with interpretation
- **Auto-Promoted Items**: List of items promoted from high agreement
## API Endpoints
### Quality Control Metrics
```
GET /admin/api/quality_control
```

Returns attention check and gold standard statistics.
### Agreement Metrics
```
GET /admin/api/agreement
```

Returns Krippendorff's alpha by schema with interpretation.
## Complete Example
```yaml
annotation_task_name: "Sentiment Analysis with Quality Control"

annotation_schemes:
  - name: sentiment
    annotation_type: radio
    labels: [positive, negative, neutral]
    description: "Select the sentiment of the text"

attention_checks:
  enabled: true
  items_file: "data/attention_checks.json"
  frequency: 15
  failure_handling:
    warn_threshold: 2
    block_threshold: 5

gold_standards:
  enabled: true
  items_file: "data/gold_standards.json"
  mode: mixed
  frequency: 25
  accuracy:
    min_threshold: 0.7
    evaluation_count: 5

agreement_metrics:
  enabled: true
  min_overlap: 2
  refresh_interval: 60
```

## Troubleshooting
### Attention checks not appearing
- Verify the `items_file` path is correct (relative to the task directory)
- Check that items have the required fields (`id`, `expected_answer`)
- Ensure `frequency` or `probability` is set
### Agreement metrics showing "No items with N+ annotators"
- Ensure items have been annotated by multiple users
- Reduce `min_overlap` if needed
- Check that annotations are being saved correctly
## Further Reading
- Training Phase - Annotator qualification
- Admin Dashboard - Monitoring metrics
- Task Assignment - Control annotation distribution
For implementation details, see the source documentation.