# Quality Control

Source: https://www.potatoannotator.com/docs/features/quality-control

Potato provides quality control features to keep annotations reliable: attention checks, gold standards, pre-annotation support, and real-time agreement metrics.

## Overview

Quality control in Potato consists of four key features:

1. **Attention Checks** - Verify annotator engagement with known-answer items
2. **Gold Standards** - Track accuracy against expert-labeled items
3. **Pre-annotation Support** - Pre-fill forms with model predictions
4. **Agreement Metrics** - Calculate inter-annotator agreement in real-time

## Attention Checks

Attention checks are items with known correct answers that verify annotators are paying attention and not randomly clicking.

### Configuration

```yaml
attention_checks:
  enabled: true
  items_file: "attention_checks.json"

  # How often to inject attention checks (set one of the two)
  frequency: 10              # Insert one every 10 items
  # OR
  # probability: 0.1         # 10% chance per item

  # Optional: flag suspiciously fast responses
  min_response_time: 3.0     # Flag if answered in < 3 seconds

  # Failure handling
  failure_handling:
    warn_threshold: 2        # Show warning after 2 failures
    warn_message: "Please read items carefully before answering."
    block_threshold: 5       # Block user after 5 failures
    block_message: "You have been blocked due to too many incorrect responses."
```
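To make the scheduling and failure-handling options concrete, here is a minimal Python sketch of how they could be interpreted. It is not Potato's implementation: the function names `should_inject` and `failure_action` are hypothetical, and the sketch assumes that `frequency` takes precedence when both options are set and that thresholds are inclusive ("after 2 failures" means the second failure triggers the warning).

```python
import random
from typing import Optional


def should_inject(items_seen: int,
                  frequency: Optional[int] = None,
                  probability: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> bool:
    """Return True if the next item served should be an attention check."""
    if frequency:
        # Deterministic schedule: one check after every `frequency` items.
        return items_seen > 0 and items_seen % frequency == 0
    if probability:
        # Random schedule: each item independently has this chance.
        return (rng or random.Random()).random() < probability
    return False  # neither option set, so nothing is injected


def failure_action(failures: int,
                   warn_threshold: int = 2,
                   block_threshold: int = 5) -> str:
    """Map a running failure count to 'ok', 'warn', or 'block'."""
    if failures >= block_threshold:
        return "block"
    if failures >= warn_threshold:
        return "warn"
    return "ok"
```

With the values from the configuration above, `should_inject(10, frequency=10)` is true, and an annotator moves from `"ok"` to `"warn"` on the second failure and to `"block"` on the fifth.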

### Attention Check Items File

```json
[
  {
    "id": "attn_001",
    "text": "Please select 'Positive' for this item to verify you are reading carefully.",
    "expected_answer": {
      "sentiment": "positive"
    }
  }
]
```

## Gold Standards

Gold standards are expert-labeled items used to measure annotator accuracy. By default, gold standards are **silent** - results are recorded for admin review, but annotators don't see feedback.

### Configuration

```yaml
gold_standards:
  enabled: true
  items_file: "gold_standards.json"

  # How to use gold standards
  mode: "mixed"              # Options: training, mixed, separate
  frequency: 20              # Insert one every 20 items

  # Accuracy requirements
  accuracy:
    min_threshold: 0.7       # Minimum required accuracy (70%)
    evaluation_count: 10     # Evaluate after this many gold items

  # Feedback settings (disabled by default)
  feedback:
    show_correct_answer: false
    show_explanation: false

  # Auto-promotion from high-agreement items
  auto_promote:
    enabled: true
    min_annotators: 3
    agreement_threshold: 1.0   # 1.0 = unanimous
```
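The interaction between `min_threshold` and `evaluation_count` can be sketched as follows. This is an illustration under stated assumptions, not Potato's code: `evaluate_annotator` is a hypothetical helper, a response counts as correct only when it matches the gold label exactly, and no verdict is issued until `evaluation_count` gold items have been answered.

```python
from typing import Dict, Optional, Tuple


def evaluate_annotator(responses: Dict[str, dict],
                       gold: Dict[str, dict],
                       min_threshold: float = 0.7,
                       evaluation_count: int = 10) -> Optional[Tuple[float, bool]]:
    """Return (accuracy, passed), or None if too few gold items were answered.

    responses: item id -> labels submitted by one annotator
    gold:      item id -> expert labels (the `gold_label` object)
    """
    scored = [item_id for item_id in responses if item_id in gold]
    if len(scored) < evaluation_count:
        return None  # not enough evidence yet
    correct = sum(1 for item_id in scored if responses[item_id] == gold[item_id])
    accuracy = correct / len(scored)
    return accuracy, accuracy >= min_threshold
```

Delaying the verdict until enough gold items are seen avoids penalizing an annotator for a single early mistake.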

### Gold Standard Items File

```json
[
  {
    "id": "gold_001",
    "text": "The service was absolutely terrible and I will never return.",
    "gold_label": {
      "sentiment": "negative"
    },
    "explanation": "Strong negative language clearly indicates negative sentiment.",
    "difficulty": "easy"
  }
]
```

### Auto-Promotion

Items can automatically become gold standards once enough annotators have labeled them and their agreement reaches the configured threshold:

```yaml
gold_standards:
  auto_promote:
    enabled: true
    min_annotators: 3          # Wait for at least 3 annotators
    agreement_threshold: 1.0   # 100% must agree (unanimous)
```
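One plausible reading of these two settings is shown in the Python sketch below. The helper name `promotion_candidate` is hypothetical, and the sketch assumes agreement is measured as the share of annotators who chose the most common label.

```python
from collections import Counter
from typing import List, Optional


def promotion_candidate(labels: List[str],
                        min_annotators: int = 3,
                        agreement_threshold: float = 1.0) -> Optional[str]:
    """Return the label to promote as gold, or None if the item does not qualify."""
    if len(labels) < min_annotators:
        return None  # still waiting for more annotators
    label, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= agreement_threshold:
        return label
    return None
```

With the default threshold of `1.0`, three annotators who all choose `negative` produce a candidate, while a 2-to-1 split does not.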

## Pre-annotation Support

Pre-annotation allows you to pre-fill annotation forms with model predictions, useful for active learning and correction workflows.

### Configuration

```yaml
pre_annotation:
  enabled: true
  field: "predictions"        # Field in data containing predictions
  allow_modification: true    # Can annotators change pre-filled values?
  show_confidence: true
  highlight_low_confidence: 0.7
```

### Data Format

Include predictions in your data items:

```json
{
  "id": "item_001",
  "text": "I love this product!",
  "predictions": {
    "sentiment": "positive",
    "confidence": 0.92
  }
}
```
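As a rough illustration of how `highlight_low_confidence` might be applied to such an item, the Python sketch below flags predictions whose confidence falls below the threshold. Both the helper `needs_highlight` and the "strictly below" comparison are assumptions, not documented behavior.

```python
def needs_highlight(item: dict,
                    field: str = "predictions",
                    threshold: float = 0.7) -> bool:
    """Return True if the item's prediction confidence is below the threshold."""
    confidence = item.get(field, {}).get("confidence")
    # Items without a confidence value are never highlighted in this sketch.
    return confidence is not None and confidence < threshold
```

Under this reading, the example item above (confidence 0.92) would be shown without a highlight.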

## Agreement Metrics

Real-time inter-annotator agreement metrics using Krippendorff's alpha are available in the admin dashboard.

### Configuration

```yaml
agreement_metrics:
  enabled: true
  min_overlap: 2             # Minimum annotators per item
  auto_refresh: true
  refresh_interval: 60       # Seconds between updates
```

### Interpreting Krippendorff's Alpha

| Alpha Value | Interpretation |
|-------------|----------------|
| α ≥ 0.8 | Good agreement - reliable for most purposes |
| 0.67 ≤ α < 0.8 | Tentative agreement - draw only tentative conclusions |
| 0.33 ≤ α < 0.67 | Low agreement - review guidelines |
| α < 0.33 | Poor agreement - significant issues |
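For readers who want to reproduce the number outside the dashboard, the following Python sketch computes Krippendorff's alpha for nominal labels using the standard coincidence-matrix formulation, then maps it to the bands in the table. It is an independent illustration, not Potato's code. The function names are hypothetical, each input unit is the list of labels one item received, and a value that falls exactly on a boundary is assigned to the higher band.

```python
from collections import Counter
from itertools import permutations
from typing import List, Optional


def krippendorff_alpha_nominal(units: List[List[str]],
                               min_overlap: int = 2) -> Optional[float]:
    """Nominal alpha; returns None when it is undefined."""
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < max(2, min_overlap):
            continue  # items with too few annotators carry no information
        # Every ordered pair of labels from different annotators, weighted 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return None
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return None  # only one label in use, so alpha is undefined
    return 1.0 - observed / expected


def interpret_alpha(alpha: float) -> str:
    """Map alpha to the interpretation bands used in the table."""
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.67:
        return "tentative"
    if alpha >= 0.33:
        return "low"
    return "poor"
```

Passing one list of labels per item, for example `[["positive", "positive"], ["negative", "negative"]]`, yields an alpha of 1.0, which `interpret_alpha` reports as `"good"`.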

## Admin Dashboard Integration

View quality control metrics in the admin dashboard at `/admin`:

- **Attention Checks**: Overall pass/fail rates, per-annotator statistics
- **Gold Standards**: Per-annotator accuracy, per-item difficulty analysis
- **Agreement**: Per-schema Krippendorff's alpha with interpretation
- **Auto-Promoted Items**: List of items promoted from high agreement

## API Endpoints

### Quality Control Metrics

```http
GET /admin/api/quality_control
```

Returns attention check and gold standard statistics.

### Agreement Metrics

```http
GET /admin/api/agreement
```

Returns Krippendorff's alpha by schema with interpretation.
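A minimal way to query either endpoint from Python is sketched below. Several details are assumptions rather than documented behavior: the base URL is a placeholder for wherever your server runs, the response is assumed to be JSON, and any authentication the admin endpoints require is not handled here. The helper names are hypothetical.

```python
import json
import urllib.request


def admin_url(base_url: str, endpoint: str) -> str:
    """Build the URL for an admin API endpoint such as 'agreement'."""
    return base_url.rstrip("/") + "/admin/api/" + endpoint


def fetch_admin_json(base_url: str, endpoint: str, timeout: float = 10.0):
    """GET an admin endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(admin_url(base_url, endpoint), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_admin_json("http://localhost:8000", "quality_control")` would request `/admin/api/quality_control` on a locally running server; the host and port shown are placeholders.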

## Complete Example

```yaml
annotation_task_name: "Sentiment Analysis with Quality Control"

annotation_schemes:
  - name: sentiment
    annotation_type: radio
    labels: [positive, negative, neutral]
    description: "Select the sentiment of the text"

attention_checks:
  enabled: true
  items_file: "data/attention_checks.json"
  frequency: 15
  failure_handling:
    warn_threshold: 2
    block_threshold: 5

gold_standards:
  enabled: true
  items_file: "data/gold_standards.json"
  mode: mixed
  frequency: 25
  accuracy:
    min_threshold: 0.7
    evaluation_count: 5

agreement_metrics:
  enabled: true
  min_overlap: 2
  refresh_interval: 60
```

## Troubleshooting

### Attention checks not appearing

1. Verify `items_file` path is correct (relative to task directory)
2. Check that items have required fields (`id`, `expected_answer`)
3. Ensure `frequency` or `probability` is set
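The second check can be automated with a short script. The Python sketch below reports items that lack the required fields; the function names are hypothetical, and the script only checks for the presence of the fields, not their contents.

```python
import json
from typing import List, Tuple

REQUIRED_FIELDS = ("id", "expected_answer")


def find_invalid_items(items: List[dict]) -> List[Tuple[int, List[str]]]:
    """Return (index, missing_fields) for every item lacking a required field."""
    problems = []
    for index, item in enumerate(items):
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            problems.append((index, missing))
    return problems


def check_items_file(path: str) -> List[Tuple[int, List[str]]]:
    """Load an attention check items file and validate every entry."""
    with open(path, encoding="utf-8") as handle:
        return find_invalid_items(json.load(handle))
```

An empty result from `check_items_file("attention_checks.json")` means every item has both fields, so the cause is more likely the file path or the scheduling options.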

### Agreement metrics showing "No items with N+ annotators"

1. Ensure items have been annotated by multiple users
2. Reduce `min_overlap` if needed
3. Check that annotations are being saved correctly

## Further Reading

- [Training Phase](/docs/features/training-phase) - Annotator qualification
- [Admin Dashboard](/docs/features/admin-dashboard) - Monitoring metrics
- [Task Assignment](/docs/features/task-assignment) - Control annotation distribution

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/main/docs/quality_control.md).