
MACE Competence Estimation

Estimate annotator competence and true labels using the MACE algorithm.

MACE (Multi-Annotator Competence Estimation) is a Variational Bayes EM algorithm that jointly estimates true labels for each item and annotator competence scores. It models each annotator as either "knowing" (produces correct labels) or "guessing" (produces random labels), yielding a competence score between 0.0 and 1.0.
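The knowing-vs-guessing model can be illustrated with a small sketch (this is a simplified simulation, not Potato's actual implementation; `simulate_annotator` is a hypothetical helper, and the guessing strategy here is uniform, whereas MACE learns a per-annotator guessing distribution):

```python
import random

def simulate_annotator(true_label, competence, labels, rng=random):
    """Emit a label the way MACE models an annotator: with probability
    `competence` the annotator 'knows' and copies the true label;
    otherwise it 'guesses' from the label set (uniformly, for simplicity)."""
    if rng.random() < competence:
        return true_label          # knowing: reproduce the true label
    return rng.choice(labels)      # guessing: random strategy

# A fully competent annotator (competence 1.0) always returns the true label.
print(simulate_annotator("pos", 1.0, ["pos", "neg", "neu"]))  # → pos
```

Averaged over many items, an annotator's competence is exactly the probability of taking the "knowing" branch, which is why the score lands between 0.0 and 1.0.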

When to Use MACE

MACE is useful when you have multiple annotators labeling the same items and want to:

  • Identify which annotators are most reliable
  • Produce higher-quality predicted labels by weighting annotator contributions
  • Detect low-quality annotators (spammers) automatically
  • Measure label uncertainty (entropy) per item

MACE works with categorical annotation types: radio, likert, select, and multiselect. It does not apply to free-text, span, slider, or numeric annotations.

How It Works

  1. Data extraction: Potato collects all annotations for each schema across all annotators, building an items-by-annotators matrix
  2. EM algorithm: MACE runs multiple random restarts of the Variational Bayes EM algorithm, keeping the solution with the best log-likelihood
  3. Output: For each schema, MACE produces predicted labels, label entropy (uncertainty), and per-annotator competence scores
  4. Triggering: MACE runs automatically after every N new annotations (configurable), or can be triggered manually via the admin API
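Step 1, building the items-by-annotators matrix, can be sketched roughly as follows (the flat record format, field order, and `build_matrix` name are assumptions for illustration, not Potato's internal API):

```python
def build_matrix(annotations):
    """Pivot a flat list of (item_id, annotator_id, label) records into
    an items-by-annotators matrix; cells with no annotation stay None."""
    items = sorted({rec[0] for rec in annotations})
    annotators = sorted({rec[1] for rec in annotations})
    matrix = {item: {ann: None for ann in annotators} for item in items}
    for item_id, annotator_id, label in annotations:
        matrix[item_id][annotator_id] = label
    return annotators, matrix

records = [
    ("item_1", "user_1", "pos"),
    ("item_1", "user_2", "pos"),
    ("item_2", "user_1", "neg"),
]
annotators, matrix = build_matrix(records)
print(annotators)                  # → ['user_1', 'user_2']
print(matrix["item_2"]["user_2"])  # → None (user_2 skipped item_2)
```

The `None` cells matter: MACE handles missing annotations natively, and the `min_annotations_per_item` filter below drops rows with too few filled cells.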

Configuration

```yaml
mace:
  enabled: true

  # Run MACE after every N new annotations
  trigger_every_n: 10

  # Minimum annotators per item before including in computation
  min_annotations_per_item: 3

  # Minimum eligible items before MACE will run
  min_items: 5

  # EM algorithm parameters
  num_restarts: 10
  num_iters: 50
  alpha: 0.5    # Prior for annotator spamming (Beta distribution)
  beta: 0.5     # Prior for guessing strategy (Dirichlet distribution)
```

Minimal Configuration

```yaml
mace:
  enabled: true
```

Uses all defaults: triggers every 10 annotations, requires 3 annotators per item, minimum 5 eligible items, 10 restarts with 50 iterations each.

Configuration Reference

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable MACE |
| `trigger_every_n` | integer | `10` | Run after every N new annotations |
| `min_annotations_per_item` | integer | `3` | Minimum annotators per item (must be >= 2) |
| `min_items` | integer | `5` | Minimum eligible items before running |
| `num_restarts` | integer | `10` | Random restarts for EM |
| `num_iters` | integer | `50` | EM iterations per restart |
| `alpha` | float | `0.5` | Prior for annotator spamming |
| `beta` | float | `0.5` | Prior for guessing strategy |

Admin API Endpoints

All MACE endpoints require admin authentication via the X-API-Key header.

Overview

```bash
curl http://localhost:8000/admin/api/mace/overview \
  -H "X-API-Key: your-admin-key"
```

Returns annotator competence scores and MACE status:

```json
{
  "enabled": true,
  "has_results": true,
  "schemas": ["sentiment"],
  "annotator_competence": {
    "user_1": {"average": 0.92, "per_schema": {"sentiment": 0.92}},
    "user_2": {"average": 0.85, "per_schema": {"sentiment": 0.85}},
    "user_3": {"average": 0.45, "per_schema": {"sentiment": 0.45}}
  },
  "total_annotations": 30,
  "annotations_until_next_run": 0
}
```
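One way to consume this response is to flag potential spammers automatically; the 0.5 cutoff follows the interpretation guide on this page, and `flag_low_competence` is a hypothetical helper, not part of Potato:

```python
def flag_low_competence(overview, threshold=0.5):
    """Return (annotator, average competence) pairs below `threshold`,
    sorted worst-first, from a MACE overview response."""
    scores = overview["annotator_competence"]
    flagged = [(user, info["average"]) for user, info in scores.items()
               if info["average"] < threshold]
    return sorted(flagged, key=lambda pair: pair[1])

overview = {
    "annotator_competence": {
        "user_1": {"average": 0.92},
        "user_2": {"average": 0.85},
        "user_3": {"average": 0.45},
    }
}
print(flag_low_competence(overview))  # → [('user_3', 0.45)]
```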

Predictions

```bash
curl "http://localhost:8000/admin/api/mace/predictions?schema=sentiment" \
  -H "X-API-Key: your-admin-key"
```

Returns predicted labels and entropy for each item.

Manual Trigger

```bash
curl -X POST http://localhost:8000/admin/api/mace/trigger \
  -H "X-API-Key: your-admin-key"
```
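For scripting, the same endpoints can be called from Python. This sketch only assembles the request so it stays self-contained; `mace_admin_request` is a hypothetical helper, not part of Potato:

```python
def mace_admin_request(base_url, api_key, endpoint, method="GET"):
    """Assemble method, URL, and headers for a MACE admin API call.
    All MACE endpoints authenticate via the X-API-Key header."""
    url = f"{base_url.rstrip('/')}/admin/api/mace/{endpoint}"
    return method, url, {"X-API-Key": api_key}

method, url, headers = mace_admin_request(
    "http://localhost:8000", "your-admin-key", "trigger", method="POST")
print(method, url)  # → POST http://localhost:8000/admin/api/mace/trigger

# Send with any HTTP client, e.g.:
# import requests
# requests.request(method, url, headers=headers)
```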

Interpreting Results

Annotator Competence

  • 0.9 - 1.0: Highly reliable annotator
  • 0.7 - 0.9: Good annotator, occasional disagreements
  • 0.5 - 0.7: Moderate annotator, may benefit from additional training
  • Below 0.5: Potential spammer or confused annotator

Label Entropy

  • Near 0.0: High confidence in the predicted label
  • Above 0.5: Moderate uncertainty, item may be genuinely ambiguous
  • Near log(num_labels): Maximum uncertainty, no consensus
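These bands follow from Shannon entropy of the posterior label distribution, which peaks at log(num_labels) for a uniform distribution; a quick illustration:

```python
import math

def label_entropy(dist):
    """Shannon entropy (natural log) of a posterior label distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

confident = {"pos": 0.98, "neg": 0.01, "neu": 0.01}
no_consensus = {"pos": 1/3, "neg": 1/3, "neu": 1/3}

print(round(label_entropy(confident), 3))     # → 0.112 (high confidence)
print(round(label_entropy(no_consensus), 3))  # → 1.099 (= log 3, maximum for 3 labels)
```

Items near the maximum are good candidates for review or adjudication, since the annotators show no consensus at all.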

Adjudication Integration

When both MACE and adjudication are enabled, MACE predicted labels appear as an additional signal in the adjudication interface:

```yaml
adjudication:
  enabled: true
  adjudicator_users: ["admin"]
  min_annotations: 2

mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 2
```

Best Practices

  1. Start with defaults - the default configuration works well for most scenarios
  2. Monitor competence scores - use the admin dashboard to track annotator quality over time
  3. Combine with training phases - use training to qualify annotators, then MACE to monitor ongoing quality
  4. Set appropriate thresholds - lower min_annotations_per_item for smaller annotation projects

Further Reading

For implementation details, see the source documentation.