# MACE Competence Estimation
Estimate annotator competence and true labels using the MACE algorithm.
MACE (Multi-Annotator Competence Estimation) is a Variational Bayes EM algorithm that jointly estimates true labels for each item and annotator competence scores. It models each annotator as either "knowing" (produces correct labels) or "guessing" (produces random labels), yielding a competence score between 0.0 and 1.0.
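The "knowing vs. guessing" story can be illustrated with a small simulation. This is a sketch of the generative model only, not Potato's implementation; for simplicity the guessing distribution is uniform, whereas MACE actually learns a per-annotator guessing strategy.

```python
import random

def simulate_annotation(true_label, competence, labels, rng):
    """MACE's generative story for one annotation: with probability
    `competence` the annotator copies the true label; otherwise they
    guess from a spamming distribution (uniform here for simplicity)."""
    if rng.random() < competence:
        return true_label
    return rng.choice(labels)

labels = ["positive", "neutral", "negative"]
rng = random.Random(0)

# A competent annotator (0.9) mostly reproduces the true label;
# a spammer (0.1) mostly guesses at random.
competent = [simulate_annotation("positive", 0.9, labels, rng) for _ in range(1000)]
spammer = [simulate_annotation("positive", 0.1, labels, rng) for _ in range(1000)]
print(competent.count("positive") / 1000)  # close to 0.9 + 0.1/3
print(spammer.count("positive") / 1000)    # close to 0.1 + 0.9/3
```

Inference runs this story in reverse: from the observed agreement pattern, MACE recovers both the hidden true labels and each annotator's competence.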
## When to Use MACE
MACE is useful when you have multiple annotators labeling the same items and want to:
- Identify which annotators are most reliable
- Produce higher-quality predicted labels by weighting annotator contributions
- Detect low-quality annotators (spammers) automatically
- Measure label uncertainty (entropy) per item
MACE works with categorical annotation types: `radio`, `likert`, `select`, and `multiselect`. It does not apply to free-text, span, slider, or numeric annotations.
## How It Works
- **Data extraction**: Potato collects all annotations for each schema across all annotators, building an items-by-annotators matrix
- **EM algorithm**: MACE runs multiple random restarts of the Variational Bayes EM algorithm, keeping the solution with the best log-likelihood
- **Output**: for each schema, MACE produces predicted labels, label entropy (uncertainty), and per-annotator competence scores
- **Triggering**: MACE runs automatically after every N new annotations (configurable), or can be triggered manually via the admin API
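The extraction and restart steps above can be sketched as follows. The function names and the `run_vb_em` hook are illustrative assumptions, not Potato's actual internals:

```python
def build_matrix(annotations):
    """Pivot (item_id, annotator_id, label) records into an
    items-by-annotators matrix; None marks a missing annotation."""
    items = sorted({a[0] for a in annotations})
    annotators = sorted({a[1] for a in annotations})
    matrix = {i: {u: None for u in annotators} for i in items}
    for item, annotator, label in annotations:
        matrix[item][annotator] = label
    return matrix, items, annotators

def best_of_restarts(matrix, run_vb_em, num_restarts=10, num_iters=50):
    """Keep the VB-EM solution with the highest log-likelihood across
    random restarts, mirroring the num_restarts/num_iters options."""
    best = None
    for seed in range(num_restarts):
        result = run_vb_em(matrix, num_iters=num_iters, seed=seed)
        if best is None or result["log_likelihood"] > best["log_likelihood"]:
            best = result
    return best
```

Restarts matter because EM only finds a local optimum; keeping the best-scoring run makes the estimates more stable.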
## Configuration
```yaml
mace:
  enabled: true
  # Run MACE after every N new annotations
  trigger_every_n: 10
  # Minimum annotators per item before including in computation
  min_annotations_per_item: 3
  # Minimum eligible items before MACE will run
  min_items: 5
  # EM algorithm parameters
  num_restarts: 10
  num_iters: 50
  alpha: 0.5  # Prior for annotator spamming (Beta distribution)
  beta: 0.5   # Prior for guessing strategy (Dirichlet distribution)
```

### Minimal Configuration
```yaml
mace:
  enabled: true
```

Uses all defaults: triggers every 10 annotations, requires 3 annotators per item, a minimum of 5 eligible items, and 10 restarts with 50 iterations each.
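One way to see how the minimal form expands is to overlay it on the documented defaults. This is a sketch of the merge semantics, not Potato's actual config loader:

```python
# Defaults as listed in the configuration reference below.
MACE_DEFAULTS = {
    "enabled": False,
    "trigger_every_n": 10,
    "min_annotations_per_item": 3,
    "min_items": 5,
    "num_restarts": 10,
    "num_iters": 50,
    "alpha": 0.5,
    "beta": 0.5,
}

def resolve_mace_config(user_config):
    """Overlay user-supplied options on the documented defaults."""
    return {**MACE_DEFAULTS, **user_config}

print(resolve_mace_config({"enabled": True}))
```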
## Configuration Reference
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable MACE |
| `trigger_every_n` | integer | `10` | Run after every N new annotations |
| `min_annotations_per_item` | integer | `3` | Minimum annotators per item (must be >= 2) |
| `min_items` | integer | `5` | Minimum eligible items before running |
| `num_restarts` | integer | `10` | Random restarts for EM |
| `num_iters` | integer | `50` | EM iterations per restart |
| `alpha` | float | `0.5` | Prior for annotator spamming |
| `beta` | float | `0.5` | Prior for guessing strategy |
## Admin API Endpoints
All MACE endpoints require admin authentication via the `X-API-Key` header.
### Overview
```shell
curl http://localhost:8000/admin/api/mace/overview \
  -H "X-API-Key: your-admin-key"
```

Returns annotator competence scores and MACE status:
```json
{
  "enabled": true,
  "has_results": true,
  "schemas": ["sentiment"],
  "annotator_competence": {
    "user_1": {"average": 0.92, "per_schema": {"sentiment": 0.92}},
    "user_2": {"average": 0.85, "per_schema": {"sentiment": 0.85}},
    "user_3": {"average": 0.45, "per_schema": {"sentiment": 0.45}}
  },
  "total_annotations": 30,
  "annotations_until_next_run": 0
}
```

### Predictions
```shell
curl "http://localhost:8000/admin/api/mace/predictions?schema=sentiment" \
  -H "X-API-Key: your-admin-key"
```

Returns predicted labels and entropy for each item.
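A client might rank predicted items by entropy to surface the most ambiguous cases for review first. The exact field names in the sketch below (`item_id`, `label`, `entropy`) are assumptions based on the description above, not a documented response schema:

```python
def most_uncertain(predictions, top_k=3):
    """Sort predicted items by label entropy, most ambiguous first."""
    return sorted(predictions, key=lambda p: p["entropy"], reverse=True)[:top_k]

# Hypothetical response items for illustration.
sample = [
    {"item_id": "doc_1", "label": "positive", "entropy": 0.05},
    {"item_id": "doc_2", "label": "neutral", "entropy": 1.02},
    {"item_id": "doc_3", "label": "negative", "entropy": 0.61},
]
for p in most_uncertain(sample, top_k=2):
    print(p["item_id"], p["entropy"])
```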
### Manual Trigger
```shell
curl -X POST http://localhost:8000/admin/api/mace/trigger \
  -H "X-API-Key: your-admin-key"
```

## Interpreting Results
### Annotator Competence
- 0.9 - 1.0: Highly reliable annotator
- 0.7 - 0.9: Good annotator, occasional disagreements
- 0.5 - 0.7: Moderate annotator, may benefit from additional training
- Below 0.5: Potential spammer or confused annotator
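These bands can be applied directly to the competence scores in the overview response. A sketch, using the guideline thresholds listed above:

```python
def competence_band(score):
    """Map a MACE competence score (0.0-1.0) to the bands above."""
    if score >= 0.9:
        return "highly reliable"
    if score >= 0.7:
        return "good"
    if score >= 0.5:
        return "moderate"
    return "potential spammer"

# Average scores as returned under "annotator_competence".
competence = {"user_1": 0.92, "user_2": 0.85, "user_3": 0.45}
for user, score in competence.items():
    print(user, competence_band(score))
```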
### Label Entropy
- Near 0.0: High confidence in the predicted label
- Above 0.5: Moderate uncertainty, item may be genuinely ambiguous
- Near log(num_labels): Maximum uncertainty, no consensus
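Label entropy here is the Shannon entropy of the posterior label distribution for an item; with three labels the maximum is log(3) ≈ 1.099. The sketch below uses natural log; the base actually used is an implementation detail:

```python
import math

def label_entropy(probs):
    """Shannon entropy (in nats) of a posterior label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

print(round(label_entropy([0.98, 0.01, 0.01]), 3))  # near 0: confident prediction
print(round(label_entropy([1/3, 1/3, 1/3]), 3))     # log(3): no consensus at all
```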
## Adjudication Integration
When both MACE and adjudication are enabled, MACE predicted labels appear as an additional signal in the adjudication interface:
```yaml
adjudication:
  enabled: true
  adjudicator_users: ["admin"]
  min_annotations: 2

mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 2
```

## Best Practices
- **Start with defaults** - the default configuration works well for most scenarios
- **Monitor competence scores** - use the admin dashboard to track annotator quality over time
- **Combine with training phases** - use training to qualify annotators, then MACE to monitor ongoing quality
- **Set appropriate thresholds** - lower `min_annotations_per_item` for smaller annotation projects
## Further Reading
- Quality Control - Other quality control mechanisms
- Admin Dashboard - Monitoring annotation progress
- AI Support - AI-assisted annotation
For implementation details, see the source documentation.