
MACE Competence Estimation

Estimate annotator competence and true labels using the MACE algorithm.

MACE (Multi-Annotator Competence Estimation) is a Variational Bayes EM algorithm that jointly estimates true labels for each item and annotator competence scores. It models each annotator as either "knowing" (produces correct labels) or "guessing" (produces random labels), yielding a competence score between 0.0 and 1.0.
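The knowing-vs-guessing model can be illustrated with a small sketch (this is a simplified simulation, not Potato's actual implementation; `simulate_annotator` is a hypothetical helper, and the guessing strategy here is uniform, whereas MACE learns a per-annotator guessing distribution):

```python
import random

def simulate_annotator(true_label, competence, labels, rng=random):
    """Emit a label the way MACE models an annotator: with probability
    `competence` the annotator 'knows' and copies the true label;
    otherwise it 'guesses' from the label set (uniformly, for simplicity)."""
    if rng.random() < competence:
        return true_label          # knowing: reproduce the true label
    return rng.choice(labels)      # guessing: random strategy

# A fully competent annotator (competence 1.0) always returns the true label.
print(simulate_annotator("pos", 1.0, ["pos", "neg", "neu"]))  # → pos
```

Averaged over many items, an annotator's competence is exactly the probability of taking the "knowing" branch, which is why the score lands between 0.0 and 1.0.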

When to Use MACE

MACE is useful when you have multiple annotators labeling the same items and want to:

  • Identify which annotators are most reliable
  • Produce higher-quality predicted labels by weighting annotator contributions
  • Detect low-quality annotators (spammers) automatically
  • Measure label uncertainty (entropy) per item

MACE works with categorical annotation types: radio, likert, select, and multiselect. It does not apply to free-text, span, slider, or numeric annotations.

How It Works

  1. Data extraction: Potato collects all annotations for each schema across all annotators, building an items-by-annotators matrix
  2. EM algorithm: MACE runs multiple random restarts of the Variational Bayes EM algorithm, keeping the solution with the best log-likelihood
  3. Output: For each schema, MACE produces predicted labels, label entropy (uncertainty), and per-annotator competence scores
  4. Triggering: MACE runs automatically after every N new annotations (configurable), or can be triggered manually via the admin API
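Step 1, building the items-by-annotators matrix, can be sketched roughly as follows (the flat record format, field order, and `build_matrix` name are assumptions for illustration, not Potato's internal API):

```python
def build_matrix(annotations):
    """Pivot a flat list of (item_id, annotator_id, label) records into
    an items-by-annotators matrix; cells with no annotation stay None."""
    items = sorted({rec[0] for rec in annotations})
    annotators = sorted({rec[1] for rec in annotations})
    matrix = {item: {ann: None for ann in annotators} for item in items}
    for item_id, annotator_id, label in annotations:
        matrix[item_id][annotator_id] = label
    return annotators, matrix

records = [
    ("item_1", "user_1", "pos"),
    ("item_1", "user_2", "pos"),
    ("item_2", "user_1", "neg"),
]
annotators, matrix = build_matrix(records)
print(annotators)                  # → ['user_1', 'user_2']
print(matrix["item_2"]["user_2"])  # → None (user_2 skipped item_2)
```

The `None` cells matter: MACE handles missing annotations natively, and the `min_annotations_per_item` filter below drops rows with too few filled cells.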

Configuration

```yaml
mace:
  enabled: true

  # Run MACE after every N new annotations
  trigger_every_n: 10

  # Minimum annotators per item before including in computation
  min_annotations_per_item: 3

  # Minimum eligible items before MACE will run
  min_items: 5

  # EM algorithm parameters
  num_restarts: 10
  num_iters: 50
  alpha: 0.5    # Prior for annotator spamming (Beta distribution)
  beta: 0.5     # Prior for guessing strategy (Dirichlet distribution)
```

Minimal Configuration

```yaml
mace:
  enabled: true
```

Uses all defaults: triggers every 10 annotations, requires 3 annotators per item, minimum 5 eligible items, 10 restarts with 50 iterations each.

Configuration Reference

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable MACE |
| `trigger_every_n` | integer | `10` | Run after every N new annotations |
| `min_annotations_per_item` | integer | `3` | Minimum annotators per item (must be >= 2) |
| `min_items` | integer | `5` | Minimum eligible items before running |
| `num_restarts` | integer | `10` | Random restarts for EM |
| `num_iters` | integer | `50` | EM iterations per restart |
| `alpha` | float | `0.5` | Prior for annotator spamming |
| `beta` | float | `0.5` | Prior for guessing strategy |

Admin API Endpoints

All MACE endpoints require admin authentication via the X-API-Key header.

Overview

```bash
curl http://localhost:8000/admin/api/mace/overview \
  -H "X-API-Key: your-admin-key"
```

Returns annotator competence scores and MACE status:

```json
{
  "enabled": true,
  "has_results": true,
  "schemas": ["sentiment"],
  "annotator_competence": {
    "user_1": {"average": 0.92, "per_schema": {"sentiment": 0.92}},
    "user_2": {"average": 0.85, "per_schema": {"sentiment": 0.85}},
    "user_3": {"average": 0.45, "per_schema": {"sentiment": 0.45}}
  },
  "total_annotations": 30,
  "annotations_until_next_run": 0
}
```
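One way to consume this response is to flag potential spammers automatically; the 0.5 cutoff follows the interpretation guide on this page, and `flag_low_competence` is a hypothetical helper, not part of Potato:

```python
def flag_low_competence(overview, threshold=0.5):
    """Return (annotator, average competence) pairs below `threshold`,
    sorted worst-first, from a MACE overview response."""
    scores = overview["annotator_competence"]
    flagged = [(user, info["average"]) for user, info in scores.items()
               if info["average"] < threshold]
    return sorted(flagged, key=lambda pair: pair[1])

overview = {
    "annotator_competence": {
        "user_1": {"average": 0.92},
        "user_2": {"average": 0.85},
        "user_3": {"average": 0.45},
    }
}
print(flag_low_competence(overview))  # → [('user_3', 0.45)]
```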

Predictions

```bash
curl "http://localhost:8000/admin/api/mace/predictions?schema=sentiment" \
  -H "X-API-Key: your-admin-key"
```

Returns predicted labels and entropy for each item.

Manual Trigger

```bash
curl -X POST http://localhost:8000/admin/api/mace/trigger \
  -H "X-API-Key: your-admin-key"
```
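For scripting, the same endpoints can be called from Python. This sketch only assembles the request so it stays self-contained; `mace_admin_request` is a hypothetical helper, not part of Potato:

```python
def mace_admin_request(base_url, api_key, endpoint, method="GET"):
    """Assemble method, URL, and headers for a MACE admin API call.
    All MACE endpoints authenticate via the X-API-Key header."""
    url = f"{base_url.rstrip('/')}/admin/api/mace/{endpoint}"
    return method, url, {"X-API-Key": api_key}

method, url, headers = mace_admin_request(
    "http://localhost:8000", "your-admin-key", "trigger", method="POST")
print(method, url)  # → POST http://localhost:8000/admin/api/mace/trigger

# Send with any HTTP client, e.g.:
# import requests
# requests.request(method, url, headers=headers)
```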

Interpreting Results

Annotator Competence

  • 0.9 - 1.0: Highly reliable annotator
  • 0.7 - 0.9: Good annotator, occasional disagreements
  • 0.5 - 0.7: Moderate annotator, may benefit from additional training
  • Below 0.5: Potential spammer or confused annotator

Label Entropy

  • Near 0.0: High confidence in the predicted label
  • Above 0.5: Moderate uncertainty, item may be genuinely ambiguous
  • Near log(num_labels): Maximum uncertainty, no consensus
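These bands follow from Shannon entropy of the posterior label distribution, which peaks at log(num_labels) for a uniform distribution; a quick illustration:

```python
import math

def label_entropy(dist):
    """Shannon entropy (natural log) of a posterior label distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

confident = {"pos": 0.98, "neg": 0.01, "neu": 0.01}
no_consensus = {"pos": 1/3, "neg": 1/3, "neu": 1/3}

print(round(label_entropy(confident), 3))     # → 0.112 (high confidence)
print(round(label_entropy(no_consensus), 3))  # → 1.099 (= log 3, maximum for 3 labels)
```

Items near the maximum are good candidates for review or adjudication, since the annotators show no consensus at all.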

Adjudication Integration

When both MACE and adjudication are enabled, MACE predicted labels appear as an additional signal in the adjudication interface:

```yaml
adjudication:
  enabled: true
  adjudicator_users: ["admin"]
  min_annotations: 2

mace:
  enabled: true
  trigger_every_n: 10
  min_annotations_per_item: 2
```

Best Practices

  1. Start with defaults - the default configuration works well for most scenarios
  2. Monitor competence scores - use the admin dashboard to track annotator quality over time
  3. Combine with training phases - use training to qualify annotators, then MACE to monitor ongoing quality
  4. Set appropriate thresholds - lower min_annotations_per_item for smaller annotation projects

Further Reading

For implementation details, see the source documentation.