# Active Learning for Annotation

Source: https://www.potatoannotator.com/docs/guides/active-learning

**Active learning chooses *which* items to annotate next so a model can reach the same accuracy with far fewer labels. Instead of labeling at random, you label the items the model finds most informative.** When labeling is the bottleneck, it is often one of the highest-return techniques available.

See [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)) for background. For the feature reference, see [Active Learning](/docs/features/active-learning).

## The loop

1. Label a small seed set.
2. Train a quick model on what you have.
3. Score the unlabeled pool and pick the most informative items.
4. Annotate those, add them, retrain. Repeat.
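
The four steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Potato's implementation: `oracle` stands in for a human annotator, the "model" is a single threshold on one-dimensional data, and all function names are hypothetical.

```python
import random


def oracle(x):
    """Hypothetical annotator: the true label is 1 when x > 0.5."""
    return int(x > 0.5)


def fit_threshold(labeled):
    """Toy model: midpoint between the largest labeled 0 and the smallest labeled 1.

    Falls back to the data range [0, 1] when a class has no labels yet.
    """
    highest_neg = max((x for x, y in labeled if y == 0), default=0.0)
    lowest_pos = min((x for x, y in labeled if y == 1), default=1.0)
    return (highest_neg + lowest_pos) / 2


def active_learning_loop(pool, seed_size=5, batch_size=1, rounds=10, rng_seed=0):
    rng = random.Random(rng_seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)

    # 1. Label a small seed set.
    labeled = [(x, oracle(x)) for x in unlabeled[:seed_size]]
    unlabeled = unlabeled[seed_size:]

    for _ in range(rounds):
        if not unlabeled:
            break
        # 2. Train a quick model on what you have.
        threshold = fit_threshold(labeled)
        # 3. Score the pool: items closest to the threshold are the most informative.
        unlabeled.sort(key=lambda x: abs(x - threshold))
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        # 4. Annotate those and add them; the next iteration retrains.
        labeled.extend((x, oracle(x)) for x in batch)

    return fit_threshold(labeled), labeled
```

Called as `active_learning_loop([(i + 0.5) / 200 for i in range(200)])`, the sketch labels 15 of the 200 items and ends with a threshold close to the true boundary at 0.5, because each query roughly halves the interval in which the boundary can lie.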

The payoff is data efficiency: the model spends your annotation budget where it learns the most.

## Query strategies Potato supports

- **Uncertainty sampling**: pick items the model is least confident about (near the decision boundary). The simplest strategy and often an effective default.
- **Diversity sampling**: pick items that are different from each other, so you don't waste budget on near-duplicates.
- **[BADGE](https://arxiv.org/abs/1906.03671)**: combines uncertainty and diversity using gradient embeddings.
- **[BALD](https://arxiv.org/abs/1112.5745)** (Bayesian Active Learning by Disagreement): a Bayesian strategy that selects the items whose labels are expected to reduce uncertainty about the model parameters the most.
- **Hybrid**: blends strategies.
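
To make uncertainty sampling concrete, here is a minimal sketch of three common uncertainty scores computed from predicted class probabilities. These are the standard textbook definitions; whether Potato uses any of them internally is not stated here, and the function names are hypothetical.

```python
import math


def least_confidence(probs):
    """1 minus the probability of the predicted class; higher means more uncertain."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """1 minus the gap between the two most likely classes; higher means more uncertain."""
    top, runner_up = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - runner_up)


def entropy(probs):
    """Shannon entropy of the predicted distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions, k, score=entropy):
    """Return the ids of the k items with the highest uncertainty score.

    `predictions` maps an item id to its list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda item_id: score(predictions[item_id]), reverse=True)
    return ranked[:k]
```

For example, given `{"a": [0.98, 0.02], "b": [0.55, 0.45], "c": [0.70, 0.30]}`, all three scores rank `"b"` first, since its prediction sits closest to the decision boundary.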

```yaml
active_learning:
  enabled: true
  schema_names: [sentiment]
  query_strategy: uncertainty   # or diversity, badge, bald, hybrid
  min_instances_for_training: 20
```
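
For intuition about what a `diversity` strategy does, one common approach is greedy farthest-point (k-center) selection over item embeddings. The sketch below is a generic illustration under that assumption, not a description of how Potato computes diversity; the function name and the tuple-based embeddings are hypothetical.

```python
import math


def farthest_point_selection(embeddings, k):
    """Greedily pick up to k indices so each new item is far from those already chosen."""
    if k <= 0 or not embeddings:
        return []

    chosen = [0]  # arbitrary starting item
    # Distance from every item to its nearest chosen item so far.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]

    while len(chosen) < min(k, len(embeddings)):
        candidate = max(range(len(embeddings)), key=min_dist.__getitem__)
        if min_dist[candidate] == 0:
            break  # everything left duplicates an already chosen item
        chosen.append(candidate)
        min_dist = [
            min(d, math.dist(e, embeddings[candidate]))
            for d, e in zip(min_dist, embeddings)
        ]

    return chosen
```

With the points `[(0, 0), (0.1, 0), (5, 5), (0, 0.1), (5, 5.1), (-4, 3)]` and `k=3`, the selection skips the near-duplicates around the origin and around `(5, 5)` and returns one item from each of the three clusters.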

## When active learning helps, and when it doesn't

It helps when labels are expensive, the pool is large, and a useful model can be trained on a small seed. It helps *less* when:

- The task is so easy that random labeling already saturates quickly.
- You need an unbiased held-out test set. Keep your evaluation data randomly sampled, because data selected by active learning is deliberately skewed.
- Labels are cheap relative to engineering effort.
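
One way to keep evaluation data unbiased is to set aside a random sample before any active-learning selection happens, and to run the query strategy only on what remains. A minimal sketch, assuming items are identified by ids; the function name, the 10% fraction, and the seed are illustrative choices, not Potato settings.

```python
import random


def split_eval_and_pool(item_ids, eval_fraction=0.1, seed=13):
    """Randomly reserve an evaluation set; the rest becomes the active-learning pool."""
    rng = random.Random(seed)
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    # The evaluation items are annotated too, but never chosen by a query strategy.
    return shuffled[:n_eval], shuffled[n_eval:]
```

Because the evaluation ids are drawn uniformly at random and removed from the pool, accuracy measured on them reflects the original data distribution rather than the skewed selection the query strategy produces.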

## Further reading

- [Active Learning feature reference](/docs/features/active-learning)
- [LLM and Vision Pre-Annotation](/docs/guides/llm-pre-annotation)
- [Diversity Ordering](/docs/features/diversity-ordering)
