# What Is Data Annotation?

Source: https://www.potatoannotator.com/docs/guides/what-is-data-annotation

**Data annotation is the process of attaching labels to raw data (text, images, audio, video, or model outputs) so that the data can be measured, compared, or used to train and evaluate machine learning models.** A label might be a sentiment category on a tweet, a highlighted name in a sentence, a 1–5 quality rating on a chatbot reply, or a bounding box around a pedestrian in a photo.

Annotation is sometimes called data labeling, tagging, or coding (the term used in the social sciences). See [Data annotation](https://en.wikipedia.org/wiki/Data_annotation) and the related idea of a [labeled training set](https://en.wikipedia.org/wiki/Labeled_data) on Wikipedia.

## Why it matters

Supervised machine learning models learn from examples that already carry the right answer. The quality of those answers sets a ceiling on model quality, so careful annotation is often the highest-leverage part of a project. Annotation is also how you *evaluate* a model: to know whether an AI system is correct, a person usually has to judge its outputs.

## The main types of annotation task

Most projects fall into a few families. Each maps to one or more annotation controls in Potato (see [Annotation Schemes](/docs/core-concepts/annotation-schemes)).

- **Classification**: pick one or more categories for a whole item. Example: is this review positive, negative, or neutral? See [Text classification](/docs/guides/text-annotation).
- **Span labeling**: mark a region *inside* an item, such as a name in a sentence or a segment of an audio clip. See [Span Annotation](/docs/guides/span-annotation).
- **Rating and scoring**: place an item on a scale, such as a 1–5 quality judgment. See [Rating Scales](/docs/guides/rating-scales).
- **Ranking and comparison**: order items or pick the better of two. See [Pairwise and Best–Worst Scaling](/docs/guides/pairwise-and-best-worst).
- **Structured annotation**: link spans into relations, build coreference chains, or annotate events. See [Relation and Event Extraction](/docs/guides/relation-and-event-extraction).
- **Free-text**: write an explanation, a correction, or a transcription.
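
To make the families above concrete, here is a minimal Python sketch of what a stored label could look like for each one. The class and field names are hypothetical and chosen for illustration; they are not Potato's export format.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    item_id: str
    labels: list[str]   # one or more categories for the whole item


@dataclass
class Span:
    item_id: str
    start: int          # character offset, inclusive
    end: int            # character offset, exclusive
    label: str


@dataclass
class Rating:
    item_id: str
    score: int          # position on a scale, e.g. 1 to 5


@dataclass
class Pairwise:
    item_a: str
    item_b: str
    preferred: str      # id of the item judged better


@dataclass
class Relation:
    head: Span          # structured annotation links existing spans
    tail: Span
    label: str


@dataclass
class FreeText:
    item_id: str
    text: str           # explanation, correction, or transcription


# Usage: a span label points at a region inside the item.
sentence = "Ada Lovelace wrote the first program."
name = Span(item_id="s1", start=0, end=12, label="PERSON")
review = Classification(item_id="r1", labels=["Positive"])
print(sentence[name.start:name.end])
```

The point of the sketch is that the families differ in the shape of the label, not just its values: a classification attaches to the whole item, while a span needs offsets and a relation needs two spans.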

## A minimal example

A sentiment task in Potato is a few lines of YAML. The `annotation_schemes` block defines the labels an annotator sees:

```yaml
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the overall sentiment of this review?"
    labels:
      - Positive
      - Negative
      - Neutral
```

That is the whole interface definition. You supply the data, run `potato start`, and annotators label in the browser.
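
For the "supply the data" step, a small Python sketch like the following could turn a list of reviews into one JSON object per line. The JSON Lines layout and the `id` and `text` field names are assumptions made for illustration, as is the helper name `to_jsonl`; check the data format documentation for the keys your configuration expects.

```python
import json


def to_jsonl(reviews):
    """Serialize texts as JSON Lines: one object per line with an id and the text.

    The "id" and "text" keys are assumed names, not a confirmed Potato schema.
    """
    lines = []
    for i, text in enumerate(reviews, start=1):
        record = {"id": f"review-{i}", "text": text}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


reviews = [
    "Arrived quickly and works well.",
    "Stopped charging after a week.",
]
jsonl = to_jsonl(reviews)
print(jsonl, end="")
```

Giving every item a stable identifier matters later: it is what lets you line up several annotators' labels for the same item.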

## How a project usually goes

1. **Define the task.** Write down the question and the label set.
2. **Write guidelines.** Give annotators rules and examples. See [Writing Annotation Guidelines](/docs/guides/writing-annotation-guidelines).
3. **Pilot.** Label a small batch, find disagreements, refine the guidelines.
4. **Annotate with overlap.** Have several people label the same items so you can measure agreement.
5. **Measure agreement.** See [Inter-Annotator Agreement](/docs/guides/inter-annotator-agreement).
6. **Adjudicate and export.** Resolve disagreements and export for training or analysis.
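
Steps 5 and 6 can be sketched in a few lines of Python. The example below computes Cohen's kappa for two annotators and uses a simple majority vote to flag items that need adjudication. Both are common choices rather than the method this guide prescribes, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def majority_vote(labels):
    """Return the most common label, or None on a tie or empty input."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: send the item to a human adjudicator
    return counts[0][0]


annotator_a = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_b = ["pos", "neg", "neg", "neu", "neg", "pos"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
print(majority_vote(["pos", "pos", "neg"]))
print(majority_vote(["pos", "neg"]))
```

Kappa is used here instead of raw percent agreement because it discounts the agreement two annotators would reach by chance. With more than two annotators or missing labels, a measure such as Krippendorff's alpha is the usual alternative.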

## Further reading

- [Choosing an Annotation Scheme](/docs/guides/choosing-an-annotation-scheme)
- [Quick Start](/docs/getting-started/quick-start): get Potato running in five minutes
- [Annotation Schemes reference](/docs/core-concepts/annotation-schemes)
