# Designing Data Formats for Annotation

Source: https://www.potatoannotator.com/docs/guides/data-formats-for-annotation

**Good annotation starts with well-structured input. Each item needs a stable unique identifier and the content to be labeled; everything else is optional context.** Getting this right at the start saves painful re-runs later, because annotations are keyed to your item IDs.

Common interchange formats are [JSON](https://en.wikipedia.org/wiki/JSON), [JSON Lines](https://jsonlines.org/) (one object per line, ideal for large datasets), and [CSV](https://en.wikipedia.org/wiki/Comma-separated_values). Potato reads all three. For the full reference see [Data Formats](/docs/core-concepts/data-formats).

## The minimum each item needs

- **A unique ID** that never changes. Annotations are stored against this ID, so if you renumber items mid-project you lose the link to existing labels.
- **The content to annotate**: a text field, an image URL, an audio path, or a structured trace.

A JSONL file for a text task looks like this:

```json
{"id": "rev_001", "text": "The battery lasts all day. Highly recommend."}
{"id": "rev_002", "text": "Stopped working after a week."}
```

You tell Potato which keys to use:

```yaml
item_properties:
  id_key: id
  text_key: text

data_files:
  - "data/reviews.jsonl"
```
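Both requirements are easy to check before a file is loaded. The following is a minimal Python sketch, not part of Potato: `validate_items` is a hypothetical helper, and its defaults assume the `id` and `text` keys used in the example above.

```python
import json


def validate_items(lines, id_key="id", text_key="text"):
    """Return a list of problems found in an iterable of JSONL lines.

    Hypothetical helper for illustration; it is not a Potato API.
    """
    problems = []
    seen = {}  # item ID -> line number where it first appeared
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            item = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(item, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue

        # Requirement 1: a unique, non-empty ID.
        item_id = item.get(id_key)
        if not isinstance(item_id, (str, int)) or item_id == "":
            problems.append(f"line {lineno}: missing or invalid '{id_key}'")
        elif item_id in seen:
            problems.append(
                f"line {lineno}: duplicate ID {item_id!r} "
                f"(first seen on line {seen[item_id]})"
            )
        else:
            seen[item_id] = lineno

        # Requirement 2: non-empty content to annotate.
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"line {lineno}: missing or empty '{text_key}'")
    return problems


sample = [
    '{"id": "rev_001", "text": "The battery lasts all day. Highly recommend."}',
    '{"id": "rev_002", "text": "Stopped working after a week."}',
    '{"id": "rev_001", "text": "Same ID as the first line."}',
]
for problem in validate_items(sample):
    print(problem)
```

To check a real file, pass an open file handle instead of the sample list, for example `validate_items(open("data/reviews.jsonl", encoding="utf-8"))`.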

## Carry context, but keep it separate from labels

Extra fields such as a source URL, a timestamp, or a model name can ride along on each item and be shown to annotators without becoming labels. Keep them clearly named so the export is easy to read later.
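As an illustration, here is a small Python sketch that builds an item with context fields alongside the required keys. The helper `make_item`, the field names `source_url`, `collected_at`, and `model_name`, and all the values are assumptions chosen for this example, not names Potato requires.

```python
import json


def make_item(item_id, text, context=None):
    """Build one item: required keys first, then optional context fields.

    Hypothetical helper for illustration; it is not a Potato API.
    """
    context = dict(context or {})
    # Refuse context fields that would overwrite the required keys.
    clash = {"id", "text"}.intersection(context)
    if clash:
        raise ValueError(f"context keys collide with required keys: {sorted(clash)}")
    return {"id": item_id, "text": text, **context}


item = make_item(
    "rev_001",
    "The battery lasts all day. Highly recommend.",
    context={
        "source_url": "https://example.com/reviews/1",  # made-up value
        "collected_at": "2024-01-15T09:30:00Z",  # made-up value
        "model_name": "example-model",  # made-up value
    },
)
print(json.dumps(item))  # one JSONL line, ready to append to the data file
```

Descriptive names such as `source_url` make it obvious in the export which fields were input context and which were produced by annotators.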

## Plan the export before you label

Decide early how labeled data will feed your pipeline. Potato exports to JSON, JSONL, and CSV, and to ML-native formats such as [CoNLL](https://en.wikipedia.org/wiki/CoNLL) for sequence labeling, Hugging Face Datasets, spaCy, and COCO/YOLO for vision. Choosing the target format up front tells you which fields and ID scheme to use now. See [Exporting Annotations for ML](/docs/guides/exporting-annotations-for-ml).

```yaml
output_annotation_dir: "annotation_output/"
output_annotation_format: "jsonl"
```

## Further reading

- [Data Formats reference](/docs/core-concepts/data-formats)
- [Instance Display](/docs/core-concepts/instance-display): how content is shown
- [What Is Data Annotation?](/docs/guides/what-is-data-annotation)
