Exporting Annotations for Machine Learning
How to export Potato annotations into ML-ready formats (JSON/JSONL, CoNLL, Hugging Face Datasets, spaCy, COCO, and YOLO) and what each is for.
The point of annotation is usually to train or evaluate a model, so the export format matters. Potato writes plain JSON, JSONL, and CSV as well as ML-native formats that training pipelines can read directly, without glue code. Choosing the target format before you label tells you how to structure your data and IDs.
For the full reference, see Export Formats.
Pick the format for the job
| Format | Use it for |
|---|---|
| JSON / JSONL | General-purpose; one record per item. The safe default. |
| CSV | Spreadsheets and quick analysis of classification labels. |
| CoNLL | Token-level sequence labeling (NER, chunking) with BIO tags. |
| Hugging Face Datasets | Loading straight into transformers training. |
| spaCy | Training spaCy NER and text-classification models. |
| COCO / YOLO | Object detection and segmentation from image annotation. |
| Parquet | Large-scale columnar storage and analytics. See Parquet Export. |
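As a rough illustration of what the CoNLL row means by BIO tags, the sketch below converts character-offset spans into one token per line with a `B-`, `I-`, or `O` tag. It is not Potato's exporter: the whitespace tokenizer, the `(start, end, label)` span layout, and the tab-separated two-column output are assumptions made for the example.

```python
def whitespace_tokens(text):
    """Return (token, start, end) triples using plain whitespace splitting."""
    tokens, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def spans_to_bio(text, spans):
    """Tag each token with B-/I-/O.

    spans: hypothetical list of (start, end, label) character offsets,
    with end exclusive. A token that only partly overlaps a span is
    tagged "O", which is why spans and tokens must line up.
    """
    tagged = []
    for tok, start, end in whitespace_tokens(text):
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


text = "Ada Lovelace visited London"
spans = [(0, 12, "PER"), (21, 27, "LOC")]
rows = spans_to_bio(text, spans)

# Assumed two-column layout: token, tab, tag.
conll = "\n".join(f"{tok}\t{tag}" for tok, tag in rows)
print(conll)
```

Here the first token of each entity gets `B-`, later tokens of the same entity get `I-`, and everything else gets `O`.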
Setting the output format
output_annotation_dir: "annotation_output/"
output_annotation_format: "jsonl" # json, csv, conll, ...
What ends up in the export
A typical record carries the item ID, the original content, every annotator's labels, and metadata (who, when). Keeping all annotators' labels, rather than only an aggregate, lets you compute agreement and re-aggregate later with a different method.
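To make the re-aggregation point concrete, here is a minimal sketch that reads JSONL records and derives a majority-vote label plus the share of annotators who chose it. The field names (`id`, `text`, `annotations`, `annotator`, `label`) are hypothetical and chosen for illustration; check them against your actual export before reusing this.

```python
import json
from collections import Counter

# Hypothetical JSONL export. The schema below is an assumption,
# not Potato's documented record layout.
SAMPLE_JSONL = """\
{"id": "item_1", "text": "Great phone", "annotations": [{"annotator": "a1", "label": "pos"}, {"annotator": "a2", "label": "pos"}, {"annotator": "a3", "label": "neg"}]}
{"id": "item_2", "text": "Battery died", "annotations": [{"annotator": "a1", "label": "neg"}, {"annotator": "a2", "label": "neg"}, {"annotator": "a3", "label": "neg"}]}
"""


def load_records(jsonl_text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def majority_label(record):
    """Return (label, share of annotators who chose it).

    Ties are resolved in favor of the label seen first in the record.
    """
    counts = Counter(a["label"] for a in record["annotations"])
    label, votes = counts.most_common(1)[0]
    return label, votes / len(record["annotations"])


records = load_records(SAMPLE_JSONL)
aggregated = {r["id"]: majority_label(r) for r in records}
```

Because the raw per-annotator labels stay in the file, swapping `majority_label` for a different aggregation method later does not require re-exporting or re-annotating.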
Plan the export before you label
The export format constrains your input design. Sequence-labeling exports need consistent tokenization; COCO/YOLO need image dimensions; Hugging Face needs a stable label set. Decide the destination first so you don't have to re-run the study. See Designing Data Formats for Annotation.
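The image-dimension requirement can be seen in a small sketch. COCO stores a box as `[x_min, y_min, width, height]` in pixels, while YOLO label files use a center point and size normalized to the range 0 to 1, so the conversion cannot be done without the image width and height. The function name and sample values here are illustrative, not part of Potato.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO pixel box to YOLO normalized (x_center, y_center, w, h)."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,  # normalized box center, x
        (y_min + h / 2) / img_h,  # normalized box center, y
        w / img_w,                # normalized width
        h / img_h,                # normalized height
    )


# Illustrative values: a 200x100 box in a 400x200 image.
box = coco_to_yolo([100, 50, 200, 100], img_w=400, img_h=200)

# One YOLO label line: class index followed by the four normalized values.
class_id = 0
yolo_line = f"{class_id} " + " ".join(f"{v:.6f}" for v in box)
print(yolo_line)
```

If the input items never recorded image sizes, this step has nothing to divide by, which is the kind of gap that is cheaper to catch before labeling starts.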