
Exporting Annotations for Machine Learning

How to export Potato annotations into ML-ready formats (JSON/JSONL, CoNLL, Hugging Face Datasets, spaCy, COCO, and YOLO) and what each is for.

The point of annotation is usually to train or evaluate a model, so the export format matters. Potato writes plain JSON, JSONL, and CSV, as well as ML-native formats that training pipelines can read directly without glue code. Choosing the target format before you label tells you how to structure your data and IDs.

For the full format reference, see Export Formats.

Pick the format for the job

| Format | Use it for |
| --- | --- |
| JSON / JSONL | General-purpose; one record per item. The safe default. |
| CSV | Spreadsheets and quick analysis of classification labels. |
| CoNLL | Token-level sequence labeling (NER, chunking) with BIO tags. |
| Hugging Face Datasets | Loading straight into transformers training. |
| spaCy | Training spaCy NER and text-classification models. |
| COCO / YOLO | Object detection and segmentation from image annotation. |
| Parquet | Large-scale columnar storage and analytics. See Parquet Export. |
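
To make the CoNLL row concrete, here is a minimal sketch of BIO tagging: character-offset spans are mapped onto whitespace tokens, and each token is written on its own line next to its tag. The function names and the `(start, end, label)` span layout are assumptions made for illustration; they are not Potato's exporter or its internal span format.

```python
def spans_to_bio(text, spans):
    """Tag whitespace tokens with BIO labels.

    `spans` is a list of (start, end, label) character offsets, end exclusive.
    This layout is hypothetical, chosen only for the example.
    """
    rows = []
    pos = 0
    previous = None  # index of the span the previous token fell in, if any
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        # First span that overlaps this token, or None if the token is outside all spans.
        current = next(
            (i for i, (s, e, _) in enumerate(spans) if s < end and start < e),
            None,
        )
        if current is None:
            tag = "O"
        elif current == previous:
            tag = "I-" + spans[current][2]  # continuing the same span
        else:
            tag = "B-" + spans[current][2]  # first token of a span
        previous = current
        rows.append((token, tag))
    return rows


def to_conll(rows):
    """Render (token, tag) pairs as one 'token tag' line per token."""
    return "\n".join(f"{token} {tag}" for token, tag in rows)


rows = spans_to_bio(
    "Ada Lovelace visited London",
    [(0, 12, "PER"), (21, 27, "LOC")],
)
print(to_conll(rows))
```

Whitespace splitting leaves punctuation attached to words, so a real pipeline would use the same tokenizer at annotation time and at training time.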

Setting the output format

```yaml
output_annotation_dir: "annotation_output/"
output_annotation_format: "jsonl"   # json, csv, conll, ...
```

What ends up in the export

A typical record carries the item ID, the original content, every annotator's labels, and metadata (who, when). Keeping all annotators' labels, rather than only an aggregate, lets you compute agreement and re-aggregate later with a different method.
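
As a rough illustration of why per-annotator labels are worth keeping, the sketch below reads JSONL records, derives a majority-vote label, and computes raw pairwise agreement from the same data. The record layout (`id`, `text`, and an `annotations` mapping from annotator to label) is a hypothetical stand-in, not Potato's actual schema; adjust the field names to match your export.

```python
import json
from collections import Counter
from itertools import combinations


def load_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def majority_label(labels):
    """Return the most common label, or None if there are no labels or a tie."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def pairwise_agreement(records):
    """Fraction of annotator pairs, pooled over all items, that chose the same label."""
    agree = total = 0
    for record in records:
        for a, b in combinations(record["annotations"].values(), 2):
            agree += a == b
            total += 1
    return agree / total if total else float("nan")


# Hypothetical export lines; in practice, iterate over the open output file.
lines = [
    '{"id": "a1", "text": "great", "annotations": {"u1": "pos", "u2": "pos", "u3": "neg"}}',
    '{"id": "a2", "text": "meh", "annotations": {"u1": "neg", "u2": "pos"}}',
]
records = load_jsonl(lines)
aggregated = {r["id"]: majority_label(r["annotations"].values()) for r in records}
agreement = pairwise_agreement(records)
```

Because the raw labels are still there, swapping `majority_label` for another aggregation method later does not require re-annotating anything.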

Plan the export before you label

The export format constrains your input design. Sequence-labeling exports need consistent tokenization; COCO/YOLO need image dimensions; Hugging Face needs a stable label set. Decide the destination first so you don't have to re-run the study. See Designing Data Formats for Annotation.
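
The image-dimension requirement is easy to see in a small sketch. COCO stores a box as `[x_min, y_min, width, height]` in pixels, while YOLO expects the box center and size normalized to the image, so the conversion cannot be done without the image width and height. The function name below is illustrative, not part of Potato.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO pixel box [x_min, y_min, width, height] to
    YOLO's normalized (x_center, y_center, width, height)."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
yolo_box = coco_to_yolo([100, 50, 200, 100], image_width=400, image_height=200)
```

If the dimensions were not recorded alongside each image at annotation time, they would have to be recovered from the image files before a YOLO export is possible.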

Further reading