
Exporting Annotations for Machine Learning

Name: Potato
Author: Potato Annotation

How to export Potato annotations into ML-ready formats (JSON/JSONL, CoNLL, Hugging Face Datasets, spaCy, COCO, and YOLO), and what each format is for.

The point of annotation is usually to train or evaluate a model, so the export format matters. Potato writes plain JSON, JSONL, and CSV, as well as ML-native formats that training pipelines can read directly, without glue code. Choosing the target format before you label tells you how to structure your input data and IDs.

For the full reference, see Export Formats.

Pick the format for the job

Format	Use it for
JSON / JSONL	General-purpose; one record per item. The safe default.
CSV	Spreadsheets and quick analysis of classification labels.
CoNLL	Token-level sequence labeling (NER, chunking) with BIO tags.
Hugging Face Datasets	Loading straight into `transformers` training.
spaCy	Training spaCy NER and text-classification models.
COCO / YOLO	Object detection and segmentation from image annotation.
Parquet	Large-scale columnar storage and analytics. See Parquet Export.
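
To make the CoNLL row concrete, here is a minimal sketch of BIO tagging. It is not Potato's exporter: it assumes whitespace tokenization, a hypothetical span format of `(start, end, label)` character offsets, and a tab between token and tag (some CoNLL variants use a space).

```python
def whitespace_tokens(text):
    """Split on whitespace and keep each token's character offsets."""
    tokens, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def spans_to_bio(text, spans):
    """Tag each token B-/I-/O from (start, end, label) character spans.

    The span format is an assumption for illustration only.
    """
    tagged = []
    started = set()  # indices of spans that already received a B- tag
    for tok, start, end in whitespace_tokens(text):
        tag = "O"
        for i, (s_start, s_end, label) in enumerate(spans):
            if start >= s_start and end <= s_end:
                prefix = "I-" if i in started else "B-"
                started.add(i)
                tag = prefix + label
                break
        tagged.append((tok, tag))
    return tagged


def to_conll(tagged):
    """Render one 'token<TAB>tag' line per token."""
    return "\n".join(f"{tok}\t{tag}" for tok, tag in tagged)
```

Tokens that only partly overlap a span are tagged `O` in this sketch, which is one reason sequence-labeling exports depend on tokenization that matches the annotated spans.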

Setting the output format

yaml

output_annotation_dir: "annotation_output/"
output_annotation_format: "jsonl"   # json, csv, conll, ...
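
With `jsonl` as the output format, downstream code can read the export one record per line. The sketch below uses only the Python standard library; the file path passed in is up to you, since the exact file names Potato writes inside `annotation_output/` are not specified here.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Point at the offending line instead of failing silently.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err})") from err
```

Reading lazily with a generator keeps memory use flat for large exports; wrap the call in `list(...)` if you need all records at once.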

What ends up in the export

A typical record carries the item ID, the original content, every annotator's labels, and metadata (who, when). Keeping all annotators' labels, rather than only an aggregate, lets you compute agreement and re-aggregate later with a different method.
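
As an illustration of why per-annotator labels are worth keeping, the sketch below computes a majority-vote label and a simple pairwise agreement rate. The record shape (an `annotations` dict mapping annotator name to label) is a hypothetical stand-in, not Potato's documented schema; adapt the field names to your actual export.

```python
from collections import Counter
from itertools import combinations


def majority_vote(labels):
    """Return the most common label, or None when the top count is tied."""
    counts = Counter(labels)
    if not counts:
        return None
    top_label, top_count = counts.most_common(1)[0]
    tied = sum(1 for count in counts.values() if count == top_count)
    return top_label if tied == 1 else None


def pairwise_agreement(records):
    """Fraction of annotator pairs, pooled over all items, that agree.

    Assumes each record has an "annotations" dict of
    {annotator: label} (hypothetical field name).
    """
    agree = total = 0
    for record in records:
        labels = list(record["annotations"].values())
        for a, b in combinations(labels, 2):
            total += 1
            agree += a == b
    return agree / total if total else float("nan")
```

Raw pairwise agreement does not correct for chance; with the individual labels preserved you can later swap in a chance-corrected measure or a different aggregation rule without relabeling.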

Plan the export before you label

The export format constrains your input design. Sequence-labeling exports need consistent tokenization; COCO/YOLO need image dimensions; Hugging Face needs a stable label set. Decide the destination first so you don't have to re-run the study. See Designing Data Formats for Annotation.
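
Two of these constraints can be sketched in a few lines. YOLO labels store box coordinates normalized by image size, so converting a COCO-style pixel box `[x_min, y_min, width, height]` is impossible without the image dimensions; and a label-to-ID mapping built once from a fixed label list stays stable across runs. The function names here are illustrative and are not part of Potato.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO pixel box to YOLO normalized (x_center, y_center, w, h)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


def build_label_map(labels):
    """Map each distinct label to an integer ID in sorted order.

    Sorting makes the IDs reproducible regardless of input order.
    """
    return {label: index for index, label in enumerate(sorted(set(labels)))}
```

If the label list changes midway through a study, the IDs produced this way shift, which is why the label set should be fixed before labeling starts.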
