Data Annotation Concepts
Find answers to common questions about Potato. Can't find what you're looking for? Join our Discord or check the documentation.
Data annotation is the process of adding labels to raw data such as text, images, audio, video, or model outputs, so the data can be used to train or evaluate machine learning models. A label might be a category, a highlighted span, a rating, or a comparison. Potato lets you set up any of these task types with a short YAML configuration.
Inter-annotator agreement measures how often independent annotators give the same label to the same item. It is the standard evidence that a task is well defined and the labels are reliable. Common measures are Cohen's kappa, Fleiss' kappa, and Krippendorff's alpha, which correct for agreement that would happen by chance. Potato reports Krippendorff's alpha in its admin dashboard.
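To make the chance correction concrete, here is a minimal sketch of Cohen's kappa for two annotators, written from the standard definition. It is not how Potato computes its dashboard figure (that is Krippendorff's alpha), and the function name and sample labels are assumptions made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability of a match if each annotator labeled
    # independently, following their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on six reviews.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this sample the annotators agree on four of six items (0.667), but half of that agreement is expected by chance, so kappa comes out near 0.33 rather than 0.67.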
Which annotation tool is best depends on your data and goals, so there is no single answer. For work that spans text, images, audio, and AI-agent evaluation, Potato is a strong free and open-source option with more than 30 task types and a zero-code YAML setup. Label Studio, Doccano, brat, and Argilla are other open-source choices with different strengths.
To build a labeled dataset, start by defining the task and the label set, then write clear guidelines and have several annotators label overlapping items. Measure agreement, resolve the disagreements, and export the result in a format your training pipeline can read. Potato covers this whole workflow and exports to JSON, CoNLL, Hugging Face, spaCy, and COCO/YOLO.
The number of annotators you need per item depends on how subjective the task is. Clear, objective tasks can often use one annotator, with a small overlapping sample for quality checks. Moderately subjective tasks usually use three annotators resolved by majority vote. Highly subjective tasks use five or more, and sometimes keep the full range of opinions rather than collapsing to one answer. For most other tasks, the benefit of each additional annotator drops off quickly past three.
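As a rough illustration of resolving several annotators by vote, the sketch below picks the most common label per item and flags ties for manual adjudication. The function name, item ids, and labels are hypothetical, and this is plain Python rather than anything built into Potato.

```python
from collections import Counter


def majority_vote(labels):
    """Return (label, share) for the most common label, or (None, share) on a tie.

    The winner is a plurality; it is a strict majority only when share > 0.5.
    """
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    share = top_count / len(labels)
    # A tie for first place has no winner; leave it for adjudication.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, share
    return top_label, share


# Hypothetical annotations: item id -> labels from three annotators.
annotations = {
    "item-1": ["toxic", "toxic", "ok"],
    "item-2": ["ok", "ok", "ok"],
    "item-3": ["toxic", "ok", "unsure"],
}
resolved = {item: majority_vote(labels) for item, labels in annotations.items()}
for item, (label, share) in resolved.items():
    print(item, label, f"{share:.2f}")
```

Keeping the vote share next to the winning label lets you spot low-consensus items later, which matters for subjective tasks where you may prefer to keep the full distribution.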
Active learning chooses which items to annotate next so a model reaches a target accuracy with fewer labels than random sampling would need. The model flags the items it finds most informative, often the ones it is least certain about, and a person labels those. Potato supports uncertainty, diversity, BADGE, and BALD strategies.
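One common form of the uncertainty strategy ranks unlabeled items by the entropy of the model's predicted class probabilities and sends the top few to annotators. The sketch below assumes you already have those probabilities; the function names and sample predictions are made up, and it does not reflect Potato's internal implementation.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k items whose predictions have the highest entropy."""
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs: item id -> class probabilities.
predictions = {
    "a": [0.98, 0.02],
    "b": [0.55, 0.45],
    "c": [0.80, 0.20],
    "d": [0.50, 0.50],
}
to_label = select_most_uncertain(predictions, k=2)
print(to_label)
```

Items "d" and "b" are selected because their predictions are closest to a coin flip, while the confident prediction for "a" is left for later.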
Classification assigns one or more labels to a whole item, such as marking a review positive or negative. Span annotation marks a region inside an item, such as highlighting a name in a sentence or an event on an audio waveform. Named entity recognition and error marking are span tasks. Potato supports both, and you can combine them on one screen.
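The difference is easiest to see in how each kind of label is stored. In this sketch a classification label attaches to the whole item, while a span label also carries start and end character offsets. The class and field names are hypothetical and are not Potato's export schema.

```python
from dataclasses import dataclass


@dataclass
class ClassificationLabel:
    item_id: str
    label: str  # applies to the whole item


@dataclass
class SpanLabel:
    item_id: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str

    def text(self, source: str) -> str:
        """Return the marked region of the source text."""
        return source[self.start:self.end]


review = "Ada Lovelace wrote a brilliant program."
sentiment = ClassificationLabel(item_id="r1", label="positive")
name = SpanLabel(item_id="r1", start=0, end=12, label="PERSON")
print(sentiment.label, "|", name.label, "->", name.text(review))
```

Both labels share the same item id, which is what lets a classification and one or more spans sit side by side on the same item.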
To evaluate model or agent outputs, have people judge them: rate them on a scale, compare two side by side, score them against a rubric, or mark specific errors with spans. For agents that take multiple steps, you can also judge each step of the trajectory. Potato provides all of these and can read agent traces from formats such as OpenAI, Anthropic, and ReAct.
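Side-by-side comparisons are often summarized as a win rate per system. The sketch below is one simple way to do that, counting a tie as half a win for each side. The judgment tuple format and system names are assumptions for illustration, not a Potato output format.

```python
from collections import Counter


def win_rates(judgments):
    """Share of comparisons each system won; a tie counts as half a win per side."""
    wins = Counter()
    appearances = Counter()
    for system_a, system_b, winner in judgments:
        appearances[system_a] += 1
        appearances[system_b] += 1
        if winner == "tie":
            wins[system_a] += 0.5
            wins[system_b] += 0.5
        elif winner in (system_a, system_b):
            wins[winner] += 1
        else:
            raise ValueError(f"winner {winner!r} is not one of the compared systems")
    return {system: wins[system] / appearances[system] for system in appearances}


# Hypothetical side-by-side judgments: (system A, system B, winner or "tie").
judgments = [
    ("model-x", "model-y", "model-x"),
    ("model-x", "model-y", "model-y"),
    ("model-x", "model-y", "model-x"),
    ("model-x", "model-y", "tie"),
]
rates = win_rates(judgments)
print(rates)
```

With only a handful of judgments the rates are noisy, so treat them as a summary of what annotators said rather than a firm ranking.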