Text Annotation
A complete guide to text annotation (classification, multi-label tagging, rating, and free text) and how to build each kind of text task in Potato with copy-paste config.
Text annotation means labeling written language: sorting documents into categories, tagging the topics in an article, rating a passage for quality, or writing a correction. It is the most common annotation task in natural language processing, and it is what Potato was first built for. This guide covers the whole-document text tasks; for marking regions inside text, see Span Annotation.
The text tasks at a glance
- Document classification: one label for the whole text (text classification).
- Multi-label tagging: several labels at once, such as topics or content warnings.
- Rating and scoring: a position on a scale, such as quality or sentiment intensity.
- Free-text: a written answer, paraphrase, or correction.
Classification: one label per document
The workhorse of text annotation. Use radio when the categories are mutually exclusive:
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the overall sentiment of this review?"
    labels: [Positive, Negative, Neutral]
    sequential_key_binding: true

sequential_key_binding maps the labels to the keys 1, 2, and 3 in order, so annotators keep their hands on the keyboard. On a job of thousands of items this is a large speed-up. See the live sentiment analysis design for a working example.
Multi-label: several tags at once
When more than one label can apply, use multiselect. Bound the selection count to match your guidelines:
annotation_schemes:
  - annotation_type: multiselect
    name: content_warnings
    description: "Select every content warning that applies."
    labels: [Violence, Profanity, Sexual content, Self-harm, None]
    min_selections: 1
    max_selections: 5

Content moderation is a classic multi-label text task; the toxicity detection design combines a category with a highlighted span.
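The selection bounds can also be re-checked after export, for example on data merged from several sources. The sketch below is one way to do that in Python. The function name check_selection and the rule that "None" must not be combined with other labels are assumptions made for illustration. They are not part of Potato, so adapt the rule to your own guidelines.

```python
# Hypothetical post-export check; not part of Potato itself.
LABELS = ["Violence", "Profanity", "Sexual content", "Self-harm", "None"]


def check_selection(selected, allowed=LABELS, min_selections=1,
                    max_selections=5, exclusive="None"):
    """Return a list of problems found in one annotator's selection."""
    problems = []
    # Labels that are not defined in the scheme at all.
    unknown = sorted(set(selected) - set(allowed))
    if unknown:
        problems.append(f"unknown labels: {unknown}")
    # Bounds mirroring min_selections / max_selections in the config.
    if len(selected) < min_selections:
        problems.append(f"fewer than {min_selections} label(s) selected")
    if len(selected) > max_selections:
        problems.append(f"more than {max_selections} label(s) selected")
    # Assumed guideline: the exclusive label cannot co-occur with others.
    if exclusive in selected and len(selected) > 1:
        problems.append(f"'{exclusive}' combined with other labels")
    return problems


if __name__ == "__main__":
    for selection in (["Violence", "Profanity"], [], ["None", "Violence"]):
        print(selection, "->", check_selection(selection) or "ok")
```

An empty list means the selection passed every check; anything else can be routed back to the annotator or to adjudication.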
Rating text on a scale
To capture degree rather than category, use a Likert scale:
annotation_schemes:
  - annotation_type: likert
    name: helpfulness
    description: "How helpful is this answer?"
    size: 5
    min_label: "Not helpful"
    max_label: "Very helpful"

See Rating Scales for scale-design pitfalls such as acquiescence bias and how many points to use.
Free-text and corrections
Sometimes the most useful label is a sentence the annotator writes: a justification, a rewrite, or a transcription. Combine it with a category and leave it optional, so annotators fill it in only when relevant:
annotation_schemes:
  - annotation_type: radio
    name: factuality
    description: "Is the claim supported by the source?"
    labels: [Supported, Contradicted, Not enough info]
  - annotation_type: text
    name: evidence
    description: "Quote the sentence that supports your choice."
    label_requirement:
      required: false

Getting consistent text labels
Text is ambiguous, so consistency comes from the surrounding process, not the interface:
- Write tight guidelines with a "can't tell" option.
- Have multiple annotators overlap on the same items.
- Track inter-annotator agreement and adjudicate disagreements.
- Speed up large jobs with LLM pre-annotation and verify the suggestions by hand.
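Tracking agreement can start very simply. The sketch below computes Cohen's kappa for two annotators who labeled the same items, and lists the items they disagree on so those can go to adjudication. It assumes the labels have already been exported into two parallel Python lists; the function names and the sample data are illustrative and not tied to any Potato output format.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items, in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(item_ids, labels_a, labels_b):
    """IDs of the items the two annotators labeled differently."""
    return [i for i, a, b in zip(item_ids, labels_a, labels_b) if a != b]


if __name__ == "__main__":
    ids = ["r1", "r2", "r3", "r4"]
    ann_a = ["Positive", "Negative", "Neutral", "Positive"]
    ann_b = ["Positive", "Negative", "Positive", "Positive"]
    print(f"kappa = {cohen_kappa(ann_a, ann_b):.3f}")
    print("to adjudicate:", disagreements(ids, ann_a, ann_b))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates. For more than two annotators, or for items that not everyone labeled, a measure such as Krippendorff's alpha is usually a better fit.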
Further reading
- Span Annotation: marking regions inside text
- Choosing an Annotation Scheme
- Annotation Schemes reference