Annotation Schemes

Define what annotators will label and how they will label it.

Annotation schemes define the labeling tasks presented to your annotators. Potato 2.0 supports eight core annotation types, plus several advanced types, which can be combined to build complex annotation tasks.

Basic Structure

Each scheme is defined in the annotation_schemes array:

annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral

Required Fields

Field           | Description
annotation_type | Type of annotation (radio, multiselect, likert, span, text, number, slider, multirate)
name            | Internal identifier (no spaces; used as the key in the output)
description     | Instructions shown to annotators
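
Before deploying a configuration, it can help to check that every scheme carries these required fields. A minimal validation sketch over an already-parsed config (the `validate_schemes` helper is illustrative, not part of Potato):

```python
# Minimal sketch: check that each parsed scheme has the three required
# fields and that 'name' contains no spaces, as the table above requires.
REQUIRED_FIELDS = ("annotation_type", "name", "description")

def validate_schemes(schemes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for i, scheme in enumerate(schemes):
        for field in REQUIRED_FIELDS:
            if field not in scheme:
                problems.append(f"scheme #{i}: missing '{field}'")
        if " " in scheme.get("name", ""):
            problems.append(f"scheme #{i}: 'name' must not contain spaces")
    return problems

schemes = [
    {"annotation_type": "radio", "name": "sentiment",
     "description": "What is the sentiment?",
     "labels": ["Positive", "Negative", "Neutral"]},
    {"annotation_type": "likert", "name": "my score", "size": 5},  # two problems
]
print(validate_schemes(schemes))
```

Running a check like this before launching a task catches configuration typos earlier than annotator bug reports do.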

Supported Annotation Types

1. Radio (Single Choice)

Select exactly one option from a list:

- annotation_type: radio
  name: sentiment
  description: "What is the sentiment of this text?"
  labels:
    - Positive
    - Negative
    - Neutral
 
  # Optional features
  keyboard_shortcuts:
    Positive: "1"
    Negative: "2"
    Neutral: "3"
 
  # Or use sequential binding (1, 2, 3... automatically)
  sequential_key_binding: true
 
  # Horizontal layout instead of vertical
  horizontal: true

2. Likert Scale

Rating scales with labeled endpoints:

- annotation_type: likert
  name: agreement
  description: "How much do you agree with this statement?"
  size: 5  # Number of scale points
  min_label: "Strongly Disagree"
  max_label: "Strongly Agree"
 
  # Optional mid-point label
  mid_label: "Neutral"
 
  # Show numeric values
  show_numbers: true

3. Multiselect (Multiple Choice)

Select multiple options from a list:

- annotation_type: multiselect
  name: topics
  description: "Select all relevant topics"
  labels:
    - Politics
    - Technology
    - Sports
    - Entertainment
    - Science
 
  # Selection constraints
  min_selections: 1
  max_selections: 3
 
  # Allow free text response
  free_response: true
  free_response_label: "Other (specify)"

4. Span Annotation

Highlight and label text segments:

- annotation_type: span
  name: entities
  description: "Highlight named entities in the text"
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - DATE
 
  # Visual customization
  label_colors:
    PERSON: "#3b82f6"
    ORGANIZATION: "#10b981"
    LOCATION: "#f59e0b"
    DATE: "#8b5cf6"
 
  # Allow overlapping spans
  allow_overlapping: false
 
  # Keyboard shortcuts for labels
  sequential_key_binding: true

5. Slider

Continuous numerical range:

- annotation_type: slider
  name: confidence
  description: "How confident are you in your answer?"
  min: 0
  max: 100
  step: 1
  default: 50
 
  # Endpoint labels
  min_label: "Not confident"
  max_label: "Very confident"
 
  # Show current value
  show_value: true

6. Text Input

Free-form text responses:

- annotation_type: text
  name: explanation
  description: "Explain your reasoning"
 
  # Multi-line input
  textarea: true
 
  # Character limits
  min_length: 10
  max_length: 500
 
  # Placeholder text
  placeholder: "Enter your explanation here..."
 
  # Disable paste (for transcription tasks)
  disable_paste: true

7. Number Input

Numerical input with constraints:

- annotation_type: number
  name: count
  description: "How many entities are mentioned?"
  min: 0
  max: 100
  step: 1
  default: 0

8. Multirate (Matrix Rating)

Rate multiple items on the same scale:

- annotation_type: multirate
  name: quality_aspects
  description: "Rate each aspect of the response"
  items:
    - Accuracy
    - Clarity
    - Completeness
    - Relevance
  size: 5  # Scale points
  min_label: "Poor"
  max_label: "Excellent"
 
  # Randomize item order
  randomize: true
 
  # Layout options
  compact: false

Common Options

Keyboard Shortcuts

Speed up annotation with keyboard bindings:

# Manual shortcuts
keyboard_shortcuts:
  Positive: "1"
  Negative: "2"
  Neutral: "3"
 
# Or automatic sequential binding
sequential_key_binding: true  # Assigns 1, 2, 3...

Tooltips

Provide hover hints for labels:

tooltips:
  Positive: "Expresses happiness, approval, or satisfaction"
  Negative: "Expresses sadness, anger, or disappointment"
  Neutral: "No clear emotional content"

Label Colors

Custom colors for visual distinction:

label_colors:
  PERSON: "#3b82f6"
  LOCATION: "#10b981"
  ORGANIZATION: "#f59e0b"

Required Schemes

Require a scheme to be answered before the annotator can submit:

- annotation_type: radio
  name: sentiment
  required: true
  labels:
    - Positive
    - Negative

Multiple Schemes

Combine multiple annotation types per instance:

annotation_schemes:
  # Primary classification
  - annotation_type: radio
    name: sentiment
    description: "Overall sentiment"
    labels:
      - Positive
      - Negative
      - Neutral
    required: true
    sequential_key_binding: true
 
  # Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
 
  # Topic tags
  - annotation_type: multiselect
    name: topics
    description: "Select all relevant topics"
    labels:
      - Politics
      - Technology
      - Sports
      - Entertainment
    free_response: true
 
  # Notes
  - annotation_type: text
    name: notes
    description: "Any additional observations?"
    textarea: true
    required: false

Advanced Features

Pairwise Comparison

Compare two items:

- annotation_type: pairwise
  name: preference
  description: "Which response is better?"
  options:
    - label: "Response A"
      value: "A"
    - label: "Response B"
      value: "B"
    - label: "Equal"
      value: "tie"
 
  # Allow tie selection
  allow_tie: true

Best-Worst Scaling

Rank items by selecting best and worst:

- annotation_type: best_worst
  name: ranking
  description: "Select the best and worst items"
  # Items come from the data file

Select (Dropdown)

Space-efficient single selection:

- annotation_type: select
  name: category
  description: "Select a category"
  labels:
    - Category A
    - Category B
    - Category C
    - Category D
    - Category E
 
  # Default selection
  default: "Category A"

Data Format Reference

Input

Annotation schemes operate on instances from your data file. At minimum, each instance needs an id and the text to annotate:

{
  "id": "doc_1",
  "text": "This is the text to annotate."
}
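
If your source data lives in a CSV, a short conversion sketch like the following can produce matching records (the inline CSV and the one-JSON-object-per-line output are assumptions for illustration; check which file formats your deployment expects):

```python
# Minimal sketch: convert CSV rows with "id" and "text" columns into
# JSON records matching the input format shown above.
import csv
import io
import json

csv_text = """id,text
doc_1,This is the text to annotate.
doc_2,Another example sentence.
"""

records = [
    {"id": row["id"], "text": row["text"]}
    for row in csv.DictReader(io.StringIO(csv_text))
]

# Emit one JSON object per line (JSON Lines style).
for rec in records:
    print(json.dumps(rec))
```

In practice you would read from a file with `csv.DictReader(open("data.csv"))` and write the records to disk instead of printing them.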

Output

Annotations are saved with scheme names as keys:

{
  "id": "doc_1",
  "annotations": {
    "sentiment": "Positive",
    "confidence": 4,
    "topics": ["Technology", "Science"],
    "entities": [
      {"start": 0, "end": 4, "label": "ORGANIZATION", "text": "This"}
    ],
    "notes": "Clear positive sentiment about technology."
  }
}
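
A downstream sketch for consuming this output: tally a radio scheme and sanity-check span offsets against the source text. It assumes the source text is available alongside each annotated record, which depends on how you store results:

```python
# Minimal sketch: count sentiment labels across records and verify that
# each span's start/end offsets actually select the reported span text.
from collections import Counter

records = [
    {"id": "doc_1",
     "text": "This is the text to annotate.",
     "annotations": {
         "sentiment": "Positive",
         "entities": [
             {"start": 0, "end": 4, "label": "ORGANIZATION", "text": "This"},
         ],
     }},
]

sentiment_counts = Counter(r["annotations"]["sentiment"] for r in records)

for r in records:
    for span in r["annotations"].get("entities", []):
        # Offsets are character positions into the source text.
        assert r["text"][span["start"]:span["end"]] == span["text"], r["id"]

print(sentiment_counts)
```

Offset checks like this are a cheap way to detect mismatches between the text annotators saw and the text you analyze.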

Best Practices

1. Clear Labels

Use unambiguous, distinct labels:

# Good
labels:
  - Strongly Positive
  - Somewhat Positive
  - Neutral
  - Somewhat Negative
  - Strongly Negative
 
# Avoid
labels:
  - Good
  - OK
  - Fine
  - Acceptable

2. Helpful Tooltips

Add tooltips for nuanced labels:

tooltips:
  Sarcasm: "The text says the opposite of what it means"
  Irony: "A mismatch between expectation and reality"

3. Keyboard Shortcuts

Enable shortcuts for high-volume tasks:

sequential_key_binding: true

4. Logical Order

Order labels consistently:

  • Most common first
  • Alphabetically
  • By intensity (low to high)

5. Limit Options

Too many choices slow annotation:

  • Radio: 2-7 options
  • Multiselect: 5-15 options
  • Likert: 5-7 points

6. Test First

Annotate several examples yourself before deploying to catch:

  • Ambiguous labels
  • Missing categories
  • Unclear instructions