
# Confidence Annotation

The confidence annotation schema lets annotators rate their confidence in another annotation they have made. It pairs a confidence scale (Likert or slider) with a target annotation schema, so researchers can measure not only what annotators chose but how certain they were about the choice.

## Overview

Confidence annotations are essential for studying annotation quality, identifying ambiguous items, and weighting labels during aggregation. When configured, a confidence rating widget appears alongside the target annotation, prompting annotators to indicate how sure they are of their decision.
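One use named above, identifying ambiguous items, can be sketched in a few lines of Python. This is a hypothetical helper, not part of Potato; it assumes you have already collected per-annotator confidence ratings (normalized to [0, 1]) for each item:

```python
# Flag items whose average annotator confidence is low -- likely
# candidates for adjudication. Data shape is assumed:
# item id -> list of normalized confidence ratings, one per annotator.
from statistics import mean

def ambiguous_items(ratings_by_item, threshold=0.5):
    """Return item ids whose mean confidence falls below threshold."""
    return [item for item, ratings in ratings_by_item.items()
            if mean(ratings) < threshold]

ratings = {"doc-1": [0.9, 0.8], "doc-2": [0.3, 0.4, 0.5]}
print(ambiguous_items(ratings))  # ['doc-2']
```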

## Quick Start

```yaml
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: What is the sentiment of this text?
    labels: ["Positive", "Negative", "Neutral"]

  - annotation_type: confidence_annotation
    name: sentiment_confidence
    description: How confident are you in your sentiment label?
    target_schema: sentiment
    scale_type: likert
    scale_points: 5
```

## Configuration Options

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `annotation_type` | string | Required | Must be `"confidence_annotation"` |
| `name` | string | Required | Unique identifier for this schema |
| `description` | string | Required | Instructions displayed to annotators |
| `target_schema` | string | Optional | Name of the annotation schema this confidence rating applies to |
| `scale_type` | string | `"likert"` | Type of scale: `"likert"` for discrete points or `"slider"` for continuous |
| `scale_points` | integer | `5` | Number of points on the Likert scale (ignored for slider) |
| `labels` | array | Optional | Custom labels for scale points (e.g., `["Not confident", "Very confident"]`) |
| `slider_min` | integer | | Minimum value for the slider (only used when `scale_type` is `"slider"`) |
| `slider_max` | integer | | Maximum value for the slider (only used when `scale_type` is `"slider"`) |
| `label_requirement.required` | boolean | `false` | Whether the confidence rating must be completed before moving on |

## Examples

### Likert Confidence Scale

```yaml
annotation_schemes:
  - annotation_type: radio
    name: toxicity
    description: Is this comment toxic?
    labels: ["Toxic", "Not Toxic"]

  - annotation_type: confidence_annotation
    name: toxicity_confidence
    description: How confident are you in your toxicity judgment?
    target_schema: toxicity
    scale_type: likert
    scale_points: 5
    labels: ["Not at all confident", "Slightly confident", "Moderately confident", "Very confident", "Extremely confident"]
```

### Slider Confidence Scale

```yaml
annotation_schemes:
  - annotation_type: radio
    name: stance
    description: What stance does the author take?
    labels: ["Support", "Oppose", "Neutral"]

  - annotation_type: confidence_annotation
    name: stance_confidence
    description: Rate your confidence from 0 (guessing) to 100 (certain).
    target_schema: stance
    scale_type: slider
    slider_min: 0
    slider_max: 100
```

### Required Confidence Rating

```yaml
annotation_schemes:
  - annotation_type: multiselect
    name: topics
    description: Select all topics that apply.
    labels: ["Politics", "Economy", "Health", "Education"]

  - annotation_type: confidence_annotation
    name: topics_confidence
    description: How confident are you in your topic selections?
    target_schema: topics
    scale_type: likert
    scale_points: 3
    labels: ["Low", "Medium", "High"]
    label_requirement:
      required: true
```

### Standalone Confidence (No Target)

Confidence annotations can also be used without a target schema for general self-assessment:

```yaml
annotation_schemes:
  - annotation_type: confidence_annotation
    name: task_familiarity
    description: How familiar are you with this topic area?
    scale_type: likert
    scale_points: 5
    labels: ["Not familiar", "Slightly familiar", "Somewhat familiar", "Very familiar", "Expert"]
```

## Output Format

```json
{
  "toxicity_confidence": {
    "labels": {
      "confidence": 4
    }
  }
}
```

For Likert scales, values range from 1 to `scale_points`. For sliders, values range from `slider_min` to `slider_max`.
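Because Likert and slider outputs live on different ranges, it can help to rescale both onto [0, 1] before analysis. A minimal sketch, assuming the JSON shape shown above; `normalize_confidence` is a hypothetical helper, not part of Potato:

```python
# Rescale a raw confidence value onto [0.0, 1.0] so Likert and slider
# ratings are directly comparable. Schema parameters are passed in
# explicitly to match the configuration used for annotation.

def normalize_confidence(value, scale_type="likert",
                         scale_points=5, slider_min=0, slider_max=100):
    """Map a raw confidence rating onto [0.0, 1.0]."""
    if scale_type == "likert":
        # Likert values run from 1 to scale_points.
        return (value - 1) / (scale_points - 1)
    # Slider values run from slider_min to slider_max.
    return (value - slider_min) / (slider_max - slider_min)

annotation = {"toxicity_confidence": {"labels": {"confidence": 4}}}
raw = annotation["toxicity_confidence"]["labels"]["confidence"]
print(normalize_confidence(raw, scale_type="likert", scale_points=5))  # 0.75
```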

## Best Practices

1. **Always pair with a target schema.** Confidence ratings are most useful when linked to a specific annotation decision.
2. **Use Likert for simplicity.** Discrete scales are faster and easier for annotators to use.
3. **Use sliders for fine-grained measurement.** Choose a slider when you need precise confidence values for downstream analysis.
4. **Make confidence required.** Optional confidence ratings often get skipped, reducing data utility.
5. **Analyze confidence patterns.** Low-confidence items are good candidates for adjudication or additional annotations.
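One way to act on these practices is to weight each annotator's label by their confidence during aggregation. A sketch under stated assumptions: labels are paired with confidences already normalized to [0, 1], and `weighted_majority` is a hypothetical helper, not a Potato API:

```python
# Confidence-weighted majority voting: each vote counts in proportion
# to the annotator's confidence, so a hesitant annotator sways the
# final label less than a certain one.
from collections import defaultdict

def weighted_majority(records):
    """Pick the label with the highest total confidence weight.

    records: iterable of (label, confidence) pairs, confidence in [0, 1].
    """
    totals = defaultdict(float)
    for label, confidence in records:
        totals[label] += confidence
    return max(totals, key=totals.get)

votes = [("Toxic", 1.0), ("Not Toxic", 0.25), ("Toxic", 0.5)]
print(weighted_majority(votes))  # Toxic
```

With unweighted voting the same three annotators would still choose "Toxic" here, but weighting changes the outcome whenever confident minorities face hesitant majorities.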

## Further Reading

For implementation details, see the source documentation.