Skip to content
此页面尚未提供您所选语言的版本,当前显示英文版本。

Rating Scales and Likert Design

How to design rating scales for annotation, Likert vs. sliders, how many points to use, avoiding acquiescence bias, and building rating tasks in Potato.

A rating scale captures degree, how positive, how fluent, how helpful, rather than a category. The two common forms are the discrete Likert scale (e.g. 1–5) and the continuous slider. Small design choices in a scale change your data more than people expect.

Likert: discrete points

Use a Likert scale when you want comparable, easy-to-aggregate ratings:

yaml
annotation_schemes:
  - annotation_type: likert
    name: fluency
    description: "How fluent is this translation?"
    size: 5
    min_label: "Not fluent at all"
    max_label: "Perfectly fluent"

Design decisions that matter:

  • How many points? Five is a safe default. Seven gives more resolution if annotators can use it. An even number removes the neutral midpoint and forces a lean, useful when "neutral" is a cop-out, risky when neutrality is real.
  • Label the ends, and ideally every point. Labeled points are interpreted more consistently than bare numbers.
  • Keep the direction consistent across all your scales so annotators don't flip them by habit.

Sliders: continuous values

Use a slider when the underlying quantity really is continuous, such as a confidence percentage or an emotion intensity:

yaml
annotation_schemes:
  - annotation_type: slider
    name: confidence
    description: "How confident are you in your label?"
    min: 0
    max: 100
    step: 1
    min_label: "Guessing"
    max_label: "Certain"

Continuous scales give resolution but lower agreement, because people don't share a fine-grained sense of "67 vs. 72". Bin the output if you need agreement.

Biases to design around

  • Acquiescence bias: a tendency to agree. Mix in reverse-worded items so agreement isn't the default. See acquiescence bias.
  • Central tendency: clustering on the middle. Clear endpoint labels and, where appropriate, an even number of points push against it.
  • Anchoring: the first few items set a reference. A short calibration set at the start helps.

Beyond a single scale

Further reading