Error Span


The error span annotation schema provides an MQM-style (Multidimensional Quality Metrics) interface for marking errors in text with typed categories and severity levels. This schema is ideal for translation quality evaluation, text editing review, content quality assessment, and any task requiring fine-grained error annotation.

Overview

The error span schema provides:

  • Typed error categories with optional subtypes for detailed classification
  • Severity levels with configurable point deductions
  • Running quality score that decreases as errors are marked
  • Color-coded spans that visually distinguish error types and severities

Annotators select text spans, assign an error type and severity, and the system automatically computes a quality score.
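The scoring arithmetic is straightforward to reproduce outside the interface. Here is a minimal Python sketch, assuming the default severity weights and error dicts shaped like this schema's output; the `quality_score` helper is hypothetical, not part of Potato:

```python
# Default severity weights (negative values are point deductions)
SEVERITY_WEIGHTS = {"Minor": -1, "Major": -5, "Critical": -10}

def quality_score(errors, max_score=100):
    """Running quality score: max_score plus the (negative)
    weight of every marked error span."""
    return max_score + sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)

marked = [
    {"error_type": "Accuracy", "severity": "Major"},  # -5
    {"error_type": "Fluency", "severity": "Minor"},   # -1
]
print(quality_score(marked))  # 94
```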

Quick Start

yaml
annotation_schemes:
  - annotation_type: error_span
    name: translation_errors
    description: Mark all errors in the translation below.
    error_types:
      - name: Accuracy
      - name: Fluency
      - name: Terminology
    show_score: true
    max_score: 100

Configuration Options

  • annotation_type (string, required) - Must be "error_span"
  • name (string, required) - Unique identifier for this schema
  • description (string, required) - Instructions displayed to annotators
  • error_types (array, required) - List of error type objects, each with a name and an optional subtypes array
  • severities (array) - List of severity levels, each with a name and a weight (point deduction); defaults to [{name: "Minor", weight: -1}, {name: "Major", weight: -5}, {name: "Critical", weight: -10}]
  • show_score (boolean, default true) - Display a running quality score
  • max_score (integer, default 100) - Starting quality score before deductions

Examples

Translation Quality (MQM)

yaml
annotation_schemes:
  - annotation_type: error_span
    name: mqm_errors
    description: >
      Mark all errors in the machine translation.
      Select the error span, choose a category and severity.
    error_types:
      - name: Accuracy
        subtypes:
          - Mistranslation
          - Addition
          - Omission
          - Untranslated
      - name: Fluency
        subtypes:
          - Grammar
          - Spelling
          - Punctuation
          - Register
      - name: Terminology
        subtypes:
          - Inconsistent
          - Wrong Term
      - name: Style
    severities:
      - name: Minor
        weight: -1
      - name: Major
        weight: -5
      - name: Critical
        weight: -10
    show_score: true
    max_score: 100
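Downstream analysis of MQM-style annotations typically aggregates marked errors by category and severity. A short sketch, assuming error dicts shaped like this schema's output; the `summarize` helper is hypothetical:

```python
from collections import Counter

def summarize(errors):
    """Count marked errors by (error_type, severity) pair."""
    return Counter((e["error_type"], e["severity"]) for e in errors)

marked = [
    {"error_type": "Accuracy", "subtype": "Omission", "severity": "Major"},
    {"error_type": "Accuracy", "subtype": "Mistranslation", "severity": "Major"},
    {"error_type": "Fluency", "subtype": "Grammar", "severity": "Minor"},
]
summary = summarize(marked)
print(summary[("Accuracy", "Major")])  # 2
```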

Content Editing Review

yaml
annotation_schemes:
  - annotation_type: error_span
    name: editing_errors
    description: Mark all issues that need editing in this article.
    error_types:
      - name: Factual Error
      - name: Grammar
        subtypes:
          - Subject-Verb Agreement
          - Tense
          - Pronoun Reference
      - name: Style
        subtypes:
          - Wordiness
          - Passive Voice
          - Jargon
      - name: Formatting
    severities:
      - name: Suggestion
        weight: -1
      - name: Required Fix
        weight: -5
    show_score: false

Code Review Annotation

yaml
annotation_schemes:
  - annotation_type: error_span
    name: code_errors
    description: Mark issues in this code snippet.
    error_types:
      - name: Bug
        subtypes:
          - Logic Error
          - Off-by-One
          - Null Reference
      - name: Style
        subtypes:
          - Naming
          - Formatting
      - name: Security
        subtypes:
          - Injection
          - Exposure
      - name: Performance
    severities:
      - name: Nitpick
        weight: -1
      - name: Warning
        weight: -3
      - name: Blocker
        weight: -10
    max_score: 100
    show_score: true

Output Format

json
{
  "translation_errors": {
    "labels": {
      "errors": [
        {
          "start": 12,
          "end": 25,
          "text": "incorrectly translated",
          "error_type": "Accuracy",
          "subtype": "Mistranslation",
          "severity": "Major"
        },
        {
          "start": 45,
          "end": 52,
          "text": "the the",
          "error_type": "Fluency",
          "subtype": "Grammar",
          "severity": "Minor"
        }
      ],
      "score": 94
    }
  }
}

The score is computed as max_score plus the sum of the severity weights of all marked errors; because weights are negative, each error lowers the score. In the example above: 100 + (-5) + (-1) = 94.
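Stored annotations can be sanity-checked by recomputing the score from the errors list. A sketch assuming the default severity weights and a max_score of 100; the JSON below is an abbreviated version of the example output:

```python
import json

raw = '''{"translation_errors": {"labels": {
  "errors": [
    {"error_type": "Accuracy", "subtype": "Mistranslation", "severity": "Major"},
    {"error_type": "Fluency", "subtype": "Grammar", "severity": "Minor"}
  ],
  "score": 94}}}'''

labels = json.loads(raw)["translation_errors"]["labels"]
weights = {"Minor": -1, "Major": -5, "Critical": -10}  # default weights
recomputed = 100 + sum(weights[e["severity"]] for e in labels["errors"])
assert recomputed == labels["score"]
print(recomputed)  # 94
```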

Best Practices

  1. Define clear error type boundaries - annotators should not struggle to decide between two error types; provide examples in the description
  2. Use subtypes for granularity - top-level types keep the interface simple while subtypes allow detailed analysis when needed
  3. Calibrate severity weights carefully - the weight ratios should reflect actual impact; a critical error should be meaningfully more costly than a minor one
  4. Set max_score relative to text length - for short texts, a lower max_score prevents single errors from having outsized impact
  5. Provide annotation guidelines - MQM-style annotation benefits greatly from detailed guidelines with examples of each error type and severity

Further Reading

For implementation details, see the source documentation.