
ESA: Error Span Annotation for Machine Translation

Annotators highlight error spans in machine-translation output, classify each error's type (accuracy, fluency, terminology, style), and rate overall severity.


Configuration File (config.yaml)

# ESA: Error Span Annotation for Machine Translation
# Based on "Error Span Annotation for Machine Translation Evaluation" (Kocmi et al., WMT@EMNLP 2024)
# Task: Identify error spans in translations, classify error types, and rate severity

annotation_task_name: "ESA: Error Span Annotation for MT"
task_dir: "."

# Data configuration
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Display layout showing source and translation
html_layout: |
  <div class="esa-container">
    <div class="source-section" style="background: #e8f5e9; padding: 15px; border-radius: 8px; margin-bottom: 15px;">
      <h3 style="margin-top: 0;">Source Text ({{language_pair}}):</h3>
      <div class="source-text" style="font-size: 16px; line-height: 1.6;">{{source_text}}</div>
    </div>
    <div class="translation-section" style="background: #e3f2fd; padding: 15px; border-radius: 8px; border: 2px solid #1976d2;">
      <h3 style="margin-top: 0; color: #1976d2;">Translation (select error spans below):</h3>
      <div class="translation-text" style="font-size: 16px; line-height: 1.6;">{{text}}</div>
    </div>
  </div>

# Annotation schemes
annotation_schemes:
  # Span annotation for error identification
  - name: "error_spans"
    description: "Select spans in the translation that contain errors. Highlight each error span individually."
    annotation_type: span
    labels:
      - "Accuracy - Mistranslation"
      - "Accuracy - Omission"
      - "Accuracy - Addition"
      - "Fluency - Grammar"
      - "Fluency - Spelling/Punctuation"
      - "Fluency - Register"
      - "Terminology"
      - "Style"
    label_colors:
      "Accuracy - Mistranslation": "#ff5252"
      "Accuracy - Omission": "#ff7043"
      "Accuracy - Addition": "#ff9800"
      "Fluency - Grammar": "#ab47bc"
      "Fluency - Spelling/Punctuation": "#7e57c2"
      "Fluency - Register": "#5c6bc0"
      "Terminology": "#26a69a"
      "Style": "#78909c"

  # Error type classification
  - name: "primary_error_type"
    description: "What is the primary (most severe) error type in this translation?"
    annotation_type: radio
    labels:
      - "Accuracy"
      - "Fluency"
      - "Terminology"
      - "Style"
      - "No errors found"
    keyboard_shortcuts:
      "Accuracy": "1"
      "Fluency": "2"
      "Terminology": "3"
      "Style": "4"
      "No errors found": "0"

  # Severity rating
  - name: "error_severity"
    description: "Rate the overall severity of errors in this translation."
    annotation_type: likert
    size: 5
    min_label: "1 - No errors"
    max_label: "5 - Critical errors"
    labels:
      - "1 - No errors (perfect translation)"
      - "2 - Minor errors (meaning preserved)"
      - "3 - Moderate errors (some meaning lost)"
      - "4 - Major errors (significant meaning loss)"
      - "5 - Critical errors (wrong or incomprehensible)"
    keyboard_shortcuts:
      "1 - No errors (perfect translation)": "q"
      "2 - Minor errors (meaning preserved)": "w"
      "3 - Moderate errors (some meaning lost)": "e"
      "4 - Major errors (significant meaning loss)": "r"
      "5 - Critical errors (wrong or incomprehensible)": "t"

  # Overall quality rating
  - name: "overall_quality"
    description: "Rate the overall translation quality."
    annotation_type: radio
    labels:
      - "Perfect"
      - "Good"
      - "Acceptable"
      - "Poor"
      - "Unacceptable"
    keyboard_shortcuts:
      "Perfect": "z"
      "Good": "x"
      "Acceptable": "c"
      "Poor": "v"
      "Unacceptable": "b"

# User configuration
allow_all_users: true

# Task assignment
instances_per_annotator: 100
annotation_per_instance: 2
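With `instances_per_annotator: 100` and two annotations per instance, the minimum annotator pool follows from simple arithmetic. A plain-Python sketch (the function and variable names are mine, not Potato's):

```python
import math

def min_annotators(num_items: int, annotations_per_instance: int,
                   instances_per_annotator: int) -> int:
    """Least number of annotators so each item gets the required annotations,
    with no annotator exceeding their per-annotator cap."""
    total_annotations = num_items * annotations_per_instance
    # Each instance needs distinct annotators, so at least
    # annotations_per_instance people are required regardless of volume.
    return max(annotations_per_instance,
               math.ceil(total_annotations / instances_per_annotator))

# The 10 sample items here, doubly annotated, fit within one annotator's cap,
# but double annotation still requires two distinct people:
print(min_annotators(10, 2, 100))  # → 2
```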

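Two easy mistakes when editing a config like this are a span label with no entry in `label_colors` and the same keyboard shortcut bound in two schemes. A standalone sanity check in plain Python (not a Potato feature; the data below mirrors the config above):

```python
from collections import Counter

span_labels = [
    "Accuracy - Mistranslation", "Accuracy - Omission", "Accuracy - Addition",
    "Fluency - Grammar", "Fluency - Spelling/Punctuation", "Fluency - Register",
    "Terminology", "Style",
]
label_colors = {
    "Accuracy - Mistranslation": "#ff5252", "Accuracy - Omission": "#ff7043",
    "Accuracy - Addition": "#ff9800", "Fluency - Grammar": "#ab47bc",
    "Fluency - Spelling/Punctuation": "#7e57c2", "Fluency - Register": "#5c6bc0",
    "Terminology": "#26a69a", "Style": "#78909c",
}
scheme_shortcuts = {
    "primary_error_type": ["1", "2", "3", "4", "0"],
    "error_severity": ["q", "w", "e", "r", "t"],
    "overall_quality": ["z", "x", "c", "v", "b"],
}

def uncolored_labels(labels, colors):
    """Span labels with no entry in label_colors."""
    return [label for label in labels if label not in colors]

def duplicate_shortcuts(shortcuts_by_scheme):
    """Keys bound in more than one place, which would make bindings ambiguous."""
    counts = Counter(k for keys in shortcuts_by_scheme.values() for k in keys)
    return sorted(k for k, n in counts.items() if n > 1)

assert uncolored_labels(span_labels, label_colors) == []
assert duplicate_shortcuts(scheme_shortcuts) == []
```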
Sample Data (sample-data.json)

[
  {
    "id": "esa_001",
    "text": "The committee decided to postpone the meeting until next week due to the absence of several key members.",
    "source_text": "Das Komitee beschloss, die Sitzung auf nächste Woche zu verschieben, da mehrere wichtige Mitglieder abwesend waren.",
    "language_pair": "German-English"
  },
  {
    "id": "esa_002",
    "text": "The new policy will effect all employees starting from January, including those who work part-time in the remote offices.",
    "source_text": "Die neue Richtlinie wird ab Januar alle Mitarbeiter betreffen, einschließlich derjenigen, die in Teilzeit in den Außenstellen arbeiten.",
    "language_pair": "German-English"
  }
]

// ... and 8 more items
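Each data item must carry the keys named in `item_properties` (`id`, `text`) plus the extra fields the `html_layout` references (`source_text`, `language_pair`). A quick plain-Python check for new items (not part of Potato; the two inline items are abridged stand-ins for the sample data):

```python
# Keys required by item_properties plus those referenced in html_layout.
REQUIRED_KEYS = {"id", "text", "source_text", "language_pair"}

def items_missing_keys(items):
    """Map each offending item's id (or index) to its sorted missing keys."""
    problems = {}
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems[item.get("id", f"item #{i}")] = sorted(missing)
    return problems

good = {"id": "esa_001", "text": "...", "source_text": "...",
        "language_pair": "German-English"}
bad = {"id": "esa_002", "text": "..."}
print(items_missing_keys([good, bad]))
# → {'esa_002': ['language_pair', 'source_text']}
```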

Get This Design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/esa-mt-error-spans
potato start config.yaml
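The `{{source_text}}`, `{{language_pair}}`, and `{{text}}` placeholders in `html_layout` are filled per item from the data file. A minimal sketch of that substitution in plain Python (Potato's actual renderer is more involved; this just illustrates the mapping):

```python
import re

def render_layout(layout: str, item: dict) -> str:
    """Replace each {{field}} placeholder with the item's value for that
    field; unknown fields become empty strings."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(item.get(m.group(1), "")), layout)

layout = '<h3>Source Text ({{language_pair}}):</h3><div>{{source_text}}</div>'
item = {"language_pair": "German-English",
        "source_text": "Das Komitee beschloss ..."}
print(render_layout(layout, item))
# → <h3>Source Text (German-English):</h3><div>Das Komitee beschloss ...</div>
```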

Details

Annotation Types

span, radio, likert

Domain

NLP, Machine Translation, Evaluation

Use Cases

MT Evaluation, Error Analysis, Translation Quality

Tags

machine-translation, error-spans, wmt, evaluation, translation-quality, mqm

Found an issue or want to improve this design?

Open an Issue