EA-MT - Entity-Aware Machine Translation
Entity-aware machine translation evaluation requiring annotators to identify entity spans, classify translation errors, and provide corrected translations. Based on SemEval-2025 Task 2.
Configuration File: config.yaml
# EA-MT - Entity-Aware Machine Translation
# Based on Knowles et al., SemEval 2025
# Paper: https://aclanthology.org/volumes/2025.semeval-1/
# Dataset: https://github.com/SemEval/SemEval2025-Task2
#
# This task evaluates machine translation quality with a focus on
# named entities. Annotators mark entity spans in the translation,
# classify overall translation quality, and provide corrections when
# entity errors are found.
#
# Span Labels:
# - Entity: A named entity span (general tag when correctness is not judged)
# - Mistranslated Entity: An entity that was incorrectly translated
# - Correct Entity: An entity that was correctly preserved/translated
#
# Quality Labels:
# - Correct Translation: The translation is accurate overall
# - Entity Error: The translation has entity-specific errors
# - Other Error: The translation has non-entity errors
# - Multiple Errors: The translation has multiple types of errors
annotation_task_name: "EA-MT - Entity-Aware Machine Translation"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: span
    name: entity_spans
    description: "Highlight entity spans in the translation text and label them."
    labels:
      - "Entity"
      - "Mistranslated Entity"
      - "Correct Entity"
  - annotation_type: radio
    name: translation_quality
    description: "What is the overall quality of this translation with respect to entities?"
    labels:
      - "Correct Translation"
      - "Entity Error"
      - "Other Error"
      - "Multiple Errors"
    keyboard_shortcuts:
      "Correct Translation": "1"
      "Entity Error": "2"
      "Other Error": "3"
      "Multiple Errors": "4"
    tooltips:
      "Correct Translation": "The translation is accurate and entities are correctly handled"
      "Entity Error": "The translation contains errors specifically in entity translation"
      "Other Error": "The translation has errors unrelated to entities"
      "Multiple Errors": "The translation contains both entity and non-entity errors"
  - annotation_type: text
    name: corrected_translation
    description: "If there are entity errors, provide the corrected translation."
annotation_instructions: |
  You will be shown a source sentence and its machine translation. Your tasks are:
  1. Identify and highlight entity spans in the translation (names, places, organizations, etc.).
  2. Label each span as correctly or incorrectly translated.
  3. Assess the overall translation quality with respect to entities.
  4. If entity errors exist, provide a corrected translation.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0fdf4; border: 1px solid #bbf7d0; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #166534;">Source ({{source_lang}}):</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #fef2f2; border: 1px solid #fecaca; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #991b1b;">Translation ({{target_lang}}):</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{translation}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
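Before starting the annotation server, it can be useful to check that every item in the data file carries the fields that `item_properties` and the `html_layout` template reference. A minimal validation sketch; the `validate_items` helper and its field list are illustrative, not part of Potato:

```python
import json

# Fields referenced by item_properties ("id", "text") and by the
# html_layout template ({{translation}}, {{source_lang}}, {{target_lang}}).
REQUIRED_FIELDS = {"id", "text", "translation", "source_lang", "target_lang"}

def validate_items(path):
    """Return a list of (item_id, missing_fields) for malformed items."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    problems = []
    for i, item in enumerate(items):
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            # Fall back to the list index when the item has no "id" field.
            problems.append((item.get("id", f"index {i}"), sorted(missing)))
    return problems
```

Running `validate_items("sample-data.json")` before `potato start` surfaces items that would render with empty source or translation boxes.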
Sample Data: sample-data.json
[
{
"id": "eamt_001",
"text": "Der Premierminister Boris Johnson sprach gestern im britischen Parlament in London.",
"translation": "Prime Minister Boris Johnson spoke yesterday in the British Parliament in London.",
"source_lang": "German",
"target_lang": "English"
},
{
"id": "eamt_002",
"text": "La empresa Google anuncio una nueva sede en la Ciudad de Mexico.",
"translation": "The company Google announced a new headquarters in the City of Mexiko.",
"source_lang": "Spanish",
"target_lang": "English"
}
]
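Because `annotation_per_instance` is 2, each item receives two independent `translation_quality` judgments, and a downstream step will typically compare them. A hedged sketch of that comparison follows; the flat `(item_id, annotator, label)` record shape is an assumption for illustration, not Potato's actual output format, so the loading step would need to be adapted:

```python
from collections import defaultdict

def raw_agreement(records):
    """Fraction of doubly-annotated items whose two quality labels match.

    `records` is a hypothetical flat list of (item_id, annotator, label)
    tuples; adapt the loading step to the real annotation output files.
    """
    by_item = defaultdict(list)
    for item_id, _annotator, label in records:
        by_item[item_id].append(label)
    # Only items that actually received both annotations are comparable.
    pairs = [labels for labels in by_item.values() if len(labels) == 2]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return matches / len(pairs)

records = [
    ("eamt_001", "ann1", "Correct Translation"),
    ("eamt_001", "ann2", "Correct Translation"),
    ("eamt_002", "ann1", "Entity Error"),
    ("eamt_002", "ann2", "Multiple Errors"),
]
print(raw_agreement(records))  # 0.5
```

Raw agreement is the simplest check; a chance-corrected statistic such as Cohen's kappa would be the usual next step for reporting.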
// ... and 8 more items

Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2025/task02-entity-aware-mt
potato start config.yaml
Related Designs
Clickbait Spoiling
Classification and extraction of spoilers for clickbait posts, including spoiler type identification and span-level spoiler detection. Based on SemEval-2023 Task 5 (Hagen et al.).
MeasEval - Counts and Measurements
Extract and classify measurements, quantities, units, and measured entities from scientific text, based on SemEval-2021 Task 8 (Harper et al.). Annotators span-annotate measurement components and classify quantity types with normalized values.
Code Review Annotation (CodeReviewer)
Annotation of code review activities based on the CodeReviewer benchmark. Annotators identify issues in code diffs, classify defect types, assign severity levels, make review decisions, and provide natural language review comments, supporting research in automated code review and software engineering.