Machine Translation Evaluation
Evaluate machine translation quality with adequacy and fluency ratings.
survey annotation
Configuration File (config.yaml)
task_name: "Machine Translation Evaluation"
task_description: "Evaluate the quality of the machine translation."
task_dir: "."
port: 8000
data_files:
  - "sample-data.json"
item_properties:
  id_key: id
  text_key: source
  context_key: translation
annotation_schemes:
  - annotation_type: likert
    name: adequacy
    description: "How much of the source meaning is preserved in the translation?"
    size: 5
    min_label: "None"
    max_label: "All"
    required: true
  - annotation_type: likert
    name: fluency
    description: "How fluent is the translation in the target language?"
    size: 5
    min_label: "Incomprehensible"
    max_label: "Flawless"
    required: true
  - annotation_type: multiselect
    name: errors
    description: "Select any errors present in the translation"
    labels:
      - "Mistranslation"
      - "Omission"
      - "Addition"
      - "Grammar error"
      - "Word order"
      - "Terminology"
    required: false
output_annotation_dir: "output/"
output_annotation_format: "json"
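Since output_annotation_format is "json", the collected annotations can be post-processed with standard tooling. The sketch below assumes a simplified record shape (one dict per annotator judgment carrying the item id plus the three scheme values defined above); Potato's actual output schema may differ, so treat the field names here as illustrative.

```python
from collections import Counter
from statistics import mean

# Hypothetical annotation records -- the real Potato output layout may differ.
# Each record pairs an item id with the adequacy/fluency Likert scores (1-5)
# and the multiselect error labels chosen by one annotator.
annotations = [
    {"id": "1", "adequacy": 5, "fluency": 5, "errors": []},
    {"id": "2", "adequacy": 4, "fluency": 5, "errors": ["Terminology"]},
    {"id": "2", "adequacy": 3, "fluency": 4, "errors": ["Terminology", "Word order"]},
]

def summarize(records):
    """Average the Likert scores and tally error labels per item."""
    by_item = {}
    for r in records:
        by_item.setdefault(r["id"], []).append(r)
    return {
        item_id: {
            "adequacy": mean(r["adequacy"] for r in recs),
            "fluency": mean(r["fluency"] for r in recs),
            "errors": Counter(e for r in recs for e in r["errors"]),
        }
        for item_id, recs in by_item.items()
    }

print(summarize(annotations))
```

This kind of aggregation (mean score per dimension, error-label counts) is a common first step before computing inter-annotator agreement on the adequacy and fluency scales.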
Sample Data (sample-data.json)
[
  {
    "id": "1",
    "source": "El gato negro duerme en el sofá.",
    "source_lang": "Spanish",
    "target_lang": "English",
    "translation": "The black cat sleeps on the couch."
  },
  {
    "id": "2",
    "source": "Je voudrais réserver une table pour deux personnes.",
    "source_lang": "French",
    "target_lang": "English",
    "translation": "I would like to book a table for two people."
  }
]

Get This Design
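Each data item must carry the keys that item_properties maps to (id, source, translation). A quick sanity check like the sketch below, with the sample data inlined for self-containment, can catch malformed items before starting the server; the missing_keys helper is illustrative, not part of Potato.

```python
import json

# Inline copy of sample-data.json; in practice you would json.load() the file.
sample_data = json.loads("""
[
  {"id": "1", "source": "El gato negro duerme en el sofá.",
   "source_lang": "Spanish", "target_lang": "English",
   "translation": "The black cat sleeps on the couch."},
  {"id": "2", "source": "Je voudrais réserver une table pour deux personnes.",
   "source_lang": "French", "target_lang": "English",
   "translation": "I would like to book a table for two people."}
]
""")

# The keys that the config's id_key, text_key, and context_key point at.
REQUIRED_KEYS = {"id", "source", "translation"}

def missing_keys(items):
    """Return (item index, missing key set) pairs for malformed items."""
    return [(i, REQUIRED_KEYS - item.keys())
            for i, item in enumerate(items)
            if not REQUIRED_KEYS <= item.keys()]

assert missing_keys(sample_data) == [], "every item must carry the configured keys"
```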
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/machine-translation-eval
potato start config.yaml
Found an issue or want to improve this design? Open an Issue

Related Designs
Emotion Detection (SemEval-2018 Task 1)
Multi-label emotion classification with intensity ratings based on SemEval-2018 Task 1. Annotate text for emotions (anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust) with intensity scales.
Toxicity Detection
Multi-label toxicity classification with severity ratings for content moderation.
Argument Quality Assessment
Multi-dimensional argument quality annotation based on the Wachsmuth et al. (2017) taxonomy. Rates arguments on three dimensions: Cogency (logical validity), Effectiveness (persuasive power), and Reasonableness (contribution to resolution). Used in Dagstuhl-ArgQuality and GAQCorpus datasets.