Text Summarization Evaluation
Rate the quality of AI-generated summaries on fluency, coherence, and faithfulness.
Configuration File (config.yaml)
task_name: "Text Summarization Evaluation"
task_description: "Rate the quality of the summary compared to the source document."
task_dir: "."
port: 8000
data_files:
  - "sample-data.json"
item_properties:
  id_key: id
  text_key: source
  context_key: summary
annotation_schemes:
  - annotation_type: likert
    name: fluency
    description: "How fluent and grammatical is the summary?"
    size: 5
    min_label: "Not fluent"
    max_label: "Very fluent"
    required: true
  - annotation_type: likert
    name: coherence
    description: "How well-organized and coherent is the summary?"
    size: 5
    min_label: "Incoherent"
    max_label: "Very coherent"
    required: true
  - annotation_type: likert
    name: faithfulness
    description: "Does the summary accurately reflect the source without hallucinations?"
    size: 5
    min_label: "Unfaithful"
    max_label: "Faithful"
    required: true
  - annotation_type: text
    name: comments
    description: "Optional comments on the summary quality"
    required: false
output_annotation_dir: "output/"
output_annotation_format: "json"
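Because item_properties tells Potato which fields to pull from each item (id_key for the identifier, text_key for the displayed source, context_key for the summary shown alongside it), a quick check that the data actually carries those keys can save a failed launch. The snippet below is a standalone sanity-check sketch, not part of Potato itself; it assumes PyYAML is available (pip install pyyaml).

import json

import yaml  # PyYAML; an assumption, install with: pip install pyyaml

# Load the task config and pull out the key mapping.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

props = config["item_properties"]
required = [props["id_key"], props["text_key"], props["context_key"]]

# Every item in every data file should carry all three mapped keys.
for path in config["data_files"]:
    with open(path) as f:
        items = json.load(f)
    for item in items:
        missing = [key for key in required if key not in item]
        label = item.get(props["id_key"], "?")
        print(f"item {label}: " + ("OK" if not missing else f"missing {missing}"))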
Sample Data (sample-data.json)
[
  {
    "id": "1",
    "source": "The International Space Station (ISS) has been continuously occupied since November 2000. It serves as a microgravity and space environment research laboratory where crew members conduct experiments in biology, physics, astronomy, and other fields. The ISS is a joint project among five space agencies: NASA, Roscosmos, JAXA, ESA, and CSA.",
    "summary": "The ISS has been occupied since 2000 and serves as a research lab for experiments. It's run by five space agencies including NASA."
  },
  {
    "id": "2",
    "source": "Machine learning models require large amounts of training data to achieve good performance. Data annotation is the process of labeling data to provide ground truth for model training. High-quality annotations are essential for building reliable AI systems.",
    "summary": "ML models need lots of labeled training data. Good annotations are crucial for building reliable AI."
  }
]
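To build a data file of your own in this format, a few lines of Python are enough. The helper below is a hypothetical sketch, not part of the showcase; the (source, summary) pairs are placeholders to replace with your own documents and model outputs.

import json

# Placeholder pairs -- swap in your own (source, summary) data.
pairs = [
    ("First source document...", "Its model-generated summary..."),
    ("Second source document...", "Its model-generated summary..."),
]

# Mirror the sample-data.json schema: string ids plus source/summary keys,
# matching the id_key/text_key/context_key mapping in config.yaml.
items = [
    {"id": str(i), "source": source, "summary": summary}
    for i, (source, summary) in enumerate(pairs, start=1)
]

with open("sample-data.json", "w") as f:
    json.dump(items, f, indent=2)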
Get This Design
Clone or download from the repository.
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text-summarization-eval
potato start config.yaml
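Once annotators have submitted ratings, the annotations land under output/ in JSON form, per the output_annotation_dir and output_annotation_format settings above. The exact file layout and record shape vary by Potato version, so the aggregation sketch below rests on an assumption: one JSON object per line, each carrying the item id and a numeric value under each of the three scale names. Adapt the parsing to whatever your installation actually writes.

import json
from collections import defaultdict
from pathlib import Path
from statistics import mean

SCALES = ("fluency", "coherence", "faithfulness")
ratings = defaultdict(lambda: defaultdict(list))

# Read every JSON/JSONL file under output/, assuming one record
# per line (adjust if your files are pretty-printed).
for path in Path("output").rglob("*.json*"):
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        item_id = record.get("id", "?")
        for scale in SCALES:
            value = record.get(scale)
            if value is not None:  # assumes a bare numeric rating per scale
                ratings[item_id][scale].append(float(value))

# Mean rating per item and scale, e.g. {"fluency": 4.5, ...}
for item_id, by_scale in sorted(ratings.items()):
    print(item_id, {s: round(mean(v), 2) for s, v in by_scale.items()})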
Details
Annotation Types: likert, text
Tags: survey, annotation
Found an issue or want to improve this design?
Open an Issue

Related Designs
Survey Feedback
Multi-question survey with Likert scales, text fields, and multiple choice.
Argument Quality Assessment
Multi-dimensional argument quality annotation based on the Wachsmuth et al. (2017) taxonomy. Rates arguments on three dimensions: Cogency (logical validity), Effectiveness (persuasive power), and Reasonableness (contribution to resolution). Used in Dagstuhl-ArgQuality and GAQCorpus datasets.
Emotion Detection (SemEval-2018 Task 1)
Multi-label emotion classification with intensity ratings based on SemEval-2018 Task 1. Annotate text for emotions (anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust) with intensity scales.