Text Summarization Evaluation
Rate the quality of AI-generated summaries on fluency, coherence, and faithfulness.
Configuration file: config.yaml
annotation_task_name: "Text Summarization Evaluation"
task_description: "Rate the quality of the summary compared to the source document."
task_dir: "."
port: 8000
data_files:
  - "sample-data.json"
item_properties:
  id_key: id
  text_key: source
  context_key: summary
annotation_schemes:
  - annotation_type: likert
    name: fluency
    description: "How fluent and grammatical is the summary?"
    size: 5
    min_label: "Not fluent"
    max_label: "Very fluent"
    required: true
  - annotation_type: likert
    name: coherence
    description: "How well-organized and coherent is the summary?"
    size: 5
    min_label: "Incoherent"
    max_label: "Very coherent"
    required: true
  - annotation_type: likert
    name: faithfulness
    description: "Does the summary accurately reflect the source without hallucinations?"
    size: 5
    min_label: "Unfaithful"
    max_label: "Faithful"
    required: true
  - annotation_type: text
    name: comments
    description: "Optional comments on the summary quality"
    required: false
output_annotation_dir: "output/"
output_annotation_format: "json"
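Each likert scheme above collects a 1-5 rating per item, so the collected annotations can be summarized per dimension. A minimal sketch of that aggregation; the record shape used here is an assumption for illustration, and Potato's actual output JSON layout may differ:

```python
from statistics import mean

# Assumed record shape: one dict per (annotator, item) pair, holding the
# three likert ratings (1-5) defined in the annotation_schemes above.
annotations = [
    {"id": "1", "fluency": 5, "coherence": 4, "faithfulness": 3},
    {"id": "1", "fluency": 4, "coherence": 3, "faithfulness": 2},
]

def dimension_means(records):
    """Average each rating dimension across all records."""
    dims = ("fluency", "coherence", "faithfulness")
    return {d: mean(r[d] for r in records) for d in dims}

print(dimension_means(annotations))
# {'fluency': 4.5, 'coherence': 3.5, 'faithfulness': 2.5}
```

Reporting per-dimension means (rather than a single combined score) keeps fluency, coherence, and faithfulness separately interpretable, which is the point of annotating them as distinct scales.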
Example data: sample-data.json
[
  {
    "id": "1",
    "source": "The International Space Station (ISS) has been continuously occupied since November 2000. It serves as a microgravity and space environment research laboratory where crew members conduct experiments in biology, physics, astronomy, and other fields. The ISS is a joint project among five space agencies: NASA, Roscosmos, JAXA, ESA, and CSA.",
    "summary": "The ISS has been occupied since 2000 and serves as a research lab for experiments. It's run by five space agencies including NASA."
  },
  {
    "id": "2",
    "source": "Machine learning models require large amounts of training data to achieve good performance. Data annotation is the process of labeling data to provide ground truth for model training. High-quality annotations are essential for building reliable AI systems.",
    "summary": "ML models need lots of labeled training data. Good annotations are crucial for building reliable AI."
  }
]
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/text-summarization-eval
potato start config.yaml
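Before launching, it can help to confirm the data file actually carries the keys that item_properties in config.yaml maps (id_key → "id", text_key → "source", context_key → "summary"). A small stdlib-only sketch; check_items is a hypothetical helper for illustration, not part of Potato:

```python
import json

# Keys config.yaml's item_properties block expects on every item.
REQUIRED_KEYS = {"id", "source", "summary"}

def check_items(raw_json):
    """Return the item ids if every item carries the required keys,
    otherwise raise ValueError naming the first offending item."""
    items = json.loads(raw_json)
    for item in items:
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"item {item.get('id', '?')} missing {sorted(missing)}")
    return [item["id"] for item in items]

# Usage: check_items(open("sample-data.json").read())
```

Run against the sample data above, this returns ["1", "2"]; a missing "summary" key would raise before Potato ever renders a broken task page.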
Found a problem or want to improve this design?
Open an issue
Related designs
Automated Essay Scoring
Holistic and analytic scoring of student essays using a deep-neural approach to automated essay scoring (Uto, arXiv 2022). Annotators provide overall quality ratings, holistic scores on a 1-6 scale, and detailed feedback comments for educational assessment.
Coreference Resolution (OntoNotes)
Link pronouns and noun phrases to the entities they refer to in text. Based on the OntoNotes coreference annotation guidelines and CoNLL shared tasks. Identify mention spans and cluster coreferent mentions together.
FinBERT - Financial Headline Sentiment Analysis
Classify sentiment of financial news headlines as positive, negative, or neutral, based on the FinBERT model (Araci, arXiv 2019). Annotators also rate market outlook on a bearish-to-bullish scale and provide reasoning for their sentiment judgment.