Text Summarization Evaluation
Rate the quality of AI-generated summaries on fluency, coherence, and faithfulness.
Configuration file: config.yaml
annotation_task_name: "Text Summarization Evaluation"
task_description: "Rate the quality of the summary compared to the source document."
task_dir: "."
port: 8000
data_files:
  - "sample-data.json"
item_properties:
  id_key: id
  text_key: source
  context_key: summary
annotation_schemes:
  - annotation_type: likert
    name: fluency
    description: "How fluent and grammatical is the summary?"
    size: 5
    min_label: "Not fluent"
    max_label: "Very fluent"
    required: true
  - annotation_type: likert
    name: coherence
    description: "How well-organized and coherent is the summary?"
    size: 5
    min_label: "Incoherent"
    max_label: "Very coherent"
    required: true
  - annotation_type: likert
    name: faithfulness
    description: "Does the summary accurately reflect the source without hallucinations?"
    size: 5
    min_label: "Unfaithful"
    max_label: "Faithful"
    required: true
  - annotation_type: text
    name: comments
    description: "Optional comments on the summary quality"
    required: false
output_annotation_dir: "output/"
output_annotation_format: "json"
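Each likert scheme above collects a 1-5 rating per item, so the collected annotations can be summarized per dimension. A minimal sketch of that aggregation; the record shape used here is an assumption for illustration, and Potato's actual output JSON layout may differ:

```python
from statistics import mean

# Assumed record shape: one dict per (annotator, item) pair, holding the
# three likert ratings (1-5) defined in the annotation_schemes above.
annotations = [
    {"id": "1", "fluency": 5, "coherence": 4, "faithfulness": 3},
    {"id": "1", "fluency": 4, "coherence": 3, "faithfulness": 2},
]

def dimension_means(records):
    """Average each rating dimension across all records."""
    dims = ("fluency", "coherence", "faithfulness")
    return {d: mean(r[d] for r in records) for d in dims}

print(dimension_means(annotations))
# {'fluency': 4.5, 'coherence': 3.5, 'faithfulness': 2.5}
```

Reporting per-dimension means (rather than a single combined score) keeps fluency, coherence, and faithfulness separately interpretable, which is the point of annotating them as distinct scales.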
Example data: sample-data.json
[
  {
    "id": "1",
    "source": "The International Space Station (ISS) has been continuously occupied since November 2000. It serves as a microgravity and space environment research laboratory where crew members conduct experiments in biology, physics, astronomy, and other fields. The ISS is a joint project among five space agencies: NASA, Roscosmos, JAXA, ESA, and CSA.",
    "summary": "The ISS has been occupied since 2000 and serves as a research lab for experiments. It's run by five space agencies including NASA."
  },
  {
    "id": "2",
    "source": "Machine learning models require large amounts of training data to achieve good performance. Data annotation is the process of labeling data to provide ground truth for model training. High-quality annotations are essential for building reliable AI systems.",
    "summary": "ML models need lots of labeled training data. Good annotations are crucial for building reliable AI."
  }
]
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/text-summarization-eval
potato start config.yaml
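Before launching, it can help to confirm the data file actually carries the keys that item_properties in config.yaml maps (id_key → "id", text_key → "source", context_key → "summary"). A small stdlib-only sketch; check_items is a hypothetical helper for illustration, not part of Potato:

```python
import json

# Keys config.yaml's item_properties block expects on every item.
REQUIRED_KEYS = {"id", "source", "summary"}

def check_items(raw_json):
    """Return the item ids if every item carries the required keys,
    otherwise raise ValueError naming the first offending item."""
    items = json.loads(raw_json)
    for item in items:
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"item {item.get('id', '?')} missing {sorted(missing)}")
    return [item["id"] for item in items]

# Usage: check_items(open("sample-data.json").read())
```

Run against the sample data above, this returns ["1", "2"]; a missing "summary" key would raise before Potato ever renders a broken task page.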
Found a problem or want to improve this design?
Open an issue
Related designs
Automated Essay Scoring
Holistic and analytic scoring of student essays using a deep-neural approach to automated essay scoring (Uto, arXiv 2022). Annotators provide overall quality ratings, holistic scores on a 1-6 scale, and detailed feedback comments for educational assessment.
Coreference Resolution (OntoNotes)
Link pronouns and noun phrases to the entities they refer to in text. Based on the OntoNotes coreference annotation guidelines and CoNLL shared tasks. Identify mention spans and cluster coreferent mentions together.
FinBERT - Financial Headline Sentiment Analysis
Classify sentiment of financial news headlines as positive, negative, or neutral, based on the FinBERT model (Araci, arXiv 2019). Annotators also rate market outlook on a bearish-to-bullish scale and provide reasoning for their sentiment judgment.