intermediate · survey

Text Summarization Evaluation

Rate the quality of AI-generated summaries on fluency, coherence, and faithfulness.


Configuration File: config.yaml

task_name: "Text Summarization Evaluation"
task_description: "Rate the quality of the summary compared to the source document."
task_dir: "."
port: 8000

data_files:
  - "sample-data.json"

item_properties:
  id_key: id
  text_key: source
  context_key: summary

annotation_schemes:
  - annotation_type: likert
    name: fluency
    description: "How fluent and grammatical is the summary?"
    size: 5
    min_label: "Not fluent"
    max_label: "Very fluent"
    required: true

  - annotation_type: likert
    name: coherence
    description: "How well-organized and coherent is the summary?"
    size: 5
    min_label: "Incoherent"
    max_label: "Very coherent"
    required: true

  - annotation_type: likert
    name: faithfulness
    description: "Does the summary accurately reflect the source without hallucinations?"
    size: 5
    min_label: "Unfaithful"
    max_label: "Faithful"
    required: true

  - annotation_type: text
    name: comments
    description: "Optional comments on the summary quality"
    required: false

output_annotation_dir: "output/"
output_annotation_format: "json"

Sample Data: sample-data.json

[
  {
    "id": "1",
    "source": "The International Space Station (ISS) has been continuously occupied since November 2000. It serves as a microgravity and space environment research laboratory where crew members conduct experiments in biology, physics, astronomy, and other fields. The ISS is a joint project among five space agencies: NASA, Roscosmos, JAXA, ESA, and CSA.",
    "summary": "The ISS has been occupied since 2000 and serves as a research lab for experiments. It's run by five space agencies including NASA."
  },
  {
    "id": "2",
    "source": "Machine learning models require large amounts of training data to achieve good performance. Data annotation is the process of labeling data to provide ground truth for model training. High-quality annotations are essential for building reliable AI systems.",
    "summary": "ML models need lots of labeled training data. Good annotations are crucial for building reliable AI."
  }
]
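Before launching the task, it can help to check that every item in the data file carries the keys the config expects (`id_key: id`, `text_key: source`, `context_key: summary`). The sketch below is a hypothetical helper, not part of Potato itself; `validate_items` and its key list are assumptions that simply mirror the `item_properties` section above.

```python
import json

# These keys mirror item_properties in config.yaml:
#   id_key -> "id", text_key -> "source", context_key -> "summary"
REQUIRED_KEYS = ("id", "source", "summary")

def validate_items(items):
    """Return (item_id, missing_key) pairs; an empty list means the data is valid."""
    problems = []
    for i, item in enumerate(items):
        for key in REQUIRED_KEYS:
            if key not in item:
                problems.append((item.get("id", f"index {i}"), key))
    return problems

# To check the real file before starting potato:
#   items = json.loads(open("sample-data.json").read())
ok_item = {"id": "1", "source": "Some document text.", "summary": "A summary."}
bad_item = {"id": "2", "source": "Text with no summary field."}

print(validate_items([ok_item]))            # []
print(validate_items([ok_item, bad_item]))  # [('2', 'summary')]
```

Running a check like this before `potato start` catches malformed items early, rather than surfacing them as blank fields in the annotation interface.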

Get This Design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text-summarization-eval
potato start config.yaml

Details

Annotation Types

likert, text

Domain

NLP, Evaluation

Use Cases

Summarization, NLG Evaluation

Tags

summarization, evaluation, nlg, quality

Found an issue or want to improve this design?

Open an Issue