Text Summarization Evaluation
Rate the quality of AI-generated summaries on fluency, coherence, and faithfulness.
Configuration File (config.yaml)
task_name: "Text Summarization Evaluation"
task_description: "Rate the quality of the summary compared to the source document."
task_dir: "."
port: 8000
data_files:
  - "sample-data.json"
item_properties:
  id_key: id
  text_key: source
  context_key: summary
annotation_schemes:
  - annotation_type: likert
    name: fluency
    description: "How fluent and grammatical is the summary?"
    size: 5
    min_label: "Not fluent"
    max_label: "Very fluent"
    required: true
  - annotation_type: likert
    name: coherence
    description: "How well-organized and coherent is the summary?"
    size: 5
    min_label: "Incoherent"
    max_label: "Very coherent"
    required: true
  - annotation_type: likert
    name: faithfulness
    description: "Does the summary accurately reflect the source without hallucinations?"
    size: 5
    min_label: "Unfaithful"
    max_label: "Faithful"
    required: true
  - annotation_type: text
    name: comments
    description: "Optional comments on the summary quality"
    required: false
output_annotation_dir: "output/"
output_annotation_format: "json"
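Because item_properties tells Potato which fields to pull from each item (id_key for the identifier, text_key for the displayed source, context_key for the summary shown alongside it), a quick check that the data actually carries those keys can save a failed launch. The snippet below is a standalone sanity-check sketch, not part of Potato itself; it assumes PyYAML is available (pip install pyyaml).

import json

import yaml  # PyYAML; an assumption, install with: pip install pyyaml

# Load the task config and pull out the key mapping.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

props = config["item_properties"]
required = [props["id_key"], props["text_key"], props["context_key"]]

# Every item in every data file should carry all three mapped keys.
for path in config["data_files"]:
    with open(path) as f:
        items = json.load(f)
    for item in items:
        missing = [key for key in required if key not in item]
        label = item.get(props["id_key"], "?")
        print(f"item {label}: " + ("OK" if not missing else f"missing {missing}"))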
Sample Data (sample-data.json)
[
  {
    "id": "1",
    "source": "The International Space Station (ISS) has been continuously occupied since November 2000. It serves as a microgravity and space environment research laboratory where crew members conduct experiments in biology, physics, astronomy, and other fields. The ISS is a joint project among five space agencies: NASA, Roscosmos, JAXA, ESA, and CSA.",
    "summary": "The ISS has been occupied since 2000 and serves as a research lab for experiments. It's run by five space agencies including NASA."
  },
  {
    "id": "2",
    "source": "Machine learning models require large amounts of training data to achieve good performance. Data annotation is the process of labeling data to provide ground truth for model training. High-quality annotations are essential for building reliable AI systems.",
    "summary": "ML models need lots of labeled training data. Good annotations are crucial for building reliable AI."
  }
]
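To build a data file of your own in this format, a few lines of Python are enough. The helper below is a hypothetical sketch, not part of the showcase; the (source, summary) pairs are placeholders to replace with your own documents and model outputs.

import json

# Placeholder pairs -- swap in your own (source, summary) data.
pairs = [
    ("First source document...", "Its model-generated summary..."),
    ("Second source document...", "Its model-generated summary..."),
]

# Mirror the sample-data.json schema: string ids plus source/summary keys,
# matching the id_key/text_key/context_key mapping in config.yaml.
items = [
    {"id": str(i), "source": source, "summary": summary}
    for i, (source, summary) in enumerate(pairs, start=1)
]

with open("sample-data.json", "w") as f:
    json.dump(items, f, indent=2)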
Get This Design
Clone or download from the repository.
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text-summarization-eval
potato start config.yaml
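Once annotators have submitted ratings, the annotations land under output/ in JSON form, per the output_annotation_dir and output_annotation_format settings above. The exact file layout and record shape vary by Potato version, so the aggregation sketch below rests on an assumption: one JSON object per line, each carrying the item id and a numeric value under each of the three scale names. Adapt the parsing to whatever your installation actually writes.

import json
from collections import defaultdict
from pathlib import Path
from statistics import mean

SCALES = ("fluency", "coherence", "faithfulness")
ratings = defaultdict(lambda: defaultdict(list))

# Read every JSON/JSONL file under output/, assuming one record
# per line (adjust if your files are pretty-printed).
for path in Path("output").rglob("*.json*"):
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        item_id = record.get("id", "?")
        for scale in SCALES:
            value = record.get(scale)
            if value is not None:  # assumes a bare numeric rating per scale
                ratings[item_id][scale].append(float(value))

# Mean rating per item and scale, e.g. {"fluency": 4.5, ...}
for item_id, by_scale in sorted(ratings.items()):
    print(item_id, {s: round(mean(v), 2) for s, v in by_scale.items()})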
Details
Annotation Types: likert, text
Tags: survey, annotation
Found an issue or want to improve this design?
Open an Issue

Related Designs
Survey Feedback
Multi-question survey with Likert scales, text fields, and multiple choice.
Argument Quality Assessment
Multi-dimensional argument quality annotation based on the Wachsmuth et al. (2017) taxonomy. Rates arguments on three dimensions: Cogency (logical validity), Effectiveness (persuasive power), and Reasonableness (contribution to resolution). Used in Dagstuhl-ArgQuality and GAQCorpus datasets.
Emotion Detection (SemEval-2018 Task 1)
Multi-label emotion classification with intensity ratings based on SemEval-2018 Task 1. Annotate text for emotions (anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust) with intensity scales.