Image Captioning Evaluation

Rate AI-generated image captions for accuracy and level of detail, and flag hallucinated content.

Configuration file: config.yaml

annotation_task_name: "Image Captioning Evaluation"
task_name: "Image Captioning Evaluation"
task_description: "Rate the quality of the generated caption for the image."
task_dir: "."
port: 8000

data_files:
  - "sample-data.json"

item_properties:
  id_key: "id"
  text_key: "image_url"
  image_key: "image_url"
  context_key: "caption"

annotation_schemes:
  - annotation_type: likert
    name: accuracy
    description: "Does the caption accurately describe what's in the image?"
    size: 5
    min_label: "Inaccurate"
    max_label: "Accurate"
    required: true

  - annotation_type: likert
    name: detail
    description: "How detailed is the caption?"
    size: 5
    min_label: "Too vague"
    max_label: "Appropriate detail"
    required: true

  - annotation_type: radio
    name: hallucination
    description: "Does the caption mention things not in the image?"
    labels:
      - "Yes, hallucinations present"
      - "No hallucinations"
    required: true

output_annotation_dir: "output/"
output_annotation_format: "json"
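
Once annotations are collected, the two Likert schemes and the radio scheme above can be rolled up per item. The following is a minimal Python sketch, not Potato's own tooling: it assumes the judgements have already been flattened into one record per annotator and item, with fields named after the scheme names (`accuracy`, `detail`, `hallucination`). The actual layout of the files in `output/` is not shown here and may differ, and the records and the `summarize` helper are hypothetical.

```python
from statistics import mean

# Label text copied from the `hallucination` radio scheme in config.yaml.
HALLUCINATION_YES = "Yes, hallucinations present"

# Hypothetical flattened records, one per (annotator, item) judgement.
# This is an assumed shape for illustration, not Potato's output format.
records = [
    {"id": "1", "accuracy": 5, "detail": 4, "hallucination": "No hallucinations"},
    {"id": "1", "accuracy": 4, "detail": 4, "hallucination": "No hallucinations"},
    {"id": "2", "accuracy": 2, "detail": 3, "hallucination": HALLUCINATION_YES},
]


def summarize(records):
    """Return mean Likert scores and hallucination rate for each item id."""
    by_item = {}
    for record in records:
        by_item.setdefault(record["id"], []).append(record)

    summary = {}
    for item_id, rows in by_item.items():
        flagged = sum(r["hallucination"] == HALLUCINATION_YES for r in rows)
        summary[item_id] = {
            "mean_accuracy": mean(r["accuracy"] for r in rows),
            "mean_detail": mean(r["detail"] for r in rows),
            "hallucination_rate": flagged / len(rows),
        }
    return summary


for item_id, stats in summarize(records).items():
    print(item_id, stats)
```

Keeping the radio label in one constant avoids silent mismatches if the label text in config.yaml is later edited.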

Sample data: sample-data.json

[
  {
    "id": "1",
    "image_url": "https://images.unsplash.com/photo-1543466835-00a7907e9de1?w=640",
    "caption": "A brown dog sitting on grass looking at the camera with its tongue out."
  },
  {
    "id": "2",
    "image_url": "https://images.unsplash.com/photo-1504208434309-cb69f4fe52b0?w=640",
    "caption": "A sunset over mountains with orange and purple clouds."
  }
]
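
Before loading a larger data file, it can help to check that every item carries the fields the configuration refers to. The sketch below is one possible pre-flight check in Python, assuming the required fields are exactly the values of `id_key`, `image_key`, and `context_key` from config.yaml. The `find_problems` helper is hypothetical and not part of Potato.

```python
import json

# Assumed to mirror id_key, image_key and context_key in config.yaml.
REQUIRED_KEYS = ("id", "image_url", "caption")

# The two sample items from sample-data.json, inlined so the sketch runs as-is.
SAMPLE = """
[
  {"id": "1",
   "image_url": "https://images.unsplash.com/photo-1543466835-00a7907e9de1?w=640",
   "caption": "A brown dog sitting on grass looking at the camera with its tongue out."},
  {"id": "2",
   "image_url": "https://images.unsplash.com/photo-1504208434309-cb69f4fe52b0?w=640",
   "caption": "A sunset over mountains with orange and purple clouds."}
]
"""


def find_problems(items, required_keys=REQUIRED_KEYS):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Every required field must be a non-empty string.
        for key in required_keys:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # Ids must be unique so annotations can be matched back to items.
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


problems = find_problems(json.loads(SAMPLE))
print(problems or "all items look well-formed")
```

To check a real file, replace `json.loads(SAMPLE)` with the result of `json.load` on the opened data file.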

Get this design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/image-captioning-eval
potato start config.yaml

Details

Annotation types

likert, radio

Domain

Computer Vision, NLP

Use cases

Image Captioning, VLM Evaluation

Tags

captioning, image, evaluation, vlm
