Video-ChatGPT - Video QA Display and Evaluation
Video question answering evaluation based on the Video-ChatGPT benchmark (Maaz et al., ACL 2024). Annotators watch a video, review a model-generated response to a question, and evaluate correctness and quality.
Configuration file: config.yaml
# Video-ChatGPT - Video QA Display and Evaluation
# Based on Maaz et al., ACL 2024
# Paper: https://arxiv.org/abs/2306.05424
# Dataset: https://github.com/mbzuai-oryx/Video-ChatGPT
#
# This task presents a video along with a question and a model-generated
# response. Annotators evaluate the correctness of the response and rate
# the overall quality of the model's answer.
#
# Correctness Labels:
# - Correct: The response accurately answers the question
# - Partially Correct: The response contains some correct and some incorrect information
# - Incorrect: The response does not accurately answer the question
#
# Annotation Guidelines:
# 1. Watch the video carefully
# 2. Read the question
# 3. Review the model's response
# 4. Judge the correctness of the response based on the video content
# 5. Rate the overall response quality
annotation_task_name: "Video-ChatGPT - Video QA Display and Evaluation"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: video
    name: video_display
    description: "Video display for evaluation"
  - annotation_type: radio
    name: correctness
    description: "How correct is the model's response based on the video content?"
    labels:
      - "Correct"
      - "Partially Correct"
      - "Incorrect"
    keyboard_shortcuts:
      "Correct": "1"
      "Partially Correct": "2"
      "Incorrect": "3"
    tooltips:
      "Correct": "The response accurately and fully answers the question based on the video"
      "Partially Correct": "The response contains some correct elements but is incomplete or partially wrong"
      "Incorrect": "The response does not accurately answer the question based on the video"
  - annotation_type: likert
    name: response_quality
    description: "Rate the overall quality of the model's response"
    min_label: "Very Poor"
    max_label: "Excellent"
    size: 5
annotation_instructions: |
  You will be shown a video, a question about the video, and a model-generated response.
  1. Watch the video carefully (you may replay it as needed).
  2. Read the question and the model's response.
  3. Judge the correctness: Correct, Partially Correct, or Incorrect.
  4. Rate the overall response quality on a 5-point scale.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #1e293b; border-radius: 8px; padding: 16px; margin-bottom: 16px; text-align: center;">
      <video controls style="max-width: 100%; border-radius: 4px;">
        <source src="{{video_url}}" type="video/mp4">
        Your browser does not support the video tag.
      </video>
    </div>
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207;">Question:</strong>
      <p style="font-size: 16px; line-height: 1.6; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px;">
      <strong style="color: #0369a1;">Model Response:</strong>
      <p style="font-size: 15px; line-height: 1.7; margin: 8px 0 0 0;">{{model_response}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
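Because `annotation_per_instance: 2` collects two correctness judgments per item, you can measure inter-annotator agreement once annotation is done. A minimal pure-Python sketch of Cohen's kappa over the three correctness labels (the two label lists below are hypothetical examples, not data from this task):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators over the same instances."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of instances where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical correctness labels from two annotators:
a = ["Correct", "Correct", "Incorrect", "Partially Correct"]
b = ["Correct", "Incorrect", "Incorrect", "Partially Correct"]
print(round(cohens_kappa(a, b), 3))  # → 0.636
```

In practice you would read the two annotators' label columns out of the JSON files that Potato writes to `annotation_output/`.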
Sample data: sample-data.json
[
{
"id": "vcgpt_001",
"text": "What is the person doing in the kitchen at the beginning of the video?",
"video_url": "videos/sample_001.mp4",
"model_response": "The person is chopping vegetables on a cutting board. They appear to be preparing ingredients for cooking, using a large chef's knife to slice carrots and celery into small pieces."
},
{
"id": "vcgpt_002",
"text": "How many people are present in the outdoor scene and what are they doing?",
"video_url": "videos/sample_002.mp4",
"model_response": "There are three people in the outdoor scene. Two of them are playing frisbee on a grassy field while the third person is sitting on a bench watching them. The weather appears to be sunny."
}
]
// ... and 8 more items
Get this layout
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/video-chatgpt-qa-display
potato start config.yaml
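Before launching, it can help to sanity-check that every item in `sample-data.json` carries the keys the config and layout reference: `id` and `text` from `item_properties`, plus `video_url` and `model_response` used by the `{{...}}` placeholders in `html_layout`. A minimal sketch (the inline items below stand in for the real file; in practice you would `json.load` `sample-data.json`):

```python
import json

# Keys referenced by item_properties and the html_layout placeholders.
REQUIRED_KEYS = {"id", "text", "video_url", "model_response"}

def check_items(items):
    """Return (item id, sorted missing keys) pairs for incomplete items."""
    problems = []
    for i, item in enumerate(items):
        missing = sorted(REQUIRED_KEYS - item.keys())
        if missing:
            problems.append((item.get("id", f"index {i}"), missing))
    return problems

# Inline stand-in for json.load(open("sample-data.json")):
items = [
    {"id": "vcgpt_001", "text": "What is the person doing?",
     "video_url": "videos/sample_001.mp4", "model_response": "Chopping vegetables."},
    {"id": "vcgpt_bad", "text": "A deliberately incomplete item."},
]
print(check_items(items))  # → [('vcgpt_bad', ['model_response', 'video_url'])]
```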
Found a problem or want to improve this layout? Open an issue.
Related layouts
RT-2 - Robotic Action Annotation
Robotic manipulation task evaluation and action segmentation based on RT-2 (Brohan et al., CoRL 2023). Annotators evaluate task success, describe actions, rate execution quality, and segment video into action phases.
TVSum Video Summarization
Frame-level importance scoring for video summarization. Annotators rate 2-second shots on a 1-5 importance scale to identify key moments worth including in a summary.
ESA: Error Span Annotation for Machine Translation
Error span annotation for machine translation output. Annotators identify error spans in translations, classify error types (accuracy, fluency, terminology, style), and rate severity.