VQA v2.0 - Visual Question Answering
A visual question answering task in which annotators answer natural language questions about images, based on the VQA v2.0 dataset (Goyal et al., CVPR 2017). Supports both open-ended and yes/no question types.
Config file: config.yaml
# VQA v2.0 - Visual Question Answering
# Based on Goyal et al., CVPR 2017
# Paper: https://arxiv.org/abs/1612.00837
# Dataset: https://visualqa.org/
#
# This task presents an image and a natural language question about it.
# Annotators provide a free-form answer and, for yes/no questions,
# also select from a radio button group.
#
# Question Types:
# - yes/no: Questions that can be answered with yes or no
# - number: Questions asking about counts or quantities
# - other: Open-ended questions requiring descriptive answers
#
# Annotation Guidelines:
# 1. Look at the image carefully
# 2. Read the question
# 3. Provide a concise, accurate answer in the text field
# 4. For yes/no questions, also select the appropriate radio button
# 5. Answer based only on what you can see in the image
annotation_task_name: "VQA v2.0 - Visual Question Answering"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  # Step 1: Free-form answer
  - annotation_type: text
    name: answer
    description: "Provide a concise answer to the question about the image"

  # Step 2: Yes/No/NA for applicable questions
  - annotation_type: radio
    name: yes_no_answer
    description: "For yes/no questions, select the appropriate answer. For other question types, select Not Applicable."
    labels:
      - "Yes"
      - "No"
      - "Not Applicable"
    keyboard_shortcuts:
      "Yes": "1"
      "No": "2"
      "Not Applicable": "3"
    tooltips:
      "Yes": "The answer to the yes/no question is yes"
      "No": "The answer to the yes/no question is no"
      "Not Applicable": "The question is not a yes/no question"

annotation_instructions: |
  You will be shown an image and a question about it. Your task is to:
  1. Study the image carefully.
  2. Read the question.
  3. Type a concise answer in the text field (1-3 words preferred).
  4. If the question is a yes/no question, also select Yes or No. Otherwise, select Not Applicable.
  Answer based only on what you can see in the image. Keep answers brief and specific.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin-bottom: 16px; text-align: center;">
      <img src="{{image_url}}" style="max-width: 100%; max-height: 500px; border-radius: 4px;" alt="Image for question answering" />
    </div>
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207; font-size: 18px;">Question:</strong>
      <p style="font-size: 17px; line-height: 1.6; margin: 8px 0 0 0; font-weight: 500;">{{text}}</p>
      <p style="font-size: 13px; color: #6b7280; margin: 8px 0 0 0;">Question type: {{question_type}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
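Because `annotation_per_instance` is 2, downstream analysis needs to reconcile paired judgments per item. Below is a minimal sketch of that reconciliation, assuming each annotator's output is a JSON list of records with `id` and `yes_no_answer` fields; the actual layout of files under `annotation_output/` depends on your Potato version, so treat the loading helper as illustrative:

```python
import json
from collections import defaultdict


def load_annotations(paths):
    """Collect annotation records per item id, one input file per annotator.

    Assumes each file is a JSON list of records shaped like
    {"id": ..., "answer": ..., "yes_no_answer": ...}. The real layout of
    Potato's annotation_output/ may differ, so adapt the parsing as needed.
    """
    by_item = defaultdict(list)
    for path in paths:
        with open(path) as f:
            for record in json.load(f):
                by_item[record["id"]].append(record)
    return by_item


def yes_no_agreement(by_item):
    """Fraction of doubly-annotated items whose radio answers match."""
    pairs = [recs for recs in by_item.values() if len(recs) == 2]
    if not pairs:
        return 0.0
    agree = sum(1 for a, b in pairs if a["yes_no_answer"] == b["yes_no_answer"])
    return agree / len(pairs)
```

For the free-form `answer` field, exact string matching is usually too strict; the VQA literature normalizes answers (lowercasing, stripping articles and punctuation) before comparing, which would slot in before the equality check above.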
Sample data: sample-data.json
[
{
"id": "vqa_001",
"text": "What color is the fire hydrant?",
"image_url": "images/vqa_001.jpg",
"question_type": "other"
},
{
"id": "vqa_002",
"text": "Is the person wearing a hat?",
"image_url": "images/vqa_002.jpg",
"question_type": "yes/no"
}
]
// ... and 8 more items

Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/image/visual-qa/vqav2-visual-question-answering
potato start config.yaml
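Before starting the server, it can be worth checking that every item in sample-data.json carries the fields the config and html_layout reference (`id`, `text`, `image_url`, `question_type`). A small validation sketch; the required-key list mirrors the config above, and `VALID_TYPES` follows the three question types documented in the config comments:

```python
import json

REQUIRED_KEYS = {"id", "text", "image_url", "question_type"}
VALID_TYPES = {"yes/no", "number", "other"}


def validate_items(items):
    """Return a list of (item_id, problem) tuples; an empty list means all items pass."""
    problems = []
    seen_ids = set()
    for i, item in enumerate(items):
        item_id = item.get("id", f"<index {i}>")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append((item_id, f"missing keys: {sorted(missing)}"))
        if item.get("question_type") not in VALID_TYPES:
            problems.append((item_id, f"bad question_type: {item.get('question_type')!r}"))
        if item_id in seen_ids:
            problems.append((item_id, "duplicate id"))
        seen_ids.add(item_id)
    return problems


if __name__ == "__main__":
    with open("sample-data.json") as f:
        for item_id, problem in validate_items(json.load(f)):
            print(f"{item_id}: {problem}")
```

Potato itself only needs `id_key` and `text_key` to resolve, but a missing `image_url` or `question_type` would leave a broken image or an empty "Question type:" line in the rendered layout, so catching it up front saves annotator confusion.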
Found an issue or want to improve this design? Open an issue.

Related designs
CUB-200-2011 Fine-Grained Bird Classification
Fine-grained visual categorization of 200 bird species (Wah et al., 2011). Annotate bird images with species labels, part locations, and attribute annotations.
EPIC-KITCHENS Egocentric Action Annotation
Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.
FLAIR: French Land Cover from Aerospace Imagery
Land use and land cover classification from high-resolution aerial imagery. Annotators classify the primary land use category of aerial image patches and identify any secondary land uses present. Based on the FLAIR dataset from the French National Institute of Geographic and Forest Information (IGN).