beginner, image
Visual Question Answering
Answer questions about images for VQA dataset creation.
Configuration File: config.yaml
annotation_task_name: "Visual Question Answering"
task_name: "Visual Question Answering"
task_description: "Answer the question about the image."
task_dir: "."
port: 8000
data_files:
  - "sample-data.json"
item_properties:
  id_key: "id"
  text_key: "image_url"
  image_key: "image_url"
  context_key: "question"
annotation_schemes:
  - annotation_type: text
    name: answer
    description: "Provide a concise answer to the question"
    required: true
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your answer?"
    labels:
      - "Very confident"
      - "Somewhat confident"
      - "Not confident"
    required: true
output_annotation_dir: "output/"
output_annotation_format: "json"
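The item_properties block maps config keys to field names in each data item, so a data file that lacks one of those fields (for example "question") will not line up with the config. A minimal sketch of a pre-flight check, assuming the data file is a JSON array of objects; find_missing_fields and check_data_file are hypothetical helpers written for illustration and are not part of Potato:

```python
import json

# Field names copied by hand from item_properties in config.yaml.
ITEM_PROPERTIES = {
    "id_key": "id",
    "text_key": "image_url",
    "image_key": "image_url",
    "context_key": "question",
}


def find_missing_fields(items, item_properties):
    """Return (item index, field name) pairs for fields the config names but an item lacks."""
    required = sorted(set(item_properties.values()))
    problems = []
    for index, item in enumerate(items):
        for field in required:
            if field not in item:
                problems.append((index, field))
    return problems


def check_data_file(path, item_properties=ITEM_PROPERTIES):
    """Load a JSON array from path and report missing fields (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        items = json.load(handle)
    return find_missing_fields(items, item_properties)
```

Calling check_data_file("sample-data.json") before starting the server would return an empty list when every item carries the id, image_url, and question fields.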
Sample Data: sample-data.json
[
  {
    "id": "1",
    "image_url": "https://images.unsplash.com/photo-1560807707-8cc77767d783?w=640",
    "question": "What color is the dog?"
  },
  {
    "id": "2",
    "image_url": "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=640",
    "question": "What time of day does this appear to be?"
  }
]
Get This Design
View on GitHub
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/visual-qa
potato start config.yaml
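To run the same design on your own images, the data file needs one object per image-question pair with the id, image_url, and question fields that config.yaml refers to. A rough sketch of generating such a file, assuming sequential string ids are acceptable; build_items and write_data_file are hypothetical helpers, not part of Potato:

```python
import json


def build_items(pairs):
    """Turn (image_url, question) pairs into items with sequential string ids."""
    return [
        {"id": str(number), "image_url": url, "question": question}
        for number, (url, question) in enumerate(pairs, start=1)
    ]


def write_data_file(path, pairs):
    """Write the items as a JSON array, the layout sample-data.json uses."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_items(pairs), handle, indent=2)
```

The output path passed to write_data_file would then need to match an entry under data_files in config.yaml.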
Details
Annotation Types
radio, text
Domain
Computer Vision, NLP
Use Cases
Visual QA, Multimodal
Tags
vqa, visual-qa, multimodal, image
Related Designs
TextVQA - Reading Text in Images
Visual question answering that requires reading and reasoning about text present in images. Based on the TextVQA dataset (Singh et al., CVPR 2019), annotators answer questions about images where understanding scene text (signs, labels, menus, etc.) is essential.
text, radio
CUB-200-2011 Fine-Grained Bird Classification
Fine-grained visual categorization of 200 bird species (Wah et al., 2011). Annotate bird images with species labels, part locations, and attribute annotations.
multiselect, radio
EPIC-KITCHENS Egocentric Action Annotation
Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.
radio, text