beginner, image
Visual Question Answering
Answer questions about images for VQA dataset creation.
Configuration file: config.yaml
annotation_task_name: "Visual Question Answering"
task_name: "Visual Question Answering"
task_description: "Answer the question about the image."
task_dir: "."
port: 8000
data_files:
  - "sample-data.json"
item_properties:
  id_key: "id"
  text_key: "image_url"
  image_key: "image_url"
  context_key: "question"
annotation_schemes:
  - annotation_type: text
    name: answer
    description: "Provide a concise answer to the question"
    required: true
  - annotation_type: radio
    name: confidence
    description: "How confident are you in your answer?"
    labels:
      - "Very confident"
      - "Somewhat confident"
      - "Not confident"
    required: true
output_annotation_dir: "output/"
output_annotation_format: "json"
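The key names under item_properties appear to point at fields of each data item (id, image_url, question), so a typo in either file could leave an item without its image or question. The following is a minimal sketch of a pre-flight check, assuming that reading of the config. The helper find_missing_fields is hypothetical and not part of Potato, the key names are copied by hand from the config rather than parsed from it, and the example.com URLs are placeholders. The second item deliberately omits its question to show what a failure looks like.

```python
import json

# Key names copied by hand from item_properties in config.yaml.
ITEM_PROPERTIES = {
    "id_key": "id",
    "image_key": "image_url",
    "context_key": "question",
}

# Items in the same shape as sample-data.json. The URLs are placeholders,
# and the second item is missing "question" on purpose.
SAMPLE_JSON = """
[
  {"id": "1", "image_url": "https://example.com/dog.jpg",
   "question": "What color is the dog?"},
  {"id": "2", "image_url": "https://example.com/street.jpg"}
]
"""


def find_missing_fields(items, item_properties):
    """Hypothetical helper (not part of Potato).

    Returns a dict mapping each problematic item's id to the list of
    configured fields that are absent or empty in that item.
    """
    required = sorted(set(item_properties.values()))
    problems = {}
    for index, item in enumerate(items):
        missing = [field for field in required if not item.get(field)]
        if missing:
            # Fall back to the list position when the id itself is missing.
            problems[item.get("id", f"<item {index}>")] = missing
    return problems


items = json.loads(SAMPLE_JSON)
problems = find_missing_fields(items, ITEM_PROPERTIES)
print(problems)
```

Running a check like this against the real data file before starting the server is one way to catch a mismatched key early; an empty result means every item carries all configured fields.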
Sample data: sample-data.json
[
  {
    "id": "1",
    "image_url": "https://images.unsplash.com/photo-1560807707-8cc77767d783?w=640",
    "question": "What color is the dog?"
  },
  {
    "id": "2",
    "image_url": "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=640",
    "question": "What time of day does this appear to be?"
  }
]
Get this design
View on GitHub
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/visual-qa
potato start config.yaml
Details
Annotation types
radio, text
Domain
Computer Vision, NLP
Use case
Visual QA, Multimodal
Tags
vqa, visual-qa, multimodal, image
Found an issue or want to improve this design?
Open an issue
Related designs
TextVQA - Reading Text in Images
Visual question answering that requires reading and reasoning about text present in images. Based on the TextVQA dataset (Singh et al., CVPR 2019), annotators answer questions about images where understanding scene text (signs, labels, menus, etc.) is essential.
text, radio
CUB-200-2011 Fine-Grained Bird Classification
Fine-grained visual categorization of 200 bird species (Wah et al., 2011). Annotate bird images with species labels, part locations, and attribute annotations.
multiselect, radio
EPIC-KITCHENS Egocentric Action Annotation
Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.
radio, text