Visual Question Answering

Answer questions about images for VQA dataset creation.

Configuration file: config.yaml

annotation_task_name: "Visual Question Answering"
task_name: "Visual Question Answering"
task_description: "Answer the question about the image."
task_dir: "."
port: 8000

data_files:
  - "sample-data.json"

item_properties:
  id_key: "id"
  text_key: "image_url"
  image_key: "image_url"
  context_key: "question"

annotation_schemes:
  - annotation_type: text
    name: answer
    description: "Provide a concise answer to the question"
    required: true

  - annotation_type: radio
    name: confidence
    description: "How confident are you in your answer?"
    labels:
      - "Very confident"
      - "Somewhat confident"
      - "Not confident"
    required: true

output_annotation_dir: "output/"
output_annotation_format: "json"

Sample data: sample-data.json

[
  {
    "id": "1",
    "image_url": "https://images.unsplash.com/photo-1560807707-8cc77767d783?w=640",
    "question": "What color is the dog?"
  },
  {
    "id": "2",
    "image_url": "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=640",
    "question": "What time of day does this appear to be?"
  }
]
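Before loading a data file, it can help to confirm that every record carries the fields named under item_properties in the configuration. The following is a minimal sketch, not part of Potato itself: validate_items is a hypothetical helper, and it assumes that all three fields ("id", "image_url", "question") should be non-empty strings and that ids should be unique.

```python
import json

# Field names taken from item_properties in config.yaml.
REQUIRED_KEYS = ("id", "image_url", "question")


def validate_items(items):
    """Return a list of problems found; an empty list means no issues were detected."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Assumption: every required field must be a non-empty string.
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # Assumption: ids must be unique across the file.
        item_id = item.get("id")
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


if __name__ == "__main__":
    sample = json.loads("""
    [
      {"id": "1",
       "image_url": "https://images.unsplash.com/photo-1560807707-8cc77767d783?w=640",
       "question": "What color is the dog?"},
      {"id": "2",
       "image_url": "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=640",
       "question": "What time of day does this appear to be?"}
    ]
    """)
    for problem in validate_items(sample):
        print(problem)
```

Run against the two sample records above, the check reports no problems. To check a file on disk, replace the inline string with json.load on the opened file.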

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/visual-qa
potato start config.yaml
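To annotate your own images instead of the bundled samples, the data file needs records in the same shape as sample-data.json. The sketch below is one way to generate such a file; build_items and write_data_file are hypothetical helper names, the example.com URL is a placeholder, and sequential string ids are an assumption modeled on the sample data.

```python
import json
from pathlib import Path


def build_items(pairs):
    """Turn (image_url, question) pairs into records with sequential string ids."""
    return [
        {"id": str(number), "image_url": image_url, "question": question}
        for number, (image_url, question) in enumerate(pairs, start=1)
    ]


def write_data_file(pairs, path="sample-data.json"):
    """Write the records as a JSON array to `path` and return them."""
    items = build_items(pairs)
    Path(path).write_text(json.dumps(items, indent=2) + "\n", encoding="utf-8")
    return items


if __name__ == "__main__":
    # Placeholder input; substitute your own image URLs and questions.
    pairs = [("https://example.com/kitchen.jpg", "How many chairs are visible?")]
    print(json.dumps(build_items(pairs), indent=2))
```

Calling write_data_file(pairs) overwrites sample-data.json in the current directory, so pass a different path (and list it under data_files in config.yaml) if you want to keep the original samples.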

Details

Annotation types

radio, text

Domain

Computer Vision, NLP

Use cases

Visual QA, Multimodal

Tags

vqa, visual-qa, multimodal, image

Found an issue or want to improve this design?

Open an issue

Related designs

TextVQA - Reading Text in Images

Visual question answering that requires reading and reasoning about text present in images. Based on the TextVQA dataset (Singh et al., CVPR 2019), annotators answer questions about images where understanding scene text (signs, labels, menus, etc.) is essential.

text, radio

CUB-200-2011 Fine-Grained Bird Classification

Fine-grained visual categorization of 200 bird species (Wah et al., 2011). Annotate bird images with species labels, part locations, and attribute annotations.

multiselect, radio

EPIC-KITCHENS Egocentric Action Annotation

Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.

radio, text