Visual Question Answering

Answer questions about images for VQA dataset creation.

Configuration file: config.yaml

annotation_task_name: "Visual Question Answering"
task_name: "Visual Question Answering"
task_description: "Answer the question about the image."
task_dir: "."
port: 8000

data_files:
  - "sample-data.json"

item_properties:
  id_key: "id"
  text_key: "image_url"
  image_key: "image_url"
  context_key: "question"

annotation_schemes:
  - annotation_type: text
    name: answer
    description: "Provide a concise answer to the question"
    required: true

  - annotation_type: radio
    name: confidence
    description: "How confident are you in your answer?"
    labels:
      - "Very confident"
      - "Somewhat confident"
      - "Not confident"
    required: true

output_annotation_dir: "output/"
output_annotation_format: "json"

Sample data: sample-data.json

[
  {
    "id": "1",
    "image_url": "https://images.unsplash.com/photo-1560807707-8cc77767d783?w=640",
    "question": "What color is the dog?"
  },
  {
    "id": "2",
    "image_url": "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=640",
    "question": "What time of day does this appear to be?"
  }
]
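
Before starting the server, it can help to confirm that every item in the data file carries the fields named under item_properties in config.yaml. The following is a minimal sketch, not part of Potato itself: the function names validate_items and validate_file are hypothetical, and the rule that each field must be a non-empty string is an assumption based on the sample records above.

```python
import json

# Field names taken from item_properties in config.yaml
# (id_key, image_key/text_key, context_key).
REQUIRED_KEYS = ("id", "image_url", "question")


def validate_items(items):
    """Return a list of problems found; an empty list means no issues."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Assumption: every required field must be a non-empty string.
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # Flag repeated ids, which would make annotations ambiguous.
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


def validate_file(path):
    """Load a JSON array of items from path and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_items(json.load(handle))
```

Calling validate_file("sample-data.json") from the task directory should return an empty list for the two sample records shown above; any returned strings name the offending item by its position in the array.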

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/visual-qa
potato start config.yaml

Details

Annotation types

radio, text

Domain

Computer Vision, NLP

Use cases

Visual QA, Multimodal

Tags

vqa, visual-qa, multimodal, image


Related designs

TextVQA - Reading Text in Images

Visual question answering that requires reading and reasoning about text present in images. Based on the TextVQA dataset (Singh et al., CVPR 2019), annotators answer questions about images where understanding scene text (signs, labels, menus, etc.) is essential.

text, radio

CUB-200-2011 Fine-Grained Bird Classification

Fine-grained visual categorization of 200 bird species (Wah et al., 2011). Annotate bird images with species labels, part locations, and attribute annotations.

multiselect, radio

EPIC-KITCHENS Egocentric Action Annotation

Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.

radio, text