ScienceQA Multimodal Reasoning
Multimodal science question answering with chain-of-thought reasoning, based on ScienceQA (Lu et al., NeurIPS 2022). Annotators answer multiple-choice science questions that may include images, provide chain-of-thought explanations, and categorize the science domain.
Configuration File: config.yaml
# ScienceQA Multimodal Reasoning
# Based on Lu et al., NeurIPS 2022
# Paper: https://arxiv.org/abs/2209.09513
# Dataset: https://scienceqa.github.io/
#
# Multimodal science question answering with chain-of-thought reasoning.
# Questions span natural science, social science, and language science,
# and may include images (diagrams, charts, photos). Annotators select
# the correct answer, provide a chain-of-thought explanation, and
# categorize the science domain.
#
# Answer Options:
# - A, B, C, D: Four possible answers; exactly one is correct
#
# Science Domains:
# - Natural Science: Biology, physics, chemistry, earth science
# - Social Science: History, geography, economics, civics
# - Language Science: Grammar, vocabulary, reading comprehension
#
# Annotation Guidelines:
# 1. Examine any image provided carefully
# 2. Read the question and all four options
# 3. Select the correct answer
# 4. Write a step-by-step chain-of-thought explanation
# 5. Categorize the question into its science domain(s)
annotation_task_name: "ScienceQA Multimodal Reasoning"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  # Step 1: Select the correct answer
  - annotation_type: radio
    name: answer
    description: "Select the correct answer to the science question."
    labels:
      - "A"
      - "B"
      - "C"
      - "D"
    keyboard_shortcuts:
      "A": "1"
      "B": "2"
      "C": "3"
      "D": "4"
    tooltips:
      "A": "Select option A"
      "B": "Select option B"
      "C": "Select option C"
      "D": "Select option D"
  # Step 2: Chain-of-thought explanation
  - annotation_type: text
    name: chain_of_thought
    description: "Provide a step-by-step chain-of-thought explanation for your answer."
    textarea: true
    required: false
    placeholder: "Step 1: ... Step 2: ... Therefore, the answer is ..."
  # Step 3: Science domain categorization
  - annotation_type: multiselect
    name: science_domain
    description: "Which science domain(s) does this question belong to? Select all that apply."
    labels:
      - "Natural Science"
      - "Social Science"
      - "Language Science"
    tooltips:
      "Natural Science": "Biology, physics, chemistry, earth science, astronomy"
      "Social Science": "History, geography, economics, civics, sociology"
      "Language Science": "Grammar, vocabulary, reading comprehension, linguistics"
annotation_instructions: |
  You will answer science questions from the ScienceQA benchmark. Questions may include images.
  For each item:
  1. If an image is provided, examine it carefully.
  2. Read the question and all four answer options (A-D).
  3. Select the single correct answer.
  4. Write a chain-of-thought explanation showing your reasoning step by step.
  5. Categorize the question into its science domain(s).
  Chain-of-Thought Tips:
  - Break your reasoning into clear, numbered steps.
  - Reference specific information from the image or question.
  - Explain why incorrect options are wrong when helpful.
  - Conclude with "Therefore, the answer is [letter]."
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #e8eaf6; padding: 8px 15px; border-radius: 8px; margin-bottom: 16px;">
      <strong>Subject:</strong> {{subject}}
    </div>
    <div style="text-align: center; margin-bottom: 16px;">
      <img src="{{image_url}}" style="max-width: 100%; max-height: 400px; border: 1px solid #ddd; border-radius: 8px;" />
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Question:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">A:</strong> {{option_a}}
      </div>
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">B:</strong> {{option_b}}
      </div>
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">C:</strong> {{option_c}}
      </div>
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">D:</strong> {{option_d}}
      </div>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
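Because `annotation_per_instance: 2` sends each item to two annotators, a common downstream step is checking how often the two answers agree. The sketch below computes simple percent agreement on the `answer` scheme; the flat record layout (`id` / `annotator` / `answer` per annotation) is an assumption for illustration, not Potato's exact output format.

```python
# Percent agreement on the "answer" scheme for doubly-annotated items.
# NOTE: the record layout below is an assumed flattening of the JSON
# output, not Potato's exact on-disk format.
from collections import defaultdict

def answer_agreement(records):
    """records: iterable of {"id": ..., "annotator": ..., "answer": ...}."""
    by_item = defaultdict(list)
    for r in records:
        by_item[r["id"]].append(r["answer"])
    # Keep only items that actually received both annotations.
    pairs = {k: v for k, v in by_item.items() if len(v) == 2}
    if not pairs:
        return 0.0
    agree = sum(1 for answers in pairs.values() if answers[0] == answers[1])
    return agree / len(pairs)

demo = [
    {"id": "sciqa_001", "annotator": "a1", "answer": "A"},
    {"id": "sciqa_001", "annotator": "a2", "answer": "A"},
    {"id": "sciqa_002", "annotator": "a1", "answer": "B"},
    {"id": "sciqa_002", "annotator": "a2", "answer": "C"},
]
print(answer_agreement(demo))  # → 0.5
```

For a publication-quality report you would likely replace percent agreement with a chance-corrected statistic such as Cohen's kappa.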
Sample Data: sample-data.json
[
{
"id": "sciqa_001",
"text": "Which of these organisms is a producer in the food chain shown in the diagram?",
"image_url": "https://example.com/scienceqa/image_001.jpg",
"option_a": "Grass",
"option_b": "Rabbit",
"option_c": "Fox",
"option_d": "Eagle",
"subject": "Biology"
},
{
"id": "sciqa_002",
"text": "Based on the weather map, which city is most likely to experience rain tomorrow?",
"image_url": "https://example.com/scienceqa/image_002.jpg",
"option_a": "City A, which is under a high-pressure system",
"option_b": "City B, which is near a cold front",
"option_c": "City C, which is in a clear area",
"option_d": "City D, which is far from any weather systems",
"subject": "Earth Science"
}
]
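Every `{{...}}` placeholder in `html_layout` must resolve to a field on each item, so it is worth validating the data file before launching the server. This sketch checks items against the field set implied by the config above (`id_key`, `text_key`, and the template placeholders); the helper name is hypothetical.

```python
# Validate sample items against the fields referenced by the config:
# id_key/text_key from item_properties plus the {{...}} placeholders
# in html_layout. The function name is illustrative only.
REQUIRED_FIELDS = {
    "id", "text", "image_url",
    "option_a", "option_b", "option_c", "option_d",
    "subject",
}

def missing_fields(items):
    """Return {item_id: sorted missing field names} for incomplete items."""
    problems = {}
    for item in items:
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            problems[item.get("id", "<no id>")] = sorted(missing)
    return problems

items = [
    {"id": "sciqa_001", "text": "...", "image_url": "...",
     "option_a": "Grass", "option_b": "Rabbit",
     "option_c": "Fox", "option_d": "Eagle", "subject": "Biology"},
    {"id": "sciqa_003", "text": "..."},  # deliberately incomplete
]
print(missing_fields(items))
```

An empty result means every item can be rendered by the layout without blank placeholders.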
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/multimodal/scienceqa-multimodal-reasoning
potato start config.yaml
Found a problem or want to improve this design? Submit an issue.
Related Designs
MMBench Multimodal Evaluation
Multimodal evaluation benchmark combining image understanding with multiple-choice questions, based on MMBench (Liu et al., ECCV 2024). Annotators answer image-based questions, provide explanations, and tag the required perception or reasoning skills.
SayCan - Robot Task Planning Evaluation
Evaluate robot action plans generated from natural language instructions, based on the SayCan framework (Ahn et al., CoRL 2022). Annotators assess feasibility, identify primitive actions, describe plans, and rate safety of grounded language-conditioned robot manipulation tasks.
DocBank Document Layout Detection
Document layout analysis benchmark (Li et al., COLING 2020). Detect and classify document elements including titles, abstracts, paragraphs, figures, tables, and captions.