SQuAD - Extractive Question Answering
Extractive question answering over Wikipedia passages, based on the Stanford Question Answering Dataset (Rajpurkar et al., EMNLP 2016). Annotators highlight the answer span in each passage, type the answer, and judge whether the question is answerable from the passage.
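In this design each item yields three linked judgments: a highlighted span, a typed answer, and an answerability label. The annotation instructions in config.yaml add one rule: an "Unanswerable" item should have no span and no typed answer. Below is a minimal post-hoc consistency check in Python.

It rests on several assumptions:
- The `QAAnnotation` record, its field names, and the (start, end) character-offset form of the span are hypothetical stand-ins, not Potato's actual output format.
- The typed answer is expected to repeat the highlighted text, ignoring case and whitespace. The guidelines do not state this rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QAAnnotation:
    # Hypothetical record shape; Potato's real output JSON may differ.
    answer_span: Optional[Tuple[int, int]]  # (start, end) char offsets, end exclusive
    typed_answer: str
    answerability: str  # "Answerable" or "Unanswerable"


def _norm(s: str) -> str:
    # Case- and whitespace-insensitive comparison key
    return " ".join(s.split()).lower()


def check_annotation(passage: str, ann: QAAnnotation) -> List[str]:
    """Return a list of problems; an empty list means the annotation is consistent."""
    problems: List[str] = []
    if ann.answerability == "Unanswerable":
        # Instructions: leave the span and the typed answer empty
        if ann.answer_span is not None:
            problems.append("unanswerable item has a highlighted span")
        if ann.typed_answer.strip():
            problems.append("unanswerable item has a typed answer")
        return problems
    if ann.answerability != "Answerable":
        return [f"unknown answerability label: {ann.answerability!r}"]
    # Answerable: expect both a typed answer and a valid span
    if not ann.typed_answer.strip():
        problems.append("answerable item is missing a typed answer")
    if ann.answer_span is None:
        problems.append("answerable item is missing a highlighted span")
        return problems
    start, end = ann.answer_span
    if not 0 <= start < end <= len(passage):
        problems.append(f"span {ann.answer_span} is outside the passage")
    elif ann.typed_answer.strip() and _norm(passage[start:end]) != _norm(ann.typed_answer):
        # Assumption: the typed answer should repeat the highlighted text
        problems.append("typed answer does not match the highlighted span")
    return problems


passage = ("The Normans were the people who in the 10th and 11th centuries "
           "gave their name to Normandy, a region in France.")
start = passage.index("France")
ann = QAAnnotation((start, start + len("France")), "France", "Answerable")
print(check_annotation(passage, ann))  # []
```

Returning a list of messages rather than a boolean lets a reviewer see every problem with an annotation at once.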
Configuration File: config.yaml
# SQuAD - Extractive Question Answering
# Based on Rajpurkar et al., EMNLP 2016
# Paper: https://aclanthology.org/D16-1264/
# Dataset: https://rajpurkar.github.io/SQuAD-explorer/
#
# This task presents a Wikipedia passage and a question. Annotators
# highlight the answer span in the passage, provide a typed answer,
# and indicate whether the question is answerable from the passage.
#
# Annotation Guidelines:
# 1. Read the question carefully before reading the passage
# 2. Read the passage and identify the answer span
# 3. Highlight the minimal span that answers the question
# 4. Type the answer in the text field
# 5. Indicate whether the question is answerable from the passage
annotation_task_name: "SQuAD - Extractive Question Answering"
task_dir: "."
data_files:
- sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  # Step 1: Highlight the answer span in the passage
  - annotation_type: span
    name: answer_span
    description: "Highlight the span in the passage that answers the question"
    labels:
      - "Answer Span"
    label_colors:
      "Answer Span": "#3b82f6"
  # Step 2: Type the answer
  - annotation_type: text
    name: typed_answer
    description: "Type the answer to the question"
  # Step 3: Answerability judgment
  - annotation_type: radio
    name: answerability
    description: "Is the question answerable from the given passage?"
    labels:
      - "Answerable"
      - "Unanswerable"
    keyboard_shortcuts:
      "Answerable": "1"
      "Unanswerable": "2"
    tooltips:
      "Answerable": "The passage contains sufficient information to answer the question"
      "Unanswerable": "The passage does not contain enough information to answer the question"
annotation_instructions: |
  You will be shown a passage from Wikipedia and a question about it. Your task is to:
  1. Highlight the exact span in the passage that answers the question.
  2. Type the answer text in the text field.
  3. Indicate whether the question is answerable from the passage.
  If the question cannot be answered from the passage, select "Unanswerable" and leave the span and text empty.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207; font-size: 18px;">Question:</strong>
      <p style="font-size: 17px; line-height: 1.6; margin: 8px 0 0 0; font-weight: 500;">{{question}}</p>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Passage:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
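Because `annotation_per_instance: 2` sends every item to two annotators, their typed answers can be compared using the metrics SQuAD itself uses for evaluation: exact match and token-level F1 after answer normalization. The sketch below reimplements that normalization as we understand it from the official SQuAD evaluation script. It lowercases the text, strips punctuation and the articles a/an/the, and collapses whitespace. The function names are our own and are not part of Potato. Treating two empty answers as full agreement (both annotators chose "Unanswerable") is an assumption, in line with how the SQuAD 2.0 evaluation script scores no-answer pairs.

```python
import re
import string
from collections import Counter
from typing import Dict, List, Tuple

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(a: str, b: str) -> bool:
    return normalize_answer(a) == normalize_answer(b)


def token_f1(a: str, b: str) -> float:
    a_toks = normalize_answer(a).split()
    b_toks = normalize_answer(b).split()
    if not a_toks or not b_toks:
        # Assumption: two empty answers (both "Unanswerable") agree fully;
        # one empty and one non-empty answer do not agree at all.
        return float(a_toks == b_toks)
    common = Counter(a_toks) & Counter(b_toks)  # multiset overlap of tokens
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(a_toks)
    recall = num_same / len(b_toks)
    return 2 * precision * recall / (precision + recall)


def pairwise_agreement(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average EM and F1 over (annotator_1_answer, annotator_2_answer) pairs."""
    if not pairs:
        return {"exact_match": 0.0, "f1": 0.0}
    em = sum(exact_match(a, b) for a, b in pairs) / len(pairs)
    f1 = sum(token_f1(a, b) for a, b in pairs) / len(pairs)
    return {"exact_match": em, "f1": f1}


print(pairwise_agreement([("France", "in France"), ("more than 20%", "20%")]))
# exact_match 0.0, f1 = 7/12 (about 0.583)
```

A pair such as ("France", "in France") scores 0 on exact match but about 0.67 on F1. That is the kind of boundary disagreement the "minimal span" guideline is meant to reduce.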
Sample Data: sample-data.json
[
{
"id": "squad_001",
"text": "The Normans were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia.",
"question": "In what country is Normandy located?"
},
{
"id": "squad_002",
"text": "The Amazon rainforest produces more than 20% of the world's oxygen. It covers 5.5 million square kilometers and spans nine countries in South America, with Brazil containing about 60% of the forest.",
"question": "What percentage of the world's oxygen does the Amazon rainforest produce?"
}
]
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/reading-comprehension/squad-extractive-qa
potato start config.yaml
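The config maps `id_key` and `text_key` to the `id` and `text` fields, and `html_layout` also interpolates `{{question}}`. Every item in sample-data.json therefore needs all three fields. Before running `potato start`, a small pre-flight check can catch missing fields or duplicate ids. The sketch below is one way to do it; `load_items` and `validate_items` are hypothetical helpers, not part of Potato.

```python
import json
from typing import Any, Dict, List, Sequence


def load_items(path: str) -> List[Dict[str, Any]]:
    """Read a JSON array of items such as sample-data.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def validate_items(
    items: Sequence[Dict[str, Any]],
    id_key: str = "id",            # mirrors item_properties.id_key
    text_key: str = "text",        # mirrors item_properties.text_key
    template_keys: Sequence[str] = ("question",),  # fields used as {{...}} in html_layout
) -> List[str]:
    """Return a list of problems; an empty list means every item has the needed fields."""
    problems: List[str] = []
    seen_ids = set()
    for i, item in enumerate(items):
        for key in (id_key, text_key, *template_keys):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {i}: missing or empty {key!r}")
        item_id = item.get(id_key)
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {i}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems
```

Running `validate_items(load_items("sample-data.json"))` from the task directory returns an empty list when every item carries a non-empty `id`, `text`, and `question` and no id repeats.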
Related Designs
Natural Questions - Open-Domain Question Answering
Open-domain question answering over Wikipedia passages, based on Google's Natural Questions dataset (Kwiatkowski et al., TACL 2019). Annotators identify both short and long answer spans and determine answerability.
Check-COVID: Fact-Checking COVID-19 News Claims
Fact-checking COVID-19 news claims. Annotators verify claims against evidence, identify supporting/refuting spans, and provide verdicts with explanations. Based on the Check-COVID dataset targeting misinformation during the pandemic.
Clickbait Spoiling
Classification and extraction of spoilers for clickbait posts, including spoiler type identification and span-level spoiler detection. Based on SemEval-2023 Task 5 (Hagen et al.).