NLPContributionGraph - Structured Extraction of NLP Contributions
Extract structured contribution information from NLP papers by annotating research problem, approach, model, dataset, metric, and result spans and forming contribution triples, based on SemEval-2021 Task 11 (D'Souza et al.).
Configuration file: config.yaml
# NLPContributionGraph - Structured Extraction of NLP Contributions
# Based on D'Souza et al., SemEval 2021
# Paper: https://aclanthology.org/2021.semeval-1.44/
# Dataset: https://ncg-task.github.io/
#
# Annotators extract structured scientific contribution information from
# NLP paper excerpts by marking spans for research problems, approaches,
# models, datasets, metrics, and results, then forming contribution triples.
annotation_task_name: "NLPContributionGraph - Structured Extraction of NLP Contributions"
task_dir: "."
data_files:
- sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: span
    name: contribution_spans
    description: "Highlight the key contribution elements in the text."
    labels:
      - "Research Problem"
      - "Approach"
      - "Model"
      - "Dataset"
      - "Metric"
      - "Result"
  - annotation_type: text
    name: contribution_triple
    description: "Write a contribution triple in the format: (subject, predicate, object). For example: (BERT, achieves, 92.3 F1 on CoNLL-2003)."
annotation_instructions: |
  You will see an excerpt from an NLP research paper. Your task is to:
  1. Read the text and identify key contribution elements.
  2. Highlight spans corresponding to: Research Problem, Approach, Model,
     Dataset, Metric, and Result.
  3. Write a contribution triple summarizing the main finding in the format:
     (subject, predicate, object).
  Example triple: (BERT-large, achieves state-of-the-art, 92.3 F1 on CoNLL-2003 NER)
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Paper Excerpt:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
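The contribution_triple field is free text, so any downstream use has to recover the three parts from what annotators type. The following is a minimal Python sketch, assuming annotators follow the (subject, predicate, object) format from the scheme description and that subject and predicate contain no commas; Triple and parse_triple are illustrative names, not part of Potato or the NCG task tooling.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Whole answer wrapped in one pair of parentheses, surrounding whitespace allowed.
_TRIPLE_RE = re.compile(r"^\s*\((.*)\)\s*$", re.DOTALL)


def parse_triple(raw: str) -> Optional[Triple]:
    """Parse a free-text '(subject, predicate, object)' answer.

    Assumes subject and predicate contain no commas: the inner text is split
    on the first two commas only, so any later commas stay in the object.
    Returns None when the answer does not have that shape.
    """
    match = _TRIPLE_RE.match(raw)
    if match is None:
        return None
    parts = [part.strip() for part in match.group(1).split(",", 2)]
    if len(parts) != 3 or not all(parts):
        return None
    return Triple(*parts)


print(parse_triple("(BERT, achieves, 92.3 F1 on CoNLL-2003)"))
print(parse_triple("BERT achieves 92.3 F1"))  # no parentheses, so None
```

Splitting on at most two commas keeps any commas that appear inside the object intact; a triple whose subject or predicate contains a comma would be mis-split and would need manual review or a stricter input format.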
Sample data: sample-data.json
[
  {
    "id": "ncg_001",
    "text": "We address the task of named entity recognition and propose a novel transformer-based architecture called NERFormer. Our model achieves an F1 score of 93.2 on the CoNLL-2003 dataset, surpassing previous state-of-the-art methods by 1.4 points."
  },
  {
    "id": "ncg_002",
    "text": "This paper tackles machine translation for low-resource language pairs. We introduce a cross-lingual transfer learning approach that leverages multilingual BERT to improve BLEU scores by 5.3 points on the FLORES benchmark for English-Nepali translation."
  }
]
// ... and 8 more items
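Span annotations are commonly represented as character offsets into the item text plus a label. The sketch below shows that representation on the ncg_001 excerpt. The phrase-to-label assignments are one hypothetical reading chosen for illustration, not gold NCG annotations, and Span and locate_spans are illustrative names rather than Potato's actual output format.

```python
from dataclasses import dataclass

# The six span labels from the contribution_spans scheme in config.yaml.
LABELS = ("Research Problem", "Approach", "Model", "Dataset", "Metric", "Result")


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the item text
    end: int    # exclusive character offset
    label: str
    text: str


def locate_spans(text: str, phrases: dict[str, str]) -> list[Span]:
    """Convert {phrase: label} pairs into character-offset spans.

    Uses the first occurrence of each phrase and raises ValueError if a
    phrase is absent from the text or a label is not in LABELS.
    """
    spans = []
    for phrase, label in phrases.items():
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        spans.append(Span(start, start + len(phrase), label, phrase))
    # Sort by position so spans read in document order.
    return sorted(spans, key=lambda s: s.start)


# Text of sample item ncg_001; the labels below are a hypothetical reading.
SAMPLE_TEXT = (
    "We address the task of named entity recognition and propose a novel "
    "transformer-based architecture called NERFormer. Our model achieves an "
    "F1 score of 93.2 on the CoNLL-2003 dataset, surpassing previous "
    "state-of-the-art methods by 1.4 points."
)
EXAMPLE_PHRASES = {
    "named entity recognition": "Research Problem",
    "transformer-based architecture": "Approach",
    "NERFormer": "Model",
    "F1 score": "Metric",
    "93.2": "Result",
    "CoNLL-2003": "Dataset",
}

for span in locate_spans(SAMPLE_TEXT, EXAMPLE_PHRASES):
    print(f"[{span.start}:{span.end}] {span.label}: {span.text}")
```

Using exclusive end offsets means text[start:end] returns the highlighted phrase exactly, which makes comparing the two annotations collected per instance a matter of comparing (start, end, label) tuples.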
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2021/task11-nlpcontributiongraph
potato start config.yaml
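Before launching, a quick check that every item carries the keys named under item_properties (id_key "id", text_key "text") can catch data-file mistakes early. This is a hypothetical helper, not a Potato command; it assumes, as in the sample, that the data file is a single JSON array of objects with non-empty string ids.

```python
import json
from pathlib import Path


def check_items(path: str, id_key: str = "id", text_key: str = "text") -> list[str]:
    """Return the problems found in a JSON array of items (empty list = OK).

    id_key and text_key mirror item_properties in config.yaml.
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(items, list):
        return [f"{path}: expected a JSON array of items"]
    problems = []
    seen = set()
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not a JSON object")
            continue
        item_id = item.get(id_key)
        # Assumption: ids are non-empty strings such as "ncg_001".
        if not isinstance(item_id, str) or not item_id:
            problems.append(f"item {i}: missing or empty {id_key!r}")
        elif item_id in seen:
            problems.append(f"item {i}: duplicate id {item_id!r}")
        else:
            seen.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {i}: missing or empty {text_key!r}")
    return problems
```

Calling check_items("sample-data.json") from the task directory should return an empty list when the file is well-formed; each string in a non-empty result names the offending item by its position in the array.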
Related Designs
Clickbait Spoiling
Classification and extraction of spoilers for clickbait posts, including spoiler type identification and span-level spoiler detection. Based on SemEval-2023 Task 5 (Hagen et al.).
Entity Linking in Tweets
Named entity recognition and entity linking in tweets, identifying entity mentions and mapping them to knowledge base URIs. Based on SemEval-2022 Task 12 (Agarwal et al.).
MeasEval - Counts and Measurements
Extract and classify measurements, quantities, units, and measured entities from scientific text, based on SemEval-2021 Task 8 (Harper et al.). Annotators span-annotate measurement components and classify quantity types with normalized values.