SciER: Scientific Entity and Relation Extraction
Entity and relation extraction for scientific papers. Annotators identify scientific entities (Method, Task, Material, Metric, Generic) and link them with typed relations (Used-for, Feature-of, Hyponym-of, Part-of, Compare, Conjunction, Evaluate-for).
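To make the task concrete, one annotated sentence can be pictured as a set of typed spans plus a directed relation between them. The record shape below is illustrative only (the field names are hypothetical, not Potato's actual output schema):

```python
# Illustrative record for one annotated sentence. Field names are
# hypothetical -- Potato's real output JSON may differ.
record = {
    "text": "We apply BERT to named entity recognition.",
    "entities": [
        {"id": "e1", "span": (9, 13), "label": "Method"},   # "BERT"
        {"id": "e2", "span": (17, 41), "label": "Task"},    # "named entity recognition"
    ],
    "relations": [
        # Relations are directional: the subject (BERT) is Used-for the object (the task).
        {"subj": "e1", "obj": "e2", "label": "Used-for"},
    ],
}

# Recover the surface form of each entity from its character span.
for ent in record["entities"]:
    start, end = ent["span"]
    print(ent["label"], "=", record["text"][start:end])
```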
Configuration file: config.yaml
# SciER: Scientific Entity and Relation Extraction
# Based on Zhang et al., EMNLP 2024
# Paper: https://aclanthology.org/2024.emnlp-main.59/
# Dataset: https://github.com/Alab-NII/SciER
#
# This task annotates scientific entities and relations in research papers.
# Annotators first identify scientific entities, then link them with typed
# relations describing how the entities interact.
#
# Entity Types:
# - Method: Algorithms, models, techniques, systems
# - Task: Research tasks, problems, objectives
# - Material: Datasets, corpora, resources, inputs
# - Metric: Evaluation measures, scores, performance indicators
# - Generic: General scientific terms not fitting other categories
#
# Relation Types:
# - Used-for: Entity A is used for entity B
# - Feature-of: Entity A is a feature/property of entity B
# - Hyponym-of: Entity A is a subtype of entity B
# - Part-of: Entity A is a component of entity B
# - Compare: Entity A is compared with entity B
# - Conjunction: Entity A and entity B are coordinated
# - Evaluate-for: Entity A is used to evaluate entity B
#
# Annotation Guidelines:
# 1. Read the sentence in context of the paper section
# 2. Identify all scientific entity mentions
# 3. Assign appropriate entity types
# 4. For each pair of related entities, select the relation type
# 5. Relations are directional: consider which entity is subject vs object
annotation_task_name: "SciER: Scientific Entity and Relation Extraction"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  # Step 1: Identify scientific entities
  - annotation_type: span
    name: scientific_entities
    description: "Highlight all scientific entities in the text"
    labels:
      - "Method"
      - "Task"
      - "Material"
      - "Metric"
      - "Generic"
    label_colors:
      "Method": "#3b82f6"
      "Task": "#ef4444"
      "Material": "#22c55e"
      "Metric": "#f59e0b"
      "Generic": "#8b5cf6"
    keyboard_shortcuts:
      "Method": "1"
      "Task": "2"
      "Material": "3"
      "Metric": "4"
      "Generic": "5"
    tooltips:
      "Method": "Algorithms, models, techniques, or systems (e.g., BERT, gradient descent, CNN)"
      "Task": "Research tasks, problems, or objectives (e.g., named entity recognition, machine translation)"
      "Material": "Datasets, corpora, resources, or inputs (e.g., CoNLL-2003, Wikipedia, training data)"
      "Metric": "Evaluation measures or performance indicators (e.g., F1 score, BLEU, accuracy)"
      "Generic": "General scientific terms not fitting other categories (e.g., features, parameters, embeddings)"
    allow_overlapping: false
  # Step 2: Link entities with typed relations
  - annotation_type: span_link
    name: scientific_relations
    description: "Draw relations between pairs of scientific entities"
    labels:
      - "Used-for"
      - "Feature-of"
      - "Hyponym-of"
      - "Part-of"
      - "Compare"
      - "Conjunction"
      - "Evaluate-for"
    tooltips:
      "Used-for": "Entity A is used for entity B (e.g., BERT is Used-for NER)"
      "Feature-of": "Entity A is a feature or property of entity B"
      "Hyponym-of": "Entity A is a subtype or instance of entity B"
      "Part-of": "Entity A is a component or part of entity B"
      "Compare": "Entity A is compared with entity B"
      "Conjunction": "Entity A and entity B are coordinated or listed together"
      "Evaluate-for": "Entity A is used to evaluate entity B (e.g., F1 is Evaluate-for NER)"
html_layout: |
  <div style="margin-bottom: 10px; padding: 8px; background: #f0f4f8; border-radius: 4px;">
    <strong>Section:</strong> {{context}}
  </div>
  <div style="font-size: 16px; line-height: 1.6;">
    {{text}}
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
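The two schemes above produce linked annotations: span labels from `scientific_entities` and typed, directional links from `scientific_relations`. A minimal validation sketch for such output, assuming a hypothetical record shape (Potato's real JSON layout may differ):

```python
# Label sets copied from the config above.
ENTITY_LABELS = {"Method", "Task", "Material", "Metric", "Generic"}
RELATION_LABELS = {"Used-for", "Feature-of", "Hyponym-of", "Part-of",
                   "Compare", "Conjunction", "Evaluate-for"}

def validate(record):
    """Check that every label comes from the schema and that each
    relation endpoint refers to an annotated entity."""
    entity_ids = set()
    for ent in record.get("entities", []):
        if ent["label"] not in ENTITY_LABELS:
            return False
        entity_ids.add(ent["id"])
    for rel in record.get("relations", []):
        if rel["label"] not in RELATION_LABELS:
            return False
        # Relations are directional: both subject and object must exist.
        if rel["subj"] not in entity_ids or rel["obj"] not in entity_ids:
            return False
    return True
```

A check like this is useful as a post-annotation sanity pass, since span-link tools cannot always prevent a relation from outliving a deleted entity span.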
Sample data: sample-data.json
[
{
"id": "scier_001",
"text": "We apply BERT to the task of named entity recognition on the CoNLL-2003 dataset, achieving a new state-of-the-art F1 score of 93.5%.",
"context": "Experiments"
},
{
"id": "scier_002",
"text": "Our proposed transformer-based architecture incorporates a multi-head attention mechanism and positional encodings to capture long-range dependencies in text classification.",
"context": "Methods"
}
]
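Each item's fields are interpolated into the config's `html_layout` by filling the `{{context}}` and `{{text}}` placeholders. A rough sketch of that substitution (the `render` helper is hypothetical, not Potato's actual template engine):

```python
def render(layout: str, item: dict) -> str:
    """Replace {{key}} placeholders in the layout with the item's field values."""
    out = layout
    for key, value in item.items():
        out = out.replace("{{" + key + "}}", str(value))
    return out

item = {
    "id": "scier_001",
    "text": ("We apply BERT to the task of named entity recognition on the "
             "CoNLL-2003 dataset, achieving a new state-of-the-art F1 score of 93.5%."),
    "context": "Experiments",
}
html = render("<strong>Section:</strong> {{context}}\n{{text}}", item)
```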
// ... and 8 more items

Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/relation-extraction/scier-scientific-entity-relations
potato start config.yaml
Found a problem or want to improve this design? Create an issue.

Related designs
CrossRE: Cross-Domain Relation Extraction
Cross-domain relation extraction across 6 domains (news, politics, science, music, literature, AI). Annotators identify entities and label 17 relation types between entity pairs, enabling study of domain transfer in relation extraction.
Event Coreference and Relations (MAVEN-ERE)
Unified event relation extraction covering coreference, temporal, causal, and subevent relations. Annotators identify event mentions and link them with typed relations (coreference, temporal_before/after/overlap, causal, subevent). Based on the MAVEN-ERE dataset which provides large-scale, unified annotations for four types of event relations.
Legal Event Coreference (LegalCore)
Event coreference resolution in legal documents including court opinions, contracts, and statutes. Annotators identify legal event mentions such as actions, states, obligations, and violations, then link coreferent events that refer to the same real-world event. Based on the LegalCore dataset for domain-specific event coreference in the legal domain.