intermediate, text

LLMs4Subjects - Automated Subject Tagging

An annotation task for subject classification of academic texts: annotators assign subject categories to each passage and indicate whether it belongs to a single discipline or spans several. Based on SemEval-2025 Task 5.

Configuration File: config.yaml

# LLMs4Subjects - Automated Subject Tagging
# Based on Sinhababu et al., SemEval 2025
# Paper: https://aclanthology.org/volumes/2025.semeval-1/
# Dataset: https://github.com/SemEval/SemEval2025-Task5
#
# This task involves assigning subject categories to academic text
# passages. Annotators select all applicable subject areas and
# indicate whether the text belongs to a single discipline or
# spans multiple fields.
#
# Subject Categories:
# - Computer Science, Mathematics, Physics, Biology, Medicine,
#   Engineering, Social Science, Humanities, Law, Economics
#
# Classification:
# - Single Subject: Text belongs to one clear discipline
# - Multi-Subject: Text spans multiple disciplines
# - Unclear: Subject classification is ambiguous

annotation_task_name: "LLMs4Subjects - Automated Subject Tagging"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: multiselect
    name: subject_categories
    description: "Select all subject areas that apply to this text."
    labels:
      - "Computer Science"
      - "Mathematics"
      - "Physics"
      - "Biology"
      - "Medicine"
      - "Engineering"
      - "Social Science"
      - "Humanities"
      - "Law"
      - "Economics"
    tooltips:
      "Computer Science": "Algorithms, programming, AI, databases, software, etc."
      "Mathematics": "Pure or applied mathematics, statistics, logic"
      "Physics": "Classical, quantum, astrophysics, particle physics, etc."
      "Biology": "Molecular biology, ecology, genetics, evolution, etc."
      "Medicine": "Clinical medicine, pharmacology, public health, etc."
      "Engineering": "Mechanical, electrical, civil, chemical engineering, etc."
      "Social Science": "Psychology, sociology, political science, anthropology, etc."
      "Humanities": "History, philosophy, literature, linguistics, etc."
      "Law": "Legal theory, constitutional law, international law, etc."
      "Economics": "Micro/macroeconomics, finance, econometrics, etc."

  - annotation_type: radio
    name: subject_scope
    description: "Does this text belong to a single subject or multiple subjects?"
    labels:
      - "Single Subject"
      - "Multi-Subject"
      - "Unclear"
    keyboard_shortcuts:
      "Single Subject": "1"
      "Multi-Subject": "2"
      "Unclear": "3"
    tooltips:
      "Single Subject": "The text clearly belongs to one academic discipline"
      "Multi-Subject": "The text spans multiple academic disciplines"
      "Unclear": "The subject classification is ambiguous or hard to determine"

annotation_instructions: |
  You will be shown an academic text passage with its title. Your tasks are:
  1. Read the passage carefully and identify the subject area(s).
  2. Select all applicable subject categories from the list.
  3. Indicate whether the text is primarily about one subject or multiple subjects.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207;">Title:</strong>
      <p style="font-size: 17px; font-weight: 600; line-height: 1.5; margin: 8px 0 0 0;">{{title}}</p>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Text:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
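
The two schemes in the config above are related: a "Single Subject" answer in subject_scope would normally go with exactly one selected label in subject_categories, and "Multi-Subject" with two or more. A minimal Python sketch of that cross-check follows. The function name `check_scope_consistency` and the input shape (a list of selected labels plus a scope string) are assumptions made for illustration; they are not Potato's output format, which should be inspected before adapting this.

```python
from typing import List, Optional


def check_scope_consistency(categories: List[str], scope: str) -> Optional[str]:
    """Return a warning if the two answers disagree, otherwise None.

    Hypothetical helper: `categories` holds the labels chosen for
    subject_categories, `scope` the label chosen for subject_scope.
    """
    n = len(categories)
    if scope == "Unclear":
        # An ambiguous text is compatible with any selection, including none.
        return None
    if n == 0:
        return "no subject category selected"
    if scope == "Single Subject" and n != 1:
        return f"'Single Subject' chosen but {n} categories selected"
    if scope == "Multi-Subject" and n < 2:
        return "'Multi-Subject' chosen but fewer than 2 categories selected"
    return None


# Consistent pair: prints None.
print(check_scope_consistency(["Computer Science", "Biology"], "Multi-Subject"))
# Inconsistent pair: prints a warning string.
print(check_scope_consistency(["Economics", "Social Science"], "Single Subject"))
```

Treating "Unclear" as compatible with any selection is a design choice in this sketch, not something the config specifies.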

Sample Data: sample-data.json

[
  {
    "id": "subj_001",
    "text": "We propose a novel transformer architecture for protein folding prediction that achieves state-of-the-art results on the CASP14 benchmark. Our model combines attention mechanisms with geometric deep learning to capture spatial relationships between amino acid residues.",
    "title": "Deep Learning Approaches to Protein Structure Prediction"
  },
  {
    "id": "subj_002",
    "text": "This paper examines the impact of monetary policy on income inequality across OECD countries from 2000 to 2020. Using panel data regression with fixed effects, we find that expansionary monetary policy disproportionately benefits asset holders.",
    "title": "Monetary Policy and Income Inequality: A Cross-Country Analysis"
  }
]

// ... and 8 more items
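
The config reads `id` and `text` through item_properties, and html_layout also renders `{{title}}`, so each item needs all three fields. The sketch below shows one way to check a data file for that before starting the server. `validate_items` and `REQUIRED_KEYS` are hypothetical names, and the inline records are shortened stand-ins (the second one is deliberately broken for illustration), not the contents of sample-data.json.

```python
import json
from typing import Dict, List

# Fields referenced by item_properties (id, text) and html_layout (title, text).
REQUIRED_KEYS = ("id", "text", "title")


def validate_items(items: List[Dict[str, object]]) -> List[str]:
    """Return human-readable problems found in the items; empty if none."""
    problems: List[str] = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        item_id = item.get("id")
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# Illustrative input only: the second record reuses an id and lacks a title.
raw = """[
  {"id": "subj_001", "text": "We propose a novel transformer ...", "title": "Deep Learning Approaches to Protein Structure Prediction"},
  {"id": "subj_001", "text": "This paper examines the impact of monetary policy ..."}
]"""

for problem in validate_items(json.loads(raw)):
    print(problem)
```

To run it against a real file, `json.loads(raw)` could be replaced with `json.load(open("sample-data.json"))`, assuming the file is a JSON array as shown above.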

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2025/task05-llms4subjects
potato start config.yaml
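
Because the config sets annotation_per_instance: 2, each item should end up with two label sets for subject_categories. One simple way to compare them is Jaccard overlap (shared labels divided by all labels either annotator chose). This is a sketch of that idea only: the function name `jaccard` and the plain-list inputs are assumptions, the example selections are invented, and nothing here reads Potato's actual output files.

```python
from typing import Iterable


def jaccard(labels_a: Iterable[str], labels_b: Iterable[str]) -> float:
    """Jaccard overlap between two annotators' selected labels (0.0 to 1.0)."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        # Both annotators selected nothing: treated here as full agreement.
        return 1.0
    return len(a & b) / len(a | b)


# Invented selections for one item, for illustration only.
annotator_1 = ["Computer Science", "Biology"]
annotator_2 = ["Biology"]
print(jaccard(annotator_1, annotator_2))  # 0.5
```

Averaging this score over all doubly annotated items gives a rough agreement figure; a chance-corrected measure may be preferable for reporting.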

Details

Annotation Types

multiselect, radio

Domain

SemEval, NLP, Text Classification, Academic

Use Cases

Subject Tagging, Document Classification, Library Science

Tags

semeval, semeval-2025, shared-task, classification, academic, subject-tagging
