MeasEval - Counts and Measurements

Extract and classify measurements, quantities, units, and measured entities from scientific text, based on SemEval-2021 Task 8 (Harper et al.). Annotators highlight measurement components as spans, classify the type of the primary quantity, and enter its normalized numeric value.
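
To make the span task concrete, here is a minimal Python sketch that labels one sentence from the sample data with character offsets. The `Span` type and `find_span` helper are hypothetical names introduced for illustration; this is not Potato's output format or the MeasEval release format, and the choice of "reaction" as the measured entity and "temperature" as the measured property is only one plausible reading of the sentence.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical span record: a label plus character offsets."""
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def find_span(text: str, phrase: str, label: str) -> Span:
    """Locate the first occurrence of phrase and return it as a labeled span."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Span(label, start, start + len(phrase))


text = ("The reaction temperature was maintained at 37 degrees Celsius "
        "for approximately 24 hours to ensure complete enzyme activation.")

# Labels are taken from the measurement_spans scheme in config.yaml.
spans = [
    find_span(text, "37", "Quantity"),
    find_span(text, "degrees Celsius", "Unit"),
    find_span(text, "reaction", "Measured Entity"),
    find_span(text, "temperature", "Measured Property"),
]

for s in spans:
    print(f"{s.label:18} [{s.start}:{s.end}] {text[s.start:s.end]!r}")
```

Storing offsets rather than the highlighted strings keeps each span unambiguous when the same token appears more than once in a passage.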

Configuration file: config.yaml

# MeasEval - Counts and Measurements
# Based on Harper et al., SemEval 2021
# Paper: https://aclanthology.org/2021.semeval-1.38/
# Dataset: https://github.com/harperco/MeasEval
#
# Annotators extract and classify measurement-related spans from scientific
# text, including quantities, units, measured entities, properties, and
# qualifiers. They also classify the quantity type and provide normalized values.

annotation_task_name: "MeasEval - Counts and Measurements"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: span
    name: measurement_spans
    description: "Highlight measurement components in the text."
    labels:
      - "Quantity"
      - "Unit"
      - "Measured Entity"
      - "Measured Property"
      - "Qualifier"

  - annotation_type: radio
    name: quantity_type
    description: "Classify the type of the primary quantity in this text."
    labels:
      - "Count"
      - "Measurement"
      - "Approximate"
      - "Range"
    keyboard_shortcuts:
      "Count": "1"
      "Measurement": "2"
      "Approximate": "3"
      "Range": "4"
    tooltips:
      "Count": "A discrete count of items or occurrences"
      "Measurement": "A precise measurement with a specific value and unit"
      "Approximate": "An approximate or estimated value"
      "Range": "A range of values (e.g., 10-20, between X and Y)"

  - annotation_type: text
    name: normalized_value
    description: "Provide the normalized numeric value of the primary quantity (e.g., '2.5' for 'two and a half')."

annotation_instructions: |
  You will see a passage from a scientific text containing measurements, counts,
  or quantities. Your task is to:
  1. Highlight the relevant spans: quantities, units, measured entities, measured
     properties, and any qualifiers (e.g., "approximately", "more than").
  2. Classify the type of the primary quantity as Count, Measurement, Approximate,
     or Range.
  3. Provide a normalized numeric value for the primary quantity.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Scientific Text:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
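
The `normalized_value` field in the config is free text, so annotators may enter the same quantity in different forms. The following Python sketch shows one way such entries could be reduced to plain numbers after annotation. The `normalize_quantity` helper is hypothetical and is not part of Potato or the MeasEval tooling; it assumes comma thousands separators and does not handle spelled-out numbers, negative values, or scientific notation.

```python
import re

# Either a comma-grouped number ("1,523") or a plain integer/decimal ("2.5").
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def normalize_quantity(raw: str) -> list[float]:
    """Return every numeric value found in a quantity string.

    Hypothetical helper for illustration. A single quantity yields one
    value; a range such as "10-20" yields two. Text without digits
    yields an empty list.
    """
    return [float(m.group().replace(",", "")) for m in _NUMBER.finditer(raw)]


for raw in ["1,523", "10-20", "approximately 24 hours", "2.5"]:
    print(raw, "->", normalize_quantity(raw))
```

Returning a list rather than a single float lets the same helper serve the Range label, where the lower and upper bounds are both of interest.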

Sample data: sample-data.json

[
  {
    "id": "meas_001",
    "text": "The reaction temperature was maintained at 37 degrees Celsius for approximately 24 hours to ensure complete enzyme activation."
  },
  {
    "id": "meas_002",
    "text": "A total of 1,523 participants were enrolled in the clinical trial across 12 medical centers in three countries."
  }
]

// ... and 8 more items
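
Before starting the server, it can help to confirm that every item carries the keys named under `item_properties` in the config (`id` and `text`). The Python sketch below is one possible check, using only the two items shown above; `validate_items` is a hypothetical helper and not a Potato function.

```python
import json


def validate_items(items, id_key="id", text_key="text"):
    """Return a list of problems found in the data items (empty if none)."""
    problems = []
    seen = set()
    for index, item in enumerate(items):
        item_id = item.get(id_key)
        if not item_id:
            problems.append(f"item {index}: missing '{id_key}'")
        elif item_id in seen:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems


# The two items shown above; in practice the list would be read from
# sample-data.json with json.load().
sample = json.loads("""
[
  {"id": "meas_001",
   "text": "The reaction temperature was maintained at 37 degrees Celsius for approximately 24 hours to ensure complete enzyme activation."},
  {"id": "meas_002",
   "text": "A total of 1,523 participants were enrolled in the clinical trial across 12 medical centers in three countries."}
]
""")

print(validate_items(sample))
```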

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2021/task08-measeval
potato start config.yaml

Details

Annotation types

span, radio, text

Domain

NLP, SemEval

Use cases

Information Extraction, Measurement Extraction, Scientific Text Mining

Tags

semeval, semeval-2021, shared-task, measurement, extraction, scientific-text, quantities


Related designs

Clickbait Spoiling

Classification and extraction of spoilers for clickbait posts, including spoiler type identification and span-level spoiler detection. Based on SemEval-2023 Task 5 (Hagen et al.).

text, radio

EA-MT - Entity-Aware Machine Translation

Entity-aware machine translation evaluation requiring annotators to identify entity spans, classify translation errors, and provide corrected translations. Based on SemEval-2025 Task 2.

span, radio

Check-COVID: Fact-Checking COVID-19 News Claims

Fact-checking COVID-19 news claims. Annotators verify claims against evidence, identify supporting/refuting spans, and provide verdicts with explanations. Based on the Check-COVID dataset targeting misinformation during the pandemic.

radio, span