SemEval-2019 Task 12: Toponym Resolution in Scientific Papers

SemEval-2019 Task 12 covers toponym detection and resolution in scientific text: finding place-name mentions and linking them to GeoNames locations. This page gives a task overview, links to the paper and dataset, and a Potato span-annotation config for replicating the annotation step.

About this dataset

SemEval-2019 Task 12 was a shared task on toponym resolution in scientific papers, organized by Weissenbacher and colleagues. A toponym is any mention of a place: a country, city, region, river, or other geographic location. The task measures how well systems can find and ground these mentions in real scientific text.

The task is split into three subtasks: toponym detection (locating the place-name spans), toponym disambiguation (choosing the correct location for an already-detected mention), and end-to-end toponym resolution (detection plus disambiguation together). Mentions are linked to entries in the GeoNames geographic database, which gives each mention a canonical name and coordinates.

The corpus is drawn from full-text scientific journal articles, where place names matter for tasks like tracking disease outbreaks and reconstructing study locations. Systems are scored with strict and overlapping precision, recall, and F1 for detection, and with resolution accuracy for disambiguation.
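The difference between strict and overlapping detection scores can be illustrated with a minimal Python sketch. It assumes spans are `(start, end)` character offsets with an exclusive end, that a strict match requires identical boundaries, and that an overlapping match requires at least one shared character. The function names and example offsets are hypothetical, and the official scorer may define these metrics differently in detail.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def prf(pred_hits: int, n_pred: int, gold_hits: int, n_gold: int):
    """Compute precision, recall, and F1 from hit counts."""
    precision = pred_hits / n_pred if n_pred else 0.0
    recall = gold_hits / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def strict_scores(gold: Iterable[Span], pred: Iterable[Span]):
    """A predicted span counts only if its boundaries match a gold span exactly."""
    gold_set, pred_set = set(gold), set(pred)
    exact = len(gold_set & pred_set)
    return prf(exact, len(pred_set), exact, len(gold_set))


def overlap_scores(gold: Iterable[Span], pred: Iterable[Span]):
    """A span counts if it overlaps any span on the other side."""
    gold_set, pred_set = set(gold), set(pred)
    pred_hits = sum(any(overlaps(p, g) for g in gold_set) for p in pred_set)
    gold_hits = sum(any(overlaps(g, p) for p in pred_set) for g in gold_set)
    return prf(pred_hits, len(pred_set), gold_hits, len(gold_set))


# Illustrative offsets: two gold toponyms, three system predictions.
gold_spans = [(50, 59), (61, 67)]
pred_spans = [(50, 59), (55, 67), (100, 105)]

strict = strict_scores(gold_spans, pred_spans)
overlap = overlap_scores(gold_spans, pred_spans)
print("strict  P/R/F1:", strict)
print("overlap P/R/F1:", overlap)
```

In this example only one prediction matches exactly, so the strict scores are lower than the overlapping ones, which also credit the prediction with imprecise boundaries.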

The Potato config below reproduces the annotation step that produces this kind of data: a span scheme for highlighting every toponym in a passage, plus a text field for recording the resolved GeoNames location. Use it to build geoparsing training data or to run a toponym-resolution annotation study of your own.
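As a rough illustration of what one annotated item might carry, the sketch below pairs each highlighted span with the location text an annotator typed. The `ToponymMention` class, its field names, and the resolved-name strings are assumptions made for this example; they are not Potato's output schema or the task's official data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToponymMention:
    """One highlighted toponym plus the annotator's resolved location (hypothetical schema)."""
    start: int                           # character offset, inclusive
    end: int                             # character offset, exclusive
    surface: str                         # the highlighted text
    resolved_name: Optional[str] = None  # canonical name typed by the annotator


def mention_from_text(text: str, surface: str,
                      resolved_name: Optional[str] = None) -> ToponymMention:
    """Build a mention from the first occurrence of `surface` in `text`."""
    start = text.find(surface)
    if start == -1:
        raise ValueError(f"{surface!r} does not occur in the text")
    return ToponymMention(start, start + len(surface), surface, resolved_name)


text = ("The study was conducted in three hospitals across São Paulo, Brazil, "
        "between January and December 2017.")

# Resolved names here are illustrative free text, not GeoNames records.
mentions = [
    mention_from_text(text, "São Paulo", resolved_name="São Paulo"),
    mention_from_text(text, "Brazil", resolved_name="Brazil"),
]

for m in mentions:
    print(m.start, m.end, text[m.start:m.end], "->", m.resolved_name)
```

Storing offsets alongside the surface string makes it possible to check that every span still points at the text it was drawn from.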

Shared task: SemEval-2019, Task 12
Goal: Toponym detection + resolution
Subtasks: Detection, disambiguation, end-to-end resolution
Text: Full-text scientific journal articles
Grounding: Linked to GeoNames (name + coordinates)
Metrics: Strict/overlap F1; resolution accuracy

Configuration File: config.yaml

This Potato config reproduces the annotation task. Save it as config.yaml and run potato start config.yaml to try it.

yaml

# Toponym Resolution in Scientific Papers
# Based on Weissenbacher et al., SemEval 2019
# Paper: https://aclanthology.org/S19-2155/
# Dataset: https://competitions.codalab.org/competitions/19948
#
# This task asks annotators to identify place name mentions (toponyms)
# in scientific text and provide the resolved geographic location.
# Annotators first highlight toponym spans, then specify the resolved
# location (e.g., coordinates, canonical name).
#
# Span Labels:
# - Toponym: A mention of a geographic location or place name
#
# Annotation Guidelines:
# 1. Highlight all geographic references in the text
# 2. Include both specific (cities, countries) and relative locations
# 3. Provide the resolved canonical location name

annotation_task_name: "Toponym Resolution in Scientific Papers"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: span
    name: toponym_spans
    description: "Highlight all place name mentions (toponyms) in the text."
    labels:
      - "Toponym"

  - annotation_type: text
    name: resolved_location
    description: "Provide the resolved canonical location for the highlighted toponyms."

annotation_instructions: |
  You will be shown a passage from a scientific paper. Your task is to:
  1. Highlight all mentions of geographic locations (toponyms) in the text.
  2. In the text field, provide the resolved location(s) with canonical names.
  Toponyms include country names, city names, regions, rivers, mountains, etc.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Scientific Text:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false

Sample Data: sample-data.json

json

[
  {
    "id": "toponym_001",
    "text": "The study was conducted in three hospitals across São Paulo, Brazil, between January and December 2017. Patient recruitment followed standard protocols approved by the local ethics committee."
  },
  {
    "id": "toponym_002",
    "text": "Samples were collected from the Yangtze River Delta region in eastern China, specifically from monitoring stations near Shanghai and Nanjing."
  }
]

// ... and 8 more items
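Before starting an annotation run, it can help to confirm that every item carries the keys named under item_properties in the config (`id` and `text`). The following is a minimal Python sketch of such a check; `validate_items` is a hypothetical helper written for this example and is not part of Potato.

```python
import json
from typing import Any, List


def validate_items(items: List[Any], id_key: str = "id",
                   text_key: str = "text") -> List[str]:
    """Return a list of human-readable problems; an empty list means the data looks usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: expected an object")
            continue
        item_id = item.get(id_key)
        if not isinstance(item_id, str) or not item_id:
            problems.append(f"item {index}: missing or empty '{id_key}'")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen_ids.add(item_id)
        item_text = item.get(text_key)
        if not isinstance(item_text, str) or not item_text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems


# Two items copied from the sample data shown above.
sample = json.loads("""
[
  {
    "id": "toponym_001",
    "text": "The study was conducted in three hospitals across São Paulo, Brazil, between January and December 2017. Patient recruitment followed standard protocols approved by the local ethics committee."
  },
  {
    "id": "toponym_002",
    "text": "Samples were collected from the Yangtze River Delta region in eastern China, specifically from monitoring stations near Shanghai and Nanjing."
  }
]
""")

problems = validate_items(sample)
print("OK" if not problems else "\n".join(problems))
```

To check the full file instead of the inline sample, the same function could be applied to the result of `json.load` on sample-data.json.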

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2019/task12-toponym-resolution
potato start config.yaml

Dataset & paper

Weissenbacher et al., SemEval 2019

Citation (BibTeX)

bibtex

@inproceedings{weissenbacher-etal-2019-semeval,
    title = "{S}em{E}val-2019 Task 12: Toponym Resolution in Scientific Papers",
    author = "Weissenbacher, Davy and Magge, Arjun and O{'}Connor, Karen and Scotch, Matthew and Gonzalez-Hernandez, Graciela",
    booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/S19-2155/",
    doi = "10.18653/v1/S19-2155",
    pages = "907--916"
}

Details

Annotation Types

span, text

Domain

SemEval, NLP, Geoparsing, Named Entity Recognition

Use Cases

Toponym Resolution, Geoparsing, NER, Geocoding
