REDFM: Filtered and Multilingual Relation Extraction
Multilingual relation extraction derived from Wikidata. Annotators verify entity spans and label relations from a curated set of Wikidata relation types. The human-filtered REDFM release covers 7 languages; its automatically annotated counterpart SREDFM spans 18 languages and 400 relation types, enabling large-scale multilingual relation extraction research.
Configuration File: config.yaml
# REDFM: Filtered and Multilingual Relation Extraction
# Based on Huguet Cabot et al., ACL 2023
# Paper: https://aclanthology.org/2023.acl-long.367/
# Dataset: https://huggingface.co/datasets/DFKI-SLT/REDFM
#
# REDFM is a multilingual relation extraction dataset derived from Wikidata.
# The human-revised REDFM release covers 7 languages; its automatically
# annotated counterpart SREDFM spans 18 languages and 400 relation types.
# Distant supervision from Wikidata is combined with filtering to reduce
# noise and improve annotation quality.
#
# REDFM languages: Arabic, Chinese, English, French, German, Italian, Spanish.
#
# Entity Types:
# - Person, Organization, Location, Date, Event, Work, Other
#
# Relation Types (subset of common Wikidata properties):
# - country (P17): Located in country
# - instance_of (P31): Is an instance of
# - part_of (P361): Is a part of
# - occupation (P106): Has occupation
# - located_in (P131): Located in administrative entity
# - birth_place (P19): Place of birth
# - death_place (P20): Place of death
# - nationality (P27): Country of citizenship
# - employer (P108): Employed by
# - educated_at (P69): Educated at institution
# - author (P50): Author of work
# - genre (P136): Genre of work
# - capital (P36): Capital city of
# - language (P407): Language of work or name
# - inception (P571): Date of inception
#
# Annotation Guidelines:
# 1. Read the sentence and note the language
# 2. Verify that subject and object entity spans are correct
# 3. Identify and label all entities in the text
# 4. For each entity pair, select the Wikidata relation type
# 5. If no relation applies, skip the pair
annotation_task_name: "REDFM: Multilingual Relation Extraction"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  # Step 1: Identify and verify entity spans
  - annotation_type: span
    name: entities
    description: "Highlight all entities in the text and verify subject/object spans"
    labels:
      - "Person"
      - "Organization"
      - "Location"
      - "Date"
      - "Event"
      - "Work"
      - "Other"
    label_colors:
      "Person": "#3b82f6"
      "Organization": "#22c55e"
      "Location": "#ef4444"
      "Date": "#8b5cf6"
      "Event": "#f59e0b"
      "Work": "#06b6d4"
      "Other": "#6b7280"
    keyboard_shortcuts:
      "Person": "1"
      "Organization": "2"
      "Location": "3"
      "Date": "4"
      "Event": "5"
      "Work": "6"
      "Other": "7"
    tooltips:
      "Person": "Names of people (any language)"
      "Organization": "Companies, institutions, governments, groups"
      "Location": "Countries, cities, geographic features, addresses"
      "Date": "Dates, years, time periods"
      "Event": "Named events, wars, elections, festivals"
      "Work": "Books, films, songs, artworks, software"
      "Other": "Entities not fitting other categories"
    allow_overlapping: false
  # Step 2: Link entities with Wikidata relation types
  - annotation_type: span_link
    name: wikidata_relations
    description: "Draw relations between entity pairs using Wikidata property types"
    labels:
      - "country (P17)"
      - "instance_of (P31)"
      - "part_of (P361)"
      - "occupation (P106)"
      - "located_in (P131)"
      - "birth_place (P19)"
      - "death_place (P20)"
      - "nationality (P27)"
      - "employer (P108)"
      - "educated_at (P69)"
      - "author (P50)"
      - "genre (P136)"
      - "capital (P36)"
      - "language (P407)"
      - "inception (P571)"
    tooltips:
      "country (P17)": "Entity is located in or belongs to this country"
      "instance_of (P31)": "Entity is an instance of a class or type"
      "part_of (P361)": "Entity is a part or component of another entity"
      "occupation (P106)": "Person has this occupation or profession"
      "located_in (P131)": "Entity is located in this administrative territory"
      "birth_place (P19)": "Person was born in this location"
      "death_place (P20)": "Person died in this location"
      "nationality (P27)": "Person holds citizenship of this country"
      "employer (P108)": "Person is employed by this organization"
      "educated_at (P69)": "Person was educated at this institution"
      "author (P50)": "Work was authored by this person"
      "genre (P136)": "Work belongs to this genre or category"
      "capital (P36)": "Location is the capital of this entity"
      "language (P407)": "Work or name is in this language"
      "inception (P571)": "Entity was created or founded at this date"
html_layout: |
  <div style="margin-bottom: 10px; padding: 8px; background: #f0f4f8; border-radius: 4px;">
    <strong>Language:</strong> {{language}} |
    <strong>Subject:</strong> {{subject_entity}} |
    <strong>Object:</strong> {{object_entity}}
  </div>
  <div style="font-size: 16px; line-height: 1.6;">
    {{text}}
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
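Because `annotation_per_instance: 2` assigns every item to two annotators, relation-label agreement can be checked once annotations are collected. A minimal sketch computing Cohen's kappa over paired relation labels (the example label lists below are illustrative, not drawn from the dataset):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels on the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical double-annotated relation labels for four entity pairs.
a = ["birth_place (P19)", "country (P17)", "country (P17)", "occupation (P106)"]
b = ["birth_place (P19)", "country (P17)", "located_in (P131)", "occupation (P106)"]
kappa = cohens_kappa(a, b)  # 3/4 observed vs 1/4 chance agreement -> 2/3
```

Kappa corrects raw agreement for chance, which matters here because a few relation types (e.g. `country (P17)`) tend to dominate distantly supervised data.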
Sample Data: sample-data.json
[
  {
    "id": "redfm_001",
    "text": "Albert Einstein was born in Ulm, in the Kingdom of Württemberg in the German Empire, on 14 March 1879.",
    "language": "en",
    "subject_entity": "Albert Einstein",
    "object_entity": "Ulm"
  },
  {
    "id": "redfm_002",
    "text": "Gabriel García Márquez, écrivain colombien, a reçu le prix Nobel de littérature en 1982 pour ses romans et nouvelles.",
    "language": "fr",
    "subject_entity": "Gabriel García Márquez",
    "object_entity": "prix Nobel de littérature"
  }
]
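Guideline step 2 (verifying subject/object spans) can be pre-checked automatically: confirm that each item's subject and object strings actually occur in `text`, and record their character offsets. A minimal sketch over items shaped like the samples above:

```python
def verify_spans(item):
    """Return (start, end) character offsets for the subject and object
    entities, raising if either span is missing from the text."""
    text = item["text"]
    offsets = {}
    for role in ("subject_entity", "object_entity"):
        start = text.find(item[role])
        if start == -1:
            raise ValueError(f"{role} {item[role]!r} not in text of {item['id']}")
        offsets[role] = (start, start + len(item[role]))
    return offsets


item = {
    "id": "redfm_001",
    "text": "Albert Einstein was born in Ulm, in the Kingdom of Württemberg "
            "in the German Empire, on 14 March 1879.",
    "subject_entity": "Albert Einstein",
    "object_entity": "Ulm",
}
spans = verify_spans(item)  # {'subject_entity': (0, 15), 'object_entity': (28, 31)}
```

`str.find` operates on Unicode code points, so the same check works unchanged on the French, Arabic, or Chinese items.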
// ... and 8 more items

Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/relation-extraction/redfm-multilingual-relations
potato start config.yaml
Found an issue or want to improve this design?
Open an Issue

Related Designs
MultiTACRED: Multilingual TAC Relation Extraction
Multilingual version of the TACRED relation extraction dataset in 12 languages. Annotators identify subject/object entities and classify relations from 41 TAC relation types, with additional translation quality assessment.
CrossRE: Cross-Domain Relation Extraction
Cross-domain relation extraction across 6 domains (news, politics, science, music, literature, AI). Annotators identify entities and label 17 relation types between entity pairs, enabling study of domain transfer in relation extraction.
Multilingual Coreference Resolution (CorefUD)
Multilingual coreference resolution across 17 languages using Universal Dependencies-style annotations. Annotators identify entity mentions (names, nominals, pronouns) and link them into coreference chains. Based on the CorefUD dataset and the CRAC 2023 Shared Task on Multilingual Coreference Resolution.