REDFM: Filtered and Multilingual Relation Extraction
Multilingual relation extraction derived from Wikidata. Annotators verify entity spans and label relations from a curated set of Wikidata relation types. The human-filtered REDFM release covers 7 languages; its automatically annotated counterpart SREDFM spans 18 languages and 400 relation types, enabling large-scale multilingual relation extraction research.
Configuration File: config.yaml
# REDFM: Filtered and Multilingual Relation Extraction
# Based on Huguet Cabot et al., ACL 2023
# Paper: https://aclanthology.org/2023.acl-long.367/
# Dataset: https://huggingface.co/datasets/DFKI-SLT/REDFM
#
# REDFM is a multilingual relation extraction dataset derived from Wikidata.
# The human-revised REDFM release covers 7 languages; its automatically
# annotated counterpart SREDFM spans 18 languages and 400 relation types.
# Distant supervision from Wikidata is combined with filtering to reduce
# noise and improve annotation quality.
#
# REDFM languages: Arabic, Chinese, English, French, German, Italian, Spanish.
#
# Entity Types:
# - Person, Organization, Location, Date, Event, Work, Other
#
# Relation Types (subset of common Wikidata properties):
# - country (P17): Located in country
# - instance_of (P31): Is an instance of
# - part_of (P361): Is a part of
# - occupation (P106): Has occupation
# - located_in (P131): Located in administrative entity
# - birth_place (P19): Place of birth
# - death_place (P20): Place of death
# - nationality (P27): Country of citizenship
# - employer (P108): Employed by
# - educated_at (P69): Educated at institution
# - author (P50): Author of work
# - genre (P136): Genre of work
# - capital (P36): Capital city of
# - language (P407): Language of work or name
# - inception (P571): Date of inception
#
# Annotation Guidelines:
# 1. Read the sentence and note the language
# 2. Verify that subject and object entity spans are correct
# 3. Identify and label all entities in the text
# 4. For each entity pair, select the Wikidata relation type
# 5. If no relation applies, skip the pair
annotation_task_name: "REDFM: Multilingual Relation Extraction"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  # Step 1: Identify and verify entity spans
  - annotation_type: span
    name: entities
    description: "Highlight all entities in the text and verify subject/object spans"
    labels:
      - "Person"
      - "Organization"
      - "Location"
      - "Date"
      - "Event"
      - "Work"
      - "Other"
    label_colors:
      "Person": "#3b82f6"
      "Organization": "#22c55e"
      "Location": "#ef4444"
      "Date": "#8b5cf6"
      "Event": "#f59e0b"
      "Work": "#06b6d4"
      "Other": "#6b7280"
    keyboard_shortcuts:
      "Person": "1"
      "Organization": "2"
      "Location": "3"
      "Date": "4"
      "Event": "5"
      "Work": "6"
      "Other": "7"
    tooltips:
      "Person": "Names of people (any language)"
      "Organization": "Companies, institutions, governments, groups"
      "Location": "Countries, cities, geographic features, addresses"
      "Date": "Dates, years, time periods"
      "Event": "Named events, wars, elections, festivals"
      "Work": "Books, films, songs, artworks, software"
      "Other": "Entities not fitting other categories"
    allow_overlapping: false
  # Step 2: Link entities with Wikidata relation types
  - annotation_type: span_link
    name: wikidata_relations
    description: "Draw relations between entity pairs using Wikidata property types"
    labels:
      - "country (P17)"
      - "instance_of (P31)"
      - "part_of (P361)"
      - "occupation (P106)"
      - "located_in (P131)"
      - "birth_place (P19)"
      - "death_place (P20)"
      - "nationality (P27)"
      - "employer (P108)"
      - "educated_at (P69)"
      - "author (P50)"
      - "genre (P136)"
      - "capital (P36)"
      - "language (P407)"
      - "inception (P571)"
    tooltips:
      "country (P17)": "Entity is located in or belongs to this country"
      "instance_of (P31)": "Entity is an instance of a class or type"
      "part_of (P361)": "Entity is a part or component of another entity"
      "occupation (P106)": "Person has this occupation or profession"
      "located_in (P131)": "Entity is located in this administrative territory"
      "birth_place (P19)": "Person was born in this location"
      "death_place (P20)": "Person died in this location"
      "nationality (P27)": "Person holds citizenship of this country"
      "employer (P108)": "Person is employed by this organization"
      "educated_at (P69)": "Person was educated at this institution"
      "author (P50)": "Work was authored by this person"
      "genre (P136)": "Work belongs to this genre or category"
      "capital (P36)": "Location is the capital of this entity"
      "language (P407)": "Work or name is in this language"
      "inception (P571)": "Entity was created or founded at this date"
html_layout: |
  <div style="margin-bottom: 10px; padding: 8px; background: #f0f4f8; border-radius: 4px;">
    <strong>Language:</strong> {{language}} |
    <strong>Subject:</strong> {{subject_entity}} |
    <strong>Object:</strong> {{object_entity}}
  </div>
  <div style="font-size: 16px; line-height: 1.6;">
    {{text}}
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
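Because `annotation_per_instance: 2` assigns every item to two annotators, relation-label agreement can be checked once annotations are collected. A minimal sketch computing Cohen's kappa over paired relation labels (the example label lists below are illustrative, not drawn from the dataset):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels on the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical double-annotated relation labels for four entity pairs.
a = ["birth_place (P19)", "country (P17)", "country (P17)", "occupation (P106)"]
b = ["birth_place (P19)", "country (P17)", "located_in (P131)", "occupation (P106)"]
kappa = cohens_kappa(a, b)  # 3/4 observed vs 1/4 chance agreement -> 2/3
```

Kappa corrects raw agreement for chance, which matters here because a few relation types (e.g. `country (P17)`) tend to dominate distantly supervised data.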
Sample Data: sample-data.json
[
  {
    "id": "redfm_001",
    "text": "Albert Einstein was born in Ulm, in the Kingdom of Württemberg in the German Empire, on 14 March 1879.",
    "language": "en",
    "subject_entity": "Albert Einstein",
    "object_entity": "Ulm"
  },
  {
    "id": "redfm_002",
    "text": "Gabriel García Márquez, écrivain colombien, a reçu le prix Nobel de littérature en 1982 pour ses romans et nouvelles.",
    "language": "fr",
    "subject_entity": "Gabriel García Márquez",
    "object_entity": "prix Nobel de littérature"
  }
]
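Guideline step 2 (verifying subject/object spans) can be pre-checked automatically: confirm that each item's subject and object strings actually occur in `text`, and record their character offsets. A minimal sketch over items shaped like the samples above:

```python
def verify_spans(item):
    """Return (start, end) character offsets for the subject and object
    entities, raising if either span is missing from the text."""
    text = item["text"]
    offsets = {}
    for role in ("subject_entity", "object_entity"):
        start = text.find(item[role])
        if start == -1:
            raise ValueError(f"{role} {item[role]!r} not in text of {item['id']}")
        offsets[role] = (start, start + len(item[role]))
    return offsets


item = {
    "id": "redfm_001",
    "text": "Albert Einstein was born in Ulm, in the Kingdom of Württemberg "
            "in the German Empire, on 14 March 1879.",
    "subject_entity": "Albert Einstein",
    "object_entity": "Ulm",
}
spans = verify_spans(item)  # {'subject_entity': (0, 15), 'object_entity': (28, 31)}
```

`str.find` operates on Unicode code points, so the same check works unchanged on the French, Arabic, or Chinese items.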
// ... and 8 more items

Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/relation-extraction/redfm-multilingual-relations
potato start config.yaml
Found an issue or want to improve this design?
Open an Issue

Related Designs
MultiTACRED: Multilingual TAC Relation Extraction
Multilingual version of the TACRED relation extraction dataset in 12 languages. Annotators identify subject/object entities and classify relations from 41 TAC relation types, with additional translation quality assessment.
CrossRE: Cross-Domain Relation Extraction
Cross-domain relation extraction across 6 domains (news, politics, science, music, literature, AI). Annotators identify entities and label 17 relation types between entity pairs, enabling study of domain transfer in relation extraction.
Multilingual Coreference Resolution (CorefUD)
Multilingual coreference resolution across 17 languages using Universal Dependencies-style annotations. Annotators identify entity mentions (names, nominals, pronouns) and link them into coreference chains. Based on the CorefUD dataset and the CRAC 2023 Shared Task on Multilingual Coreference Resolution.