DISRPT: Discourse Segmentation and Relation Classification
Annotators identify Elementary Discourse Units (EDUs) and label the rhetorical relations between them. Based on the DISRPT 2023 shared task, which covers multiple discourse frameworks and languages.
Configuration file: config.yaml
# DISRPT: Discourse Segmentation and Relation Classification
# Based on Braud et al., DISRPT@ACL 2023
# Paper: https://aclanthology.org/2023.disrpt-1.1/
# Dataset: https://github.com/disrpt/sharedtask2023
#
# This task annotates discourse structure: segmenting text into Elementary
# Discourse Units (EDUs) and labeling rhetorical relations between them.
#
# EDU Segmentation:
# - EDUs are minimal discourse units, roughly clause-level segments
# - Each EDU conveys a single proposition or idea
# - Boundaries often align with clause boundaries but not always
#
# Discourse Relation Types (based on RST):
# - Elaboration: One unit provides additional detail about another
# - Contrast: Two units present opposing or contrasting information
# - Cause: One unit presents the cause of the situation in the other
# - Result: One unit presents the result/effect of the other
# - Background: One unit provides background context for the other
# - Condition: One unit specifies a condition for the other
# - Purpose: One unit states the purpose of the action in the other
# - Temporal: Units are related by temporal sequence
# - Joint: Units are equally important and simply joined
# - Attribution: One unit attributes content to a source
#
# Annotation Guidelines:
# 1. Read the full text to understand overall discourse structure
# 2. Mark EDU boundaries by highlighting each discourse unit
# 3. For each adjacent pair of EDUs, select the discourse relation
# 4. Consider which EDU is the nucleus (main) vs satellite (supporting)
# 5. Use Joint when neither unit is subordinate to the other
annotation_task_name: "DISRPT: Discourse Relations"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  # Step 1: Identify EDU boundaries
  - annotation_type: span
    name: edu_segments
    description: "Highlight each Elementary Discourse Unit (EDU) in the text. Each EDU should be a minimal clause-level segment."
    labels:
      - "EDU"
    label_colors:
      "EDU": "#3b82f6"
    tooltips:
      "EDU": "A minimal discourse unit that conveys a single proposition or idea, roughly clause-level"
    allow_overlapping: false
  # Step 2: Classify discourse relations between adjacent EDUs
  - annotation_type: radio
    name: discourse_relation
    description: "What is the primary discourse relation between the highlighted EDU segments?"
    labels:
      - "Elaboration"
      - "Contrast"
      - "Cause"
      - "Result"
      - "Background"
      - "Condition"
      - "Purpose"
      - "Temporal"
      - "Joint"
      - "Attribution"
    keyboard_shortcuts:
      "Elaboration": "1"
      "Contrast": "2"
      "Cause": "3"
      "Result": "4"
      "Background": "5"
      "Condition": "6"
      "Purpose": "7"
      "Temporal": "8"
      "Joint": "9"
      "Attribution": "0"
    tooltips:
      "Elaboration": "One unit provides additional detail, specification, or explanation of the other"
      "Contrast": "Two units present opposing, contrasting, or comparative information"
      "Cause": "One unit presents the cause or reason for the situation described in the other"
      "Result": "One unit presents the result, effect, or consequence of the other"
      "Background": "One unit provides background information or context for understanding the other"
      "Condition": "One unit specifies a condition under which the other holds"
      "Purpose": "One unit states the purpose, goal, or intention of the action in the other"
      "Temporal": "Units are related by temporal sequence or temporal framing"
      "Joint": "Units are equally important and coordinated (neither is subordinate)"
      "Attribution": "One unit attributes the content of the other to a source"
html_layout: |
  <div style="margin-bottom: 10px; padding: 8px; background: #f0f4f8; border-radius: 4px;">
    <strong>Genre:</strong> {{genre}}
  </div>
  <div style="font-size: 16px; line-height: 1.8;">
    {{text}}
  </div>
allow_all_users: true
instances_per_annotator: 40
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
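As a quick sanity check on the relation scheme above, the following sketch verifies that every label has a keyboard shortcut and a tooltip, and that no shortcut key is assigned twice. It is not part of Potato: the function name `check_radio_scheme` and the inlined dictionary are assumptions made for illustration, and the dictionary only mirrors the `discourse_relation` scheme (tooltip text is replaced by a placeholder).

```python
# Hypothetical lint for a radio scheme; this is not a Potato API.
def check_radio_scheme(scheme):
    """Return a list of human-readable problems found in a radio scheme dict."""
    problems = []
    labels = scheme["labels"]
    shortcuts = scheme.get("keyboard_shortcuts", {})
    tooltips = scheme.get("tooltips", {})

    # Every label should be reachable by a key and explained by a tooltip.
    for label in labels:
        if label not in shortcuts:
            problems.append(f"no shortcut for {label!r}")
        if label not in tooltips:
            problems.append(f"no tooltip for {label!r}")

    # Keys that do not correspond to any label are likely typos.
    for key in list(shortcuts) + list(tooltips):
        if key not in labels:
            problems.append(f"unknown label {key!r}")

    # Each keyboard key may select only one label.
    seen = {}
    for label, key in shortcuts.items():
        if key in seen:
            problems.append(f"shortcut {key!r} shared by {seen[key]!r} and {label!r}")
        seen[key] = label
    return problems


RELATIONS = ["Elaboration", "Contrast", "Cause", "Result", "Background",
             "Condition", "Purpose", "Temporal", "Joint", "Attribution"]

# Mirrors the discourse_relation scheme: keys 1-9, then 0 for the tenth label.
scheme = {
    "labels": RELATIONS,
    "keyboard_shortcuts": {label: str((i + 1) % 10) for i, label in enumerate(RELATIONS)},
    "tooltips": {label: "(tooltip text from config.yaml)" for label in RELATIONS},
}

print(check_radio_scheme(scheme))  # prints [] when the scheme is consistent
```

A check like this is most useful when the label set is edited later, for example when adapting the scheme to a different discourse framework, since a renamed label is easy to miss in one of the three places it appears.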
Sample data: sample-data.json
[
  {
    "id": "disrpt_001",
    "text": "The company reported a 20% increase in revenue last quarter. This growth was primarily driven by strong demand in the Asian market. However, operating costs also rose significantly due to supply chain disruptions. As a result, net profit margins remained flat compared to the previous year.",
    "genre": "business"
  },
  {
    "id": "disrpt_002",
    "text": "Although renewable energy sources have become more affordable, many developing nations still rely heavily on fossil fuels. Coal remains the primary energy source in several Southeast Asian countries because infrastructure for solar and wind power requires substantial upfront investment. Governments are exploring public-private partnerships to address this gap.",
    "genre": "science"
  }
]
// ... and 8 more items
Download this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/discourse/disrpt-discourse-relations
potato start config.yaml
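Before launching, it can help to confirm that every item in the data file carries the fields the configuration refers to: `id` and `text` (from `item_properties`) and `genre` (used in `html_layout`). The sketch below is an illustration using only the Python standard library. `find_item_problems` is a hypothetical helper, not part of Potato, and treating all three fields as non-empty strings is an assumption of this sketch rather than a documented requirement. The inline items are shortened stand-ins for the entries in `sample-data.json`.

```python
import json

# id_key, text_key, and the {{genre}} placeholder used in html_layout.
REQUIRED_FIELDS = ("id", "text", "genre")


def find_item_problems(items):
    """Return problems such as missing fields, empty values, or duplicate ids."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for field in REQUIRED_FIELDS:
            value = item.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty {field!r}")
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems


# Shortened stand-ins for sample-data.json. To check the real file, load it
# with json.load(open("sample-data.json", encoding="utf-8")) instead.
raw = """
[
  {"id": "disrpt_001", "text": "The company reported a 20% increase in revenue last quarter.", "genre": "business"},
  {"id": "disrpt_002", "text": "Governments are exploring public-private partnerships.", "genre": "science"}
]
"""
items = json.loads(raw)
print(find_item_problems(items))  # prints [] when every item is complete
```

Duplicate ids are flagged because the configuration requests two annotations per instance, and results from different annotators are easier to line up when each item has a distinct `id`.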
Related designs
Aspect-Based Sentiment Analysis
Identification of aspect terms in review text with sentiment polarity classification for each aspect. Based on SemEval-2016 Task 5 (ABSA).
Causal Medical Claim Detection and PICO Extraction
Detection of causal claims in medical texts and extraction of PICO (Population, Intervention, Comparator, Outcome) elements. Based on SemEval-2023 Task 8 (Khetan et al.).
Character Identification on Multiparty Dialogues
Identification and linking of character mentions in TV show dialogue, combining span annotation with entity resolution for the main cast of Friends. Based on SemEval-2018 Task 4.