DISRPT: Discourse Segmentation and Relation Classification

This design covers discourse segmentation and relation classification: annotators identify Elementary Discourse Units (EDUs) and label the rhetorical relations between them. It is based on the DISRPT 2023 shared task, which covers multiple discourse frameworks and languages.
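
The two-step output can be pictured as a set of EDU character spans plus one relation label per adjacent pair. The sketch below is a hypothetical representation (the `EDU` class and the helper names are illustrative and are not Potato's output format). It orders EDU spans, rejects overlapping ones, and lists the adjacent pairs an annotator would then label. The example sentence is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EDU:
    """One Elementary Discourse Unit, stored as character offsets into the item text."""
    start: int  # inclusive
    end: int    # exclusive


def order_edus(edus: list[EDU]) -> list[EDU]:
    """Sort EDUs by position and reject overlaps, since EDUs should not overlap."""
    ordered = sorted(edus, key=lambda e: e.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            raise ValueError(f"overlapping EDUs: {prev} and {cur}")
    return ordered


def adjacent_pairs(text: str, edus: list[EDU]) -> list[tuple[str, str]]:
    """Return the text of each adjacent EDU pair: the candidates for a relation label."""
    segments = [text[e.start:e.end] for e in order_edus(edus)]
    return list(zip(segments, segments[1:]))


text = "Although prices fell, demand stayed weak."
print(adjacent_pairs(text, [EDU(22, 41), EDU(0, 21)]))
# prints [('Although prices fell,', 'demand stayed weak.')]
```

Spans that merely touch (one ends where the next starts) are accepted; only true overlaps raise an error.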

Configuration File: config.yaml

yaml

# DISRPT: Discourse Segmentation and Relation Classification
# Based on Braud et al., DISRPT@ACL 2023
# Paper: https://aclanthology.org/2023.disrpt-1.1/
# Dataset: https://github.com/disrpt/sharedtask2023
#
# This task annotates discourse structure: segmenting text into Elementary
# Discourse Units (EDUs) and labeling rhetorical relations between them.
#
# EDU Segmentation:
# - EDUs are minimal discourse units, roughly clause-level segments
# - Each EDU conveys a single proposition or idea
# - Boundaries often align with clause boundaries but not always
#
# Discourse Relation Types (based on RST):
# - Elaboration: One unit provides additional detail about another
# - Contrast: Two units present opposing or contrasting information
# - Cause: One unit presents the cause of the situation in the other
# - Result: One unit presents the result/effect of the other
# - Background: One unit provides background context for the other
# - Condition: One unit specifies a condition for the other
# - Purpose: One unit states the purpose of the action in the other
# - Temporal: Units are related by temporal sequence
# - Joint: Units are equally important and simply joined
# - Attribution: One unit attributes content to a source
#
# Annotation Guidelines:
# 1. Read the full text to understand overall discourse structure
# 2. Mark EDU boundaries by highlighting each discourse unit
# 3. For each adjacent pair of EDUs, select the discourse relation
# 4. Consider which EDU is the nucleus (main) vs satellite (supporting)
# 5. Use Joint when neither unit is subordinate to the other

annotation_task_name: "DISRPT: Discourse Relations"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  # Step 1: Identify EDU boundaries
  - annotation_type: span
    name: edu_segments
    description: "Highlight each Elementary Discourse Unit (EDU) in the text. Each EDU should be a minimal clause-level segment."
    labels:
      - "EDU"
    label_colors:
      "EDU": "#3b82f6"
    tooltips:
      "EDU": "A minimal discourse unit that conveys a single proposition or idea, roughly clause-level"
    allow_overlapping: false

  # Step 2: Classify discourse relations between adjacent EDUs
  - annotation_type: radio
    name: discourse_relation
    description: "What is the primary discourse relation between the highlighted EDU segments?"
    labels:
      - "Elaboration"
      - "Contrast"
      - "Cause"
      - "Result"
      - "Background"
      - "Condition"
      - "Purpose"
      - "Temporal"
      - "Joint"
      - "Attribution"
    keyboard_shortcuts:
      "Elaboration": "1"
      "Contrast": "2"
      "Cause": "3"
      "Result": "4"
      "Background": "5"
      "Condition": "6"
      "Purpose": "7"
      "Temporal": "8"
      "Joint": "9"
      "Attribution": "0"
    tooltips:
      "Elaboration": "One unit provides additional detail, specification, or explanation of the other"
      "Contrast": "Two units present opposing, contrasting, or comparative information"
      "Cause": "One unit presents the cause or reason for the situation described in the other"
      "Result": "One unit presents the result, effect, or consequence of the other"
      "Background": "One unit provides background information or context for understanding the other"
      "Condition": "One unit specifies a condition under which the other holds"
      "Purpose": "One unit states the purpose, goal, or intention of the action in the other"
      "Temporal": "Units are related by temporal sequence or temporal framing"
      "Joint": "Units are equally important and coordinated (neither is subordinate)"
      "Attribution": "One unit attributes the content of the other to a source"

html_layout: |
  <div style="margin-bottom: 10px; padding: 8px; background: #f0f4f8; border-radius: 4px;">
    <strong>Genre:</strong> {{genre}}
  </div>
  <div style="font-size: 16px; line-height: 1.8;">
    {{text}}
  </div>

allow_all_users: true
instances_per_annotator: 40
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
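
The ten relation labels are bound to all ten number keys (1 to 9, then 0), so adding or renaming a label can easily leave a label without a shortcut or tooltip, or produce a key collision. The sketch below is a hypothetical consistency check (`check_radio_scheme` is an illustrative helper, not a Potato API; Potato may do its own validation). The scheme dict is copied by hand from the `discourse_relation` entry so the sketch needs only the standard library; loading the real config.yaml would require a YAML parser.

```python
# Hand-copied view of the discourse_relation scheme from config.yaml.
RELATION_LABELS = ["Elaboration", "Contrast", "Cause", "Result", "Background",
                   "Condition", "Purpose", "Temporal", "Joint", "Attribution"]

SCHEME = {
    "labels": RELATION_LABELS,
    # Keys 1-9, then 0, in label order, as in the config.
    "keyboard_shortcuts": {label: str((i + 1) % 10) for i, label in enumerate(RELATION_LABELS)},
    # Only tooltip presence is checked, so the tooltip text itself is omitted here.
    "tooltips": {label: "(see config.yaml)" for label in RELATION_LABELS},
}


def check_radio_scheme(scheme: dict) -> list[str]:
    """Return human-readable problems; an empty list means the scheme looks consistent."""
    problems = []
    labels = scheme["labels"]
    shortcuts = scheme.get("keyboard_shortcuts", {})
    tooltips = scheme.get("tooltips", {})
    if len(set(labels)) != len(labels):
        problems.append("duplicate labels")
    for label in labels:
        if label not in shortcuts:
            problems.append(f"no keyboard shortcut for {label!r}")
        if label not in tooltips:
            problems.append(f"no tooltip for {label!r}")
    keys = list(shortcuts.values())
    if len(set(keys)) != len(keys):
        problems.append("two labels share a keyboard shortcut")
    # Sorted so the report order is deterministic.
    for name in sorted(set(shortcuts) | set(tooltips)):
        if name not in labels:
            problems.append(f"{name!r} is not a declared label")
    return problems


print(check_radio_scheme(SCHEME))  # prints []
```

Because every digit is already in use, an eleventh relation label would need a non-digit shortcut; a check like this surfaces that before annotators hit a collision.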

Sample Data: sample-data.json

json

[
  {
    "id": "disrpt_001",
    "text": "The company reported a 20% increase in revenue last quarter. This growth was primarily driven by strong demand in the Asian market. However, operating costs also rose significantly due to supply chain disruptions. As a result, net profit margins remained flat compared to the previous year.",
    "genre": "business"
  },
  {
    "id": "disrpt_002",
    "text": "Although renewable energy sources have become more affordable, many developing nations still rely heavily on fossil fuels. Coal remains the primary energy source in several Southeast Asian countries because infrastructure for solar and wind power requires substantial upfront investment. Governments are exploring public-private partnerships to address this gap.",
    "genre": "science"
  }
]

// ... and 8 more items
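
Each item must carry the keys the config reads: `id` and `text` (named in `item_properties`) and `genre` (referenced by `{{genre}}` in `html_layout`). The sketch below is a hypothetical pre-flight check (`validate_items` is an illustrative helper, not part of Potato) that parses a data file's contents with Python's `json` module and reports the first missing key, duplicate id, or empty text.

```python
import json

# id/text come from item_properties in config.yaml; genre is read by html_layout.
REQUIRED_KEYS = ("id", "text", "genre")


def validate_items(raw_json: str) -> list[dict]:
    """Parse a data file's contents and raise ValueError on the first structural problem."""
    items = json.loads(raw_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"item {index} is missing keys {missing}")
        if item["id"] in seen_ids:
            raise ValueError(f"duplicate id {item['id']!r}")
        seen_ids.add(item["id"])
        if not str(item["text"]).strip():
            raise ValueError(f"item {item['id']!r} has empty text")
    return items
```

Run it on the full file, e.g. `validate_items(open("sample-data.json", encoding="utf-8").read())`. The listing above is a truncated excerpt, and its `// ... and 8 more items` marker is not valid JSON, so it should not be pasted into the data file.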

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/discourse/disrpt-discourse-relations
potato start config.yaml
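
With `annotation_per_instance: 2`, each item should receive two independent relation labels, so agreement on `discourse_relation` can be measured once annotations accumulate in `annotation_output/`. The sketch below computes Cohen's kappa for one pair of annotators. It assumes the labels have already been extracted from Potato's output files into one `{item_id: label}` dict per annotator; that extraction step is not shown because the output layout is not documented here, and `cohen_kappa` and the item ids are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a: dict[str, str], labels_b: dict[str, str]) -> float:
    """Cohen's kappa for two annotators over the items both of them labeled."""
    shared = sorted(set(labels_a) & set(labels_b))
    if not shared:
        raise ValueError("the two annotators share no items")
    n = len(shared)
    # Observed agreement: fraction of shared items given identical labels.
    observed = sum(labels_a[i] == labels_b[i] for i in shared) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a = Counter(labels_a[i] for i in shared)
    freq_b = Counter(labels_b[i] for i in shared)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both used one identical label everywhere; kappa is undefined, so report 1 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


a = {"item_1": "Contrast", "item_2": "Cause", "item_3": "Joint", "item_4": "Cause"}
b = {"item_1": "Contrast", "item_2": "Result", "item_3": "Joint", "item_4": "Cause"}
print(round(cohen_kappa(a, b), 3))  # prints 0.667
```

Here the annotators agree on 3 of 4 items (0.75 observed) against 0.25 expected by chance. Per-label confusions such as Cause vs. Result can be inspected by listing the shared items where the two dicts differ.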

Details

Annotation Types

span, radio

Domain

NLP, Discourse Analysis

Use Cases

Discourse Segmentation, Discourse Relation Classification, Rhetorical Structure Analysis

Related Designs

Aspect-Based Sentiment Analysis

Identification of aspect terms in review text with sentiment polarity classification for each aspect. Based on SemEval-2016 Task 5 (ABSA).

span, radio

Causal Medical Claim Detection and PICO Extraction

Detection of causal claims in medical texts and extraction of PICO (Population, Intervention, Comparator, Outcome) elements. Based on SemEval-2023 Task 8 (Khetan et al.).

span, radio

Character Identification on Multiparty Dialogues

Identification and linking of character mentions in TV show dialogue, combining span annotation with entity resolution for the main cast of Friends. Based on SemEval-2018 Task 4.

span, radio
