Detecting Minimal Semantic Units and Their Meanings (DiMSUM)

Annotators use span annotation to mark the component words of multiword expressions (MWEs), identifying minimal semantic units that function as single meaning-bearing elements. Based on SemEval-2016 Task 10.

Configuration File: config.yaml

# Detecting Minimal Semantic Units and Their Meanings (DiMSUM)
# Based on Schneider et al., SemEval 2016
# Paper: https://aclanthology.org/S16-1084/
# Dataset: https://dimsum16.github.io/
#
# This task asks annotators to identify multiword expression (MWE)
# components in text. MWEs are sequences of words that function together
# as a single semantic unit (e.g., "kick the bucket", "in spite of",
# "hot dog").
#
# Span Labels:
# - MWE Component: A word that is part of a multiword expression

annotation_task_name: "Detecting Minimal Semantic Units (DiMSUM)"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: span
    name: mwe_components
    description: "Highlight all words that are components of multiword expressions."
    labels:
      - "MWE Component"

annotation_instructions: |
  You will be shown a sentence. Your task is to identify all multiword expressions
  (MWEs) by highlighting their component words. MWEs are sequences of words that
  function together as a single meaning unit, including:
  - Idioms: "kick the bucket", "break the ice"
  - Phrasal verbs: "give up", "look forward to"
  - Compound nouns: "hot dog", "ice cream"
  - Fixed phrases: "in spite of", "as well as"
  Highlight all words that belong to any MWE in the sentence.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
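
To make the span task in the instructions concrete, here is a minimal Python sketch of what "MWE Component" highlights amount to as character offsets. The lexicon and the `find_mwe_spans` helper are hypothetical and exist only for illustration; they are not part of Potato or the DiMSUM data, and a simple lookup like this cannot handle discontinuous expressions, which annotators may still need to mark.

```python
import re

# Hypothetical mini-lexicon for illustration only; not taken from the DiMSUM data.
MWE_LEXICON = ["give up", "ice cream", "in spite of"]


def find_mwe_spans(text, lexicon=MWE_LEXICON):
    """Return sorted (start, end, surface) character spans for lexicon entries in text."""
    spans = []
    for entry in lexicon:
        # Word boundaries keep "give up" from matching inside a longer token.
        pattern = r"\b" + re.escape(entry) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


sentence = "She decided to give up smoking after the doctor's warning about her health."
for start, end, surface in find_mwe_spans(sentence):
    print(start, end, surface)
```

Slicing the sentence with a returned `(start, end)` pair gives back the highlighted words, which is a convenient sanity check when comparing against annotator output.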

Sample Data: sample-data.json

[
  {
    "id": "mwe_001",
    "text": "She decided to give up smoking after the doctor's warning about her health."
  },
  {
    "id": "mwe_002",
    "text": "The ice cream shop on Main Street is open until 10 PM every night."
  }
]

// ... and 8 more items
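
Before starting the server, it can help to confirm that every record carries the fields named under `item_properties` in the config (`id` and `text`). The following is a rough sketch using only the standard library; `validate_items` is a hypothetical helper, not a Potato function, and the duplicate-id check is an extra assumption about what a clean data file should look like.

```python
import json


def validate_items(items, id_key="id", text_key="text"):
    """Return a list of problems; an empty list means the records look usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Each record needs a non-empty string for both configured keys.
        for key in (id_key, text_key):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        item_id = item.get(id_key)
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


sample = json.loads("""[
  {"id": "mwe_001", "text": "She decided to give up smoking after the doctor's warning about her health."},
  {"id": "mwe_002", "text": "The ice cream shop on Main Street is open until 10 PM every night."}
]""")
print(validate_items(sample))
```

To check the real file, load it with `json.load(open("sample-data.json"))` and pass the result to the same function.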

Get This Design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task10-minimal-semantic-units
potato start config.yaml

Details

Annotation Types

span

Domain

SemEval, NLP, Multiword Expressions, Lexical Semantics

Use Cases

MWE Detection, Compositional Semantics, Lexical Analysis

Tags

semeval, semeval-2016, shared-task, mwe, multiword-expressions, dimsum, semantic-units
