intermediate · text

Detecting Minimal Semantic Units and Their Meanings (DiMSUM)

Annotate the components of multiword expressions (MWEs) in text using span annotation, identifying minimal semantic units that function as single meaning-bearing elements. Based on SemEval-2016 Task 10.


Configuration file: config.yaml

# Detecting Minimal Semantic Units and Their Meanings (DiMSUM)
# Based on Schneider et al., SemEval 2016
# Paper: https://aclanthology.org/S16-1084/
# Dataset: https://dimsum16.github.io/
#
# This task asks annotators to identify multiword expression (MWE)
# components in text. MWEs are sequences of words that function together
# as a single semantic unit (e.g., "kick the bucket", "in spite of",
# "hot dog").
#
# Span Labels:
# - MWE Component: A word that is part of a multiword expression

annotation_task_name: "Detecting Minimal Semantic Units (DiMSUM)"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: span
    name: mwe_components
    description: "Highlight all words that are components of multiword expressions."
    labels:
      - "MWE Component"

annotation_instructions: |
  You will be shown a sentence. Your task is to identify all multiword expressions
  (MWEs) by highlighting their component words. MWEs are sequences of words that
  function together as a single meaning unit, including:
  - Idioms: "kick the bucket", "break the ice"
  - Phrasal verbs: "give up", "look forward to"
  - Compound nouns: "hot dog", "ice cream"
  - Fixed phrases: "in spite of", "as well as"
  Highlight all words that belong to any MWE in the sentence.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
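
The span scheme above asks annotators to highlight MWE component words with the single label "MWE Component". As a rough illustration of what such highlights amount to, the Python sketch below represents each highlight as a character-offset span over the sentence. The `Span` class and `find_mwe_spans` helper are hypothetical names introduced here, the expression list is supplied by hand, and the naive substring matching is an assumption for illustration only; none of this reflects how Potato stores or computes annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: [start, end) character offsets plus a label."""
    start: int
    end: int
    label: str


def find_mwe_spans(text, expressions, label="MWE Component"):
    """Locate each known expression in `text` by case-insensitive substring search.

    This is a naive stand-in for a human annotator: it does not respect
    word boundaries and cannot find discontinuous MWEs.
    """
    spans = []
    lowered = text.lower()
    for expr in expressions:
        needle = expr.lower()
        pos = lowered.find(needle)
        while pos != -1:
            spans.append(Span(pos, pos + len(needle), label))
            pos = lowered.find(needle, pos + len(needle))
    return sorted(spans, key=lambda s: s.start)


text = "She decided to give up smoking after the doctor's warning about her health."
spans = find_mwe_spans(text, ["give up"])
for span in spans:
    print(span, repr(text[span.start:span.end]))
```

Half-open offsets are used so that `text[span.start:span.end]` recovers the highlighted words directly.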

Sample data: sample-data.json

[
  {
    "id": "mwe_001",
    "text": "She decided to give up smoking after the doctor's warning about her health."
  },
  {
    "id": "mwe_002",
    "text": "The ice cream shop on Main Street is open until 10 PM every night."
  }
]

// ... and 8 more items
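
Each item must expose the fields named by `id_key` and `text_key` in the configuration ("id" and "text" here). Before starting the server, it may be worth checking a data file against that shape. The sketch below is one possible check, written in Python; `validate_items` is a hypothetical helper, not part of Potato, and the inline JSON simply repeats the two items shown above.

```python
import json

# The two items shown above, inlined so the sketch is self-contained.
SAMPLE_JSON = """
[
  {"id": "mwe_001",
   "text": "She decided to give up smoking after the doctor's warning about her health."},
  {"id": "mwe_002",
   "text": "The ice cream shop on Main Street is open until 10 PM every night."}
]
"""


def validate_items(items, id_key="id", text_key="text"):
    """Return a list of human-readable problems; an empty list means the items look fine."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: expected an object, got {type(item).__name__}")
            continue
        # Both configured keys must hold non-empty strings.
        for key in (id_key, text_key):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # IDs should be unique so annotations can be matched back to items.
        item_id = item.get(id_key)
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


items = json.loads(SAMPLE_JSON)
print(validate_items(items) or "all items look valid")
```

To check the real file, replace `SAMPLE_JSON` with the contents of `sample-data.json` read from disk.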

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task10-minimal-semantic-units
potato start config.yaml

Details

Annotation type

span

Domain

SemEval, NLP, Multiword Expressions, Lexical Semantics

Use cases

MWE Detection, Compositional Semantics, Lexical Analysis

Tags

semeval, semeval-2016, shared-task, mwe, multiword-expressions, dimsum, semantic-units
