Detecting Minimal Semantic Units and Their Meanings (DiMSUM)
Identification of multiword expression components in text through span annotation, detecting minimal semantic units that function as single meaning-bearing elements. Based on SemEval-2016 Task 10.
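To make "span annotation" concrete, the sketch below marks each word of a multiword expression as its own character-offset span. The `find_spans` helper and the `start`/`end`/`text`/`label` record layout are illustrative assumptions, not the output format of the annotation tool or the DiMSUM dataset.

```python
import re

def find_spans(text, expression, label="MWE Component"):
    """Return one span record per word of each occurrence of `expression`.

    Hypothetical helper: the record layout is an assumption made for
    illustration only.
    """
    spans = []
    pattern = r"\b" + re.escape(expression) + r"\b"
    for match in re.finditer(pattern, text):
        # Split the matched expression into words and compute absolute offsets.
        for word in re.finditer(r"\S+", match.group()):
            start = match.start() + word.start()
            end = match.start() + word.end()
            spans.append(
                {"start": start, "end": end, "text": text[start:end], "label": label}
            )
    return spans

sentence = "She decided to give up smoking after the doctor's warning about her health."
spans = find_spans(sentence, "give up")
for span in spans:
    print(span)
```

For the phrasal verb "give up", this yields two spans ("give" and "up"), each tagged with the single label "MWE Component".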
Configuration File: config.yaml
# Detecting Minimal Semantic Units and Their Meanings (DiMSUM)
# Based on Schneider et al., SemEval 2016
# Paper: https://aclanthology.org/S16-1084/
# Dataset: https://dimsum16.github.io/
#
# This task asks annotators to identify multiword expression (MWE)
# components in text. MWEs are sequences of words that function together
# as a single semantic unit (e.g., "kick the bucket", "in spite of",
# "hot dog").
#
# Span Labels:
# - MWE Component: A word that is part of a multiword expression
annotation_task_name: "Detecting Minimal Semantic Units (DiMSUM)"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: span
    name: mwe_components
    description: "Highlight all words that are components of multiword expressions."
    labels:
      - "MWE Component"
annotation_instructions: |
  You will be shown a sentence. Your task is to identify all multiword expressions
  (MWEs) by highlighting their component words. MWEs are sequences of words that
  function together as a single meaning unit, including:
  - Idioms: "kick the bucket", "break the ice"
  - Phrasal verbs: "give up", "look forward to"
  - Compound nouns: "hot dog", "ice cream"
  - Fixed phrases: "in spite of", "as well as"
  Highlight all words that belong to any MWE in the sentence.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
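As a rough planning aid, the arithmetic implied by `instances_per_annotator` and `annotation_per_instance` can be sketched as follows. This assumes that each annotation of an instance comes from a different annotator and that `instances_per_annotator` is a hard cap per person; both are assumptions about how the settings are interpreted, and `annotation_workload` is a hypothetical helper, not part of the tool.

```python
import math

def annotation_workload(num_items, annotation_per_instance, instances_per_annotator):
    """Estimate total annotations and a lower bound on annotators needed.

    Assumes distinct annotators per instance and a hard per-annotator cap.
    """
    total_annotations = num_items * annotation_per_instance
    # Annotators needed if the only constraint were the per-annotator cap.
    by_capacity = math.ceil(total_annotations / instances_per_annotator)
    # Each instance needs `annotation_per_instance` different people.
    min_annotators = max(by_capacity, annotation_per_instance)
    return total_annotations, min_annotators

# 10 items in the sample data, with the values from config.yaml.
total, annotators = annotation_workload(10, 2, 50)
print(total, annotators)
```

Under these assumptions, the 10 sample items need 20 annotations in total and at least 2 annotators.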
Sample Data: sample-data.json
[
{
"id": "mwe_001",
"text": "She decided to give up smoking after the doctor's warning about her health."
},
{
"id": "mwe_002",
"text": "The ice cream shop on Main Street is open until 10 PM every night."
}
]
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task10-minimal-semantic-units
potato start config.yaml
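Before starting the server, it may be worth checking that every item in the data file carries the keys named under `item_properties` (`id` and `text`). The sketch below is one way to do that; `validate_items` is a hypothetical helper, and the two inline records stand in for the contents of sample-data.json.

```python
import json

def validate_items(items, id_key="id", text_key="text"):
    """Return a list of problems; an empty list means the items look usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not a JSON object")
            continue
        item_id = item.get(id_key)
        if item_id is None:
            problems.append(f"item {index}: missing '{id_key}'")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen_ids.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems

# Inline stand-in for sample-data.json (first two records only).
sample = json.loads("""
[
  {"id": "mwe_001",
   "text": "She decided to give up smoking after the doctor's warning about her health."},
  {"id": "mwe_002",
   "text": "The ice cream shop on Main Street is open until 10 PM every night."}
]
""")

problems = validate_items(sample)
print(problems or "no problems found")
```

To check the real file, the inline string could be swapped for `json.load(open("sample-data.json", encoding="utf-8"))`.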
Related Designs
Aspect-Based Sentiment Analysis
Identification of aspect terms in review text with sentiment polarity classification for each aspect. Based on SemEval-2016 Task 5 (ABSA).
Causal Medical Claim Detection and PICO Extraction
Detection of causal claims in medical texts and extraction of PICO (Population, Intervention, Comparator, Outcome) elements. Based on SemEval-2023 Task 8 (Khetan et al.).
Character Identification on Multiparty Dialogues
Identification and linking of character mentions in TV show dialogue, combining span annotation with entity resolution for the main cast of Friends. Based on SemEval-2018 Task 4.