intermediate, text
Detecting Minimal Semantic Units and Their Meanings (DiMSUM)
Identification of multiword expression components in text through span annotation, detecting minimal semantic units that function as single meaning-bearing elements. Based on SemEval-2016 Task 10.
Configuration file: config.yaml
# Detecting Minimal Semantic Units and Their Meanings (DiMSUM)
# Based on Schneider et al., SemEval 2016
# Paper: https://aclanthology.org/S16-1084/
# Dataset: https://dimsum16.github.io/
#
# This task asks annotators to identify multiword expression (MWE)
# components in text. MWEs are sequences of words that function together
# as a single semantic unit (e.g., "kick the bucket", "in spite of",
# "hot dog").
#
# Span Labels:
# - MWE Component: A word that is part of a multiword expression
annotation_task_name: "Detecting Minimal Semantic Units (DiMSUM)"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: span
    name: mwe_components
    description: "Highlight all words that are components of multiword expressions."
    labels:
      - "MWE Component"
annotation_instructions: |
  You will be shown a sentence. Your task is to identify all multiword expressions
  (MWEs) by highlighting their component words. MWEs are sequences of words that
  function together as a single meaning unit, including:
  - Idioms: "kick the bucket", "break the ice"
  - Phrasal verbs: "give up", "look forward to"
  - Compound nouns: "hot dog", "ice cream"
  - Fixed phrases: "in spite of", "as well as"
  Highlight all words that belong to any MWE in the sentence.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
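To make the span scheme concrete, here is a minimal Python sketch of what highlighting MWE components amounts to: marking character ranges in a sentence and tagging each with the single label "MWE Component". The `find_component_spans` helper and the dictionary layout are hypothetical illustrations, not Potato's stored annotation format.

```python
import re

LABEL = "MWE Component"  # the only label defined under annotation_schemes


def find_component_spans(text, components):
    """Return one span dict per component word, located left to right.

    Hypothetical helper: the dict layout is an assumption made for
    illustration and is not Potato's actual output format.
    """
    spans = []
    cursor = 0
    for word in components:
        # Match the word as a whole token, searching after the previous match.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        match = pattern.search(text, cursor)
        if match is None:
            raise ValueError(f"component {word!r} not found after offset {cursor}")
        spans.append({
            "start": match.start(),
            "end": match.end(),
            "text": match.group(),
            "label": LABEL,
        })
        cursor = match.end()
    return spans


sentence = "She decided to give up smoking after the doctor's warning about her health."
spans = find_component_spans(sentence, ["give", "up"])
for span in spans:
    print(span)
```

Each word of the phrasal verb "give up" gets its own span here, which mirrors the instruction to highlight every word that belongs to an MWE rather than the expression as one block.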
Sample data: sample-data.json
[
  {
    "id": "mwe_001",
    "text": "She decided to give up smoking after the doctor's warning about her health."
  },
  {
    "id": "mwe_002",
    "text": "The ice cream shop on Main Street is open until 10 PM every night."
  }
]
// ... and 8 more items
Get this design
View on GitHub
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task10-minimal-semantic-units
potato start config.yaml
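Before launching the server, it can help to confirm that every item in the data file carries the keys named under `item_properties` in config.yaml (`id` and `text`). The following is a minimal sketch of such a check. The `validate_items` function is a hypothetical helper written for this page, not part of Potato, and it says nothing about what validation Potato itself performs.

```python
import json


def validate_items(items, id_key="id", text_key="text"):
    """Check that each item has a unique ID and a non-empty text string.

    Returns a list of human-readable problems; an empty list means the
    data is consistent with the item_properties in config.yaml.
    """
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        item_id = item.get(id_key)
        if item_id is None:
            problems.append(f"item {index}: missing key {id_key!r}")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id {item_id!r}")
        else:
            seen_ids.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty {text_key!r}")
    return problems


# Inline copy of the two sample items shown on this page; in practice
# one would parse sample-data.json from disk with json.load instead.
sample = json.loads("""
[
  {"id": "mwe_001", "text": "She decided to give up smoking after the doctor's warning about her health."},
  {"id": "mwe_002", "text": "The ice cream shop on Main Street is open until 10 PM every night."}
]
""")

print(validate_items(sample) or "no problems found")
```

Passing the key names as parameters keeps the check aligned with whatever `id_key` and `text_key` a given config declares.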
Details
Annotation type
span
Domain
SemEval, NLP, Multiword Expressions, Lexical Semantics
Use cases
MWE Detection, Compositional Semantics, Lexical Analysis
Tags
semeval, semeval-2016, shared-task, mwe, multiword-expressions, dimsum, semantic-units
Found a problem or want to improve this design?
Submit an issue
Related designs
Aspect-Based Sentiment Analysis
Identification of aspect terms in review text with sentiment polarity classification for each aspect. Based on SemEval-2016 Task 5 (ABSA).
span, radio
Causal Medical Claim Detection and PICO Extraction
Detection of causal claims in medical texts and extraction of PICO (Population, Intervention, Comparator, Outcome) elements. Based on SemEval-2023 Task 8 (Khetan et al.).
span, radio
Character Identification on Multiparty Dialogues
Identification and linking of character mentions in TV show dialogue, combining span annotation with entity resolution for the main cast of Friends. Based on SemEval-2018 Task 4.
span, radio