Complex Word Identification
Binary classification of words as complex or simple for text simplification purposes, identifying words that may be difficult for non-native speakers or people with reading difficulties. Based on SemEval-2016 Task 11.
Configuration file: config.yaml
# Complex Word Identification
# Based on Paetzold and Specia, SemEval 2016
# Paper: https://aclanthology.org/S16-1085/
# Dataset: https://www.cs.york.ac.uk/semeval-2016/task11/
#
# This task asks annotators to determine whether a target word in context
# is complex (difficult to understand for non-native speakers or people
# with reading difficulties) or simple.
#
# Classification Labels:
# - Complex: The target word would be difficult for a non-native speaker
# - Simple: The target word is easy to understand
annotation_task_name: "Complex Word Identification"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: radio
    name: complexity_judgment
    description: "Is the target word complex or simple for a non-native English speaker?"
    labels:
      - "Complex"
      - "Simple"
    keyboard_shortcuts:
      "Complex": "1"
      "Simple": "2"
    tooltips:
      "Complex": "The word would be difficult for a non-native speaker or someone with reading difficulties"
      "Simple": "The word is common and easy to understand"
annotation_instructions: |
  You will be shown a sentence with a target word highlighted. Your task is to
  determine whether the target word would be considered complex (difficult) or
  simple (easy) for a non-native English speaker with intermediate proficiency.
  Consider factors like word frequency, length, and familiarity.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 12px; margin-bottom: 12px;">
      <strong style="color: #a16207;">Target Word:</strong>
      <span style="font-size: 17px; font-weight: bold;">{{target_word}}</span>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
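The two workload settings above interact. The sketch below estimates a lower bound on how many annotators a data set needs. It assumes that instances_per_annotator caps how many items one person labels and that annotation_per_instance is the number of labels collected per item, which is how the key names read. It further assumes that nobody labels the same item twice. The function name min_annotators is hypothetical and not part of Potato.

```python
import math


def min_annotators(num_items: int, per_instance: int, per_annotator: int) -> int:
    """Lower bound on the number of annotators needed to label every item.

    Assumes one annotator never labels the same item twice, so at least
    `per_instance` distinct people are always required.
    """
    total_labels = num_items * per_instance
    # One person can contribute at most one label per item, up to their cap.
    capacity = min(per_annotator, num_items)
    return max(per_instance, math.ceil(total_labels / capacity))


# Settings taken from config.yaml; 10 items matches the sample file
# (2 items shown plus 8 more). The 1000-item case is a made-up scale-up.
print(min_annotators(num_items=10, per_instance=2, per_annotator=50))    # 2
print(min_annotators(num_items=1000, per_instance=2, per_annotator=50))  # 40
```

With the sample file, two annotators are enough because each can label all 10 items. The cap of 50 only starts to matter once the data set grows beyond 50 items.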
Sample data: sample-data.json
[
  {
    "id": "cwi_001",
    "text": "The exacerbation of symptoms was noted during the follow-up examination.",
    "target_word": "exacerbation"
  },
  {
    "id": "cwi_002",
    "text": "The dog ran quickly across the green field.",
    "target_word": "quickly"
  }
]
// ... and 8 more items

Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task11-complex-word-identification
potato start config.yaml
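Before launching the server, it can be worth checking that every item carries the fields the config relies on: id and text from item_properties, and target_word from the html_layout placeholder. The following is a minimal sketch of such a check, not something Potato provides. The function check_items and the rule that the target word must appear in the sentence are assumptions made for illustration.

```python
import json
import re

REQUIRED_KEYS = ("id", "text", "target_word")


def check_items(items):
    """Return a list of problem descriptions; an empty list means no issues found."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
            continue
        if item["id"] in seen_ids:
            problems.append(f"{item['id']}: duplicate id")
        seen_ids.add(item["id"])
        # Whole-word, case-insensitive search for the target word in the sentence.
        pattern = r"\b" + re.escape(item["target_word"]) + r"\b"
        if not re.search(pattern, item["text"], flags=re.IGNORECASE):
            problems.append(f"{item['id']}: target word not found in text")
    return problems


# The two items shown in sample-data.json, inlined so the sketch runs on its own.
sample = json.loads("""
[
  {"id": "cwi_001",
   "text": "The exacerbation of symptoms was noted during the follow-up examination.",
   "target_word": "exacerbation"},
  {"id": "cwi_002",
   "text": "The dog ran quickly across the green field.",
   "target_word": "quickly"}
]
""")

print(check_items(sample))  # []
```

To check the real file, load it with json.load on an open handle to sample-data.json and pass the resulting list to check_items.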
Related designs
ADMIRE - Multimodal Idiomaticity Recognition
Multimodal idiomaticity detection task requiring annotators to identify whether expressions are used idiomatically or literally, with supporting cue analysis. Based on SemEval-2025 Task 1 (ADMIRE).
AfriSenti - African Language Sentiment
Sentiment analysis for tweets in African languages, classifying text as positive, negative, or neutral. Covers 14 African languages including Amharic, Hausa, Igbo, Yoruba, and Swahili. Based on SemEval-2023 Task 12 (Muhammad et al.).
Argument Reasoning in Civil Procedure
Legal argument reasoning task requiring annotators to answer multiple-choice questions about civil procedure by selecting the best answer and providing legal reasoning. Based on SemEval-2024 Task 5.