Lexical Complexity Prediction
Predict the complexity of words in context using both Likert scale and continuous slider ratings, based on SemEval-2021 Task 1 (Shardlow et al.). Annotators assess how difficult a target word is for a non-native English speaker to understand.
Configuration File: config.yaml
# Lexical Complexity Prediction
# Based on Shardlow et al., SemEval 2021
# Paper: https://aclanthology.org/2021.semeval-1.1/
# Dataset: https://sites.google.com/view/lcpsharedtask2021
#
# Annotators rate how complex a target word is within its sentence context.
# Complexity is assessed on a 5-point Likert scale and a continuous slider
# from 0 (very simple) to 1 (very complex), targeting non-native speakers.
annotation_task_name: "Lexical Complexity Prediction"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: likert
    name: complexity_likert
    description: "Rate the complexity of the highlighted target word on a 5-point scale."
    min_label: "Very Simple"
    max_label: "Very Complex"
    size: 5
  - annotation_type: slider
    name: complexity_slider
    description: "Rate the complexity of the target word on a continuous scale from 0 (simplest) to 1 (most complex)."
    min_value: 0
    max_value: 1
    starting_value: 0.5
annotation_instructions: |
  You will see a sentence with a highlighted target word. Your task is to rate how
  complex the target word is for a non-native English speaker.
  1. Read the sentence and identify the target word shown in bold.
  2. Rate the word's complexity on the 5-point Likert scale (Very Simple to Very Complex).
  3. Also provide a continuous complexity rating using the slider (0 = simplest, 1 = most complex).
  Consider factors like word frequency, number of syllables, and whether the word
  has a simpler synonym that could replace it.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207;">Target Word:</strong>
      <span style="font-size: 18px; font-weight: bold; color: #b45309;">{{target_word}}</span>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
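The config collects two ratings per item: a 5-point Likert rating and a slider value between 0 and 1. To compare them, the Likert rating can be rescaled linearly onto the same interval. The following is a minimal sketch, assuming Likert answers are available as integers from 1 to 5; the function names are hypothetical, and the output format of the annotation tool is not shown here. Whether this rescaling matches the shared task's own aggregation should be checked against the paper.

```python
from statistics import mean

LIKERT_SIZE = 5  # matches `size: 5` in the likert scheme


def likert_to_unit(rating: int, size: int = LIKERT_SIZE) -> float:
    """Map a 1..size Likert rating linearly onto [0, 1]."""
    if not 1 <= rating <= size:
        raise ValueError(f"rating must be in 1..{size}, got {rating}")
    return (rating - 1) / (size - 1)


def aggregate(ratings: list[int]) -> float:
    """Average the rescaled ratings from all annotators of one item."""
    return mean(likert_to_unit(r) for r in ratings)


# Two hypothetical annotators (annotation_per_instance: 2) rate a word 4 and 5.
print(aggregate([4, 5]))  # 0.875
```

With this mapping, "Very Simple" becomes 0 and "Very Complex" becomes 1, so the result can be read on the same scale as the slider.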
Sample Data: sample-data.json
[
  {
    "id": "lcp_001",
    "text": "The patient was diagnosed with a severe case of pneumonia after presenting with persistent cough and dyspnea.",
    "target_word": "dyspnea"
  },
  {
    "id": "lcp_002",
    "text": "The children were playing in the garden and seemed very happy.",
    "target_word": "happy"
  }
]
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2021/task01-lexical-complexity
potato start config.yaml
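Before starting the server, it can help to confirm that every item carries the fields the config and layout rely on: id and text (from item_properties) and target_word (used in html_layout). The sketch below is one way to do that, not part of the tool itself. The function name validate_items and the whole-word check are assumptions made for illustration.

```python
import re

REQUIRED_KEYS = ("id", "text", "target_word")


def validate_items(items: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        label = item.get("id", f"<item {index}>")
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
            continue
        if item["id"] in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(item["id"])
        # Assumption: the target word should occur as a whole word in the sentence.
        pattern = rf"\b{re.escape(item['target_word'])}\b"
        if not re.search(pattern, item["text"], flags=re.IGNORECASE):
            problems.append(f"{label}: target word not found in text")
    return problems


# The two items shown in the sample data above.
sample = [
    {
        "id": "lcp_001",
        "text": "The patient was diagnosed with a severe case of pneumonia "
                "after presenting with persistent cough and dyspnea.",
        "target_word": "dyspnea",
    },
    {
        "id": "lcp_002",
        "text": "The children were playing in the garden and seemed very happy.",
        "target_word": "happy",
    },
]
print(validate_items(sample))  # []
```

To check the full file, load it with json.load and pass the resulting list to validate_items.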
Related Designs
Graded Word Similarity in Context
Rate the semantic similarity of two words in their respective contexts on a graded scale, based on SemEval-2020 Task 3 (Armendariz et al.). Annotators assess how similar word meanings are when each word appears in a specific sentence context.
Multilingual Semantic Word Similarity
Graded word similarity judgment across multiple languages, based on SemEval-2017 Task 2. Annotators rate how semantically similar two words are on a continuous scale, supporting cross-lingual evaluation of distributional semantic models.
Semantic Textual Relatedness
Semantic textual relatedness task requiring annotators to rate the degree of semantic relatedness between sentence pairs using both a Likert scale and a continuous slider. Based on SemEval-2024 Task 1 (STR).