Lexical Complexity Prediction

Predict the complexity of words in context using both Likert scale and continuous slider ratings, based on SemEval-2021 Task 1 (Shardlow et al.). Annotators assess how difficult a target word is for a non-native English speaker to understand.

Configuration File: config.yaml

This Potato config reproduces the annotation task. Save it as config.yaml and run potato start config.yaml to try it.

yaml

# Lexical Complexity Prediction
# Based on Shardlow et al., SemEval 2021
# Paper: https://aclanthology.org/2021.semeval-1.1/
# Dataset: https://sites.google.com/view/lcpsharedtask2021
#
# Annotators rate how complex a target word is within its sentence context.
# Complexity is assessed on a 5-point Likert scale and a continuous slider
# from 0 (very simple) to 1 (very complex), targeting non-native speakers.

annotation_task_name: "Lexical Complexity Prediction"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: likert
    name: complexity_likert
    description: "Rate the complexity of the highlighted target word on a 5-point scale."
    min_label: "Very Simple"
    max_label: "Very Complex"
    size: 5

  - annotation_type: slider
    name: complexity_slider
    description: "Rate the complexity of the target word on a continuous scale from 0 (simplest) to 1 (most complex)."
    min_value: 0
    max_value: 1
    starting_value: 0.5

annotation_instructions: |
  You will see a sentence and, below it, a highlighted target word taken from that
  sentence. Your task is to rate how complex the target word is for a non-native
  English speaker.

  1. Read the sentence and identify the target word shown in bold.
  2. Rate the word's complexity on the 5-point Likert scale (Very Simple to Very Complex).
  3. Also provide a continuous complexity rating using the slider (0 = simplest, 1 = most complex).

  Consider factors like word frequency, number of syllables, and whether the word
  has a simpler synonym that could replace it.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207;">Target Word:</strong>
      <span style="font-size: 18px; font-weight: bold; color: #b45309;">{{target_word}}</span>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
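
The config collects two ratings per item (complexity_likert with size: 5, and complexity_slider on 0 to 1) and asks for two annotators per instance (annotation_per_instance: 2). To compare the two ratings, the Likert value has to be placed on the slider's 0 to 1 range. The sketch below assumes a simple linear mapping, (r - 1) / 4, so that 1 maps to 0.0 and 5 maps to 1.0. It also assumes Likert values arrive as integers 1 to 5. The record layout (`instance_id`, `likert`, `slider`) is hypothetical and is not Potato's actual output format.

```python
from collections import defaultdict
from statistics import mean

LIKERT_POINTS = 5  # matches size: 5 in config.yaml


def likert_to_unit(rating: int, points: int = LIKERT_POINTS) -> float:
    """Linearly map a 1..points Likert rating onto [0, 1] (assumed mapping)."""
    if not 1 <= rating <= points:
        raise ValueError(f"Likert rating {rating} outside 1..{points}")
    return (rating - 1) / (points - 1)


def aggregate(records):
    """Average both ratings per instance across annotators.

    Each record is a hypothetical dict:
    {"instance_id": str, "likert": int, "slider": float}
    """
    by_item = defaultdict(lambda: {"likert": [], "slider": []})
    for rec in records:
        item = by_item[rec["instance_id"]]
        item["likert"].append(likert_to_unit(rec["likert"]))
        item["slider"].append(float(rec["slider"]))
    return {
        iid: {
            "likert_mean": mean(v["likert"]),
            "slider_mean": mean(v["slider"]),
            "n": len(v["likert"]),
        }
        for iid, v in by_item.items()
    }


if __name__ == "__main__":
    # Illustrative ratings only, not real annotations.
    demo = [
        {"instance_id": "lcp_001", "likert": 5, "slider": 0.9},
        {"instance_id": "lcp_001", "likert": 4, "slider": 0.7},
        {"instance_id": "lcp_002", "likert": 1, "slider": 0.1},
        {"instance_id": "lcp_002", "likert": 1, "slider": 0.0},
    ]
    for iid, stats in aggregate(demo).items():
        print(iid, stats)
```

Keeping the two means separate, rather than merging them into a single score, makes it easy to check how closely the discrete and continuous judgments agree before deciding which one to use downstream.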

Sample Data: sample-data.json

json

[
  {
    "id": "lcp_001",
    "text": "The patient was diagnosed with a severe case of pneumonia after presenting with persistent cough and dyspnea.",
    "target_word": "dyspnea"
  },
  {
    "id": "lcp_002",
    "text": "The children were playing in the garden and seemed very happy.",
    "target_word": "happy"
  }
]

// ... and 8 more items
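
Each item needs the `id` and `text` keys named in item_properties, plus the `target_word` field that html_layout renders. A pre-flight check along the following lines can catch missing keys, duplicate ids, and target words that do not actually occur in their sentence. This is a hypothetical helper, not part of Potato. It assumes the data file is a plain JSON array as shown above (the "// ... and 8 more items" line is a page note, not valid JSON), and it matches targets as whole words, case-insensitively.

```python
import json
import re

REQUIRED_KEYS = ("id", "text", "target_word")


def validate_items(items):
    """Return a list of human-readable problems found in the items."""
    problems = []
    seen_ids = set()
    for idx, item in enumerate(items):
        missing = [k for k in REQUIRED_KEYS if k not in item]
        if missing:
            problems.append(f"item {idx}: missing keys {missing}")
            continue
        if item["id"] in seen_ids:
            problems.append(f"item {idx}: duplicate id {item['id']!r}")
        seen_ids.add(item["id"])
        # Whole-word, case-insensitive search for the target inside its sentence.
        pattern = r"\b" + re.escape(item["target_word"]) + r"\b"
        if not re.search(pattern, item["text"], flags=re.IGNORECASE):
            problems.append(
                f"{item['id']}: target {item['target_word']!r} not found in text"
            )
    return problems


def validate_file(path):
    """Load a JSON array of items from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_items(json.load(f))


if __name__ == "__main__":
    # Inline copy of the two sample items; call validate_file("sample-data.json")
    # to check the real file instead.
    demo = [
        {"id": "lcp_001", "text": "...cough and dyspnea.", "target_word": "dyspnea"},
        {"id": "lcp_002", "text": "...seemed very happy.", "target_word": "happy"},
    ]
    issues = validate_items(demo)
    print("\n".join(issues) if issues else f"{len(demo)} items OK")
```

The whole-word match means a target such as "cat" will not be reported as present in "concatenate"; drop the `\b` anchors if your targets can be word fragments.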

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2021/task01-lexical-complexity
potato start config.yaml

Dataset & paper

Shardlow et al., SemEval 2021

Official dataset: https://sites.google.com/view/lcpsharedtask2021
Read the paper: https://aclanthology.org/2021.semeval-1.1/

Citation (BibTeX)

bibtex

@inproceedings{shardlow-etal-2021-semeval,
    title = "{S}em{E}val-2021 {T}ask 1: {L}exical {C}omplexity {P}rediction",
    author = "Shardlow, Matthew  and Evans, Richard  and Paetzold, Gustavo Henrique  and Zampieri, Marcos",
    booktitle = "Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.semeval-1.1"
}

Details

Annotation Types

likert, slider

Domain

NLP, SemEval

Use Cases

Lexical Complexity, Readability Assessment

Related Designs

Graded Word Similarity in Context

Rate the semantic similarity of two words in their respective contexts on a graded scale, based on SemEval-2020 Task 3 (Armendariz et al.). Annotators assess how similar word meanings are when each word appears in a specific sentence context.

likert, slider

Multilingual Semantic Word Similarity

Graded word similarity judgment across multiple languages, based on SemEval-2017 Task 2. Annotators rate how semantically similar two words are on a continuous scale, supporting cross-lingual evaluation of distributional semantic models.

likert, slider

Semantic Textual Relatedness

Semantic textual relatedness task requiring annotators to rate the degree of semantic relatedness between sentence pairs using both a Likert scale and a continuous slider. Based on SemEval-2024 Task 1 (STR).

likert, slider