Complex Word Identification

Binary classification of a target word in context as complex or simple. The goal is to support text simplification by identifying words that may be difficult for non-native speakers or people with reading difficulties. Based on SemEval-2016 Task 11.

Configuration file: config.yaml

# Complex Word Identification
# Based on Paetzold and Specia, SemEval 2016
# Paper: https://aclanthology.org/S16-1085/
# Dataset: https://www.cs.york.ac.uk/semeval-2016/task11/
#
# This task asks annotators to determine whether a target word in context
# is complex (difficult to understand for non-native speakers or people
# with reading difficulties) or simple.
#
# Classification Labels:
# - Complex: The target word would be difficult for a non-native speaker
# - Simple: The target word is easy to understand

annotation_task_name: "Complex Word Identification"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: radio
    name: complexity_judgment
    description: "Is the target word complex or simple for a non-native English speaker?"
    labels:
      - "Complex"
      - "Simple"
    keyboard_shortcuts:
      "Complex": "1"
      "Simple": "2"
    tooltips:
      "Complex": "The word would be difficult for a non-native speaker or someone with reading difficulties"
      "Simple": "The word is common and easy to understand"

annotation_instructions: |
  You will be shown a target word and the sentence it appears in. Your task is to
  determine whether the target word would be considered complex (difficult) or
  simple (easy) for a non-native English speaker with intermediate proficiency.
  Consider factors like word frequency, length, and familiarity.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 12px; margin-bottom: 12px;">
      <strong style="color: #a16207;">Target Word:</strong>
      <span style="font-size: 17px; font-weight: bold;">{{target_word}}</span>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
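
With `annotation_per_instance: 2`, each item receives two judgments, so it is natural to check how often the two annotators agree. The following is a minimal sketch of Cohen's kappa for that case. It assumes the labels have already been extracted into two aligned lists (one per annotator); how to read them out of `annotation_output/` is not covered here, and the function name `cohen_kappa` is purely illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two aligned lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgments for four items, for illustration only.
annotator_1 = ["Complex", "Simple", "Simple", "Complex"]
annotator_2 = ["Complex", "Simple", "Complex", "Complex"]
print(cohen_kappa(annotator_1, annotator_2))  # prints 0.5
```

Kappa corrects raw agreement for chance, which matters here because "Simple" words are likely to dominate and inflate plain percent agreement.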

Sample data: sample-data.json

[
  {
    "id": "cwi_001",
    "text": "The exacerbation of symptoms was noted during the follow-up examination.",
    "target_word": "exacerbation"
  },
  {
    "id": "cwi_002",
    "text": "The dog ran quickly across the green field.",
    "target_word": "quickly"
  }
]

// ... and 8 more items
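
Before starting the server, it can be worth checking that every item carries the fields the config and layout rely on (`id`, `text`, `target_word`) and that the target word actually occurs in its sentence. The sketch below is one possible check, not part of Potato itself; `validate_items` and `REQUIRED_KEYS` are hypothetical names, and the two items are the ones shown above.

```python
import json
import re

# Fields referenced by item_properties and the html_layout template.
REQUIRED_KEYS = ("id", "text", "target_word")


def validate_items(items):
    """Return a list of human-readable problems found in the items."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            problems.append(f"item {index}: missing keys {missing}")
            continue
        if item["id"] in seen_ids:
            problems.append(f"item {index}: duplicate id {item['id']!r}")
        seen_ids.add(item["id"])
        # Whole-word, case-insensitive match of the target word in the sentence.
        pattern = r"\b" + re.escape(item["target_word"]) + r"\b"
        if not re.search(pattern, item["text"], flags=re.IGNORECASE):
            problems.append(
                f"item {index}: target_word {item['target_word']!r} not found in text"
            )
    return problems


SAMPLE = json.loads("""
[
  {"id": "cwi_001",
   "text": "The exacerbation of symptoms was noted during the follow-up examination.",
   "target_word": "exacerbation"},
  {"id": "cwi_002",
   "text": "The dog ran quickly across the green field.",
   "target_word": "quickly"}
]
""")

print(validate_items(SAMPLE))  # prints []
```

To check the full file, the same function can be applied to `json.load(open("sample-data.json"))`.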

Get this design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task11-complex-word-identification
potato start config.yaml

Details

Annotation type

radio

Domain

SemEval, NLP, Text Simplification, Readability

Use cases

Complex Word Identification, Text Simplification, Readability Assessment
