CODWOE - Comparing Dictionaries and Word Embeddings

Definition generation and concreteness classification for words in context, comparing dictionary definitions with distributional word representations. Based on SemEval-2022 Task 1 (Mickus et al.).

Configuration file: config.yaml

# CODWOE - Comparing Dictionaries and Word Embeddings
# Based on Mickus et al., SemEval 2022
# Paper: https://aclanthology.org/2022.semeval-1.1/
# Dataset: https://codwoe.atilf.fr/
#
# This task asks annotators to write a definition for a target word based
# on the context in which it appears, and to classify whether the word
# is used in a concrete, abstract, or mixed sense.
#
# Concreteness Labels:
# - Concrete: The word refers to something tangible or perceivable
# - Abstract: The word refers to a concept, quality, or intangible entity
# - Mixed: The word has both concrete and abstract aspects in this context

annotation_task_name: "CODWOE - Comparing Dictionaries and Word Embeddings"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: text
    name: definition
    description: "Write a definition for the target word as used in this context"

  - annotation_type: radio
    name: concreteness
    description: "Is the target word used in a concrete, abstract, or mixed sense?"
    labels:
      - "Concrete"
      - "Abstract"
      - "Mixed"
    keyboard_shortcuts:
      "Concrete": "1"
      "Abstract": "2"
      "Mixed": "3"
    tooltips:
      "Concrete": "The word refers to something tangible, physical, or directly perceivable by the senses"
      "Abstract": "The word refers to a concept, quality, state, or intangible entity"
      "Mixed": "The word has both concrete and abstract aspects in this particular context"

annotation_instructions: |
  You will see a sentence, its target word, and the language of the text.
  1. Read the sentence and identify the target word.
  2. Write a clear, concise definition for the target word as it is used in this context.
  3. Classify whether the target word is used in a concrete, abstract, or mixed sense.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="display: flex; gap: 12px; margin-bottom: 12px;">
      <div style="background: #ecfdf5; border: 1px solid #a7f3d0; border-radius: 8px; padding: 12px; flex: 1;">
        <strong style="color: #065f46;">Language:</strong>
        <span style="font-size: 15px; margin-left: 8px;">{{language}}</span>
      </div>
      <div style="background: #fef3c7; border: 1px solid #fde68a; border-radius: 8px; padding: 12px; flex: 1;">
        <strong style="color: #92400e;">Target Word:</strong>
        <span style="font-size: 15px; font-weight: bold; margin-left: 8px;">{{target_word}}</span>
      </div>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Context:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
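
The last block of the configuration determines how much annotation work the task generates. As a rough sketch of that arithmetic, the Python below estimates a lower bound on the number of annotators needed for full coverage. The helper name `min_annotators` is hypothetical, the item count of 10 is inferred from the two sample items plus the "8 more items" note, and the requirement that each item's annotations come from distinct annotators is an assumption, not documented Potato behavior.

```python
import math


def min_annotators(num_items, annotation_per_instance, instances_per_annotator):
    """Lower bound on annotators needed to fully annotate the dataset.

    Assumption: the annotations collected for one item must come from
    different annotators, so at least `annotation_per_instance` people
    are required regardless of dataset size.
    """
    total_annotations = num_items * annotation_per_instance
    # Annotators needed if each one works up to the per-annotator cap.
    by_capacity = math.ceil(total_annotations / instances_per_annotator)
    return max(annotation_per_instance, by_capacity)


# Values from config.yaml: annotation_per_instance=2, instances_per_annotator=50.
print(min_annotators(10, 2, 50))    # sample dataset size
print(min_annotators(1000, 2, 50))  # a larger, purely illustrative dataset
```

With the sample data, the cap of 50 instances per annotator is never reached, so the number of annotations per instance is the binding constraint.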

Sample data: sample-data.json

[
  {
    "id": "codwoe_001",
    "text": "The architect presented her vision for a sustainable bridge that would connect the two neighborhoods across the river.",
    "target_word": "bridge",
    "language": "English"
  },
  {
    "id": "codwoe_002",
    "text": "After years of conflict, the treaty served as a bridge between the two nations, fostering mutual understanding and cooperation.",
    "target_word": "bridge",
    "language": "English"
  }
]

// ... and 8 more items
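
Each item has to supply the `id` key named in `item_properties` and every `{{placeholder}}` used in `html_layout`. The sketch below checks that correspondence for the two items shown above. It is an illustration only: the helper names `required_fields` and `missing_fields` are hypothetical, the regular expression is a simple stand-in for placeholder matching, and none of this is Potato's own loading or rendering code.

```python
import re

# Placeholders as they appear in html_layout, reduced to the relevant parts.
LAYOUT_PLACEHOLDERS = "{{language}} {{target_word}} {{text}}"
ID_KEY = "id"  # item_properties.id_key in config.yaml

ITEMS = [
    {
        "id": "codwoe_001",
        "text": "The architect presented her vision for a sustainable bridge "
                "that would connect the two neighborhoods across the river.",
        "target_word": "bridge",
        "language": "English",
    },
    {
        "id": "codwoe_002",
        "text": "After years of conflict, the treaty served as a bridge between "
                "the two nations, fostering mutual understanding and cooperation.",
        "target_word": "bridge",
        "language": "English",
    },
]


def required_fields(layout, id_key):
    """Collect placeholder names from the layout, plus the id key."""
    names = re.findall(r"\{\{\s*(\w+)\s*\}\}", layout)
    return set(names) | {id_key}


def missing_fields(items, layout, id_key):
    """Map each incomplete item to the sorted list of fields it lacks."""
    needed = required_fields(layout, id_key)
    problems = {}
    for index, item in enumerate(items):
        absent = sorted(needed - set(item))
        if absent:
            problems[item.get(id_key, f"<item {index}>")] = absent
    return problems


print(missing_fields(ITEMS, LAYOUT_PLACEHOLDERS, ID_KEY))
```

An empty result means every item can fill the layout. An item without `target_word`, for example, would be reported under its `id`.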

Get this design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2022/task01-codwoe
potato start config.yaml

Details

Annotation types

text, radio

Domain

NLP, Lexical Semantics, SemEval

Use cases

Definition Generation, Word Sense, Lexicography

Tags

semeval, semeval-2022, shared-task, definitions, word-embeddings, lexical-semantics, multilingual

Related designs

Hypernym Discovery

Discovery of hypernyms (broader terms) for a given target term in context, with classification of the hypernym relationship type. Based on SemEval-2018 Task 9.

text, radio

Argument Reasoning in Civil Procedure

Legal argument reasoning task requiring annotators to answer multiple-choice questions about civil procedure by selecting the best answer and providing legal reasoning. Based on SemEval-2024 Task 5.

radio, text

BRAINTEASER - Commonsense-Defying QA

Lateral thinking and commonsense-defying question answering task requiring annotators to select answers to brain teasers that defy default commonsense assumptions and provide explanations. Based on SemEval-2024 Task 9 (BRAINTEASER).

radio, text