Interpretable Semantic Textual Similarity
Fine-grained semantic similarity assessment between sentence pairs with span alignment, combining chunk-level annotation with graded similarity scoring. Based on SemEval-2016 Task 2.
Configuration file: config.yaml
# Interpretable Semantic Textual Similarity
# Based on Agirre et al., SemEval 2016
# Paper: https://aclanthology.org/S16-1082/
# Dataset: http://ixa2.si.ehu.eus/stswiki/
#
# This task asks annotators to assess semantic similarity between
# sentence pairs at a fine-grained level. Annotators highlight aligned
# chunks and rate the overall similarity on a 6-point scale.
#
# Span Labels:
# - Aligned Chunk: A text chunk that corresponds to content in the other sentence
#
# Similarity Scale (Likert 1-6):
# 1 = Completely Different (no semantic overlap)
# 6 = Identical Meaning (same meaning, possibly different wording)
annotation_task_name: "Interpretable Semantic Textual Similarity"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: span
    name: aligned_chunks
    description: "Highlight chunks in Sentence 1 that align with content in Sentence 2."
    labels:
      - "Aligned Chunk"
  - annotation_type: likert
    name: similarity_score
    description: "How semantically similar are the two sentences?"
    min_label: "Completely Different"
    max_label: "Identical Meaning"
    size: 6
annotation_instructions: |
  You will be shown two sentences. Your task is to:
  1. Highlight chunks in Sentence 1 that correspond to content in Sentence 2.
  2. Rate the overall semantic similarity between the two sentences on a 1-6 scale.
  Consider both the meaning and the information conveyed by each sentence.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 12px;">
      <strong style="color: #0369a1;">Sentence 1:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #f0fdf4; border: 1px solid #bbf7d0; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #15803d;">Sentence 2:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{sentence_2}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
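With `annotation_per_instance: 2`, each sentence pair receives two independent similarity ratings. A quick consistency check is the mean absolute gap between the paired 1-6 scores. The sketch below assumes the ratings have already been collected from potato's JSON output into a dict keyed by item id; that input shape is an assumption for illustration, not potato's literal output format:

```python
from statistics import mean

def mean_score_gap(ratings_by_item):
    """Average absolute gap between the two 1-6 similarity ratings per item.

    `ratings_by_item` maps item id -> [score_a, score_b]; this input
    shape is an assumption, not potato's literal output layout.
    """
    gaps = [abs(a - b) for a, b in ratings_by_item.values()]
    return mean(gaps) if gaps else 0.0

# Example: one pair rated 6 and 5, another rated 1 by both annotators
print(mean_score_gap({"ists_001": [6, 5], "ists_002": [1, 1]}))  # -> 0.5
```

A gap near 0 suggests annotators apply the 6-point scale consistently; larger gaps flag pairs worth adjudicating before analysis.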
Sample data: sample-data.json
[
  {
    "id": "ists_001",
    "text": "A man is playing a guitar on stage.",
    "sentence_2": "A musician performs with a guitar in front of an audience."
  },
  {
    "id": "ists_002",
    "text": "The cat sat on the mat near the fireplace.",
    "sentence_2": "A dog was sleeping in the garden outside."
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task02-interpretable-sts
potato start config.yaml
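Before running `potato start`, it can help to check that every item in `sample-data.json` carries the keys the config references: `id` (via `id_key`), `text` (via `text_key`), and the `sentence_2` field used by `html_layout`. A minimal sketch of such a check (this script is illustrative and not part of the repository):

```python
import json

# Keys referenced by item_properties and the html_layout template
REQUIRED_KEYS = {"id", "text", "sentence_2"}

def find_malformed(items):
    """Return (index, sorted missing keys) for each incomplete item."""
    return [
        (i, sorted(REQUIRED_KEYS - set(item)))
        for i, item in enumerate(items)
        if REQUIRED_KEYS - set(item)
    ]

if __name__ == "__main__":
    with open("sample-data.json") as f:
        for i, missing in find_malformed(json.load(f)):
            print(f"item {i} is missing: {missing}")
```

Items missing `sentence_2` would render the second box of the layout empty, so catching them up front saves a confusing annotation session.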
Found a problem or want to improve this design?
Open an issue
Related designs
Semantic Textual Relatedness
Semantic textual relatedness task requiring annotators to rate the degree of semantic relatedness between sentence pairs using both a Likert scale and a continuous slider. Based on SemEval-2024 Task 1 (STR).
ESA: Error Span Annotation for Machine Translation
Error span annotation for machine translation output. Annotators identify error spans in translations, classify error types (accuracy, fluency, terminology, style), and rate severity.
LongEval: Faithfulness Evaluation for Long-form Summarization
Faithfulness evaluation of long-form summaries. Annotators identify atomic content units in summaries, check each against source documents for faithfulness, and rate overall summary quality.