Interpretable Semantic Textual Similarity
Fine-grained semantic similarity assessment between sentence pairs with span alignment, combining chunk-level annotation with graded similarity scoring. Based on SemEval-2016 Task 2.
Configuration File: config.yaml
# Interpretable Semantic Textual Similarity
# Based on Agirre et al., SemEval 2016
# Paper: https://aclanthology.org/S16-1082/
# Dataset: http://ixa2.si.ehu.eus/stswiki/
#
# This task asks annotators to assess semantic similarity between
# sentence pairs at a fine-grained level. Annotators highlight aligned
# chunks and rate the overall similarity on a 6-point scale.
#
# Span Labels:
# - Aligned Chunk: A text chunk that corresponds to content in the other sentence
#
# Similarity Scale (Likert 1-6):
# 1 = Completely Different (no semantic overlap)
# 6 = Identical Meaning (same meaning, possibly different wording)
annotation_task_name: "Interpretable Semantic Textual Similarity"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: span
    name: aligned_chunks
    description: "Highlight chunks in Sentence 1 that align with content in Sentence 2."
    labels:
      - "Aligned Chunk"
  - annotation_type: likert
    name: similarity_score
    description: "How semantically similar are the two sentences?"
    min_label: "Completely Different"
    max_label: "Identical Meaning"
    size: 6
annotation_instructions: |
  You will be shown two sentences. Your task is to:
  1. Highlight chunks in Sentence 1 that correspond to content in Sentence 2.
  2. Rate the overall semantic similarity between the two sentences on a 1-6 scale.
  Consider both the meaning and the information conveyed by each sentence.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 12px;">
      <strong style="color: #0369a1;">Sentence 1:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #f0fdf4; border: 1px solid #bbf7d0; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #15803d;">Sentence 2:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{sentence_2}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
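The `{{text}}` and `{{sentence_2}}` placeholders in `html_layout` are filled per item from the data file. A minimal sketch of that substitution idea, using a plain regex replacement for illustration (not Potato's actual template engine):

```python
import re

# A stripped-down stand-in for the html_layout above.
HTML_LAYOUT = """<div><strong>Sentence 1:</strong> {{text}}</div>
<div><strong>Sentence 2:</strong> {{sentence_2}}</div>"""

def render(layout: str, item: dict) -> str:
    # Replace each {{key}} with the item's value for that key;
    # unknown keys render as an empty string.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: item.get(m.group(1), ""), layout)

item = {
    "id": "ists_001",
    "text": "A man is playing a guitar on stage.",
    "sentence_2": "A musician performs with a guitar in front of an audience.",
}
html = render(HTML_LAYOUT, item)
```

Note that `text` is the configured `text_key`, so it doubles as the span-annotation target; only Sentence 1 can be highlighted, which matches the span scheme's description.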
Sample Data: sample-data.json
[
{
"id": "ists_001",
"text": "A man is playing a guitar on stage.",
"sentence_2": "A musician performs with a guitar in front of an audience."
},
{
"id": "ists_002",
"text": "The cat sat on the mat near the fireplace.",
"sentence_2": "A dog was sleeping in the garden outside."
}
]
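Since the config names `id` and `text` as the item keys and the layout additionally expects `sentence_2`, a quick pre-flight check of the data file can catch malformed items before starting the server. A sketch (the required-key list mirrors this particular config, not a general Potato rule):

```python
# Required keys taken from this config: id_key, text_key, plus the
# sentence_2 field referenced by html_layout.
REQUIRED_KEYS = {"id", "text", "sentence_2"}

def validate_items(items):
    """Return a list of (index, missing_keys) for malformed items."""
    problems = []
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems

items = [
    {"id": "ists_001", "text": "A man is playing a guitar on stage.",
     "sentence_2": "A musician performs with a guitar in front of an audience."},
    {"id": "ists_002", "text": "The cat sat on the mat near the fireplace."},  # missing sentence_2
]
problems = validate_items(items)
```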
// ... and 8 more items

Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2016/task02-interpretable-sts
potato start config.yaml
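With `annotation_per_instance: 2`, each sentence pair receives two similarity ratings, so a basic double-annotation check is possible. The exact export schema depends on Potato's JSON output, so the flat record shape below is a hypothetical illustration; the agreement arithmetic is the point:

```python
from collections import defaultdict

# Hypothetical flat export: one record per (item, annotator) judgment.
records = [
    {"id": "ists_001", "annotator": "a1", "similarity_score": 5},
    {"id": "ists_001", "annotator": "a2", "similarity_score": 4},
    {"id": "ists_002", "annotator": "a1", "similarity_score": 1},
    {"id": "ists_002", "annotator": "a2", "similarity_score": 1},
]

by_item = defaultdict(list)
for r in records:
    by_item[r["id"]].append(r["similarity_score"])

# Fraction of items where both annotators agreed exactly, and the
# mean absolute difference between the two ratings per item.
exact = sum(1 for s in by_item.values() if len(set(s)) == 1) / len(by_item)
mad = sum(abs(s[0] - s[1]) for s in by_item.values()) / len(by_item)
```

For a publishable study you would likely replace these with a chance-corrected coefficient (e.g. weighted kappa or Krippendorff's alpha) suited to ordinal Likert data.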
Found an issue or want to improve this design?
Open an Issue

Related Designs
Semantic Textual Relatedness
Semantic textual relatedness task requiring annotators to rate the degree of semantic relatedness between sentence pairs using both a Likert scale and a continuous slider. Based on SemEval-2024 Task 1 (STR).
ESA: Error Span Annotation for Machine Translation
Error span annotation for machine translation output. Annotators identify error spans in translations, classify error types (accuracy, fluency, terminology, style), and rate severity.
LongEval: Faithfulness Evaluation for Long-form Summarization
Faithfulness evaluation of long-form summaries. Annotators identify atomic content units in summaries, check each against source documents for faithfulness, and rate overall summary quality.