intermediate, preference

SPIN Self-Play Preference Annotation

A human vs. AI response discrimination task for Self-Play Fine-Tuning (SPIN), based on Chen et al., ICML 2024. Annotators judge which of two responses was written by a human and which was generated by an AI model, then rate the overall fluency of the responses.

Configuration File: config.yaml

# SPIN Self-Play Preference Annotation
# Based on Chen et al., ICML 2024
# Paper: https://arxiv.org/abs/2401.01335
# Dataset: https://huggingface.co/datasets/UCLA-AGI/SPIN_iter0
#
# Annotation task for Self-Play Fine-Tuning (SPIN). The key idea is to
# train a language model to distinguish its own outputs from human-written
# responses. Annotators compare two responses to a prompt and identify
# which was written by a human versus generated by an AI model.
#
# Preference Labels:
# - Human Response: The annotator believes Response A is the human-written response
# - AI Response: The annotator believes Response A is AI-generated (so Response B is human-written)
# - Cannot Tell: The annotator cannot reliably distinguish between the two
#
# Fluency Rating:
# - Fluent: Both responses read naturally with no issues
# - Somewhat Fluent: Minor awkwardness or unnatural phrasing
# - Not Fluent: Significant issues with naturalness or coherence
#
# Annotation Guidelines:
# 1. Read the prompt carefully
# 2. Read both responses and try to identify which is human vs. AI
# 3. Select your judgment for the pairwise comparison
# 4. Rate the overall fluency of the responses

annotation_task_name: "SPIN Self-Play Preference Annotation"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  # Step 1: Identify human vs. AI response
  - annotation_type: pairwise
    name: source_preference
    description: "Is Response A the human-written response or the AI-generated one?"
    mode: "binary"
    labels:
      - "Human Response"
      - "AI Response"
      - "Cannot Tell"
    keyboard_shortcuts:
      "Human Response": "a"
      "AI Response": "b"
      "Cannot Tell": "c"
    tooltips:
      "Human Response": "Response A appears to be written by a human (Response B is AI-generated)"
      "AI Response": "Response A appears to be generated by AI (Response B is human-written)"
      "Cannot Tell": "Cannot reliably distinguish which is human-written"

  # Step 2: Fluency rating
  - annotation_type: radio
    name: fluency
    description: "How fluent and natural do the responses read overall?"
    labels:
      - "Fluent"
      - "Somewhat Fluent"
      - "Not Fluent"
    keyboard_shortcuts:
      "Fluent": "1"
      "Somewhat Fluent": "2"
      "Not Fluent": "3"
    tooltips:
      "Fluent": "Both responses read naturally with no awkwardness"
      "Somewhat Fluent": "Minor awkwardness or unnatural phrasing in one or both"
      "Not Fluent": "Significant issues with naturalness or coherence"

annotation_instructions: |
  You will see a prompt and two responses. One was written by a human and the other
  was generated by an AI language model. Your task is to identify which is which.

  For each item:
  1. Read the prompt carefully.
  2. Read both Response A and Response B thoroughly.
  3. Decide whether Response A is the human-written or the AI-generated response,
     and select the matching label ("Human Response" or "AI Response").
  4. Rate the overall fluency of both responses.

  Tips for distinguishing human vs. AI:
  - Human responses may be more concise or contain informal language
  - AI responses may be more structured, verbose, or formulaic
  - Look for patterns like numbered lists, hedging language, or overly balanced viewpoints
  - Trust your intuition; if you cannot decide which response is human-written, select "Cannot Tell"

html_layout: |
  <div style="padding: 15px; max-width: 900px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Prompt:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="display: flex; gap: 16px;">
      <div style="flex: 1; background: #e3f2fd; border: 2px solid #1976d2; border-radius: 8px; padding: 16px;">
        <h4 style="margin-top: 0; color: #1976d2;">Response A:</h4>
        <div style="white-space: pre-wrap; font-size: 14px; line-height: 1.6;">{{response_a}}</div>
      </div>
      <div style="flex: 1; background: #fce4ec; border: 2px solid #c62828; border-radius: 8px; padding: 16px;">
        <h4 style="margin-top: 0; color: #c62828;">Response B:</h4>
        <div style="white-space: pre-wrap; font-size: 14px; line-height: 1.6;">{{response_b}}</div>
      </div>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
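
The tooltips in the config tie each label to Response A: "Human Response" means Response A looks human-written, while "AI Response" means Response A looks AI-generated and Response B is the human one. With `annotation_per_instance: 3`, each item collects three such judgments. The Python sketch below shows one way those judgments might be resolved into a per-item decision. It assumes the labels have already been loaded into plain lists of strings and makes no assumption about Potato's on-disk output layout. The names `human_side` and `majority_human_side` are hypothetical helpers, not part of Potato.

```python
from collections import Counter
from typing import Optional

# Mapping taken from the tooltips in config.yaml: each label describes Response A.
LABEL_TO_HUMAN_SIDE = {
    "Human Response": "response_a",
    "AI Response": "response_b",
    "Cannot Tell": None,
}


def human_side(label: str) -> Optional[str]:
    """Return the field the annotator believes is human-written, or None."""
    if label not in LABEL_TO_HUMAN_SIDE:
        raise ValueError(f"unknown label: {label!r}")
    return LABEL_TO_HUMAN_SIDE[label]


def majority_human_side(labels: list[str]) -> Optional[str]:
    """Resolve several annotators' labels to one side by strict majority.

    Returns None when neither side is chosen by more than half of the
    annotators, for example when "Cannot Tell" votes prevent a majority.
    """
    votes = Counter(human_side(label) for label in labels)
    for side in ("response_a", "response_b"):
        if votes[side] * 2 > len(labels):
            return side
    return None


# Hypothetical judgments from three annotators on two items.
print(majority_human_side(["Human Response", "Human Response", "AI Response"]))
print(majority_human_side(["Human Response", "AI Response", "Cannot Tell"]))
```

A strict majority is only one possible rule. Items that resolve to `None` could instead be sent for additional annotation or excluded, depending on how the judgments are used downstream.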

Sample Data: sample-data.json

[
  {
    "id": "spin_001",
    "text": "What's the best way to learn a new language as an adult?",
    "response_a": "Honestly, immersion is the best way. I moved to Spain knowing barely any Spanish and within 6 months I was conversational. Duolingo and classes help, but nothing beats being forced to use the language every day. Also, don't be afraid to make mistakes - locals appreciate the effort.",
    "response_b": "Learning a new language as an adult can be achieved through several effective strategies:\n\n1. Consistent daily practice (even 15-30 minutes)\n2. Immersive experiences through media, conversation partners, or travel\n3. Spaced repetition systems for vocabulary retention\n4. Focus on high-frequency words first\n5. Practice speaking early, even if imperfectly\n\nResearch suggests that adults can actually learn languages efficiently due to their existing knowledge of grammar concepts and learning strategies."
  },
  {
    "id": "spin_002",
    "text": "Explain why exercise is important for mental health.",
    "response_a": "Exercise triggers the release of endorphins, serotonin, and other neurotransmitters that naturally improve mood and reduce stress. Regular physical activity has been shown to:\n\n1. Reduce symptoms of depression and anxiety\n2. Improve sleep quality\n3. Boost self-esteem and cognitive function\n4. Provide a healthy coping mechanism for stress\n5. Increase social interaction opportunities\n\nStudies indicate that even moderate exercise, such as 30 minutes of walking three times a week, can have significant positive effects on mental well-being.",
    "response_b": "When I started running three years ago after a rough patch, I didn't expect it to help my anxiety as much as it did. There's something about the physical exhaustion that quiets the mental noise. Science backs this up too - exercise releases feel-good chemicals in your brain. But beyond the chemistry, it gives you structure, goals, and sometimes community. Even on days I don't feel like going, I've never regretted a workout."
  }
]

// ... and 8 more items
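
The config reads `id` and `text` through `item_properties`, and the `html_layout` template fills in `{{response_a}}` and `{{response_b}}`, so each item in sample-data.json needs all four fields. A small pre-flight check along the following lines might catch missing fields before the server is started. This is a sketch and not part of Potato: `REQUIRED_KEYS` and `find_invalid_items` are hypothetical names, and the inline records are shortened stand-ins for the real file.

```python
import json

# Keys referenced by config.yaml: id_key, text_key, and the two html_layout placeholders.
REQUIRED_KEYS = ("id", "text", "response_a", "response_b")


def find_invalid_items(items: list[dict]) -> dict[str, list[str]]:
    """Map each problematic item's id (or its list position) to its problems."""
    problems: dict[str, list[str]] = {}
    seen_ids: set[str] = set()
    for index, item in enumerate(items):
        item_id = str(item.get("id", f"<index {index}>"))
        # A field counts as invalid if it is absent, not a string, or blank.
        issues = [
            f"missing or empty {key!r}"
            for key in REQUIRED_KEYS
            if not isinstance(item.get(key), str) or not item[key].strip()
        ]
        if item_id in seen_ids:
            issues.append("duplicate id")
        seen_ids.add(item_id)
        if issues:
            problems[item_id] = issues
    return problems


# Shortened stand-ins for records in sample-data.json; the second lacks response_b.
sample = json.loads("""
[
  {"id": "spin_001", "text": "Prompt one", "response_a": "A1", "response_b": "B1"},
  {"id": "spin_002", "text": "Prompt two", "response_a": "A2"}
]
""")

print(find_invalid_items(sample))
```

To check the real file, the inline string can be replaced with the parsed contents of sample-data.json. An empty result means every item carries the fields the layout expects.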

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/preference-learning/spin-self-play
potato start config.yaml

Details

Annotation Types

pairwise, radio

Domain

NLP, AI Alignment

Use Cases

Preference Learning, Self-Play, LLM Training

Tags

spin, self-play, preference, human-ai-distinction, icml2024
