SPIN Self-Play Preference Annotation
Human vs. AI response discrimination for Self-Play Fine-Tuning, based on Chen et al., ICML 2024. Annotators identify which of two responses was written by a human versus an AI model, and rate the fluency of both responses.
Configuration File
config.yaml
# SPIN Self-Play Preference Annotation
# Based on Chen et al., ICML 2024
# Paper: https://arxiv.org/abs/2401.01335
# Dataset: https://huggingface.co/datasets/UCLA-AGI/SPIN_iter0
#
# Annotation task for Self-Play Fine-Tuning (SPIN). The key idea is to
# train a language model to distinguish its own outputs from human-written
# responses. Annotators compare two responses to a prompt and identify
# which was written by a human versus generated by an AI model.
#
# Preference Labels (each is a judgment about Response A):
# - Human Response: Response A appears human-written (Response B is the AI's)
# - AI Response: Response A appears AI-generated (Response B is the human's)
# - Cannot Tell: The annotator cannot reliably distinguish between the two
#
# Fluency Rating:
# - Fluent: Both responses read naturally with no issues
# - Somewhat Fluent: Minor awkwardness or unnatural phrasing
# - Not Fluent: Significant issues with naturalness or coherence
#
# Annotation Guidelines:
# 1. Read the prompt carefully
# 2. Read both responses and try to identify which is human vs. AI
# 3. Select your judgment for the pairwise comparison
# 4. Rate the overall fluency of the responses
annotation_task_name: "SPIN Self-Play Preference Annotation"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  # Step 1: Identify human vs. AI response
  - annotation_type: pairwise
    name: source_preference
    description: "Which response do you believe is the human-written response?"
    mode: "binary"
    labels:
      - "Human Response"
      - "AI Response"
      - "Cannot Tell"
    keyboard_shortcuts:
      "Human Response": "a"
      "AI Response": "b"
      "Cannot Tell": "c"
    tooltips:
      "Human Response": "Response A appears to be written by a human"
      "AI Response": "Response A appears to be generated by AI (Response B is human)"
      "Cannot Tell": "Cannot reliably distinguish which is human-written"
  # Step 2: Fluency rating
  - annotation_type: radio
    name: fluency
    description: "How fluent and natural do the responses read overall?"
    labels:
      - "Fluent"
      - "Somewhat Fluent"
      - "Not Fluent"
    keyboard_shortcuts:
      "Fluent": "1"
      "Somewhat Fluent": "2"
      "Not Fluent": "3"
    tooltips:
      "Fluent": "Both responses read naturally with no awkwardness"
      "Somewhat Fluent": "Minor awkwardness or unnatural phrasing in one or both"
      "Not Fluent": "Significant issues with naturalness or coherence"
annotation_instructions: |
  You will see a prompt and two responses. One was written by a human and the other
  was generated by an AI language model. Your task is to identify which is which.

  For each item:
  1. Read the prompt carefully.
  2. Read both Response A and Response B thoroughly.
  3. Try to identify which response was written by a human and which by AI.
  4. Rate the overall fluency of both responses.

  Tips for distinguishing human vs. AI:
  - Human responses may be more concise or contain informal language
  - AI responses may be more structured, verbose, or formulaic
  - Look for patterns like numbered lists, hedging language, or overly balanced viewpoints
  - Trust your intuition - if both seem equally natural, select "Cannot Tell"
html_layout: |
  <div style="padding: 15px; max-width: 900px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Prompt:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="display: flex; gap: 16px;">
      <div style="flex: 1; background: #e3f2fd; border: 2px solid #1976d2; border-radius: 8px; padding: 16px;">
        <h4 style="margin-top: 0; color: #1976d2;">Response A:</h4>
        <div style="white-space: pre-wrap; font-size: 14px; line-height: 1.6;">{{response_a}}</div>
      </div>
      <div style="flex: 1; background: #fce4ec; border: 2px solid #c62828; border-radius: 8px; padding: 16px;">
        <h4 style="margin-top: 0; color: #c62828;">Response B:</h4>
        <div style="white-space: pre-wrap; font-size: 14px; line-height: 1.6;">{{response_b}}</div>
      </div>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
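With `annotation_per_instance: 3`, each item ends up with three `source_preference` judgments that have to be mapped back to a response. The following is a minimal sketch of one way to do that, assuming the label semantics stated in the tooltips above ("Human Response" means Response A is judged human, "AI Response" means Response B is judged human). The names `LABEL_TO_HUMAN_SIDE` and `judged_human_side` are hypothetical, and the sketch assumes the labels have already been extracted into a plain list; it says nothing about the layout of Potato's output files.

```python
from collections import Counter

# Label semantics taken from the tooltips in config.yaml:
#   "Human Response" -> Response A is judged human-written
#   "AI Response"    -> Response A is judged AI-generated, so B is judged human
#   "Cannot Tell"    -> no decision
LABEL_TO_HUMAN_SIDE = {
    "Human Response": "response_a",
    "AI Response": "response_b",
    "Cannot Tell": None,
}


def judged_human_side(labels):
    """Return the field a strict majority of annotators judged human, else None.

    `labels` is a hypothetical list of source_preference labels for one item,
    one entry per annotator.
    """
    votes = Counter(LABEL_TO_HUMAN_SIDE[label] for label in labels)
    for side in ("response_a", "response_b"):
        # Require more than half of all judgments, counting "Cannot Tell"
        # votes in the denominator so that abstentions weaken a majority.
        if votes[side] * 2 > len(labels):
            return side
    return None
```

Requiring a strict majority over all judgments is a deliberately conservative choice: an item with one vote each for "Human Response", "AI Response", and "Cannot Tell" stays undecided rather than being resolved arbitrarily.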
Sample Data
sample-data.json
[
  {
    "id": "spin_001",
    "text": "What's the best way to learn a new language as an adult?",
    "response_a": "Honestly, immersion is the best way. I moved to Spain knowing barely any Spanish and within 6 months I was conversational. Duolingo and classes help, but nothing beats being forced to use the language every day. Also, don't be afraid to make mistakes - locals appreciate the effort.",
    "response_b": "Learning a new language as an adult can be achieved through several effective strategies:\n\n1. Consistent daily practice (even 15-30 minutes)\n2. Immersive experiences through media, conversation partners, or travel\n3. Spaced repetition systems for vocabulary retention\n4. Focus on high-frequency words first\n5. Practice speaking early, even if imperfectly\n\nResearch suggests that adults can actually learn languages efficiently due to their existing knowledge of grammar concepts and learning strategies."
  },
  {
    "id": "spin_002",
    "text": "Explain why exercise is important for mental health.",
    "response_a": "Exercise triggers the release of endorphins, serotonin, and other neurotransmitters that naturally improve mood and reduce stress. Regular physical activity has been shown to:\n\n1. Reduce symptoms of depression and anxiety\n2. Improve sleep quality\n3. Boost self-esteem and cognitive function\n4. Provide a healthy coping mechanism for stress\n5. Increase social interaction opportunities\n\nStudies indicate that even moderate exercise, such as 30 minutes of walking three times a week, can have significant positive effects on mental well-being.",
    "response_b": "When I started running three years ago after a rough patch, I didn't expect it to help my anxiety as much as it did. There's something about the physical exhaustion that quiets the mental noise. Science backs this up too - exercise releases feel-good chemicals in your brain. But beyond the chemistry, it gives you structure, goals, and sometimes community. Even on days I don't feel like going, I've never regretted a workout."
  }
]
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/preference-learning/spin-self-play
potato start config.yaml
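Before launching the server, it can be worth checking that every item in the data file carries the fields the layout refers to. The `html_layout` in config.yaml uses the placeholders `{{text}}`, `{{response_a}}`, and `{{response_b}}`, which are presumably filled from the matching keys of each item. The sketch below is a standalone check under that assumption; `LAYOUT_PLACEHOLDERS`, `layout_fields`, and `find_problems` are hypothetical helper names, not part of Potato.

```python
import re

# Only the placeholders of the html_layout matter for this check,
# so the surrounding markup is omitted here.
LAYOUT_PLACEHOLDERS = "{{text}} {{response_a}} {{response_b}}"


def layout_fields(layout):
    """Collect the field names that appear as {{name}} in a layout string."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", layout))


def find_problems(items, required, id_key="id"):
    """Return readable problems: missing ids, duplicate ids, empty fields."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        item_id = item.get(id_key)
        if item_id is None:
            problems.append(f"item {index}: missing '{id_key}'")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id {item_id!r}")
        else:
            seen_ids.add(item_id)
        for field in sorted(required):
            value = item.get(field)
            # A field counts as present only if it is a non-blank string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: field '{field}' is missing or empty")
    return problems
```

To use it, load sample-data.json with `json.load` and pass the resulting list together with `layout_fields(LAYOUT_PLACEHOLDERS)`; an empty result means every item has a unique id and all three displayed fields.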
Related Designs
DPO Preference Data Collection
Pairwise preference annotation for Direct Preference Optimization, based on Rafailov et al., NeurIPS 2023. Annotators compare two model responses to a prompt, select a preference, rate alignment dimensions, and provide reasoning.
RewardBench - Reward Model Evaluation
Evaluation of reward model preferences via pairwise comparison of chosen and rejected responses. Annotators judge which response is better across chat, safety, reasoning, and coding categories, and rate response quality on multiple dimensions.
RewardBench: Reward Model Evaluation via Pairwise Preference
Reward model evaluation through pairwise preference annotation. Annotators choose between a 'chosen' (better) and 'rejected' (worse) response across diverse categories (chat, safety, reasoning, coding).