AlpacaFarm Preference Simulation
Simulate human preferences for instruction-following responses. Create preference data for efficient RLHF research and LLM evaluation.
Configuration File: config.yaml
# AlpacaFarm Preference Simulation Configuration
# Based on Dubois et al., NeurIPS 2023
# Task: Collect preferences for instruction-following
annotation_task_name: "AlpacaFarm Preference Simulation"
task_dir: "."
data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  - name: "preference"
    description: |
      Which response better follows the instruction?
      Consider helpfulness, accuracy, and appropriateness.
    annotation_type: radio
    labels:
      - "Response 1 is much better"
      - "Response 1 is slightly better"
      - "Tie - both are equal"
      - "Response 2 is slightly better"
      - "Response 2 is much better"
  - name: "preference_reason"
    description: "Primary reason for your preference:"
    annotation_type: radio
    labels:
      - "More accurate/correct"
      - "More helpful/useful"
      - "Better formatted/organized"
      - "More appropriate tone"
      - "More complete"
      - "More concise"
      - "Followed instructions better"
      - "Both equally good/bad"
      - "Other"
  - name: "response1_quality"
    description: "Rate Response 1 quality (1-5):"
    annotation_type: likert
    min_label: "1 - Very poor"
    max_label: "5 - Excellent"
    size: 5
  - name: "response2_quality"
    description: "Rate Response 2 quality (1-5):"
    annotation_type: likert
    min_label: "1 - Very poor"
    max_label: "5 - Excellent"
    size: 5
  - name: "task_difficulty"
    description: "How difficult was this instruction to follow well?"
    annotation_type: radio
    labels:
      - "Very easy"
      - "Easy"
      - "Moderate"
      - "Difficult"
      - "Very difficult"
  - name: "annotation_confidence"
    description: "How confident are you in your preference?"
    annotation_type: radio
    labels:
      - "Very confident"
      - "Somewhat confident"
      - "Not very confident"
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 2
annotation_instructions: |
  ## AlpacaFarm Preference Annotation

  Compare two AI responses to the same instruction and indicate your preference.

  ### Evaluation Criteria:

  **Instruction Following**
  - Did it do what was asked?
  - Did it follow any specific requirements?
  - Did it stay on topic?

  **Helpfulness**
  - Would this actually help the user?
  - Is the information useful?
  - Is it actionable?

  **Accuracy**
  - Is the information correct?
  - Are there factual errors?
  - Is it misleading?

  **Quality**
  - Is it well-written?
  - Is it appropriately detailed?
  - Is the tone appropriate?

  ### Preference Scale:
  - **Much better**: Clear, obvious winner
  - **Slightly better**: Marginal advantage
  - **Tie**: Genuinely indistinguishable

  ### Guidelines:
  - Read the instruction first
  - Consider what a typical user would want
  - Don't overthink - go with your gut
  - Ties are okay when responses are equal

  ### Common Reasons for Preference:
  - More accurate information
  - Better addresses the actual question
  - More helpful/actionable
  - Better organized/formatted
  - More appropriate length
  - Better tone for the context
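The config's `item_properties` expects each item to expose an `id` and a single `text` field, while the sample data below stores the instruction and the two responses in separate fields. A minimal preprocessing sketch in Python, assuming the items are combined into one display string (the exact layout shown to annotators is an assumption, not something the config above prescribes):

```python
import json

def build_display_text(item):
    # Combine instruction + both responses into the single "text"
    # field that item_properties (text_key: "text") points at.
    # The formatting here is an illustrative assumption.
    return (
        f"Instruction: {item['instruction']}\n\n"
        f"Response 1:\n{item['response_1']}\n\n"
        f"Response 2:\n{item['response_2']}"
    )

def prepare_data(in_path="sample-data.json", out_path="data.json"):
    # Read the raw items, add the combined "text" field, and write
    # the file that data_files in config.yaml expects.
    with open(in_path) as f:
        items = json.load(f)
    for item in items:
        item["text"] = build_display_text(item)
    with open(out_path, "w") as f:
        json.dump(items, f, indent=2)
```

Running `prepare_data()` over sample-data.json would produce a data.json in the shape the configuration consumes.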
Sample Data: sample-data.json
[
  {
    "id": "alpaca_001",
    "instruction": "Explain the concept of compound interest to a teenager.",
    "response_1": "Compound interest is when you earn interest on both your original money AND on the interest you've already earned. Imagine you put $100 in a savings account with 10% interest. After year 1, you have $110. In year 2, you earn 10% on $110 (not just $100), giving you $121. It's like a snowball rolling downhill, getting bigger and bigger!",
    "response_2": "Compound interest is the interest calculated on the initial principal and also on the accumulated interest from previous periods. The formula is A = P(1 + r/n)^(nt), where A is the final amount, P is principal, r is the annual interest rate, n is the number of times interest is compounded per year, and t is the number of years."
  },
  {
    "id": "alpaca_002",
    "instruction": "Write a short poem about coffee.",
    "response_1": "Morning's dark embrace,\nSteam rises, warmth fills the cup,\nEnergy awaits.",
    "response_2": "Coffee, oh coffee, so brown and so hot,\nI drink you each morning, I like you a lot.\nYou wake me up when I'm feeling so tired,\nYour caffeine boost is what I desired."
  }
]
Get This Design
View on GitHub
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/preference-learning/alpacafarm-simulation
potato start config.yaml
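Once annotation is complete, the 5-point preference labels can be collapsed into pairwise winners for reward-model training. A sketch, assuming the export is a list of records carrying the item `id` and the chosen `preference` label (Potato's actual JSON export layout may differ):

```python
# Map the 5-point labels from config.yaml onto a pairwise winner:
# 1 = Response 1 wins, 2 = Response 2 wins, 0 = tie.
PREFERENCE_TO_WINNER = {
    "Response 1 is much better": 1,
    "Response 1 is slightly better": 1,
    "Tie - both are equal": 0,
    "Response 2 is slightly better": 2,
    "Response 2 is much better": 2,
}

def to_pairwise(records):
    # Keep only decisive judgments; drop ties and unknown labels.
    pairs = []
    for rec in records:
        winner = PREFERENCE_TO_WINNER.get(rec["preference"], 0)
        if winner:
            pairs.append({"id": rec["id"], "winner": winner})
    return pairs
```

Dropping ties is one common choice for pairwise reward modeling; some setups instead keep ties as soft labels.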
Details
Annotation Types
likert, radio
Domain
Natural Language Processing, AI Alignment
Use Cases
RLHF, Preference Learning, Model Evaluation
Tags
preference, simulation, instruction, alpaca, rlhf, llm
Found an issue or want to improve this design?
Open an Issue
Related Designs
InstructGPT Instruction Following
Evaluate how well AI responses follow user instructions. Compare outputs on helpfulness, truthfulness, and harmlessness for RLHF training.
likert, radio
Constitutional AI Harmlessness Evaluation
Evaluate AI assistant responses for harmlessness and helpfulness based on the Constitutional AI framework by Anthropic. Annotators rate responses on a harmfulness scale, assess helpfulness, and provide explanations for their judgments.
radio, likert
OpenAssistant Conversation Quality
Rate AI assistant responses across multiple quality dimensions. Evaluate conversations for the OpenAssistant crowdsourced dataset.
likert, radio