# Pairwise Comparison
Compare pairs of items for preference and quality assessment.
Pairwise comparison presents annotators with two items and asks them to choose which is better, making it ideal for preference learning, quality assessment, and ranking tasks.
## Basic Configuration

```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: preference
    description: "Which response is better?"
    options:
      - label: "A is better"
        value: "A"
      - label: "B is better"
        value: "B"
      - label: "Tie"
        value: "tie"
```

## Data Format
Pairwise tasks require data with two items per instance:

```json
{
  "id": "pair_1",
  "prompt": "What is the capital of France?",
  "response_a": "The capital of France is Paris.",
  "response_b": "Paris is the capital city of France, located in the north-central part of the country."
}
```

Configure the display fields:

```yaml
item_a_field: response_a
item_b_field: response_b
context_field: prompt
```

## Configuration Options
### Custom Labels

```yaml
- annotation_type: pairwise
  name: quality
  options:
    - label: "Response A is significantly better"
      value: "A_strong"
    - label: "Response A is slightly better"
      value: "A_weak"
    - label: "About the same"
      value: "tie"
    - label: "Response B is slightly better"
      value: "B_weak"
    - label: "Response B is significantly better"
      value: "B_strong"
```

### Keyboard Shortcuts
```yaml
- annotation_type: pairwise
  name: preference
  keyboard_shortcuts:
    A: "1"
    B: "2"
    tie: "3"
```

### No Tie Option
Force a choice:

```yaml
- annotation_type: pairwise
  name: preference
  options:
    - label: "A is better"
      value: "A"
    - label: "B is better"
      value: "B"
  allow_tie: false
```

## Display Configuration
### Side-by-Side Layout

```yaml
display:
  layout: side_by_side
  item_a_title: "Response A"
  item_b_title: "Response B"
```

### Stacked Layout
```yaml
display:
  layout: stacked
  item_a_title: "Option 1"
  item_b_title: "Option 2"
```

### Show Context
Display shared context above comparisons:

```yaml
display:
  show_context: true
  context_title: "Question"
```

## Common Use Cases
### LLM Response Preference

```yaml
task_name: "Response Quality Comparison"
data_files:
  - path: data/comparisons.json
item_a_field: response_a
item_b_field: response_b
context_field: prompt
annotation_schemes:
  - annotation_type: pairwise
    name: overall_preference
    description: "Which response is better overall?"
    options:
      - label: "A is much better"
        value: "A++"
      - label: "A is better"
        value: "A+"
      - label: "About equal"
        value: "="
      - label: "B is better"
        value: "B+"
      - label: "B is much better"
        value: "B++"
```

### Translation Quality
```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: translation_preference
    description: "Which translation is more accurate?"
    options:
      - label: "Translation A"
        value: "A"
      - label: "Translation B"
        value: "B"
      - label: "Both equally good"
        value: "tie"
      - label: "Both equally bad"
        value: "neither"
```

### Summary Evaluation
```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: summary_quality
    description: "Which summary better captures the main points?"
    options:
      - label: "Summary A"
        value: "A"
      - label: "Summary B"
        value: "B"
  - annotation_type: multiselect
    name: a_advantages
    description: "What makes A better? (if applicable)"
    labels:
      - More concise
      - More accurate
      - Better coverage
      - Clearer language
  - annotation_type: multiselect
    name: b_advantages
    description: "What makes B better? (if applicable)"
    labels:
      - More concise
      - More accurate
      - Better coverage
      - Clearer language
```

### Multi-Aspect Comparison
Evaluate multiple dimensions:

```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: helpfulness
    description: "Which response is more helpful?"
    options:
      - label: "A"
        value: "A"
      - label: "Equal"
        value: "tie"
      - label: "B"
        value: "B"
  - annotation_type: pairwise
    name: accuracy
    description: "Which response is more accurate?"
    options:
      - label: "A"
        value: "A"
      - label: "Equal"
        value: "tie"
      - label: "B"
        value: "B"
  - annotation_type: pairwise
    name: safety
    description: "Which response is safer?"
    options:
      - label: "A"
        value: "A"
      - label: "Equal"
        value: "tie"
      - label: "B"
        value: "B"
```

## Randomization
Prevent position bias by randomizing item order:

```yaml
randomize_pair_order: true
```

When enabled, A and B are randomly swapped, and the actual order is tracked in the output.
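To illustrate what randomization does (this is a sketch of the behavior, not the tool's actual implementation), the two fields from the data format above are shuffled into display slots and the order is recorded alongside the annotation:

```python
import random

def present_pair(instance, rng=None):
    """Illustrative sketch: swap the two items half the time and record
    which source field landed in each display slot."""
    rng = rng or random.Random()
    fields = ["response_a", "response_b"]  # field names from the data format above
    if rng.random() < 0.5:
        fields.reverse()
    return {
        "slot_a": instance[fields[0]],   # shown to the annotator as "A"
        "slot_b": instance[fields[1]],   # shown to the annotator as "B"
        "display_order": fields,         # recorded so choices can be mapped back
    }
```

The recorded `display_order` is what lets you recover which underlying response a displayed "A" or "B" choice refers to.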
## With Justification
Require explanations for choices:

```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: preference
    description: "Which response is better?"
    options:
      - label: "A is better"
        value: "A"
      - label: "B is better"
        value: "B"
  - annotation_type: text
    name: justification
    description: "Explain your choice"
    textarea: true
    required: true
```

## Graded Preference Scale
More granular preference options:

```yaml
- annotation_type: pairwise
  name: preference
  scale_type: 7point
  options:
    - label: "A is much better"
      value: -3
    - label: "A is better"
      value: -2
    - label: "A is slightly better"
      value: -1
    - label: "Equal"
      value: 0
    - label: "B is slightly better"
      value: 1
    - label: "B is better"
      value: 2
    - label: "B is much better"
      value: 3
```

## Output Format
```json
{
  "id": "pair_1",
  "preference": "A",
  "justification": "Response A is more concise while still being accurate.",
  "display_order": ["response_a", "response_b"]
}
```

With randomization enabled, `display_order` indicates the order the annotator actually saw.
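Before analysis, displayed choices should be mapped back to the underlying items. A minimal sketch, assuming output records shaped like the example above (`preference` in `"A"`/`"B"`/`"tie"`, `display_order` listing the source fields as shown on screen):

```python
def normalize_preference(record):
    """Map a displayed choice ("A"/"B") back to the underlying source field."""
    pref = record["preference"]
    if pref not in ("A", "B"):
        return pref  # ties (or other values) need no remapping
    slot = 0 if pref == "A" else 1
    return record["display_order"][slot]

# If the items were swapped, a displayed "A" win is really a win for response_b:
normalize_preference(
    {"preference": "A", "display_order": ["response_b", "response_a"]}
)  # returns "response_b"
```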
## Full Example: RLHF Data Collection
```yaml
task_name: "AI Response Preference Collection"
data_files:
  - path: data/model_outputs.json
item_a_field: model_a_response
item_b_field: model_b_response
context_field: user_query
display:
  layout: side_by_side
  show_context: true
  context_title: "User Query"
  item_a_title: "Response A"
  item_b_title: "Response B"
randomize_pair_order: true
annotation_schemes:
  - annotation_type: pairwise
    name: overall
    description: "Overall, which response is better?"
    options:
      - label: "A is significantly better"
        value: "A++"
      - label: "A is better"
        value: "A+"
      - label: "About the same"
        value: "="
      - label: "B is better"
        value: "B+"
      - label: "B is significantly better"
        value: "B++"
    keyboard_shortcuts:
      "A++": "1"
      "A+": "2"
      "=": "3"
      "B+": "4"
      "B++": "5"
  - annotation_type: multiselect
    name: criteria
    description: "What factors influenced your decision?"
    labels:
      - Accuracy
      - Helpfulness
      - Clarity
      - Safety
      - Completeness
  - annotation_type: text
    name: notes
    description: "Additional notes (optional)"
    textarea: true
    required: false
```

## Best Practices
- **Use clear, distinct labels** - Annotators should instantly understand the options
- **Consider tie options carefully** - Sometimes forcing a choice is appropriate
- **Enable randomization** - It prevents position bias
- **Add justification fields** - They reveal annotator reasoning and improve data quality
- **Use keyboard shortcuts** - They speed up annotation significantly
- **Test with your data** - Ensure the display works well with your content length
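Once annotations are collected, a quick aggregation helps sanity-check the data. A minimal sketch, assuming each output record has a `preference` field with the canonical values `"A"`, `"B"`, or `"tie"` (after undoing any randomization), with ties counted as half a win for each side:

```python
from collections import Counter

def win_rates(records):
    """Aggregate pairwise judgments into simple win rates per side."""
    counts = Counter(r["preference"] for r in records)
    total = counts["A"] + counts["B"] + counts["tie"]
    if total == 0:
        return {"A": 0.0, "B": 0.0}
    return {
        "A": (counts["A"] + 0.5 * counts["tie"]) / total,
        "B": (counts["B"] + 0.5 * counts["tie"]) / total,
    }
```

Strongly unbalanced rates on a dataset where randomization was enabled can indicate a real quality gap; unbalanced rates *without* randomization may just be position bias.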