Tutorials · 7 min read
Image Comparison and Preference Tasks
Build side-by-side image comparison interfaces for preference ranking, A/B testing, and quality evaluation.
Potato Team
Image comparison is essential for training generative models, evaluating image quality, and understanding human preferences. This tutorial covers pairwise comparison, ranking, and A/B test configurations.
Use cases
- Generative AI: RLHF for image generation models
- Image quality: compare compression, enhancement, or restoration results
- Design testing: A/B tests of visual designs
- Search ranking: evaluate image search results
Basic pairwise comparison
yaml
annotation_task_name: "Image Preference"
data_files:
  - data/pairs.json
item_properties:
  id_key: pair_id
  image_a_key: image_left
  image_b_key: image_right
image:
  enabled: true
  layout: side_by_side
  display_size: medium
  enable_zoom: true
  sync_zoom: true  # Zoom both images together
annotation_schemes:
  - annotation_type: radio
    name: preference
    description: "Which image do you prefer?"
    labels:
      - Left is much better
      - Left is slightly better
      - About the same
      - Right is slightly better
      - Right is much better
    layout: horizontal
Data format
json
{
  "pair_id": "pair_001",
  "image_left": "/images/model_a_output.png",
  "image_right": "/images/model_b_output.png",
  "prompt": "A sunset over mountains"
}
Enhanced comparison interface
yaml
annotation_task_name: "AI Image Generation Evaluation"
data_files:
  - data/generation_pairs.json
item_properties:
  id_key: id
  image_a_key: image_a
  image_b_key: image_b
  context_key: prompt
# Show the generation prompt
display:
  show_context: true
  context_label: "Generation Prompt"
  context_field: prompt
image:
  enabled: true
  layout: side_by_side
  gap: 20  # Pixels between images
  labels:
    left: "Image A"
    right: "Image B"
  # Interaction
  enable_zoom: true
  sync_zoom: true
  enable_pan: true
  sync_pan: true
  # Display
  max_height: 500
  background: "#1F2937"
  border_radius: 8
annotation_schemes:
  # Overall preference
  - annotation_type: radio
    name: overall_preference
    description: "Overall, which image is better?"
    labels:
      - name: A much better
        keyboard_shortcut: "1"
      - name: A slightly better
        keyboard_shortcut: "2"
      - name: Tie
        keyboard_shortcut: "3"
      - name: B slightly better
        keyboard_shortcut: "4"
      - name: B much better
        keyboard_shortcut: "5"
    required: true
  # Specific criteria
  - annotation_type: radio
    name: prompt_adherence
    description: "Which better matches the prompt?"
    labels: [A, Tie, B]
  - annotation_type: radio
    name: visual_quality
    description: "Which has better visual quality (no artifacts)?"
    labels: [A, Tie, B]
  - annotation_type: radio
    name: aesthetic_appeal
    description: "Which is more aesthetically pleasing?"
    labels: [A, Tie, B]
  - annotation_type: radio
    name: realism
    description: "Which looks more realistic?"
    labels: [A, Tie, B, N/A (neither should be realistic)]
  # Issues detection
  - annotation_type: multiselect
    name: issues_a
    description: "Issues in Image A (select all)"
    labels:
      - Distorted faces/hands
      - Text rendering issues
      - Unnatural lighting
      - Missing elements from prompt
      - Extra unwanted elements
      - Blurry or low quality
      - Color issues
      - None
  - annotation_type: multiselect
    name: issues_b
    description: "Issues in Image B (select all)"
    labels:
      - Distorted faces/hands
      - Text rendering issues
      - Unnatural lighting
      - Missing elements from prompt
      - Extra unwanted elements
      - Blurry or low quality
      - Color issues
      - None
Before/after comparison
For image enhancement, restoration, or editing:
yaml
annotation_task_name: "Image Enhancement Evaluation"
data_files:
  - data/enhancements.json
item_properties:
  id_key: id
  image_a_key: original
  image_b_key: enhanced
image:
  layout: side_by_side
  labels:
    left: "Original"
    right: "Enhanced"
  # Slider comparison
  comparison_mode: slider  # Drag slider to reveal
  slider_position: 50  # Start at middle
annotation_schemes:
  - annotation_type: radio
    name: enhancement_quality
    description: "How well was the image enhanced?"
    labels:
      - Significantly improved
      - Slightly improved
      - No noticeable change
      - Made worse
  - annotation_type: multiselect
    name: improvements
    description: "What was improved?"
    labels:
      - Sharpness/detail
      - Color accuracy
      - Noise reduction
      - Dynamic range
      - Artifact removal
      - Nothing
  - annotation_type: multiselect
    name: problems_introduced
    description: "Any problems introduced?"
    labels:
      - Over-sharpening/halos
      - Color shift
      - Loss of detail
      - New artifacts
      - Unnatural look
      - None
Ranking multiple images
To rank more than two images:
yaml
annotation_task_name: "Image Ranking"
data_files:
  - data/image_sets.json
item_properties:
  id_key: id
  image_list_key: images  # Array of image paths
image:
  layout: grid
  columns: 3
  enable_zoom: true
annotation_schemes:
  - annotation_type: ranking
    name: preference_rank
    description: "Rank images from best (1) to worst"
    source: images
    allow_ties: false
  - annotation_type: radio
    name: best_for_use
    description: "Which would you use for this purpose?"
    dynamic_labels_from: images
Data format:
json
{
  "id": "set_001",
  "prompt": "A cat sitting on a windowsill",
  "images": [
    "/images/set001_a.png",
    "/images/set001_b.png",
    "/images/set001_c.png",
    "/images/set001_d.png"
  ]
}
Best-Worst Scaling
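Best-worst judgments are commonly summarized with a counting score: for each image, the number of times it was picked as best minus the number of times it was picked as worst, divided by the number of times it was shown. The sketch below is illustrative only. The record keys (`shown`, `best`, `worst`) and the function name are hypothetical, since the output layout of the `best_worst` scheme is not shown in this tutorial; adapt them to your actual annotation files.

```python
from collections import Counter


def best_worst_scores(judgments: list[dict]) -> dict[str, float]:
    """Counting score per item: (times best - times worst) / times shown."""
    best, worst, shown = Counter(), Counter(), Counter()
    for judgment in judgments:
        shown.update(judgment["shown"])   # every displayed image counts as one exposure
        best[judgment["best"]] += 1
        worst[judgment["worst"]] += 1
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}


# Hypothetical judgments from two annotators on the same set of four images.
judgments = [
    {"shown": ["a.png", "b.png", "c.png", "d.png"], "best": "a.png", "worst": "d.png"},
    {"shown": ["a.png", "b.png", "c.png", "d.png"], "best": "b.png", "worst": "d.png"},
]
scores = best_worst_scores(judgments)

# Higher score means more preferred; scores always fall between -1 and 1.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Scores near 1 mean an image was almost always picked as best, and scores near -1 mean it was almost always picked as worst.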
Efficient ranking through repeated best and worst choices:
yaml
annotation_schemes:
  - annotation_type: best_worst
    name: preference
    description: "Select the BEST and WORST images"
    source: images
    best_label: "Best"
    worst_label: "Worst"
    neither_allowed: false
A/B testing for design
yaml
annotation_task_name: "Design A/B Test"
data_files:
  - data/design_variants.json
item_properties:
  id_key: id
  image_a_key: variant_a
  image_b_key: variant_b
  context_key: design_context
display:
  show_context: true
  context_label: "Design Context"
image:
  layout: side_by_side
  labels:
    left: "Design A"
    right: "Design B"
  randomize_order: true  # Prevent position bias
annotation_schemes:
  - annotation_type: radio
    name: preference
    description: "Which design do you prefer?"
    labels: [A, No preference, B]
    randomize_with_images: true  # Labels follow image randomization
  - annotation_type: likert
    name: a_appeal
    description: "Rate Design A's visual appeal"
    size: 7
    min_label: "Very unappealing"
    max_label: "Very appealing"
  - annotation_type: likert
    name: b_appeal
    description: "Rate Design B's visual appeal"
    size: 7
    min_label: "Very unappealing"
    max_label: "Very appealing"
  - annotation_type: text
    name: reasoning
    description: "Why did you choose this preference?"
    textarea: true
    required: false
Complete configuration
yaml
annotation_task_name: "Generative Model Comparison - RLHF Data"
data_files:
  - data/model_outputs.json
item_properties:
  id_key: id
  image_a_key: model_a_output
  image_b_key: model_b_output
  context_key: prompt
display:
  show_context: true
  context_label: "Generation Prompt"
  context_style: "highlighted"
image:
  enabled: true
  layout: side_by_side
  gap: 24
  labels:
    left: "Output A"
    right: "Output B"
  max_height: 512
  enable_zoom: true
  sync_zoom: true
  enable_pan: true
  sync_pan: true
  background: "#111827"
  border: "1px solid #374151"
  border_radius: 8
  # Prevent position bias
  randomize_order: true
annotation_schemes:
  - annotation_type: radio
    name: overall
    description: "Which image better represents the prompt?"
    labels:
      - name: A is clearly better
        value: 2
        keyboard_shortcut: "1"
      - name: A is slightly better
        value: 1
        keyboard_shortcut: "2"
      - name: About equal
        value: 0
        keyboard_shortcut: "3"
      - name: B is slightly better
        value: -1
        keyboard_shortcut: "4"
      - name: B is clearly better
        value: -2
        keyboard_shortcut: "5"
    required: true
    preserve_with_randomization: true  # Values adjust for randomized order
  - annotation_type: likert
    name: confidence
    description: "How confident are you?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
annotation_guidelines:
  title: "Image Comparison Guidelines"
  content: |
    ## Evaluation Criteria
    Consider these factors:
    1. **Prompt adherence**: Does it match what was asked?
    2. **Visual quality**: Are there artifacts or distortions?
    3. **Aesthetics**: Is it visually pleasing?
    4. **Realism** (if applicable): Does it look natural?

    ## Tips
    - Zoom in to check for details and artifacts
    - Consider the prompt carefully
    - Don't let one factor dominate unfairly
quality_control:
  attention_checks:
    frequency: 15
    gold_pairs:
      - image_a: "/gold/clearly_better.png"
        image_b: "/gold/clearly_worse.png"
        expected_preference: ["A is clearly better", "A is slightly better"]
output_annotation_dir: annotations/
output_annotation_format: jsonl
Output format
json
{
  "pair_id": "pair_001",
  "prompt": "A sunset over mountains",
  "image_a": "/images/model_a_output.png",
  "image_b": "/images/model_b_output.png",
  "display_order": ["B", "A"],
  "annotations": {
    "overall": 1,
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-25T14:30:00Z"
}
Tips for comparison tasks
- Randomize order: avoid left/right position bias
- Synchronized controls: linked zoom and pan support a fair comparison
- Clear criteria: define what "better" means
- Attention checks: include pairs with an obvious answer
- Time limits: consider a per-comparison time limit for consistency
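Annotation records like the one in the output example can be converted into chosen/rejected pairs for reward-model training, and the same records allow a quick check for position bias. The following is a minimal sketch, not part of Potato, and the function names are hypothetical. It assumes the record layout from the output example, that a positive `overall` value means A was preferred (matching the `value` fields in the complete configuration), that values are already expressed in terms of A and B regardless of display order, and that `display_order[0]` names the image shown on the left.

```python
import json
from typing import Iterable, Optional


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON record per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def to_preference_pairs(records: Iterable[dict]) -> list[dict]:
    """Convert annotation records into chosen/rejected pairs, skipping ties."""
    pairs = []
    for rec in records:
        score = rec["annotations"]["overall"]
        if score == 0:
            continue  # "About equal" carries no preference signal
        a, b = rec["image_a"], rec["image_b"]
        chosen, rejected = (a, b) if score > 0 else (b, a)
        pairs.append({
            "prompt": rec["prompt"],
            "chosen": chosen,
            "rejected": rejected,
            "strength": abs(score),  # 1 = slightly better, 2 = clearly better
        })
    return pairs


def left_preference_rate(records: Iterable[dict]) -> Optional[float]:
    """Fraction of non-tie judgments that favored the image shown on the left."""
    left_wins = decided = 0
    for rec in records:
        score = rec["annotations"]["overall"]
        if score == 0:
            continue
        winner = "A" if score > 0 else "B"
        decided += 1
        if rec["display_order"][0] == winner:
            left_wins += 1
    return left_wins / decided if decided else None


# The record from the output example: A slightly preferred, B shown on the left.
example = {
    "pair_id": "pair_001",
    "prompt": "A sunset over mountains",
    "image_a": "/images/model_a_output.png",
    "image_b": "/images/model_b_output.png",
    "display_order": ["B", "A"],
    "annotations": {"overall": 1, "confidence": 4},
}
pairs = to_preference_pairs([example])
rate = left_preference_rate([example])
```

With randomized order and enough judgments, a left-preference rate far from 0.5 would suggest that annotators are favoring a screen position rather than the image content.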
Next steps
- Set up crowdsourcing for large-scale preference data
- Learn about ranking analysis
- Explore the pairwise comparison documentation
Full comparison documentation is available at /docs/annotation-types/pairwise-comparison.