Image Comparison and Preference Tasks
Build side-by-side image comparison interfaces for preference ranking, A/B testing, and quality assessment.
Potato Team
Image comparison is essential for training generative models, evaluating image quality, and understanding human preferences. This tutorial covers pairwise comparison, ranking, and A/B testing setups.
Use Cases
- Generative AI: RLHF for image generation models
- Image quality: Comparing compression, enhancement, or restoration
- Design testing: A/B testing of visual designs
- Search ranking: Evaluating image retrieval results
Basic Pairwise Comparison
yaml
annotation_task_name: "Image Preference"
data_files:
  - data/pairs.json
item_properties:
  id_key: pair_id
  image_a_key: image_left
  image_b_key: image_right
image:
  enabled: true
  layout: side_by_side
  display_size: medium
  enable_zoom: true
  sync_zoom: true  # Zoom both images together
annotation_schemes:
  - annotation_type: radio
    name: preference
    description: "Which image do you prefer?"
    labels:
      - Left is much better
      - Left is slightly better
      - About the same
      - Right is slightly better
      - Right is much better
    layout: horizontal
Data Format
json
{
  "pair_id": "pair_001",
  "image_left": "/images/model_a_output.png",
  "image_right": "/images/model_b_output.png",
  "prompt": "A sunset over mountains"
}
Enhanced Comparison Interface
yaml
annotation_task_name: "AI Image Generation Evaluation"
data_files:
  - data/generation_pairs.json
item_properties:
  id_key: id
  image_a_key: image_a
  image_b_key: image_b
  context_key: prompt
# Show the generation prompt
display:
  show_context: true
  context_label: "Generation Prompt"
  context_field: prompt
image:
  enabled: true
  layout: side_by_side
  gap: 20  # Pixels between images
  labels:
    left: "Image A"
    right: "Image B"
  # Interaction
  enable_zoom: true
  sync_zoom: true
  enable_pan: true
  sync_pan: true
  # Display
  max_height: 500
  background: "#1F2937"
  border_radius: 8
annotation_schemes:
  # Overall preference
  - annotation_type: radio
    name: overall_preference
    description: "Overall, which image is better?"
    labels:
      - name: A much better
        keyboard_shortcut: "1"
      - name: A slightly better
        keyboard_shortcut: "2"
      - name: Tie
        keyboard_shortcut: "3"
      - name: B slightly better
        keyboard_shortcut: "4"
      - name: B much better
        keyboard_shortcut: "5"
    required: true
  # Specific criteria
  - annotation_type: radio
    name: prompt_adherence
    description: "Which better matches the prompt?"
    labels: [A, Tie, B]
  - annotation_type: radio
    name: visual_quality
    description: "Which has better visual quality (no artifacts)?"
    labels: [A, Tie, B]
  - annotation_type: radio
    name: aesthetic_appeal
    description: "Which is more aesthetically pleasing?"
    labels: [A, Tie, B]
  - annotation_type: radio
    name: realism
    description: "Which looks more realistic?"
    labels: [A, Tie, B, "N/A (neither should be realistic)"]
  # Issues detection
  - annotation_type: multiselect
    name: issues_a
    description: "Issues in Image A (select all)"
    labels:
      - Distorted faces/hands
      - Text rendering issues
      - Unnatural lighting
      - Missing elements from prompt
      - Extra unwanted elements
      - Blurry or low quality
      - Color issues
      - None
  - annotation_type: multiselect
    name: issues_b
    description: "Issues in Image B (select all)"
    labels:
      - Distorted faces/hands
      - Text rendering issues
      - Unnatural lighting
      - Missing elements from prompt
      - Extra unwanted elements
      - Blurry or low quality
      - Color issues
      - None
Before/After Comparison
For image enhancement, restoration, or editing:
yaml
annotation_task_name: "Image Enhancement Evaluation"
data_files:
  - data/enhancements.json
item_properties:
  id_key: id
  image_a_key: original
  image_b_key: enhanced
image:
  layout: side_by_side
  labels:
    left: "Original"
    right: "Enhanced"
  # Slider comparison
  comparison_mode: slider  # Drag slider to reveal
  slider_position: 50  # Start at middle
annotation_schemes:
  - annotation_type: radio
    name: enhancement_quality
    description: "How well was the image enhanced?"
    labels:
      - Significantly improved
      - Slightly improved
      - No noticeable change
      - Made worse
  - annotation_type: multiselect
    name: improvements
    description: "What was improved?"
    labels:
      - Sharpness/detail
      - Color accuracy
      - Noise reduction
      - Dynamic range
      - Artifact removal
      - Nothing
  - annotation_type: multiselect
    name: problems_introduced
    description: "Any problems introduced?"
    labels:
      - Over-sharpening/halos
      - Color shift
      - Loss of detail
      - New artifacts
      - Unnatural look
      - None
Ranking Multiple Images
To rank more than two images:
yaml
annotation_task_name: "Image Ranking"
data_files:
  - data/image_sets.json
item_properties:
  id_key: id
  image_list_key: images  # Array of image paths
image:
  layout: grid
  columns: 3
  enable_zoom: true
annotation_schemes:
  - annotation_type: ranking
    name: preference_rank
    description: "Rank images from best (1) to worst"
    source: images
    allow_ties: false
  - annotation_type: radio
    name: best_for_use
    description: "Which would you use for this purpose?"
    dynamic_labels_from: images
Data format:
json
{
  "id": "set_001",
  "prompt": "A cat sitting on a windowsill",
  "images": [
    "/images/set001_a.png",
    "/images/set001_b.png",
    "/images/set001_c.png",
    "/images/set001_d.png"
  ]
}
Best-Worst Scaling
Efficient ranking through repeated best-worst choices:
yaml
annotation_schemes:
  - annotation_type: best_worst
    name: preference
    description: "Select the BEST and WORST images"
    source: images
    best_label: "Best"
    worst_label: "Worst"
    neither_allowed: false
A/B Testing for Design
yaml
annotation_task_name: "Design A/B Test"
data_files:
  - data/design_variants.json
item_properties:
  id_key: id
  image_a_key: variant_a
  image_b_key: variant_b
  context_key: design_context
display:
  show_context: true
  context_label: "Design Context"
image:
  layout: side_by_side
  labels:
    left: "Design A"
    right: "Design B"
  randomize_order: true  # Prevent position bias
annotation_schemes:
  - annotation_type: radio
    name: preference
    description: "Which design do you prefer?"
    labels: [A, No preference, B]
    randomize_with_images: true  # Labels follow image randomization
  - annotation_type: likert
    name: a_appeal
    description: "Rate Design A's visual appeal"
    size: 7
    min_label: "Very unappealing"
    max_label: "Very appealing"
  - annotation_type: likert
    name: b_appeal
    description: "Rate Design B's visual appeal"
    size: 7
    min_label: "Very unappealing"
    max_label: "Very appealing"
  - annotation_type: text
    name: reasoning
    description: "Why did you choose this preference?"
    textarea: true
    required: false
Complete Configuration
yaml
annotation_task_name: "Generative Model Comparison - RLHF Data"
data_files:
  - data/model_outputs.json
item_properties:
  id_key: id
  image_a_key: model_a_output
  image_b_key: model_b_output
  context_key: prompt
display:
  show_context: true
  context_label: "Generation Prompt"
  context_style: "highlighted"
image:
  enabled: true
  layout: side_by_side
  gap: 24
  labels:
    left: "Output A"
    right: "Output B"
  max_height: 512
  enable_zoom: true
  sync_zoom: true
  enable_pan: true
  sync_pan: true
  background: "#111827"
  border: "1px solid #374151"
  border_radius: 8
  # Prevent position bias
  randomize_order: true
annotation_schemes:
  - annotation_type: radio
    name: overall
    description: "Which image better represents the prompt?"
    labels:
      - name: A is clearly better
        value: 2
        keyboard_shortcut: "1"
      - name: A is slightly better
        value: 1
        keyboard_shortcut: "2"
      - name: About equal
        value: 0
        keyboard_shortcut: "3"
      - name: B is slightly better
        value: -1
        keyboard_shortcut: "4"
      - name: B is clearly better
        value: -2
        keyboard_shortcut: "5"
    required: true
    preserve_with_randomization: true  # Values adjust for randomized order
  - annotation_type: likert
    name: confidence
    description: "How confident are you?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
annotation_guidelines:
  title: "Image Comparison Guidelines"
  content: |
    ## Evaluation Criteria
    Consider these factors:
    1. **Prompt adherence**: Does it match what was asked?
    2. **Visual quality**: Are there artifacts or distortions?
    3. **Aesthetics**: Is it visually pleasing?
    4. **Realism** (if applicable): Does it look natural?
    ## Tips
    - Zoom in to check for details and artifacts
    - Consider the prompt carefully
    - Don't let one factor dominate unfairly
quality_control:
  attention_checks:
    frequency: 15
    gold_pairs:
      - image_a: "/gold/clearly_better.png"
        image_b: "/gold/clearly_worse.png"
        expected_preference: ["A is clearly better", "A is slightly better"]
output_annotation_dir: annotations/
output_annotation_format: jsonl
Output Format
json
{
  "pair_id": "pair_001",
  "prompt": "A sunset over mountains",
  "image_a": "/images/model_a_output.png",
  "image_b": "/images/model_b_output.png",
  "display_order": ["B", "A"],
  "annotations": {
    "overall": 1,
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-25T14:30:00Z"
}
In this record, display_order shows that B appeared on the left, and the overall value of 1 means "A slightly better" after adjusting for display order.
Tips for Comparison Tasks
- Randomize order: Prevent left/right position bias
- Sync controls: Linked zoom and pan support a fair comparison
- Clear criteria: Define what "better" means
- Attention checks: Include pairs with an obvious answer
- Time limits: Consider a per-comparison time budget for consistency
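The first tip can also be checked after collection. The sketch below is a minimal, hypothetical example: it assumes records shaped like the output format shown earlier (a display_order list whose first entry is the image shown on the left, and an annotations.overall score that is positive when A wins and negative when B wins). The function names left_win_rate and load_jsonl are illustrative and not part of Potato.

```python
import json


def load_jsonl(path):
    """Read one JSON record per non-empty line (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def left_win_rate(records):
    """Fraction of non-tied comparisons won by the image shown on the left.

    Assumes overall > 0 means A won and overall < 0 means B won,
    already adjusted for display order as in the example output.
    Returns None when there are no decided comparisons.
    """
    left_wins = 0
    decided = 0
    for rec in records:
        score = rec["annotations"]["overall"]
        if score == 0:
            continue  # ties say nothing about position
        winner = "A" if score > 0 else "B"
        decided += 1
        if rec["display_order"][0] == winner:
            left_wins += 1
    return left_wins / decided if decided else None


# Small in-memory example; real data would come from load_jsonl(<your output file>).
example_records = [
    {"display_order": ["A", "B"], "annotations": {"overall": 2}},   # A on left, A wins
    {"display_order": ["B", "A"], "annotations": {"overall": 1}},   # B on left, A wins
    {"display_order": ["B", "A"], "annotations": {"overall": -1}},  # B on left, B wins
    {"display_order": ["A", "B"], "annotations": {"overall": 0}},   # tie, ignored
]
print(left_win_rate(example_records))
```

With randomized order and enough comparisons, this rate should sit near 0.5. A value far from that suggests annotators favor one side of the screen regardless of content.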
Next Steps
- Set up crowdsourcing for large-scale preference data
- Learn about ranking analysis
- Explore the pairwise comparison documentation
Full comparison documentation is available at /docs/annotation-types/pairwise-comparison.
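Once annotations are collected, a common follow-up for RLHF-style work is turning them into chosen/rejected pairs. The sketch below is one possible approach, not a Potato feature. It assumes records in the output format shown earlier, with overall positive when A is preferred. The function name to_preference_pairs, the min_confidence threshold, and the output keys chosen, rejected, and margin are hypothetical choices for illustration.

```python
def to_preference_pairs(records, min_confidence=3):
    """Convert annotation records into chosen/rejected pairs.

    Ties (overall == 0) and records below the confidence threshold
    are skipped. The threshold value is an assumption, not a default
    taken from any tool.
    """
    pairs = []
    for rec in records:
        ann = rec["annotations"]
        score = ann["overall"]
        if score == 0 or ann.get("confidence", 0) < min_confidence:
            continue
        if score > 0:
            chosen, rejected = rec["image_a"], rec["image_b"]
        else:
            chosen, rejected = rec["image_b"], rec["image_a"]
        pairs.append({
            "prompt": rec["prompt"],
            "chosen": chosen,
            "rejected": rejected,
            "margin": abs(score),  # 1 = slightly better, 2 = clearly better
        })
    return pairs


# The example record from the output format section.
sample = {
    "pair_id": "pair_001",
    "prompt": "A sunset over mountains",
    "image_a": "/images/model_a_output.png",
    "image_b": "/images/model_b_output.png",
    "display_order": ["B", "A"],
    "annotations": {"overall": 1, "confidence": 4},
}
print(to_preference_pairs([sample]))
```

Keeping the margin lets a downstream trainer weight clear preferences more heavily than slight ones, if that is desired.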