Image comparison is essential for training generative models, evaluating image quality, and understanding human preferences. This tutorial covers pairwise comparison, ranking, and A/B test setups.

Use cases

Generative AI: RLHF for image generation models
Image quality: Compare compression, enhancement, or restoration
Design testing: A/B testing of visual designs
Search ranking: Evaluate image search results

Basic pairwise comparison

yaml

annotation_task_name: "Image Preference"
 
data_files:
  - data/pairs.json
 
item_properties:
  id_key: pair_id
  image_a_key: image_left
  image_b_key: image_right
 
image:
  enabled: true
  layout: side_by_side
  display_size: medium
  enable_zoom: true
  sync_zoom: true  # Zoom both images together
 
annotation_schemes:
  - annotation_type: radio
    name: preference
    description: "Which image do you prefer?"
    labels:
      - Left is much better
      - Left is slightly better
      - About the same
      - Right is slightly better
      - Right is much better
    layout: horizontal
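
For analysis, the five-point labels are often converted into signed scores. The sketch below assumes a +2 to -2 scale where a positive score means the left image was preferred; the scale and the helper names are hypothetical, and only the label strings come from the configuration above.

```python
# Hypothetical analysis helper. The label strings are copied from the
# "preference" scheme; the numeric scale (+2..-2, positive = left preferred)
# is an assumption, not something the configuration defines.
PREFERENCE_SCORES = {
    "Left is much better": 2,
    "Left is slightly better": 1,
    "About the same": 0,
    "Right is slightly better": -1,
    "Right is much better": -2,
}


def preference_score(label: str) -> int:
    """Convert one 'preference' label into a signed score (left minus right)."""
    try:
        return PREFERENCE_SCORES[label]
    except KeyError:
        raise ValueError(f"unknown preference label: {label!r}") from None


def mean_preference(labels: list[str]) -> float:
    """Average signed score over several annotators; > 0 favors the left image."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(preference_score(lab) for lab in labels) / len(labels)


print(mean_preference(["Left is much better", "About the same", "Right is slightly better"]))
```

Keeping the mapping in one dictionary makes a typo in an exported label fail loudly instead of silently counting as a tie.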

Data format

json

{
  "pair_id": "pair_001",
  "image_left": "/images/model_a_output.png",
  "image_right": "/images/model_b_output.png",
  "prompt": "A sunset over mountains"
}
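
A short script can generate records in this format from parallel lists of prompts and model outputs. This is a sketch: it assumes the data file holds one JSON object per line, which the tutorial does not state; if your loader expects a JSON array instead, serialize the list with `json.dumps(records, indent=2)`. The function names are hypothetical.

```python
import json


def make_pairs(prompts, left_paths, right_paths):
    """Build one record per prompt, using the field names from the example above."""
    if not (len(prompts) == len(left_paths) == len(right_paths)):
        raise ValueError("prompts and image lists must have the same length")
    return [
        {
            "pair_id": f"pair_{i:03d}",  # pair_001, pair_002, ...
            "image_left": left,
            "image_right": right,
            "prompt": prompt,
        }
        for i, (prompt, left, right) in enumerate(
            zip(prompts, left_paths, right_paths), start=1
        )
    ]


def to_json_lines(records):
    """Serialize records as one JSON object per line (assumed file layout)."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


records = make_pairs(
    ["A sunset over mountains"],
    ["/images/model_a_output.png"],
    ["/images/model_b_output.png"],
)
print(to_json_lines(records), end="")
```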

Enhanced comparison interface

yaml

annotation_task_name: "AI Image Generation Evaluation"
 
data_files:
  - data/generation_pairs.json
 
item_properties:
  id_key: id
  image_a_key: image_a
  image_b_key: image_b
  context_key: prompt
 
# Show the generation prompt
display:
  show_context: true
  context_label: "Generation Prompt"
  context_field: prompt
 
image:
  enabled: true
  layout: side_by_side
  gap: 20  # Pixels between images
  labels:
    left: "Image A"
    right: "Image B"
 
  # Interaction
  enable_zoom: true
  sync_zoom: true
  enable_pan: true
  sync_pan: true
 
  # Display
  max_height: 500
  background: "#1F2937"
  border_radius: 8
 
annotation_schemes:
  # Overall preference
  - annotation_type: radio
    name: overall_preference
    description: "Overall, which image is better?"
    labels:
      - name: A much better
        keyboard_shortcut: "1"
      - name: A slightly better
        keyboard_shortcut: "2"
      - name: Tie
        keyboard_shortcut: "3"
      - name: B slightly better
        keyboard_shortcut: "4"
      - name: B much better
        keyboard_shortcut: "5"
    required: true
 
  # Specific criteria
  - annotation_type: radio
    name: prompt_adherence
    description: "Which better matches the prompt?"
    labels: [A, Tie, B]
 
  - annotation_type: radio
    name: visual_quality
    description: "Which has better visual quality (no artifacts)?"
    labels: [A, Tie, B]
 
  - annotation_type: radio
    name: aesthetic_appeal
    description: "Which is more aesthetically pleasing?"
    labels: [A, Tie, B]
 
  - annotation_type: radio
    name: realism
    description: "Which looks more realistic?"
    labels: [A, Tie, B, N/A (neither should be realistic)]
 
  # Issues detection
  - annotation_type: multiselect
    name: issues_a
    description: "Issues in Image A (select all)"
    labels:
      - Distorted faces/hands
      - Text rendering issues
      - Unnatural lighting
      - Missing elements from prompt
      - Extra unwanted elements
      - Blurry or low quality
      - Color issues
      - None
 
  - annotation_type: multiselect
    name: issues_b
    description: "Issues in Image B (select all)"
    labels:
      - Distorted faces/hands
      - Text rendering issues
      - Unnatural lighting
      - Missing elements from prompt
      - Extra unwanted elements
      - Blurry or low quality
      - Color issues
      - None
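
Because "None" is listed next to the real issues in both multiselect schemes, an annotator can tick it together with an actual problem. A minimal analysis-side sketch, assuming each annotation arrives as a list of selected label strings, rejects that combination and counts how often each issue was flagged for one image; the helper names are hypothetical.

```python
from collections import Counter

NONE_LABEL = "None"


def validate_issues(selected):
    """Reject selections that combine 'None' with a real issue."""
    if NONE_LABEL in selected and len(selected) > 1:
        raise ValueError("'None' cannot be combined with other issues")
    return selected


def issue_counts(selections):
    """Count how often each issue was flagged across annotations of one image."""
    counts = Counter()
    for selected in selections:
        validate_issues(selected)
        counts.update(s for s in selected if s != NONE_LABEL)
    return counts


issues_a = [
    ["Distorted faces/hands"],
    ["None"],
    ["Distorted faces/hands", "Color issues"],
]
print(issue_counts(issues_a).most_common())
```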

Before/after comparison

For image enhancement, restoration, or editing:

yaml

annotation_task_name: "Image Enhancement Evaluation"
 
data_files:
  - data/enhancements.json
 
item_properties:
  id_key: id
  image_a_key: original
  image_b_key: enhanced
 
image:
  layout: side_by_side
  labels:
    left: "Original"
    right: "Enhanced"
 
  # Slider comparison
  comparison_mode: slider  # Drag slider to reveal
  slider_position: 50  # Start at middle
 
annotation_schemes:
  - annotation_type: radio
    name: enhancement_quality
    description: "How well was the image enhanced?"
    labels:
      - Significantly improved
      - Slightly improved
      - No noticeable change
      - Made worse
 
  - annotation_type: multiselect
    name: improvements
    description: "What was improved?"
    labels:
      - Sharpness/detail
      - Color accuracy
      - Noise reduction
      - Dynamic range
      - Artifact removal
      - Nothing
 
  - annotation_type: multiselect
    name: problems_introduced
    description: "Any problems introduced?"
    labels:
      - Over-sharpening/halos
      - Color shift
      - Loss of detail
      - New artifacts
      - Unnatural look
      - None
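
An enhancement can be rated as an improvement and still introduce halos or a color shift, so it helps to report both signals separately. The sketch assumes each annotation is a dict keyed by the scheme names above (`enhancement_quality`, `problems_introduced`); that export shape and the function name are assumptions.

```python
IMPROVED = {"Significantly improved", "Slightly improved"}


def enhancement_summary(rows):
    """rows: dicts with 'enhancement_quality' (str) and 'problems_introduced' (list)."""
    if not rows:
        raise ValueError("no annotations")
    n = len(rows)
    improved = sum(r["enhancement_quality"] in IMPROVED for r in rows)
    worse = sum(r["enhancement_quality"] == "Made worse" for r in rows)
    # An item counts as problematic if any label other than "None" was ticked.
    with_problems = sum(
        any(p != "None" for p in r["problems_introduced"]) for r in rows
    )
    return {
        "improved_rate": improved / n,
        "worse_rate": worse / n,
        "problem_rate": with_problems / n,
    }


rows = [
    {"enhancement_quality": "Significantly improved", "problems_introduced": ["None"]},
    {"enhancement_quality": "Slightly improved", "problems_introduced": ["Over-sharpening/halos"]},
    {"enhancement_quality": "Made worse", "problems_introduced": ["Color shift", "New artifacts"]},
    {"enhancement_quality": "No noticeable change", "problems_introduced": ["None"]},
]
print(enhancement_summary(rows))
```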

Ranking multiple images

To rank more than two images:

yaml

annotation_task_name: "Image Ranking"
 
data_files:
  - data/image_sets.json
 
item_properties:
  id_key: id
  image_list_key: images  # Array of image paths
 
image:
  layout: grid
  columns: 3
  enable_zoom: true
 
annotation_schemes:
  - annotation_type: ranking
    name: preference_rank
    description: "Rank images from best (1) to worst"
    source: images
    allow_ties: false
 
  - annotation_type: radio
    name: best_for_use
    description: "Which would you use for this purpose?"
    dynamic_labels_from: images

Data format:

json

{
  "id": "set_001",
  "prompt": "A cat sitting on a windowsill",
  "images": [
    "/images/set001_a.png",
    "/images/set001_b.png",
    "/images/set001_c.png",
    "/images/set001_d.png"
  ]
}
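
With `allow_ties: false`, each annotator produces a strict order, which can then be aggregated across annotators. The sketch below uses a Borda count (the best of n images gets n-1 points, the worst gets 0), a simple method swapped in for illustration rather than one the tool prescribes; it assumes each `preference_rank` annotation is exported as a list of image paths ordered from best to worst.

```python
from collections import defaultdict


def borda_scores(rankings):
    """rankings: one list per annotator, ordered best -> worst (no ties)."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        if len(set(ranking)) != n:
            raise ValueError("a ranking lists the same image twice")
        for position, image in enumerate(ranking):
            scores[image] += n - 1 - position  # best gets n-1, worst gets 0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rankings = [
    ["/images/set001_b.png", "/images/set001_a.png", "/images/set001_c.png", "/images/set001_d.png"],
    ["/images/set001_a.png", "/images/set001_b.png", "/images/set001_d.png", "/images/set001_c.png"],
    ["/images/set001_b.png", "/images/set001_c.png", "/images/set001_a.png", "/images/set001_d.png"],
]
for image, score in borda_scores(rankings):
    print(image, score)
```

Borda counts are easy to explain; if sets overlap only partially across annotators, a pairwise model such as Bradley-Terry is a common alternative.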

Best-Worst Scaling

Best-worst scaling produces a ranking efficiently by having annotators repeatedly pick the best and the worst image in each set:

yaml

annotation_schemes:
  - annotation_type: best_worst
    name: preference
    description: "Select the BEST and WORST images"
    source: images
    best_label: "Best"
    worst_label: "Worst"
    neither_allowed: false
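
Best-worst annotations are commonly scored with the counting method: for each image, the number of times it was picked as best minus the number of times it was picked as worst, divided by the number of times it was shown. The sketch assumes each trial records the shown images plus exactly one best and one worst pick (as `neither_allowed: false` requires); the field names are hypothetical.

```python
from collections import Counter


def bws_scores(trials):
    """trials: dicts with 'shown' (list of images), 'best' and 'worst' (one image each).

    Counting score = (times chosen best - times chosen worst) / times shown,
    which lies in [-1, 1].
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for t in trials:
        if t["best"] == t["worst"]:
            raise ValueError("best and worst must differ")
        if t["best"] not in t["shown"] or t["worst"] not in t["shown"]:
            raise ValueError("choice not among the shown images")
        shown.update(t["shown"])
        best[t["best"]] += 1
        worst[t["worst"]] += 1
    return {img: (best[img] - worst[img]) / shown[img] for img in shown}


trials = [
    {"shown": ["a", "b", "c", "d"], "best": "a", "worst": "d"},
    {"shown": ["a", "b", "c", "e"], "best": "b", "worst": "e"},
    {"shown": ["b", "c", "d", "e"], "best": "b", "worst": "d"},
]
print(sorted(bws_scores(trials).items(), key=lambda kv: -kv[1]))
```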

A/B testing for design

yaml

annotation_task_name: "Design A/B Test"
 
data_files:
  - data/design_variants.json
 
item_properties:
  id_key: id
  image_a_key: variant_a
  image_b_key: variant_b
  context_key: design_context
 
display:
  show_context: true
  context_label: "Design Context"
 
image:
  layout: side_by_side
  labels:
    left: "Design A"
    right: "Design B"
  randomize_order: true  # Prevent position bias
 
annotation_schemes:
  - annotation_type: radio
    name: preference
    description: "Which design do you prefer?"
    labels: [A, No preference, B]
    randomize_with_images: true  # Labels follow image randomization
 
  - annotation_type: likert
    name: a_appeal
    description: "Rate Design A's visual appeal"
    size: 7
    min_label: "Very unappealing"
    max_label: "Very appealing"
 
  - annotation_type: likert
    name: b_appeal
    description: "Rate Design B's visual appeal"
    size: 7
    min_label: "Very unappealing"
    max_label: "Very appealing"
 
  - annotation_type: text
    name: reasoning
    description: "Why did you choose this preference?"
    multiline: true
    required: false
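
To check whether a preference for one design is more than noise, one option is an exact two-sided sign test on the A and B votes, ignoring "No preference". This sketch rests on two assumptions: the stored labels already refer to the underlying variants (which `randomize_with_images: true` implies), and the sign test is an analysis choice, not something the configuration provides.

```python
from math import comb


def ab_preference(choices):
    """choices: 'A', 'B' or 'No preference' per annotator (variant, not screen side)."""
    a = choices.count("A")
    b = choices.count("B")
    n = a + b  # 'No preference' votes are excluded from the test
    if n == 0:
        return {"a_share": None, "p_value": 1.0}
    k = min(a, b)
    # Exact two-sided sign test under H0: P(A) = P(B) = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return {"a_share": a / n, "p_value": min(1.0, 2 * tail)}


choices = ["A"] * 14 + ["B"] * 4 + ["No preference"] * 2
print(ab_preference(choices))
```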

Complete configuration

yaml

annotation_task_name: "Generative Model Comparison - RLHF Data"
 
data_files:
  - data/model_outputs.json
 
item_properties:
  id_key: id
  image_a_key: model_a_output
  image_b_key: model_b_output
  context_key: prompt
 
display:
  show_context: true
  context_label: "Generation Prompt"
  context_style: "highlighted"
 
image:
  enabled: true
  layout: side_by_side
  gap: 24
  labels:
    left: "Output A"
    right: "Output B"
 
  max_height: 512
  enable_zoom: true
  sync_zoom: true
  enable_pan: true
  sync_pan: true
 
  background: "#111827"
  border: "1px solid #374151"
  border_radius: 8
 
  # Prevent position bias
  randomize_order: true
 
annotation_schemes:
  - annotation_type: radio
    name: overall
    description: "Which image better represents the prompt?"
    labels:
      - name: A is clearly better
        value: 2
        keyboard_shortcut: "1"
      - name: A is slightly better
        value: 1
        keyboard_shortcut: "2"
      - name: About equal
        value: 0
        keyboard_shortcut: "3"
      - name: B is slightly better
        value: -1
        keyboard_shortcut: "4"
      - name: B is clearly better
        value: -2
        keyboard_shortcut: "5"
    required: true
    preserve_with_randomization: true  # Values adjust for randomized order
 
  - annotation_type: likert
    name: confidence
    description: "How confident are you?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
 
annotation_guidelines:
  title: "Image Comparison Guidelines"
  content: |
    ## Evaluation Criteria
    Consider these factors:
    1. **Prompt adherence**: Does it match what was asked?
    2. **Visual quality**: Are there artifacts or distortions?
    3. **Aesthetics**: Is it visually pleasing?
    4. **Realism** (if applicable): Does it look natural?
 
    ## Tips
    - Zoom in to check for details and artifacts
    - Consider the prompt carefully
    - Don't let one factor dominate unfairly
 
quality_control:
  attention_checks:
    frequency: 15
    gold_pairs:
      - image_a: "/gold/clearly_better.png"
        image_b: "/gold/clearly_worse.png"
        expected_preference: ["A is clearly better", "A is slightly better"]
 
output_annotation_dir: annotations/
export_annotation_format: jsonl
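
The `gold_pairs` entries list the acceptable answers for each attention check in `expected_preference`. A minimal sketch of per-annotator pass rates, assuming annotations are dicts with hypothetical `item`, `annotator` and `overall` (label string) fields, and using an arbitrary 0.8 threshold for flagging:

```python
from collections import defaultdict


def gold_pass_rates(annotations, gold):
    """gold maps a gold item id -> list of acceptable labels (cf. expected_preference)."""
    per_annotator = defaultdict(lambda: [0, 0])  # annotator -> [passed, seen]
    for a in annotations:
        accepted = gold.get(a["item"])
        if accepted is None:
            continue  # not an attention check
        stats = per_annotator[a["annotator"]]
        stats[0] += a["overall"] in accepted
        stats[1] += 1
    return {who: passed / seen for who, (passed, seen) in per_annotator.items()}


def flag_annotators(rates, threshold=0.8):
    """Annotators whose gold pass rate falls below the (assumed) threshold."""
    return sorted(who for who, rate in rates.items() if rate < threshold)


accepted = ["A is clearly better", "A is slightly better"]
gold = {"gold_01": accepted, "gold_02": accepted}
annotations = [
    {"item": "gold_01", "annotator": "rater_01", "overall": "A is clearly better"},
    {"item": "gold_02", "annotator": "rater_01", "overall": "A is slightly better"},
    {"item": "gold_01", "annotator": "rater_02", "overall": "B is clearly better"},
    {"item": "gold_02", "annotator": "rater_02", "overall": "A is clearly better"},
    {"item": "pair_001", "annotator": "rater_02", "overall": "About equal"},
]
rates = gold_pass_rates(annotations, gold)
print(rates, flag_annotators(rates))
```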

Output format

json

{
  "pair_id": "pair_001",
  "prompt": "A sunset over mountains",
  "image_a": "/images/model_a_output.png",
  "image_b": "/images/model_b_output.png",
  "display_order": ["B", "A"],
  "annotations": {
    "overall": 1,
    "confidence": 4
  },
  "annotator": "rater_01",
  "timestamp": "2024-12-25T14:30:00Z"
}
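
For RLHF, each exported record can be reduced to a (prompt, chosen, rejected) triple. The sketch assumes, as `preserve_with_randomization: true` suggests, that `overall` is already expressed relative to `image_a` (positive means A was preferred); it drops ties and keeps the absolute score as a preference margin. The function name is hypothetical.

```python
import json


def to_preference_pairs(jsonl_text):
    """Turn exported JSONL records into (prompt, chosen, rejected, margin) tuples."""
    pairs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        score = rec["annotations"]["overall"]
        if score == 0:
            continue  # a tie carries no chosen/rejected signal
        if score > 0:
            chosen, rejected = rec["image_a"], rec["image_b"]
        else:
            chosen, rejected = rec["image_b"], rec["image_a"]
        pairs.append((rec["prompt"], chosen, rejected, abs(score)))
    return pairs


record = {
    "pair_id": "pair_001",
    "prompt": "A sunset over mountains",
    "image_a": "/images/model_a_output.png",
    "image_b": "/images/model_b_output.png",
    "display_order": ["B", "A"],
    "annotations": {"overall": 1, "confidence": 4},
}
tie = dict(record, pair_id="pair_002", annotations={"overall": 0, "confidence": 2})
exported = "\n".join(json.dumps(r) for r in (record, tie))
pairs = to_preference_pairs(exported)
print(pairs)
```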

Tips for comparison tasks

Randomize the order: Avoid left/right position bias
Synchronized controls: Linked zoom/pan keeps the comparison fair
Clear criteria: Define what "better" means
Attention checks: Include pairs with an obvious answer
Time limits: Consider a fixed time per comparison for consistency
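
Randomization keeps position bias from skewing results, but it is still worth measuring. The sketch estimates how often annotators picked the left-hand image among non-tie judgments, assuming that the first entry of `display_order` is the item shown on the left and that a positive `overall` means A was preferred. With unbiased annotators the rate should be close to 0.5.

```python
def left_pick_rate(records):
    """Share of non-tie judgments where the annotator picked the left-hand image."""
    left = total = 0
    for rec in records:
        score = rec["annotations"]["overall"]
        if score == 0:
            continue  # ties say nothing about position
        a_on_left = rec["display_order"][0] == "A"  # assumed: first entry = left
        picked_a = score > 0
        left += a_on_left == picked_a
        total += 1
    return left / total if total else None


def make_record(order, overall):
    """Minimal stand-in for an exported record (only the fields used here)."""
    return {"display_order": order, "annotations": {"overall": overall}}


records = [
    make_record(["B", "A"], 1),
    make_record(["A", "B"], 2),
    make_record(["A", "B"], -1),
    make_record(["B", "A"], -2),
    make_record(["A", "B"], 0),
]
print(left_pick_rate(records))
```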

Next steps

Set up crowdsourcing to collect preference data at scale
Learn about ranking analysis
Explore the pairwise comparison documentation

Full documentation on comparison tasks is available at /docs/annotation-types/pairwise-comparison.