LLM Response Preference
Compare AI-generated responses to collect preference data for RLHF training.
Configuration File: config.yaml
annotation_task_name: "LLM Response Preference Collection"
port: 8000

# Data configuration
data_files:
  - "data/response_pairs.json"

# Display settings
display:
  layout: side_by_side
  show_context: true
  context_title: "User Prompt"
  item_a_title: "Response A"
  item_b_title: "Response B"

# Randomize to prevent position bias
randomize_pair_order: true

# Annotation schemes
annotation_schemes:
  # Main preference rating
  - annotation_type: pairwise
    name: overall_preference
    description: "Overall, which response is better?"
    options:
      - label: "A is much better"
        value: "A++"
      - label: "A is slightly better"
        value: "A+"
      - label: "About equal"
        value: "="
      - label: "B is slightly better"
        value: "B+"
      - label: "B is much better"
        value: "B++"
    sequential_key_binding: true

  # Individual aspect ratings
  - annotation_type: pairwise
    name: helpfulness
    description: "Which response is more helpful?"
    options:
      - label: "A"
        value: "A"
      - label: "Equal"
        value: "="
      - label: "B"
        value: "B"

  - annotation_type: pairwise
    name: accuracy
    description: "Which response is more accurate?"
    options:
      - label: "A"
        value: "A"
      - label: "Equal"
        value: "="
      - label: "B"
        value: "B"

  - annotation_type: pairwise
    name: safety
    description: "Which response is safer/less harmful?"
    options:
      - label: "A"
        value: "A"
      - label: "Equal"
        value: "="
      - label: "B"
        value: "B"

  # Reasons for preference
  - annotation_type: multiselect
    name: preference_reasons
    description: "What factors influenced your choice? (Select all that apply)"
    labels:
      - More accurate information
      - Better explained
      - More concise
      - More thorough
      - Better formatting
      - More appropriate tone
      - Safer/less harmful

  # Free-text justification
  - annotation_type: text
    name: justification
    description: "Briefly explain your preference"
    textarea: true
    required: false
    placeholder: "Why did you prefer one response over the other?"

# User settings
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 3

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
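The config points at data/response_pairs.json. As a minimal sketch of how you might generate that file, here is a Python snippet that writes one example pair; the field names (id, context, text_a, text_b) are assumptions for illustration, so check the data-format section of the Potato documentation for the exact keys your version expects.

```python
import json

# Hypothetical instance schema: "id", "context", "text_a", and "text_b"
# are assumed field names, not confirmed Potato keys -- adapt as needed.
pairs = [
    {
        "id": "pair_001",
        "context": "Explain photosynthesis to a 10-year-old.",
        "text_a": "Photosynthesis is how plants make their own food from sunlight.",
        "text_b": "Plants use chlorophyll to turn CO2 and water into glucose.",
    },
]

# Write the data file the config's data_files entry refers to.
with open("response_pairs.json", "w") as f:
    json.dump(pairs, f, indent=2)
```

Keeping the shared prompt in its own field lets the side_by_side layout show it once above both responses, matching show_context: true and context_title: "User Prompt".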
Get This Design
This design is available in our showcase. Copy the configuration above to get started.
Quick start:
# Create your project folder
mkdir pairwise-preference
cd pairwise-preference
# Copy config.yaml from above, then launch
potato start config.yaml
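Once annotations land in annotation_output/, you can tally the overall preference. The sketch below collapses the five-point scale into win/tie counts over toy records; the real output schema depends on your Potato version, so treat the record shape here as an assumption.

```python
from collections import Counter

# Toy records standing in for Potato's JSON output; the field name
# "overall_preference" mirrors the scheme name in config.yaml, but the
# actual output structure may differ by Potato version.
records = [
    {"overall_preference": "A+"},
    {"overall_preference": "B++"},
    {"overall_preference": "A+"},
    {"overall_preference": "="},
]

counts = Counter(r["overall_preference"] for r in records)

# Collapse the 5-point scale (A++/A+/=/B+/B++) into a win/tie tally.
a_wins = counts["A++"] + counts["A+"]
b_wins = counts["B++"] + counts["B+"]
ties = counts["="]
print(f"A wins: {a_wins}, B wins: {b_wins}, ties: {ties}")
```

With annotation_per_instance: 3, a majority vote over the three annotators per pair is a natural next step before exporting preference pairs for training.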
Related Designs
DPO Preference Data Collection
Pairwise preference annotation for Direct Preference Optimization, based on Rafailov et al., NeurIPS 2023. Annotators compare two model responses to a prompt, select a preference, rate alignment dimensions, and provide reasoning.
Pairwise Preference with Rationale
Compare two AI responses and select the better one while providing a written justification. Used for reward model training with interpretable preference signals.
FLUTE: Figurative Language Understanding through Textual Explanations
Figurative language understanding via NLI. Annotators classify figurative sentences (sarcasm, simile, metaphor, idiom) and provide textual explanations of the figurative meaning. The task combines natural language inference with fine-grained figurative language type classification.