intermediate, preference

InstructGPT Instruction Following

Evaluate how well AI responses follow user instructions. Compare outputs on helpfulness, truthfulness, and harmlessness for RLHF training.

Configuration File: config.yaml

# InstructGPT Instruction Following Configuration
# Based on Ouyang et al., NeurIPS 2022
# Task: Evaluate instruction-following quality for RLHF

annotation_task_name: "InstructGPT Instruction Following"
task_dir: "."

data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "overall_preference"
    description: "Which response is OVERALL BETTER?"
    annotation_type: radio
    labels:
      - "A is significantly better"
      - "A is slightly better"
      - "About the same"
      - "B is slightly better"
      - "B is significantly better"

  - name: "helpfulness"
    description: "Which response is more HELPFUL for the user's goal?"
    annotation_type: radio
    labels:
      - "A is more helpful"
      - "Both equally helpful"
      - "B is more helpful"
      - "Neither is helpful"

  - name: "truthfulness"
    description: "Which response is more TRUTHFUL and accurate?"
    annotation_type: radio
    labels:
      - "A is more truthful"
      - "Both equally truthful"
      - "B is more truthful"
      - "Cannot assess truthfulness"

  - name: "harmlessness"
    description: "Which response is more HARMLESS (less problematic)?"
    annotation_type: radio
    labels:
      - "A is more harmless"
      - "Both equally harmless"
      - "B is more harmless"
      - "Both are problematic"

  - name: "instruction_following"
    description: "Which response better FOLLOWS THE INSTRUCTION?"
    annotation_type: radio
    labels:
      - "A follows better"
      - "Both follow equally well"
      - "B follows better"
      - "Neither follows the instruction"

  - name: "response_a_rating"
    description: "Rate Response A on a 1-7 scale:"
    annotation_type: likert
    min_label: "1 - Very poor"
    max_label: "7 - Excellent"
    size: 7

  - name: "response_b_rating"
    description: "Rate Response B on a 1-7 scale:"
    annotation_type: likert
    min_label: "1 - Very poor"
    max_label: "7 - Excellent"
    size: 7

allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3

annotation_instructions: |
  ## InstructGPT Instruction Following Evaluation

  Compare two AI responses and evaluate how well they follow instructions.

  ### The Three H's:

  **Helpful**: Does the response help the user achieve their goal?
  - Provides relevant information
  - Addresses the actual request
  - Appropriate level of detail

  **Honest/Truthful**: Is the information accurate?
  - Facts are correct
  - Uncertainty is acknowledged
  - No misleading claims

  **Harmless**: Does the response avoid harm?
  - No dangerous advice
  - Respectful and appropriate
  - Considers potential misuse

  ### Instruction Following:
  - Did it do what was asked?
  - Did it follow format requirements?
  - Did it stay on topic?

  ### Rating Scale (1-7):
  1. Very poor - Completely fails
  2. Poor - Major issues
  3. Below average - Notable problems
  4. Average - Acceptable
  5. Above average - Mostly good
  6. Good - Minor issues only
  7. Excellent - Outstanding

  ### Guidelines:
  - Read the instruction carefully
  - Consider all three H's
  - If the differences are small, choose "About the same"
  - Rate each response independently too
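
With annotation_per_instance set to 3, each item receives three overall_preference labels that need to be combined somehow. The following is a minimal Python sketch of one way to do that, assuming the labels have already been collected into a plain list per item (the on-disk output layout is not shown here, so no parsing is attempted). The signed scores, the `aggregate_preference` helper, and the `tie_margin` threshold are illustrative assumptions, not part of the config.

```python
from statistics import mean

# Signed score for each "overall_preference" label in config.yaml:
# negative favors Response A, positive favors Response B.
# The numeric values are an assumption made for this sketch.
PREFERENCE_SCORES = {
    "A is significantly better": -2,
    "A is slightly better": -1,
    "About the same": 0,
    "B is slightly better": 1,
    "B is significantly better": 2,
}


def aggregate_preference(labels, tie_margin=0.5):
    """Average the annotators' signed scores and name the winner.

    `tie_margin` is a hypothetical threshold: mean scores within
    +/- tie_margin are reported as a tie.
    """
    scores = [PREFERENCE_SCORES[label] for label in labels]
    avg = mean(scores)
    if avg < -tie_margin:
        winner = "A"
    elif avg > tie_margin:
        winner = "B"
    else:
        winner = "tie"
    return avg, winner


# Example labels from three annotators (made up for illustration).
votes = [
    "A is significantly better",
    "A is slightly better",
    "About the same",
]
print(aggregate_preference(votes))
```

Averaging signed scores keeps the strength of each preference, which a simple majority vote over "A" versus "B" would discard. A larger `tie_margin` makes the aggregate more conservative about declaring a winner.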

Sample Data: sample-data.json

[
  {
    "id": "instruct_001",
    "instruction": "Explain quantum entanglement in simple terms that a high school student could understand.",
    "response_a": "Quantum entanglement is when two particles become connected in a special way. When you measure one particle, you instantly know something about the other, no matter how far apart they are. It's like having two magic coins that always land on opposite sides - if one shows heads, the other must be tails, even if they're on different continents.",
    "response_b": "Quantum entanglement occurs when particles become correlated such that the quantum state of one particle cannot be described independently of the others. This phenomenon, which Einstein called 'spooky action at a distance,' involves non-local correlations that persist regardless of spatial separation."
  },
  {
    "id": "instruct_002",
    "instruction": "Write a haiku about programming.",
    "response_a": "Bugs hide in the code\nDebugging late into night\nSyntax error found",
    "response_b": "Programming is fun and challenging. You write code to make computers do things. Sometimes there are bugs that need to be fixed. It requires patience and logical thinking."
  }
]
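
For RLHF, pairwise judgments on items like these are typically converted into chosen/rejected comparison records. The sketch below shows one possible conversion in Python, assuming a verdict of "A", "B", or "tie" has already been derived from the annotations. The `to_comparison` helper and the output field names (`prompt`, `chosen`, `rejected`) are assumptions for illustration; the verdict passed in the example is hypothetical and does not come from real annotation results.

```python
# Keys present in each sample-data.json item.
REQUIRED_KEYS = ("id", "instruction", "response_a", "response_b")


def to_comparison(item, winner):
    """Turn one item plus a verdict ("A", "B", or "tie") into a training pair.

    Returns None for ties, since they carry no ranking signal in this sketch.
    The output field names are an assumed convention.
    """
    missing = [key for key in REQUIRED_KEYS if key not in item]
    if missing:
        raise KeyError(f"item is missing keys: {missing}")
    if winner == "tie":
        return None
    if winner not in ("A", "B"):
        raise ValueError(f"unexpected winner: {winner!r}")
    if winner == "A":
        chosen_key, rejected_key = "response_a", "response_b"
    else:
        chosen_key, rejected_key = "response_b", "response_a"
    return {
        "id": item["id"],
        "prompt": item["instruction"],
        "chosen": item[chosen_key],
        "rejected": item[rejected_key],
    }


item = {
    "id": "instruct_002",
    "instruction": "Write a haiku about programming.",
    "response_a": "Bugs hide in the code\nDebugging late into night\nSyntax error found",
    "response_b": (
        "Programming is fun and challenging. You write code to make computers "
        "do things. Sometimes there are bugs that need to be fixed. "
        "It requires patience and logical thinking."
    ),
}

# Hypothetical verdict for illustration only; real verdicts come from annotators.
pair = to_comparison(item, "A")
print(pair["chosen"])
```

Dropping ties is only one option; they could instead be kept with a separate flag if the downstream training code can use them.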

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/preference-learning/instructgpt-preference
potato start config.yaml

Details

Annotation Types

likert, radio

Domain

Natural Language Processing, AI Alignment

Use Cases

RLHF, Instruction Following, Preference Learning

Tags

preference, instruction, helpfulness, rlhf, alignment, gpt
