advanced, preference

PRM800K Step-by-Step Verification

Step-by-step verification of math reasoning chains, based on OpenAI's PRM800K dataset. Annotators rate each reasoning step as Correct, Neutral, or Incorrect, stopping at the first error they find. They then judge whether the final answer is correct and explain the error, if any.
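
The "stop at the first error" rule can be sketched in a few lines of Python. This is only an illustration of the protocol described above, not code from this design or from the PRM800K release; the function name `truncate_at_first_error` and the bare label strings are assumptions made for the example.

```python
from typing import List

# Label names as used in the description above (hypothetical short forms).
VALID_LABELS = {"Correct", "Neutral", "Incorrect"}


def truncate_at_first_error(step_labels: List[str]) -> List[str]:
    """Keep the labels up to and including the first 'Incorrect' step."""
    kept = []
    for label in step_labels:
        if label not in VALID_LABELS:
            raise ValueError(f"unknown label: {label!r}")
        kept.append(label)
        if label == "Incorrect":
            break  # steps after the first error are not rated
    return kept


print(truncate_at_first_error(["Correct", "Neutral", "Incorrect", "Correct"]))
# ['Correct', 'Neutral', 'Incorrect']
```

A chain with no error is returned unchanged, so the same helper covers fully correct solutions.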

Configuration File: config.yaml

# PRM800K Step-by-Step Verification
# Based on "Let's Verify Step by Step" (Lightman et al., ICLR 2024)

annotation_task_name: "PRM800K Step-by-Step Verification"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

html_layout: |
  <div class="container" style="max-width: 800px; margin: 0 auto; font-family: 'Segoe UI', Arial, sans-serif;">
    <div style="background: #e8f4fd; border: 1px solid #b3d9f2; border-radius: 8px; padding: 18px 22px; margin-bottom: 20px;">
      <h3 style="margin: 0 0 10px 0; color: #1a5276; font-size: 16px;">Math Problem</h3>
      <div style="font-size: 15px; color: #2c3e50; line-height: 1.6;">{{text}}</div>
    </div>
    <div style="margin-bottom: 20px;">
      <h3 style="margin: 0 0 12px 0; color: #2c3e50; font-size: 15px;">Reasoning Steps</h3>
      <div style="white-space: pre-wrap; font-size: 14px; line-height: 1.7;">{{steps}}</div>
    </div>
    <div style="background: #fef9e7; border: 2px solid #f9e79f; border-radius: 8px; padding: 14px 20px; margin-bottom: 10px;">
      <h3 style="margin: 0 0 6px 0; color: #7d6608; font-size: 15px;">Final Answer</h3>
      <div style="font-size: 16px; font-weight: 600; color: #1a5276;">{{final_answer}}</div>
    </div>
  </div>

annotation_schemes:
  - name: "step_correctness"
    annotation_type: radio
    description: "Rate each reasoning step. Stop and mark 'Incorrect' at the first error you find."
    labels:
      - "Correct — this step is logically valid"
      - "Neutral — this step doesn't clearly advance or harm"
      - "Incorrect — this step contains an error"
    keyboard_shortcuts:
      "Correct — this step is logically valid": "1"
      "Neutral — this step doesn't clearly advance or harm": "2"
      "Incorrect — this step contains an error": "3"

  - name: "solution_correctness"
    annotation_type: radio
    description: "Is the final answer correct?"
    labels:
      - "Correct — final answer is right"
      - "Incorrect — final answer is wrong"
    keyboard_shortcuts:
      "Correct — final answer is right": "4"
      "Incorrect — final answer is wrong": "5"

  - name: "error_explanation"
    annotation_type: text
    description: "If you found an error, explain what went wrong in that step."

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
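
Two properties of this config are easy to check by hand or with a short script: every `{{...}}` placeholder in `html_layout` should have a matching field in each data item, and the keyboard shortcuts should not collide across schemes. The sketch below is a standalone illustration, not part of Potato. The helper names are hypothetical, the layout string is trimmed to its placeholders, the shortcut lists are copied by hand from the config, and the assumption that shortcuts must be unique across schemes is mine rather than a documented requirement.

```python
import re
from collections import Counter

# Placeholders as they appear in html_layout (markup trimmed for brevity).
layout = "<div>{{text}}</div><div>{{steps}}</div><div>{{final_answer}}</div>"

# Shortcut keys per scheme, copied from the keyboard_shortcuts entries.
shortcuts = {
    "step_correctness": ["1", "2", "3"],
    "solution_correctness": ["4", "5"],
}

# Field names present in each record of sample-data.json.
item_keys = {"id", "text", "steps", "final_answer"}


def layout_fields(html: str) -> set:
    """Return the field names used as {{name}} placeholders."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", html))


def missing_fields(html: str, keys: set) -> set:
    """Placeholders in the layout that the data item does not provide."""
    return layout_fields(html) - set(keys)


def duplicate_shortcuts(by_scheme: dict) -> list:
    """Keys bound more than once across all schemes."""
    counts = Counter(k for keys in by_scheme.values() for k in keys)
    return sorted(k for k, n in counts.items() if n > 1)


print(missing_fields(layout, item_keys))   # set()
print(duplicate_shortcuts(shortcuts))      # []
```

Both checks come back empty for the config and sample data shown here.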

Sample Data: sample-data.json

[
  {
    "id": "prm800k-001",
    "text": "Find the sum of all positive integers n such that n^2 + 12n - 2007 is a perfect square.",
    "steps": "Step 1: We need n^2 + 12n - 2007 = k^2 for some non-negative integer k.\nStep 2: Complete the square on the left: (n + 6)^2 - 36 - 2007 = k^2, so (n + 6)^2 - 2043 = k^2.\nStep 3: Rearrange to get (n + 6)^2 - k^2 = 2043, which factors as (n + 6 - k)(n + 6 + k) = 2043.\nStep 4: Factor 2043 = 3 × 681 = 3 × 3 × 227. So 2043 = 1 × 2043 = 3 × 681 = 9 × 227.\nStep 5: For each factorization 2043 = a × b with a ≤ b and a, b same parity, we get n + 6 - k = a and n + 6 + k = b, so n = (a + b)/2 - 6.\nStep 6: From (1, 2043): n = (1 + 2043)/2 - 6 = 1022 - 6 = 1016. From (3, 681): n = (3 + 681)/2 - 6 = 342 - 6 = 336. From (9, 227): n = (9 + 227)/2 - 6 = 118 - 6 = 112.",
    "final_answer": "The sum is 1016 + 336 + 112 = 1464."
  },
  {
    "id": "prm800k-002",
    "text": "A bag contains 4 red balls and 6 blue balls. Two balls are drawn without replacement. What is the probability that both balls are red?",
    "steps": "Step 1: The total number of balls is 4 + 6 = 10.\nStep 2: The probability of drawing a red ball first is 4/10 = 2/5.\nStep 3: After drawing one red ball, there are 3 red balls and 6 blue balls left, for 9 total.\nStep 4: The probability of drawing a second red ball is 3/9 = 1/3.\nStep 5: The probability of both events is (2/5) × (1/3) = 2/15.",
    "final_answer": "The probability is 2/15."
  }
]

// ... and 6 more items
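
When preparing gold labels, it can help to confirm the final answers of the two items shown above independently of their reasoning steps. The following is a minimal brute-force sketch that uses only the Python standard library and is not part of this design. The search bound follows from Step 3 of the first item: both factors of 2043 are at most 2043, so n cannot exceed 2043.

```python
from fractions import Fraction
from math import isqrt


def is_square(x: int) -> bool:
    """True if x is a non-negative perfect square."""
    return x >= 0 and isqrt(x) ** 2 == x


# prm800k-001: all positive n with n^2 + 12n - 2007 a perfect square.
solutions = [n for n in range(1, 2044) if is_square(n * n + 12 * n - 2007)]
print(solutions, sum(solutions))  # [112, 336, 1016] 1464

# prm800k-002: two reds in a row from 4 red and 6 blue, without replacement.
both_red = Fraction(4, 10) * Fraction(3, 9)
print(both_red)  # 2/15
```

Both results agree with the `final_answer` fields of the two sample items.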

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/agentic/prm800k-step-verification
potato start config.yaml

Details

Annotation Types

radio, text

Domain

Mathematics, Reasoning

Use Cases

Process Reward Models, RLHF, Math Verification

Tags

process-reward-model, math-reasoning, step-verification, PRM, RLHF