
CodePRM Code Process Reward

Process reward annotation for step-by-step code generation with execution feedback. Annotators verify each incremental code generation step against problem requirements, check syntactic validity, and provide feedback informed by test execution results.


Configuration File: config.yaml

# CodePRM Code Process Reward
# Based on "CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation" (Li et al., ACL 2025 Findings)

annotation_task_name: "CodePRM Code Process Reward"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

html_layout: |
  <div class="container" style="max-width: 850px; margin: 0 auto; font-family: 'Segoe UI', Arial, sans-serif;">
    <div style="background: #f0f4ff; border: 1px solid #b0c4de; border-radius: 8px; padding: 18px 22px; margin-bottom: 18px;">
      <h3 style="margin: 0 0 10px 0; color: #2c3e6b; font-size: 16px;">Problem Description</h3>
      <div style="font-size: 14px; color: #2c3e50; line-height: 1.6;">{{text}}</div>
    </div>
    <div style="margin-bottom: 18px;">
      <h3 style="margin: 0 0 12px 0; color: #2c3e50; font-size: 15px;">Incremental Code Steps</h3>
      <div style="background: #fafafa; border: 1px solid #e0e0e0; border-radius: 6px; padding: 14px 18px; font-family: 'Courier New', monospace; font-size: 13px; white-space: pre-wrap; line-height: 1.6;">{{steps}}</div>
    </div>
    <div style="background: #1a1a2e; border-radius: 8px; padding: 16px 20px; margin-bottom: 10px;">
      <h3 style="margin: 0 0 8px 0; color: #00e676; font-size: 14px; font-family: 'Courier New', monospace;">$ python run_tests.py</h3>
      <div style="font-family: 'Courier New', monospace; font-size: 13px; color: #00e676; white-space: pre-wrap; line-height: 1.5;">{{test_output}}</div>
    </div>
  </div>

annotation_schemes:
  - name: "step_correctness"
    annotation_type: radio
    description: "Verify each code generation step against the problem requirements."
    labels:
      - "Correct — code step is valid"
      - "Neutral — step doesn't affect outcome"
      - "Incorrect — step introduces a bug"
    keyboard_shortcuts:
      "Correct — code step is valid": "1"
      "Neutral — step doesn't affect outcome": "2"
      "Incorrect — step introduces a bug": "3"

  - name: "code_compiles"
    annotation_type: radio
    description: "Is the code in this step syntactically valid?"
    labels:
      - "Yes — code is syntactically valid"
      - "No — syntax error present"
      - "N/A — not a code step"
    keyboard_shortcuts:
      "Yes — code is syntactically valid": "4"
      "No — syntax error present": "5"
      "N/A — not a code step": "6"

  - name: "step_feedback"
    annotation_type: text
    description: "Provide feedback on the current step."

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
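
Once annotations are collected, the per-step `step_correctness` labels can be aggregated into a scalar process reward for training a reward model. The sketch below is illustrative and not part of Potato or the CodePRM paper: the label-to-score mapping (Correct = 1.0, Neutral = 0.5, Incorrect = 0.0) is an assumption, and the label strings are taken verbatim from the config above.

```python
# Illustrative sketch: turn collected step_correctness labels into a
# scalar process reward. The score mapping is an assumption, not taken
# from the CodePRM paper; adjust it to your reward-model training setup.

LABEL_SCORES = {
    "Correct — code step is valid": 1.0,
    "Neutral — step doesn't affect outcome": 0.5,
    "Incorrect — step introduces a bug": 0.0,
}

def process_reward(step_labels):
    """Average the per-step scores; returns 0.0 for an empty label list."""
    if not step_labels:
        return 0.0
    scores = [LABEL_SCORES[label] for label in step_labels]
    return sum(scores) / len(scores)

labels = [
    "Correct — code step is valid",
    "Correct — code step is valid",
    "Incorrect — step introduces a bug",
]
print(round(process_reward(labels), 3))  # two of three steps correct
```

Averaging is one of several common aggregation choices; taking the minimum step score instead penalizes any single buggy step more harshly.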

Sample Data: sample-data.json

[
  {
    "id": "codeprm-001",
    "text": "Write a function `merge_sorted(a, b)` that merges two sorted lists into one sorted list without using built-in sort functions.",
    "steps": "Step 1: Define the function signature and initialize pointers.\n  def merge_sorted(a, b):\n      result = []\n      i, j = 0, 0\n\nStep 2: Add the main merge loop comparing elements from both lists.\n      while i < len(a) and j < len(b):\n          if a[i] <= b[j]:\n              result.append(a[i])\n              i += 1\n          else:\n              result.append(b[j])\n              j += 1\n\nStep 3: Append remaining elements from both lists.\n      result.extend(a[i:])\n      result.extend(b[j:])\n      return result",
    "test_output": "test_merge_empty_lists .............. PASS\ntest_merge_single_elements .......... PASS\ntest_merge_different_lengths ........ PASS\ntest_merge_duplicates ............... PASS\n\n4 passed, 0 failed"
  },
  {
    "id": "codeprm-002",
    "text": "Implement a function `is_balanced(s)` that checks whether a string of brackets '()[]{}' is balanced.",
    "steps": "Step 1: Define the function and the bracket mappings.\n  def is_balanced(s):\n      stack = []\n      mapping = {')': '(', ']': '[', '}': '{'}\n\nStep 2: Iterate through each character and push opening brackets.\n      for char in s:\n          if char in mapping.values():\n              stack.append(char)\n\nStep 3: For closing brackets, check the stack.\n          elif char in mapping:\n              if not stack or stack.pop() != mapping[char]:\n                  return False\n\nStep 4: Return whether the stack is empty.\n      return len(stack) == 0",
    "test_output": "test_empty_string ................... PASS\ntest_simple_balanced ................ PASS\ntest_nested_balanced ................ PASS\ntest_unbalanced_open ................ PASS\ntest_unbalanced_close ............... PASS\ntest_mixed_balanced ................. PASS\n\n6 passed, 0 failed"
  }
]

// ... and 6 more items
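
As a sanity check on the sample data, the incremental steps in `codeprm-001` compose into a complete, working function; assembled and run as ordinary Python:

```python
# The three steps from codeprm-001, assembled into one function.
def merge_sorted(a, b):
    # Step 1: initialize the result list and two read pointers.
    result = []
    i, j = 0, 0
    # Step 2: merge loop comparing the front elements of both lists.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # Step 3: append whatever remains in either list.
    result.extend(a[i:])
    result.extend(b[j:])
    return result

print(merge_sorted([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```

This is the behavior the `test_output` field reports as passing; annotators seeing all tests pass would typically mark each step "Correct".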

Get This Design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/agentic/codeprm-code-process-reward
potato start config.yaml

Details

Annotation Types

radio, text

Domain

Code Generation, Programming

Use Cases

Process Reward Models, Code Verification, RLHF

Tags

process-reward-model, code-generation, execution-feedback, programming

Found an issue or want to improve this design?

Open an Issue