advanced · preference

CodePRM Code Process Reward

Process reward annotation for step-by-step code generation with execution feedback. Annotators verify each incremental code generation step against the problem requirements, check whether its code is syntactically valid, and write feedback informed by the test execution results.
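The syntactic-validity question can be pre-screened automatically before annotators see an item. The sketch below is one possible approach, not part of Potato or of the CodePRM paper: it assumes the `Step N: description` layout used in the sample data's `steps` field, and `split_steps` and `cumulative_syntax_check` are hypothetical helper names. It parses the code written so far at each step with Python's `ast` module.

```python
import ast
import re
import textwrap
from typing import Optional

# Matches step headers such as "Step 2: Add the main merge loop ...".
# Assumption: every step in the `steps` field starts with this header format.
STEP_HEADER = re.compile(r"^Step (\d+):\s*(.*)$")


def split_steps(steps: str) -> list[tuple[str, str]]:
    """Split a `steps` string into (description, code) pairs, one per step."""
    parsed: list[tuple[str, list[str]]] = []
    for line in steps.splitlines():
        header = STEP_HEADER.match(line)
        if header:
            parsed.append((header.group(2), []))
        elif parsed:
            parsed[-1][1].append(line)
    # Trim blank separator lines around each step's code but keep indentation.
    return [(desc, "\n".join(lines).strip("\n")) for desc, lines in parsed]


def cumulative_syntax_check(steps: str) -> list[Optional[bool]]:
    """Report, for each step, whether all code written up to that step parses.

    Returns True/False for each step that contains code and None for a step
    without code, mirroring the Yes / No / N/A options of `code_compiles`.
    """
    results: list[Optional[bool]] = []
    code_so_far: list[str] = []
    for _description, code in split_steps(steps):
        if not code.strip():
            results.append(None)
            continue
        code_so_far.append(code)
        # The sample data indents code by two extra spaces; dedent before parsing.
        source = textwrap.dedent("\n".join(code_so_far))
        try:
            ast.parse(source)
            results.append(True)
        except SyntaxError:  # IndentationError is a subclass of SyntaxError
            results.append(False)
    return results
```

Each step is checked together with everything before it because a fragment such as the `elif` branch in the `is_balanced` sample does not parse on its own. A side effect is that once one step breaks parsing, every later step also reports False, so the result is only a hint; the annotator still makes the call.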


Configuration File: config.yaml

This Potato config reproduces the annotation task. Save it as config.yaml and run potato start config.yaml to try it.

yaml
# CodePRM Code Process Reward
# Based on "CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation" (Li et al., ACL 2025 Findings)

annotation_task_name: "CodePRM Code Process Reward"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

html_layout: |
  <div class="container" style="max-width: 850px; margin: 0 auto; font-family: 'Segoe UI', Arial, sans-serif;">
    <div style="background: #f0f4ff; border: 1px solid #b0c4de; border-radius: 8px; padding: 18px 22px; margin-bottom: 18px;">
      <h3 style="margin: 0 0 10px 0; color: #2c3e6b; font-size: 16px;">Problem Description</h3>
      <div style="font-size: 14px; color: #2c3e50; line-height: 1.6;">{{text}}</div>
    </div>
    <div style="margin-bottom: 18px;">
      <h3 style="margin: 0 0 12px 0; color: #2c3e50; font-size: 15px;">Incremental Code Steps</h3>
      <div style="background: #fafafa; border: 1px solid #e0e0e0; border-radius: 6px; padding: 14px 18px; font-family: 'Courier New', monospace; font-size: 13px; white-space: pre-wrap; line-height: 1.6;">{{steps}}</div>
    </div>
    <div style="background: #1a1a2e; border-radius: 8px; padding: 16px 20px; margin-bottom: 10px;">
      <h3 style="margin: 0 0 8px 0; color: #00e676; font-size: 14px; font-family: 'Courier New', monospace;">$ python run_tests.py</h3>
      <div style="font-family: 'Courier New', monospace; font-size: 13px; color: #00e676; white-space: pre-wrap; line-height: 1.5;">{{test_output}}</div>
    </div>
  </div>

annotation_schemes:
  - name: "step_correctness"
    annotation_type: radio
    description: "Verify each code generation step against the problem requirements."
    labels:
      - "Correct — code step is valid"
      - "Neutral — step doesn't affect outcome"
      - "Incorrect — step introduces a bug"
    keyboard_shortcuts:
      "Correct — code step is valid": "1"
      "Neutral — step doesn't affect outcome": "2"
      "Incorrect — step introduces a bug": "3"

  - name: "code_compiles"
    annotation_type: radio
    description: "Is the code in this step syntactically valid?"
    labels:
      - "Yes — code is syntactically valid"
      - "No — syntax error present"
      - "N/A — not a code step"
    keyboard_shortcuts:
      "Yes — code is syntactically valid": "4"
      "No — syntax error present": "5"
      "N/A — not a code step": "6"

  - name: "step_feedback"
    annotation_type: text
    description: "Provide feedback on the current step."

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
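
Because `annotation_per_instance: 2` collects two independent judgments per item, agreement on `step_correctness` can be measured once annotation is done. The sketch below computes Cohen's kappa. It assumes the two labels for each item have already been extracted from `annotation_output/` into `(label_a, label_b)` pairs; the layout of Potato's JSON output is not shown on this page, so that extraction step is left out, and the short label names in the usage line are placeholders.

```python
from collections import Counter


def cohens_kappa(pairs: list[tuple[str, str]]) -> float:
    """Cohen's kappa for two annotators who labeled the same items.

    `pairs` holds one (annotator_a_label, annotator_b_label) tuple per item.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one doubly annotated item")
    # Observed agreement: fraction of items where both chose the same label.
    observed = sum(a == b for a, b in pairs) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    counts_a = Counter(a for a, _ in pairs)
    counts_b = Counter(b for _, b in pairs)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere: kappa is 0/0.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Placeholder labels standing in for the three step_correctness options.
pairs = [("correct", "correct"), ("correct", "incorrect"),
         ("incorrect", "incorrect"), ("neutral", "neutral")]
print(round(cohens_kappa(pairs), 3))  # 0.636
```

Cohen's kappa assumes the same two annotators on every item. With `allow_all_users: true`, different pairs of annotators may cover different items, in which case Fleiss' kappa or Krippendorff's alpha is the usual alternative.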

Sample Data: sample-data.json

json
[
  {
    "id": "codeprm-001",
    "text": "Write a function `merge_sorted(a, b)` that merges two sorted lists into one sorted list without using built-in sort functions.",
    "steps": "Step 1: Define the function signature and initialize pointers.\n  def merge_sorted(a, b):\n      result = []\n      i, j = 0, 0\n\nStep 2: Add the main merge loop comparing elements from both lists.\n      while i < len(a) and j < len(b):\n          if a[i] <= b[j]:\n              result.append(a[i])\n              i += 1\n          else:\n              result.append(b[j])\n              j += 1\n\nStep 3: Append remaining elements from both lists.\n      result.extend(a[i:])\n      result.extend(b[j:])\n      return result",
    "test_output": "test_merge_empty_lists .............. PASS\ntest_merge_single_elements .......... PASS\ntest_merge_different_lengths ........ PASS\ntest_merge_duplicates ............... PASS\n\n4 passed, 0 failed"
  },
  {
    "id": "codeprm-002",
    "text": "Implement a function `is_balanced(s)` that checks whether a string of brackets '()[]{}' is balanced.",
    "steps": "Step 1: Define the function and the bracket mappings.\n  def is_balanced(s):\n      stack = []\n      mapping = {')': '(', ']': '[', '}': '{'}\n\nStep 2: Iterate through each character and push opening brackets.\n      for char in s:\n          if char in mapping.values():\n              stack.append(char)\n\nStep 3: For closing brackets, check the stack.\n          elif char in mapping:\n              if not stack or stack.pop() != mapping[char]:\n                  return False\n\nStep 4: Return whether the stack is empty.\n      return len(stack) == 0",
    "test_output": "test_empty_string ................... PASS\ntest_simple_balanced ................ PASS\ntest_nested_balanced ................ PASS\ntest_unbalanced_open ................ PASS\ntest_unbalanced_close ............... PASS\ntest_mixed_balanced ................. PASS\n\n6 passed, 0 failed"
  }
]

(... and 6 more items in the full sample-data.json)
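Before launching the task, it can help to confirm that every item carries the fields the config reads: `id` and `text` through `item_properties`, plus `steps` and `test_output`, which `html_layout` references as `{{steps}}` and `{{test_output}}`. The sketch below assumes that every `test_output` ends with an "N passed, M failed" summary like the two samples above. `load_items`, `check_items` and `parse_test_summary` are hypothetical helpers, not part of Potato.

```python
import json
import re

# Fields the config reads: id/text via item_properties, steps/test_output via html_layout.
REQUIRED_KEYS = ("id", "text", "steps", "test_output")
# Summary line format seen in the sample test_output strings, e.g. "4 passed, 0 failed".
SUMMARY = re.compile(r"(\d+) passed, (\d+) failed")


def load_items(path: str = "sample-data.json") -> list[dict]:
    """Read the data file listed under data_files."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_items(items: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
        output = item.get("test_output")
        if isinstance(output, str) and not SUMMARY.search(output):
            problems.append(f"item {index}: no 'N passed, M failed' summary in test_output")
    return problems


def parse_test_summary(test_output: str) -> tuple[int, int]:
    """Extract (passed, failed) counts from a test_output string."""
    match = SUMMARY.search(test_output)
    if match is None:
        raise ValueError("no 'N passed, M failed' summary found")
    return int(match.group(1)), int(match.group(2))
```

Run from the task directory, `check_items(load_items())` returns an empty list when every item is well formed. `parse_test_summary` recovers the pass/fail counts when an item's execution feedback needs to be compared with its step labels.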

Get This Design

Clone or download this design from the potato-showcase repository on GitHub.

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/agentic/codeprm-code-process-reward
potato start config.yaml

Dataset & paper

Li et al., ACL 2025 Findings

Citation (BibTeX)

bibtex
@inproceedings{li2025codeprm,
  title={CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation},
  author={Li, Qingyao and Dai, Xinyi and Li, Xiangyang and Zhang, Weinan and Wang, Yasheng and Tang, Ruiming and Yu, Yong},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
  year={2025}
}

Details

Annotation Types

radio, text

Domain

Code Generation, Programming

Use Cases

Process Reward Models, Code Verification, RLHF

Tags

process-reward-model, code-generation, execution-feedback, programming
