Process Reward Annotation
Collect per-step reward signals for training process reward models with first-error and per-step annotation modes. Export directly to PRM, DPO, and SWE-bench training formats.
New in v2.4.0
Process Reward Models (PRMs) require per-step correctness labels rather than a single outcome-level score. Training an effective PRM means collecting annotations that identify exactly where in a multi-step trace the agent went wrong, what kind of error occurred, and whether recovery was possible. This is fundamentally different from outcome-based annotation where you only judge the final result.
Potato provides two annotation modes optimized for different speed/detail tradeoffs when collecting process reward data. First-error mode is designed for fast binary labeling: the annotator clicks the first incorrect step, and all subsequent steps are automatically marked as tainted. Per-step mode asks the annotator to independently rate every step, producing richer signals at the cost of more annotation time.
Both modes integrate with the coding trace display, agent trace display, and web agent display, so you can collect process rewards for any type of agent trace.
First-Error Mode
In first-error mode, the annotator reads through the trace sequentially and clicks on the first step where the agent made an error. All steps before the clicked step are automatically labeled as correct. The clicked step and all subsequent steps are labeled as incorrect (the clicked step as the "first error" and the rest as "downstream of error").
This produces the exact label format needed for training binary PRMs: a sequence of +1 labels followed by a -1 at the error point and -1 for all remaining steps.
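Expanding a single first-error click into that label vector can be sketched in a few lines (the helper name here is illustrative, not part of Potato's API):

```python
def first_error_labels(first_error_step, total_steps):
    """Expand a first-error click into binary PRM labels.

    Steps before the clicked index get +1; the clicked step and every
    step after it get -1. A first_error_step of None means the
    annotator marked the whole trace correct.
    """
    if first_error_step is None:
        return [1] * total_steps
    return [1] * first_error_step + [-1] * (total_steps - first_error_step)

# first_error_labels(4, 8) -> [1, 1, 1, 1, -1, -1, -1, -1]
```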
Configuration
annotation_schemes:
- name: process_reward
annotation_type: process_reward
mode: first_error
description: "Click the first step where the agent made a mistake"
first_error:
# Visual styling
correct_color: "#22c55e" # green for steps before the error
error_color: "#ef4444" # red for the first error step
downstream_color: "#f97316" # orange for steps after the error
unmarked_color: "#6b7280" # gray for steps not yet reviewed
# Behavior
require_confirmation: true # ask "Are you sure?" before marking
allow_no_error: true # allow annotator to mark all steps correct
show_step_content: true # show step content in the annotation panel
# Labels applied automatically
labels:
correct: "+1"
first_error: "-1 (first error)"
downstream: "-1 (downstream)"
all_correct: "+1 (all correct)"
Annotation Workflow
- The annotator reads the trace from top to bottom
- Steps are initially unmarked (gray)
- The annotator clicks the first incorrect step
- Steps 0 through N-1 turn green (correct)
- Step N turns red (first error)
- Steps N+1 through the end turn orange (downstream)
- If the entire trace is correct, the annotator clicks "All Steps Correct"
Output Format
{
"id": "trace_042",
"annotations": {
"process_reward": {
"mode": "first_error",
"first_error_step": 4,
"total_steps": 8,
"labels": [1, 1, 1, 1, -1, -1, -1, -1]
}
}
}
When the annotator marks all steps correct, first_error_step is null and the labels array contains all 1 values.
Per-Step Mode
In per-step mode, the annotator independently rates every step in the trace. This produces richer signals -- a step can be "partially correct" or "unnecessary" rather than just correct/incorrect. It also captures cases where the agent recovers from an error, which first-error mode cannot represent.
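The mapping from per-step labels to a cumulative reward is a simple sum over the configured score values. A minimal sketch, assuming the default scores from the configuration below (the helper itself is illustrative, not part of Potato's API):

```python
# Score values mirroring the per_step label configuration;
# adjust the map if you customize the labels.
LABEL_SCORES = {
    "correct": 1.0,
    "partially_correct": 0.5,
    "incorrect": -1.0,
    "unnecessary": -0.5,
    "recovery": 0.25,
}

def cumulative_score(step_labels):
    """Sum the configured score for each per-step label value."""
    return sum(LABEL_SCORES[label] for label in step_labels)

# cumulative_score(["correct", "correct", "incorrect",
#                   "recovery", "correct", "correct"]) -> 3.25
```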
Configuration
annotation_schemes:
- name: process_reward
annotation_type: process_reward
mode: per_step
description: "Rate each step independently"
per_step:
# Rating options
labels:
- value: "correct"
display: "Correct"
color: "#22c55e"
score: 1.0
- value: "partially_correct"
display: "Partially Correct"
color: "#eab308"
score: 0.5
- value: "incorrect"
display: "Incorrect"
color: "#ef4444"
score: -1.0
- value: "unnecessary"
display: "Unnecessary"
color: "#f97316"
score: -0.5
- value: "recovery"
display: "Recovery from Error"
color: "#3b82f6"
score: 0.25
# Optional error categorization for incorrect/partially correct steps
error_categories:
enabled: true
categories:
- "Wrong tool selected"
- "Correct tool, wrong arguments"
- "Hallucinated information"
- "Repeated previous step"
- "Logic error"
- "Syntax error"
- "Missed edge case"
- "Unnecessary step"
- "Other"
# Behavior
require_all_steps: true # all steps must be rated before submission
allow_notes: true # optional text field per step
show_running_score: true # show cumulative reward score
Annotation Workflow
- Each step in the trace has a rating widget beside it
- The annotator selects a label for each step
- If the step is rated "Incorrect" or "Partially Correct" and error categories are enabled, a dropdown appears to select the error type
- An optional notes field allows free-text explanation
- A running score at the top shows the cumulative reward
Output Format
{
"id": "trace_042",
"annotations": {
"process_reward": {
"mode": "per_step",
"total_steps": 6,
"labels": [1.0, 1.0, -1.0, 0.25, 1.0, 1.0],
"step_details": {
"0": {"label": "correct"},
"1": {"label": "correct"},
"2": {
"label": "incorrect",
"error_category": "Wrong tool selected",
"notes": "Agent used grep when it should have read the file directly"
},
"3": {
"label": "recovery",
"notes": "Agent recognized the mistake and tried a different approach"
},
"4": {"label": "correct"},
"5": {"label": "correct"}
},
"cumulative_score": 2.75
}
}
}
Configuration Reference
Full configuration options for the process reward annotation scheme:
annotation_schemes:
- name: process_reward
annotation_type: process_reward
mode: first_error # "first_error" or "per_step"
description: "Process reward annotation"
# Required for mode: first_error
first_error:
correct_color: "#22c55e"
error_color: "#ef4444"
downstream_color: "#f97316"
unmarked_color: "#6b7280"
require_confirmation: true
allow_no_error: true
show_step_content: true
# Required for mode: per_step
per_step:
labels:
- value: "correct"
display: "Correct"
color: "#22c55e"
score: 1.0
- value: "incorrect"
display: "Incorrect"
color: "#ef4444"
score: -1.0
error_categories:
enabled: false
categories: []
require_all_steps: true
allow_notes: false
show_running_score: false
# Common options
target: agentic_steps # bind to trace steps
keyboard_shortcuts:
enabled: true
correct: "1"
incorrect: "2"
partially_correct: "3"
unnecessary: "4"
next_step: "j"
prev_step: "k"Export to Training Formats
Potato can export process reward annotations directly to formats used by common PRM training pipelines.
PRM Training Format
Export binary step-level labels for PRM training:
python -m potato.export \
-i output/ \
-f prm \
-o results/prm_training_data.jsonl
Output format:
{
"trace_id": "trace_042",
"steps": [
{"content": "Search for Tokyo population", "label": 1},
{"content": "Parse search results", "label": 1},
{"content": "Search for NYC population", "label": -1},
{"content": "Compare populations", "label": -1}
]
}
DPO / RLHF Preference Pairs
When you have multiple annotated traces for the same task, export pairwise preferences for DPO or RLHF training. The exporter pairs traces where one has a higher cumulative reward than the other:
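The pairing logic can be sketched as follows (field names like "task_id" and "cumulative_score" are illustrative, not Potato's exact export schema):

```python
from itertools import combinations

def make_preference_pairs(traces, min_score_gap=0.5):
    """Pair annotated traces for the same task into (chosen, rejected).

    `traces` is a list of dicts carrying a task identifier and a
    cumulative reward; only pairs whose score gap meets the threshold
    are emitted.
    """
    pairs = []
    for a, b in combinations(traces, 2):
        if a["task_id"] != b["task_id"]:
            continue  # only compare traces for the same task
        gap = a["cumulative_score"] - b["cumulative_score"]
        if abs(gap) >= min_score_gap:
            chosen, rejected = (a, b) if gap > 0 else (b, a)
            pairs.append((chosen, rejected))
    return pairs
```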
python -m potato.export \
-i output/ \
-f dpo \
-o results/dpo_pairs.jsonl \
--min-score-gap 0.5
Output format:
{
"prompt": "Fix the failing test in test_parser.py",
"chosen": [
{"role": "assistant", "content": "Step 1: Read the test file..."},
{"role": "assistant", "content": "Step 2: Identify the bug..."}
],
"rejected": [
{"role": "assistant", "content": "Step 1: Run all tests..."},
{"role": "assistant", "content": "Step 2: Edit a random file..."}
]
}
SWE-bench Compatible Results
Export evaluation results in a format compatible with SWE-bench leaderboard submissions:
python -m potato.export \
-i output/ \
-f swebench \
-o results/swebench_results.json
This generates the standard SWE-bench evaluation JSON with instance IDs, model patches, and resolution status derived from annotator judgments.
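A quick way to tally resolution status from the exported file might look like this. The "instance_id" and "resolved" field names are assumptions about the export schema; check the actual output before relying on them:

```python
import json

def count_resolved(path):
    """Return (resolved_count, total_count) for an exported results file.

    Assumes each record is a dict with an "instance_id" and a boolean
    "resolved" field -- hypothetical names, verify against the real export.
    """
    with open(path) as f:
        results = json.load(f)
    resolved = sum(1 for r in results if r.get("resolved"))
    return resolved, len(results)
```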
Analysis
Potato provides utility functions for analyzing process reward annotations:
from potato.analysis import load_annotations, process_reward_stats
# Load annotations
annotations = load_annotations("output/")
# Step-level accuracy statistics
stats = process_reward_stats(annotations)
print(f"Total traces annotated: {stats['total_traces']}")
print(f"Traces with no errors: {stats['all_correct_count']} ({stats['all_correct_pct']:.1f}%)")
print(f"Average first-error position: step {stats['avg_first_error_step']:.1f}")
print(f"Average steps before error: {stats['avg_correct_prefix_length']:.1f}")
# Error distribution by step position
for position, count in stats['error_by_position'].items():
print(f" Step {position}: {count} errors")
# Error category distribution (per-step mode only)
if 'error_categories' in stats:
for category, count in stats['error_categories'].items():
print(f" {category}: {count}")
# Inter-annotator agreement on first-error step
if stats['multi_annotator']:
print(f"First-error agreement (exact): {stats['first_error_exact_agreement']:.2f}")
print(f"First-error agreement (within 1): {stats['first_error_near_agreement']:.2f}")Visualization
from potato.analysis import plot_error_distribution
# Plot error position distribution across all traces
plot_error_distribution(
annotations,
output_path="figures/error_distribution.png",
normalize_by_trace_length=True,
title="Where Do Agents First Go Wrong?"
)
# Plot per-step reward curves
from potato.analysis import plot_reward_curves
plot_reward_curves(
annotations,
output_path="figures/reward_curves.png",
group_by="agent_model",
title="Cumulative Reward by Model"
)
Research Context
Process reward annotation in Potato is designed to support research on training and evaluating reward models for agentic systems. Several recent lines of work motivate this feature:
- AgentPRM demonstrates that process reward models trained on step-level labels significantly outperform outcome reward models for guiding coding agents during search.
- ToolRM and ToolRL show that reward models specialized for tool-use steps can improve agent performance on API-calling and code generation tasks.
- DeepSWE applies process reward models to SWE-bench-scale software engineering tasks, using per-step labels to train verifiers that guide agent tree search.
- Step-level RLHF research shows that per-step human feedback produces more sample-efficient reward models than episode-level feedback.
Potato's first-error and per-step modes map directly to the label formats used by these approaches. The export pipeline produces training-ready data without additional preprocessing.
See Also
- Coding Agent Annotation -- display and annotate coding agent traces
- Agentic Annotation -- general-purpose agent trace annotation with per-turn ratings
- Export Formats -- all supported export formats
- Quality Control -- inter-annotator agreement and adjudication
For implementation details, see the source documentation.