Coding Agent Annotation
Annotate coding agent traces with diff rendering, terminal output, and file tree navigation. Import from Claude Code, Aider, SWE-Agent, and other coding assistants.
New in v2.4.0
Coding agents -- Claude Code, Aider, SWE-Agent, OpenHands, and others -- produce traces that are fundamentally different from general-purpose agent traces. They contain code diffs, terminal output, file reads, directory traversals, and test results. Reviewing these traces requires specialized rendering that understands the structure of code changes and presents them in a format familiar to software engineers.
Potato's CodingTraceDisplay is a purpose-built display type for coding agent sessions. It renders unified diffs with red/green syntax-highlighted lines, terminal output in dark blocks, file reads with line numbers, and provides a file tree sidebar showing every file the agent touched. Annotators can navigate between files, expand or collapse long outputs, and rate individual operations or the trace as a whole.
Configuration
Enable the coding trace display in your project config:
agentic:
enabled: true
trace_converter: claude_code
display_type: coding_trace
coding_trace_display:
# Diff rendering
diff_style: unified # "unified" or "side_by_side"
diff_context_lines: 3 # lines of context around changes
syntax_highlight: true # language-aware highlighting
show_line_numbers: true
# Terminal output
terminal_theme: dark # "dark" or "light"
terminal_max_lines: 80 # auto-collapse after this many lines
show_exit_codes: true
# File reads
file_read_max_lines: 100 # auto-collapse file reads longer than this
show_file_path: true
show_line_range: true # display "lines 42-87" for partial reads
# File tree sidebar
file_tree:
enabled: true
position: left # "left" or "right"
show_operation_icons: true # icons for read/edit/create/delete
group_by_directory: true
click_to_navigate: true # click a file to jump to its operations
# Collapsible sections
auto_collapse_threshold: 500 # characters before auto-collapsing
collapse_file_reads: true
collapse_terminal_output: true
Display Features
Unified Diff View
Edit operations are rendered as unified diffs with red/green highlighting. Deleted lines appear with a red background and a - prefix; added lines appear with a green background and a + prefix. Context lines are shown in neutral gray. The file path and line range appear in a header bar above each diff block.
When diff_style: side_by_side is set, the old and new versions appear in adjacent columns, making it easier to see what changed in complex edits.
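The unified view described above can be reproduced with Python's standard `difflib`; this is a sketch of the rendering logic, not Potato's actual implementation (the `render_unified_diff` helper and its `a/`/`b/` path prefixes are illustrative choices):

```python
import difflib

def render_unified_diff(old: str, new: str, path: str, context: int = 3) -> str:
    """Build a unified diff for one edit, similar to what the display renders."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",   # header bar above the diff block
        tofile=f"b/{path}",
        n=context,              # maps to diff_context_lines
    )
    return "".join(lines)

diff = render_unified_diff(
    "if len(tokens) == 0:\n    return None\n",
    "if len(tokens) == 0:\n    raise ParseError('Empty input')\n",
    "src/parser.py",
)
print(diff)
```

Lines beginning with `-` would receive the red background and lines beginning with `+` the green background; `n=context` plays the role of the `diff_context_lines` setting.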
Dark Terminal Blocks
Bash and shell commands are rendered in dark terminal blocks with monospaced font. The command itself appears with a $ prompt prefix, and the output appears below. Exit codes are shown in a small badge (green for 0, red for non-zero). Long outputs are auto-collapsed with a "Show N more lines" expander.
Line-Numbered File Reads
When the agent reads a file, the content is displayed with line numbers in a light code block. Partial reads show the line range (e.g., "lines 42-87 of 312"). Syntax highlighting is applied based on the file extension.
File Tree Sidebar
The file tree sidebar shows every file the agent touched during the trace. Files are grouped by directory and sorted alphabetically. Each file has an icon indicating the operations performed:
- Pencil icon for edited files
- Eye icon for read-only files
- Plus icon for newly created files
- Trash icon for deleted files
- Terminal icon for executed scripts
Clicking a file in the tree scrolls the main panel to the first operation involving that file.
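The grouping behind the sidebar can be sketched as a pass over the converted turns; the `ICONS` names and the `build_file_tree` helper below are illustrative assumptions, not Potato's internal API:

```python
from collections import defaultdict

# Hypothetical mapping from operation kind to sidebar icon.
ICONS = {"read": "eye", "edit": "pencil", "create": "plus",
         "delete": "trash", "execute": "terminal"}

def build_file_tree(turns):
    """Group every touched file by directory, recording the operations on it."""
    kind = {"file_read": "read", "edit": "edit", "file_write": "create"}
    ops = defaultdict(set)
    for turn in turns:
        path = turn.get("file_path")
        if path:
            ops[path].add(kind.get(turn["type"], "read"))
    tree = defaultdict(dict)
    for path in sorted(ops):                      # alphabetical within a directory
        directory, _, name = path.rpartition("/")
        tree[directory or "."][name] = sorted(ICONS[o] for o in ops[path])
    return dict(tree)
```

A file read and then edited would carry both the eye and pencil icons, matching the list above.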
Collapsible Long Outputs
Any output block exceeding the auto_collapse_threshold is automatically collapsed. A summary line shows the first and last few lines with a "Show all N lines" button. This keeps the trace navigable even when individual operations produce hundreds of lines of output.
Trace Converters
Potato ships four coding-agent-specific converters that normalize trace formats into the unified coding trace representation.
| Converter | Source | Format |
|---|---|---|
| claude_code | Claude Code / Anthropic API | Messages API with tool_use blocks (Read, Edit, Bash, Write tools) |
| aider | Aider | Markdown chat logs with SEARCH/REPLACE and ORIGINAL/UPDATED edit blocks |
| swe_agent_trajectory | SWE-Agent | Trajectory JSON files with thought/action/observation triples |
| auto | Auto-detect | Inspects trace structure and selects the best converter automatically |
Specify the converter in your config:
agentic:
trace_converter: claude_code # or aider, swe_agent_trajectory, auto
Claude Code Converter
The claude_code converter handles traces from the Anthropic Messages API where tool use is represented as tool_use and tool_result content blocks. It recognizes the standard Claude Code tools:
- Read tool calls become file read displays
- Edit tool calls become unified diffs
- Write tool calls become file creation displays
- Bash tool calls become terminal blocks
- Glob/Grep tool calls become search result displays
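The pairing logic can be sketched as two passes over the message list: collect `tool_result` blocks by id, then walk the `tool_use` blocks. This is a simplified illustration under the Messages API shape described above, not the converter's real code:

```python
def convert_claude_code(messages):
    """Pair tool_use blocks with their tool_result to build structured turns."""
    tool_types = {"Read": "file_read", "Edit": "edit", "Write": "file_write",
                  "Bash": "terminal", "Glob": "search", "Grep": "search"}
    # Pass 1: index tool results by the tool_use id they answer.
    results = {}
    for msg in messages:
        if not isinstance(msg.get("content"), list):
            continue
        for block in msg["content"]:
            if block.get("type") == "tool_result":
                results[block["tool_use_id"]] = block.get("content", "")
    # Pass 2: emit one structured turn per tool_use block, in order.
    turns = []
    for msg in messages:
        if not isinstance(msg.get("content"), list):
            continue
        for block in msg["content"]:
            if block.get("type") != "tool_use":
                continue
            tool, inp = block["name"], block.get("input", {})
            turn = {"type": tool_types.get(tool, "thought"), "tool": tool}
            if tool == "Bash":
                turn["command"] = inp.get("command", "")
                turn["output"] = results.get(block["id"], "")
            else:
                turn["file_path"] = inp.get("file_path", "")
            turns.append(turn)
    return turns
```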
Aider Converter
The aider converter parses Aider's markdown-based chat format. It extracts SEARCH/REPLACE blocks (and the older ORIGINAL/UPDATED format) and converts them into unified diffs. Shell commands and their output are extracted from fenced code blocks marked with bash or shell.
SWE-Agent Trajectory Converter
The swe_agent_trajectory converter reads SWE-Agent's trajectory JSON files. Each trajectory entry contains a thought (the agent's reasoning), an action (the command executed), and an observation (the command output). The converter classifies actions into file edits, file reads, shell commands, and navigation operations.
CLI Usage
Convert raw traces before starting the annotation server:
# Convert Claude Code traces
python -m potato.trace_converter \
-i traces.json \
-f claude_code \
-o data/converted.jsonl
# Convert Aider chat logs
python -m potato.trace_converter \
-i aider_chat_history/ \
-f aider \
-o data/aider_converted.jsonl
# Convert SWE-Agent trajectories
python -m potato.trace_converter \
-i trajectories/ \
-f swe_agent_trajectory \
-o data/swe_converted.jsonl
# Auto-detect format
python -m potato.trace_converter \
-i mixed_traces/ \
-f auto \
-o data/auto_converted.jsonl
The -i flag accepts a single file or a directory. When a directory is given, all .json and .jsonl files are processed. The converter writes one JSON object per line to the output file.
Additional options:
# Filter by file extension
python -m potato.trace_converter \
-i traces/ -f claude_code -o data/out.jsonl \
--include "*.json"
# Add metadata fields from a CSV
python -m potato.trace_converter \
-i traces/ -f claude_code -o data/out.jsonl \
--metadata metadata.csv --join-key trace_id
# Validate output without writing
python -m potato.trace_converter \
-i traces.json -f claude_code --validate
Data Format
After conversion, each line in the output JSONL file follows this structure:
{
"id": "trace_001",
"task_description": "Fix the failing test in test_parser.py",
"repository": "myproject",
"structured_turns": [
{
"type": "file_read",
"tool": "Read",
"file_path": "src/parser.py",
"content": "def parse(input_str):\n tokens = tokenize(input_str)\n ...",
"line_start": 1,
"line_end": 45
},
{
"type": "edit",
"tool": "Edit",
"file_path": "src/parser.py",
"old_content": " if len(tokens) == 0:\n return None",
"new_content": " if len(tokens) == 0:\n raise ParseError('Empty input')",
"line_start": 12,
"line_end": 13
},
{
"type": "terminal",
"tool": "Bash",
"command": "python -m pytest test_parser.py -v",
"output": "test_parser.py::test_empty_input PASSED\ntest_parser.py::test_valid_input PASSED\n\n2 passed in 0.34s",
"exit_code": 0
},
{
"type": "file_write",
"tool": "Write",
"file_path": "src/parser.py",
"content": "...",
"is_new_file": false
}
],
"metadata": {
"agent": "claude_code",
"model": "claude-sonnet-4-20250514",
"total_tokens": 15234,
"duration_seconds": 42
}
}
The structured_turns array preserves the exact order of operations. Each turn has a type field (file_read, edit, terminal, file_write, search, thought) and type-specific fields.
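A record in this shape can be sanity-checked before annotation with a few lines of Python; the `validate_trace` helper below is an illustrative sketch, not part of Potato's CLI:

```python
VALID_TYPES = {"file_read", "edit", "terminal", "file_write", "search", "thought"}

def validate_trace(trace):
    """Return a list of problems found in one converted trace record."""
    errors = []
    for key in ("id", "structured_turns"):
        if key not in trace:
            errors.append(f"missing required field: {key}")
    for i, turn in enumerate(trace.get("structured_turns", [])):
        if turn.get("type") not in VALID_TYPES:
            errors.append(f"turn {i}: unknown type {turn.get('type')!r}")
        if turn.get("type") == "terminal" and "command" not in turn:
            errors.append(f"turn {i}: terminal turn missing command")
    return errors
```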
Configuration Reference
Here is a complete configuration combining the coding trace display with annotation schemes for evaluating coding agent output:
task_name: "Coding Agent Evaluation"
task_dir: "."
data_files:
- "data/coding_traces.jsonl"
item_properties:
id_key: id
text_key: task_description
agentic:
enabled: true
trace_converter: claude_code
display_type: coding_trace
coding_trace_display:
diff_style: unified
diff_context_lines: 3
syntax_highlight: true
show_line_numbers: true
terminal_theme: dark
terminal_max_lines: 80
show_exit_codes: true
file_read_max_lines: 100
file_tree:
enabled: true
position: left
show_operation_icons: true
group_by_directory: true
click_to_navigate: true
auto_collapse_threshold: 500
annotation_schemes:
# Did the agent complete the task?
- annotation_type: radio
name: task_completion
description: "Did the agent successfully complete the task?"
labels:
- "Fully Complete"
- "Partially Complete"
- "Failed"
- "Made Things Worse"
# Per-step correctness
- annotation_type: per_turn_rating
name: step_quality
description: "Rate this step"
target: agentic_steps
rating_type: radio
labels:
- "Good"
- "Acceptable"
- "Unnecessary"
- "Incorrect"
# Code quality rating
- annotation_type: likert
name: code_quality
description: "Rate the quality of the code changes"
min: 1
max: 5
labels:
1: "Very Poor"
2: "Poor"
3: "Acceptable"
4: "Good"
5: "Excellent"
# Free-text notes
- annotation_type: text
name: notes
description: "Any additional observations about the coding trace"
label_requirement:
required: false
output_annotation_dir: "output/"
output_annotation_format: "jsonl"
Running Example Projects
Potato includes example projects for coding agent annotation:
# Clone the repository
git clone https://github.com/davidjurgens/potato.git
cd potato
# Run the Claude Code trace evaluation example
potato start example/coding_agent_eval/config.yaml -p 8000
# Run the SWE-bench evaluation example
potato start example/swe_bench_eval/config.yaml -p 8000
# Run the multi-agent comparison example
potato start example/coding_agent_comparison/config.yaml -p 8000
Each example includes sample traces, a complete configuration file, and a README describing the annotation task.
See Also
- Process Reward Annotation -- collect per-step reward signals for PRM training
- Code Review Annotation -- GitHub PR-style inline review for code changes
- Live Coding Agent Observation -- watch and interact with coding agents in real time
- Agentic Annotation -- general-purpose agent trace annotation
- Export Formats -- export annotation data for model training
For implementation details, see the source documentation.