
Coding Agent Annotation

Annotate coding agent traces with diff rendering, terminal output, and file tree navigation. Import from Claude Code, Aider, SWE-Agent, and other coding assistants.


New in v2.4.0

Coding agents such as Claude Code, Aider, SWE-Agent, and OpenHands produce traces that are fundamentally different from those of general-purpose agents. They contain code diffs, terminal output, file reads, directory traversals, and test results. Reviewing these traces requires specialized rendering that understands the structure of code changes and presents them in a format familiar to software engineers.

Potato's CodingTraceDisplay is a purpose-built display type for coding agent sessions. It renders unified diffs with red/green syntax-highlighted lines, terminal output in dark blocks, file reads with line numbers, and provides a file tree sidebar showing every file the agent touched. Annotators can navigate between files, expand or collapse long outputs, and rate individual operations or the trace as a whole.

Configuration

Enable the coding trace display in your project config:

yaml
agentic:
  enabled: true
  trace_converter: claude_code
  display_type: coding_trace
 
  coding_trace_display:
    # Diff rendering
    diff_style: unified          # "unified" or "side_by_side"
    diff_context_lines: 3        # lines of context around changes
    syntax_highlight: true       # language-aware highlighting
    show_line_numbers: true
 
    # Terminal output
    terminal_theme: dark         # "dark" or "light"
    terminal_max_lines: 80       # auto-collapse after this many lines
    show_exit_codes: true
 
    # File reads
    file_read_max_lines: 100     # auto-collapse file reads longer than this
    show_file_path: true
    show_line_range: true        # display "lines 42-87" for partial reads
 
    # File tree sidebar
    file_tree:
      enabled: true
      position: left             # "left" or "right"
      show_operation_icons: true # icons for read/edit/create/delete
      group_by_directory: true
      click_to_navigate: true    # click a file to jump to its operations
 
    # Collapsible sections
    auto_collapse_threshold: 500 # characters before auto-collapsing
    collapse_file_reads: true
    collapse_terminal_output: true

Display Features

Unified Diff View

Edit operations are rendered as unified diffs with red/green highlighting. Deleted lines appear with a red background and a - prefix; added lines appear with a green background and a + prefix. Context lines are shown in neutral gray. The file path and line range appear in a header bar above each diff block.

When diff_style: side_by_side is set, the old and new versions appear in adjacent columns, making it easier to see what changed in complex edits.
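The unified format used here is the standard one produced by tools like `diff -u`. As a rough sketch of what the display renders, Python's `difflib` can generate the same structure (the file names and snippet below are illustrative, not taken from a real trace):

```python
import difflib

# Old and new versions of the edited region, as lists of lines.
old = "if len(tokens) == 0:\n    return None\n".splitlines(keepends=True)
new = "if len(tokens) == 0:\n    raise ParseError('Empty input')\n".splitlines(keepends=True)

# n=3 mirrors the default diff_context_lines setting.
diff = difflib.unified_diff(
    old, new, fromfile="src/parser.py", tofile="src/parser.py", n=3
)
print("".join(diff))
```

Lines starting with `-` are rendered red, lines starting with `+` green, and unprefixed context lines gray.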

Dark Terminal Blocks

Bash and shell commands are rendered in dark terminal blocks with monospaced font. The command itself appears with a $ prompt prefix, and the output appears below. Exit codes are shown in a small badge (green for 0, red for non-zero). Long outputs are auto-collapsed with a "Show N more lines" expander.

Line-Numbered File Reads

When the agent reads a file, the content is displayed with line numbers in a light code block. Partial reads show the line range (e.g., "lines 42-87 of 312"). Syntax highlighting is applied based on the file extension.

File Tree Sidebar

The file tree sidebar shows every file the agent touched during the trace. Files are grouped by directory and sorted alphabetically. Each file has an icon indicating the operations performed:

  • Pencil icon for edited files
  • Eye icon for read-only files
  • Plus icon for newly created files
  • Trash icon for deleted files
  • Terminal icon for executed scripts

Clicking a file in the tree scrolls the main panel to the first operation involving that file.

Collapsible Long Outputs

Any output block exceeding the auto_collapse_threshold is automatically collapsed. A summary line shows the first and last few lines with a "Show all N lines" button. This keeps the trace navigable even when individual operations produce hundreds of lines of output.
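The collapsing behaviour can be sketched as follows. This is an illustrative model of the logic, not Potato's actual implementation; the function name and preview wording are assumptions:

```python
def collapse_preview(text: str, threshold: int = 500, edge_lines: int = 3):
    """Return (is_collapsed, preview) for an output block.

    Blocks at or under the character threshold pass through unchanged;
    longer blocks keep only the first and last few lines around an
    expander line.
    """
    if len(text) <= threshold:
        return False, text
    lines = text.splitlines()
    preview = "\n".join(
        lines[:edge_lines]
        + [f"... Show all {len(lines)} lines ..."]
        + lines[-edge_lines:]
    )
    return True, preview
```

With the default `auto_collapse_threshold: 500`, a 400-line terminal dump collapses to a handful of lines plus the expander.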

Trace Converters

Potato ships four coding-agent-specific converters that normalize trace formats into the unified coding trace representation.

Converter              Source                        Format
claude_code            Claude Code / Anthropic API   Messages API with tool_use blocks (Read, Edit, Bash, Write tools)
aider                  Aider                         Markdown chat logs with SEARCH/REPLACE and ORIGINAL/UPDATED edit blocks
swe_agent_trajectory   SWE-Agent                     Trajectory JSON files with thought/action/observation triples
auto                   Auto-detect                   Inspects trace structure and selects the best converter automatically

Specify the converter in your config:

yaml
agentic:
  trace_converter: claude_code    # or aider, swe_agent_trajectory, auto

Claude Code Converter

The claude_code converter handles traces from the Anthropic Messages API where tool use is represented as tool_use and tool_result content blocks. It recognizes the standard Claude Code tools:

  • Read tool calls become file read displays
  • Edit tool calls become unified diffs
  • Write tool calls become file creation displays
  • Bash tool calls become terminal blocks
  • Glob/Grep tool calls become search result displays
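Conceptually, the mapping from a `tool_use` content block to a structured turn looks something like the sketch below. The function name and exact field handling are illustrative, not Potato's internals; the tool names and the `input` field shape follow the Messages API:

```python
# Illustrative mapping from Claude Code tool names to turn types.
TOOL_TO_TYPE = {
    "Read": "file_read",
    "Edit": "edit",
    "Write": "file_write",
    "Bash": "terminal",
    "Glob": "search",
    "Grep": "search",
}

def convert_tool_use(block: dict) -> dict:
    """Map one tool_use content block to a structured turn (sketch)."""
    tool = block["name"]
    turn = {"type": TOOL_TO_TYPE.get(tool, "thought"), "tool": tool}
    inp = block.get("input", {})
    if tool in ("Read", "Edit", "Write"):
        turn["file_path"] = inp.get("file_path")
    if tool == "Bash":
        turn["command"] = inp.get("command")
    return turn
```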

Aider Converter

The aider converter parses Aider's markdown-based chat format. It extracts SEARCH/REPLACE blocks (and the older ORIGINAL/UPDATED format) and converts them into unified diffs. Shell commands and their output are extracted from fenced code blocks marked with bash or shell.
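Extracting the edit pairs can be sketched with a regular expression over Aider's documented conflict-marker format. This is a minimal illustration, not the converter's real code, and it handles only the SEARCH/REPLACE variant:

```python
import re

# Matches one Aider SEARCH/REPLACE edit block.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)

def parse_search_replace(chat_log: str):
    """Yield (old_content, new_content) pairs from an Aider chat log."""
    for m in BLOCK_RE.finditer(chat_log):
        yield m.group(1), m.group(2)
```

Each yielded pair then becomes the `old_content`/`new_content` of an edit turn and is rendered as a unified diff.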

SWE-Agent Trajectory Converter

The swe_agent_trajectory converter reads SWE-Agent's trajectory JSON files. Each trajectory entry contains a thought (the agent's reasoning), an action (the command executed), and an observation (the command output). The converter classifies actions into file edits, file reads, shell commands, and navigation operations.
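A simplified version of that classification step might look like this. The command names below are illustrative guesses at SWE-Agent's shell-style commands, and the real converter's rules are certainly richer:

```python
# Hypothetical command sets; the actual SWE-Agent command vocabulary
# and the converter's classification rules may differ.
EDIT_CMDS = {"edit", "create"}
READ_CMDS = {"open", "cat", "head"}
NAV_CMDS = {"cd", "ls", "find_file", "search_dir", "goto"}

def classify_action(action: str) -> str:
    """Classify a trajectory action into one of the four turn kinds."""
    parts = action.split()
    cmd = parts[0] if parts else ""
    if cmd in EDIT_CMDS:
        return "edit"
    if cmd in READ_CMDS:
        return "file_read"
    if cmd in NAV_CMDS:
        return "navigation"
    return "terminal"
```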

CLI Usage

Convert raw traces before starting the annotation server:

bash
# Convert Claude Code traces
python -m potato.trace_converter \
  -i traces.json \
  -f claude_code \
  -o data/converted.jsonl
 
# Convert Aider chat logs
python -m potato.trace_converter \
  -i aider_chat_history/ \
  -f aider \
  -o data/aider_converted.jsonl
 
# Convert SWE-Agent trajectories
python -m potato.trace_converter \
  -i trajectories/ \
  -f swe_agent_trajectory \
  -o data/swe_converted.jsonl
 
# Auto-detect format
python -m potato.trace_converter \
  -i mixed_traces/ \
  -f auto \
  -o data/auto_converted.jsonl

The -i flag accepts a single file or a directory. When a directory is given, all .json and .jsonl files are processed. The converter writes one JSON object per line to the output file.
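The input-collection behaviour can be mimicked with a few lines of `pathlib`. This is a sketch of the described semantics (file passed through, directory scanned recursively for `.json`/`.jsonl`), not the converter's actual code:

```python
from pathlib import Path

def collect_inputs(path: str) -> list[Path]:
    """Resolve an -i argument to the list of trace files to process."""
    p = Path(path)
    if p.is_file():
        return [p]
    # Directory: recursively pick up .json and .jsonl files.
    return sorted(q for q in p.rglob("*") if q.suffix in (".json", ".jsonl"))
```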

Additional options:

bash
# Filter by file extension
python -m potato.trace_converter \
  -i traces/ -f claude_code -o data/out.jsonl \
  --include "*.json"
 
# Add metadata fields from a CSV
python -m potato.trace_converter \
  -i traces/ -f claude_code -o data/out.jsonl \
  --metadata metadata.csv --join-key trace_id
 
# Validate output without writing
python -m potato.trace_converter \
  -i traces.json -f claude_code --validate

Data Format

After conversion, each line in the output JSONL file follows this structure:

json
{
  "id": "trace_001",
  "task_description": "Fix the failing test in test_parser.py",
  "repository": "myproject",
  "structured_turns": [
    {
      "type": "file_read",
      "tool": "Read",
      "file_path": "src/parser.py",
      "content": "def parse(input_str):\n    tokens = tokenize(input_str)\n    ...",
      "line_start": 1,
      "line_end": 45
    },
    {
      "type": "edit",
      "tool": "Edit",
      "file_path": "src/parser.py",
      "old_content": "    if len(tokens) == 0:\n        return None",
      "new_content": "    if len(tokens) == 0:\n        raise ParseError('Empty input')",
      "line_start": 12,
      "line_end": 13
    },
    {
      "type": "terminal",
      "tool": "Bash",
      "command": "python -m pytest test_parser.py -v",
      "output": "test_parser.py::test_empty_input PASSED\ntest_parser.py::test_valid_input PASSED\n\n2 passed in 0.34s",
      "exit_code": 0
    },
    {
      "type": "file_write",
      "tool": "Write",
      "file_path": "src/parser.py",
      "content": "...",
      "is_new_file": false
    }
  ],
  "metadata": {
    "agent": "claude_code",
    "model": "claude-sonnet-4-20250514",
    "total_tokens": 15234,
    "duration_seconds": 42
  }
}

The structured_turns array preserves the exact order of operations. Each turn has a type field (file_read, edit, terminal, file_write, search, thought) and type-specific fields.
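A downstream consumer can dispatch on that type field to walk a converted record. The sketch below is an assumed consumer-side helper, using only the field names shown in the example record above:

```python
import json

def summarize_trace(jsonl_line: str) -> list[str]:
    """Produce a one-line summary per turn from one converted record."""
    trace = json.loads(jsonl_line)
    summary = []
    for turn in trace["structured_turns"]:
        t = turn["type"]
        if t in ("file_read", "edit", "file_write"):
            summary.append(f"{t}: {turn['file_path']}")
        elif t == "terminal":
            summary.append(f"terminal: {turn['command']} (exit {turn.get('exit_code')})")
        else:
            summary.append(t)
    return summary
```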

Configuration Reference

Here is a complete configuration combining the coding trace display with annotation schemes for evaluating coding agent output:

yaml
task_name: "Coding Agent Evaluation"
task_dir: "."
 
data_files:
  - "data/coding_traces.jsonl"
 
item_properties:
  id_key: id
  text_key: task_description
 
agentic:
  enabled: true
  trace_converter: claude_code
  display_type: coding_trace
 
  coding_trace_display:
    diff_style: unified
    diff_context_lines: 3
    syntax_highlight: true
    show_line_numbers: true
    terminal_theme: dark
    terminal_max_lines: 80
    show_exit_codes: true
    file_read_max_lines: 100
    file_tree:
      enabled: true
      position: left
      show_operation_icons: true
      group_by_directory: true
      click_to_navigate: true
    auto_collapse_threshold: 500
 
annotation_schemes:
  # Did the agent complete the task?
  - annotation_type: radio
    name: task_completion
    description: "Did the agent successfully complete the task?"
    labels:
      - "Fully Complete"
      - "Partially Complete"
      - "Failed"
      - "Made Things Worse"
 
  # Per-step correctness
  - annotation_type: per_turn_rating
    name: step_quality
    description: "Rate this step"
    target: agentic_steps
    rating_type: radio
    labels:
      - "Good"
      - "Acceptable"
      - "Unnecessary"
      - "Incorrect"
 
  # Code quality rating
  - annotation_type: likert
    name: code_quality
    description: "Rate the quality of the code changes"
    min: 1
    max: 5
    labels:
      1: "Very Poor"
      2: "Poor"
      3: "Acceptable"
      4: "Good"
      5: "Excellent"
 
  # Free-text notes
  - annotation_type: text
    name: notes
    description: "Any additional observations about the coding trace"
    label_requirement:
      required: false
 
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

Running Example Projects

Potato includes example projects for coding agent annotation:

bash
# Clone the repository
git clone https://github.com/davidjurgens/potato.git
cd potato
 
# Run the Claude Code trace evaluation example
potato start example/coding_agent_eval/config.yaml -p 8000
 
# Run the SWE-bench evaluation example
potato start example/swe_bench_eval/config.yaml -p 8000
 
# Run the multi-agent comparison example
potato start example/coding_agent_comparison/config.yaml -p 8000

Each example includes sample traces, a complete configuration file, and a README describing the annotation task.

See Also

For implementation details, see the source documentation.