코딩 에이전트 어노테이션

diff 렌더링, 터미널 출력, 파일 트리 탐색으로 코딩 에이전트 trace에 어노테이션을 답니다. Claude Code, Aider, SWE-Agent를 비롯한 코딩 어시스턴트에서 가져오기를 지원합니다.

v2.4.0의 새 기능

코딩 에이전트 -- Claude Code, Aider, SWE-Agent, OpenHands 등 -- 는 범용 에이전트 trace와 근본적으로 다른 trace를 생성합니다. 이러한 trace에는 코드 diff, 터미널 출력, 파일 읽기, 디렉터리 탐색, 테스트 결과가 포함됩니다. 이런 trace를 검토하려면 코드 변경의 구조를 이해하고 소프트웨어 엔지니어에게 익숙한 형식으로 제시하는 전용 렌더링이 필요합니다.

Potato의 CodingTraceDisplay는 코딩 에이전트 세션을 위해 전용으로 만들어진 표시 유형입니다. 통합 diff를 빨강/초록 구문 강조 라인으로 렌더링하고, 터미널 출력을 어두운 블록으로 표시하며, 파일 읽기를 줄 번호와 함께 보여 주고, 에이전트가 다룬 모든 파일을 보여 주는 파일 트리 사이드바를 제공합니다. 어노테이터는 파일 사이를 탐색하고, 긴 출력을 펼치거나 접고, 개별 작업이나 trace 전체를 평가할 수 있습니다.

구성

프로젝트 구성에서 코딩 trace 표시를 활성화합니다:

yaml

agentic:
  enabled: true
  trace_converter: claude_code
  display_type: coding_trace
 
  coding_trace_display:
    # Diff rendering
    diff_style: unified          # "unified" or "side_by_side"
    diff_context_lines: 3        # lines of context around changes
    syntax_highlight: true       # language-aware highlighting
    show_line_numbers: true
 
    # Terminal output
    terminal_theme: dark         # "dark" or "light"
    terminal_max_lines: 80       # auto-collapse after this many lines
    show_exit_codes: true
 
    # File reads
    file_read_max_lines: 100     # auto-collapse file reads longer than this
    show_file_path: true
    show_line_range: true        # display "lines 42-87" when partial reads
 
    # File tree sidebar
    file_tree:
      enabled: true
      position: left             # "left" or "right"
      show_operation_icons: true # icons for read/edit/create/delete
      group_by_directory: true
      click_to_navigate: true    # click a file to jump to its operations
 
    # Collapsible sections
    auto_collapse_threshold: 500 # characters before auto-collapsing
    collapse_file_reads: true
    collapse_terminal_output: true

표시 기능

통합 Diff 보기

편집 작업은 빨강/초록 강조가 적용된 통합 diff로 렌더링됩니다. 삭제된 라인은 빨간색 배경과 - 접두사로 나타나고, 추가된 라인은 초록색 배경과 + 접두사로 나타납니다. 컨텍스트 라인은 중립적인 회색으로 표시됩니다. 파일 경로와 줄 범위는 각 diff 블록 위의 헤더 바에 나타납니다.

diff_style: side_by_side가 설정되면 이전 버전과 새 버전이 인접한 열에 나타나, 복잡한 편집에서 무엇이 바뀌었는지 보기가 더 쉬워집니다.

어두운 터미널 블록

Bash와 셸 명령은 고정폭 글꼴의 어두운 터미널 블록으로 렌더링됩니다. 명령 자체는 $ 프롬프트 접두사와 함께 나타나고, 그 출력은 아래에 나타납니다. 종료 코드는 작은 배지에 표시됩니다(0이면 초록, 0이 아니면 빨강). 긴 출력은 "N개 라인 더 보기" 확장기와 함께 자동으로 접힙니다.

줄 번호가 매겨진 파일 읽기

에이전트가 파일을 읽으면, 내용이 밝은 코드 블록에 줄 번호와 함께 표시됩니다. 부분 읽기는 줄 범위(예: "312줄 중 42-87줄")를 보여 줍니다. 구문 강조는 파일 확장자를 기준으로 적용됩니다.

파일 트리 사이드바

파일 트리 사이드바는 trace 동안 에이전트가 다룬 모든 파일을 보여 줍니다. 파일은 디렉터리별로 그룹화되어 알파벳순으로 정렬됩니다. 각 파일에는 수행된 작업을 나타내는 아이콘이 있습니다:

편집된 파일에는 연필 아이콘
읽기 전용 파일에는 눈 아이콘
새로 생성된 파일에는 더하기 아이콘
삭제된 파일에는 휴지통 아이콘
실행된 스크립트에는 터미널 아이콘

트리에서 파일을 클릭하면 메인 패널이 해당 파일과 관련된 첫 작업으로 스크롤됩니다.

접을 수 있는 긴 출력

auto_collapse_threshold를 초과하는 출력 블록은 자동으로 접힙니다. 요약 줄에는 처음과 마지막 몇 줄이 "전체 N개 라인 보기" 버튼과 함께 표시됩니다. 이렇게 하면 개별 작업이 수백 줄의 출력을 생성하더라도 trace를 탐색하기 쉽게 유지됩니다.

Trace 변환기

Potato는 trace 형식을 통합 코딩 trace 표현으로 정규화하는 코딩 에이전트 전용 변환기 네 가지를 함께 제공합니다.

변환기	출처	형식
`claude_code`	Claude Code / Anthropic API	`tool_use` 블록(Read, Edit, Bash, Write 도구)이 있는 Messages API
`aider`	Aider	SEARCH/REPLACE 및 ORIGINAL/UPDATED 편집 블록이 있는 Markdown 채팅 로그
`swe_agent_trajectory`	SWE-Agent	사고/행동/관찰 트리플이 있는 trajectory JSON 파일
`auto`	자동 감지	trace 구조를 검사하여 가장 적합한 변환기를 자동으로 선택

구성에서 변환기를 지정합니다:

yaml

agentic:
  trace_converter: claude_code    # or aider, swe_agent_trajectory, auto

Claude Code 변환기

claude_code 변환기는 도구 사용이 tool_use 및 tool_result 콘텐츠 블록으로 표현되는 Anthropic Messages API의 trace를 처리합니다. 표준 Claude Code 도구를 인식합니다:

Read 도구 호출은 파일 읽기 표시가 됩니다
Edit 도구 호출은 통합 diff가 됩니다
Write 도구 호출은 파일 생성 표시가 됩니다
Bash 도구 호출은 터미널 블록이 됩니다
Glob/Grep 도구 호출은 검색 결과 표시가 됩니다

Aider 변환기

aider 변환기는 Aider의 markdown 기반 채팅 형식을 파싱합니다. SEARCH/REPLACE 블록(및 이전 ORIGINAL/UPDATED 형식)을 추출하여 통합 diff로 변환합니다. 셸 명령과 그 출력은 bash 또는 shell로 표시된 펜스 코드 블록에서 추출됩니다.

SWE-Agent Trajectory 변환기

swe_agent_trajectory 변환기는 SWE-Agent의 trajectory JSON 파일을 읽습니다. 각 trajectory 항목에는 사고(에이전트의 추론), 행동(실행된 명령), 관찰(명령 출력)이 들어 있습니다. 변환기는 행동을 파일 편집, 파일 읽기, 셸 명령, 탐색 작업으로 분류합니다.

CLI 사용법

어노테이션 서버를 시작하기 전에 원시 trace를 변환합니다:

bash

# Convert Claude Code traces
python -m potato.trace_converter \
  -i traces.json \
  -f claude_code \
  -o data/converted.jsonl
 
# Convert Aider chat logs
python -m potato.trace_converter \
  -i aider_chat_history/ \
  -f aider \
  -o data/aider_converted.jsonl
 
# Convert SWE-Agent trajectories
python -m potato.trace_converter \
  -i trajectories/ \
  -f swe_agent_trajectory \
  -o data/swe_converted.jsonl
 
# Auto-detect format
python -m potato.trace_converter \
  -i mixed_traces/ \
  -f auto \
  -o data/auto_converted.jsonl

-i 플래그는 단일 파일이나 디렉터리를 받습니다. 디렉터리를 지정하면 모든 .json 및 .jsonl 파일이 처리됩니다. 변환기는 출력 파일에 한 줄당 하나의 JSON 객체를 씁니다.

추가 옵션:

bash

# Filter by file extension
python -m potato.trace_converter \
  -i traces/ -f claude_code -o data/out.jsonl \
  --include "*.json"
 
# Add metadata fields from a CSV
python -m potato.trace_converter \
  -i traces/ -f claude_code -o data/out.jsonl \
  --metadata metadata.csv --join-key trace_id
 
# Validate output without writing
python -m potato.trace_converter \
  -i traces.json -f claude_code --validate

데이터 형식

변환 후 출력 JSONL 파일의 각 줄은 다음 구조를 따릅니다:

json

{
  "id": "trace_001",
  "task_description": "Fix the failing test in test_parser.py",
  "repository": "myproject",
  "structured_turns": [
    {
      "type": "file_read",
      "tool": "Read",
      "file_path": "src/parser.py",
      "content": "def parse(input_str):\n    tokens = tokenize(input_str)\n    ...",
      "line_start": 1,
      "line_end": 45
    },
    {
      "type": "edit",
      "tool": "Edit",
      "file_path": "src/parser.py",
      "old_content": "    if len(tokens) == 0:\n        return None",
      "new_content": "    if len(tokens) == 0:\n        raise ParseError('Empty input')",
      "line_start": 12,
      "line_end": 13
    },
    {
      "type": "terminal",
      "tool": "Bash",
      "command": "python -m pytest test_parser.py -v",
      "output": "test_parser.py::test_empty_input PASSED\ntest_parser.py::test_valid_input PASSED\n\n2 passed in 0.34s",
      "exit_code": 0
    },
    {
      "type": "file_write",
      "tool": "Write",
      "file_path": "src/parser.py",
      "content": "...",
      "is_new_file": false
    }
  ],
  "metadata": {
    "agent": "claude_code",
    "model": "claude-sonnet-4-20250514",
    "total_tokens": 15234,
    "duration_seconds": 42
  }
}

structured_turns 배열은 작업의 정확한 순서를 보존합니다. 각 턴에는 type 필드(file_read, edit, terminal, file_write, search, thought)와 유형별 필드가 있습니다.

구성 레퍼런스

다음은 코딩 trace 표시와 코딩 에이전트 출력을 평가하기 위한 어노테이션 스킴을 결합한 완전한 구성입니다:

yaml

task_name: "Coding Agent Evaluation"
task_dir: "."
 
data_files:
  - "data/coding_traces.jsonl"
 
item_properties:
  id_key: id
  text_key: task_description
 
agentic:
  enabled: true
  trace_converter: claude_code
  display_type: coding_trace
 
  coding_trace_display:
    diff_style: unified
    diff_context_lines: 3
    syntax_highlight: true
    show_line_numbers: true
    terminal_theme: dark
    terminal_max_lines: 80
    show_exit_codes: true
    file_read_max_lines: 100
    file_tree:
      enabled: true
      position: left
      show_operation_icons: true
      group_by_directory: true
      click_to_navigate: true
    auto_collapse_threshold: 500
 
annotation_schemes:
  # Did the agent complete the task?
  - annotation_type: radio
    name: task_completion
    description: "Did the agent successfully complete the task?"
    labels:
      - "Fully Complete"
      - "Partially Complete"
      - "Failed"
      - "Made Things Worse"
 
  # Per-step correctness
  - annotation_type: per_turn_rating
    name: step_quality
    description: "Rate this step"
    target: agentic_steps
    rating_type: radio
    labels:
      - "Good"
      - "Acceptable"
      - "Unnecessary"
      - "Incorrect"
 
  # Code quality rating
  - annotation_type: likert
    name: code_quality
    description: "Rate the quality of the code changes"
    min: 1
    max: 5
    labels:
      1: "Very Poor"
      2: "Poor"
      3: "Acceptable"
      4: "Good"
      5: "Excellent"
 
  # Free-text notes
  - annotation_type: text
    name: notes
    description: "Any additional observations about the coding trace"
    label_requirement:
      required: false
 
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

예제 프로젝트 실행

Potato에는 코딩 에이전트 어노테이션을 위한 예제 프로젝트가 포함되어 있습니다:

bash

# Clone the repository
git clone https://github.com/davidjurgens/potato.git
cd potato
 
# Run the Claude Code trace evaluation example
potato start example/coding_agent_eval/config.yaml -p 8000
 
# Run the SWE-bench evaluation example
potato start example/swe_bench_eval/config.yaml -p 8000
 
# Run the multi-agent comparison example
potato start example/coding_agent_comparison/config.yaml -p 8000

각 예제에는 샘플 trace, 완전한 구성 파일, 어노테이션 작업을 설명하는 README가 포함되어 있습니다.

함께 보기

프로세스 보상 어노테이션 -- PRM 학습을 위한 단계별 보상 신호 수집
코드 리뷰 어노테이션 -- 코드 변경에 대한 GitHub PR 스타일 인라인 리뷰
라이브 코딩 에이전트 관찰 -- 코딩 에이전트를 실시간으로 관찰하고 상호작용
에이전트 어노테이션 -- 범용 에이전트 trace 어노테이션
내보내기 형식 -- 모델 학습을 위한 어노테이션 데이터 내보내기

구현 세부 정보는 원본 문서를 참조하세요.