실시간 코딩 에이전트 관찰

코딩 에이전트가 작업하는 모습을 일시정지, 롤백, 분기와 함께 실시간으로 지켜봅니다. 세 가지 백엔드를 지원합니다: 로컬 모델용 Ollama, Anthropic API, Claude Agent SDK.

v2.4.0의 새 기능

정적 트레이스 어노테이션은 에이전트가 무엇을 했는지 알려줍니다. 실시간 관찰은 에이전트가 사람의 안내에 어떻게 반응하는지 알려줍니다. Potato의 실시간 코딩 에이전트 모드는 어노테이터가 코딩 에이전트가 실시간으로 작업하는 모습 -- 파일 읽기, 코드 편집, 테스트 실행 -- 을 지켜보고 언제든 개입할 수 있게 해줍니다. 에이전트를 일시정지하고, 새 지시를 보내고, 이전 체크포인트로 롤백하거나, 트래젝토리를 분기하여 대안적 접근 방식을 탐색하세요.

이는 정적 트레이스만으로 얻는 것보다 풍부한 어노테이션 데이터를 생성합니다. 타임스탬프가 포함된 전체 트래젝토리, 어노테이터의 개입, 분기 결정 지점, 대안 경로의 비교 데이터를 얻습니다. 이 데이터는 프로세스 보상 모델, 선호 모델, 지시 따르기 평가기를 학습하는 데 직접적으로 유용합니다.

요구 사항

Python 3.10+
Git (체크포인트 시스템은 git 커밋을 사용합니다)
다음 에이전트 백엔드 중 하나:
- 로컬 모델 추론을 위한 Ollama (API 키 불필요)
- Anthropic API 접근을 위한 ANTHROPIC_API_KEY
- 완전한 Claude Code 에이전트 경험을 위한 Claude Agent SDK

백엔드

Potato는 코딩 에이전트를 실행하기 위한 세 가지 백엔드를 지원합니다. 각 백엔드는 에이전트를 서브프로세스에서 실행하고 그 액션을 어노테이션 인터페이스로 실시간 스트리밍합니다.

1. Ollama (로컬 모델)

API 키 없이 로컬에서 코딩 에이전트를 실행하세요. Ollama는 오픈 가중치 모델에 빠른 추론을 제공합니다. 개발, 테스트, 그리고 데이터가 로컬 머신을 벗어날 수 없는 상황에 가장 적합합니다.

설정:

bash

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
 
# Pull a coding-capable model
ollama pull qwen2.5-coder:7b
 
# Or a larger model for better performance
ollama pull deepseek-coder-v2:16b

구성:

yaml

agentic:
  enabled: true
  display_type: coding_trace
  live_agent:
    enabled: true
    backend: ollama
    model: qwen2.5-coder:7b
 
    ollama:
      host: "http://localhost:11434"    # Ollama server URL
      temperature: 0.2
      num_ctx: 8192                     # context window size
      num_predict: 2048                 # max tokens per response
      keep_alive: "5m"                  # keep model loaded in memory
 
    # Agent capabilities
    tools:
      - read_file
      - edit_file
      - write_file
      - bash
      - glob
      - grep
    max_steps: 50
    step_timeout_seconds: 60

2. Anthropic API

Anthropic API를 통해 Claude 모델을 사용하세요. 도구 사용 기능과 함께 강력한 코딩 성능을 제공합니다. API 키가 필요합니다.

설정:

bash

# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."
 
# Or add to .env file
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env

구성:

yaml

agentic:
  enabled: true
  display_type: coding_trace
  live_agent:
    enabled: true
    backend: anthropic
    model: claude-sonnet-4-20250514
 
    anthropic:
      api_key: ${ANTHROPIC_API_KEY}
      max_tokens: 4096
      temperature: 0.2
      system_prompt: |
        You are a coding assistant working on a software project.
        Read files before editing them. Run tests after making changes.
        Explain your reasoning before each action.
 
    # Agent capabilities
    tools:
      - read_file
      - edit_file
      - write_file
      - bash
      - glob
      - grep
    max_steps: 100
    step_timeout_seconds: 120

3. Claude Agent SDK

Claude Agent SDK는 자동 도구 오케스트레이션, 컨텍스트 관리, 다중 파일 추론을 포함한 완전한 Claude Code 에이전트 경험을 제공합니다. 가장 기능이 풍부한 백엔드이지만 SDK가 설치되어 있어야 합니다.

설정:

bash

# Install the Claude Agent SDK
pip install claude-agent-sdk
 
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."

구성:

yaml

agentic:
  enabled: true
  display_type: coding_trace
  live_agent:
    enabled: true
    backend: claude_agent_sdk
 
    claude_agent_sdk:
      api_key: ${ANTHROPIC_API_KEY}
      model: claude-sonnet-4-20250514
      max_turns: 100
      permission_mode: auto           # auto-approve tool use
      enable_thinking: true           # show extended thinking
 
    max_steps: 100
    step_timeout_seconds: 180

컨트롤

어노테이션 인터페이스는 어노테이터가 에이전트의 동작을 안내할 수 있는 네 가지 컨트롤 동작을 제공합니다.

일시정지 / 재개

단계 사이에서 에이전트를 멈추려면 Pause를 클릭하세요. 에이전트는 현재 단계를 마치고 대기합니다. 어노테이터는 현재 상태를 검토하고, 파일을 살펴보고, 에이전트를 계속 진행시킬지 개입할지 결정할 수 있습니다. 에이전트가 진행하도록 하려면 Resume을 클릭하세요.

yaml

live_agent:
  controls:
    pause_resume:
      enabled: true
      auto_pause_on_error: true      # pause when a command fails
      auto_pause_after_steps: 0      # pause after N steps (0 = disabled)
      keyboard_shortcut: "Space"

지시 보내기

에이전트가 일시정지된 동안, 어노테이터는 에이전트의 방향을 바꾸는 새 지시를 보낼 수 있습니다. 이는 에이전트가 잘못된 경로로 가고 있을 때, 또는 어노테이터가 에이전트가 안내에 어떻게 반응하는지 테스트하고자 할 때 유용합니다.

yaml

live_agent:
  controls:
    send_instructions:
      enabled: true
      placeholder: "Type instructions for the agent..."
      inject_as: system_message      # "system_message" or "user_message"
      keyboard_shortcut: "Enter"
      presets:
        - "Try a different approach"
        - "Read the error message more carefully"
        - "Check the test file for expected behavior"
        - "Revert your last change and try again"

지시는 에이전트의 대화 컨텍스트에 주입됩니다. inject_as 옵션은 지시가 시스템 메시지(권위 있는 지시)로 나타날지 사용자 메시지(대화형 안내)로 나타날지를 제어합니다.

롤백

롤백은 프로젝트를 이전 git 체크포인트로 되돌립니다. 에이전트가 수행하는 모든 파일 변경은 자동으로 커밋되므로, 어노테이터는 타임라인의 이전 단계를 클릭하여 정확히 그 상태로 롤백할 수 있습니다. 에이전트의 대화 컨텍스트도 이에 맞춰 잘립니다.

yaml

live_agent:
  controls:
    rollback:
      enabled: true
      show_checkpoint_diff: true     # show what will be undone
      require_confirmation: true     # "Are you sure?" dialog
      keyboard_shortcut: "Ctrl+Z"

분기 및 재생

분기 및 재생은 롤백과 지시 보내기를 결합합니다. 어노테이터는 체크포인트로 롤백하고 다른 지시를 보내 분기 트래젝토리를 생성합니다. 이는 선호 데이터를 수집하는 데 강력합니다: 동일한 시작점에서 두 가지 다른 접근 방식을 탐색하고 결과를 비교할 수 있습니다.

yaml

live_agent:
  controls:
    branch:
      enabled: true
      max_branches: 5                # maximum branches from any checkpoint
      branch_naming: auto            # "auto" or "manual"
      compare_view: true             # side-by-side branch comparison
      keyboard_shortcut: "Ctrl+B"

분기 비교 보기는 두 분기를 나란히 보여주며, 어디서 갈라지는지 강조합니다. 어노테이터는 어느 분기가 더 나은 결과를 냈는지 평가하여 DPO 학습용 선호 쌍을 생성할 수 있습니다.

Git 체크포인트 시스템

실시간 에이전트 모드는 git을 사용하여 모든 파일 변경을 추적합니다. 이는 신뢰할 수 있는 롤백, 분기, 전체 변경 기록을 제공합니다.

작동 방식

에이전트가 시작되기 전에, Potato는 potato-session-{session_id}라는 새 git 브랜치를 생성합니다
각 파일 변경(편집, 쓰기, 생성, 삭제) 후, Potato는 설명적인 메시지와 함께 자동으로 커밋합니다
각 커밋은 타임라인에 나타나는 체크포인트로 태그됩니다
롤백은 git checkout을 사용하여 작업 디렉터리를 임의의 체크포인트로 복원합니다
분기는 체크포인트 커밋에서 새 git 브랜치를 생성합니다

구성

yaml

live_agent:
  git_checkpoints:
    enabled: true
    branch_prefix: "potato-session"
    commit_message_format: "Step {step}: {tool} {file_path}"
    auto_commit: true
    cleanup_on_complete: false       # delete session branches when done
    require_clean_working_dir: true  # fail if there are uncommitted changes

수동 체크포인트 관리

bash

# List all Potato session branches
git branch | grep potato-session
 
# View checkpoints for a session
git log potato-session-abc123 --oneline
 
# Clean up old session branches
python -m potato.cleanup_sessions --older-than 7d

데이터 형식

실시간 코딩 에이전트 작업의 입력 데이터는 작업 설명과 선택적으로 시작 파일 또는 디렉터리를 지정합니다:

json

{
  "id": "task_001",
  "task_description": "Fix the bug in src/parser.py where empty input causes a crash",
  "project_dir": "/path/to/project",
  "start_file": "src/parser.py",
  "test_command": "python -m pytest tests/test_parser.py -v",
  "context_files": [
    "src/parser.py",
    "tests/test_parser.py"
  ]
}

필드	필수	설명
`id`	예	고유한 작업 식별자
`task_description`	예	에이전트가 해야 할 일
`project_dir`	예	프로젝트 디렉터리 경로
`start_file`	아니요	에이전트에게 처음 보여줄 파일
`test_command`	아니요	수정을 검증하는 명령
`context_files`	아니요	에이전트의 컨텍스트에 미리 로드할 파일

구성 참조

실시간 코딩 에이전트 관찰 작업의 전체 구성:

yaml

task_name: "Live Coding Agent Observation"
task_dir: "."
 
data_files:
  - "data/coding_tasks.jsonl"
 
item_properties:
  id_key: id
  text_key: task_description
 
agentic:
  enabled: true
  display_type: coding_trace
 
  coding_trace_display:
    diff_style: unified
    diff_context_lines: 3
    syntax_highlight: true
    show_line_numbers: true
    terminal_theme: dark
    file_tree:
      enabled: true
      position: left
      click_to_navigate: true
 
  live_agent:
    enabled: true
    backend: anthropic
    model: claude-sonnet-4-20250514
 
    anthropic:
      api_key: ${ANTHROPIC_API_KEY}
      max_tokens: 4096
      temperature: 0.2
 
    tools:
      - read_file
      - edit_file
      - write_file
      - bash
      - glob
      - grep
 
    max_steps: 100
    step_timeout_seconds: 120
 
    controls:
      pause_resume:
        enabled: true
        auto_pause_on_error: true
        keyboard_shortcut: "Space"
      send_instructions:
        enabled: true
        inject_as: system_message
        presets:
          - "Try a different approach"
          - "Read the error message carefully"
          - "Run the tests first"
      rollback:
        enabled: true
        require_confirmation: true
      branch:
        enabled: true
        max_branches: 5
        compare_view: true
 
    git_checkpoints:
      enabled: true
      branch_prefix: "potato-session"
      auto_commit: true
      cleanup_on_complete: false
 
annotation_schemes:
  # Per-step ratings during observation
  - annotation_type: per_turn_rating
    name: step_quality
    description: "Rate each agent step as you observe it"
    target: agentic_steps
    rating_type: radio
    labels:
      - "Good"
      - "Acceptable"
      - "Unnecessary"
      - "Incorrect"
 
  # Overall task completion after agent finishes
  - annotation_type: radio
    name: task_completion
    description: "Did the agent complete the task?"
    labels:
      - "Fully Complete"
      - "Partially Complete"
      - "Failed"
 
  # Branch comparison (when branching is used)
  - annotation_type: radio
    name: branch_preference
    description: "Which branch produced a better result?"
    labels:
      - "Branch A"
      - "Branch B"
      - "Both Equal"
      - "Both Failed"
 
  # Notes on the observation
  - annotation_type: text
    name: observation_notes
    description: "Describe what you observed and any interventions you made"
    label_requirement:
      required: false
 
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

분기 트래젝토리 내보내기

어노테이터가 분기 및 재생을 사용하면, 출력에 전체 분기 트리가 포함됩니다. 이 형식은 비교 트래젝토리로부터 선호 모델과 프로세스 보상 모델을 학습하도록 설계되었습니다.

json

{
  "id": "task_001",
  "annotator": "observer_01",
  "root_branch": {
    "branch_id": "main",
    "steps": [
      {"step": 0, "type": "file_read", "file": "src/parser.py", "rating": "Good"},
      {"step": 1, "type": "edit", "file": "src/parser.py", "rating": "Incorrect"}
    ],
    "children": [
      {
        "branch_id": "branch_1",
        "branch_point": 1,
        "instruction": "Try a different approach -- use a try/except block instead",
        "steps": [
          {"step": 2, "type": "edit", "file": "src/parser.py", "rating": "Good"},
          {"step": 3, "type": "terminal", "command": "pytest", "rating": "Good"}
        ],
        "outcome": "Fully Complete",
        "children": []
      },
      {
        "branch_id": "branch_2",
        "branch_point": 1,
        "instruction": "Read the test file first to understand expected behavior",
        "steps": [
          {"step": 2, "type": "file_read", "file": "tests/test_parser.py", "rating": "Good"},
          {"step": 3, "type": "edit", "file": "src/parser.py", "rating": "Good"},
          {"step": 4, "type": "terminal", "command": "pytest", "rating": "Good"}
        ],
        "outcome": "Fully Complete",
        "children": []
      }
    ]
  },
  "branch_preference": "Branch B",
  "observation_notes": "Both branches solved the problem, but branch B produced cleaner code by reading the tests first."
}

선호 학습을 위해 분기 트래젝토리를 내보냅니다:

bash

# Export as DPO preference pairs from branch comparisons
python -m potato.export \
  -i output/ \
  -f branching_dpo \
  -o results/branch_preferences.jsonl
 
# Export full trajectory trees
python -m potato.export \
  -i output/ \
  -f trajectory_tree \
  -o results/trajectory_trees.jsonl

보안

실시간 에이전트는 작업 데이터에 지정된 프로젝트 디렉터리에서 실행됩니다. 해당 디렉터리 내의 파일을 읽고, 쓰고, 실행할 수 있는 접근 권한을 가집니다. 다음 보안 관행을 고려하세요:

샌드박싱: 신뢰할 수 없는 코드나 신뢰할 수 없는 에이전트 모델의 경우, Potato를 Docker 컨테이너나 VM 내에서 실행하세요. 에이전트는 임의의 셸 명령을 실행할 수 있으므로 격리가 중요합니다.
읽기 전용 모드: 에이전트가 코드를 수정하지 않고 분석만 하기를 원한다면 bash 및 write_file 도구를 비활성화하세요.
네트워크 제한: Docker의 --network none 플래그를 사용하여 에이전트가 네트워크 요청을 하지 못하도록 하세요.
리소스 제한: 폭주하는 에이전트를 방지하기 위해 max_steps와 step_timeout_seconds를 설정하세요.

yaml

# Restricted tool set for analysis-only tasks
live_agent:
  tools:
    - read_file
    - glob
    - grep
  # No edit_file, write_file, or bash

문제 해결

Ollama가 실행되고 있지 않음

text

Error: Connection refused at http://localhost:11434

Ollama 서버를 시작하세요:

bash

ollama serve

실행 중인지 확인하세요:

bash

ollama list

API 키 누락

text

Error: ANTHROPIC_API_KEY environment variable not set

환경 변수를 설정하세요:

bash

export ANTHROPIC_API_KEY="sk-ant-..."

또는 프로젝트의 .env 파일에 추가하세요. Potato는 .env 파일을 자동으로 로드합니다.

Git이 초기화되지 않음

text

Error: Project directory is not a git repository

체크포인트 시스템에는 git이 필요합니다. 프로젝트 디렉터리에 저장소를 초기화하세요:

bash

cd /path/to/project
git init
git add -A
git commit -m "Initial commit"

에이전트가 루프에 갇힘

에이전트가 같은 액션을 여러 번 반복하면 갇혔을 수 있습니다. Potato는 동일한 인수로 동일한 도구 호출이 3회 반복될 때 루프를 감지하고 에이전트를 자동으로 일시정지합니다. 이 임계값을 구성할 수 있습니다:

yaml

live_agent:
  loop_detection:
    enabled: true
    threshold: 3                     # pause after N identical consecutive steps
    action: pause                    # "pause" or "terminate"

세션 브랜치 정리

시간이 지나면 세션 브랜치가 쌓입니다. 주기적으로 정리하세요:

bash

# Remove branches older than 7 days
python -m potato.cleanup_sessions --older-than 7d
 
# Remove all session branches
python -m potato.cleanup_sessions --all
 
# Dry run (show what would be deleted)
python -m potato.cleanup_sessions --older-than 7d --dry-run

함께 보기

코딩 에이전트 어노테이션 -- 정적 코딩 에이전트 트레이스를 어노테이션합니다
프로세스 보상 어노테이션 -- PRM 학습을 위한 단계별 보상 신호를 수집합니다
코드 리뷰 어노테이션 -- 코드 변경에 대한 GitHub PR 스타일의 인라인 리뷰
에이전트 어노테이션 -- 범용 에이전트 트레이스 어노테이션

구현 세부 사항은 원본 문서를 참조하세요.