Code Review Annotation

Evaluate code quality by reviewing the output of AI coding agents with GitHub PR-style inline diff comments, file-level correctness ratings, and approve/reject verdicts.

New in v2.4.0

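Evaluating code changes generated by AI coding agents takes more than a simple pass/fail judgment. Researchers and engineering teams need to assess code quality at several levels of granularity: an individual line may contain a bug or a style violation, a whole file may be correctly modified or unnecessary, and the overall changeset may solve the problem while introducing technical debt. This is the same workflow human code reviewers follow when reviewing a pull request on GitHub.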

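Potato's code review annotation mode brings the GitHub PR review experience to agent evaluation. Annotators see a unified diff of every file the agent modified. They can click any diff line to leave an inline comment tagged with a category. Each file receives a correctness rating and a quality rating. Annotators then give a final verdict: approve, request changes, or comment only. All of this is captured as structured annotation data that is ready to use for training code quality models.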

Inline Comments

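Annotators click any line in the diff to open the inline comment form. Each comment has a category, an optional severity, and free-text content. Comments are displayed anchored to their specific lines, just like GitHub PR review comments.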

Comment Categories

The default comment categories cover the most common types of code review feedback.

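Category	Description
`bug`	Functional bug -- the code does not behave correctly
`logic`	Logic error -- the approach is flawed even if the syntax is valid
`security`	Security vulnerability or unsafe practice
`performance`	Performance issue -- unnecessary computation, memory leaks, etc.
`style`	Style violation -- naming, formatting, idiomatic usage
`suggestion`	A better alternative approach
`question`	Clarification needed -- the reviewer is unsure of the intent
`praise`	Positive feedback -- something the agent did well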

Configuration

yaml

annotation_schemes:
  - name: inline_comments
    annotation_type: code_review_comments
    description: "Click any diff line to add an inline comment"
 
    inline_comments:
      # Comment categories
      categories:
        - value: bug
          display: "Bug"
          color: "#ef4444"
          icon: "bug"
        - value: logic
          display: "Logic Error"
          color: "#f97316"
          icon: "alert-triangle"
        - value: security
          display: "Security"
          color: "#dc2626"
          icon: "shield-alert"
        - value: performance
          display: "Performance"
          color: "#eab308"
          icon: "zap"
        - value: style
          display: "Style"
          color: "#6b7280"
          icon: "palette"
        - value: suggestion
          display: "Suggestion"
          color: "#3b82f6"
          icon: "lightbulb"
        - value: question
          display: "Question"
          color: "#8b5cf6"
          icon: "help-circle"
        - value: praise
          display: "Praise"
          color: "#22c55e"
          icon: "thumbs-up"
 
      # Severity levels (optional)
      severity:
        enabled: true
        levels:
          - value: critical
            display: "Critical"
          - value: major
            display: "Major"
          - value: minor
            display: "Minor"
          - value: nit
            display: "Nit"
 
      # Behavior
      require_category: true
      require_severity: false
      allow_multi_line: true       # comments can span a range of lines
      allow_suggestions: true      # annotator can write suggested replacement code
      min_comments: 0              # minimum comments required before submission
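
The behavior flags above can be read as validation rules applied to each comment. The following is a minimal Python sketch of that reading, not Potato's actual implementation: `CONFIG` and `validate_comment` are hypothetical names, and the comment field names (`category`, `severity`, `line_start`, `line_end`, `suggestion`) are taken from the output records shown in this document.

```python
from typing import Any

# Hypothetical in-memory mirror of the YAML settings; not Potato's internals.
CONFIG: dict[str, Any] = {
    "categories": {"bug", "logic", "security", "performance",
                   "style", "suggestion", "question", "praise"},
    "severity_levels": {"critical", "major", "minor", "nit"},
    "require_category": True,
    "require_severity": False,
    "allow_multi_line": True,
    "allow_suggestions": True,
}


def validate_comment(comment: dict[str, Any],
                     config: dict[str, Any] = CONFIG) -> list[str]:
    """Return a list of problems; an empty list means the comment passes."""
    problems = []

    category = comment.get("category")
    if category is None:
        if config["require_category"]:
            problems.append("category is required")
    elif category not in config["categories"]:
        problems.append(f"unknown category: {category}")

    severity = comment.get("severity")
    if severity is None:
        if config["require_severity"]:
            problems.append("severity is required")
    elif severity not in config["severity_levels"]:
        problems.append(f"unknown severity: {severity}")

    # A comment spans several lines when its end line differs from its start.
    start = comment.get("line_start")
    end = comment.get("line_end", start)
    if end != start and not config["allow_multi_line"]:
        problems.append("multi-line comments are disabled")

    if "suggestion" in comment and not config["allow_suggestions"]:
        problems.append("suggestions are disabled")

    return problems
```

With the default settings, a comment that has a category but no severity passes, because `require_severity` is `false`.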

Suggested Code Changes

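When allow_suggestions is enabled, annotators can write replacement code for the block of code they are commenting on. This mirrors GitHub's "suggestion" feature. Suggestions are displayed in a code block below the comment and can be used for training code-fix models.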

yaml

# In inline comment output:
{
  "file": "src/parser.py",
  "line_start": 42,
  "line_end": 44,
  "category": "bug",
  "severity": "critical",
  "comment": "Off-by-one error: range should be inclusive of end",
  "suggestion": "for i in range(start, end + 1):\n    process(tokens[i])"
}
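
To make the relationship between `line_start`, `line_end`, and `suggestion` concrete, here is a small Python sketch that applies a suggestion to a file's text. It assumes line numbers are 1-indexed, inclusive, and refer to the version of the file the comment was placed on; the source does not state this, and `apply_suggestion` is a hypothetical helper rather than part of Potato.

```python
def apply_suggestion(source: str, comment: dict) -> str:
    """Replace lines line_start..line_end of `source` with the suggestion text.

    Assumes 1-indexed, inclusive line numbers (an assumption, not documented).
    """
    lines = source.split("\n")
    start, end = comment["line_start"], comment["line_end"]
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(f"line range {start}-{end} is outside the file")
    replacement = comment["suggestion"].split("\n")
    # Keep everything before the range, swap in the suggestion, keep the rest.
    return "\n".join(lines[: start - 1] + replacement + lines[end:])


original = "a\nb\nc\nd"
patched = apply_suggestion(
    original,
    {"line_start": 2, "line_end": 3, "suggestion": "x\ny\nz"},
)
```

Pairs of `original` and `patched` text built this way are one possible input format for a code-fix model.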

File-Level Ratings

Each file the agent modified receives two independent ratings: correctness and code quality.

Configuration

yaml

annotation_schemes:
  - name: file_ratings
    annotation_type: code_review_file_ratings
    description: "Rate each modified file"
 
    file_ratings:
      dimensions:
        - name: correctness
          display: "Correctness"
          description: "Are the changes to this file functionally correct?"
          scale:
            min: 1
            max: 5
            labels:
              1: "Broken -- introduces bugs or breaks existing functionality"
              2: "Mostly broken -- significant functional issues"
              3: "Partially correct -- works but has edge cases or minor bugs"
              4: "Mostly correct -- minor issues only"
              5: "Fully correct -- changes work as intended"
 
        - name: quality
          display: "Code Quality"
          description: "How well-written are the changes to this file?"
          scale:
            min: 1
            max: 5
            labels:
              1: "Very poor -- unreadable, no structure"
              2: "Poor -- hard to follow, inconsistent style"
              3: "Acceptable -- works but could be cleaner"
              4: "Good -- clean, idiomatic, well-structured"
              5: "Excellent -- exemplary code, would merge as-is"
 
      # Files to rate
      include_unchanged: false     # only rate files the agent modified
      include_new_files: true      # include files the agent created
      include_deleted_files: true  # include files the agent deleted
 
      # Behavior
      require_all_files: true      # must rate every modified file

Output Format

json

{
  "file_ratings": {
    "src/parser.py": {
      "correctness": 4,
      "quality": 3
    },
    "tests/test_parser.py": {
      "correctness": 5,
      "quality": 4
    },
    "src/utils.py": {
      "correctness": 2,
      "quality": 2
    }
  }
}
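
As an illustration of how this output might be consumed, the sketch below averages each rating dimension across files and picks out the file with the lowest correctness score. The function name `summarize_file_ratings` is hypothetical; only the `file_ratings` structure comes from the output above.

```python
def summarize_file_ratings(
    file_ratings: dict[str, dict[str, int]],
) -> dict[str, float]:
    """Average each rating dimension across all rated files."""
    scores_by_dimension: dict[str, list[int]] = {}
    for ratings in file_ratings.values():
        for dimension, score in ratings.items():
            scores_by_dimension.setdefault(dimension, []).append(score)
    return {
        dimension: sum(scores) / len(scores)
        for dimension, scores in scores_by_dimension.items()
    }


file_ratings = {
    "src/parser.py": {"correctness": 4, "quality": 3},
    "tests/test_parser.py": {"correctness": 5, "quality": 4},
    "src/utils.py": {"correctness": 2, "quality": 2},
}

averages = summarize_file_ratings(file_ratings)
# The file most in need of attention, by correctness score.
weakest = min(file_ratings, key=lambda path: file_ratings[path]["correctness"])
```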

Overall Verdict

After reviewing all files and leaving inline comments, annotators give an overall verdict on the entire changeset.

Configuration

yaml

annotation_schemes:
  - name: verdict
    annotation_type: code_review_verdict
    description: "Give an overall verdict on the code changes"
 
    verdict:
      options:
        - value: approve
          display: "Approve"
          description: "Changes are correct and ready to merge"
          color: "#22c55e"
          icon: "check-circle"
        - value: request_changes
          display: "Request Changes"
          description: "Changes need fixes before merging"
          color: "#ef4444"
          icon: "x-circle"
        - value: comment_only
          display: "Comment Only"
          description: "Leaving feedback without a verdict"
          color: "#6b7280"
          icon: "message-circle"
 
      # Optional summary text
      require_summary: true
      summary_placeholder: "Summarize your review..."
      summary_min_length: 20

Configuration Reference

The following is a complete configuration for a code review annotation task.

yaml

task_name: "Coding Agent Code Review"
task_dir: "."
 
data_files:
  - "data/coding_traces.jsonl"
 
item_properties:
  id_key: id
  text_key: task_description
 
agentic:
  enabled: true
  trace_converter: claude_code
  display_type: coding_trace
 
  coding_trace_display:
    diff_style: unified
    diff_context_lines: 5
    syntax_highlight: true
    show_line_numbers: true
    terminal_theme: dark
    file_tree:
      enabled: true
      position: left
      show_operation_icons: true
      click_to_navigate: true
 
annotation_schemes:
  # Inline comments on diff lines
  - name: inline_comments
    annotation_type: code_review_comments
    inline_comments:
      categories:
        - { value: bug, display: "Bug", color: "#ef4444" }
        - { value: logic, display: "Logic Error", color: "#f97316" }
        - { value: security, display: "Security", color: "#dc2626" }
        - { value: performance, display: "Performance", color: "#eab308" }
        - { value: style, display: "Style", color: "#6b7280" }
        - { value: suggestion, display: "Suggestion", color: "#3b82f6" }
        - { value: question, display: "Question", color: "#8b5cf6" }
        - { value: praise, display: "Praise", color: "#22c55e" }
      severity:
        enabled: true
        levels:
          - { value: critical, display: "Critical" }
          - { value: major, display: "Major" }
          - { value: minor, display: "Minor" }
          - { value: nit, display: "Nit" }
      require_category: true
      allow_multi_line: true
      allow_suggestions: true
 
  # File-level correctness and quality
  - name: file_ratings
    annotation_type: code_review_file_ratings
    file_ratings:
      dimensions:
        - name: correctness
          display: "Correctness"
          scale: { min: 1, max: 5 }
        - name: quality
          display: "Code Quality"
          scale: { min: 1, max: 5 }
      require_all_files: true
 
  # Overall verdict
  - name: verdict
    annotation_type: code_review_verdict
    verdict:
      options:
        - { value: approve, display: "Approve", color: "#22c55e" }
        - { value: request_changes, display: "Request Changes", color: "#ef4444" }
        - { value: comment_only, display: "Comment Only", color: "#6b7280" }
      require_summary: true
      summary_min_length: 20
 
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

Annotation Workflow

Here is what annotators see and do when completing a code review annotation task.

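Task overview: The task description is shown at the top, so the annotator can see what the agent was asked to do (e.g., "Fix the failing test in test_parser.py").
File tree navigation: The left sidebar lists every file the agent touched. Files are color-coded: green for new files, yellow for modified files, and red for deleted files.
Diff review: The main panel shows the unified diff for each file. Annotators scroll through the diff and read each change.
Add inline comments: Clicking a line number opens the comment form. Annotators select a category (bug, suggestion, etc.), optionally pick a severity, write the comment, and optionally add a code suggestion.
Rate files: After reviewing each file's diff, annotators rate correctness (1-5) and code quality (1-5) using the rating widget below that file's diff.
Overall verdict: At the bottom, annotators select a verdict (approve, request changes, or comment only) and write a review summary.
Submit: Annotators click "Submit" to save all inline comments, file ratings, and the verdict as a single annotation record.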

Data Format

The full output for a single code review annotation:

json

{
  "id": "trace_042",
  "annotator": "reviewer_01",
  "timestamp": "2025-01-15T14:30:00Z",
  "annotations": {
    "inline_comments": [
      {
        "file": "src/parser.py",
        "line_start": 42,
        "line_end": 42,
        "category": "bug",
        "severity": "critical",
        "comment": "This will throw IndexError when tokens list is empty",
        "suggestion": "if tokens:\n    return tokens[0]\nreturn None"
      },
      {
        "file": "src/parser.py",
        "line_start": 15,
        "line_end": 15,
        "category": "style",
        "severity": "nit",
        "comment": "Variable name 'x' is not descriptive"
      },
      {
        "file": "tests/test_parser.py",
        "line_start": 28,
        "line_end": 30,
        "category": "praise",
        "comment": "Good edge case coverage for empty input"
      }
    ],
    "file_ratings": {
      "src/parser.py": { "correctness": 3, "quality": 2 },
      "tests/test_parser.py": { "correctness": 5, "quality": 4 }
    },
    "verdict": {
      "decision": "request_changes",
      "summary": "The core fix is on the right track but has an edge case bug with empty input. The test coverage is good. Fix the IndexError and clean up variable naming."
    }
  }
}
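
For analysis it is often convenient to flatten a record like this into one row per inline comment. The sketch below shows one way to do that in Python; `flatten_comments` is a hypothetical helper, and it fills `severity` and `suggestion` with `None` when a comment omits them, as the `praise` comment above does.

```python
from collections import Counter


def flatten_comments(record: dict) -> list[dict]:
    """Return one row per inline comment, tagged with trace id and annotator."""
    rows = []
    for comment in record["annotations"].get("inline_comments", []):
        rows.append({
            "id": record["id"],
            "annotator": record["annotator"],
            # Optional fields default to None; the comment's own values win.
            "severity": None,
            "suggestion": None,
            **comment,
        })
    return rows


# Abbreviated version of the record shown above.
record = {
    "id": "trace_042",
    "annotator": "reviewer_01",
    "annotations": {
        "inline_comments": [
            {"file": "src/parser.py", "line_start": 42, "line_end": 42,
             "category": "bug", "severity": "critical",
             "comment": "This will throw IndexError when tokens list is empty"},
            {"file": "tests/test_parser.py", "line_start": 28, "line_end": 30,
             "category": "praise",
             "comment": "Good edge case coverage for empty input"},
        ],
    },
}

rows = flatten_comments(record)
category_counts = Counter(row["category"] for row in rows)
```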

Export

Code review annotations can be exported in several formats.

bash

# Export as structured code review JSON
python -m potato.export \
  -i output/ \
  -f code_review \
  -o results/reviews.jsonl
 
# Export inline comments only (for training code comment models)
python -m potato.export \
  -i output/ \
  -f code_review_comments \
  -o results/comments.jsonl
 
# Export file ratings as a CSV (for analysis)
python -m potato.export \
  -i output/ \
  -f code_review_file_ratings \
  -o results/file_ratings.csv
 
# Export verdict distribution summary
python -m potato.export \
  -i output/ \
  -f code_review_verdicts \
  -o results/verdicts.json

The code_review_comments format is especially useful for training models that generate code review comments or predict the location and category of code issues.
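
The exact layout of each exported file is not specified here. As a rough sketch, assuming a JSONL file in which every line is a record shaped like the one in the Data Format section, a verdict distribution could be computed as follows. `verdict_distribution` is a hypothetical helper and is not part of `potato.export`.

```python
import json
from collections import Counter
from typing import Iterable


def verdict_distribution(jsonl_lines: Iterable[str]) -> dict[str, float]:
    """Return the fraction of records per verdict decision.

    Assumes each non-empty line is a JSON record with
    annotations.verdict.decision, as in the Data Format section.
    """
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        verdict = record.get("annotations", {}).get("verdict")
        if verdict:
            counts[verdict["decision"]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {decision: n / total for decision, n in counts.items()}


sample_lines = [
    '{"id": "t1", "annotations": {"verdict": {"decision": "approve"}}}',
    '{"id": "t2", "annotations": {"verdict": {"decision": "request_changes"}}}',
    '{"id": "t3", "annotations": {"verdict": {"decision": "request_changes"}}}',
    '{"id": "t4", "annotations": {}}',
]
distribution = verdict_distribution(sample_lines)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly once the real record layout has been confirmed.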

See Also

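Coding Agent Annotation -- display coding agent traces with diff rendering and a file tree
Process Reward Annotation -- step-level reward signals for PRM training
Live Coding Agent Observation -- observe and interact with coding agents in real time
Agentic Annotation -- general-purpose agent trace annotation
Export Formats -- all supported export formats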

For implementation details, see the original documentation.