Web Agent Annotation

Review web-browsing agent traces in Potato with filmstrip navigation, SVG overlays (clicks, bounding boxes, mouse paths), and per-step annotation controls.

New in v2.4.0

Evaluating web-browsing AI agents requires seeing exactly what the agent saw, where it clicked, and whether each step made sense. Potato provides a dedicated Web Agent Trace Viewer with two modes: reviewing pre-recorded traces and creating new ones by browsing live websites.

Overview

| Mode | Use When |
| --- | --- |
| Review Mode | You have pre-recorded traces from WebArena, Mind2Web, Anthropic Computer Use, or Potato's own recorder |
| Creation Mode | You want annotators to browse websites and record new interaction traces |

Review Mode

Annotators step through screenshots of an agent's browsing session. SVG overlays render click markers, bounding boxes, mouse paths, and scroll indicators on top of each screenshot.

Configuration

yaml
instance_display:
  fields:
    - key: steps
      type: web_agent_trace
      label: "Agent Browsing Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true
        show_observation: true
        show_element_info: true
        screenshot_max_width: 800
        screenshot_max_height: 600
        filmstrip_size: 80

Data Format

Each instance needs a steps array with per-step data:

json
{
  "id": "trace_001",
  "task_description": "Find and add a blue wool sweater to cart",
  "site": "amazon.com",
  "steps": [
    {
      "step_index": 0,
      "screenshot_url": "screenshots/step_000.png",
      "action_type": "click",
      "element": {
        "tag": "input",
        "text": "Search",
        "bbox": [340, 45, 680, 75]
      },
      "coordinates": {"x": 510, "y": 60},
      "mouse_path": [[200, 300], [350, 200], [510, 60]],
      "thought": "I need to search for blue wool sweaters",
      "observation": "Search box is focused",
      "timestamp": 1.2,
      "viewport": {"width": 1280, "height": 720}
    }
  ]
}
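
Before loading traces into Potato, it can help to check them against the shape shown above. The following is a minimal sketch, not part of Potato: the function name validate_trace is hypothetical, and the rules it enforces (sequential step_index values, a four-number bbox, coordinates inside the viewport) are assumptions inferred from the example rather than documented requirements.

```python
from typing import Any

# Action types listed in the "Supported Action Types" table on this page.
KNOWN_ACTIONS = {"click", "type", "scroll", "hover", "select", "navigate", "wait", "done"}


def validate_trace(instance: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    steps = instance.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["'steps' must be a non-empty list"]

    problems: list[str] = []
    for pos, step in enumerate(steps):
        where = f"step {pos}"
        # Assumption: step_index counts up from 0 in array order.
        if step.get("step_index") != pos:
            problems.append(f"{where}: step_index should be {pos}")
        if step.get("action_type") not in KNOWN_ACTIONS:
            problems.append(f"{where}: unknown action_type {step.get('action_type')!r}")
        if not step.get("screenshot_url"):
            problems.append(f"{where}: missing screenshot_url")

        # Assumption: bbox, when present, holds exactly four numbers.
        bbox = (step.get("element") or {}).get("bbox")
        if bbox is not None and len(bbox) != 4:
            problems.append(f"{where}: bbox needs four numbers")

        # Assumption: coordinates are viewport pixels and must lie inside the viewport.
        coords, viewport = step.get("coordinates"), step.get("viewport")
        if coords and viewport:
            inside_x = 0 <= coords["x"] <= viewport["width"]
            inside_y = 0 <= coords["y"] <= viewport["height"]
            if not (inside_x and inside_y):
                problems.append(f"{where}: coordinates fall outside the viewport")
    return problems
```

Running validate_trace on each parsed line of a JSONL file and printing any non-empty result is usually enough to catch malformed steps before annotators see them.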

Supported Action Types

| Action | Overlay |
| --- | --- |
| click | Red circle with crosshair and pulse animation |
| type | Yellow highlight on target element |
| scroll | Green directional arrow |
| hover | Purple circle |
| select | Blue bounding box |
| navigate | No overlay |
| wait | No overlay |
| done | No overlay |
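
Overlays sit on top of a screenshot that may be displayed smaller than the recorded viewport, so recorded coordinates need rescaling. The sketch below illustrates one way that mapping could work. It assumes coordinates and bbox values are in viewport pixels, that the screenshot is shrunk uniformly to fit screenshot_max_width and screenshot_max_height, and that it is never enlarged. The function names are hypothetical, and Potato's actual rendering logic may differ.

```python
def display_scale(viewport_w: int, viewport_h: int, max_w: int = 800, max_h: int = 600) -> float:
    """Largest uniform scale (capped at 1.0) that fits the viewport inside max_w x max_h."""
    return min(max_w / viewport_w, max_h / viewport_h, 1.0)


def scale_point(x: float, y: float, scale: float) -> tuple[float, float]:
    """Map a viewport-pixel point to displayed-screenshot pixels."""
    return (x * scale, y * scale)


def scale_bbox(bbox: list[float], scale: float) -> list[float]:
    """Scale every bbox value; valid for a uniform scale with a shared origin."""
    return [value * scale for value in bbox]


# Values taken from the data-format example: 1280x720 viewport, click at (510, 60).
scale = display_scale(1280, 720)          # 0.625, limited by the 800 px width
click_on_screen = scale_point(510, 60, scale)
box_on_screen = scale_bbox([340, 45, 680, 75], scale)
```

With the example values the width is the limiting dimension, so the click at (510, 60) lands at (318.75, 37.5) on the displayed screenshot.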

Keyboard Shortcuts

| Key | Action |
| --- | --- |
| ← / → | Previous / Next step |
| 1 | Toggle click marker overlays |
| 2 | Toggle bounding box overlays |
| 3 | Toggle mouse path overlays |
| 4 | Toggle scroll indicators |
| A | Show all overlays |
| N | Hide all overlays |

Per-Step Annotations

Add per_step: true to any annotation scheme to create annotation controls that appear inline with each step:

yaml
annotation_schemes:
  - annotation_type: radio
    name: step_correctness
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: step_notes
    per_step: true
    label: "Notes on this step"

Per-step annotations are stored as {scheme_name}_step_{index} (e.g., step_correctness_step_0).
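
When post-processing results, the naming pattern above can be reversed to regroup answers by step. This is a rough sketch that assumes annotations arrive as a flat mapping from key to value. That layout is an assumption, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Matches "{scheme_name}_step_{index}"; the greedy scheme group keeps names
# that themselves contain "step" (such as step_correctness) intact.
_STEP_KEY = re.compile(r"^(?P<scheme>.+)_step_(?P<index>\d+)$")


def step_key(scheme_name: str, index: int) -> str:
    """Build the storage key for one scheme at one step."""
    return f"{scheme_name}_step_{index}"


def group_by_step(annotations: dict[str, str]) -> dict[int, dict[str, str]]:
    """Regroup flat per-step keys as {step index: {scheme name: value}}."""
    grouped: dict[int, dict[str, str]] = defaultdict(dict)
    for key, value in annotations.items():
        match = _STEP_KEY.match(key)
        if match:  # keys without the suffix are trace-level answers and are skipped
            grouped[int(match["index"])][match["scheme"]] = value
    return dict(grouped)
```

Trace-level schemes such as task_success have no _step_ suffix, so they pass through untouched and can be read from the original mapping.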

Creation Mode

Annotators browse a live website inside the Potato interface, and their interactions are automatically recorded as an annotation-ready trace.

Configuration

yaml
instance_display:
  fields:
    - key: browsing_session
      type: web_agent_recorder
      display_options:
        start_url: "https://www.google.com"
        proxy_mode: auto
        record_mouse_path: true
        record_viewport: true
        screenshot_method: server
        max_steps: 50

Proxy Modes

| Mode | Description |
| --- | --- |
| auto (default) | Detects whether the target site allows iframe embedding and chooses the best mode automatically |
| iframe | Forces iframe proxy; works for ~90% of sites with under 100 ms overhead |
| playwright | Forces server-side Playwright; works for 100% of sites, requires the playwright package |
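
This page does not describe how auto mode detects frameability. As an illustration of the general technique only, sites usually signal that they refuse embedding through the X-Frame-Options header or the frame-ancestors directive of Content-Security-Policy. The sketch below inspects already-fetched response headers. Both function names are hypothetical, and the check is deliberately conservative: any frame-ancestors value other than a bare wildcard is treated as blocking.

```python
def allows_framing(headers: dict[str, str]) -> bool:
    """Best-effort guess at whether a response may be shown in a cross-origin iframe."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # When frame-ancestors is present, browsers let it override X-Frame-Options.
    for directive in lowered.get("content-security-policy", "").split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            return "*" in parts[1:]

    # DENY and SAMEORIGIN both block embedding from a different origin.
    xfo = lowered.get("x-frame-options", "").strip().lower()
    return xfo not in {"deny", "sameorigin"}


def choose_proxy_mode(headers: dict[str, str]) -> str:
    """Hypothetical decision rule: fall back to playwright when framing is refused."""
    return "iframe" if allows_framing(headers) else "playwright"
```

A conservative check costs only speed: a site wrongly judged unframeable is routed to the heavier Playwright path instead of failing to render.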

To enable Playwright mode:

bash
pip install playwright
playwright install chromium

Converting Existing Traces

Use the trace converter CLI to normalize traces from other frameworks into Potato's format:

bash
# Convert from a specific format
python -m potato.trace_converter -i traces.json -f web_agent -o output.jsonl
 
# Auto-detect format
python -m potato.trace_converter -i traces.json --auto-detect -o output.jsonl

Supported input formats: WebArena, VisualWebArena, Mind2Web, Anthropic Computer Use, and raw Potato recordings.
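
After conversion, a quick look at the output can confirm that the traces came through as expected. The sketch below assumes output.jsonl holds one JSON instance per line in the shape shown under Data Format. The helper names are hypothetical and not part of the converter.

```python
import json
from collections import Counter
from pathlib import Path


def load_traces(path: str) -> list[dict]:
    """Read a JSONL file into a list of instances, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def action_counts(traces: list[dict]) -> Counter:
    """Count how often each action_type appears across all steps of all traces."""
    return Counter(
        step.get("action_type")
        for trace in traces
        for step in trace.get("steps", [])
    )
```

Calling action_counts(load_traces("output.jsonl")) gives a rough profile of the converted data; an unexpected None key or an action name missing from the Supported Action Types table suggests the source format was not mapped cleanly.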

Full Example

yaml
task_name: "Web Agent Evaluation"
task_dir: "."
 
data_files:
  - "traces.jsonl"
 
instance_display:
  fields:
    - key: task_description
      type: text
      label: "Task"
    - key: steps
      type: web_agent_trace
      label: "Agent Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true
 
annotation_schemes:
  - annotation_type: radio
    name: task_success
    question: "Did the agent complete the task successfully?"
    labels:
      - name: "Yes"
      - name: "Partially"
      - name: "No"
  - annotation_type: radio
    name: step_correctness
    question: "Was this step correct?"
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: error_description
    question: "Describe any errors in the agent's behavior"
 
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

Running Example Projects

bash
# Review Mode
python potato/flask_server.py start examples/agent-traces/web-agent-review/config.yaml -p 8000
 
# Creation Mode
python potato/flask_server.py start examples/agent-traces/web-agent-creation/config.yaml -p 8000

API Reference

These REST endpoints are available when using Creation Mode:

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/web_agent/start_session | POST | Begin a recording session |
| /api/web_agent/save_step | POST | Save a recorded interaction step |
| /api/web_agent/save_screenshot | POST | Upload a screenshot for a step |
| /api/web_agent/end_session | POST | End session and persist the trace |
| /api/web_agent/proxy/{url} | GET | Proxy an external URL through the server |
| /api/web_agent/check_frameable | GET | Test whether a URL allows iframe embedding |

Further Reading

For implementation details, see the source documentation.