Web Agent Annotation

Review web-browsing agent traces in Potato with filmstrip navigation, SVG overlays (clicks, bounding boxes, mouse paths), and per-step annotation controls.

New in v2.4.0

Evaluating web-browsing AI agents requires seeing exactly what the agent saw, where it clicked, and whether each step made sense. Potato provides a dedicated Web Agent Trace Viewer with two modes: reviewing pre-recorded traces and creating new ones by browsing live websites.

Overview

Mode	Use When
Review Mode	You have pre-recorded traces from WebArena, Mind2Web, Anthropic Computer Use, or Potato's own recorder
Creation Mode	You want annotators to browse websites and record new interaction traces

Review Mode

Annotators step through screenshots of an agent's browsing session. SVG overlays render click markers, bounding boxes, mouse paths, and scroll indicators on top of each screenshot.

Configuration

yaml

instance_display:
  fields:
    - key: steps
      type: web_agent_trace
      label: "Agent Browsing Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true
        show_observation: true
        show_element_info: true
        screenshot_max_width: 800
        screenshot_max_height: 600
        filmstrip_size: 80

Data Format

Each instance needs a `steps` array with per-step data:

json

{
  "id": "trace_001",
  "task_description": "Find and add a blue wool sweater to cart",
  "site": "amazon.com",
  "steps": [
    {
      "step_index": 0,
      "screenshot_url": "screenshots/step_000.png",
      "action_type": "click",
      "element": {
        "tag": "input",
        "text": "Search",
        "bbox": [340, 45, 680, 75]
      },
      "coordinates": {"x": 510, "y": 60},
      "mouse_path": [[200, 300], [350, 200], [510, 60]],
      "thought": "I need to search for blue wool sweaters",
      "observation": "Search box is focused",
      "timestamp": 1.2,
      "viewport": {"width": 1280, "height": 720}
    }
  ]
}
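
Before loading traces into the viewer, it can help to sanity-check them against the shape shown above. The following is a minimal sketch, not part of Potato: `check_trace` is a hypothetical helper, and the rules it applies (zero-based consecutive `step_index`, a non-empty `screenshot_url`, a `bbox` laid out as `[x_min, y_min, x_max, y_max]`, and click coordinates falling inside that box) are assumptions inferred from the example instance rather than documented requirements.

```python
# Action types listed in the "Supported Action Types" table on this page.
KNOWN_ACTIONS = {"click", "type", "scroll", "hover", "select", "navigate", "wait", "done"}


def check_trace(instance):
    """Return a list of human-readable problems found in one trace instance.

    Hypothetical helper: these checks are this sketch's own assumptions,
    not rules that Potato is known to enforce.
    """
    steps = instance.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["instance has no non-empty 'steps' array"]

    problems = []
    for position, step in enumerate(steps):
        prefix = f"step {position}"
        # Assumption: step_index is zero-based and matches the array position.
        if step.get("step_index") != position:
            problems.append(f"{prefix}: step_index is {step.get('step_index')!r}")
        if step.get("action_type") not in KNOWN_ACTIONS:
            problems.append(f"{prefix}: unknown action_type {step.get('action_type')!r}")
        if not step.get("screenshot_url"):
            problems.append(f"{prefix}: missing screenshot_url")
        # Assumption: bbox is [x_min, y_min, x_max, y_max], as the example suggests.
        bbox = (step.get("element") or {}).get("bbox")
        coords = step.get("coordinates")
        if bbox and coords and len(bbox) == 4:
            x_min, y_min, x_max, y_max = bbox
            inside = x_min <= coords["x"] <= x_max and y_min <= coords["y"] <= y_max
            if not inside:
                problems.append(f"{prefix}: coordinates fall outside element bbox")
    return problems
```

An empty list means the instance passed every check; anything else is a list of messages that can be logged or shown to whoever prepared the data.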

Supported Action Types

Action	Overlay
`click`	Red circle with crosshair and pulse animation
`type`	Yellow highlight on target element
`scroll`	Green directional arrow
`hover`	Purple circle
`select`	Blue bounding box
`navigate`	No overlay
`wait`	No overlay
`done`	No overlay

Keyboard Shortcuts

Key	Action
`←` / `→`	Previous / Next step
`1`	Toggle click marker overlays
`2`	Toggle bounding box overlays
`3`	Toggle mouse path overlays
`4`	Toggle scroll indicators
`A`	Show all overlays
`N`	Hide all overlays

Per-Step Annotations

Add `per_step: true` to any annotation scheme to create annotation controls that appear inline with each step:

yaml

annotation_schemes:
  - annotation_type: radio
    name: step_correctness
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: step_notes
    per_step: true
    label: "Notes on this step"

Per-step annotations are stored as `{scheme_name}_step_{index}` (e.g., `step_correctness_step_0`).
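
When post-processing results, the key pattern above can be built and parsed mechanically. This sketch assumes the annotations for one instance are available as a flat dictionary mapping keys to values; that layout, and the helper names `step_key`, `split_step_key`, and `group_by_step`, are illustrative assumptions rather than part of Potato's API.

```python
import re

# Matches "<scheme_name>_step_<index>". The scheme name may itself contain
# "_step_", so the greedy prefix leaves only the final numeric suffix as index.
_STEP_KEY = re.compile(r"^(?P<scheme>.+)_step_(?P<index>\d+)$")


def step_key(scheme_name, index):
    """Build the storage key for one scheme at one step."""
    return f"{scheme_name}_step_{index}"


def split_step_key(key):
    """Return (scheme_name, index) for a per-step key, or None otherwise."""
    match = _STEP_KEY.match(key)
    if match is None:
        return None
    return match.group("scheme"), int(match.group("index"))


def group_by_step(annotations):
    """Regroup a flat {key: value} dict into {step_index: {scheme_name: value}}.

    Keys that do not follow the per-step pattern (trace-level schemes such as
    an overall success rating) are skipped.
    """
    grouped = {}
    for key, value in annotations.items():
        parsed = split_step_key(key)
        if parsed is None:
            continue
        scheme, index = parsed
        grouped.setdefault(index, {})[scheme] = value
    return grouped
```

Grouping by step index makes it straightforward to join annotations back onto the `steps` array of the original trace.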

Creation Mode

Annotators browse a live website inside the Potato interface, and their interactions are automatically recorded as an annotation-ready trace.

Configuration

yaml

instance_display:
  fields:
    - key: browsing_session
      type: web_agent_recorder
      display_options:
        start_url: "https://www.google.com"
        proxy_mode: auto
        record_mouse_path: true
        record_viewport: true
        screenshot_method: server
        max_steps: 50

Proxy Modes

Mode	Description
`auto` (default)	Detects whether the target site allows iframe embedding and chooses the best mode automatically
`iframe`	Forces iframe proxy — works for ~90% of sites with under 100ms overhead
`playwright`	Forces server-side Playwright — works for 100% of sites, requires the `playwright` package

To enable Playwright mode:

bash

pip install playwright
playwright install chromium
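
This page does not describe how `auto` decides whether a site can be embedded. One common heuristic, sketched below, is to inspect the response headers that browsers use to block framing: `Content-Security-Policy` with a `frame-ancestors` directive, and `X-Frame-Options`. The functions `looks_frameable` and `choose_proxy_mode` are hypothetical and are not Potato's implementation; only the returned mode names `iframe` and `playwright` come from the table above.

```python
def looks_frameable(headers):
    """Guess from response headers whether another origin may embed the page.

    Heuristic sketch only. It is deliberately conservative: a frame-ancestors
    directive allows embedding here only when it contains the wildcard "*".
    """
    normalized = {name.lower(): value for name, value in headers.items()}

    # When frame-ancestors is present, browsers enforce it and ignore
    # X-Frame-Options, so check the CSP header first.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            return "*" in parts[1:]

    # Fall back to X-Frame-Options: DENY and SAMEORIGIN both block
    # embedding from a different origin.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    return xfo not in ("deny", "sameorigin")


def choose_proxy_mode(headers):
    """Map the heuristic onto the mode names used in the table above."""
    return "iframe" if looks_frameable(headers) else "playwright"
```

A check like this only predicts what the browser will do with the headers it was given; sites can also block framing with client-side scripts, which is one reason a server-side fallback is useful.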

Converting Existing Traces

Use the trace converter CLI to normalize traces from other frameworks into Potato's format:

bash

# Convert from a specific format
python -m potato.trace_converter -i traces.json -f web_agent -o output.jsonl
 
# Auto-detect format
python -m potato.trace_converter -i traces.json --auto-detect -o output.jsonl

Supported input formats: WebArena, VisualWebArena, Mind2Web, Anthropic Computer Use, and raw Potato recordings.
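
After a conversion, a quick summary of the output can confirm that traces and steps came through. The sketch below assumes the converter writes one JSON object per line, each with the `steps` array described under Data Format; the `.jsonl` extension suggests this, but the exact output layout is not stated here. `summarize_traces` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_traces(lines):
    """Count traces, steps, and action types from an iterable of JSONL lines."""
    trace_count = 0
    step_count = 0
    actions = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        instance = json.loads(line)
        trace_count += 1
        for step in instance.get("steps", []):
            step_count += 1
            actions[step.get("action_type", "unknown")] += 1
    return {"traces": trace_count, "steps": step_count, "actions": dict(actions)}


# Example usage against a converted file (path taken from the commands above):
#
#     with open("output.jsonl", encoding="utf-8") as handle:
#         print(summarize_traces(handle))
```

Because the function accepts any iterable of strings, it works equally well on an open file handle or on a list of lines in a test.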

Full Example

yaml

task_name: "Web Agent Evaluation"
task_dir: "."
 
data_files:
  - "traces.jsonl"
 
instance_display:
  fields:
    - key: task_description
      type: text
      label: "Task"
    - key: steps
      type: web_agent_trace
      label: "Agent Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true
 
annotation_schemes:
  - annotation_type: radio
    name: task_success
    question: "Did the agent complete the task successfully?"
    labels:
      - name: "Yes"
      - name: "Partially"
      - name: "No"
  - annotation_type: radio
    name: step_correctness
    question: "Was this step correct?"
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: error_description
    question: "Describe any errors in the agent's behavior"
 
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

Running Example Projects

bash

# Review Mode
python potato/flask_server.py start examples/agent-traces/web-agent-review/config.yaml -p 8000
 
# Creation Mode
python potato/flask_server.py start examples/agent-traces/web-agent-creation/config.yaml -p 8000

API Reference

These REST endpoints are available when using Creation Mode:

Endpoint	Method	Description
`/api/web_agent/start_session`	POST	Begin a recording session
`/api/web_agent/save_step`	POST	Save a recorded interaction step
`/api/web_agent/save_screenshot`	POST	Upload a screenshot for a step
`/api/web_agent/end_session`	POST	End session and persist the trace
`/api/web_agent/proxy/{url}`	GET	Proxy an external URL through the server
`/api/web_agent/check_frameable`	GET	Test whether a URL allows iframe embedding
