Web Agent Annotation
Review web-browsing agent traces in Potato with filmstrip navigation, SVG overlays (clicks, bounding boxes, mouse paths), and per-step annotation controls.
New in v2.4.0
Evaluating web-browsing AI agents requires seeing exactly what the agent saw, where it clicked, and whether each step made sense. Potato provides a dedicated Web Agent Trace Viewer with two modes: reviewing pre-recorded traces and creating new ones by browsing live websites.
Overview
| Mode | Use When |
|---|---|
| Review Mode | You have pre-recorded traces from WebArena, Mind2Web, Anthropic Computer Use, or Potato's own recorder |
| Creation Mode | You want annotators to browse websites and record new interaction traces |
Review Mode
Annotators step through screenshots of an agent's browsing session. SVG overlays render click markers, bounding boxes, mouse paths, and scroll indicators on top of each screenshot.
Configuration
instance_display:
  fields:
    - key: steps
      type: web_agent_trace
      label: "Agent Browsing Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true
        show_observation: true
        show_element_info: true
        screenshot_max_width: 800
        screenshot_max_height: 600
        filmstrip_size: 80

Data Format
Each instance needs a steps array with per-step data:
{
  "id": "trace_001",
  "task_description": "Find and add a blue wool sweater to cart",
  "site": "amazon.com",
  "steps": [
    {
      "step_index": 0,
      "screenshot_url": "screenshots/step_000.png",
      "action_type": "click",
      "element": {
        "tag": "input",
        "text": "Search",
        "bbox": [340, 45, 680, 75]
      },
      "coordinates": {"x": 510, "y": 60},
      "mouse_path": [[200, 300], [350, 200], [510, 60]],
      "thought": "I need to search for blue wool sweaters",
      "observation": "Search box is focused",
      "timestamp": 1.2,
      "viewport": {"width": 1280, "height": 720}
    }
  ]
}

Supported Action Types
| Action | Overlay |
|---|---|
| click | Red circle with crosshair and pulse animation |
| type | Yellow highlight on target element |
| scroll | Green directional arrow |
| hover | Purple circle |
| select | Blue bounding box |
| navigate | No overlay |
| wait | No overlay |
| done | No overlay |
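
Before loading traces into the viewer, it can help to check them against the data format and the action types listed above. The following is a minimal sketch, not part of Potato itself: the function name `validate_trace` is hypothetical, and treating `screenshot_url` as required and `bbox` as a four-number list are assumptions drawn from the example instance rather than a documented schema.

```python
from typing import Any, Dict, List

# Action types from the table above, split by whether an overlay is drawn.
OVERLAY_ACTIONS = {"click", "type", "scroll", "hover", "select"}
NO_OVERLAY_ACTIONS = {"navigate", "wait", "done"}
SUPPORTED_ACTIONS = OVERLAY_ACTIONS | NO_OVERLAY_ACTIONS


def validate_trace(instance: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    steps = instance.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["instance has no non-empty 'steps' array"]

    problems: List[str] = []
    for i, step in enumerate(steps):
        action = step.get("action_type")
        if action not in SUPPORTED_ACTIONS:
            problems.append(f"step {i}: unsupported action_type {action!r}")
        # Assumption: every step needs a screenshot to be reviewable.
        if "screenshot_url" not in step:
            problems.append(f"step {i}: missing screenshot_url")
        # Assumption: bbox, when present, is a list of four numbers.
        bbox = (step.get("element") or {}).get("bbox")
        if bbox is not None and len(bbox) != 4:
            problems.append(f"step {i}: bbox should have 4 numbers")
    return problems
```

Calling `validate_trace` on each parsed line of a JSONL file and printing any returned problems gives a quick pre-flight check before annotators open the task.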
Keyboard Shortcuts
| Key | Action |
|---|---|
| ← / → | Previous / Next step |
| 1 | Toggle click marker overlays |
| 2 | Toggle bounding box overlays |
| 3 | Toggle mouse path overlays |
| 4 | Toggle scroll indicators |
| A | Show all overlays |
| N | Hide all overlays |
Per-Step Annotations
Add per_step: true to any annotation scheme to create annotation controls that appear inline with each step:
annotation_schemes:
  - annotation_type: radio
    name: step_correctness
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: step_notes
    per_step: true
    label: "Notes on this step"

Per-step annotations are stored as {scheme_name}_step_{index} (e.g., step_correctness_step_0).
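
When post-processing results, the `{scheme_name}_step_{index}` naming can be built and parsed with a few lines of Python. This is a sketch under one assumption: that an annotator's answers for an instance are available as a flat mapping from key to value. The helper names (`step_key`, `parse_step_key`, `group_by_step`) are hypothetical and not part of Potato.

```python
import re
from typing import Dict, Optional, Tuple

# Greedy match so scheme names that themselves contain "_step_" still parse.
_STEP_KEY = re.compile(r"^(?P<scheme>.+)_step_(?P<index>\d+)$")


def step_key(scheme_name: str, index: int) -> str:
    """Build the storage key for a per-step annotation."""
    return f"{scheme_name}_step_{index}"


def parse_step_key(key: str) -> Optional[Tuple[str, int]]:
    """Split a key into (scheme_name, index), or return None for other keys."""
    match = _STEP_KEY.match(key)
    if match is None:
        return None
    return match.group("scheme"), int(match.group("index"))


def group_by_step(annotations: Dict[str, str]) -> Dict[int, Dict[str, str]]:
    """Collect per-step answers by step index, skipping trace-level keys."""
    grouped: Dict[int, Dict[str, str]] = {}
    for key, value in annotations.items():
        parsed = parse_step_key(key)
        if parsed is None:
            continue  # e.g. a trace-level scheme without per_step
        scheme, index = parsed
        grouped.setdefault(index, {})[scheme] = value
    return grouped
```

For example, `parse_step_key("step_correctness_step_0")` yields `("step_correctness", 0)`, while a trace-level key with no `_step_{index}` suffix yields `None`.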
Creation Mode
Annotators browse a live website inside the Potato interface, and their interactions are automatically recorded as an annotation-ready trace.
Configuration
instance_display:
  fields:
    - key: browsing_session
      type: web_agent_recorder
      display_options:
        start_url: "https://www.google.com"
        proxy_mode: auto
        record_mouse_path: true
        record_viewport: true
        screenshot_method: server
        max_steps: 50

Proxy Modes
| Mode | Description |
|---|---|
| auto (default) | Detects whether the target site allows iframe embedding and chooses the best mode automatically |
| iframe | Forces the iframe proxy; works for ~90% of sites with under 100 ms overhead |
| playwright | Forces server-side Playwright; works for 100% of sites but requires the playwright package |
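
Whether a site can be embedded in an iframe is normally signalled by two standard response headers: `X-Frame-Options` and the `frame-ancestors` directive of `Content-Security-Policy`. The sketch below shows how such a check could look. It illustrates the general technique only; it is not Potato's actual detection logic, and both function names are hypothetical.

```python
from typing import Mapping


def allows_iframe_embedding(headers: Mapping[str, str]) -> bool:
    """Guess from response headers whether another origin may frame the page."""
    normalized = {name.lower(): value for name, value in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both block cross-origin framing.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in {"deny", "sameorigin"}:
        return False

    # Content-Security-Policy: look for a frame-ancestors directive.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            # Conservative: only a wildcard is treated as permission,
            # since a host list may not include the annotation server.
            return "*" in parts[1:]

    return True


def choose_proxy_mode(headers: Mapping[str, str]) -> str:
    """Pick 'iframe' when embedding looks allowed, else fall back to 'playwright'."""
    return "iframe" if allows_iframe_embedding(headers) else "playwright"
```

The check is deliberately conservative: a `frame-ancestors` list that names specific hosts is treated as blocking, which errs toward the slower but more broadly compatible Playwright mode.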
To enable Playwright mode:
pip install playwright
playwright install chromium

Converting Existing Traces
Use the trace converter CLI to normalize traces from other frameworks into Potato's format:
# Convert from a specific format
python -m potato.trace_converter -i traces.json -f web_agent -o output.jsonl
# Auto-detect format
python -m potato.trace_converter -i traces.json --auto-detect -o output.jsonl

Supported input formats: WebArena, VisualWebArena, Mind2Web, Anthropic Computer Use, and raw Potato recordings.
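
After a conversion, a quick tally of the output can confirm that traces and steps came through as expected. The sketch below assumes the converter writes one JSON object per line, each with a `steps` array whose entries carry `action_type`, as in the Data Format section. The function name `summarize_traces` is hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_traces(lines: Iterable[str]) -> Dict[str, Any]:
    """Count traces, steps, and action types across JSONL lines."""
    trace_count = 0
    step_count = 0
    actions: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        trace = json.loads(line)
        trace_count += 1
        for step in trace.get("steps", []):
            step_count += 1
            actions[step.get("action_type", "<missing>")] += 1
    return {"traces": trace_count, "steps": step_count, "actions": dict(actions)}
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `summarize_traces(open("output.jsonl", encoding="utf-8"))`.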
Full Example
task_name: "Web Agent Evaluation"
task_dir: "."
data_files:
  - "traces.jsonl"
instance_display:
  fields:
    - key: task_description
      type: text
      label: "Task"
    - key: steps
      type: web_agent_trace
      label: "Agent Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true
annotation_schemes:
  - annotation_type: radio
    name: task_success
    question: "Did the agent complete the task successfully?"
    labels:
      - name: "Yes"
      - name: "Partially"
      - name: "No"
  - annotation_type: radio
    name: step_correctness
    question: "Was this step correct?"
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: error_description
    question: "Describe any errors in the agent's behavior"
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

Running Example Projects
# Review Mode
python potato/flask_server.py start examples/agent-traces/web-agent-review/config.yaml -p 8000
# Creation Mode
python potato/flask_server.py start examples/agent-traces/web-agent-creation/config.yaml -p 8000

API Reference
These REST endpoints are available when using Creation Mode:
| Endpoint | Method | Description |
|---|---|---|
| /api/web_agent/start_session | POST | Begin a recording session |
| /api/web_agent/save_step | POST | Save a recorded interaction step |
| /api/web_agent/save_screenshot | POST | Upload a screenshot for a step |
| /api/web_agent/end_session | POST | End session and persist the trace |
| /api/web_agent/proxy/{url} | GET | Proxy an external URL through the server |
| /api/web_agent/check_frameable | GET | Test whether a URL allows iframe embedding |
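
For scripted testing, requests to these endpoints can be prepared with the Python standard library. The sketch below only builds `urllib.request.Request` objects from the paths and methods in the table. The request bodies are not documented here, so the JSON encoding and any payload fields are assumptions; `build_request` is a hypothetical helper. The screenshot upload and proxy endpoints are left out because their body and URL encoding are not specified.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Method and path pairs copied from the endpoint table above.
ENDPOINTS = {
    "start_session": ("POST", "/api/web_agent/start_session"),
    "save_step": ("POST", "/api/web_agent/save_step"),
    "end_session": ("POST", "/api/web_agent/end_session"),
    "check_frameable": ("GET", "/api/web_agent/check_frameable"),
}


def build_request(
    base_url: str, name: str, payload: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Prepare, but do not send, a request for one of the endpoints."""
    method, path = ENDPOINTS[name]
    url = base_url.rstrip("/") + path
    data = None
    headers: Dict[str, str] = {}
    if method == "POST":
        # Assumption: POST endpoints accept a JSON body.
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

A prepared request can be sent with `urllib.request.urlopen(req)` against a running server, for example one started on port 8000 as in the example projects.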
Further Reading
- Live Agent Evaluation — evaluate agents running in real time
- Agentic Annotation — trace format converters and display types overview
- Active Learning — prioritize the most informative traces for annotation
For implementation details, see the source documentation.