# Web Agent Annotation

Source: https://www.potatoannotator.com/docs/features/web-agent-annotation

*New in v2.4.0*

Evaluating web-browsing AI agents requires seeing exactly what the agent saw, where it clicked, and whether each step made sense. Potato provides a dedicated **Web Agent Trace Viewer** with two modes: reviewing pre-recorded traces and creating new ones by browsing live websites.

## Overview

| Mode | Use When |
|------|----------|
| **Review Mode** | You have pre-recorded traces from WebArena, Mind2Web, Anthropic Computer Use, or Potato's own recorder |
| **Creation Mode** | You want annotators to browse websites and record new interaction traces |

## Review Mode

Annotators step through screenshots of an agent's browsing session. SVG overlays render click markers, bounding boxes, mouse paths, and scroll indicators on top of each screenshot.

### Configuration

```yaml
instance_display:
  fields:
    - key: steps
      type: web_agent_trace
      label: "Agent Browsing Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true
        show_observation: true
        show_element_info: true
        screenshot_max_width: 800
        screenshot_max_height: 600
        filmstrip_size: 80
```

### Data Format

Each instance needs a `steps` array with per-step data:

```json
{
  "id": "trace_001",
  "task_description": "Find and add a blue wool sweater to cart",
  "site": "amazon.com",
  "steps": [
    {
      "step_index": 0,
      "screenshot_url": "screenshots/step_000.png",
      "action_type": "click",
      "element": {
        "tag": "input",
        "text": "Search",
        "bbox": [340, 45, 680, 75]
      },
      "coordinates": {"x": 510, "y": 60},
      "mouse_path": [[200, 300], [350, 200], [510, 60]],
      "thought": "I need to search for blue wool sweaters",
      "observation": "Search box is focused",
      "timestamp": 1.2,
      "viewport": {"width": 1280, "height": 720}
    }
  ]
}
```
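
Before loading traces into Review Mode, it can help to sanity-check each instance against the shape shown above. The following is a minimal sketch, not part of Potato: `validate_trace` is a hypothetical helper, and the set of required keys is an assumption, since the example above illustrates the format without stating which fields are mandatory.

```python
# Assumption: these are the keys every step must carry. The documented
# example shows them, but no formal schema is given.
REQUIRED_STEP_KEYS = {"step_index", "screenshot_url", "action_type"}


def validate_trace(instance: dict) -> list:
    """Return a list of human-readable problems found in one trace instance."""
    steps = instance.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["'steps' must be a non-empty list"]

    problems = []
    for position, step in enumerate(steps):
        # Report any required keys that are absent from this step.
        missing = REQUIRED_STEP_KEYS - set(step)
        if missing:
            problems.append(f"step {position}: missing {sorted(missing)}")

        # Assumption: step_index is zero-based and matches array order.
        if step.get("step_index") != position:
            problems.append(
                f"step {position}: step_index is {step.get('step_index')!r}"
            )

        # A bounding box, when present, should hold exactly four numbers.
        bbox = (step.get("element") or {}).get("bbox")
        if bbox is not None and len(bbox) != 4:
            problems.append(f"step {position}: bbox should have 4 numbers")
    return problems
```

An empty list means the instance passed every check; anything else names the step and the problem so the trace can be fixed before annotators see it.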

### Supported Action Types

| Action | Overlay |
|--------|---------|
| `click` | Red circle with crosshair and pulse animation |
| `type` | Yellow highlight on target element |
| `scroll` | Green directional arrow |
| `hover` | Purple circle |
| `select` | Blue bounding box |
| `navigate` | No overlay |
| `wait` | No overlay |
| `done` | No overlay |
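
Overlays are positioned from each step's `coordinates` and `bbox`, while the screenshot itself is capped by `screenshot_max_width` and `screenshot_max_height`. The sketch below shows one way the two could be reconciled. It assumes coordinates are recorded in viewport pixels and that the image is scaled uniformly to fit the cap without upscaling; Potato's actual rendering logic is not documented here and may differ. The function names are hypothetical.

```python
def fit_scale(viewport_w, viewport_h, max_w=800, max_h=600):
    """Uniform scale that fits the viewport inside max_w x max_h, never upscaling."""
    return min(max_w / viewport_w, max_h / viewport_h, 1.0)


def scale_point(x, y, scale):
    """Map a viewport-space point into displayed-image space."""
    return (x * scale, y * scale)


def scale_bbox(bbox, scale):
    """Scale all four bbox numbers by the same factor."""
    return [value * scale for value in bbox]


# Values taken from the example step in the Data Format section.
scale = fit_scale(1280, 720)                  # 0.625
click = scale_point(510, 60, scale)           # (318.75, 37.5)
box = scale_bbox([340, 45, 680, 75], scale)   # [212.5, 28.125, 425.0, 46.875]
```

Because the scale is uniform, the same multiplication is valid whether `bbox` is stored as corner coordinates or as position plus size, which the format above does not specify.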

### Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `←` / `→` | Previous / Next step |
| `1` | Toggle click marker overlays |
| `2` | Toggle bounding box overlays |
| `3` | Toggle mouse path overlays |
| `4` | Toggle scroll indicators |
| `A` | Show all overlays |
| `N` | Hide all overlays |

### Per-Step Annotations

Add `per_step: true` to any annotation scheme to render its controls inline with each step:

```yaml
annotation_schemes:
  - annotation_type: radio
    name: step_correctness
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: step_notes
    per_step: true
    label: "Notes on this step"
```

Per-step annotations are stored as `{scheme_name}_step_{index}` (e.g., `step_correctness_step_0`).
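
When post-processing results, the naming convention above can be built and parsed mechanically. This is a minimal sketch with hypothetical helper names. It assumes the annotations for one instance are available as a flat key-to-value mapping, which is an assumption about the output layout rather than something stated above.

```python
import re

# Matches "{scheme_name}_step_{index}"; the greedy scheme group lets
# scheme names that themselves contain "_step_" parse correctly.
_STEP_KEY = re.compile(r"^(?P<scheme>.+)_step_(?P<index>\d+)$")


def step_key(scheme_name, index):
    """Build the storage key for one scheme at one step."""
    return f"{scheme_name}_step_{index}"


def parse_step_key(key):
    """Return (scheme_name, index), or None if key is not a per-step key."""
    match = _STEP_KEY.match(key)
    if match is None:
        return None
    return match.group("scheme"), int(match.group("index"))


def group_by_step(annotations):
    """Regroup a flat {key: value} mapping into {step_index: {scheme: value}}."""
    grouped = {}
    for key, value in annotations.items():
        parsed = parse_step_key(key)
        if parsed is None:
            continue  # trace-level annotation, not tied to a step
        scheme, index = parsed
        grouped.setdefault(index, {})[scheme] = value
    return grouped
```

For example, `parse_step_key("step_correctness_step_0")` yields `("step_correctness", 0)`, while a key with no `_step_{index}` suffix yields `None`.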

## Creation Mode

Annotators browse a live website inside the Potato interface, and their interactions are automatically recorded as an annotation-ready trace.

### Configuration

```yaml
instance_display:
  fields:
    - key: browsing_session
      type: web_agent_recorder
      display_options:
        start_url: "https://www.google.com"
        proxy_mode: auto
        record_mouse_path: true
        record_viewport: true
        screenshot_method: server
        max_steps: 50
```

### Proxy Modes

| Mode | Description |
|------|-------------|
| `auto` (default) | Detects whether the target site allows iframe embedding and chooses the best mode automatically |
| `iframe` | Forces iframe proxy — works for ~90% of sites with under 100 ms overhead |
| `playwright` | Forces server-side Playwright — works for 100% of sites, requires the `playwright` package |

To enable Playwright mode:

```bash
pip install playwright
playwright install chromium
```
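
To give a sense of what the `auto` mode has to decide, the sketch below inspects the two standard HTTP response headers that sites use to forbid embedding: `X-Frame-Options` and the `frame-ancestors` directive of `Content-Security-Policy`. It illustrates the general technique only. It is not Potato's detection code, the function names are hypothetical, and the check is deliberately conservative in that any `frame-ancestors` value other than `*` is treated as blocking.

```python
def allows_iframe_embedding(headers):
    """Guess from response headers whether a page may be framed by another origin."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # DENY and SAMEORIGIN both prevent embedding from a different origin.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin"):
        return False

    # Look for a frame-ancestors directive inside the CSP header.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            # Conservative: only a wildcard source is treated as permissive.
            return "*" in parts[1:]

    return True


def choose_proxy_mode(headers):
    """Pick 'iframe' when embedding looks allowed, otherwise fall back to 'playwright'."""
    return "iframe" if allows_iframe_embedding(headers) else "playwright"
```

A site that sends `X-Frame-Options: DENY` would therefore fall back to `playwright`, while a site that sends neither header would use `iframe`.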

## Converting Existing Traces

Use the trace converter CLI to normalize traces from other frameworks into Potato's format:

```bash
# Convert from a specific format
python -m potato.trace_converter -i traces.json -f web_agent -o output.jsonl

# Auto-detect format
python -m potato.trace_converter -i traces.json --auto-detect -o output.jsonl
```

Supported input formats: **WebArena**, **VisualWebArena**, **Mind2Web**, **Anthropic Computer Use**, and raw Potato recordings.
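
After a conversion it is worth checking that the output looks plausible before pointing a project at it. The sketch below is a hypothetical helper, not part of the converter. It assumes each line of the output file is one JSON object containing the `steps` array described under Data Format.

```python
import json
from collections import Counter


def summarize_traces(lines):
    """Count traces, steps, and action types from an iterable of JSONL lines."""
    trace_count = 0
    step_count = 0
    actions = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        trace = json.loads(line)
        trace_count += 1
        for step in trace.get("steps", []):
            step_count += 1
            actions[step.get("action_type", "unknown")] += 1
    return {"traces": trace_count, "steps": step_count, "actions": dict(actions)}


# Usage, assuming the converter wrote output.jsonl:
#
#     with open("output.jsonl", encoding="utf-8") as handle:
#         print(summarize_traces(handle))
```

A trace count of zero, or a large number of `unknown` actions, suggests the wrong input format was selected or auto-detected.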

## Full Example

```yaml
task_name: "Web Agent Evaluation"
task_dir: "."

data_files:
  - "traces.jsonl"

instance_display:
  fields:
    - key: task_description
      type: text
      label: "Task"
    - key: steps
      type: web_agent_trace
      label: "Agent Trace"
      display_options:
        show_overlays: true
        show_filmstrip: true
        show_thought: true

annotation_schemes:
  - annotation_type: radio
    name: task_success
    question: "Did the agent complete the task successfully?"
    labels:
      - name: "Yes"
      - name: "Partially"
      - name: "No"
  - annotation_type: radio
    name: step_correctness
    question: "Was this step correct?"
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary
  - annotation_type: text
    name: error_description
    question: "Describe any errors in the agent's behavior"

output_annotation_dir: "output/"
output_annotation_format: "jsonl"
```

## Running Example Projects

```bash
# Review Mode
python potato/flask_server.py start examples/agent-traces/web-agent-review/config.yaml -p 8000

# Creation Mode
python potato/flask_server.py start examples/agent-traces/web-agent-creation/config.yaml -p 8000
```

## API Reference

These REST endpoints are available when using Creation Mode:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/web_agent/start_session` | POST | Begin a recording session |
| `/api/web_agent/save_step` | POST | Save a recorded interaction step |
| `/api/web_agent/save_screenshot` | POST | Upload a screenshot for a step |
| `/api/web_agent/end_session` | POST | End session and persist the trace |
| `/api/web_agent/proxy/{url}` | GET | Proxy an external URL through the server |
| `/api/web_agent/check_frameable` | GET | Test whether a URL allows iframe embedding |

## Further Reading

- [Live Agent Evaluation](/docs/features/live-agent-evaluation) — evaluate agents running in real time
- [Agentic Annotation](/docs/features/agentic-annotation) — trace format converters and display types overview
- [Active Learning](/docs/features/active-learning) — prioritize the most informative traces for annotation

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/master/docs/web_agent_annotation.md).
