Mind2Web: Web Agent Task Annotation

Annotators review web navigation tasks, highlight the target HTML element for each action step, and label the action type needed to complete the task (click, type, select, scroll, navigate, or hover).

Configuration File: config.yaml

# Mind2Web: Web Agent Task Annotation
# Based on "Mind2Web: Towards a Generalist Agent for the Web" (Deng et al., NeurIPS 2023)
# Task: Annotate web navigation actions - identify target elements and action types

annotation_task_name: "Mind2Web Web Agent Task Annotation"
task_dir: "."

# Data configuration
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Display layout showing task description, URL, action sequence context, and HTML snippet
html_layout: |
  <div class="mind2web-container">
    <div class="task-section" style="background: #e8f5e9; padding: 15px; border-radius: 8px; margin-bottom: 15px;">
      <h3 style="margin-top: 0;">Task Description:</h3>
      <div class="task-text" style="font-size: 16px; font-weight: bold;">{{text}}</div>
    </div>
    <div class="url-section" style="background: #e8eaf6; padding: 10px; border-radius: 8px; margin-bottom: 15px;">
      <strong>Website URL:</strong> {{url}}
    </div>
    <div class="action-section" style="background: #fff3e0; padding: 10px; border-radius: 8px; margin-bottom: 15px;">
      <strong>Action Sequence Context:</strong> {{action_sequence}}
    </div>
    <div class="html-section" style="background: #f5f5f5; padding: 15px; border-radius: 8px; border: 2px solid #424242;">
      <h3 style="margin-top: 0;">HTML Snippet (identify the target element):</h3>
      <pre style="white-space: pre-wrap; font-family: monospace; font-size: 13px; line-height: 1.5; overflow-x: auto;">{{html_snippet}}</pre>
    </div>
  </div>

# Annotation schemes
annotation_schemes:
  # Span annotation for target element identification in HTML
  - name: "target_element"
    description: "Highlight the target HTML element that the agent should interact with to perform the action."
    annotation_type: span
    labels:
      - "Target Element"
      - "Secondary Element"
      - "Context Element"
    label_colors:
      "Target Element": "#f44336"
      "Secondary Element": "#ff9800"
      "Context Element": "#9e9e9e"

  # Action type
  - name: "action_type"
    description: "What type of action should the agent perform on the target element?"
    annotation_type: radio
    labels:
      - "Click"
      - "Type"
      - "Select (dropdown)"
      - "Scroll"
      - "Navigate"
      - "Hover"
    keyboard_shortcuts:
      "Click": "1"
      "Type": "2"
      "Select (dropdown)": "3"
      "Scroll": "4"
      "Navigate": "5"
      "Hover": "6"

  # Action parameter (e.g., text to type, option to select)
  - name: "action_value"
    description: "If the action is Type or Select, what value should be entered or selected? Leave blank for Click, Scroll, Navigate, and Hover."
    annotation_type: text
    required: false
    placeholder: "e.g., 'New York' for a search field, or 'Economy' for a dropdown"

  # Step progress toward the task
  - name: "step_completeness"
    description: "Does this action step make progress toward completing the task?"
    annotation_type: radio
    labels:
      - "Yes - directly advances the task"
      - "Partially - indirect but useful"
      - "No - does not help complete the task"
    keyboard_shortcuts:
      "Yes - directly advances the task": "a"
      "Partially - indirect but useful": "s"
      "No - does not help complete the task": "d"

# User configuration
allow_all_users: true

# Task assignment
instances_per_annotator: 100
annotation_per_instance: 2
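
The `action_value` description in the config ties the value field to the chosen action type. A minimal Python sketch of that rule as a post-hoc consistency check follows. It assumes each annotation has been flattened into a plain `action_type` / `action_value` pair, which is a hypothetical shape and not necessarily how Potato writes its output. Treating Hover as an action with no value is also an assumption.

```python
# Labels copied from the action_type scheme in config.yaml.
VALUE_ACTIONS = {"Type", "Select (dropdown)"}
# Assumption: Hover takes no value, like Click, Scroll, and Navigate.
NO_VALUE_ACTIONS = {"Click", "Scroll", "Navigate", "Hover"}


def check_action_value(action_type, action_value):
    """Return a list of problems; an empty list means the pair is consistent.

    The (action_type, action_value) pair is a hypothetical flattened view of
    one annotation, not a documented Potato output format.
    """
    value = (action_value or "").strip()
    if action_type in VALUE_ACTIONS and not value:
        return [f"{action_type} needs a non-empty action_value"]
    if action_type in NO_VALUE_ACTIONS and value:
        return [f"{action_type} should leave action_value blank"]
    if action_type not in VALUE_ACTIONS | NO_VALUE_ACTIONS:
        return [f"unknown action_type: {action_type!r}"]
    return []


if __name__ == "__main__":
    print(check_action_value("Type", "New York"))
    print(check_action_value("Click", "Economy"))
```

A check like this could be run over exported annotations before aggregation to flag records where the two fields disagree.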

Sample Data: sample-data.json

[
  {
    "id": "m2w_001",
    "text": "Book a one-way flight from New York to Los Angeles for December 15th",
    "html_snippet": "<div class=\"search-form\">\n  <div class=\"trip-type\">\n    <label><input type=\"radio\" name=\"trip\" value=\"roundtrip\" checked> Round Trip</label>\n    <label><input type=\"radio\" name=\"trip\" value=\"oneway\"> One Way</label>\n    <label><input type=\"radio\" name=\"trip\" value=\"multi\"> Multi-City</label>\n  </div>\n  <div class=\"search-fields\">\n    <input type=\"text\" id=\"origin\" placeholder=\"From\" value=\"\">\n    <input type=\"text\" id=\"destination\" placeholder=\"To\" value=\"\">\n    <input type=\"date\" id=\"depart-date\" placeholder=\"Departure Date\">\n    <button class=\"search-btn\">Search Flights</button>\n  </div>\n</div>",
    "url": "https://www.example-airline.com/flights",
    "action_sequence": "Step 1 of 4: Select trip type"
  },
  {
    "id": "m2w_002",
    "text": "Find a hotel in San Francisco with a pool for January 5-8 under $200/night",
    "html_snippet": "<div class=\"hotel-search\">\n  <input type=\"text\" id=\"location\" placeholder=\"Where are you going?\" class=\"location-input\">\n  <div class=\"date-picker\">\n    <input type=\"date\" id=\"checkin\" placeholder=\"Check-in\">\n    <input type=\"date\" id=\"checkout\" placeholder=\"Check-out\">\n  </div>\n  <div class=\"guests\">\n    <select id=\"rooms\">\n      <option value=\"1\">1 Room</option>\n      <option value=\"2\">2 Rooms</option>\n    </select>\n    <select id=\"adults\">\n      <option value=\"1\">1 Adult</option>\n      <option value=\"2\">2 Adults</option>\n    </select>\n  </div>\n  <button id=\"search-hotels\" class=\"btn-primary\">Search Hotels</button>\n</div>",
    "url": "https://www.example-hotels.com/search",
    "action_sequence": "Step 1 of 5: Enter destination"
  }
]

// ... and 8 more items
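
Each item needs to supply every field that `html_layout` references through a `{{...}}` placeholder, plus the `id` key. The sketch below, in Python, extracts the placeholder names from a layout string and reports which ones an item lacks. It assumes that placeholders map one-to-one onto top-level item keys, which the config suggests but does not state. The function names are illustrative and not part of Potato.

```python
import re

# Placeholders as they appear in the html_layout above (markup omitted).
LAYOUT = "{{text}} {{url}} {{action_sequence}} {{html_snippet}}"


def layout_fields(layout):
    """Return the sorted, de-duplicated placeholder names in a layout string."""
    return sorted(set(re.findall(r"\{\{\s*(\w+)\s*\}\}", layout)))


def missing_fields(item, layout, id_key="id"):
    """Return the required keys (id plus placeholders) absent from an item."""
    required = [id_key] + layout_fields(layout)
    return [key for key in required if key not in item]


if __name__ == "__main__":
    item = {
        "id": "m2w_001",
        "text": "Book a one-way flight from New York to Los Angeles for December 15th",
        "url": "https://www.example-airline.com/flights",
        "action_sequence": "Step 1 of 4: Select trip type",
        "html_snippet": "<div class=\"search-form\"></div>",
    }
    print(missing_fields(item, LAYOUT))
```

To check a whole file, the same function could be applied to each element of the list loaded from `sample-data.json` with `json.load`.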

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/agentic/mind2web-web-agent-tasks
potato start config.yaml

Details

Annotation Types

span, text, radio

Domain

Web Agents, HCI, Automation

Use Cases

Web Navigation, Task Grounding, Agent Training
