advanced, evaluation

AndroidWorld: Mobile Agent Task Evaluation

Evaluation of autonomous agents performing tasks in dynamic Android environments. Annotators assess task completion, identify interaction types and touch gestures, evaluate UI understanding, and describe mobile-specific issues across diverse Android apps.

Configuration File: config.yaml

# AndroidWorld: Mobile Agent Task Evaluation
# Based on "AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents" (Rawles et al., ICLR 2025)
# Task: Evaluate agent performance on Android mobile tasks across diverse apps

annotation_task_name: "AndroidWorld Mobile Agent Evaluation"
task_dir: "."

# Data configuration
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Display layout with mobile screenshots and touch gesture trajectory
html_layout: |
  <div class="android-container" style="max-width: 900px; margin: 0 auto;">
    <div class="task-section" style="background: #e8f5e9; padding: 15px; border-radius: 8px; margin-bottom: 15px; border-left: 4px solid #388e3c;">
      <h3 style="margin-top: 0; color: #388e3c;">Task Instruction</h3>
      <div style="font-size: 16px; font-weight: bold;">{{text}}</div>
    </div>
    <div class="device-info" style="display: flex; gap: 15px; margin-bottom: 15px;">
      <div style="background: #e3f2fd; padding: 10px 15px; border-radius: 8px; flex: 1;">
        <strong>App:</strong> {{app_name}}
      </div>
      <div style="background: #fff3e0; padding: 10px 15px; border-radius: 8px; flex: 1;">
        <strong>Android Version:</strong> {{android_version}}
      </div>
    </div>
    <div class="screenshot-gallery" style="background: #f5f5f5; padding: 15px; border-radius: 8px; margin-bottom: 15px; border: 2px solid #78909c;">
      <h3 style="margin-top: 0; color: #37474f;">Mobile Screen Sequence</h3>
      <div style="white-space: pre-wrap; font-size: 13px; line-height: 1.8; font-family: monospace; background: #fff; padding: 12px; border-radius: 6px;">{{screenshots}}</div>
    </div>
    <div class="trajectory-section" style="background: #fce4ec; padding: 15px; border-radius: 8px; margin-bottom: 15px; border: 2px solid #c62828;">
      <h3 style="margin-top: 0; color: #c62828;">Agent Trajectory (with Touch Gestures)</h3>
      <div style="white-space: pre-wrap; font-family: monospace; font-size: 13px; line-height: 1.6;">{{trajectory}}</div>
    </div>
    <div class="final-state-section" style="background: #e8eaf6; padding: 15px; border-radius: 8px; border: 2px solid #3f51b5;">
      <h3 style="margin-top: 0; color: #3f51b5;">Final Screen State</h3>
      <div style="white-space: pre-wrap; font-size: 14px;">{{final_state}}</div>
    </div>
  </div>

# Annotation schemes
annotation_schemes:
  # Task completion assessment
  - name: "task_completion"
    description: "Did the agent successfully complete the mobile task?"
    annotation_type: radio
    labels:
      - "Complete — task fully achieved"
      - "Partial — some steps done"
      - "Failed — task not completed"
    keyboard_shortcuts:
      "Complete — task fully achieved": "1"
      "Partial — some steps done": "2"
      "Failed — task not completed": "3"

  # Interaction types used
  - name: "interaction_types"
    description: "Which interaction types did the agent use during the trajectory?"
    annotation_type: multiselect
    labels:
      - "Tap"
      - "Long Press"
      - "Swipe"
      - "Scroll"
      - "Type Text"
      - "Back Button"
      - "Home Button"
      - "App Switch"
      - "Notification Interaction"

  # UI understanding assessment
  - name: "ui_understanding"
    description: "Did the agent correctly understand and navigate the mobile UI?"
    annotation_type: radio
    labels:
      - "Yes — correctly understood mobile UI"
      - "Mostly — minor UI misunderstandings"
      - "Partially — several UI errors"
      - "No — fundamental UI confusion"
    keyboard_shortcuts:
      "Yes — correctly understood mobile UI": "q"
      "Mostly — minor UI misunderstandings": "w"
      "Partially — several UI errors": "e"
      "No — fundamental UI confusion": "r"

  # Mobile-specific issues
  - name: "mobile_issues"
    description: "Describe any mobile-specific issues encountered by the agent."
    annotation_type: text
    required: false
    placeholder: "Describe mobile-specific issues: gesture failures, small tap targets, orientation problems, notification handling errors..."

# User configuration
allow_all_users: true

# Task assignment
instances_per_annotator: 50
annotation_per_instance: 2
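
The layout above pulls item fields in through `{{...}}` placeholders, and `item_properties` names the id and text keys. Below is a minimal Python sketch of a pre-flight check that a data item supplies every field the layout references. It assumes plain string matching and does not use any Potato API. The helper names `layout_fields` and `missing_fields` are hypothetical, the layout string is abbreviated to its placeholder lines, and the example item is deliberately incomplete.

```python
import re

# Placeholder lines as they appear in html_layout (styling omitted).
LAYOUT_SNIPPET = """
<div>{{text}}</div>
<strong>App:</strong> {{app_name}}
<strong>Android Version:</strong> {{android_version}}
<div>{{screenshots}}</div>
<div>{{trajectory}}</div>
<div>{{final_state}}</div>
"""

# Matches {{name}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def layout_fields(layout: str) -> set[str]:
    """Return the set of field names referenced by the layout."""
    return set(PLACEHOLDER.findall(layout))


def missing_fields(item: dict, layout: str, id_key: str = "id") -> set[str]:
    """Return fields the layout (or id_key) needs but the item lacks."""
    required = layout_fields(layout) | {id_key}
    return required - set(item)


# Hypothetical, deliberately incomplete item for illustration.
incomplete_item = {
    "id": "aw_001",
    "text": "Add a new contact",
    "app_name": "Contacts",
    "android_version": "Android 13",
}

print(sorted(missing_fields(incomplete_item, LAYOUT_SNIPPET)))
# prints ['final_state', 'screenshots', 'trajectory']
```

Running a check like this over every item in `sample-data.json` before starting the server could catch items that would render with empty sections.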

Sample Data: sample-data.json

[
  {
    "id": "aw_001",
    "text": "Add a new contact named 'Dr. Emily Watson' with phone number 555-0142 and email emily.watson@hospital.org to the Contacts app.",
    "app_name": "Contacts",
    "android_version": "Android 13",
    "screenshots": "Screen 1: Contacts app main view in portrait orientation. Top bar shows 'Contacts' title with search icon and three-dot menu. A floating action button (blue circle with '+' icon) is in the bottom-right corner. The contact list shows 12 existing contacts sorted alphabetically with circular avatar icons and names.\n\nScreen 2: New contact form is displayed. Fields visible from top to bottom: profile photo placeholder (camera icon in a circle), First name, Last name, Company, Phone (with phone icon), Email (with envelope icon). The keyboard is open at the bottom of the screen. 'First name' field is focused with a blinking cursor.\n\nScreen 3: Contact form partially filled. First name: 'Emily', Last name: 'Watson', Name prefix field added showing 'Dr.' Phone field shows '555-0142' with type label 'Mobile'. Email field shows 'emily.watson@hospital.org' with type label 'Work'. The 'Save' button (checkmark icon) is visible in the top-right corner.\n\nScreen 4: Contact detail view for 'Dr. Emily Watson'. The screen shows the contact card with name at top, phone number 555-0142 with call/message action buttons, and email emily.watson@hospital.org with email action button. A green banner at top briefly shows 'Contact saved'.",
    "trajectory": "Step 1: [TAP] Floating action button ('+' icon) in bottom-right corner\nStep 2: [TAP] Name prefix field, [TYPE] 'Dr.'\nStep 3: [TAP] First name field, [TYPE] 'Emily'\nStep 4: [TAP] Last name field, [TYPE] 'Watson'\nStep 5: [SCROLL] Down to reveal phone field\nStep 6: [TAP] Phone field, [TYPE] '555-0142'\nStep 7: [TAP] Email field, [TYPE] 'emily.watson@hospital.org'\nStep 8: [TAP] Email type label, select 'Work' from dropdown\nStep 9: [TAP] Save button (checkmark icon) in top-right\nStep 10: STOP — Task complete",
    "final_state": "Contact detail screen showing 'Dr. Emily Watson'. Phone: 555-0142 (Mobile). Email: emily.watson@hospital.org (Work). Contact successfully saved and visible in the contacts list. Save confirmation banner displayed."
  },
  {
    "id": "aw_002",
    "text": "Create a calendar event for 'Team Standup' recurring every weekday at 9:00 AM for 15 minutes in the Calendar app.",
    "app_name": "Google Calendar",
    "android_version": "Android 14",
    "screenshots": "Screen 1: Google Calendar app in month view. The current month is displayed with colored dots indicating existing events on various dates. The floating action button ('+' icon) is in the bottom-right corner. The navigation drawer icon (hamburger menu) is in the top-left. The current date is highlighted with a blue circle.\n\nScreen 2: New event creation screen. Title field at top shows 'Team Standup' typed in. Below: date picker showing today's date, start time set to '9:00 AM', end time set to '9:15 AM'. 'Add people' field, 'Add location' field, 'Add notification' option, and 'More options' link are visible. The 'Does not repeat' dropdown is shown below the date.\n\nScreen 3: Recurrence picker dialog is open, overlaying the event creation screen. Options listed: 'Does not repeat', 'Every day', 'Every week', 'Every month', 'Every year', 'Custom...'. The agent has tapped 'Custom...'. A custom recurrence dialog shows: Repeat every 1 week, with checkboxes for days — Mon, Tue, Wed, Thu, Fri are all checked (highlighted in blue). Sat and Sun are unchecked.\n\nScreen 4: Back on the event creation screen. The recurrence field now shows 'Every weekday (Mon-Fri)'. All other fields are populated. The 'Save' button is visible in the top-right corner. The event summary reads: Team Standup, 9:00 AM - 9:15 AM, Every weekday.",
    "trajectory": "Step 1: [TAP] Floating action button ('+') in bottom-right\nStep 2: [TAP] Title field, [TYPE] 'Team Standup'\nStep 3: [TAP] Start time field\nStep 4: [TAP] Hour selector to '9', minute to '00', AM selected\nStep 5: [TAP] End time field\nStep 6: [TAP] Hour selector to '9', minute to '15', AM selected\nStep 7: [TAP] 'Does not repeat' dropdown\nStep 8: [TAP] 'Custom...' option\nStep 9: [TAP] Checkboxes for Mon, Tue, Wed, Thu, Fri\nStep 10: [TAP] 'Done' to confirm recurrence\nStep 11: [TAP] 'Save' button in top-right\nStep 12: STOP — Task complete",
    "final_state": "Google Calendar returns to month view. The newly created 'Team Standup' event appears on all weekdays. Event details: 9:00 AM - 9:15 AM, recurring every weekday (Mon-Fri). No end date set for the recurrence."
  }
]

// ... and 6 more items
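
Each `trajectory` string marks gestures with bracketed tags such as `[TAP]`, `[TYPE]`, and `[SCROLL]`. Below is a minimal Python sketch of counting those tags and mapping them to labels from the `interaction_types` scheme, for example to cross-check an annotator's multiselect answer. The mapping `TAG_TO_LABEL` and the helper names are assumptions made for illustration. The sample data shows only the three tags above, so the `SWIPE` and `LONG PRESS` entries are guesses at how other items might be tagged. The input is the gesture steps (1 to 9) of item `aw_001`.

```python
import re
from collections import Counter

# Hypothetical mapping from trajectory tags to interaction_types labels.
TAG_TO_LABEL = {
    "TAP": "Tap",
    "TYPE": "Type Text",
    "SCROLL": "Scroll",
    "SWIPE": "Swipe",            # assumed tag, not seen in the sample
    "LONG PRESS": "Long Press",  # assumed tag, not seen in the sample
}

# Matches an uppercase tag in square brackets, e.g. [TAP].
TAG = re.compile(r"\[([A-Z][A-Z ]*)\]")


def gesture_counts(trajectory: str) -> Counter:
    """Count how often each bracketed gesture tag occurs."""
    return Counter(TAG.findall(trajectory))


def suggested_labels(trajectory: str) -> list[str]:
    """Return the sorted interaction_types labels implied by the tags."""
    tags = gesture_counts(trajectory)
    return sorted({TAG_TO_LABEL[t] for t in tags if t in TAG_TO_LABEL})


# Steps 1-9 of aw_001, copied from the sample data.
trajectory = (
    "Step 1: [TAP] Floating action button ('+' icon) in bottom-right corner\n"
    "Step 2: [TAP] Name prefix field, [TYPE] 'Dr.'\n"
    "Step 3: [TAP] First name field, [TYPE] 'Emily'\n"
    "Step 4: [TAP] Last name field, [TYPE] 'Watson'\n"
    "Step 5: [SCROLL] Down to reveal phone field\n"
    "Step 6: [TAP] Phone field, [TYPE] '555-0142'\n"
    "Step 7: [TAP] Email field, [TYPE] 'emily.watson@hospital.org'\n"
    "Step 8: [TAP] Email type label, select 'Work' from dropdown\n"
    "Step 9: [TAP] Save button (checkmark icon) in top-right"
)

print(dict(gesture_counts(trajectory)))
# prints {'TAP': 8, 'TYPE': 5, 'SCROLL': 1}
print(suggested_labels(trajectory))
# prints ['Scroll', 'Tap', 'Type Text']
```

A tag-based suggestion like this is only a starting point: annotators still need to read the trajectory, since an action such as the dropdown selection in step 8 carries no tag of its own.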

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/agentic/androidworld-mobile-agent-eval
potato start config.yaml

Details

Annotation Types

radio, multiselect, text

Domain

Mobile Agents, Android Automation, GUI Agents

Use Cases

Agent Evaluation, Mobile Task Assessment, Touch Interaction Analysis

Tags

mobile-agent, android, gui-agent, touch-gesture, app-automation, dynamic-benchmark, trajectory
