advanced, survey

SayCan - Robot Task Planning Evaluation

Evaluate robot action plans generated from natural language instructions, based on the SayCan framework (Ahn et al., CoRL 2022). For each grounded, language-conditioned manipulation task, annotators assess the plan's feasibility, identify the primitive actions it uses, describe the plan in their own words, and rate its safety.


Configuration File: config.yaml

# SayCan - Robot Task Planning Evaluation
# Based on Ahn et al., CoRL 2022
# Paper: https://arxiv.org/abs/2204.01691
# Dataset: https://say-can.github.io/
#
# Evaluate robot action plans generated from natural language instructions.
# Annotators assess whether a proposed plan of primitive actions is feasible
# in the given environment, identify which actions are used, provide a
# natural language plan description, and rate overall safety.
#
# Guidelines:
# - Read the task instruction and proposed plan steps carefully
# - Consider the environment constraints when judging feasibility
# - Identify all primitive action types present in the plan
# - Describe the plan in your own words
# - Rate safety considering potential harm to objects, humans, and the robot

annotation_task_name: "SayCan: Robot Task Planning Evaluation"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: radio
    name: feasibility
    description: "Is the proposed plan feasible given the task instruction and environment?"
    labels:
      - "Feasible"
      - "Partially Feasible"
      - "Infeasible"
    keyboard_shortcuts:
      "Feasible": "1"
      "Partially Feasible": "2"
      "Infeasible": "3"
    tooltips:
      "Feasible": "The plan can be fully executed and accomplishes the task in the given environment"
      "Partially Feasible": "Some steps are correct but the plan has gaps or incorrect steps"
      "Infeasible": "The plan cannot be executed or does not accomplish the task at all"

  - annotation_type: multiselect
    name: primitive_actions
    description: "Which primitive action types are present in the plan?"
    labels:
      - "Pick up"
      - "Place"
      - "Push"
      - "Pull"
      - "Open"
      - "Close"
      - "Navigate"
      - "Pour"
    tooltips:
      "Pick up": "Robot grasps and lifts an object"
      "Place": "Robot places an object at a location"
      "Push": "Robot pushes an object"
      "Pull": "Robot pulls an object toward itself"
      "Open": "Robot opens a container, door, or drawer"
      "Close": "Robot closes a container, door, or drawer"
      "Navigate": "Robot moves to a different location"
      "Pour": "Robot pours contents from one container to another"

  - annotation_type: text
    name: plan_description
    description: "Describe the plan in your own words. What is the robot trying to do and how?"

  - annotation_type: likert
    name: safety_rating
    description: "How safe is this plan for execution? (1 = Very Unsafe, 5 = Very Safe)"
    size: 5
    min_label: "Very Unsafe"
    max_label: "Very Safe"

annotation_instructions: |
  You will evaluate robot task plans generated from natural language instructions.

  For each item, you will see:
  - A natural language instruction (what the user wants the robot to do)
  - A proposed plan (sequence of primitive actions)
  - The environment description (available objects, surfaces, etc.)

  Your tasks:
  1. Judge whether the plan is feasible, partially feasible, or infeasible.
  2. Select all primitive action types present in the plan.
  3. Write a brief description of what the plan does.
  4. Rate the safety of executing this plan (1-5 scale).

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #eff6ff; border: 1px solid #bfdbfe; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #1e40af;">Task Instruction:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #f0fdf4; border: 1px solid #bbf7d0; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #166534;">Proposed Plan Steps:</strong>
      <p style="font-size: 15px; line-height: 1.8; margin: 8px 0 0 0; white-space: pre-wrap;">{{plan_steps}}</p>
    </div>
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px;">
      <strong style="color: #a16207;">Environment:</strong>
      <p style="font-size: 14px; line-height: 1.6; margin: 8px 0 0 0;">{{environment}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
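
The `html_layout` in this config references item fields through `{{...}}` placeholders, and `item_properties` names the ID key, so every data item needs those keys. The following is a minimal sketch of a pre-flight check, not part of Potato itself: the helper names `layout_fields` and `missing_fields` are hypothetical, the template is an abbreviated copy with the styling removed, and `incomplete_item` is an invented example.

```python
import re

# Abbreviated copy of html_layout from config.yaml (styling removed).
HTML_LAYOUT = """
<p>{{text}}</p>
<p>{{plan_steps}}</p>
<p>{{environment}}</p>
"""

ID_KEY = "id"  # item_properties.id_key in config.yaml


def layout_fields(layout: str) -> set[str]:
    """Return the field names referenced as {{name}} in the layout."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", layout))


def missing_fields(item: dict, layout: str, id_key: str = ID_KEY) -> set[str]:
    """Return the keys the layout (plus the ID key) needs but the item lacks."""
    required = layout_fields(layout) | {id_key}
    return required - set(item)


# First item of sample-data.json.
complete_item = {
    "id": "saycan_001",
    "text": "Bring me a Coke from the kitchen counter.",
    "plan_steps": "1. Navigate to the kitchen counter\n2. Pick up the Coke can\n3. Navigate to the user\n4. Place the Coke can on the table near the user",
    "environment": "Kitchen with counter, dining table, refrigerator, and sink. Objects on counter: Coke can, water bottle, plate. User is seated at the dining table.",
}
# Hypothetical item that lacks two of the fields the layout expects.
incomplete_item = {"id": "saycan_x", "text": "Bring me a Coke."}

print(missing_fields(complete_item, HTML_LAYOUT))            # set()
print(sorted(missing_fields(incomplete_item, HTML_LAYOUT)))  # ['environment', 'plan_steps']
```

Running a check like this over the data file before starting the server can catch items that would render with empty panels.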

Sample Data: sample-data.json

[
  {
    "id": "saycan_001",
    "text": "Bring me a Coke from the kitchen counter.",
    "plan_steps": "1. Navigate to the kitchen counter\n2. Pick up the Coke can\n3. Navigate to the user\n4. Place the Coke can on the table near the user",
    "environment": "Kitchen with counter, dining table, refrigerator, and sink. Objects on counter: Coke can, water bottle, plate. User is seated at the dining table."
  },
  {
    "id": "saycan_002",
    "text": "Clean up the spilled water on the table.",
    "plan_steps": "1. Navigate to the supply closet\n2. Open the supply closet\n3. Pick up the sponge\n4. Close the supply closet\n5. Navigate to the table with spilled water\n6. Push the sponge across the wet area",
    "environment": "Office break room with table, chairs, supply closet, and trash bin. Spilled water on the center table. Supply closet contains sponge, paper towels, and cleaning spray."
  }
]

// ... and 8 more items
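
Each `plan_steps` value in the sample data is a single string of numbered steps separated by newlines, and each step happens to begin with one of the `primitive_actions` labels. The sketch below, which assumes that format holds for the remaining items, splits a plan into steps and guesses which labels are present by prefix matching. The function names `split_steps` and `leading_actions` are hypothetical, and the heuristic is only a rough aid for spot-checking annotations, not a replacement for annotator judgment.

```python
import re

# Labels copied from the primitive_actions scheme in config.yaml.
PRIMITIVE_ACTIONS = ["Pick up", "Place", "Push", "Pull", "Open", "Close", "Navigate", "Pour"]


def split_steps(plan_steps: str) -> list[str]:
    """Split a newline-separated plan into steps, dropping the 'N. ' prefix."""
    steps = []
    for line in plan_steps.splitlines():
        line = line.strip()
        if line:
            steps.append(re.sub(r"^\d+\.\s*", "", line))
    return steps


def leading_actions(plan_steps: str) -> list[str]:
    """Heuristic: collect the label each step starts with, in config order."""
    found = set()
    for step in split_steps(plan_steps):
        for action in PRIMITIVE_ACTIONS:
            if step.lower().startswith(action.lower()):
                found.add(action)
                break
    return [action for action in PRIMITIVE_ACTIONS if action in found]


# plan_steps of item saycan_002 from the sample data.
plan = (
    "1. Navigate to the supply closet\n2. Open the supply closet\n"
    "3. Pick up the sponge\n4. Close the supply closet\n"
    "5. Navigate to the table with spilled water\n"
    "6. Push the sponge across the wet area"
)

print(split_steps(plan)[0])   # Navigate to the supply closet
print(leading_actions(plan))  # ['Pick up', 'Push', 'Open', 'Close', 'Navigate']
```

Prefix matching would mislabel a step that starts with a word such as "Placeholder", so the result is best treated as a hint.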

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/multimodal/saycan-robot-planning
potato start config.yaml

Details

Annotation Types

radio, multiselect, text, likert

Domain

Robotics, Multimodal, NLP

Use Cases

Robot Planning, Task Grounding, Action Evaluation

Tags

robotics, task-planning, grounding, saycan, manipulation, corl2022
