# MTurk Integration

Deploy annotation tasks on Amazon Mechanical Turk.

This guide provides instructions for deploying Potato annotation tasks on Amazon Mechanical Turk (MTurk).
## Overview
Potato integrates with MTurk through the External Question HIT type:

1. You create an External Question HIT on MTurk pointing to your Potato server
2. Workers click on your HIT and are redirected to your Potato server
3. Potato extracts the worker ID and other parameters from the URL
4. Workers complete the annotation task
5. Upon completion, workers click "Submit HIT to MTurk"
## URL Parameters
MTurk passes four parameters to your External Question URL:
| Parameter | Description |
|---|---|
| `workerId` | Worker's unique MTurk identifier |
| `assignmentId` | Unique ID for this worker-HIT pair |
| `hitId` | The HIT identifier |
| `turkSubmitTo` | URL where the completion form should POST |
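As a sketch of what handling these parameters involves (the `parse_mturk_params` helper below is illustrative, not Potato's actual implementation): MTurk substitutes the sentinel `ASSIGNMENT_ID_NOT_AVAILABLE` for `assignmentId` while a worker is previewing a HIT, so a server can distinguish preview from an accepted assignment.

```python
from urllib.parse import urlparse, parse_qs

# Sentinel value MTurk sends for assignmentId during HIT preview
PREVIEW_SENTINEL = 'ASSIGNMENT_ID_NOT_AVAILABLE'

def parse_mturk_params(url):
    """Extract the four MTurk parameters from an External Question URL.

    Illustrative helper -- Potato does this internally via its login config.
    """
    qs = parse_qs(urlparse(url).query)
    params = {k: qs.get(k, [None])[0]
              for k in ('workerId', 'assignmentId', 'hitId', 'turkSubmitTo')}
    # During preview, MTurk sends the sentinel instead of a real assignment ID
    params['is_preview'] = params['assignmentId'] == PREVIEW_SENTINEL
    return params
```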
## Prerequisites

### Server Requirements

- Publicly accessible server with:
  - An open port (typically 8080 or 443)
  - HTTPS recommended (required by some browsers)
  - A stable internet connection
- Python environment with Potato installed
### MTurk Requirements
- MTurk Requester Account: Sign up at requester.mturk.com
- Funded Account: Add funds for production (sandbox is free)
## Quick Start

### Step 1: Create Your Potato Configuration
```yaml
# mturk_task.yaml
annotation_task_name: "Sentiment Classification"
task_description: "Classify the sentiment of short text snippets."

# MTurk login configuration
login:
  type: url_direct
  url_argument: workerId

# Optional completion code
completion_code: "TASK_COMPLETE"

# Crowdsourcing settings
hide_navbar: true
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3

# Data files
data_files:
  - data/items.json

# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - positive
      - neutral
      - negative
```

### Step 2: Start Your Server
```bash
# Start the server
potato start mturk_task.yaml -p 8080

# Or with HTTPS (recommended)
potato start mturk_task.yaml -p 443 --ssl-cert cert.pem --ssl-key key.pem
```

### Step 3: Create Your HIT on MTurk
Create an External Question HIT using this XML template:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
```

**Important:** Use `&amp;` instead of `&` in the URL, since a bare `&` is invalid in XML.
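Rather than escaping the URL by hand, you can build the payload programmatically so the `&` characters are escaped for you. A minimal sketch (the `external_question_xml` helper is illustrative, not part of Potato or the MTurk SDK):

```python
from xml.sax.saxutils import escape

SCHEMA = ('http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/'
          '2006-07-14/ExternalQuestion.xsd')

def external_question_xml(base_url, frame_height=800):
    """Build an ExternalQuestion payload with XML-safe escaping.

    MTurk substitutes the ${...} placeholders itself, so they are left as-is;
    escape() turns the '&' separators into '&amp;'.
    """
    url = (f'{base_url}/?workerId=${{workerId}}&assignmentId=${{assignmentId}}'
           f'&hitId=${{hitId}}&turkSubmitTo=${{turkSubmitTo}}')
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<ExternalQuestion xmlns="{SCHEMA}">\n'
            f'  <ExternalURL>{escape(url)}</ExternalURL>\n'
            f'  <FrameHeight>{frame_height}</FrameHeight>\n'
            f'</ExternalQuestion>')
```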
## Configuration Reference

### Required Settings

```yaml
login:
  type: url_direct        # Required: enables URL-based authentication
  url_argument: workerId  # Required: MTurk passes the 'workerId' parameter
```

### Recommended Settings
```yaml
hide_navbar: true             # Prevent workers from skipping ahead
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
task_description: "Brief description for the preview page."
completion_code: "YOUR_CODE"
```

## Testing in Sandbox
Always test in the MTurk Sandbox before going to production.
### Sandbox URLs
| Service | URL |
|---|---|
| Requester | https://requestersandbox.mturk.com |
| Worker | https://workersandbox.mturk.com |
| API Endpoint | https://mturk-requester-sandbox.us-east-1.amazonaws.com |
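A small sketch of switching a boto3 client between the two environments (the `mturk_client_kwargs` helper is illustrative). A handy sanity check: the sandbox always reports an account balance of $10,000.00.

```python
SANDBOX_ENDPOINT = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'

def mturk_client_kwargs(sandbox: bool) -> dict:
    """Keyword arguments for boto3.client('mturk', ...).

    Omitting endpoint_url makes boto3 use the production endpoint.
    """
    kwargs = {'region_name': 'us-east-1'}
    if sandbox:
        kwargs['endpoint_url'] = SANDBOX_ENDPOINT
    return kwargs

# Usage (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client('mturk', **mturk_client_kwargs(sandbox=True))
# print(client.get_account_balance()['AvailableBalance'])  # sandbox: '10000.00'
```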
### Local Testing
Test the MTurk URL parameters locally:
```bash
# Test the normal workflow
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"

# Test preview mode
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=TEST_HIT"
```

## MTurk API Integration (Optional)
For advanced features, enable MTurk API integration:
```bash
pip install boto3
```

Create `configs/mturk_config.yaml`:
```yaml
aws_access_key_id: "YOUR_ACCESS_KEY"
aws_secret_access_key: "YOUR_SECRET_KEY"
sandbox: true  # Set to false for production
hit_id: "YOUR_HIT_ID"
```

Enable it in your main config:
```yaml
mturk:
  enabled: true
  config_file_path: configs/mturk_config.yaml
```

### Creating HITs Programmatically
```python
import boto3

# Use the sandbox endpoint while testing; drop endpoint_url for production
mturk = boto3.client(
    'mturk',
    region_name='us-east-1',
    endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com'
)

# Note the &amp; escapes: a bare & is invalid inside XML
question_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>'''

response = mturk.create_hit(
    Title='Sentiment Classification Task',
    Description='Classify the sentiment of short text snippets.',
    Keywords='sentiment, classification, text',
    Reward='0.50',
    MaxAssignments=100,
    LifetimeInSeconds=86400,            # HIT available for 24 hours
    AssignmentDurationInSeconds=3600,   # Workers get 1 hour per assignment
    AutoApprovalDelayInSeconds=604800,  # Auto-approve after 7 days
    Question=question_xml
)
print(f"Created HIT: {response['HIT']['HITId']}")
```

## Best Practices
### Task Design
- Clear Instructions: Provide detailed examples
- Reasonable Time: Don't rush workers
- Fair Pay: At least minimum wage equivalent ($12-15/hour)
- Manageable Length: 5-15 minutes per HIT is ideal
### Quality Control
- Qualification Tests: Screen workers upfront
- Attention Checks: Include verification questions
- Redundancy: Multiple workers per item (3+ recommended)
- Review Samples: Manually check a subset
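Approval and rejection can also be scripted against the MTurk API. A minimal sketch, assuming a boto3 MTurk client and a caller-supplied decision function (`review_assignments` and `approve_fn` are illustrative names, not part of Potato):

```python
def review_assignments(client, hit_id, approve_fn,
                       feedback='Failed attention checks.'):
    """Approve or reject every submitted assignment for a HIT.

    client: a boto3 MTurk client (e.g. boto3.client('mturk', ...)).
    approve_fn: callable taking an assignment dict, returns True to approve.
    Sketch only -- error handling and rate limiting are omitted.
    """
    next_token = None
    while True:
        kwargs = {'HITId': hit_id, 'AssignmentStatuses': ['Submitted']}
        if next_token:
            kwargs['NextToken'] = next_token
        page = client.list_assignments_for_hit(**kwargs)
        for assignment in page['Assignments']:
            if approve_fn(assignment):
                client.approve_assignment(AssignmentId=assignment['AssignmentId'])
            else:
                client.reject_assignment(AssignmentId=assignment['AssignmentId'],
                                         RequesterFeedback=feedback)
        next_token = page.get('NextToken')
        if not next_token:
            break
```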
### Technical
- Handle Edge Cases: Workers may reload or go back
- Save Progress: Autosave if possible
- Graceful Errors: Show helpful error messages
## Troubleshooting

### Workers See the Preview Page After Accepting

- Verify the `assignmentId` parameter is being passed correctly
- The preview page auto-refreshes; ask workers to wait

### Submit Button Doesn't Work

- Check the browser console for errors
- Verify the `turkSubmitTo` parameter is present
- Check for CORS or mixed-content issues

### Workers Can't Log In

- Verify `login.url_argument` is set to `workerId`
- Ensure `login.type` is `url_direct`
## Further Reading
- Crowdsourcing Integration - General crowdsourcing setup
- Quality Control - Attention checks and gold standards
- Task Assignment - Assignment strategies
For implementation details, see the source documentation.