

Amazon Mechanical Turk Integration

This guide provides instructions for deploying Potato annotation tasks on Amazon Mechanical Turk (MTurk).

Overview

Potato integrates with MTurk through the External Question HIT type:

  1. You create an External Question HIT on MTurk pointing to your Potato server
  2. Workers click on your HIT and are redirected to your Potato server
  3. Potato extracts the worker ID and other parameters from the URL
  4. Workers complete the annotation task
  5. Upon completion, workers click "Submit HIT to MTurk"

URL Parameters

MTurk passes four parameters to your External Question URL:

Parameter      Description
workerId       The worker's unique MTurk identifier
assignmentId   Unique ID for this worker-HIT pair
hitId          The HIT identifier
turkSubmitTo   Base URL where the completion form should POST
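Potato reads these parameters from the query string when a worker arrives. A minimal sketch of the same parsing with the standard library (not Potato's actual internals); note that MTurk sends the sentinel value `ASSIGNMENT_ID_NOT_AVAILABLE` while a worker is only previewing the HIT:

```python
from urllib.parse import urlparse, parse_qs

# Sentinel MTurk sends for assignmentId during HIT preview
PREVIEW_SENTINEL = "ASSIGNMENT_ID_NOT_AVAILABLE"

def parse_mturk_params(url):
    """Extract the four MTurk parameters from an External Question URL."""
    query = parse_qs(urlparse(url).query)
    params = {k: query.get(k, [None])[0]
              for k in ("workerId", "assignmentId", "hitId", "turkSubmitTo")}
    params["is_preview"] = params["assignmentId"] == PREVIEW_SENTINEL
    return params

url = ("https://your-server.com:8080/?workerId=A1B2C3"
       "&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=H123"
       "&turkSubmitTo=https%3A%2F%2Fworkersandbox.mturk.com")
print(parse_mturk_params(url)["is_preview"])  # True
```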

Prerequisites

Server Requirements

  1. Publicly accessible server with:

    • Open port (typically 8080 or 443)
    • HTTPS strongly recommended (MTurk loads tasks in an HTTPS iframe, so most browsers block plain-HTTP content)
    • Stable internet connection
  2. Python environment with Potato installed

MTurk Requirements

  1. MTurk Requester Account: Sign up at requester.mturk.com
  2. Funded Account: Add funds for production (sandbox is free)

Quick Start

Step 1: Create Your Potato Configuration

yaml
# mturk_task.yaml
annotation_task_name: "Sentiment Classification"
task_description: "Classify the sentiment of short text snippets."
 
# MTurk login configuration
login:
  type: url_direct
  url_argument: workerId
 
# Optional completion code
completion_code: "TASK_COMPLETE"
 
# Crowdsourcing settings
hide_navbar: true
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
 
# Data files
data_files:
  - data/items.json
 
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - positive
      - neutral
      - negative
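The config above points at data/items.json. A sketch of generating such a file, assuming Potato's default per-item keys are "id" and "text" (these are configurable; check your Potato version's data-format documentation) and that items are stored one JSON object per line:

```python
import json
import os

# Hypothetical sample items; "id" and "text" keys are assumed defaults
items = [
    {"id": "item_1", "text": "I loved this movie!"},
    {"id": "item_2", "text": "The service was fine, nothing special."},
    {"id": "item_3", "text": "Worst purchase I have ever made."},
]

os.makedirs("data", exist_ok=True)
with open("data/items.json", "w") as f:
    for item in items:
        f.write(json.dumps(item) + "\n")  # one JSON object per line
```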

Step 2: Start Your Server

bash
# Start the server
potato start mturk_task.yaml -p 8080
 
# Or with HTTPS (recommended)
potato start mturk_task.yaml -p 443 --ssl-cert cert.pem --ssl-key key.pem

Step 3: Create Your HIT on MTurk

Create an External Question HIT using this XML template:

xml
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>

Important: Use &amp; instead of & in XML.
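Hand-escaping the URL is easy to get wrong. A sketch that builds the ExternalQuestion XML programmatically, with `&` escaped automatically, and checks the result is well-formed (the server URL is a placeholder):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SCHEMA = ("http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas"
          "/2006-07-14/ExternalQuestion.xsd")

def build_external_question(url, frame_height=800):
    """Build ExternalQuestion XML, escaping & as &amp; automatically."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<ExternalQuestion xmlns="{SCHEMA}">\n'
        f'  <ExternalURL>{escape(url)}</ExternalURL>\n'
        f'  <FrameHeight>{frame_height}</FrameHeight>\n'
        '</ExternalQuestion>'
    )

xml_str = build_external_question(
    "https://your-server.com:8080/?workerId=${workerId}"
    "&assignmentId=${assignmentId}&hitId=${hitId}&turkSubmitTo=${turkSubmitTo}"
)
ET.fromstring(xml_str)  # raises ParseError if the XML is malformed
```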

Configuration Reference

Required Settings

yaml
login:
  type: url_direct      # Required: enables URL-based authentication
  url_argument: workerId  # Required: MTurk uses 'workerId' parameter
Recommended Settings

yaml
hide_navbar: true           # Prevent workers from skipping
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
task_description: "Brief description for the preview page."
completion_code: "YOUR_CODE"

Testing in Sandbox

Always test in the MTurk Sandbox before going to production.

Sandbox URLs

  • Requester sandbox: https://requestersandbox.mturk.com
  • Worker sandbox: https://workersandbox.mturk.com
  • API endpoint: https://mturk-requester-sandbox.us-east-1.amazonaws.com

Local Testing

Test the MTurk URL parameters locally:

bash
# Test normal workflow
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"
 
# Test preview mode
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=TEST_HIT"

MTurk API Integration (Optional)

For advanced features, enable MTurk API integration:

bash
pip install boto3

Create configs/mturk_config.yaml:

yaml
# Never commit real credentials; keep this file out of version control
aws_access_key_id: "YOUR_ACCESS_KEY"
aws_secret_access_key: "YOUR_SECRET_KEY"
sandbox: true  # Set to false for production
hit_id: "YOUR_HIT_ID"

Enable in your main config:

yaml
mturk:
  enabled: true
  config_file_path: configs/mturk_config.yaml

Creating HITs Programmatically

python
import boto3
 
mturk = boto3.client(
    'mturk',
    region_name='us-east-1',
    endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com'
)
 
question_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>'''
 
response = mturk.create_hit(
    Title='Sentiment Classification Task',
    Description='Classify the sentiment of short text snippets.',
    Keywords='sentiment, classification, text',
    Reward='0.50',
    MaxAssignments=100,
    LifetimeInSeconds=86400,
    AssignmentDurationInSeconds=3600,
    AutoApprovalDelayInSeconds=604800,
    Question=question_xml
)
 
print(f"Created HIT: {response['HIT']['HITId']}")
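Once workers submit, assignments can be fetched with `mturk.list_assignments_for_hit(HITId=...)`; each assignment's `Answer` field is QuestionFormAnswers XML. A sketch of parsing that XML into a dict (the sample below is illustrative, not captured from a real HIT):

```python
import xml.etree.ElementTree as ET

NS = {"qfa": "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas"
             "/2005-10-01/QuestionFormAnswers.xsd"}

def parse_answer_xml(answer_xml):
    """Return {question_identifier: free_text} from QuestionFormAnswers XML."""
    root = ET.fromstring(answer_xml)
    return {
        ans.find("qfa:QuestionIdentifier", NS).text:
            ans.find("qfa:FreeText", NS).text
        for ans in root.findall("qfa:Answer", NS)
    }

sample = '''<?xml version="1.0"?>
<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">
  <Answer>
    <QuestionIdentifier>completionCode</QuestionIdentifier>
    <FreeText>TASK_COMPLETE</FreeText>
  </Answer>
</QuestionFormAnswers>'''
print(parse_answer_xml(sample))  # {'completionCode': 'TASK_COMPLETE'}
```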

Best Practices

Task Design

  1. Clear Instructions: Provide detailed examples
  2. Reasonable Time: Don't rush workers
  3. Fair Pay: At least minimum wage equivalent ($12-15/hour)
  4. Manageable Length: 5-15 minutes per HIT is ideal

Quality Control

  1. Qualification Tests: Screen workers upfront
  2. Attention Checks: Include verification questions
  3. Redundancy: Multiple workers per item (3+ recommended)
  4. Review Samples: Manually check a subset
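The attention-check and redundancy practices above can be automated after collection. A sketch of flagging workers who miss planted attention checks, assuming a simple list of (worker, item, label) records and known gold labels for the check items (both names are hypothetical, not Potato output formats):

```python
# Hypothetical annotation records: (worker_id, item_id, label)
annotations = [
    ("W1", "check_1", "positive"),  # attention check, expected "positive"
    ("W2", "check_1", "negative"),  # W2 fails the check
    ("W1", "item_5", "neutral"),
]

# Gold labels for attention-check items planted in the data
gold = {"check_1": "positive"}

def flag_workers(annotations, gold, max_misses=0):
    """Return workers who miss more than max_misses attention checks."""
    misses = {}
    for worker, item, label in annotations:
        if item in gold and label != gold[item]:
            misses[worker] = misses.get(worker, 0) + 1
    return {w for w, n in misses.items() if n > max_misses}

print(flag_workers(annotations, gold))  # {'W2'}
```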

Technical

  1. Handle Edge Cases: Workers may reload or go back
  2. Save Progress: Autosave if possible
  3. Graceful Errors: Show helpful error messages

Troubleshooting

Workers See Preview Page After Accepting

  • Verify assignmentId parameter is being passed correctly
  • The preview page auto-refreshes; ask workers to wait

Submit Button Doesn't Work

  • Check browser console for errors
  • Verify turkSubmitTo parameter is present
  • Check for CORS or mixed-content issues
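For debugging submission issues, it helps to know the exact endpoint: an External Question form must POST at least the `assignmentId` field to the `/mturk/externalSubmit` path under the `turkSubmitTo` base URL. A small sketch of building that endpoint:

```python
def build_submit_url(turk_submit_to):
    """Build the endpoint an External Question completion form POSTs to."""
    return turk_submit_to.rstrip("/") + "/mturk/externalSubmit"

# Sandbox and production differ only in the host MTurk passes in
print(build_submit_url("https://workersandbox.mturk.com"))
# https://workersandbox.mturk.com/mturk/externalSubmit
```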

Workers Can't Log In

  • Verify login.url_argument is set to workerId
  • Ensure login.type is url_direct

Further Reading

For implementation details, see the source documentation.