# MTurk Integration

Deploy annotation tasks on Amazon Mechanical Turk.

This guide provides instructions for deploying Potato annotation tasks on Amazon Mechanical Turk (MTurk).
## Overview
Potato integrates with MTurk through the External Question HIT type:

1. You create an External Question HIT on MTurk pointing to your Potato server
2. Workers click on your HIT and are redirected to your Potato server
3. Potato extracts the worker ID and other parameters from the URL
4. Workers complete the annotation task
5. Upon completion, workers click "Submit HIT to MTurk"
## URL Parameters
MTurk passes four parameters to your External Question URL:
| Parameter | Description |
|---|---|
| `workerId` | Worker's unique MTurk identifier |
| `assignmentId` | Unique ID for this worker-HIT pair |
| `hitId` | The HIT identifier |
| `turkSubmitTo` | URL where the completion form should POST |
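As a sketch of what handling these parameters involves (the `parse_mturk_params` helper below is illustrative, not Potato's actual implementation): MTurk substitutes the sentinel `ASSIGNMENT_ID_NOT_AVAILABLE` for `assignmentId` while a worker is previewing a HIT, so a server can distinguish preview from an accepted assignment.

```python
from urllib.parse import urlparse, parse_qs

# Sentinel value MTurk sends for assignmentId during HIT preview
PREVIEW_SENTINEL = 'ASSIGNMENT_ID_NOT_AVAILABLE'

def parse_mturk_params(url):
    """Extract the four MTurk parameters from an External Question URL.

    Illustrative helper -- Potato does this internally via its login config.
    """
    qs = parse_qs(urlparse(url).query)
    params = {k: qs.get(k, [None])[0]
              for k in ('workerId', 'assignmentId', 'hitId', 'turkSubmitTo')}
    # During preview, MTurk sends the sentinel instead of a real assignment ID
    params['is_preview'] = params['assignmentId'] == PREVIEW_SENTINEL
    return params
```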
## Prerequisites

### Server Requirements

- Publicly accessible server with:
  - An open port (typically 8080 or 443)
  - HTTPS recommended (required by some browsers)
  - A stable internet connection
- Python environment with Potato installed
### MTurk Requirements
- MTurk Requester Account: Sign up at requester.mturk.com
- Funded Account: Add funds for production (sandbox is free)
## Quick Start

### Step 1: Create Your Potato Configuration
```yaml
# mturk_task.yaml
annotation_task_name: "Sentiment Classification"
task_description: "Classify the sentiment of short text snippets."

# MTurk login configuration
login:
  type: url_direct
  url_argument: workerId

# Optional completion code
completion_code: "TASK_COMPLETE"

# Crowdsourcing settings
hide_navbar: true
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3

# Data files
data_files:
  - data/items.json

# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - positive
      - neutral
      - negative
```

### Step 2: Start Your Server
```bash
# Start the server
potato start mturk_task.yaml -p 8080

# Or with HTTPS (recommended)
potato start mturk_task.yaml -p 443 --ssl-cert cert.pem --ssl-key key.pem
```

### Step 3: Create Your HIT on MTurk
Create an External Question HIT using this XML template:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
```

**Important:** Use `&amp;` instead of `&` in the URL, since a bare `&` is invalid in XML.
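Rather than escaping the URL by hand, you can build the payload programmatically so the `&` characters are escaped for you. A minimal sketch (the `external_question_xml` helper is illustrative, not part of Potato or the MTurk SDK):

```python
from xml.sax.saxutils import escape

SCHEMA = ('http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/'
          '2006-07-14/ExternalQuestion.xsd')

def external_question_xml(base_url, frame_height=800):
    """Build an ExternalQuestion payload with XML-safe escaping.

    MTurk substitutes the ${...} placeholders itself, so they are left as-is;
    escape() turns the '&' separators into '&amp;'.
    """
    url = (f'{base_url}/?workerId=${{workerId}}&assignmentId=${{assignmentId}}'
           f'&hitId=${{hitId}}&turkSubmitTo=${{turkSubmitTo}}')
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<ExternalQuestion xmlns="{SCHEMA}">\n'
            f'  <ExternalURL>{escape(url)}</ExternalURL>\n'
            f'  <FrameHeight>{frame_height}</FrameHeight>\n'
            f'</ExternalQuestion>')
```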
## Configuration Reference

### Required Settings

```yaml
login:
  type: url_direct        # Required: enables URL-based authentication
  url_argument: workerId  # Required: MTurk passes the 'workerId' parameter
```

### Recommended Settings
```yaml
hide_navbar: true             # Prevent workers from skipping ahead
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
task_description: "Brief description for the preview page."
completion_code: "YOUR_CODE"
```

## Testing in Sandbox
Always test in the MTurk Sandbox before going to production.
### Sandbox URLs
| Service | URL |
|---|---|
| Requester | https://requestersandbox.mturk.com |
| Worker | https://workersandbox.mturk.com |
| API Endpoint | https://mturk-requester-sandbox.us-east-1.amazonaws.com |
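A small sketch of switching a boto3 client between the two environments (the `mturk_client_kwargs` helper is illustrative). A handy sanity check: the sandbox always reports an account balance of $10,000.00.

```python
SANDBOX_ENDPOINT = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'

def mturk_client_kwargs(sandbox: bool) -> dict:
    """Keyword arguments for boto3.client('mturk', ...).

    Omitting endpoint_url makes boto3 use the production endpoint.
    """
    kwargs = {'region_name': 'us-east-1'}
    if sandbox:
        kwargs['endpoint_url'] = SANDBOX_ENDPOINT
    return kwargs

# Usage (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client('mturk', **mturk_client_kwargs(sandbox=True))
# print(client.get_account_balance()['AvailableBalance'])  # sandbox: '10000.00'
```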
### Local Testing
Test the MTurk URL parameters locally:
```bash
# Test the normal workflow
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"

# Test preview mode
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=TEST_HIT"
```

## MTurk API Integration (Optional)
For advanced features, enable MTurk API integration:
```bash
pip install boto3
```

Create `configs/mturk_config.yaml`:
```yaml
aws_access_key_id: "YOUR_ACCESS_KEY"
aws_secret_access_key: "YOUR_SECRET_KEY"
sandbox: true  # Set to false for production
hit_id: "YOUR_HIT_ID"
```

Enable it in your main config:
```yaml
mturk:
  enabled: true
  config_file_path: configs/mturk_config.yaml
```

### Creating HITs Programmatically
```python
import boto3

# Use the sandbox endpoint while testing; drop endpoint_url for production
mturk = boto3.client(
    'mturk',
    region_name='us-east-1',
    endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com'
)

# Note the &amp; escapes: a bare & is invalid inside XML
question_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>'''

response = mturk.create_hit(
    Title='Sentiment Classification Task',
    Description='Classify the sentiment of short text snippets.',
    Keywords='sentiment, classification, text',
    Reward='0.50',
    MaxAssignments=100,
    LifetimeInSeconds=86400,            # HIT available for 24 hours
    AssignmentDurationInSeconds=3600,   # Workers get 1 hour per assignment
    AutoApprovalDelayInSeconds=604800,  # Auto-approve after 7 days
    Question=question_xml
)
print(f"Created HIT: {response['HIT']['HITId']}")
```

## Best Practices
### Task Design
- Clear Instructions: Provide detailed examples
- Reasonable Time: Don't rush workers
- Fair Pay: At least minimum wage equivalent ($12-15/hour)
- Manageable Length: 5-15 minutes per HIT is ideal
### Quality Control
- Qualification Tests: Screen workers upfront
- Attention Checks: Include verification questions
- Redundancy: Multiple workers per item (3+ recommended)
- Review Samples: Manually check a subset
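Approval and rejection can also be scripted against the MTurk API. A minimal sketch, assuming a boto3 MTurk client and a caller-supplied decision function (`review_assignments` and `approve_fn` are illustrative names, not part of Potato):

```python
def review_assignments(client, hit_id, approve_fn,
                       feedback='Failed attention checks.'):
    """Approve or reject every submitted assignment for a HIT.

    client: a boto3 MTurk client (e.g. boto3.client('mturk', ...)).
    approve_fn: callable taking an assignment dict, returns True to approve.
    Sketch only -- error handling and rate limiting are omitted.
    """
    next_token = None
    while True:
        kwargs = {'HITId': hit_id, 'AssignmentStatuses': ['Submitted']}
        if next_token:
            kwargs['NextToken'] = next_token
        page = client.list_assignments_for_hit(**kwargs)
        for assignment in page['Assignments']:
            if approve_fn(assignment):
                client.approve_assignment(AssignmentId=assignment['AssignmentId'])
            else:
                client.reject_assignment(AssignmentId=assignment['AssignmentId'],
                                         RequesterFeedback=feedback)
        next_token = page.get('NextToken')
        if not next_token:
            break
```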
### Technical
- Handle Edge Cases: Workers may reload or go back
- Save Progress: Autosave if possible
- Graceful Errors: Show helpful error messages
## Troubleshooting

### Workers See the Preview Page After Accepting

- Verify the `assignmentId` parameter is being passed correctly
- The preview page auto-refreshes; ask workers to wait

### Submit Button Doesn't Work

- Check the browser console for errors
- Verify the `turkSubmitTo` parameter is present
- Check for CORS or mixed-content issues

### Workers Can't Log In

- Verify `login.url_argument` is set to `workerId`
- Ensure `login.type` is `url_direct`
## Further Reading
- Crowdsourcing Integration - General crowdsourcing setup
- Quality Control - Attention checks and gold standards
- Task Assignment - Assignment strategies
For implementation details, see the source documentation.