Deploying to Amazon Mechanical Turk
Step-by-step instructions for running Potato annotation tasks on MTurk with qualification tests and approval workflows.
By Potato Team
Amazon Mechanical Turk (MTurk) provides access to a large, on-demand workforce for annotation tasks. This guide covers the complete setup process from AWS configuration to approval workflows.
Prerequisites
- AWS account with MTurk enabled
- MTurk Requester account (production or sandbox)
- Potato server accessible via public URL
- Basic familiarity with MTurk concepts (HITs, Workers, etc.)
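Before touching any Potato config, it is worth confirming that your AWS credentials can reach MTurk. A quick sanity check with boto3 against the sandbox (the sandbox always reports a $10,000.00 balance, so the call doubles as a credential test):
```python
import boto3

# Reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
client = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

print(client.get_account_balance()["AvailableBalance"])  # "10000.00" in the sandbox
```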
MTurk Configuration
```yaml
annotation_task_name: "MTurk Annotation Task"

login:
  type: mturk

  # AWS credentials
  aws_access_key_env: AWS_ACCESS_KEY_ID
  aws_secret_key_env: AWS_SECRET_ACCESS_KEY
  region: us-east-1

  # Sandbox for testing
  sandbox: true  # Set false for production

  # Worker ID handling
  worker_id_param: workerId
  hit_id_param: hitId
  assignment_id_param: assignmentId

  # ExternalQuestion URL
  external_url: "https://your-server.com/mturk"

data_files:
  - items.json

item_properties:
  id_key: id
  text_key: text

instances_per_annotator: 20
```
Creating HITs
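Potato handles HIT creation through its CLI (see Publishing HITs below), but it helps to know what gets sent to MTurk: each HIT is an ExternalQuestion that loads your server in an iframe and appends the workerId, hitId, and assignmentId query parameters configured above. A rough boto3 equivalent, shown for orientation only (sandbox endpoint; not Potato's internal code):
```python
import boto3

client = boto3.client(
    "mturk", region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# ExternalQuestion XML: MTurk renders the URL inside an iframe.
question = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com/mturk</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>"""

hit = client.create_hit(
    Title="Classify the sentiment of short texts",
    Description="Read social media posts and select the sentiment",
    Keywords="sentiment, classification, text, NLP, annotation",
    Reward="0.50",                      # USD per assignment
    MaxAssignments=3,                   # annotations per item
    AssignmentDurationInSeconds=3600,   # 1 hour to complete
    LifetimeInSeconds=604800,           # listed for 7 days
    AutoApprovalDelayInSeconds=259200,  # auto-approve after 3 days
    Question=question,
)
print(hit["HIT"]["HITId"])
```
In Potato, the same settings live in the config, as shown next.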
Basic HIT Configuration
```yaml
login:
  type: mturk
  hit_settings:
    title: "Classify the sentiment of short texts"
    description: "Read social media posts and select the sentiment (positive, negative, or neutral)"
    keywords: "sentiment, classification, text, NLP, annotation"
    reward: "0.50"               # Per assignment, in USD
    duration: 3600               # 1 hour to complete
    lifetime: 604800             # 7 days active
    auto_approval_delay: 259200  # 3 days
    max_assignments: 3           # Annotations per item
```
Qualification Requirements
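Qualification requirements restrict which workers can see and accept your HITs. On the wire they become MTurk QualificationRequirement structures; a sketch of the mapping (the system qualification type IDs below are AWS-documented constants, worth double-checking against the current MTurk docs):
```python
qualification_requirements = [
    {   # Approval rate >= 98%
        "QualificationTypeId": "000000000000000000L0",  # PercentAssignmentsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [98],
    },
    {   # At least 1,000 approved HITs
        "QualificationTypeId": "00000000000000000040",  # NumberHITsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [1000],
    },
    {   # Located in one of the listed countries
        "QualificationTypeId": "00000000000000000071",  # Locale
        "Comparator": "In",
        "LocaleValues": [{"Country": c} for c in ["US", "CA", "GB", "AU"]],
    },
]

# Passed through to create_hit(..., QualificationRequirements=qualification_requirements)
```
In the Potato config, the same requirements are expressed declaratively: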
```yaml
login:
  type: mturk
  qualifications:
    # Built-in qualifications
    - type: PercentAssignmentsApproved
      comparator: GreaterThanOrEqualTo
      value: 98
    - type: NumberHITsApproved
      comparator: GreaterThanOrEqualTo
      value: 1000
    - type: Locale
      comparator: In  # use In when matching against a list of countries
      values: [US, CA, GB, AU]  # English-speaking countries
    # Masters qualification (higher cost)
    # - type: Masters
    #   comparator: Exists
```
Custom Qualifications
Create your own qualification test:
```yaml
login:
  type: mturk
  custom_qualification:
    name: "Sentiment Analysis Qualification"
    description: "Test for sentiment annotation task"
    test:
      enabled: true
      questions:
        - text: "I love this product!"
          options: [Positive, Negative, Neutral]
          correct: Positive
        - text: "This is the worst experience ever."
          options: [Positive, Negative, Neutral]
          correct: Negative
        - text: "The package arrived today."
          options: [Positive, Negative, Neutral]
          correct: Neutral
      pass_threshold: 0.8
      duration: 600       # 10 minutes
      retry_delay: 86400  # 24 hours before retry
```
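If you prefer to manage qualifications yourself, the underlying API calls are create_qualification_type and associate_qualification_with_worker. A minimal sketch (the worker ID is a placeholder; grading the test answers is left out):
```python
import boto3

client = boto3.client("mturk", region_name="us-east-1")

# A qualification you grant on test pass and then require on your HITs.
qual = client.create_qualification_type(
    Name="Sentiment Analysis Qualification",
    Description="Test for sentiment annotation task",
    QualificationTypeStatus="Active",
)
qual_id = qual["QualificationType"]["QualificationTypeId"]

# Grant it to a worker who passed your screening questions.
client.associate_qualification_with_worker(
    QualificationTypeId=qual_id,
    WorkerId="A1234BCDEFG",  # placeholder
    IntegerValue=100,
    SendNotification=False,
)
```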
Complete MTurk Configuration
```yaml
annotation_task_name: "MTurk Sentiment Annotation"

# Note: Run with HTTPS for MTurk (use a reverse proxy like nginx with SSL)
login:
  type: mturk

  # AWS Configuration
  aws_access_key_env: AWS_ACCESS_KEY_ID
  aws_secret_key_env: AWS_SECRET_ACCESS_KEY
  region: us-east-1
  sandbox: false  # Production

  # Worker parameters
  worker_id_param: workerId
  hit_id_param: hitId
  assignment_id_param: assignmentId

  # External Question
  external_url: "https://your-server.com/mturk"
  frame_height: 800

  # HIT Configuration
  hit_settings:
    title: "Classify Sentiment of Social Media Posts (Quick Task)"
    description: |
      Read 20 short social media posts and classify each as
      Positive, Negative, or Neutral sentiment. Takes about 10 minutes.
    keywords: "sentiment, text classification, NLP, quick task, easy"
    reward: "1.00"
    duration: 1800               # 30 minutes
    lifetime: 259200             # 3 days
    auto_approval_delay: 172800  # 2 days
    max_assignments: 3

  # Qualifications
  qualifications:
    - type: PercentAssignmentsApproved
      comparator: GreaterThanOrEqualTo
      value: 97
    - type: NumberHITsApproved
      comparator: GreaterThanOrEqualTo
      value: 500
    - type: Locale
      comparator: In
      values: [US, GB, CA, AU, NZ]

  # Custom qualification
  custom_qualification:
    enabled: true
    name: "Sentiment Classification Qualified"
    auto_grant_on_pass: true

# Data
data_files:
  - tweets.json
item_properties:
  id_key: id
  text_key: text
instances_per_annotator: 20

# Quality Control
quality_control:
  attention_checks:
    enabled: true
    frequency: 5
    items:
      - text: "Select POSITIVE for this attention check."
        expected: "Positive"
    fail_action: reject_assignment
  timing:
    min_time_total: 120  # At least 2 minutes total
    min_time_per_item: 3

  # Auto-approval settings
  auto_approve:
    enabled: true
    conditions:
      - attention_checks_passed
      - min_time_met
      - completion: 100

  # Auto-rejection
  auto_reject:
    enabled: true
    conditions:
      - attention_checks_failed: 2
      - completion_below: 50

# Annotation task
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - Positive
      - Negative
      - Neutral
    required: true

# Instructions shown in HIT
annotation_guidelines:
  title: "Sentiment Classification Instructions"
  content: |
    ## Your Task
    Read each social media post and classify its sentiment.

    ## Labels
    - **Positive**: Happy, excited, satisfied, praising
    - **Negative**: Sad, angry, frustrated, complaining
    - **Neutral**: Factual, objective, no clear emotion

    ## Important
    - Please read each text carefully
    - There are attention checks - answer accurately
    - Complete all 20 items to submit

# Completion
completion:
  submit_to_mturk: true
  show_confirmation: true
  confirmation_message: "Thank you! Your work has been submitted."
```
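For orientation, the auto_approve conditions above amount to a predicate over each assignment's quality metrics, roughly like the following (an illustrative sketch using the field names from the Output Format section below, not Potato's internal implementation; the completion field is an assumption):
```python
def passes_quality_control(assignment: dict) -> bool:
    """Mirror of the auto_approve conditions in the config above."""
    qm = assignment["quality_metrics"]
    return (
        qm["attention_checks_passed"] == qm["attention_checks_total"]  # attention_checks_passed
        and qm["time_spent_seconds"] >= 120                            # min_time_met (min_time_total)
        and assignment.get("completion", 100) == 100                   # completion: 100 (assumed field)
    )
```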
Managing HITs
Publishing HITs
```bash
# Create HITs from your data
potato mturk create-hits --config config.yaml --count 100

# Check HIT status
potato mturk status --config config.yaml

# List active HITs
potato mturk list-hits --config config.yaml --status Assignable
```
Monitoring Assignments
```bash
# Get assignment status
potato mturk assignments --config config.yaml --hit-id HIT_ID

# Download completed assignments
potato mturk download --config config.yaml --output annotations/
```
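The same information is available directly from the API if you want to build custom monitoring; a sketch (the HIT ID is a placeholder):
```python
import boto3

client = boto3.client(
    "mturk", region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",  # drop for production
)

# All assignments for one HIT, regardless of state.
resp = client.list_assignments_for_hit(
    HITId="3ABCDEFGHIJK",  # placeholder
    AssignmentStatuses=["Submitted", "Approved", "Rejected"],
)
for a in resp["Assignments"]:
    print(a["WorkerId"], a["AssignmentStatus"], a["SubmitTime"])
```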
Approving/Rejecting
```bash
# Auto-process based on quality
potato mturk process --config config.yaml --auto

# Manual approval
potato mturk approve --assignment-id ASSIGNMENT_ID

# Manual rejection
potato mturk reject --assignment-id ASSIGNMENT_ID --reason "Failed attention checks"
```
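Under the hood these correspond to two API calls. The RequesterFeedback on a rejection is shown to the worker, so keep it concrete and polite (the assignment ID is a placeholder):
```python
import boto3

client = boto3.client("mturk", region_name="us-east-1")  # production endpoint

client.approve_assignment(
    AssignmentId="3LMNOPQRSTUV",  # placeholder
    RequesterFeedback="Thanks for the careful work!",
)

client.reject_assignment(
    AssignmentId="3LMNOPQRSTUV",  # placeholder
    RequesterFeedback="Failed attention checks.",  # the worker sees this
)
```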
Handling Worker Communication
```yaml
login:
  type: mturk
  communication:
    # Contact info shown to workers
    requester_email: researcher@university.edu

    # Handle worker messages
    notification_email: alerts@university.edu

    # Feedback on rejection
    rejection_feedback: true
    rejection_template: |
      Your submission was rejected because: {{reason}}
      If you believe this is an error, please contact us.
```
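For direct outreach, for example following up on a disputed rejection, the API provides notify_workers; a sketch (the IDs and message text are placeholders):
```python
import boto3

client = boto3.client("mturk", region_name="us-east-1")

client.notify_workers(
    Subject="About your sentiment annotation HIT",
    MessageText=(
        "Your submission was rejected due to failed attention checks. "
        "Reply to this message if you believe this was an error."
    ),
    WorkerIds=["A1234BCDEFG"],  # up to 100 worker IDs per call
)
```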
Cost Calculation
```yaml
login:
  type: mturk
  cost_tracking:
    enabled: true

    # MTurk fees
    base_fee_percent: 20      # 20% on the reward
    masters_fee_percent: 5    # Additional 5% for Masters
    ten_plus_fee_percent: 20  # Additional 20% for HITs with 10 or more assignments

    # Budget limits
    daily_budget: 100.00
    total_budget: 1000.00
    pause_on_budget_exceeded: true
```
Cost Formula:
- Base: reward × (1 + 0.20)
- With Masters: reward × (1 + 0.20 + 0.05)
- 10+ assignments: reward × (1 + 0.20 + 0.20)
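A quick budgeting sketch applying these formulas (fee percentages as above; verify against MTurk's current pricing page):
```python
def assignment_cost(reward: float, masters: bool = False,
                    assignments_per_hit: int = 1) -> float:
    """Cost of a single assignment, including MTurk fees."""
    fee = 0.20                       # base 20% fee
    if masters:
        fee += 0.05                  # Masters surcharge
    if assignments_per_hit >= 10:
        fee += 0.20                  # surcharge for 10+ assignments per HIT
    return reward * (1 + fee)

# Example: 100 items x 3 assignments at a $1.00 reward each.
total = 100 * 3 * assignment_cost(1.00, assignments_per_hit=3)
print(f"${total:.2f}")  # $360.00
```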
Output Format
```json
{
  "worker_id": "A1234BCDEFG",
  "hit_id": "3ABCDEFGHIJK",
  "assignment_id": "3LMNOPQRSTUV",
  "annotations": {
    "sentiment": "Positive"
  },
  "mturk_metadata": {
    "accept_time": "2024-11-15T10:00:00Z",
    "submit_time": "2024-11-15T10:12:00Z",
    "approval_status": "Approved",
    "reward": "1.00"
  },
  "quality_metrics": {
    "attention_checks_passed": 4,
    "attention_checks_total": 4,
    "time_spent_seconds": 720
  }
}
```
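Because max_assignments collects several judgments per item, a common first aggregation step is a majority vote over the downloaded records. A sketch assuming the download produces a JSON list of records in the format above (the file path is hypothetical):
```python
import json
from collections import Counter

with open("annotations/results.json") as f:  # hypothetical path
    records = json.load(f)

# Group sentiment labels by HIT (one HIT per item here).
labels_by_hit: dict[str, list[str]] = {}
for rec in records:
    labels_by_hit.setdefault(rec["hit_id"], []).append(rec["annotations"]["sentiment"])

for hit_id, labels in labels_by_hit.items():
    label, votes = Counter(labels).most_common(1)[0]
    print(f"{hit_id}: {label} ({votes}/{len(labels)} votes)")
```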
Best Practices
- Start with Sandbox: Always test in the sandbox first
- Fair pay: Calculate the hourly rate (reward ÷ estimated minutes × 60); e.g., a $1.00 reward for a 10-minute task works out to $6.00/hour
- Clear HITs: Well-written titles and descriptions attract better workers
- Quick approval: Workers appreciate fast payment
- Handle rejections carefully: Rejections affect your requester reputation
Comparison: MTurk vs Prolific
| Aspect | MTurk | Prolific |
|---|---|---|
| Worker pool | Large, diverse | Smaller, research-focused |
| Quality | Variable | Generally higher |
| Pricing | Lower base, + fees | Higher, transparent |
| Setup | More complex | Simpler |
| Best for | Large scale, budget | Research, quality |
Next Steps
- Compare with Prolific integration
- Set up quality control
- Calculate inter-annotator agreement
Full MTurk documentation at /docs/deployment/crowdsourcing.