
Deploying to Amazon Mechanical Turk

Step-by-step instructions for running Potato annotation tasks on MTurk with qualification tests and approval workflows.

By Potato Team

Amazon Mechanical Turk (MTurk) provides access to a large, on-demand workforce for annotation tasks. This guide covers the complete setup process from AWS configuration to approval workflows.

Prerequisites

  1. AWS account with MTurk enabled
  2. MTurk Requester account (production or sandbox)
  3. Potato server accessible via public URL
  4. Basic familiarity with MTurk concepts (HITs, Workers, etc.)

MTurk Configuration

annotation_task_name: "MTurk Annotation Task"
 
login:
  type: mturk
 
  # AWS credentials
  aws_access_key_env: AWS_ACCESS_KEY_ID
  aws_secret_key_env: AWS_SECRET_ACCESS_KEY
  region: us-east-1
 
  # Sandbox for testing
  sandbox: true  # Set false for production
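  # In sandbox mode, requests go to the MTurk sandbox API endpoint
  # (https://mturk-requester-sandbox.us-east-1.amazonaws.com) and sandbox HITs
  # appear to workers at https://workersandbox.mturk.com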
 
  # Worker ID handling
  worker_id_param: workerId
  hit_id_param: hitId
  assignment_id_param: assignmentId
 
  # ExternalQuestion URL
  external_url: "https://your-server.com/mturk"
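  # MTurk loads this URL in an iframe inside the HIT page and appends the
  # worker parameters named above as query strings, e.g.
  #   https://your-server.com/mturk?workerId=...&hitId=...&assignmentId=...
  # plus a turkSubmitTo parameter used when posting results back. Until a
  # worker accepts the HIT, assignmentId is ASSIGNMENT_ID_NOT_AVAILABLE.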
 
data_files:
  - items.json
 
item_properties:
  id_key: id
  text_key: text
 
instances_per_annotator: 20
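
The aws_access_key_env and aws_secret_key_env settings name environment variables, so the credentials themselves never appear in the YAML file. Before launching the server, export them in your shell (a minimal sketch; substitute your own requester credentials):

# Provide AWS credentials via the environment variables named in the config
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."   # keep this out of version control and logs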

Creating HITs

Basic HIT Configuration

login:
  type: mturk
 
  hit_settings:
    title: "Classify the sentiment of short texts"
    description: "Read social media posts and select the sentiment (positive, negative, or neutral)"
    keywords: "sentiment, classification, text, NLP, annotation"
 
    reward: "0.50"  # Per HIT in USD
    duration: 3600  # 1 hour to complete
    lifetime: 604800  # 7 days active
    auto_approval_delay: 259200  # 3 days
 
    max_assignments: 3  # Annotations per item

Qualification Requirements

login:
  type: mturk
 
  qualifications:
    # Built-in qualifications
    - type: PercentAssignmentsApproved
      comparator: GreaterThanOrEqualTo
      value: 98
 
    - type: NumberHITsApproved
      comparator: GreaterThanOrEqualTo
      value: 1000
 
    - type: Locale
      comparator: In
      values: [US, CA, GB, AU]  # English-speaking countries
 
    # Masters qualification (higher cost)
    # - type: Masters
    #   comparator: Exists

Custom Qualifications

Create your own qualification test:

login:
  type: mturk
 
  custom_qualification:
    name: "Sentiment Analysis Qualification"
    description: "Test for sentiment annotation task"
 
    test:
      enabled: true
      questions:
        - text: "I love this product!"
          options: [Positive, Negative, Neutral]
          correct: Positive
 
        - text: "This is the worst experience ever."
          options: [Positive, Negative, Neutral]
          correct: Negative
 
        - text: "The package arrived today."
          options: [Positive, Negative, Neutral]
          correct: Neutral
 
      pass_threshold: 0.8
      duration: 600  # 10 minutes
      retry_delay: 86400  # 24 hours before retry

Complete MTurk Configuration

annotation_task_name: "MTurk Sentiment Annotation"
 
# Note: Run with HTTPS for MTurk (use reverse proxy like nginx with SSL)
 
login:
  type: mturk
 
  # AWS Configuration
  aws_access_key_env: AWS_ACCESS_KEY_ID
  aws_secret_key_env: AWS_SECRET_ACCESS_KEY
  region: us-east-1
  sandbox: false  # Production
 
  # Worker parameters
  worker_id_param: workerId
  hit_id_param: hitId
  assignment_id_param: assignmentId
 
  # External Question
  external_url: "https://your-server.com/mturk"
  frame_height: 800
 
  # HIT Configuration
  hit_settings:
    title: "Classify Sentiment of Social Media Posts (Quick Task)"
    description: |
      Read 20 short social media posts and classify each as
      Positive, Negative, or Neutral sentiment. Takes about 10 minutes.
    keywords: "sentiment, text classification, NLP, quick task, easy"
 
    reward: "1.00"
    duration: 1800  # 30 minutes
    lifetime: 259200  # 3 days
    auto_approval_delay: 172800  # 2 days
 
    max_assignments: 3
 
  # Qualifications
  qualifications:
    - type: PercentAssignmentsApproved
      comparator: GreaterThanOrEqualTo
      value: 97
 
    - type: NumberHITsApproved
      comparator: GreaterThanOrEqualTo
      value: 500
 
    - type: Locale
      comparator: In
      values: [US, GB, CA, AU, NZ]
 
  # Custom qualification
  custom_qualification:
    enabled: true
    name: "Sentiment Classification Qualified"
    auto_grant_on_pass: true
 
# Data
data_files:
  - tweets.json
 
item_properties:
  id_key: id
  text_key: text
 
instances_per_annotator: 20
 
# Quality Control
quality_control:
  attention_checks:
    enabled: true
    frequency: 5
    items:
      - text: "Select POSITIVE for this attention check."
        expected: "Positive"
    fail_action: reject_assignment
 
  timing:
    min_time_total: 120  # At least 2 minutes total
    min_time_per_item: 3
 
  # Auto-approval settings
  auto_approve:
    enabled: true
    conditions:
      - attention_checks_passed
      - min_time_met
      - completion: 100
 
  # Auto-rejection
  auto_reject:
    enabled: true
    conditions:
      - attention_checks_failed: 2
      - completion_below: 50
 
# Annotation task
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - Positive
      - Negative
      - Neutral
    required: true
 
# Instructions shown in HIT
annotation_guidelines:
  title: "Sentiment Classification Instructions"
  content: |
    ## Your Task
    Read each social media post and classify its sentiment.
 
    ## Labels
    - **Positive**: Happy, excited, satisfied, praising
    - **Negative**: Sad, angry, frustrated, complaining
    - **Neutral**: Factual, objective, no clear emotion
 
    ## Important
    - Please read each text carefully
    - There are attention checks - answer accurately
    - Complete all 20 items to submit
 
# Completion
completion:
  submit_to_mturk: true
  show_confirmation: true
  confirmation_message: "Thank you! Your work has been submitted."

Managing HITs

Publishing HITs

# Create HITs from your data
potato mturk create-hits --config config.yaml --count 100
 
# Check HIT status
potato mturk status --config config.yaml
 
# List active HITs
potato mturk list-hits --config config.yaml --status Assignable

Monitoring Assignments

# Get assignment status
potato mturk assignments --config config.yaml --hit-id HIT_ID
 
# Download completed assignments
potato mturk download --config config.yaml --output annotations/

Approving/Rejecting

# Auto-process based on quality
potato mturk process --config config.yaml --auto
 
# Manual approval
potato mturk approve --assignment-id ASSIGNMENT_ID
 
# Manual rejection
potato mturk reject --assignment-id ASSIGNMENT_ID --reason "Failed attention checks"

Handling Worker Communication

login:
  type: mturk
 
  communication:
    # Contact info shown to workers
    requester_email: researcher@university.edu
 
    # Handle worker messages
    notification_email: alerts@university.edu
 
    # Feedback on rejection
    rejection_feedback: true
    rejection_template: |
      Your submission was rejected because: {{reason}}
      If you believe this is an error, please contact us.

Cost Calculation

login:
  type: mturk
 
  cost_tracking:
    enabled: true
 
    # MTurk fees
    base_fee_percent: 20  # 20% on reward
    masters_fee_percent: 5  # Additional 5% for Masters
    ten_plus_fee_percent: 20  # Additional 20% for >10 assignments
 
    # Budget limits
    daily_budget: 100.00
    total_budget: 1000.00
    pause_on_budget_exceeded: true

Cost Formula (per assignment):

  • Base: reward × (1 + 0.20)
  • With Masters: reward × (1 + 0.20 + 0.05)
  • 10+ assignments: reward × (1 + 0.20 + 0.20)
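
For example, with the complete configuration above ($1.00 reward, 3 assignments per HIT, no Masters requirement), each assignment costs 1.00 × 1.20 = $1.20, each HIT costs 3 × $1.20 = $3.60, and 100 HITs cost $360 in total.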

Output Format

{
  "worker_id": "A1234BCDEFG",
  "hit_id": "3ABCDEFGHIJK",
  "assignment_id": "3LMNOPQRSTUV",
  "annotations": {
    "sentiment": "Positive"
  },
  "mturk_metadata": {
    "accept_time": "2024-11-15T10:00:00Z",
    "submit_time": "2024-11-15T10:12:00Z",
    "approval_status": "Approved",
    "reward": "1.00"
  },
  "quality_metrics": {
    "attention_checks_passed": 4,
    "attention_checks_total": 4,
    "time_spent_seconds": 720
  }
}
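
Downloaded assignments can be inspected with standard JSON tooling. Assuming the download command writes one JSON file per assignment in the format above into annotations/ (the exact file layout is an assumption), a quick check of the label distribution looks like this:

# Tally how many assignments chose each sentiment label
jq -r '.annotations.sentiment' annotations/*.json | sort | uniq -c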

Best Practices

  1. Start with Sandbox: Always test in sandbox first
  2. Fair pay: Calculate the effective hourly rate (reward ÷ estimated minutes × 60); a worked example follows this list
  3. Clear HITs: Well-written titles/descriptions get better workers
  4. Quick approval: Workers appreciate fast payment
  5. Handle rejections carefully: Rejections affect your requester reputation
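
For example, the complete configuration above pays $1.00 for a HIT described as taking about 10 minutes, an effective rate of 1.00 ÷ 10 × 60 = $6.00 per hour.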

Comparison: MTurk vs Prolific

| Aspect      | MTurk                 | Prolific                  |
|-------------|-----------------------|---------------------------|
| Worker pool | Large, diverse        | Smaller, research-focused |
| Quality     | Variable              | Generally higher          |
| Pricing     | Lower base, + fees    | Higher, transparent       |
| Setup       | More complex          | Simpler                   |
| Best for    | Large scale, budget   | Research, quality         |

Next Steps


The full MTurk documentation is available at /docs/deployment/crowdsourcing.