Crowdsourcing Integration

Integrate with Prolific, MTurk, and other crowdsourcing platforms.

Potato integrates seamlessly with crowdsourcing platforms like Prolific and Amazon Mechanical Turk for large-scale annotation tasks.

Prolific Integration

Basic Setup

crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "POTATO2024"  # Code shown on completion

URL Parameters

Prolific passes participant information to your study as URL query parameters; list the ones Potato should capture:

crowdsourcing:
  platform: prolific
  url_params:
    - PROLIFIC_PID    # Participant ID
    - STUDY_ID        # Study ID
    - SESSION_ID      # Session ID

Participants then reach the task through a URL of the form:

https://your-server.com/?PROLIFIC_PID=xxx&STUDY_ID=xxx&SESSION_ID=xxx
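
Potato records these parameters when they are listed under url_params. Purely to illustrate what actually reaches the server, here is a minimal Flask-style sketch (the names are illustrative, not Potato internals):

from flask import Flask, request

app = Flask(__name__)
URL_PARAMS = ["PROLIFIC_PID", "STUDY_ID", "SESSION_ID"]

@app.route("/")
def landing():
    # Each configured url_param arrives as a plain query-string argument.
    participant = {p: request.args.get(p, "") for p in URL_PARAMS}
    # In practice these IDs are stored alongside the worker's annotations so
    # exported data can be matched back to Prolific submissions.
    return f"Welcome, participant {participant['PROLIFIC_PID'] or 'unknown'}"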

Prolific Configuration

In your Prolific study settings:

  1. Set Study URL to your Potato server
  2. Add URL parameters: ?PROLIFIC_PID={{%PROLIFIC_PID%}}&STUDY_ID={{%STUDY_ID%}}&SESSION_ID={{%SESSION_ID%}}
  3. Set the completion code to match completion_code in your config

Validation

Verify that incoming participants arrive with a valid Prolific identifier:

crowdsourcing:
  platform: prolific
  validate_participant: true
  completion_code: "POTATO2024"
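
In practice, validation means refusing entry when the request does not carry a plausible participant ID. A minimal sketch; the 24-character hex format below is the usual shape of Prolific IDs but should be treated as an assumption:

import re

# Prolific participant IDs typically look like 24-character hex strings (assumption).
PROLIFIC_PID_RE = re.compile(r"^[0-9a-f]{24}$", re.IGNORECASE)

def looks_like_prolific_pid(pid):
    """Return True if the value plausibly came from the {{%PROLIFIC_PID%}} placeholder."""
    return bool(pid) and PROLIFIC_PID_RE.match(pid) is not None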

Amazon MTurk Integration

Basic Setup

crowdsourcing:
  platform: mturk
  enabled: true

HIT Configuration

Create an External Question HIT pointing to your server. MTurk loads the URL in an iframe and automatically appends workerId, assignmentId, hitId, and turkSubmitTo as query parameters:

<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com/</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
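
If you create HITs programmatically rather than through the requester UI, the same XML is passed as the Question argument to the MTurk API. A sketch using boto3 (the title, reward, and counts are placeholder values, not recommendations):

import boto3

# Sandbox endpoint shown; omit endpoint_url to create HITs in production.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

external_question = """<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com/</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>"""

response = mturk.create_hit(
    Title="Image Classification",
    Description="Classify what is shown in each image",
    Keywords="image, classification, labeling",
    Reward="0.50",
    MaxAssignments=3,
    LifetimeInSeconds=7 * 24 * 3600,
    AssignmentDurationInSeconds=1800,
    Question=external_question,
)
print(response["HIT"]["HITId"])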

URL Parameters

crowdsourcing:
  platform: mturk
  url_params:
    - workerId
    - assignmentId
    - hitId

Sandbox Testing

Test with the MTurk Sandbox before going live:

crowdsourcing:
  platform: mturk
  sandbox: true  # Use sandbox environment
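
When HITs are created through the API, the sandbox is selected by pointing the requester client at the sandbox endpoint. A quick sanity check is the account balance, which the sandbox always reports as $10,000.00:

import boto3

SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
mturk = boto3.client("mturk", region_name="us-east-1", endpoint_url=SANDBOX_ENDPOINT)

# The sandbox always reports $10,000.00, so this doubles as a check that you
# are not accidentally talking to the production endpoint.
print(mturk.get_account_balance()["AvailableBalance"])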

Worker Management

Track Workers

crowdsourcing:
  track_workers: true
  worker_id_field: worker_id

Limit Instances Per Worker

instances_per_annotator: 50

Block Returning Workers

Prevent workers from retaking the task:

crowdsourcing:
  prevent_retakes: true
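
Conceptually, retake prevention just checks each arriving worker ID against the set of workers who already finished. An illustrative sketch (a real deployment would persist this set, e.g. to the annotation output directory):

completed_workers = set()

def may_start(worker_id, prevent_retakes=True):
    # Workers who already completed the task are turned away when retakes are
    # disabled; otherwise they may annotate again.
    return not (prevent_retakes and worker_id in completed_workers)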

Quality Control

Attention Checks

Insert periodic test questions with known answers:

attention_checks:
  enabled: true
  frequency: 10  # Every 10 instances
  fail_threshold: 2
  action: warn  # or 'block'
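
Conceptually, frequency schedules a check every N instances and fail_threshold decides when the configured action applies. A small illustrative sketch of that bookkeeping (not Potato's internal implementation):

def is_attention_check(instance_index, frequency=10):
    # With frequency: 10, every 10th instance (1-based) is a check question.
    return instance_index % frequency == 0

def resolve_action(failures, fail_threshold=2, action="warn"):
    # Below the threshold the worker continues normally; once fail_threshold
    # checks have been missed, the configured action ("warn" or "block") applies.
    return action if failures >= fail_threshold else "continue"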

Gold Standard Questions

Mark gold instances in your data file:

{
  "id": "gold_1",
  "text": "The sky is typically blue during a clear day.",
  "gold_label": "True",
  "is_gold": true
}

Then enable gold scoring in the task config:

quality_control:
  gold_questions: true
  gold_percentage: 10  # 10% of instances
  min_gold_accuracy: 70
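
The score compared against min_gold_accuracy is simply the share of gold instances a worker labeled correctly. An illustrative sketch of that calculation:

def gold_accuracy(worker_labels, gold_labels):
    """Percentage of gold instances this worker answered correctly."""
    scored = [iid for iid in gold_labels if iid in worker_labels]
    if not scored:
        return 0.0
    correct = sum(worker_labels[iid] == gold_labels[iid] for iid in scored)
    return 100.0 * correct / len(scored)

# gold_accuracy({"gold_1": "True"}, {"gold_1": "True"}) -> 100.0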

Time Limits

crowdsourcing:
  min_time_per_instance: 5  # seconds
  max_time_total: 3600  # 1 hour

Reject Low-Quality Work

quality_control:
  auto_reject:
    enabled: true
    conditions:
      - gold_accuracy_below: 50
      - completion_time_under: 300  # seconds

Completion Handling

Show Completion Code

completion:
  show_code: true
  code: "POTATO2024"
  message: "Thank you! Your completion code is: {code}"

Redirect on Completion

completion:
  redirect: true
  redirect_url: "https://prolific.co/submissions/complete?cc={code}"
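
The {code} placeholder resolves to your completion code before the redirect fires. A minimal illustrative sketch of that substitution, URL-encoding the code in case it contains unsafe characters:

from urllib.parse import quote

def completion_redirect(code, base="https://prolific.co/submissions/complete"):
    # Substitute the completion code into the redirect target.
    return f"{base}?cc={quote(code)}"

# completion_redirect("POTATO2024") -> "https://prolific.co/submissions/complete?cc=POTATO2024"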

Custom Completion Page

completion:
  custom_template: templates/completion.html

Payment Tiers

Based on Quality

payment:
  tiers:
    - name: bonus
      condition:
        gold_accuracy_above: 90
      amount: 0.50
    - name: standard
      condition:
        gold_accuracy_above: 70
      amount: 0.00
    - name: reject
      condition:
        gold_accuracy_below: 50
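
Tier selection is a simple threshold walk over gold accuracy. Note that the tiers above leave the 50-70% band implicit; the illustrative sketch below treats it as standard pay:

def payment_tier(gold_accuracy):
    # Mirror the tiers above, from most to least favourable.
    if gold_accuracy > 90:
        return "bonus"      # base pay plus $0.50 bonus
    if gold_accuracy > 70:
        return "standard"   # base pay only
    if gold_accuracy < 50:
        return "reject"
    return "standard"       # 50-70% is not covered explicitly by the config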

Full Example: Prolific Study

task_name: "Sentiment Analysis Study"
 
# Crowdsourcing settings
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "SENT2024"
  url_params:
    - PROLIFIC_PID
    - STUDY_ID
    - SESSION_ID
  prevent_retakes: true
 
# Open access for crowdworkers
allow_all_users: true
 
# Task assignment
instances_per_annotator: 50
annotation_per_instance: 3
 
# Quality control
attention_checks:
  enabled: true
  frequency: 10
  fail_threshold: 2
 
quality_control:
  gold_questions: true
  gold_percentage: 5
  min_gold_accuracy: 70
 
# Data
data_files:
  - path: data/main.json
    text_field: text
 
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
    keyboard_shortcuts:
      Positive: "1"
      Negative: "2"
      Neutral: "3"
 
# Completion
completion:
  show_code: true
  code: "SENT2024"
  message: |
    ## Thank you for participating!
 
    Your completion code is: **{code}**
 
    Please return to Prolific and enter this code to receive payment.

Full Example: MTurk HIT

task_name: "Image Classification HIT"
 
crowdsourcing:
  platform: mturk
  enabled: true
  url_params:
    - workerId
    - assignmentId
    - hitId
  # Time constraints
  min_time_per_instance: 3
  max_time_total: 1800

allow_all_users: true
instances_per_annotator: 20
 
# MTurk form submission. The POST back to externalSubmit must include the
# worker's assignmentId; the sandbox submits to workersandbox.mturk.com instead.
completion:
  mturk_submit: true
  submit_url: "https://www.mturk.com/mturk/externalSubmit"
 
annotation_schemes:
  - annotation_type: radio
    name: category
    description: "What is shown in this image?"
    labels:
      - Cat
      - Dog
      - Bird
      - Other

Monitoring Workers

Admin Dashboard

admin_users:
  - researcher@university.edu
 
admin_dashboard:
  enabled: true
  show_worker_stats: true

View at /admin to see:

  • Worker completion rates
  • Average time per instance
  • Gold accuracy scores
  • Attention check results

Export Worker Data

potato export-workers config.yaml --output workers.csv

Best Practices

  1. Test thoroughly - Run a pilot with a small group first
  2. Set fair pay - Estimate completion time and pay a fair rate
  3. Write clear instructions - Include examples and edge cases
  4. Use attention checks - Catch random clicking
  5. Include gold questions - Verify understanding
  6. Monitor in real-time - Watch for issues early
  7. Plan for rejection - Set clear quality criteria upfront
  8. Communicate issues - Contact workers about problems
  9. Iterate on feedback - Improve based on worker comments
  10. Export data regularly - Don't wait until the end