
Crowdsourcing Integration

Integrate with Prolific, MTurk, and other crowdsourcing platforms.

Potato integrates seamlessly with crowdsourcing platforms like Prolific and Amazon Mechanical Turk for large-scale annotation tasks.

Prolific Integration

Basic Setup

yaml
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "POTATO2024"  # Code shown on completion

URL Parameters

Prolific passes participant info via URL parameters:

yaml
crowdsourcing:
  platform: prolific
  url_params:
    - PROLIFIC_PID    # Participant ID
    - STUDY_ID        # Study ID
    - SESSION_ID      # Session ID

Workers access the task via a URL of the form:

text
https://your-server.com/?PROLIFIC_PID=xxx&STUDY_ID=xxx&SESSION_ID=xxx

Prolific Configuration

In your Prolific study settings:

  1. Set Study URL to your Potato server
  2. Add URL parameters: ?PROLIFIC_PID={{%PROLIFIC_PID%}}&STUDY_ID={{%STUDY_ID%}}&SESSION_ID={{%SESSION_ID%}}
  3. Set Completion code to match your config

Validation

Verify that incoming participants supply valid Prolific IDs:

yaml
crowdsourcing:
  platform: prolific
  validate_participant: true
  completion_code: "POTATO2024"
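
Validation boils down to checking that the PROLIFIC_PID parameter looks like a real Prolific ID. The sketch below shows that logic in Python; it is illustrative rather than Potato's internal code, and the 24-character hex format is an assumption based on current Prolific IDs:

python
import re

# Assumption: Prolific participant IDs are 24-character hex strings.
# Adjust the pattern if Prolific changes its ID format.
PROLIFIC_PID_PATTERN = re.compile(r"^[a-f0-9]{24}$")

def is_valid_prolific_pid(pid: str) -> bool:
    """Return True if the PROLIFIC_PID URL parameter looks legitimate."""
    return bool(pid) and bool(PROLIFIC_PID_PATTERN.match(pid))

print(is_valid_prolific_pid("5f8a9b2c1d3e4f5a6b7c8d9e"))  # True
print(is_valid_prolific_pid("not-a-pid"))                 # False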

Amazon MTurk Integration

Basic Setup

yaml
crowdsourcing:
  platform: mturk
  enabled: true

HIT Configuration

Create an External Question HIT pointing to your server. When a worker accepts the HIT, MTurk automatically appends the workerId, assignmentId, hitId, and turkSubmitTo parameters to this URL:

xml
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com/</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
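
You can publish the HIT programmatically with boto3 (the AWS SDK). This is a minimal sketch, not part of Potato; the title, reward, and XML file name are placeholders, and the sandbox endpoint matches the sandbox: true option described below:

python
import boto3

# Sandbox endpoint for testing; for production use
# https://mturk-requester.us-east-1.amazonaws.com instead.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

with open("external_question.xml") as f:  # the XML shown above
    question_xml = f.read()

hit = mturk.create_hit(
    Title="Annotate text with Potato",     # placeholder metadata
    Description="Label short texts in our web interface.",
    Keywords="annotation, labeling",
    Reward="0.50",                         # USD, passed as a string
    MaxAssignments=3,                      # workers per HIT
    LifetimeInSeconds=86400,               # HIT listed for 1 day
    AssignmentDurationInSeconds=3600,      # 1 hour per worker
    Question=question_xml,
)
print("HIT ID:", hit["HIT"]["HITId"])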

URL Parameters

yaml
crowdsourcing:
  platform: mturk
  url_params:
    - workerId
    - assignmentId
    - hitId

Sandbox Testing

Test with MTurk Sandbox first:

yaml
crowdsourcing:
  platform: mturk
  sandbox: true  # Use sandbox environment

Worker Management

Track Workers

yaml
crowdsourcing:
  track_workers: true
  worker_id_field: worker_id

Limit Instances Per Worker

yaml
instances_per_annotator: 50

Block Returning Workers

Prevent workers from retaking the task:

yaml
crowdsourcing:
  prevent_retakes: true

Quality Control

Attention Checks

Insert periodic test questions to catch inattentive workers:

yaml
attention_checks:
  enabled: true
  frequency: 10  # Every 10 instances
  fail_threshold: 2
  action: warn  # or 'block'
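
In other words, a worker who misses fail_threshold checks triggers the configured action. A minimal sketch of that decision logic, assuming fail_threshold counts total missed checks (illustrative, not Potato's internal code):

python
def attention_check_action(failed_checks: int,
                           fail_threshold: int = 2,
                           action: str = "warn") -> str:
    """Decide what happens after a worker misses attention checks."""
    if failed_checks < fail_threshold:
        return "continue"
    # Threshold reached: warn the worker, or block further annotation.
    return action  # "warn" or "block", per the config above

print(attention_check_action(1))                  # continue
print(attention_check_action(2))                  # warn
print(attention_check_action(3, action="block"))  # block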

Gold Standard Questions

Mark known-answer instances in your data:

json
{
  "id": "gold_1",
  "text": "The sky is typically blue during a clear day.",
  "gold_label": "True",
  "is_gold": true
}

Then enable gold scoring in your config:

yaml
quality_control:
  gold_questions: true
  gold_percentage: 10  # 10% of instances
  min_gold_accuracy: 70
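
min_gold_accuracy is a percentage computed over the gold instances a worker has actually answered. A sketch of that check (illustrative only; Potato tracks this internally):

python
def gold_accuracy(answers: dict, gold_labels: dict) -> float:
    """Percentage of answered gold instances that match the gold label."""
    gold_seen = [iid for iid in answers if iid in gold_labels]
    if not gold_seen:
        return 100.0  # no gold instances answered yet; nothing to judge
    correct = sum(answers[iid] == gold_labels[iid] for iid in gold_seen)
    return 100.0 * correct / len(gold_seen)

gold = {"gold_1": "True"}
worker = {"gold_1": "True", "item_42": "False"}
print(gold_accuracy(worker, gold))  # 100.0, passes min_gold_accuracy: 70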

Time Limits

yaml
crowdsourcing:
  min_time_per_instance: 5  # seconds
  max_time_total: 3600  # 1 hour

Reject Low-Quality Work

yaml
quality_control:
  auto_reject:
    enabled: true
    conditions:
      - gold_accuracy_below: 50
      - completion_time_under: 300  # seconds
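
On MTurk, acting on these conditions ultimately means approving or rejecting assignments through the requester API. A sketch of how a requester script could apply the same conditions with boto3 (this is separate from Potato, which only flags the work):

python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

def review_assignment(assignment_id: str, gold_accuracy: float,
                      completion_time: int) -> None:
    """Approve or reject one assignment per the auto_reject conditions."""
    if gold_accuracy < 50 or completion_time < 300:
        mturk.reject_assignment(
            AssignmentId=assignment_id,
            RequesterFeedback="Failed quality checks (gold accuracy/time).",
        )
    else:
        mturk.approve_assignment(AssignmentId=assignment_id)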

Completion Handling

Show Completion Code

yaml
completion:
  show_code: true
  code: "POTATO2024"
  message: "Thank you! Your completion code is: {code}"

Redirect on Completion

yaml
completion:
  redirect: true
  redirect_url: "https://app.prolific.co/submissions/complete?cc={code}"

Custom Completion Page

yaml
completion:
  custom_template: templates/completion.html

Payment Tiers

Based on Quality

yaml
payment:
  tiers:
    - name: bonus
      condition:
        gold_accuracy_above: 90
      amount: 0.50
    - name: standard
      condition:
        gold_accuracy_above: 70
      amount: 0.00
    - name: reject
      condition:
        gold_accuracy_below: 50
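
For MTurk, the bonus tier maps onto the SendBonus API. A sketch with boto3 (tier thresholds mirror the config above; worker and assignment IDs come from your collected annotations):

python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

def pay_tier(worker_id: str, assignment_id: str, accuracy: float) -> None:
    """Grant the 'bonus' tier from the config above via the MTurk API."""
    if accuracy > 90:  # 'bonus' tier
        mturk.send_bonus(
            WorkerId=worker_id,
            AssignmentId=assignment_id,
            BonusAmount="0.50",  # USD, passed as a string
            Reason="High gold-question accuracy (above 90%).",
        )
    # The 'standard' tier (accuracy above 70) pays the base reward only;
    # the 'reject' tier is handled by rejecting the assignment instead.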

Full Example: Prolific Study

yaml
task_name: "Sentiment Analysis Study"
 
# Crowdsourcing settings
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "SENT2024"
  url_params:
    - PROLIFIC_PID
    - STUDY_ID
    - SESSION_ID
  prevent_retakes: true
 
# Open access for crowdworkers
allow_all_users: true
 
# Task assignment
instances_per_annotator: 50
annotation_per_instance: 3
 
# Quality control
attention_checks:
  enabled: true
  frequency: 10
  fail_threshold: 2
 
quality_control:
  gold_questions: true
  gold_percentage: 5
  min_gold_accuracy: 70
 
# Data
data_files:
  - path: data/main.json
    text_field: text
 
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
    keyboard_shortcuts:
      Positive: "1"
      Negative: "2"
      Neutral: "3"
 
# Completion
completion:
  show_code: true
  code: "SENT2024"
  message: |
    ## Thank you for participating!
 
    Your completion code is: **{code}**
 
    Please return to Prolific and enter this code to receive payment.

Full Example: MTurk HIT

yaml
task_name: "Image Classification HIT"
 
crowdsourcing:
  platform: mturk
  enabled: true
  url_params:
    - workerId
    - assignmentId
    - hitId
  # Time constraints
  min_time_per_instance: 3
  max_time_total: 1800

allow_all_users: true
instances_per_annotator: 20
 
# MTurk form submission
completion:
  mturk_submit: true
  submit_url: "https://www.mturk.com/mturk/externalSubmit"
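  # Note: in the sandbox, submissions go to
  # https://workersandbox.mturk.com/mturk/externalSubmit;
  # the turkSubmitTo URL parameter carries the correct host.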
 
annotation_schemes:
  - annotation_type: radio
    name: category
    description: "What is shown in this image?"
    labels:
      - Cat
      - Dog
      - Bird
      - Other

Monitoring Workers

Admin Dashboard

yaml
admin_users:
  - researcher@university.edu
 
admin_dashboard:
  enabled: true
  show_worker_stats: true

View at /admin to see:

  • Worker completion rates
  • Average time per instance
  • Gold accuracy scores
  • Attention check results

Export Worker Data

bash
potato export-workers config.yaml --output workers.csv
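
The export can then be analyzed with standard tooling. A quick sketch using Python's csv module; the column names (worker_id, gold_accuracy) are hypothetical, so check the actual header of your export first:

python
import csv

# Column names below are hypothetical; inspect workers.csv for the
# actual header produced by your Potato version.
with open("workers.csv", newline="") as f:
    for row in csv.DictReader(f):
        if float(row["gold_accuracy"]) < 70:
            print("Below gold threshold:", row["worker_id"])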

Best Practices

  1. Test thoroughly - Run a pilot with a small group first
  2. Set fair pay - Estimate the task time and pay fairly
  3. Clear instructions - Include examples and edge cases
  4. Use attention checks - Catch random clicking
  5. Include gold questions - Verify understanding
  6. Monitor in real-time - Watch for issues early
  7. Plan for rejection - Set clear quality criteria upfront
  8. Communicate issues - Contact workers about problems
  9. Iterate on feedback - Improve based on worker comments
  10. Export data regularly - Don't wait until the end

Further Reading

For implementation details, see the source documentation.