# Crowdsourcing Integration

Source: https://www.potatoannotator.com/docs/deployment/crowdsourcing

Potato integrates with crowdsourcing platforms such as Prolific and Amazon Mechanical Turk (MTurk) for large-scale annotation tasks.

## Prolific Integration

### Basic Setup

```yaml
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "POTATO2024"  # Code shown on completion
```

### URL Parameters

Prolific passes participant info via URL parameters:

```yaml
crowdsourcing:
  platform: prolific
  url_params:
    - PROLIFIC_PID    # Participant ID
    - STUDY_ID        # Study ID
    - SESSION_ID      # Session ID
```

Workers then reach the task through a URL like:
```
https://your-server.com/?PROLIFIC_PID=xxx&STUDY_ID=xxx&SESSION_ID=xxx
```
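The sketch below reads the three `url_params` values from a landing URL using Python's standard `urllib.parse`. `extract_params` is a hypothetical helper for illustration, not Potato's internal code.

```python
from urllib.parse import urlparse, parse_qs

# Parameter names from the url_params list above.
PROLIFIC_PARAMS = ("PROLIFIC_PID", "STUDY_ID", "SESSION_ID")


def extract_params(url: str, names=PROLIFIC_PARAMS) -> dict:
    """Return the requested query parameters; raise ValueError if any is missing."""
    # parse_qs drops blank values by default, so "?STUDY_ID=" counts as missing.
    query = parse_qs(urlparse(url).query)
    params = {}
    for name in names:
        values = query.get(name)
        if not values:
            raise ValueError(f"missing URL parameter: {name}")
        params[name] = values[0]  # keep the first value if a key repeats
    return params


url = "https://your-server.com/?PROLIFIC_PID=abc&STUDY_ID=s1&SESSION_ID=x9"
print(extract_params(url))  # {'PROLIFIC_PID': 'abc', 'STUDY_ID': 's1', 'SESSION_ID': 'x9'}
```

Because the parameter names are an argument, the same helper works for the MTurk parameters described below.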

### Prolific Configuration

In your Prolific study settings:
1. Set **Study URL** to your Potato server
2. Add URL parameters: `?PROLIFIC_PID={{%PROLIFIC_PID%}}&STUDY_ID={{%STUDY_ID%}}&SESSION_ID={{%SESSION_ID%}}`
3. Set **Completion code** to the same value as `completion_code` in your config

### Validation

Enable `validate_participant` to verify Prolific participants:

```yaml
crowdsourcing:
  platform: prolific
  validate_participant: true
  completion_code: "POTATO2024"
```
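This page does not spell out what `validate_participant` checks. As one plausible illustration, the sketch below treats a participant as valid when `STUDY_ID` matches the expected study, `SESSION_ID` is non-empty, and `PROLIFIC_PID` has the 24-character hexadecimal form that Prolific participant IDs typically take. The function name and the ID-format rule are assumptions, not Potato's actual logic.

```python
import re

# Assumption: Prolific participant IDs are 24-character hexadecimal strings.
PID_PATTERN = re.compile(r"[0-9a-f]{24}")


def is_valid_participant(params: dict, expected_study_id: str) -> bool:
    """Hypothetical participant check; not Potato's validate_participant code."""
    pid = params.get("PROLIFIC_PID", "")
    if not PID_PATTERN.fullmatch(pid.lower()):
        return False
    if params.get("STUDY_ID") != expected_study_id:
        return False
    # SESSION_ID only needs to be present and non-empty here.
    return bool(params.get("SESSION_ID"))


good = {"PROLIFIC_PID": "5f3c9a2b1d4e6f7a8b9c0d1e", "STUDY_ID": "study-1", "SESSION_ID": "s"}
print(is_valid_participant(good, "study-1"))  # True
print(is_valid_participant({**good, "PROLIFIC_PID": "xxx"}, "study-1"))  # False
```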

## Amazon MTurk Integration

### Basic Setup

```yaml
crowdsourcing:
  platform: mturk
  enabled: true
```

### HIT Configuration

Create an External Question HIT pointing to your server. MTurk appends `assignmentId`, `hitId`, `turkSubmitTo`, and `workerId` to the `ExternalURL` as query parameters when it loads the task, so the URL does not need placeholders for them:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com/</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
```
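If you generate this XML programmatically, the document must stay well formed, so a task URL with its own query string needs `&` escaped as `&amp;`. A minimal sketch using the standard library's `xml.sax.saxutils.escape`; `external_question` and its default frame height are illustrative choices, and the resulting string is what the MTurk `CreateHIT` API takes as its `Question` parameter.

```python
from xml.sax.saxutils import escape

SCHEMA = ("http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/"
          "2006-07-14/ExternalQuestion.xsd")


def external_question(url: str, frame_height: int = 800) -> str:
    """Build an ExternalQuestion document, escaping &, <, and > in the URL."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<ExternalQuestion xmlns="{SCHEMA}">\n'
        f"  <ExternalURL>{escape(url)}</ExternalURL>\n"
        f"  <FrameHeight>{frame_height}</FrameHeight>\n"
        "</ExternalQuestion>"
    )


print(external_question("https://your-server.com/?task=sentiment&v=2"))
```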

### URL Parameters

```yaml
crowdsourcing:
  platform: mturk
  url_params:
    - workerId
    - assignmentId
    - hitId
```
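When a worker previews a HIT without accepting it, MTurk sets `assignmentId` to `ASSIGNMENT_ID_NOT_AVAILABLE`. The task can be shown in that state, but annotations should not be recorded. The hypothetical helper below sketches that check with the standard library and also reads `turkSubmitTo`, which MTurk supplies for form submission; it is not Potato's implementation.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

PREVIEW_ASSIGNMENT = "ASSIGNMENT_ID_NOT_AVAILABLE"


def mturk_session(url: str) -> dict:
    """Read MTurk's query parameters from the task URL and flag preview mode."""
    query = parse_qs(urlparse(url).query)

    def first(name: str) -> Optional[str]:
        # parse_qs maps each name to a list; keep the first value or None.
        values = query.get(name)
        return values[0] if values else None

    assignment_id = first("assignmentId")
    return {
        "workerId": first("workerId"),
        "assignmentId": assignment_id,
        "hitId": first("hitId"),
        "turkSubmitTo": first("turkSubmitTo"),
        # During a preview the task can be shown, but nothing should be saved.
        "preview": assignment_id in (None, PREVIEW_ASSIGNMENT),
    }


session = mturk_session(
    "https://your-server.com/?assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=H1"
)
print(session["preview"])  # True
```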

### Sandbox Testing

Test with MTurk Sandbox first:

```yaml
crowdsourcing:
  platform: mturk
  sandbox: true  # Use sandbox environment
```
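This page does not describe exactly what the `sandbox` flag changes inside Potato. For orientation, the hypothetical function below lists the MTurk hosts (us-east-1) that differ between the sandbox and production marketplaces:

```python
def mturk_endpoints(sandbox: bool) -> dict:
    """Return MTurk hosts for the sandbox or the production marketplace."""
    if sandbox:
        return {
            "api": "https://mturk-requester-sandbox.us-east-1.amazonaws.com",
            "worker_site": "https://workersandbox.mturk.com",
            "submit": "https://workersandbox.mturk.com/mturk/externalSubmit",
        }
    return {
        "api": "https://mturk-requester.us-east-1.amazonaws.com",
        "worker_site": "https://worker.mturk.com",
        "submit": "https://www.mturk.com/mturk/externalSubmit",
    }


print(mturk_endpoints(sandbox=True)["submit"])
```

With boto3, the `api` value is what you would pass as `endpoint_url` to `boto3.client("mturk", region_name="us-east-1", endpoint_url=...)` when creating sandbox HITs.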

## Worker Management

### Track Workers

```yaml
crowdsourcing:
  track_workers: true
  worker_id_field: worker_id
```

### Limit Instances Per Worker

```yaml
instances_per_annotator: 50
```

### Block Returning Workers

Prevent workers from retaking the task:

```yaml
crowdsourcing:
  prevent_retakes: true
```
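To show how `instances_per_annotator` and `prevent_retakes` interact, here is a hypothetical in-memory sketch. The class and method names are illustrative, and a real deployment would persist this state instead of keeping it in memory.

```python
class WorkerRegistry:
    """In-memory tracker for per-worker quotas and retakes (illustrative only)."""

    def __init__(self, instances_per_annotator: int, prevent_retakes: bool = True):
        self.cap = instances_per_annotator
        self.prevent_retakes = prevent_retakes
        self.served: dict[str, int] = {}  # worker_id -> instances handed out
        self.finished: set[str] = set()   # workers who reached their quota

    def can_start(self, worker_id: str) -> bool:
        """Turn away returning workers who already finished, if retakes are blocked."""
        return not (self.prevent_retakes and worker_id in self.finished)

    def request_instance(self, worker_id: str) -> bool:
        """Hand out one more instance; return False once the quota is used up."""
        served = self.served.get(worker_id, 0)
        if served >= self.cap:
            return False
        self.served[worker_id] = served + 1
        if served + 1 == self.cap:
            self.finished.add(worker_id)
        return True


registry = WorkerRegistry(instances_per_annotator=2)
print([registry.request_instance("w1") for _ in range(3)])  # [True, True, False]
print(registry.can_start("w1"))  # False
```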

## Quality Control

### Attention Checks

Insert attention-check items at regular intervals:

```yaml
attention_checks:
  enabled: true
  frequency: 10  # Every 10 instances
  fail_threshold: 2
  action: warn  # or 'block'
```
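A hypothetical sketch of the schedule and failure counting these settings suggest. It assumes `frequency: 10` means one check after every 10 regular instances and that `action` fires once failures reach `fail_threshold`; Potato's exact semantics may differ.

```python
from typing import Optional


def interleave_checks(instances: list, checks: list, frequency: int) -> list:
    """Place one attention check after every `frequency` regular instances."""
    sequence = []
    remaining_checks = iter(checks)
    for position, item in enumerate(instances, start=1):
        sequence.append(item)
        if position % frequency == 0:
            check = next(remaining_checks, None)
            if check is not None:  # stop inserting once the checks run out
                sequence.append(check)
    return sequence


def check_action(failures: int, fail_threshold: int, action: str = "warn") -> Optional[str]:
    """Return the configured action once failures reach fail_threshold."""
    return action if failures >= fail_threshold else None


items = [f"item{i}" for i in range(1, 21)]
sequence = interleave_checks(items, ["check_a", "check_b"], frequency=10)
print(sequence.index("check_a"), sequence.index("check_b"))  # 10 21
print(check_action(failures=2, fail_threshold=2))  # warn
```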

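### Gold Standard Questions

Gold items are data instances with a known answer. Mark each one with `is_gold: true` and its expected `gold_label`:
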
```json
{
  "id": "gold_1",
  "text": "The sky is typically blue during a clear day.",
  "gold_label": "True",
  "is_gold": true
}
```

Then enable gold questions in your config:

```yaml
quality_control:
  gold_questions: true
  gold_percentage: 10  # 10% of instances
  min_gold_accuracy: 70
```
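As an illustration of how `gold_percentage` and `min_gold_accuracy` could work together, the sketch below mixes gold items into a worker's queue and scores gold accuracy as a percentage. Both functions are hypothetical. `gold_percentage` is read here as a percentage of the regular instance count, and answers are assumed to be a dict keyed by instance `id`; the `gold_label` field follows the JSON item above.

```python
import random
from typing import Optional


def build_queue(regular: list, gold: list, gold_percentage: float, seed: int = 0) -> list:
    """Shuffle in gold items equal to about gold_percentage % of the regular count."""
    rng = random.Random(seed)
    n_gold = min(len(gold), round(len(regular) * gold_percentage / 100))
    queue = regular + rng.sample(gold, n_gold)
    rng.shuffle(queue)
    return queue


def gold_accuracy(answers: dict, gold_items: list) -> Optional[float]:
    """Percentage of answered gold items labeled correctly, or None if none were answered."""
    scored = [item for item in gold_items if item["id"] in answers]
    if not scored:
        return None
    correct = sum(answers[item["id"]] == item["gold_label"] for item in scored)
    return 100.0 * correct / len(scored)


gold = [
    {"id": "gold_1", "gold_label": "True", "is_gold": True},
    {"id": "gold_2", "gold_label": "False", "is_gold": True},
]
accuracy = gold_accuracy({"gold_1": "True", "gold_2": "True"}, gold)
print(accuracy, accuracy >= 70)  # 50.0 False
```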

### Time Limits

```yaml
crowdsourcing:
  min_time_per_instance: 5  # seconds
  max_time_total: 3600  # 1 hour
```
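A hypothetical sketch of the two checks, assuming `min_time_per_instance` flags instances answered faster than the minimum and `max_time_total` caps the whole session, both in seconds. The defaults mirror the values above.

```python
def check_timing(instance_seconds: list, session_seconds: float,
                 min_time_per_instance: float = 5, max_time_total: float = 3600) -> tuple:
    """Return (indices of rushed instances, whether the session exceeded the cap)."""
    rushed = [i for i, t in enumerate(instance_seconds) if t < min_time_per_instance]
    return rushed, session_seconds > max_time_total


print(check_timing([12.0, 2.5, 7.1, 1.0], session_seconds=900))  # ([1, 3], False)
```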

### Reject Low-Quality Work

```yaml
quality_control:
  auto_reject:
    enabled: true
    conditions:
      - gold_accuracy_below: 50
      - completion_time_under: 300  # seconds
```
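The list of conditions can be read in more than one way. The sketch below assumes a worker is rejected when *any* listed condition holds (OR semantics), which this page does not confirm. The function is hypothetical, and unknown condition keys raise `KeyError`.

```python
def should_reject(stats: dict, conditions: list) -> bool:
    """Return True if any condition matches; assumes OR semantics."""
    checks = {
        "gold_accuracy_below": lambda limit: stats["gold_accuracy"] < limit,
        "completion_time_under": lambda limit: stats["completion_seconds"] < limit,
    }
    # Each YAML list entry becomes a single-key dict such as {"gold_accuracy_below": 50}.
    for condition in conditions:
        for key, limit in condition.items():
            if checks[key](limit):
                return True
    return False


conditions = [{"gold_accuracy_below": 50}, {"completion_time_under": 300}]
print(should_reject({"gold_accuracy": 80, "completion_seconds": 240}, conditions))  # True
```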

## Completion Handling

### Show Completion Code

```yaml
completion:
  show_code: true
  code: "POTATO2024"
  message: "Thank you! Your completion code is: {code}"
```

### Redirect on Completion

```yaml
completion:
  redirect: true
  redirect_url: "https://app.prolific.com/submissions/complete?cc={code}"
```
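The `{code}` placeholder is presumably replaced with the configured completion code before the redirect. The sketch below shows one safe way to do that substitution with `str.format`, percent-encoding the code in case it contains reserved characters. `completion_redirect` is a hypothetical helper.

```python
from urllib.parse import quote


def completion_redirect(template: str, code: str) -> str:
    """Fill {code} in a redirect URL template, percent-encoding the code."""
    return template.format(code=quote(code, safe=""))


url = completion_redirect("https://app.prolific.com/submissions/complete?cc={code}", "POTATO2024")
print(url)  # https://app.prolific.com/submissions/complete?cc=POTATO2024
```

`str.format` treats every other `{` or `}` in the template as a placeholder, so any literal braces would need to be doubled (`{{`, `}}`).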

### Custom Completion Page

```yaml
completion:
  custom_template: templates/completion.html
```

## Payment Tiers

### Based on Quality

```yaml
payment:
  tiers:
    - name: bonus
      condition:
        gold_accuracy_above: 90
      amount: 0.50
    - name: standard
      condition:
        gold_accuracy_above: 70
      amount: 0.00
    - name: reject
      condition:
        gold_accuracy_below: 50
```
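The tiers above overlap and leave a gap. A worker at 95% satisfies both `gold_accuracy_above: 90` and `gold_accuracy_above: 70`, and a worker at 60% matches no tier. The sketch below assumes first-match-wins in list order, strict comparisons, and `None` for the uncovered range. All of these are assumptions, and the functions are hypothetical.

```python
from typing import Optional

# Mirrors the YAML tiers above.
TIERS = [
    {"name": "bonus", "condition": {"gold_accuracy_above": 90}, "amount": 0.50},
    {"name": "standard", "condition": {"gold_accuracy_above": 70}, "amount": 0.00},
    {"name": "reject", "condition": {"gold_accuracy_below": 50}},
]


def matches(condition: dict, accuracy: float) -> bool:
    """Check every bound present in the condition (strict comparisons)."""
    if "gold_accuracy_above" in condition and not accuracy > condition["gold_accuracy_above"]:
        return False
    if "gold_accuracy_below" in condition and not accuracy < condition["gold_accuracy_below"]:
        return False
    return True


def select_tier(accuracy: float, tiers: list = TIERS) -> Optional[dict]:
    """Return the first tier whose condition holds, or None if none does."""
    for tier in tiers:
        if matches(tier["condition"], accuracy):
            return tier
    return None


for acc in (95, 80, 60, 40):
    tier = select_tier(acc)
    print(acc, tier["name"] if tier else "no tier")
```

Running the loop prints `bonus`, `standard`, `no tier`, and `reject` for the four accuracies. The 60% case shows why it helps to define a tier for every range you expect.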

## Full Example: Prolific Study

```yaml
task_name: "Sentiment Analysis Study"

# Crowdsourcing settings
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "SENT2024"
  url_params:
    - PROLIFIC_PID
    - STUDY_ID
    - SESSION_ID
  prevent_retakes: true

# Open access for crowdworkers
allow_all_users: true

# Task assignment
instances_per_annotator: 50
annotation_per_instance: 3

# Quality control
attention_checks:
  enabled: true
  frequency: 10
  fail_threshold: 2

quality_control:
  gold_questions: true
  gold_percentage: 5
  min_gold_accuracy: 70

# Data
data_files:
  - path: data/main.json
    text_field: text

# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
    keyboard_shortcuts:
      Positive: "1"
      Negative: "2"
      Neutral: "3"

# Completion
completion:
  show_code: true
  code: "SENT2024"
  message: |
    ## Thank you for participating!

    Your completion code is: **{code}**

    Please return to Prolific and enter this code to receive payment.
```

## Full Example: MTurk HIT

```yaml
task_name: "Image Classification HIT"

crowdsourcing:
  platform: mturk
  enabled: true
  url_params:
    - workerId
    - assignmentId
    - hitId
  # Time constraints (one crowdsourcing block; a duplicate key would overwrite the first)
  min_time_per_instance: 3
  max_time_total: 1800

allow_all_users: true
instances_per_annotator: 20

# MTurk form submission
completion:
  mturk_submit: true
  submit_url: "https://www.mturk.com/mturk/externalSubmit"  # sandbox: https://workersandbox.mturk.com/mturk/externalSubmit

annotation_schemes:
  - annotation_type: radio
    name: category
    description: "What is shown in this image?"
    labels:
      - Cat
      - Dog
      - Bird
      - Other
```

## Monitoring Workers

### Admin Dashboard

```yaml
admin_users:
  - researcher@university.edu

admin_dashboard:
  enabled: true
  show_worker_stats: true
```

The dashboard at `/admin` shows:
- Worker completion rates
- Average time per instance
- Gold accuracy scores
- Attention check results
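A rough sketch of how the first three statistics could be aggregated from per-instance records. The record fields (`worker_id`, `seconds`, `is_gold`, `correct`) are hypothetical and do not describe Potato's storage format.

```python
from collections import defaultdict
from statistics import mean


def worker_stats(records: list, instances_per_annotator: int) -> dict:
    """Aggregate completion rate, mean time, and gold accuracy per worker."""
    by_worker = defaultdict(list)
    for record in records:
        by_worker[record["worker_id"]].append(record)

    stats = {}
    for worker_id, rows in by_worker.items():
        gold = [row["correct"] for row in rows if row.get("is_gold")]
        stats[worker_id] = {
            "completion_rate": len(rows) / instances_per_annotator,
            "avg_seconds": mean(row["seconds"] for row in rows),
            # None when the worker has not seen any gold items yet.
            "gold_accuracy": 100 * sum(gold) / len(gold) if gold else None,
        }
    return stats


records = [
    {"worker_id": "w1", "seconds": 10, "is_gold": True, "correct": True},
    {"worker_id": "w1", "seconds": 20},
]
print(worker_stats(records, instances_per_annotator=4))
```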

### Export Worker Data

```bash
potato export-workers config.yaml --output workers.csv
```

## Best Practices

1. **Test thoroughly** - Run a pilot with a small group first
2. **Set fair pay** - Estimate the time per task and pay a fair hourly rate
3. **Clear instructions** - Include examples and edge cases
4. **Use attention checks** - Catch random clicking
5. **Include gold questions** - Verify that workers understand the task
6. **Monitor in real time** - Catch problems early
7. **Plan for rejection** - Set clear quality criteria upfront
8. **Communicate issues** - Contact workers about problems
9. **Iterate on feedback** - Improve the task based on worker comments
10. **Export data regularly** - Don't wait until the end

## Further Reading

- [MTurk Integration](/docs/deployment/mturk-integration) - Detailed MTurk setup guide
- [Passwordless Login](/docs/features/passwordless-login) - URL-based authentication
- [Quality Control](/docs/features/quality-control) - Attention checks and gold standards

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/main/docs/crowdsourcing.md).