# MTurk Integration

Source: https://www.potatoannotator.com/docs/deployment/mturk-integration

This guide explains how to deploy Potato annotation tasks on Amazon Mechanical Turk (MTurk).

## Overview

Potato integrates with MTurk through the External Question HIT type:

1. You create an External Question HIT on MTurk pointing to your Potato server
2. Workers click on your HIT and are redirected to your Potato server
3. Potato extracts the worker ID and other parameters from the URL
4. Workers complete the annotation task
5. Upon completion, workers click "Submit HIT to MTurk", which posts the completion form to the `turkSubmitTo` URL

### URL Parameters

MTurk passes four parameters to your External Question URL:

| Parameter | Description |
|-----------|-------------|
| `workerId` | Worker's unique MTurk identifier |
| `assignmentId` | Unique ID for this worker-HIT pair |
| `hitId` | The HIT identifier |
| `turkSubmitTo` | URL where completion form should POST |
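
As a minimal sketch of how these parameters can be read from an incoming URL, the snippet below uses only the Python standard library. The helper names `parse_mturk_params` and `is_preview` are hypothetical and are not part of Potato; the `ASSIGNMENT_ID_NOT_AVAILABLE` value is the one used for preview mode in the local testing section of this guide.

```python
from urllib.parse import urlparse, parse_qs

# Value sent as assignmentId while a worker is only previewing the HIT.
PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def parse_mturk_params(url):
    """Return the four MTurk parameters found in `url` (None when absent)."""
    query = parse_qs(urlparse(url).query)
    names = ("workerId", "assignmentId", "hitId", "turkSubmitTo")
    return {name: query.get(name, [None])[0] for name in names}


def is_preview(params):
    """True when the worker has not accepted the HIT yet."""
    return params["assignmentId"] == PREVIEW_ASSIGNMENT_ID


params = parse_mturk_params(
    "https://your-server.com/?workerId=TEST_WORKER"
    "&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"
    "&turkSubmitTo=https%3A%2F%2Fworkersandbox.mturk.com"
)
print(params["workerId"], is_preview(params))  # TEST_WORKER False
```

Returning `None` for a missing parameter makes it easy to tell a direct visit (no MTurk parameters at all) from a preview (an `assignmentId` that holds the placeholder value).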

## Prerequisites

### Server Requirements

1. **Publicly accessible server** with:
   - Open port (typically 8080 or 443)
   - HTTPS recommended (the MTurk worker site is served over HTTPS, so browsers can block a plain-HTTP page inside its frame as mixed content)
   - Stable internet connection

2. **Python environment** with Potato installed

### MTurk Requirements

1. **MTurk Requester Account**: Sign up at [requester.mturk.com](https://requester.mturk.com)
2. **Funded Account**: Add funds for production (sandbox is free)

## Quick Start

### Step 1: Create Your Potato Configuration

```yaml
# mturk_task.yaml
annotation_task_name: "Sentiment Classification"
task_description: "Classify the sentiment of short text snippets."

# MTurk login configuration
login:
  type: url_direct
  url_argument: workerId

# Optional completion code
completion_code: "TASK_COMPLETE"

# Crowdsourcing settings
hide_navbar: true
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3

# Data files
data_files:
  - data/items.json

# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - positive
      - neutral
      - negative
```

### Step 2: Start Your Server

```bash
# Start the server
potato start mturk_task.yaml -p 8080

# Or with HTTPS (recommended)
potato start mturk_task.yaml -p 443 --ssl-cert cert.pem --ssl-key key.pem
```

### Step 3: Create Your HIT on MTurk

Create an External Question HIT using this XML template:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
```

**Important**: A bare `&` is not valid inside XML text, so write each `&` in the query string as `&amp;`.
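
If you generate the question XML from code, escaping the URL programmatically avoids forgetting an `&amp;`. The following is a sketch under the assumption that you build the XML as a string. `build_external_question` is a hypothetical helper; `escape` comes from the Python standard library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SCHEMA = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def build_external_question(url, frame_height=800):
    """Wrap `url` in ExternalQuestion XML, escaping `&`, `<` and `>`."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<ExternalQuestion xmlns="{SCHEMA}">\n'
        f"  <ExternalURL>{escape(url)}</ExternalURL>\n"
        f"  <FrameHeight>{frame_height}</FrameHeight>\n"
        "</ExternalQuestion>"
    )


raw_url = "https://your-server.com:8080/?workerId=${workerId}&hitId=${hitId}"
question_xml = build_external_question(raw_url)

# Parsing the result confirms it is well-formed and round-trips the URL.
root = ET.fromstring(question_xml)
url_back = root.find(f"{{{SCHEMA}}}ExternalURL").text
print(url_back == raw_url)  # True
```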

## Configuration Reference

### Required Settings

```yaml
login:
  type: url_direct      # Required: enables URL-based authentication
  url_argument: workerId  # Required: MTurk uses 'workerId' parameter
```

### Recommended Settings

```yaml
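hide_navbar: true               # Prevent workers from skipping
jumping_to_id_disabled: true    # Disable jumping directly to an item by ID
assignment_strategy: random     # Assign items to workers at random
max_annotations_per_user: 10    # Cap on the number of items each worker annotates
max_annotations_per_item: 3     # Number of workers who annotate each item
task_description: "Brief description for the preview page."
completion_code: "YOUR_CODE"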
```
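
The two `max_annotations_*` settings together determine roughly how many workers a task needs. As a back-of-the-envelope sketch (the function name `workers_needed` is hypothetical, and the result is a lower bound that assumes every worker completes their full quota):

```python
import math


def workers_needed(num_items, per_item=3, per_user=10):
    """Minimum number of workers needed to annotate every item `per_item` times."""
    total_annotations = num_items * per_item
    return math.ceil(total_annotations / per_user)


# 500 items * 3 annotations each = 1500 annotations, at 10 per worker.
print(workers_needed(500))  # 150
```

In practice some workers stop early, so it is reasonable to plan for more assignments than this floor.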

## Testing in Sandbox

Always test in the MTurk Sandbox before going to production.

### Sandbox URLs

| Service | URL |
|---------|-----|
| Requester | https://requestersandbox.mturk.com |
| Worker | https://workersandbox.mturk.com |
| API Endpoint | https://mturk-requester-sandbox.us-east-1.amazonaws.com |
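
When scripting against the API, a single flag can select the endpoint so the same code runs in sandbox and production. This is a sketch: the helper names are hypothetical, and the production endpoint shown is the standard MTurk requester endpoint, which is not listed in the table above.

```python
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
# Assumption: standard production endpoint, not taken from this guide.
PRODUCTION_ENDPOINT = "https://mturk-requester.us-east-1.amazonaws.com"


def mturk_endpoint(sandbox=True):
    """Return the API endpoint, defaulting to the sandbox for safety."""
    return SANDBOX_ENDPOINT if sandbox else PRODUCTION_ENDPOINT


def make_client(sandbox=True):
    """Create a boto3 MTurk client for the chosen environment."""
    import boto3  # imported lazily so the helpers above work without boto3

    return boto3.client(
        "mturk",
        region_name="us-east-1",
        endpoint_url=mturk_endpoint(sandbox),
    )


print(mturk_endpoint())  # https://mturk-requester-sandbox.us-east-1.amazonaws.com
```

Defaulting to the sandbox means a forgotten argument cannot spend real funds.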

### Local Testing

Test the MTurk URL parameters locally:

```bash
# Test normal workflow
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"

# Test preview mode (assignmentId is ASSIGNMENT_ID_NOT_AVAILABLE until the worker accepts the HIT)
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=TEST_HIT"
```

## MTurk API Integration (Optional)

For advanced features, enable MTurk API integration. First install `boto3`:

```bash
pip install boto3
```

Create `configs/mturk_config.yaml`:

```yaml
aws_access_key_id: "YOUR_ACCESS_KEY"
aws_secret_access_key: "YOUR_SECRET_KEY"
sandbox: true  # Set to false for production
hit_id: "YOUR_HIT_ID"
```

Enable in your main config:

```yaml
mturk:
  enabled: true
  config_file_path: configs/mturk_config.yaml
```

### Creating HITs Programmatically

```python
import boto3

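# Sandbox client. No keys are passed here, so boto3 reads credentials from its
# standard sources (environment variables or ~/.aws/credentials).
mturk = boto3.client(
    'mturk',
    region_name='us-east-1',
    endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com'
)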

question_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>'''

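response = mturk.create_hit(
    Title='Sentiment Classification Task',
    Description='Classify the sentiment of short text snippets.',
    Keywords='sentiment, classification, text',
    Reward='0.50',                      # USD per assignment, passed as a string
    MaxAssignments=100,                 # Number of workers who can complete the HIT
    LifetimeInSeconds=86400,            # HIT stays available for 24 hours
    AssignmentDurationInSeconds=3600,   # Each worker has 1 hour to finish
    AutoApprovalDelayInSeconds=604800,  # Auto-approve after 7 days
    Question=question_xml
)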

print(f"Created HIT: {response['HIT']['HITId']}")
```
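
Once workers have submitted, the same client can list and approve their assignments. The sketch below is one possible approach rather than Potato's own workflow. It uses the boto3 calls `list_assignments_for_hit` and `approve_assignment`; the function names `submitted_assignments` and `approve_all` are hypothetical.

```python
def submitted_assignments(client, hit_id):
    """Yield every submitted assignment for `hit_id`, following pagination."""
    kwargs = {
        "HITId": hit_id,
        "AssignmentStatuses": ["Submitted"],
        "MaxResults": 100,
    }
    while True:
        page = client.list_assignments_for_hit(**kwargs)
        yield from page.get("Assignments", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def approve_all(client, hit_id):
    """Approve each submitted assignment and return the approved worker IDs."""
    approved_workers = []
    for assignment in submitted_assignments(client, hit_id):
        client.approve_assignment(AssignmentId=assignment["AssignmentId"])
        approved_workers.append(assignment["WorkerId"])
    return approved_workers


# Usage, assuming `mturk` is a boto3 MTurk client and the HIT ID is known:
# approve_all(mturk, "YOUR_HIT_ID")
```

Passing the client in as an argument keeps the functions easy to test against the sandbox before they touch production assignments.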

## Best Practices

### Task Design

1. **Clear Instructions**: Provide detailed examples
2. **Reasonable Time**: Don't rush workers
3. **Fair Pay**: At least minimum wage equivalent ($12-15/hour)
4. **Manageable Length**: 5-15 minutes per HIT is ideal
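
To turn a target hourly rate into a per-HIT reward, multiply the rate by the expected completion time. A small sketch, with the hypothetical helper `reward_for` returning a string in the form that `create_hit` expects for `Reward`:

```python
from decimal import Decimal, ROUND_UP


def reward_for(hourly_rate, minutes):
    """Reward per HIT in USD, rounded up to the cent, as a string."""
    amount = Decimal(str(hourly_rate)) * Decimal(str(minutes)) / Decimal(60)
    return str(amount.quantize(Decimal("0.01"), rounding=ROUND_UP))


print(reward_for(15, 10))  # 15 * 10 / 60 -> 2.50
print(reward_for(12, 5))   # 12 * 5 / 60  -> 1.00
```

Measure the completion time in a pilot run rather than estimating it, since workers are often slower than the task designer.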

### Quality Control

1. **Qualification Tests**: Screen workers upfront
2. **Attention Checks**: Include verification questions
3. **Redundancy**: Multiple workers per item (3+ recommended)
4. **Review Samples**: Manually check a subset
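
With three or more annotations per item, a simple way to use the redundancy is a majority vote. The following sketch assumes you have already collected the labels into a dictionary keyed by item ID. That data layout and the function `majority_label` are illustrative, not Potato's output format.

```python
from collections import Counter


def majority_label(labels, min_agreement=2):
    """Return the most common label, or None on a tie or low agreement."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    top_label, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    if tied or top_count < min_agreement:
        return None
    return top_label


annotations = {
    "item_1": ["positive", "positive", "neutral"],
    "item_2": ["positive", "neutral", "negative"],
}
resolved = {item: majority_label(labels) for item, labels in annotations.items()}
print(resolved)  # {'item_1': 'positive', 'item_2': None}
```

Items that resolve to `None` are good candidates for the manual review sample.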

### Technical

1. **Handle Edge Cases**: Workers may reload or go back
2. **Save Progress**: Autosave if possible
3. **Graceful Errors**: Show helpful error messages

## Troubleshooting

### Workers See Preview Page After Accepting

- Verify `assignmentId` parameter is being passed correctly
- The preview page auto-refreshes; ask workers to wait

### Submit Button Doesn't Work

- Check browser console for errors
- Verify `turkSubmitTo` parameter is present
- Check for CORS or mixed-content issues
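
The last two checks can be scripted. MTurk expects the completion form to post to `/mturk/externalSubmit` on the host given in `turkSubmitTo`. The sketch below builds that URL and flags the two most common causes of a dead submit button. Both helper names are hypothetical, and how Potato builds its own form is not covered in this guide.

```python
from urllib.parse import urlparse


def external_submit_url(turk_submit_to):
    """Build the form action URL from the `turkSubmitTo` value."""
    return turk_submit_to.rstrip("/") + "/mturk/externalSubmit"


def submit_problems(page_url, turk_submit_to):
    """Return a list of likely reasons the submit button would fail."""
    problems = []
    if not turk_submit_to:
        problems.append("turkSubmitTo is missing from the URL")
    if urlparse(page_url).scheme != "https":
        problems.append("page is not served over HTTPS (mixed content)")
    return problems


print(external_submit_url("https://workersandbox.mturk.com"))
# https://workersandbox.mturk.com/mturk/externalSubmit
print(submit_problems("http://your-server.com:8080/", None))
```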

### Workers Can't Log In

- Verify `login.url_argument` is set to `workerId`
- Ensure `login.type` is `url_direct`

## Further Reading

- [Crowdsourcing Integration](/docs/deployment/crowdsourcing) - General crowdsourcing setup
- [Quality Control](/docs/features/quality-control) - Attention checks and gold standards
- [Task Assignment](/docs/features/task-assignment) - Assignment strategies

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/main/docs/mturk_integration.md).