MTurk Integration

Deploy annotation tasks on Amazon Mechanical Turk.

This guide explains how to deploy Potato annotation tasks on Amazon Mechanical Turk (MTurk).

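Overview

Potato integrates with MTurk through the ExternalQuestion HIT type:

1. You create an ExternalQuestion HIT on MTurk that points to your Potato server
2. A worker clicks your HIT and is sent to your Potato server, which MTurk displays in a frame
3. Potato extracts the worker ID and the other parameters from the URL
4. The worker completes the annotation task
5. When finished, the worker clicks "Submit HIT to MTurk"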

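URL Parameters

MTurk passes four parameters to your ExternalQuestion URL:

Parameter	Description
`workerId`	The worker's unique MTurk identifier
`assignmentId`	Unique ID for this worker-HIT pair (`ASSIGNMENT_ID_NOT_AVAILABLE` while the worker is only previewing the HIT)
`hitId`	The HIT identifier
`turkSubmitTo`	Base URL of the MTurk site; the completion form is POSTed to its `/mturk/externalSubmit` path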
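
To make the parameter handling concrete, here is a minimal sketch of how a server could read these four values from an incoming URL and detect preview mode. It uses only the Python standard library; the function names `parse_mturk_params` and `is_preview` are hypothetical and are not part of Potato.

```python
from urllib.parse import urlparse, parse_qs

# Value MTurk sends as assignmentId while a worker is only previewing the HIT.
PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"

MTURK_PARAMS = ("workerId", "assignmentId", "hitId", "turkSubmitTo")


def parse_mturk_params(url: str) -> dict:
    """Return the four MTurk parameters found in `url` (None if absent)."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {name: query.get(name, [None])[0] for name in MTURK_PARAMS}


def is_preview(params: dict) -> bool:
    """True when the worker has not accepted the HIT yet."""
    return params.get("assignmentId") == PREVIEW_ASSIGNMENT_ID


params = parse_mturk_params(
    "https://your-server.com:8080/?workerId=TEST_WORKER"
    "&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"
    "&turkSubmitTo=https%3A%2F%2Fworkersandbox.mturk.com"
)
print(params["workerId"], is_preview(params))
```

Checking for the preview value before creating a user session avoids recording annotations from workers who have not accepted the HIT.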

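Prerequisites

Server Requirements

A publicly accessible server with:
- An open port (typically 8080 or 443)
- HTTPS, recommended (some browsers require it, because they block plain-HTTP content framed inside an HTTPS page)
- A stable network connection
A Python environment with Potato installed

MTurk Requirements

An MTurk Requester account: sign up at requester.mturk.com
A funded account: production requires a prepaid balance (the sandbox is free)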

Quick Start

Step 1: Create the Potato Configuration

yaml

# mturk_task.yaml
annotation_task_name: "Sentiment Classification"
task_description: "Classify the sentiment of short text snippets."
 
# MTurk login configuration
login:
  type: url_direct
  url_argument: workerId
 
# Optional completion code
completion_code: "TASK_COMPLETE"
 
# Crowdsourcing settings
hide_navbar: true
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
 
# Data files
data_files:
  - data/items.json
 
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - positive
      - neutral
      - negative
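
The two limits in the configuration above determine how many worker sessions the task needs. The following sketch works through that arithmetic. It assumes every worker completes exactly `max_annotations_per_user` items, which real workers may not do, and the dataset size of 500 is a made-up figure for illustration.

```python
import math


def assignments_needed(num_items: int,
                       max_annotations_per_item: int,
                       max_annotations_per_user: int) -> int:
    """Minimum number of worker sessions needed to fully annotate the data."""
    total_annotations = num_items * max_annotations_per_item
    return math.ceil(total_annotations / max_annotations_per_user)


# 500 is a hypothetical dataset size; 3 and 10 come from the config above.
print(assignments_needed(500, 3, 10))  # 150
```

In practice it is reasonable to request somewhat more assignments than this minimum, since some workers return a HIT or stop early.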

Step 2: Start the Server

bash

# Start the server
potato start mturk_task.yaml -p 8080
 
# Or with HTTPS (recommended)
potato start mturk_task.yaml -p 443 --ssl-cert cert.pem --ssl-key key.pem

Step 3: Create the HIT on MTurk

Create an ExternalQuestion HIT using the following XML template:

xml

<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>

Important: inside the XML, write `&amp;` instead of a bare `&`.
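
One way to avoid escaping mistakes is to generate the XML rather than write it by hand. The sketch below uses `xml.sax.saxutils.escape` from the standard library, which converts `&`, `<` and `>` to their entities. The helper name `build_external_question` is hypothetical, and the example URL carries placeholder query parameters only.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

SCHEMA = ("http://mechanicalturk.amazonaws.com/"
          "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd")


def build_external_question(url: str, frame_height: int = 800) -> str:
    """Wrap `url` in ExternalQuestion XML, escaping &, < and > in the URL."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<ExternalQuestion xmlns="{SCHEMA}">\n'
        f'  <ExternalURL>{escape(url)}</ExternalURL>\n'
        f'  <FrameHeight>{frame_height}</FrameHeight>\n'
        '</ExternalQuestion>'
    )


xml_text = build_external_question("https://your-server.com:8080/?a=1&b=2")
print(xml_text)

# Round-trip check: an XML parser turns &amp; back into a plain &.
root = ET.fromstring(xml_text.encode("utf-8"))
print(root[0].text)
```

The round-trip at the end is a quick way to confirm the document is well-formed before sending it to MTurk.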

Configuration Reference

Required Settings

yaml

login:
  type: url_direct      # Required: enables URL-based authentication
  url_argument: workerId  # Required: MTurk uses 'workerId' parameter

Testing in the Sandbox

Always test in the MTurk sandbox before going live in production.

Sandbox URLs

Service	URL
Requester	https://requestersandbox.mturk.com
Worker	https://workersandbox.mturk.com
API endpoint	https://mturk-requester-sandbox.us-east-1.amazonaws.com
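
When scripting against the API, switching between sandbox and production comes down to the endpoint URL. The sketch below shows one way to keep that choice in a single place. The helper name `mturk_client_kwargs` is hypothetical; the production endpoint shown is the standard MTurk requester endpoint for `us-east-1`.

```python
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
PRODUCTION_ENDPOINT = "https://mturk-requester.us-east-1.amazonaws.com"


def mturk_client_kwargs(sandbox: bool = True) -> dict:
    """Keyword arguments for boto3.client(), defaulting to the sandbox."""
    return {
        "service_name": "mturk",
        "region_name": "us-east-1",
        "endpoint_url": SANDBOX_ENDPOINT if sandbox else PRODUCTION_ENDPOINT,
    }


print(mturk_client_kwargs()["endpoint_url"])
```

With boto3 installed, `boto3.client(**mturk_client_kwargs(sandbox=True))` would create a sandbox client. Defaulting to the sandbox means production spending has to be requested explicitly.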

Local Testing

To test the MTurk URL parameters locally:

bash

# Test normal workflow
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"
 
# Test preview mode
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=TEST_HIT"

MTurk API Integration (Optional)

For advanced features, enable the MTurk API integration:

bash

pip install boto3

Create configs/mturk_config.yaml:

yaml

aws_access_key_id: "YOUR_ACCESS_KEY"
aws_secret_access_key: "YOUR_SECRET_KEY"
sandbox: true  # Set to false for production
hit_id: "YOUR_HIT_ID"

Enable it in the main configuration:

yaml

mturk:
  enabled: true
  config_file_path: configs/mturk_config.yaml

Creating HITs Programmatically

python

import boto3
 
mturk = boto3.client(
    'mturk',
    region_name='us-east-1',
    endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com'
)
 
question_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>'''
 
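response = mturk.create_hit(
    Title='Sentiment Classification Task',
    Description='Classify the sentiment of short text snippets.',
    Keywords='sentiment, classification, text',
    Reward='0.50',                      # USD per assignment, passed as a string
    MaxAssignments=100,                 # how many workers may complete the HIT
    LifetimeInSeconds=86400,            # HIT stays available for 1 day
    AssignmentDurationInSeconds=3600,   # 1 hour to finish after accepting
    AutoApprovalDelayInSeconds=604800,  # auto-approve after 7 days
    Question=question_xml
)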
 
print(f"Created HIT: {response['HIT']['HITId']}")
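
Before running a HIT like the one above in production, it helps to estimate what it will cost. The sketch below multiplies the reward by the number of assignments and adds requester fees. The fee rates are assumptions (a 20% commission, plus a further 20% for HITs with 10 or more assignments) and should be checked against the current MTurk pricing page. The function name `estimate_hit_cost` is hypothetical.

```python
from decimal import Decimal


def estimate_hit_cost(reward: str, max_assignments: int,
                      base_fee: str = "0.20",
                      bulk_fee: str = "0.20",
                      bulk_threshold: int = 10) -> Decimal:
    """Rewards plus requester fees for one HIT, in USD (fee rates assumed)."""
    rewards = Decimal(reward) * max_assignments
    fee_rate = Decimal(base_fee)
    if max_assignments >= bulk_threshold:
        # Assumed surcharge for HITs with many assignments.
        fee_rate += Decimal(bulk_fee)
    return rewards + rewards * fee_rate


# Same Reward and MaxAssignments as the create_hit call above:
# 50.00 in rewards plus 20.00 in assumed fees.
print(estimate_hit_cost("0.50", 100))
```

`Decimal` is used instead of `float` so that cent amounts are not distorted by binary rounding.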

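Best Practices

Task Design

Clear instructions: provide detailed examples
Reasonable time: don't rush workers
Fair pay: at least the equivalent of minimum wage ($12-15/hour)
Appropriate length: 5-15 minutes per HIT is ideal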
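
The pay and length guidelines can be checked against each other with simple arithmetic. The sketch below converts between a per-HIT reward and an effective hourly rate; both function names are hypothetical. As an illustration, a $0.50 reward for a HIT that takes 10 minutes works out to $3/hour, well below the range suggested above.

```python
def hourly_rate(reward_usd: float, minutes_per_hit: float) -> float:
    """Effective hourly pay for a HIT that takes `minutes_per_hit` minutes."""
    return reward_usd * 60 / minutes_per_hit


def reward_for(target_hourly_usd: float, minutes_per_hit: float) -> float:
    """Per-HIT reward (rounded to cents) that reaches a target hourly rate."""
    return round(target_hourly_usd * minutes_per_hit / 60, 2)


print(hourly_rate(0.50, 10))  # 3.0
print(reward_for(12, 10))     # 2.0
```

Timing the task yourself, or in a small sandbox pilot, gives a realistic value for `minutes_per_hit`.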

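Quality Control

Qualification tests: pre-screen workers
Attention checks: include verification questions
Redundant annotation: multiple workers per item (3+ recommended)
Spot checks: manually review a sample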
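
Redundant annotation only pays off once the labels are combined. A common and simple approach is majority voting, sketched below. This is an illustrative aggregation, not something Potato is stated to perform, and the function name `majority_label` is hypothetical. Ties return `None` so that those items can go to manual review.

```python
from collections import Counter
from typing import List, Optional


def majority_label(labels: List[str]) -> Optional[str]:
    """Most common label, or None when there is no unique winner."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: flag the item for manual review
    return counts[0][0]


print(majority_label(["positive", "positive", "neutral"]))  # positive
print(majority_label(["positive", "neutral", "negative"]))  # None
```

With three annotators and three labels, a three-way split is possible, which is one reason to use an odd number of workers and a review step.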

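Technical

Handle edge cases: workers may refresh the page or navigate back
Save progress: autosave whenever possible
Graceful error handling: show helpful error messages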

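Troubleshooting

Workers still see the preview page after accepting

Verify that the assignmentId parameter is passed correctly
The preview page refreshes automatically; ask the worker to wait a moment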

Submit button does not work

Check the browser console for errors
Verify that the turkSubmitTo parameter is present
Check for CORS or mixed-content problems
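
When debugging submission, it helps to know what a correct request looks like. MTurk expects the completion form to be POSTed to the `/mturk/externalSubmit` path under the `turkSubmitTo` base URL, with `assignmentId` among the fields. The sketch below builds that target; the function name is hypothetical, and the rule that at least one field besides `assignmentId` must be present is an assumption worth verifying against the MTurk documentation.

```python
from typing import Dict, Tuple

PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def build_submit_request(turk_submit_to: str, assignment_id: str,
                         answers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return the POST URL and form fields for completing an assignment."""
    if not turk_submit_to:
        raise ValueError("turkSubmitTo is missing from the URL")
    if assignment_id == PREVIEW_ASSIGNMENT_ID:
        raise ValueError("worker is in preview mode; nothing to submit")
    if not answers:
        # Assumption: MTurk wants at least one field besides assignmentId.
        raise ValueError("at least one answer field is required")
    url = turk_submit_to.rstrip("/") + "/mturk/externalSubmit"
    fields = {"assignmentId": assignment_id, **answers}
    return url, fields


url, fields = build_submit_request(
    "https://workersandbox.mturk.com", "TEST_ASSIGNMENT",
    {"completion_code": "TASK_COMPLETE"},
)
print(url)
```

Comparing this expected URL and field set with what the browser's network tab shows can narrow down whether the problem is a missing parameter or a blocked request.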

Workers cannot log in

Verify that login.url_argument is set to workerId
Make sure login.type is url_direct
