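Crowdsourcing Integration

Integrate with Prolific, MTurk, and other crowdsourcing platforms.

Potato integrates with crowdsourcing platforms such as Prolific and Amazon Mechanical Turk, so you can run annotation tasks at scale.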

Prolific Integration

Basic Setup

yaml

crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "POTATO2024"  # Code shown on completion

URL Parameters

Prolific passes participant information through URL parameters:

yaml

crowdsourcing:
  platform: prolific
  url_params:
    - PROLIFIC_PID    # Participant ID
    - STUDY_ID        # Study ID
    - SESSION_ID      # Session ID

Workers reach the task through a URL of this form:

text

https://your-server.com/?PROLIFIC_PID=xxx&STUDY_ID=xxx&SESSION_ID=xxx
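
To make the URL format concrete, here is a minimal Python sketch of how a server could read those three parameters from an incoming study URL. It is only an illustration: the helper name `extract_prolific_params` is hypothetical and is not part of Potato, which handles this step itself once `url_params` is configured.

```python
from urllib.parse import urlparse, parse_qs

# The three parameters listed under url_params above.
PROLIFIC_PARAMS = ("PROLIFIC_PID", "STUDY_ID", "SESSION_ID")


def extract_prolific_params(url: str) -> dict:
    """Return the Prolific parameters found in a study URL's query string.

    Hypothetical helper for illustration; not a Potato API.
    """
    query = parse_qs(urlparse(url).query)
    # parse_qs drops empty values, so an empty parameter counts as missing.
    missing = [name for name in PROLIFIC_PARAMS if name not in query]
    if missing:
        raise ValueError("missing URL parameters: " + ", ".join(missing))
    # parse_qs returns a list per key; keep the first value of each.
    return {name: query[name][0] for name in PROLIFIC_PARAMS}


params = extract_prolific_params(
    "https://your-server.com/?PROLIFIC_PID=abc123&STUDY_ID=study9&SESSION_ID=sess42"
)
print(params["PROLIFIC_PID"])
```

Rejecting requests with missing parameters early is one reasonable design choice, since a participant without a `PROLIFIC_PID` cannot be matched to a Prolific submission later.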

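Prolific Configuration

In your Prolific study settings:

Set the study URL to your Potato server
Add the URL parameters: ?PROLIFIC_PID={{%PROLIFIC_PID%}}&STUDY_ID={{%STUDY_ID%}}&SESSION_ID={{%SESSION_ID%}}
Set the completion code to match the one in your Potato configuration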

Validation

Validate Prolific participants:

yaml

crowdsourcing:
  platform: prolific
  validate_participant: true
  completion_code: "POTATO2024"

Amazon MTurk Integration

Basic Setup

yaml

crowdsourcing:
  platform: mturk
  enabled: true

HIT Configuration

Create an ExternalQuestion HIT that points to your server:

xml

<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
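
If you generate the HIT definition from a script, the `&` characters in the URL must be escaped as `&amp;`, as in the XML above. The following Python sketch builds the same document with the standard library, which handles the escaping automatically. The function name `build_external_question` is hypothetical and used here only for illustration.

```python
import xml.etree.ElementTree as ET

# Schema namespace copied from the ExternalQuestion example above.
SCHEMA = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def build_external_question(url: str, frame_height: int = 800) -> str:
    """Serialize an ExternalQuestion document for the given task URL.

    Hypothetical helper for illustration; not a Potato or MTurk API.
    """
    root = ET.Element("ExternalQuestion", xmlns=SCHEMA)
    # ElementTree escapes "&" as "&amp;" when serializing element text.
    ET.SubElement(root, "ExternalURL").text = url
    ET.SubElement(root, "FrameHeight").text = str(frame_height)
    return ET.tostring(root, encoding="unicode")


question_xml = build_external_question(
    "https://your-server.com/?workerId=${workerId}"
    "&assignmentId=${assignmentId}&hitId=${hitId}"
)
print(question_xml)
```

The resulting string is the kind of value you would pass as the question definition when creating the HIT.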

URL Parameters

yaml

crowdsourcing:
  platform: mturk
  url_params:
    - workerId
    - assignmentId
    - hitId

Sandbox Testing

Test in the MTurk sandbox before publishing to real workers:

yaml

crowdsourcing:
  platform: mturk
  sandbox: true  # Use sandbox environment

Worker Management

Tracking Workers

yaml

crowdsourcing:
  track_workers: true
  worker_id_field: worker_id

Limiting Instances per Worker

yaml

instances_per_annotator: 50

Blocking Repeat Workers

Prevent workers from taking the task more than once:

yaml

crowdsourcing:
  prevent_retakes: true

Quality Control

Attention Checks

Insert test questions:

yaml

attention_checks:
  enabled: true
  frequency: 10  # Every 10 instances
  fail_threshold: 2
  action: warn  # or 'block'
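
One plausible reading of these settings, sketched in Python: every `frequency`-th instance is an attention check, and once a worker has failed `fail_threshold` checks the configured `action` is applied. Both function names are hypothetical, and whether Potato counts positions from 1 or compares the threshold with `>=` is an assumption made for this sketch.

```python
from typing import Optional


def is_attention_check_position(index: int, frequency: int = 10) -> bool:
    """True when the 1-based instance index falls on an attention check.

    Assumes checks are placed at every multiple of `frequency`.
    """
    return index % frequency == 0


def action_for_failures(
    failures: int, fail_threshold: int = 2, action: str = "warn"
) -> Optional[str]:
    """Return the configured action once the failure count reaches the threshold.

    Assumes the threshold is inclusive (failures >= fail_threshold).
    """
    return action if failures >= fail_threshold else None


positions = [i for i in range(1, 31) if is_attention_check_position(i)]
print(positions)
print(action_for_failures(1), action_for_failures(2))
```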

Gold Standard Questions

json

{
  "id": "gold_1",
  "text": "The sky is typically blue during a clear day.",
  "gold_label": "True",
  "is_gold": true
}

yaml

quality_control:
  gold_questions: true
  gold_percentage: 10  # 10% of instances
  min_gold_accuracy: 70
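
To illustrate what `min_gold_accuracy` measures, the sketch below computes a worker's accuracy on gold items in the JSON format shown above and compares it with the threshold. The function name `gold_accuracy` and the shape of the `answers` mapping (instance ID to chosen label) are assumptions for this example, not Potato's internal representation.

```python
from typing import Optional


def gold_accuracy(instances: list, answers: dict) -> Optional[float]:
    """Percentage of gold items the worker labeled correctly.

    Returns None when the worker saw no gold items. Hypothetical helper.
    """
    gold = [item for item in instances if item.get("is_gold")]
    if not gold:
        return None
    # An unanswered gold item counts as incorrect.
    correct = sum(
        1 for item in gold if answers.get(item["id"]) == item["gold_label"]
    )
    return 100.0 * correct / len(gold)


instances = [
    {"id": "gold_1", "text": "The sky is typically blue during a clear day.",
     "gold_label": "True", "is_gold": True},
    {"id": "gold_2", "text": "Fish typically live on land.",
     "gold_label": "False", "is_gold": True},
    {"id": "item_1", "text": "An ordinary, non-gold instance."},
]
answers = {"gold_1": "True", "gold_2": "True", "item_1": "False"}

accuracy = gold_accuracy(instances, answers)
passes = accuracy is not None and accuracy >= 70  # min_gold_accuracy: 70
print(accuracy, passes)
```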

Time Limits

yaml

crowdsourcing:
  min_time_per_instance: 5  # seconds
  max_time_total: 3600  # 1 hour

Rejecting Low-Quality Work

yaml

quality_control:
  auto_reject:
    enabled: true
    conditions:
      - gold_accuracy_below: 50
      - completion_time_under: 300  # seconds
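
The configuration does not say how the listed conditions combine. The Python sketch below assumes that a submission is rejected when any one condition holds, which is the stricter reading. The function name `should_reject` is hypothetical.

```python
def should_reject(
    gold_accuracy: float,
    completion_seconds: float,
    *,
    gold_accuracy_below: float = 50,
    completion_time_under: float = 300,
) -> bool:
    """Return True when at least one auto-reject condition is met.

    Assumes the conditions are combined with OR; this is an assumption,
    not documented Potato behavior.
    """
    too_inaccurate = gold_accuracy < gold_accuracy_below
    too_fast = completion_seconds < completion_time_under
    return too_inaccurate or too_fast


print(should_reject(gold_accuracy=80, completion_seconds=900))
print(should_reject(gold_accuracy=40, completion_seconds=900))
print(should_reject(gold_accuracy=80, completion_seconds=120))
```

If your study should reject only when both conditions hold, replace `or` with `and` in the final expression.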

Completion Handling

Showing a Completion Code

yaml

completion:
  show_code: true
  code: "POTATO2024"
  message: "Thank you! Your completion code is: {code}"

Redirecting After Completion

yaml

completion:
  redirect: true
  redirect_url: "https://prolific.co/submissions/complete?cc={code}"
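
The `{code}` placeholder in `redirect_url` stands for the completion code. As a rough Python illustration of that substitution, the sketch below fills in the placeholder and URL-encodes the code so that special characters cannot break the query string. The function name `render_redirect_url` is hypothetical, and whether Potato encodes the code is an assumption.

```python
from urllib.parse import quote


def render_redirect_url(template: str, code: str) -> str:
    """Replace the {code} placeholder with a URL-encoded completion code.

    Hypothetical helper for illustration; not a Potato API.
    """
    return template.replace("{code}", quote(code, safe=""))


url = render_redirect_url(
    "https://prolific.co/submissions/complete?cc={code}", "POTATO2024"
)
print(url)
```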

Custom Completion Page

yaml

completion:
  custom_template: templates/completion.html

Payment Tiers

Quality-Based

yaml

payment:
  tiers:
    - name: bonus
      condition:
        gold_accuracy_above: 90
      amount: 0.50
    - name: standard
      condition:
        gold_accuracy_above: 70
      amount: 0.00
    - name: reject
      condition:
        gold_accuracy_below: 50
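
The Python sketch below shows one way such a tier list could be evaluated: tiers are checked top to bottom and the first match wins. The evaluation order, the strict comparisons, and the function names are assumptions made for this example. Note that with the thresholds above, an accuracy between 50 and 70 matches no tier, so decide explicitly how those workers should be handled.

```python
from typing import Optional

# Mirrors the tiers in the YAML above.
TIERS = [
    {"name": "bonus", "condition": {"gold_accuracy_above": 90}, "amount": 0.50},
    {"name": "standard", "condition": {"gold_accuracy_above": 70}, "amount": 0.00},
    {"name": "reject", "condition": {"gold_accuracy_below": 50}},
]


def matches(condition: dict, accuracy: float) -> bool:
    """Check every bound present in the condition (strict comparisons assumed)."""
    if "gold_accuracy_above" in condition:
        if not accuracy > condition["gold_accuracy_above"]:
            return False
    if "gold_accuracy_below" in condition:
        if not accuracy < condition["gold_accuracy_below"]:
            return False
    return True


def select_tier(accuracy: float, tiers: list = TIERS) -> Optional[str]:
    """Return the name of the first matching tier, or None if none matches."""
    for tier in tiers:
        if matches(tier["condition"], accuracy):
            return tier["name"]
    return None


for acc in (95, 80, 60, 40):
    print(acc, select_tier(acc))
```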

Complete Example: Prolific Study

yaml

task_name: "Sentiment Analysis Study"
 
# Crowdsourcing settings
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "SENT2024"
  url_params:
    - PROLIFIC_PID
    - STUDY_ID
    - SESSION_ID
  prevent_retakes: true
 
# Open access for crowdworkers
allow_all_users: true
 
# Task assignment
instances_per_annotator: 50
annotation_per_instance: 3
 
# Quality control
attention_checks:
  enabled: true
  frequency: 10
  fail_threshold: 2
 
quality_control:
  gold_questions: true
  gold_percentage: 5
  min_gold_accuracy: 70
 
# Data
data_files:
  - path: data/main.json
    text_field: text
 
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
    keyboard_shortcuts:
      Positive: "1"
      Negative: "2"
      Neutral: "3"
 
# Completion
completion:
  show_code: true
  code: "SENT2024"
  message: |
    ## Thank you for participating!
 
    Your completion code is: **{code}**
 
    Please return to Prolific and enter this code to receive payment.

Complete Example: MTurk HIT

yaml

task_name: "Image Classification HIT"
 
crowdsourcing:
  platform: mturk
  enabled: true
  url_params:
    - workerId
    - assignmentId
    - hitId
  # Time constraints (kept under the same key; a second top-level
  # "crowdsourcing:" block would be a duplicate YAML key)
  min_time_per_instance: 3
  max_time_total: 1800

allow_all_users: true
instances_per_annotator: 20
 
# MTurk form submission
completion:
  mturk_submit: true
  submit_url: "https://www.mturk.com/mturk/externalSubmit"
 
annotation_schemes:
  - annotation_type: radio
    name: category
    description: "What is shown in this image?"
    labels:
      - Cat
      - Dog
      - Bird
      - Other

Monitoring Workers

Admin Dashboard

yaml

admin_users:
  - researcher@university.edu
 
admin_dashboard:
  enabled: true
  show_worker_stats: true

Visit /admin to view:

Worker completion rates
Average time per instance
Gold standard accuracy
Attention check results

Exporting Worker Data

bash

potato export-workers config.yaml --output workers.csv

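Best Practices

Test thoroughly - Run a pilot with a small group first
Set fair pay - Estimate the time required and pay fairly for it
Write clear instructions - Include examples and edge cases
Use attention checks - Catch random clicking
Include gold standard questions - Verify that workers understand the task
Monitor in real time - Catch problems early
Plan your rejection policy - Define clear quality criteria in advance
Communicate about issues - Contact workers when problems arise
Iterate on feedback - Improve the task based on worker input
Export data regularly - Do not wait until the end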
