Crowdsourcing Integration
Integrate with Prolific, MTurk, and other crowdsourcing platforms.
Potato integrates with crowdsourcing platforms such as Prolific and Amazon Mechanical Turk for large-scale annotation tasks.
Prolific Integration
Basic Setup
yaml
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "POTATO2024" # Code shown on completion
URL Parameters
Prolific passes participant information through URL parameters:
yaml
crowdsourcing:
  platform: prolific
  url_params:
    - PROLIFIC_PID # Participant ID
    - STUDY_ID # Study ID
    - SESSION_ID # Session ID
Workers reach the task through a URL of this form:
text
https://your-server.com/?PROLIFIC_PID=xxx&STUDY_ID=xxx&SESSION_ID=xxx
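To make the parameter passing concrete, here is a minimal Python sketch of how a server could read the three Prolific identifiers from such a URL. It is an illustration only, not Potato's own implementation; the function name `extract_prolific_params` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Parameter names taken from the url_params list in the configuration.
PROLIFIC_PARAMS = ("PROLIFIC_PID", "STUDY_ID", "SESSION_ID")


def extract_prolific_params(url: str) -> dict:
    """Return the Prolific identifiers found in the URL's query string.

    Raises ValueError if an expected parameter is missing or empty.
    (Hypothetical helper, for illustration.)
    """
    query = parse_qs(urlparse(url).query)
    # parse_qs drops blank values by default, so an empty parameter
    # is treated the same as a missing one.
    missing = [name for name in PROLIFIC_PARAMS if not query.get(name)]
    if missing:
        raise ValueError(f"missing URL parameters: {', '.join(missing)}")
    # parse_qs returns a list per key; keep the first value of each.
    return {name: query[name][0] for name in PROLIFIC_PARAMS}
```

Rejecting requests that lack any of the three identifiers keeps anonymous visitors from being recorded as participants.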
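Prolific Configuration
In your Prolific study settings:
- Set the study URL to your Potato server
- Add the URL parameters: ?PROLIFIC_PID={{%PROLIFIC_PID%}}&STUDY_ID={{%STUDY_ID%}}&SESSION_ID={{%SESSION_ID%}}
- Set the completion code to match the one in your configuration
Validation
Validate Prolific participants: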
yaml
crowdsourcing:
  platform: prolific
  validate_participant: true
  completion_code: "POTATO2024"
Amazon MTurk Integration
Basic Setup
yaml
crowdsourcing:
  platform: mturk
  enabled: true
HIT Configuration
Create an External Question HIT that points to your server:
xml
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
URL Parameters
yaml
crowdsourcing:
  platform: mturk
  url_params:
    - workerId
    - assignmentId
    - hitId
Sandbox Testing
Test with the MTurk sandbox first:
yaml
crowdsourcing:
  platform: mturk
  sandbox: true # Use sandbox environment
Worker Management
Tracking Workers
yaml
crowdsourcing:
  track_workers: true
  worker_id_field: worker_id
Limiting Instances per Worker
yaml
instances_per_annotator: 50
Blocking Repeat Workers
Prevent workers from taking the task more than once:
yaml
crowdsourcing:
  prevent_retakes: true
Quality Control
Attention Checks
Insert test questions:
yaml
attention_checks:
  enabled: true
  frequency: 10 # Every 10 instances
  fail_threshold: 2
  action: warn # or 'block'
Gold Standard Questions
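Gold-standard items carry a known answer in `gold_label`, and a worker's gold accuracy is the share of those items they labeled correctly. The Python sketch below illustrates that calculation under stated assumptions: the function name and the `answers` mapping (instance id to chosen label) are hypothetical, unanswered gold items are counted as wrong, and none of this is taken from Potato's source.

```python
def gold_accuracy(items, answers):
    """Percentage (0-100) of gold items the worker labeled correctly.

    items:   list of dicts with "id", "gold_label" and "is_gold" keys,
             the shape used for gold items in this section
    answers: hypothetical mapping from instance id to the worker's label
    Returns None when the worker has seen no gold items.
    """
    gold = [item for item in items if item.get("is_gold")]
    if not gold:
        return None  # accuracy is undefined without gold items
    # A gold item with no recorded answer counts as incorrect (assumption).
    correct = sum(1 for item in gold if answers.get(item["id"]) == item["gold_label"])
    return 100.0 * correct / len(gold)
```

The result is on the same 0-100 scale as the `min_gold_accuracy` setting, so the two can be compared directly.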
json
{
  "id": "gold_1",
  "text": "The sky is typically blue during a clear day.",
  "gold_label": "True",
  "is_gold": true
}
yaml
quality_control:
  gold_questions: true
  gold_percentage: 10 # 10% of instances
  min_gold_accuracy: 70
Time Limits
yaml
crowdsourcing:
  min_time_per_instance: 5 # seconds
  max_time_total: 3600 # 1 hour
Rejecting Low-Quality Work
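The `auto_reject` conditions in this section can be read as simple threshold checks on a worker's gold accuracy and total completion time. The sketch below is one possible reading, not Potato's implementation: it assumes a submission is rejected when any single condition holds (the source does not say whether conditions are combined with OR or AND), and the function name is hypothetical.

```python
def should_auto_reject(gold_accuracy, completion_seconds,
                       gold_accuracy_below=50, completion_time_under=300):
    """Return True if any rejection condition holds (OR semantics assumed).

    gold_accuracy:      percentage 0-100, or None if no gold items were seen
    completion_seconds: total time the worker spent on the task
    Defaults mirror the thresholds in this section's configuration.
    """
    # Without gold data the accuracy condition cannot fire.
    too_inaccurate = gold_accuracy is not None and gold_accuracy < gold_accuracy_below
    too_fast = completion_seconds < completion_time_under
    return too_inaccurate or too_fast
```

Both comparisons are strict, so a worker sitting exactly on a threshold is not rejected.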
yaml
quality_control:
  auto_reject:
    enabled: true
    conditions:
      - gold_accuracy_below: 50
      - completion_time_under: 300 # seconds
Completion Handling
Showing a Completion Code
yaml
completion:
  show_code: true
  code: "POTATO2024"
  message: "Thank you! Your completion code is: {code}"
Redirecting on Completion
yaml
completion:
  redirect: true
  redirect_url: "https://prolific.co/submissions/complete?cc={code}"
Custom Completion Page
yaml
completion:
  custom_template: templates/completion.html
Payment Tiers
Quality-Based
yaml
payment:
  tiers:
    - name: bonus
      condition:
        gold_accuracy_above: 90
      amount: 0.50
    - name: standard
      condition:
        gold_accuracy_above: 70
      amount: 0.00
    - name: reject
      condition:
        gold_accuracy_below: 50
Complete Example: Prolific Study
yaml
task_name: "Sentiment Analysis Study"
# Crowdsourcing settings
crowdsourcing:
  platform: prolific
  enabled: true
  completion_code: "SENT2024"
  url_params:
    - PROLIFIC_PID
    - STUDY_ID
    - SESSION_ID
  prevent_retakes: true
# Open access for crowdworkers
allow_all_users: true
# Task assignment
instances_per_annotator: 50
annotation_per_instance: 3
# Quality control
attention_checks:
  enabled: true
  frequency: 10
  fail_threshold: 2
quality_control:
  gold_questions: true
  gold_percentage: 5
  min_gold_accuracy: 70
# Data
data_files:
  - path: data/main.json
    text_field: text
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
    keyboard_shortcuts:
      Positive: "1"
      Negative: "2"
      Neutral: "3"
# Completion
completion:
  show_code: true
  code: "SENT2024"
  message: |
    ## Thank you for participating!
    Your completion code is: **{code}**
    Please return to Prolific and enter this code to receive payment.
Complete Example: MTurk HIT
yaml
task_name: "Image Classification HIT"
crowdsourcing:
  platform: mturk
  enabled: true
  url_params:
    - workerId
    - assignmentId
    - hitId
  # Time constraints (kept under the single crowdsourcing key; a repeated
  # top-level key would be a duplicate in YAML)
  min_time_per_instance: 3
  max_time_total: 1800
allow_all_users: true
instances_per_annotator: 20
# MTurk form submission
completion:
  mturk_submit: true
  submit_url: "https://www.mturk.com/mturk/externalSubmit"
annotation_schemes:
  - annotation_type: radio
    name: category
    description: "What is shown in this image?"
    labels:
      - Cat
      - Dog
      - Bird
      - Other
Monitoring Workers
Admin Dashboard
yaml
admin_users:
  - researcher@university.edu
admin_dashboard:
  enabled: true
  show_worker_stats: true
Visit /admin to view:
- Worker completion rates
- Average time per instance
- Gold-standard accuracy
- Attention check results
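For readers who want to reproduce these figures offline, the following Python sketch shows one way such per-worker statistics could be computed from annotation records. The record layout (`seconds`, `gold_correct`, `attention_passed`) and the function name are assumptions made for illustration; they do not describe Potato's internal data format or the dashboard's actual calculations.

```python
from statistics import mean


def summarize_worker(records, assigned):
    """Summarize one worker's progress and quality.

    records:  hypothetical list of per-instance dicts with a "seconds" value
              and optional "gold_correct" / "attention_passed" booleans
    assigned: number of instances assigned to the worker
    """
    # Only records that were gold items carry a gold_correct value.
    gold = [r["gold_correct"] for r in records if r.get("gold_correct") is not None]
    failed_checks = sum(1 for r in records if r.get("attention_passed") is False)
    return {
        "completion_rate": len(records) / assigned if assigned else 0.0,
        "avg_seconds_per_instance": mean(r["seconds"] for r in records) if records else None,
        "gold_accuracy": 100.0 * sum(gold) / len(gold) if gold else None,
        "attention_checks_failed": failed_checks,
    }
```

Values that cannot be computed yet, such as gold accuracy before any gold item has been seen, are returned as `None` rather than zero so they are not mistaken for poor performance.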
Exporting Worker Data
bash
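potato export-workers config.yaml --output workers.csv
Best Practices
- Test thoroughly: pilot with a small group first
- Set fair pay: estimate the time required and pay fairly for it
- Write clear instructions: include examples and edge cases
- Use attention checks: catch random clicking
- Include gold-standard questions: verify that workers understand the task
- Monitor in real time: catch problems early
- Plan your rejection policy: define clear quality criteria in advance
- Communicate about problems: contact workers when issues arise
- Iterate on feedback: improve the task based on worker comments
- Export data regularly: do not wait until the end
Further Reading
For implementation details, see the source code documentation.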