Skip to content

多阶段工作流

使用调查、培训和分支逻辑构建复杂的标注工作流。

多阶段工作流

Potato 2.0 支持结构化的标注工作流,包含多个顺序阶段:知情同意、研究前调查、说明、培训、标注和研究后反馈。

可用阶段

阶段描述
consent知情同意收集
prestudy标注前调查(人口统计、筛查)
instructions任务指南和信息
training带反馈的练习题
annotation主要标注任务(必需)
poststudy标注后调查和反馈

基本配置

在配置中使用 phases 部分:

yaml
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
 
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
 
  instructions:
    enabled: true
    content: "data/instructions.html"
 
  training:
    enabled: true
    data_file: "data/training.json"
    schema_name: sentiment
    passing_criteria:
      min_correct: 8
 
  # annotation phase is always enabled
 
  poststudy:
    enabled: true
    data_file: "data/feedback.json"

调查问题类型

调查阶段支持以下问题类型:

单选(Radio)

json
{
  "name": "experience",
  "type": "radio",
  "description": "How much annotation experience do you have?",
  "labels": ["None", "Some (< 10 hours)", "Moderate", "Extensive"],
  "required": true
}

复选框/多选

json
{
  "name": "languages",
  "type": "checkbox",
  "description": "What languages do you speak fluently?",
  "labels": ["English", "Spanish", "French", "German", "Chinese", "Other"]
}

文本输入

json
{
  "name": "occupation",
  "type": "text",
  "description": "What is your occupation?",
  "required": true
}

数字输入

json
{
  "name": "years_experience",
  "type": "number",
  "description": "Years of professional experience",
  "min": 0,
  "max": 50
}

Likert 量表

json
{
  "name": "familiarity",
  "type": "likert",
  "description": "How familiar are you with this topic?",
  "size": 5,
  "min_label": "Not familiar",
  "max_label": "Very familiar"
}

下拉选择

json
{
  "name": "country",
  "type": "select",
  "description": "Select your country",
  "labels": ["USA", "Canada", "UK", "Germany", "France", "Other"]
}

知情同意阶段

在开始前收集知情同意:

yaml
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"

consent.json:

json
[
  {
    "name": "consent_agreement",
    "type": "radio",
    "description": "I have read and understood the research consent form and agree to participate.",
    "labels": ["I agree", "I do not agree"],
    "right_label": "I agree",
    "required": true
  }
]

right_label 字段指定继续所需的答案。

研究前调查

收集人口统计或筛查问题:

yaml
phases:
  prestudy:
    enabled: true
    data_file: "data/demographics.json"

demographics.json:

json
[
  {
    "name": "age_range",
    "type": "radio",
    "description": "What is your age range?",
    "labels": ["18-24", "25-34", "35-44", "45-54", "55+"],
    "required": true
  },
  {
    "name": "education",
    "type": "radio",
    "description": "Highest level of education completed",
    "labels": ["High school", "Bachelor's degree", "Master's degree", "Doctoral degree", "Other"],
    "required": true
  },
  {
    "name": "english_native",
    "type": "radio",
    "description": "Is English your native language?",
    "labels": ["Yes", "No"],
    "required": true
  }
]

说明阶段

显示任务说明:

yaml
phases:
  instructions:
    enabled: true
    content: "data/instructions.html"

或使用内联内容:

yaml
phases:
  instructions:
    enabled: true
    inline_content: |
      <h2>Task Instructions</h2>
      <p>In this task, you will classify the sentiment of product reviews.</p>
      <ul>
        <li><strong>Positive:</strong> Expresses satisfaction or praise</li>
        <li><strong>Negative:</strong> Expresses dissatisfaction or criticism</li>
        <li><strong>Neutral:</strong> Factual or mixed sentiment</li>
      </ul>

培训阶段

带反馈的练习题(详见培训阶段):

yaml
phases:
  training:
    enabled: true
    data_file: "data/training.json"
    schema_name: sentiment
    passing_criteria:
      min_correct: 8
      total_questions: 10
    show_explanations: true

研究后调查

标注后收集反馈:

yaml
phases:
  poststudy:
    enabled: true
    data_file: "data/feedback.json"

feedback.json:

json
[
  {
    "name": "difficulty",
    "type": "likert",
    "description": "How difficult was this task?",
    "size": 5,
    "min_label": "Very easy",
    "max_label": "Very difficult"
  },
  {
    "name": "clarity",
    "type": "likert",
    "description": "How clear were the instructions?",
    "size": 5,
    "min_label": "Very unclear",
    "max_label": "Very clear"
  },
  {
    "name": "suggestions",
    "type": "text",
    "description": "Any suggestions for improvement?",
    "textarea": true,
    "required": false
  }
]

内置模板

Potato 包含常见调查问题的预定义标签集:

模板标签
countries国家列表
languages常见语言
ethnicity民族选项
religion宗教选项

在问题中使用模板:

json
{
  "name": "country",
  "type": "select",
  "description": "Select your country",
  "template": "countries"
}

自由文本字段

在结构化问题旁添加可选的文本输入:

json
{
  "name": "topics",
  "type": "checkbox",
  "description": "Which topics interest you?",
  "labels": ["Technology", "Sports", "Politics", "Entertainment"],
  "free_response": true,
  "free_response_label": "Other (please specify)"
}

页面标题

自定义调查部分标题:

json
{
  "page_header": "Demographics Survey",
  "questions": [
    {"name": "age", "type": "radio", ...},
    {"name": "gender", "type": "radio", ...}
  ]
}

完整示例

yaml
task_name: "Sentiment Analysis Study"
task_dir: "."
port: 8000
 
# Data configuration
data_files:
  - "data/reviews.json"
 
item_properties:
  id_key: id
  text_key: text
 
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this review?"
    labels:
      - Positive
      - Negative
      - Neutral
    sequential_key_binding: true
 
# Multi-phase workflow
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
 
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
 
  instructions:
    enabled: true
    content: "data/instructions.html"
 
  training:
    enabled: true
    data_file: "data/training.json"
    schema_name: sentiment
    passing_criteria:
      min_correct: 8
      total_questions: 10
    retries:
      enabled: true
      max_retries: 2
    show_explanations: true
 
  # annotation phase is always enabled
 
  poststudy:
    enabled: true
    data_file: "data/feedback.json"
 
# Output
output_annotation_dir: "output/"
output_annotation_format: "json"
 
# User access
allow_all_users: true

旧版配置

旧的 surveyflow 配置格式仍然支持向后兼容:

yaml
surveyflow:
  enabled: true
  phases:
    - name: pre_survey
      type: survey
      questions: survey_questions.json
    - name: main_annotation
      type: annotation

但是,我们建议新项目迁移到新的 phases 格式。

最佳实践

1. 保持调查简洁

过长的调查会降低完成率。只关注必要的问题。

2. 复杂任务使用培训

培训阶段提高标注质量,特别是对于细微差别的任务。

3. 设置合理的通过标准

yaml
# Too strict - may exclude good annotators
passing_criteria:
  require_all_correct: true
 
# Better - allows for learning
passing_criteria:
  min_correct: 8
  total_questions: 10

4. 提供清晰的说明

在说明阶段包含示例以明确预期。

5. 测试完整流程

部署前自己完成整个工作流以发现问题。

6. 明智使用必填字段

只在必要时将问题标记为必填——可选问题能获得更好的回答质量。

众包集成

对于 Prolific 或 MTurk,配置完成代码:

yaml
phases:
  poststudy:
    enabled: true
    data_file: "data/feedback.json"
    show_completion_code: true
    completion_code_format: "POTATO-{user_id}-{timestamp}"

详见众包