Skip to content

标注方案

定义标注者将如何标注您的数据。

标注方案

标注方案定义了标注者的标注任务。Potato 2.0 支持十四种标注类型,可以组合创建复杂的标注任务。

基本结构

每个方案在 annotation_schemes 数组中定义:

yaml
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral

必填字段

字段描述
annotation_type标注类型(radio、multiselect、likert、span、text、number、slider、multirate)
name内部标识符(无空格,用于输出)
description显示给标注者的说明

支持的标注类型

1. 单选(Radio)

从列表中选择一个选项:

yaml
- annotation_type: radio
  name: sentiment
  description: "What is the sentiment of this text?"
  labels:
    - Positive
    - Negative
    - Neutral
 
  # Optional features
  keyboard_shortcuts:
    Positive: "1"
    Negative: "2"
    Neutral: "3"
 
  # Or use sequential binding (1, 2, 3... automatically)
  sequential_key_binding: true
 
  # Horizontal layout instead of vertical
  horizontal: true

2. 李克特量表(Likert Scale)

带有端点标签的评分量表:

yaml
- annotation_type: likert
  name: agreement
  description: "How much do you agree with this statement?"
  size: 5  # Number of scale points
  min_label: "Strongly Disagree"
  max_label: "Strongly Agree"
 
  # Optional mid-point label
  mid_label: "Neutral"
 
  # Show numeric values
  show_numbers: true

3. 多选(Multiselect)

从列表中选择多个选项:

yaml
- annotation_type: multiselect
  name: topics
  description: "Select all relevant topics"
  labels:
    - Politics
    - Technology
    - Sports
    - Entertainment
    - Science
 
  # Selection constraints
  min_selections: 1
  max_selections: 3
 
  # Allow free text response
  free_response: true
  free_response_label: "Other (specify)"

4. 片段标注(Span Annotation)

高亮并标记文本片段:

yaml
- annotation_type: span
  name: entities
  description: "Highlight named entities in the text"
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - DATE
 
  # Visual customization
  label_colors:
    PERSON: "#3b82f6"
    ORGANIZATION: "#10b981"
    LOCATION: "#f59e0b"
    DATE: "#8b5cf6"
 
  # Allow overlapping spans
  allow_overlapping: false
 
  # Keyboard shortcuts for labels
  sequential_key_binding: true

5. 滑块(Slider)

连续数值范围:

yaml
- annotation_type: slider
  name: confidence
  description: "How confident are you in your answer?"
  min: 0
  max: 100
  step: 1
  default: 50
 
  # Endpoint labels
  min_label: "Not confident"
  max_label: "Very confident"
 
  # Show current value
  show_value: true

6. 文本输入(Text Input)

自由文本回复:

yaml
- annotation_type: text
  name: explanation
  description: "Explain your reasoning"
 
  # Multi-line input
  textarea: true
 
  # Character limits
  min_length: 10
  max_length: 500
 
  # Placeholder text
  placeholder: "Enter your explanation here..."
 
  # Disable paste (for transcription tasks)
  disable_paste: true

7. 数字输入(Number Input)

带约束的数字输入:

yaml
- annotation_type: number
  name: count
  description: "How many entities are mentioned?"
  min: 0
  max: 100
  step: 1
  default: 0

8. 多项评分(Multirate)

在同一量表上对多个项目进行评分:

yaml
- annotation_type: multirate
  name: quality_aspects
  description: "Rate each aspect of the response"
  items:
    - Accuracy
    - Clarity
    - Completeness
    - Relevance
  size: 5  # Scale points
  min_label: "Poor"
  max_label: "Excellent"
 
  # Randomize item order
  randomize: true
 
  # Layout options
  compact: false

通用选项

键盘快捷键

通过键盘绑定加速标注:

yaml
# Manual shortcuts
keyboard_shortcuts:
  Positive: "1"
  Negative: "2"
  Neutral: "3"
 
# Or automatic sequential binding
sequential_key_binding: true  # Assigns 1, 2, 3...

工具提示

为标签提供悬停提示:

yaml
tooltips:
  Positive: "Expresses happiness, approval, or satisfaction"
  Negative: "Expresses sadness, anger, or disappointment"
  Neutral: "No clear emotional content"

标签颜色

自定义颜色以进行视觉区分:

yaml
label_colors:
  PERSON: "#3b82f6"
  LOCATION: "#10b981"
  ORGANIZATION: "#f59e0b"

必填字段

在提交前使方案成为必填项:

yaml
- annotation_type: radio
  name: sentiment
  required: true
  labels:
    - Positive
    - Negative

多方案组合

每个实例组合多种标注类型:

yaml
annotation_schemes:
  # Primary classification
  - annotation_type: radio
    name: sentiment
    description: "Overall sentiment"
    labels:
      - Positive
      - Negative
      - Neutral
    required: true
    sequential_key_binding: true
 
  # Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
 
  # Topic tags
  - annotation_type: multiselect
    name: topics
    description: "Select all relevant topics"
    labels:
      - Politics
      - Technology
      - Sports
      - Entertainment
    free_response: true
 
  # Notes
  - annotation_type: text
    name: notes
    description: "Any additional observations?"
    textarea: true
    required: false

高级功能

成对比较

比较两个项目:

yaml
- annotation_type: pairwise
  name: preference
  description: "Which response is better?"
  options:
    - label: "Response A"
      value: "A"
    - label: "Response B"
      value: "B"
    - label: "Equal"
      value: "tie"
 
  # Allow tie selection
  allow_tie: true

最佳-最差缩放

通过选择最佳和最差来排列项目:

yaml
- annotation_type: best_worst
  name: ranking
  description: "Select the best and worst items"
  # Items come from the data file

下拉选择

节省空间的单选:

yaml
- annotation_type: select
  name: category
  description: "Select a category"
  labels:
    - Category A
    - Category B
    - Category C
    - Category D
    - Category E
 
  # Default selection
  default: "Category A"

数据格式参考

输入

标注方案与您的数据格式配合使用:

json
{
  "id": "doc_1",
  "text": "This is the text to annotate."
}

输出

标注以方案名称为键保存:

json
{
  "id": "doc_1",
  "annotations": {
    "sentiment": "Positive",
    "confidence": 4,
    "topics": ["Technology", "Science"],
    "entities": [
      {"start": 0, "end": 4, "label": "ORGANIZATION", "text": "This"}
    ],
    "notes": "Clear positive sentiment about technology."
  }
}

最佳实践

1. 清晰的标签

使用明确、有区分度的标签:

yaml
# Good
labels:
  - Strongly Positive
  - Somewhat Positive
  - Neutral
  - Somewhat Negative
  - Strongly Negative
 
# Avoid
labels:
  - Good
  - OK
  - Fine
  - Acceptable

2. 有用的工具提示

为有细微差别的标签添加工具提示:

yaml
tooltips:
  Sarcasm: "The text says the opposite of what it means"
  Irony: "A mismatch between expectation and reality"

3. 键盘快捷键

为大批量任务启用快捷键:

yaml
sequential_key_binding: true

4. 逻辑顺序

始终如一地排列标签:

  • 最常见的放在首位
  • 按字母顺序排列
  • 按强度排列(从低到高)

5. 限制选项数量

过多的选择会减慢标注速度:

  • 单选:2-7 个选项
  • 多选:5-15 个选项
  • 李克特量表:5-7 个等级

6. 先行测试

在部署之前自己标注几个示例,以发现:

  • 含糊的标签
  • 遗漏的类别
  • 不清晰的说明