Annotation Schemes
Define how annotators will label your data.
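An annotation scheme defines the task that annotators perform. Potato 2.0 supports fourteen annotation types, which can be combined to build complex annotation tasks.
Basic Structure
Each scheme is defined as an entry in the annotation_schemes array: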
```yaml
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - Positive
      - Negative
      - Neutral
```
Required Fields
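| Field | Description |
|---|---|
| annotation_type | Annotation type (radio, multiselect, likert, span, text, number, slider, multirate) |
| name | Internal identifier (no spaces; used as the key in the output) |
| description | Instructions shown to annotators |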
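The three required fields can be checked before a configuration is deployed. The following is a minimal sketch, assuming each scheme has already been loaded into a Python dict; `check_scheme` and `REQUIRED_FIELDS` are hypothetical helper names for illustration and are not part of Potato.

```python
# Hypothetical pre-flight check; not part of Potato itself.
REQUIRED_FIELDS = ("annotation_type", "name", "description")


def check_scheme(scheme: dict) -> list[str]:
    """Return a list of problems found in one annotation scheme entry."""
    problems = []
    # Every scheme needs the three required fields, and they must be non-empty.
    for field in REQUIRED_FIELDS:
        if not scheme.get(field):
            problems.append(f"missing required field: {field}")
    # The name is used as an output key, so it must not contain whitespace.
    name = scheme.get("name", "")
    if isinstance(name, str) and any(ch.isspace() for ch in name):
        problems.append("name must not contain spaces")
    return problems


good = {
    "annotation_type": "radio",
    "name": "sentiment",
    "description": "What is the sentiment?",
    "labels": ["Positive", "Negative", "Neutral"],
}
bad = {"annotation_type": "radio", "name": "my scheme"}

print(check_scheme(good))  # []
print(check_scheme(bad))
# ['missing required field: description', 'name must not contain spaces']
```

A check like this only covers the fields in the table above. Type-specific options such as labels or size would need their own rules.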
Supported Annotation Types
1. Radio
Select one option from a list:
```yaml
- annotation_type: radio
  name: sentiment
  description: "What is the sentiment of this text?"
  labels:
    - Positive
    - Negative
    - Neutral
  # Optional features
  keyboard_shortcuts:
    Positive: "1"
    Negative: "2"
    Neutral: "3"
  # Or use sequential binding (1, 2, 3... automatically)
  sequential_key_binding: true
  # Horizontal layout instead of vertical
  horizontal: true
```
2. Likert Scale
A rating scale with labeled endpoints:
```yaml
- annotation_type: likert
  name: agreement
  description: "How much do you agree with this statement?"
  size: 5  # Number of scale points
  min_label: "Strongly Disagree"
  max_label: "Strongly Agree"
  # Optional mid-point label
  mid_label: "Neutral"
  # Show numeric values
  show_numbers: true
```
3. Multiselect
Select multiple options from a list:
```yaml
- annotation_type: multiselect
  name: topics
  description: "Select all relevant topics"
  labels:
    - Politics
    - Technology
    - Sports
    - Entertainment
    - Science
  # Selection constraints
  min_selections: 1
  max_selections: 3
  # Allow free text response
  free_response: true
  free_response_label: "Other (specify)"
```
4. Span Annotation
Highlight and label spans of text:
```yaml
- annotation_type: span
  name: entities
  description: "Highlight named entities in the text"
  labels:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - DATE
  # Visual customization
  label_colors:
    PERSON: "#3b82f6"
    ORGANIZATION: "#10b981"
    LOCATION: "#f59e0b"
    DATE: "#8b5cf6"
  # Allow overlapping spans
  allow_overlapping: false
  # Keyboard shortcuts for labels
  sequential_key_binding: true
```
5. Slider
A continuous numeric range:
```yaml
- annotation_type: slider
  name: confidence
  description: "How confident are you in your answer?"
  min: 0
  max: 100
  step: 1
  default: 50
  # Endpoint labels
  min_label: "Not confident"
  max_label: "Very confident"
  # Show current value
  show_value: true
```
6. Text Input
A free-text response:
```yaml
- annotation_type: text
  name: explanation
  description: "Explain your reasoning"
  # Multi-line input
  textarea: true
  # Character limits
  min_length: 10
  max_length: 500
  # Placeholder text
  placeholder: "Enter your explanation here..."
  # Disable paste (for transcription tasks)
  disable_paste: true
```
7. Number Input
Numeric input with constraints:
```yaml
- annotation_type: number
  name: count
  description: "How many entities are mentioned?"
  min: 0
  max: 100
  step: 1
  default: 0
```
8. Multirate
Rate multiple items on the same scale:
```yaml
- annotation_type: multirate
  name: quality_aspects
  description: "Rate each aspect of the response"
  items:
    - Accuracy
    - Clarity
    - Completeness
    - Relevance
  size: 5  # Scale points
  min_label: "Poor"
  max_label: "Excellent"
  # Randomize item order
  randomize: true
  # Layout options
  compact: false
```
Common Options
Keyboard Shortcuts
Speed up annotation with keyboard bindings:
```yaml
# Manual shortcuts
keyboard_shortcuts:
  Positive: "1"
  Negative: "2"
  Neutral: "3"
# Or automatic sequential binding
sequential_key_binding: true  # Assigns 1, 2, 3...
```
Tooltips
Provide hover hints for labels:
```yaml
tooltips:
  Positive: "Expresses happiness, approval, or satisfaction"
  Negative: "Expresses sadness, anger, or disappointment"
  Neutral: "No clear emotional content"
```
Label Colors
Customize colors so labels are visually distinct:
```yaml
label_colors:
  PERSON: "#3b82f6"
  LOCATION: "#10b981"
  ORGANIZATION: "#f59e0b"
```
Required Fields
Require a scheme to be completed before the annotator can submit:
```yaml
- annotation_type: radio
  name: sentiment
  required: true
  labels:
    - Positive
    - Negative
```
Combining Multiple Schemes
Combine several annotation types for each instance:
```yaml
annotation_schemes:
  # Primary classification
  - annotation_type: radio
    name: sentiment
    description: "Overall sentiment"
    labels:
      - Positive
      - Negative
      - Neutral
    required: true
    sequential_key_binding: true
  # Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you?"
    size: 5
    min_label: "Guessing"
    max_label: "Certain"
  # Topic tags
  - annotation_type: multiselect
    name: topics
    description: "Select all relevant topics"
    labels:
      - Politics
      - Technology
      - Sports
      - Entertainment
    free_response: true
  # Notes
  - annotation_type: text
    name: notes
    description: "Any additional observations?"
    textarea: true
    required: false
```
Advanced Features
Pairwise Comparison
Compare two items:
```yaml
- annotation_type: pairwise
  name: preference
  description: "Which response is better?"
  options:
    - label: "Response A"
      value: "A"
    - label: "Response B"
      value: "B"
    - label: "Equal"
      value: "tie"
  # Allow tie selection
  allow_tie: true
```
Best-Worst Scaling
Rank items by selecting the best and the worst:
```yaml
- annotation_type: best_worst
  name: ranking
  description: "Select the best and worst items"
  # Items come from the data file
```
Dropdown Select
A space-saving single-choice selector:
```yaml
- annotation_type: select
  name: category
  description: "Select a category"
  labels:
    - Category A
    - Category B
    - Category C
    - Category D
    - Category E
  # Default selection
  default: "Category A"
```
Data Format Reference
Input
Annotation schemes work together with your data format:
```json
{
  "id": "doc_1",
  "text": "This is the text to annotate."
}
```
Output
Annotations are saved keyed by scheme name:
```json
{
  "id": "doc_1",
  "annotations": {
    "sentiment": "Positive",
    "confidence": 4,
    "topics": ["Technology", "Science"],
    "entities": [
      {"start": 0, "end": 4, "label": "ORGANIZATION", "text": "This"}
    ],
    "notes": "Clear positive sentiment about technology."
  }
}
```
Best Practices
1. Clear Labels
Use unambiguous, clearly distinguishable labels:
```yaml
# Good
labels:
  - Strongly Positive
  - Somewhat Positive
  - Neutral
  - Somewhat Negative
  - Strongly Negative
# Avoid
labels:
  - Good
  - OK
  - Fine
  - Acceptable
```
2. Helpful Tooltips
Add tooltips to labels whose distinctions are subtle:
```yaml
tooltips:
  Sarcasm: "The text says the opposite of what it means"
  Irony: "A mismatch between expectation and reality"
```
3. Keyboard Shortcuts
Enable shortcuts for high-volume tasks:
```yaml
sequential_key_binding: true
```
4. Logical Order
Order labels consistently, using one of the following:
- Most common first
- Alphabetical
- By intensity (low to high)
5. Limit the Number of Options
Too many choices slow annotation down:
- Radio: 2-7 options
- Multiselect: 5-15 options
- Likert scale: 5-7 points
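These ranges are easy to check automatically. The sketch below assumes schemes are available as Python dicts with the same keys as the YAML examples (labels for radio and multiselect, size for likert). `RECOMMENDED_RANGES` and both functions are hypothetical helpers, and the ranges are this guide's rules of thumb rather than limits enforced by Potato.

```python
# Rule-of-thumb ranges from the list above (inclusive), not hard limits.
RECOMMENDED_RANGES = {
    "radio": (2, 7),
    "multiselect": (5, 15),
    "likert": (5, 7),
}


def option_count(scheme: dict) -> int:
    """Count the choices a scheme offers: scale points for likert, labels otherwise."""
    if scheme.get("annotation_type") == "likert":
        return scheme.get("size", 0)
    return len(scheme.get("labels", []))


def outside_recommended_range(scheme: dict) -> bool:
    """Return True when the scheme offers fewer or more options than recommended."""
    bounds = RECOMMENDED_RANGES.get(scheme.get("annotation_type"))
    if bounds is None:
        return False  # no recommendation for this annotation type
    low, high = bounds
    return not (low <= option_count(scheme) <= high)


sentiment = {"annotation_type": "radio", "labels": ["Positive", "Negative", "Neutral"]}
topics = {"annotation_type": "multiselect", "labels": ["A", "B", "C", "D"]}

print(outside_recommended_range(sentiment))  # False
print(outside_recommended_range(topics))     # True (4 labels, below 5)
```

A True result is a prompt to reconsider the label set. A short multiselect list can still be the right choice for a narrow task.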
6. Test First
Before deploying, annotate a few examples yourself to catch:
- Ambiguous labels
- Missing categories
- Unclear instructions
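A quick tally of the pilot annotations can help surface these problems. The sketch below assumes the pilot output has been loaded as a list of records shaped like the output format described earlier (an "annotations" object keyed by scheme name) and that the scheme stores plain label strings or lists of strings, as radio and multiselect do in that example. `review_pilot` is a hypothetical helper, not a Potato function.

```python
from collections import Counter


def review_pilot(records: list[dict], scheme_name: str, labels: list[str]) -> dict:
    """Summarize how the labels of one scheme were used in a pilot run."""
    counts = Counter()
    for record in records:
        value = record.get("annotations", {}).get(scheme_name)
        if value is None:
            continue  # scheme left unanswered for this record
        # Multiselect values are lists; radio values are single strings.
        counts.update(value if isinstance(value, list) else [value])
    # Labels nobody chose may be unclear or unnecessary.
    unused = [label for label in labels if counts[label] == 0]
    # Values outside the label set may point to a missing category.
    unexpected = sorted(v for v in counts if v not in labels)
    return {"counts": dict(counts), "unused": unused, "unexpected": unexpected}


pilot = [
    {"id": "doc_1", "annotations": {"sentiment": "Positive"}},
    {"id": "doc_2", "annotations": {"sentiment": "Positive"}},
    {"id": "doc_3", "annotations": {"sentiment": "Negative"}},
]
summary = review_pilot(pilot, "sentiment", ["Positive", "Negative", "Neutral"])
print(summary["counts"])  # {'Positive': 2, 'Negative': 1}
print(summary["unused"])  # ['Neutral']
```

With only a handful of pilot items an unused label is weak evidence, so treat the summary as a list of things to look at rather than a verdict.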