Pairwise Comparison
Compare pairs of items for preference and quality assessment.
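Pairwise comparison lets annotators view two items side by side and indicate which one they prefer. Two modes are supported:
- Binary mode (default): click the preferred tile (A or B), with an optional tie button
- Scale mode: use a slider to rate how strongly one option is preferred over the other
Common use cases include comparing model outputs, RLHF preference learning, quality comparison of translations or summaries, and A/B testing.
Binary mode
Binary mode displays two clickable tiles. Annotators click the option they prefer.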
```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: preference
    description: "Which response is better?"
    mode: binary

    # Data source - key in instance data containing items to compare
    items_key: "responses"

    # Display options
    show_labels: true
    labels:
      - "Response A"
      - "Response B"

    # Tie option
    allow_tie: true
    tie_label: "No preference"

    # Keyboard shortcuts
    sequential_key_binding: true

    # Validation
    label_requirement:
      required: true
```

Scale mode
Scale mode displays a slider between the two items so that annotators can indicate how strongly they prefer one over the other.
```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: preference_scale
    description: "Rate how much better A is than B"
    mode: scale

    items_key: "responses"

    labels:
      - "Response A"
      - "Response B"

    # Scale configuration
    scale:
      min: -3     # Negative = prefer left item (A)
      max: 3      # Positive = prefer right item (B)
      step: 1
      default: 0

      # Endpoint labels
      labels:
        min: "A is much better"
        max: "B is much better"
        center: "Equal"

    label_requirement:
      required: true
```

Data format
The scheme expects each data instance to contain a list of the items to compare:
```json
{"id": "1", "responses": ["Response A text", "Response B text"]}
{"id": "2", "responses": ["First option here", "Second option here"]}
```

The items_key setting specifies which field holds the items to compare. That field should contain a list of at least 2 items.
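A quick pre-flight check can catch malformed instances before annotation starts. The sketch below is not part of the tool: it assumes the data is stored one JSON object per line, as in the sample above, and `find_invalid_instances` is a hypothetical helper name chosen for illustration. It reports every instance whose items field is missing, is not a list, or holds fewer than two items.

```python
import json


def find_invalid_instances(lines, items_key="responses"):
    """Return (line_number, reason) pairs for instances that cannot be compared.

    Hypothetical helper for illustration; not part of the annotation tool.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            instance = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((line_number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(instance, dict):
            problems.append((line_number, "instance is not a JSON object"))
            continue
        items = instance.get(items_key)
        if not isinstance(items, list):
            problems.append((line_number, f"'{items_key}' is missing or not a list"))
        elif len(items) < 2:
            problems.append((line_number, f"'{items_key}' has fewer than 2 items"))
    return problems


sample = [
    '{"id": "1", "responses": ["Response A text", "Response B text"]}',
    '{"id": "2", "responses": ["Only one option"]}',
    '{"id": "3", "text": "no responses field"}',
]
for line_number, reason in find_invalid_instances(sample):
    print(f"line {line_number}: {reason}")
```

To check a real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.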
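Keyboard shortcuts
In binary mode with sequential_key_binding: true:
| Key | Action |
|---|---|
| 1 | Select option A |
| 2 | Select option B |
| 0 | Select tie/no preference (if allow_tie: true) |
Scale mode is operated through the slider.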
Output format
Binary mode

```json
{
  "preference": {
    "selection": "A"
  }
}
```

For a tie:

```json
{
  "preference": {
    "selection": "tie"
  }
}
```

Scale mode
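Negative values indicate a preference for A, positive values a preference for B, and zero means the two are equal:

```json
{
  "preference_scale": {
    "scale_value": "-2"
  }
}
```

Examples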
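As a first example, here is a minimal Python sketch for reading the output records shown above. It is an illustration rather than part of the tool: `preferred_item` is a hypothetical helper, and it assumes each record uses either the `selection` field (binary mode) or the `scale_value` field (scale mode), with the scale value stored as a string.

```python
def preferred_item(annotation, scheme_name):
    """Return "A", "B", or "tie" for one annotation record.

    Hypothetical helper; assumes the record layout shown in the output format.
    """
    result = annotation[scheme_name]
    if "selection" in result:
        # Binary mode stores "A", "B", or "tie" directly.
        return result["selection"]
    # Scale mode stores the slider position as a string, e.g. "-2".
    value = float(result["scale_value"])
    if value < 0:
        return "A"  # negative = prefer A
    if value > 0:
        return "B"  # positive = prefer B
    return "tie"    # zero = equal


print(preferred_item({"preference": {"selection": "A"}}, "preference"))
print(preferred_item({"preference_scale": {"scale_value": "-2"}}, "preference_scale"))
```

Reducing both modes to the same three outcomes makes it easier to tally results across schemes, at the cost of discarding the preference strength that scale mode records.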
Basic binary comparison

```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: quality
    description: "Which text is higher quality?"
    labels: ["Text A", "Text B"]
    allow_tie: true
```

Multi-aspect comparison
Compare the same pair along several dimensions:

```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: fluency
    description: "Which response is more fluent?"
    labels: ["Response A", "Response B"]

  - annotation_type: pairwise
    name: relevance
    description: "Which response is more relevant?"
    labels: ["Response A", "Response B"]

  - annotation_type: pairwise
    name: overall
    description: "Which response is better overall?"
    labels: ["Response A", "Response B"]
    allow_tie: true
```

Preference scale with a custom range
```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: sentiment_comparison
    description: "Compare the sentiment of these two statements"
    mode: scale
    labels: ["Statement A", "Statement B"]
    scale:
      min: -5
      max: 5
      step: 1
      labels:
        min: "A is much more positive"
        max: "B is much more positive"
        center: "Equal sentiment"
```

RLHF preference collection
```yaml
annotation_schemes:
  - annotation_type: pairwise
    name: overall
    description: "Overall, which response is better?"
    labels: ["Response A", "Response B"]
    allow_tie: true
    sequential_key_binding: true

  - annotation_type: multiselect
    name: criteria
    description: "What factors influenced your decision?"
    labels:
      - Accuracy
      - Helpfulness
      - Clarity
      - Safety
      - Completeness

  - annotation_type: text
    name: notes
    description: "Additional notes (optional)"
    textarea: true
    required: false
```

Styling
Pairwise annotation uses CSS variables from the theme system. Add custom CSS to change the appearance of the tiles:

```css
/* Make tiles taller */
.pairwise-tile {
  min-height: 200px;
}

/* Change selected tile highlight */
.pairwise-tile.selected {
  border-color: #10b981;
  background-color: rgba(16, 185, 129, 0.1);
}
```

Best practices
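- Use clear, distinct labels - annotators should understand the options immediately
- Consider the tie option carefully - sometimes forcing a choice is more appropriate
- Use keyboard shortcuts - they speed up annotation significantly
- Add a justification field - it helps you understand annotators' reasoning and improves data quality
- Test with your own data - make sure the display suits the length of your content
Further reading
For implementation details, see the source documentation.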