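Multirate (Matrix Rating)
Rate multiple items on the same scale in a matrix format.
Multirate (Matrix Rating) Annotation
The multirate type displays multiple items in a matrix, with every item rated on the same scale. It is well suited to evaluating several dimensions of a single instance, such as one model response.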
Basic Configuration
```yaml
annotation_schemes:
  - name: "aspect_ratings"
    description: "Rate each aspect of the response"
    annotation_type: "multirate"
    labels:
      - name: "Accuracy"
      - name: "Clarity"
      - name: "Helpfulness"
    options:
      - name: "1"
      - name: "2"
      - name: "3"
      - name: "4"
      - name: "5"
```
Configuration Options
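| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for the annotation scheme |
| description | string | Yes | Instructions shown to annotators |
| annotation_type | string | Yes | Must be "multirate" |
| labels | array | Yes | Items to rate (rows) |
| options | array | Yes, unless size is set | Rating scale options (columns) |
| size | number | No | Alternative to options: the number of points on the scale |
| min_label | string | No | Label for the lowest rating |
| max_label | string | No | Label for the highest rating |
| randomize | boolean | No | Randomize the order of the items |
| compact | boolean | No | Use a compact layout |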
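The rules in the table can be checked before a configuration is deployed. The following is a minimal sketch, not part of the tool itself: `validate_multirate` is a hypothetical helper, and the requirement that `size` be an integer of at least 2 is an assumption rather than documented behavior.

```python
# Hypothetical pre-flight check for one multirate scheme (already parsed
# from YAML into a dict). Not the tool's own validation logic.
REQUIRED_FIELDS = ("name", "description", "annotation_type", "labels")


def validate_multirate(scheme: dict) -> list[str]:
    """Return a list of problems found in the scheme (empty if none)."""
    errors = []

    # Fields marked as required in the table must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not scheme.get(field):
            errors.append(f"missing required field: {field}")

    if scheme.get("annotation_type", "multirate") != "multirate":
        errors.append('annotation_type must be "multirate"')

    # The scale comes from an explicit options list or from a point count.
    has_options = bool(scheme.get("options"))
    has_size = "size" in scheme
    if not has_options and not has_size:
        errors.append("define the scale with either options or size")

    # Assumption: a usable scale needs at least two points.
    if has_size and (not isinstance(scheme["size"], int) or scheme["size"] < 2):
        errors.append("size must be an integer of at least 2")

    return errors


example = {
    "name": "quality_assessment",
    "description": "Rate each aspect of the AI response",
    "annotation_type": "multirate",
    "labels": [{"name": "Accuracy"}, {"name": "Clarity"}],
    "size": 5,
}
print(validate_multirate(example))
```

Returning a list of messages instead of raising on the first problem lets all configuration mistakes be reported in one pass.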
Examples
Response Quality Assessment
```yaml
- name: "quality_assessment"
  description: "Rate each aspect of the AI response"
  annotation_type: "multirate"
  labels:
    - name: "Accuracy"
      tooltip: "Is the information factually correct?"
    - name: "Completeness"
      tooltip: "Does it fully address the question?"
    - name: "Clarity"
      tooltip: "Is it easy to understand?"
    - name: "Relevance"
      tooltip: "Does it stay on topic?"
  size: 5
  min_label: "Poor"
  max_label: "Excellent"
```
Translation Quality
```yaml
- name: "translation_quality"
  description: "Evaluate the translation quality"
  annotation_type: "multirate"
  labels:
    - name: "Fluency"
    - name: "Adequacy"
    - name: "Terminology"
    - name: "Style"
  options:
    - name: "1 - Unacceptable"
    - name: "2 - Poor"
    - name: "3 - Acceptable"
    - name: "4 - Good"
    - name: "5 - Excellent"
```
Product Review Dimensions
```yaml
- name: "product_dimensions"
  description: "Rate each aspect of the product"
  annotation_type: "multirate"
  labels:
    - name: "Build Quality"
    - name: "Value for Money"
    - name: "Ease of Use"
    - name: "Customer Support"
    - name: "Documentation"
  size: 5
  min_label: "Very Poor"
  max_label: "Excellent"
  randomize: true
```
Output Format
A multirate annotation is output as a dictionary that maps each item to its rating:
```json
{
  "id": "item_1",
  "annotations": {
    "aspect_ratings": {
      "Accuracy": "4",
      "Clarity": "5",
      "Helpfulness": "3"
    }
  }
}
```
Use Cases
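- LLM evaluation: rate responses along multiple quality dimensions
- Translation assessment: evaluate fluency, adequacy, and terminology
- Product reviews: capture ratings for different aspects of a product
- Survey research: Likert-style matrix questions
- Peer review: rate papers against multiple criteria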
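For use cases like these, the per-item ratings are usually aggregated across many annotated instances. The sketch below assumes records shaped like the output format described earlier and averages each row item. `mean_ratings` is a hypothetical helper, and the second record is invented purely for illustration.

```python
from collections import defaultdict


def mean_ratings(records: list[dict], scheme: str) -> dict[str, float]:
    """Average the ratings of each row item across annotated records."""
    scores = defaultdict(list)
    for record in records:
        ratings = record.get("annotations", {}).get(scheme, {})
        for item, value in ratings.items():
            # Ratings are stored as strings such as "4"; taking the leading
            # token means labels like "4 - Good" parse as well.
            scores[item].append(int(str(value).split()[0]))
    return {item: sum(values) / len(values) for item, values in scores.items()}


records = [
    {"id": "item_1", "annotations": {"aspect_ratings": {
        "Accuracy": "4", "Clarity": "5", "Helpfulness": "3"}}},
    # Illustrative second record, not taken from real data.
    {"id": "item_2", "annotations": {"aspect_ratings": {
        "Accuracy": "2", "Clarity": "5", "Helpfulness": "4"}}},
]
print(mean_ratings(records, "aspect_ratings"))
```

Because ratings arrive as strings, converting them to numbers in one place keeps the rest of an analysis pipeline free of parsing logic.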
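Best Practices
- Limit the number of items: 3 to 7 items work best; more leads to annotator fatigue
- Use a consistent scale: all items should share the same rating scale
- Order items logically: keep related dimensions together
- Provide clear definitions: use tooltips to explain each dimension
- Consider randomization: it guards against order bias in responses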
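To illustrate the idea behind randomization, here is a minimal sketch of a per-annotator shuffle of the row items. It does not show how `randomize: true` is implemented by the tool. `item_order` and the annotator identifier are hypothetical, and seeding by annotator is just one possible design.

```python
import random


def item_order(labels: list[str], annotator_id: str) -> list[str]:
    """Return a shuffled copy of the row items, stable for a given annotator."""
    # Seeding with the annotator ID gives each annotator a different order
    # that stays the same every time they reload the page.
    rng = random.Random(annotator_id)
    shuffled = list(labels)  # copy so the configured order is left untouched
    rng.shuffle(shuffled)
    return shuffled


labels = ["Build Quality", "Value for Money", "Ease of Use",
          "Customer Support", "Documentation"]
print(item_order(labels, "annotator_1"))
print(item_order(labels, "annotator_2"))
```

Varying the order between annotators spreads position effects across items, while a stable order per annotator avoids rows moving around within a session.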