Multirate (Matrix Rating)

Name: Potato
Author: Potato Annotation

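Rate multiple items on the same scale in a matrix format.

The multirate type displays multiple items as rows of a matrix, each rated on the same scale. It is well suited to evaluating several dimensions of a single instance.

Basic Configuration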

yaml

annotation_schemes:
  - name: "aspect_ratings"
    description: "Rate each aspect of the response"
    annotation_type: "multirate"
    labels:
      - name: "Accuracy"
      - name: "Clarity"
      - name: "Helpfulness"
    options:
      - name: "1"
      - name: "2"
      - name: "3"
      - name: "4"
      - name: "5"

Configuration Options

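Field	Type	Required	Description
`name`	string	Yes	Unique identifier for the annotation
`description`	string	Yes	Instructions shown to annotators
`annotation_type`	string	Yes	Must be `"multirate"`
`labels`	array	Yes	Items to rate (rows)
`options`	array	Yes	Rating scale options (columns); can be omitted when `size` is given
`size`	number	No	Alternative to `options`: number of scale points
`min_label`	string	No	Label for the lowest rating
`max_label`	string	No	Label for the highest rating
`randomize`	boolean	No	Randomize the order of items
`compact`	boolean	No	Use a compact layout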
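
The field rules in the table above can be checked before a project is launched. The following is a minimal sketch, not part of Potato itself: `validate_multirate` and `REQUIRED_FIELDS` are hypothetical names, and the scheme is assumed to be already loaded into a plain Python dict.

```python
from typing import Any

# Fields the table marks as always required (hypothetical helper constant).
REQUIRED_FIELDS = ("name", "description", "annotation_type", "labels")


def validate_multirate(scheme: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one scheme dict (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in scheme:
            problems.append(f"missing required field: {field}")
    if "annotation_type" in scheme and scheme["annotation_type"] != "multirate":
        problems.append('annotation_type must be "multirate"')
    if "labels" in scheme and not scheme["labels"]:
        problems.append("labels must list at least one item (row)")
    # The scale (columns) comes from either `options` or `size`.
    if "options" not in scheme and "size" not in scheme:
        problems.append("provide either options or size")
    return problems


scheme = {
    "name": "aspect_ratings",
    "description": "Rate each aspect of the response",
    "annotation_type": "multirate",
    "labels": [{"name": "Accuracy"}, {"name": "Clarity"}],
    "size": 5,
}
print(validate_multirate(scheme))
```

An empty list means the scheme satisfies the rules listed in the table. This sketch does not check optional fields such as `min_label` or `compact`.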

Examples

Response Quality Assessment

yaml

- name: "quality_assessment"
  description: "Rate each aspect of the AI response"
  annotation_type: "multirate"
  labels:
    - name: "Accuracy"
      tooltip: "Is the information factually correct?"
    - name: "Completeness"
      tooltip: "Does it fully address the question?"
    - name: "Clarity"
      tooltip: "Is it easy to understand?"
    - name: "Relevance"
      tooltip: "Does it stay on topic?"
  size: 5
  min_label: "Poor"
  max_label: "Excellent"

Translation Quality

yaml

- name: "translation_quality"
  description: "Evaluate the translation quality"
  annotation_type: "multirate"
  labels:
    - name: "Fluency"
    - name: "Adequacy"
    - name: "Terminology"
    - name: "Style"
  options:
    - name: "1 - Unacceptable"
    - name: "2 - Poor"
    - name: "3 - Acceptable"
    - name: "4 - Good"
    - name: "5 - Excellent"
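
When option names carry a descriptive suffix, as in the example above, the numeric part usually has to be recovered for analysis. The sketch below assumes that the stored rating is the option name string (for example `"4 - Good"`). `option_to_score` is a hypothetical helper and the `ratings` values are made up for illustration.

```python
def option_to_score(option: str) -> int:
    """Extract the leading integer from an option name such as "4 - Good"."""
    head = option.split("-", 1)[0].strip()
    return int(head)


# Illustrative ratings for one instance (invented values).
ratings = {
    "Fluency": "4 - Good",
    "Adequacy": "5 - Excellent",
    "Terminology": "2 - Poor",
    "Style": "3 - Acceptable",
}
scores = {item: option_to_score(value) for item, value in ratings.items()}
print(scores)
```

The same helper also accepts plain numeric names such as `"4"`, so it can be applied to schemes that use either naming style.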

Product Review Dimensions

yaml

- name: "product_dimensions"
  description: "Rate each aspect of the product"
  annotation_type: "multirate"
  labels:
    - name: "Build Quality"
    - name: "Value for Money"
    - name: "Ease of Use"
    - name: "Customer Support"
    - name: "Documentation"
  size: 5
  min_label: "Very Poor"
  max_label: "Excellent"
  randomize: true

Output Format

A multirate annotation outputs a dictionary that maps each item to its rating:

json

{
  "id": "item_1",
  "annotations": {
    "aspect_ratings": {
      "Accuracy": "4",
      "Clarity": "5",
      "Helpfulness": "3"
    }
  }
}
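
Records in this shape can be aggregated per item. The sketch below averages each item's rating across several records. It assumes the records have already been collected into a list and that ratings are numeric strings as shown above. `mean_ratings` is a hypothetical helper and the `item_2` record is invented for illustration.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_ratings(records, scheme_name):
    """Average each item's rating across records for one scheme."""
    collected = defaultdict(list)
    for record in records:
        ratings = record.get("annotations", {}).get(scheme_name, {})
        for item, value in ratings.items():
            collected[item].append(int(value))  # ratings are stored as strings
    return {item: mean(values) for item, values in collected.items()}


raw = """[
  {"id": "item_1", "annotations": {"aspect_ratings":
    {"Accuracy": "4", "Clarity": "5", "Helpfulness": "3"}}},
  {"id": "item_2", "annotations": {"aspect_ratings":
    {"Accuracy": "2", "Clarity": "3", "Helpfulness": "5"}}}
]"""
records = json.loads(raw)
result = mean_ratings(records, "aspect_ratings")
print(result)
```

Records that lack the named scheme are skipped rather than raising an error, which keeps the helper usable on partially annotated data.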

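Use Cases

LLM evaluation: rate responses along multiple quality dimensions
Translation assessment: evaluate fluency, adequacy, and terminology
Product reviews: capture ratings for different aspects of a product
Survey research: Likert-style matrix questions
Peer review: rate papers against multiple criteria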

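Best Practices

Limit the number of items - 3 to 7 items work best; more leads to annotator fatigue
Use a consistent scale - all items should share the same rating scale
Order items logically - keep related dimensions together
Provide clear definitions - use tooltips to explain each dimension
Consider randomization - it guards against order bias in responses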
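
Some of these practices can be checked mechanically. The sketch below is an illustration only: `check_best_practices` and `shuffled_items` are hypothetical helpers, and the seeded shuffle shows the general idea of per-annotator row randomization rather than how Potato implements `randomize`.

```python
import random


def check_best_practices(scheme):
    """Return advisory warnings for one multirate scheme dict."""
    warnings = []
    labels = scheme.get("labels", [])
    if not 3 <= len(labels) <= 7:
        warnings.append(f"{len(labels)} items; 3-7 tends to work best")
    missing = [label["name"] for label in labels if "tooltip" not in label]
    if missing:
        warnings.append("no tooltip for: " + ", ".join(missing))
    return warnings


def shuffled_items(labels, seed):
    """Return item names in a shuffled order that is reproducible per seed."""
    names = [label["name"] for label in labels]
    random.Random(seed).shuffle(names)
    return names


scheme = {
    "labels": [
        {"name": "Accuracy", "tooltip": "Is the information factually correct?"},
        {"name": "Clarity"},
    ]
}
print(check_best_practices(scheme))
print(shuffled_items(scheme["labels"], seed=42))
```

Using a fixed seed per annotator (for example, one derived from an annotator ID) keeps the row order stable when that annotator reloads the page, while still varying the order between annotators.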