
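Multirate (Matrix Rating)

Rate multiple items on the same scale in a matrix format.

Multirate (Matrix Rating) Annotation

The multirate annotation type displays several items as the rows of a matrix and rates each one on the same scale. It is well suited to evaluating multiple dimensions of a single instance.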

Basic Configuration

yaml
annotation_schemes:
  - name: "aspect_ratings"
    description: "Rate each aspect of the response"
    annotation_type: "multirate"
    labels:
      - name: "Accuracy"
      - name: "Clarity"
      - name: "Helpfulness"
    options:
      - name: "1"
      - name: "2"
      - name: "3"
      - name: "4"
      - name: "5"

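Configuration Options

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | | Unique identifier for the annotation |
| description | string | | Instructions shown to annotators |
| annotation_type | string | | Must be "multirate" |
| labels | array | | Items to rate (rows) |
| options | array | | Rating scale options (columns) |
| size | number | | Alternative to options: number of scale points |
| min_label | string | | Label for the lowest rating |
| max_label | string | | Label for the highest rating |
| randomize | boolean | | Randomize the order of items |
| compact | boolean | | Use a compact layout |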
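
As a rough illustration of how these fields fit together, the sketch below checks a scheme that has already been loaded into a Python dict. The function name `check_multirate_scheme` is hypothetical and not part of the tool, and the rule that a scheme needs either `options` or `size` is an assumption drawn from the description of `size` as an alternative to `options`.

```python
def check_multirate_scheme(scheme):
    """Return a list of problems found in a multirate scheme dict (empty if none)."""
    problems = []
    # The annotation_type field must be exactly "multirate".
    if scheme.get("annotation_type") != "multirate":
        problems.append('annotation_type must be "multirate"')
    # labels holds the rows of the matrix, so it should not be empty.
    if not scheme.get("labels"):
        problems.append("labels must list at least one item (row)")
    # Assumption: the scale comes from either explicit options or a size.
    if not scheme.get("options") and not scheme.get("size"):
        problems.append("provide either options or size")
    return problems


scheme = {
    "name": "aspect_ratings",
    "description": "Rate each aspect of the response",
    "annotation_type": "multirate",
    "labels": [{"name": "Accuracy"}, {"name": "Clarity"}],
    "size": 5,
}
print(check_multirate_scheme(scheme))
```

A check like this could run before starting an annotation project so that a misspelled `annotation_type` or a missing scale is caught early.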

Examples

Response Quality Assessment

yaml
- name: "quality_assessment"
  description: "Rate each aspect of the AI response"
  annotation_type: "multirate"
  labels:
    - name: "Accuracy"
      tooltip: "Is the information factually correct?"
    - name: "Completeness"
      tooltip: "Does it fully address the question?"
    - name: "Clarity"
      tooltip: "Is it easy to understand?"
    - name: "Relevance"
      tooltip: "Does it stay on topic?"
  size: 5
  min_label: "Poor"
  max_label: "Excellent"

Translation Quality

yaml
- name: "translation_quality"
  description: "Evaluate the translation quality"
  annotation_type: "multirate"
  labels:
    - name: "Fluency"
    - name: "Adequacy"
    - name: "Terminology"
    - name: "Style"
  options:
    - name: "1 - Unacceptable"
    - name: "2 - Poor"
    - name: "3 - Acceptable"
    - name: "4 - Good"
    - name: "5 - Excellent"

Product Review Dimensions

yaml
- name: "product_dimensions"
  description: "Rate each aspect of the product"
  annotation_type: "multirate"
  labels:
    - name: "Build Quality"
    - name: "Value for Money"
    - name: "Ease of Use"
    - name: "Customer Support"
    - name: "Documentation"
  size: 5
  min_label: "Very Poor"
  max_label: "Excellent"
  randomize: true

Output Format

A multirate annotation outputs a dictionary that maps each item to its rating:

json
{
  "id": "item_1",
  "annotations": {
    "aspect_ratings": {
      "Accuracy": "4",
      "Clarity": "5",
      "Helpfulness": "3"
    }
  }
}
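
Because the ratings are stored as strings, they need converting before any arithmetic. The following is a minimal sketch of averaging each item's rating across several records, assuming the records are available as a JSON list and that every rating is a plain numeric string such as "4" (descriptive options like "1 - Unacceptable" would need their own parsing). The helper `mean_ratings` and the second record are made up for illustration.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_ratings(records, scheme_name):
    """Average each item's rating across records for one annotation scheme."""
    scores = defaultdict(list)
    for record in records:
        # Records without this scheme contribute nothing.
        ratings = record["annotations"].get(scheme_name, {})
        for item, value in ratings.items():
            scores[item].append(int(value))  # ratings arrive as strings
    return {item: mean(values) for item, values in scores.items()}


# Two records in the output format; item_2 is invented sample data.
raw = """[
  {"id": "item_1", "annotations": {"aspect_ratings":
      {"Accuracy": "4", "Clarity": "5", "Helpfulness": "3"}}},
  {"id": "item_2", "annotations": {"aspect_ratings":
      {"Accuracy": "2", "Clarity": "5", "Helpfulness": "4"}}}
]"""
averages = mean_ratings(json.loads(raw), "aspect_ratings")
print(averages)
```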

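Use Cases

  • LLM evaluation: rate responses along multiple quality dimensions
  • Translation evaluation: assess fluency, adequacy, and terminology
  • Product reviews: capture ratings for different aspects of a product
  • Survey research: Likert-style matrix questions
  • Peer review: score papers against multiple criteria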

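Best Practices

  1. Limit the number of items - 3-7 items work best; more causes annotator fatigue
  2. Use a consistent scale - all items should share the same rating scale
  3. Order logically - keep related dimensions together
  4. Provide clear definitions - use tooltips to explain each dimension
  5. Consider randomization - guards against order bias in responses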
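
Some of these practices can be checked mechanically. The sketch below is one possible advisory check on a scheme loaded as a Python dict; `lint_multirate_scheme` is a hypothetical helper rather than a feature of the tool, and its warnings are suggestions only, since logical ordering and randomization pull in different directions and the right choice depends on the study.

```python
def lint_multirate_scheme(scheme):
    """Return advisory warnings based on the item-count, tooltip and randomization tips."""
    warnings = []
    labels = scheme.get("labels", [])
    # Practice 1: 3-7 items work best.
    if not 3 <= len(labels) <= 7:
        warnings.append(f"{len(labels)} items: 3-7 is recommended")
    # Practice 4: every dimension should carry a tooltip definition.
    missing = [label["name"] for label in labels if not label.get("tooltip")]
    if missing:
        warnings.append("no tooltip for: " + ", ".join(missing))
    # Practice 5: randomization is worth considering.
    if not scheme.get("randomize"):
        warnings.append("randomize is off: consider it to reduce order bias")
    return warnings


draft = {
    "name": "quality_assessment",
    "annotation_type": "multirate",
    "labels": [
        {"name": "Accuracy", "tooltip": "Is the information factually correct?"},
        {"name": "Clarity"},
    ],
    "size": 5,
}
for warning in lint_multirate_scheme(draft):
    print(warning)
```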