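Dialogue Annotation

Annotate dialogues and multi-item text with dedicated display options.

Potato supports annotating multi-item data, where each instance contains a list of text elements. Common uses include:

Dialogue annotation: conversations with multiple turns
Pairwise comparison: comparing two or more text variants
Multi-document tasks: rating or labeling several related texts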

Data Format

Input Data

Multi-item data is represented as a list of strings in the text field:

json

{"id": "conv_001", "text": ["Tom: Isn't this awesome?!", "Sam: Yes! I like you!", "Tom: Great!", "Sam: Awesome! Let's party!"]}
{"id": "conv_002", "text": ["Tom: I am so sorry for that", "Sam: No worries", "Tom: Thanks for your understanding!"]}

Each string in the list represents one item (for example, a dialogue turn or a document variant).
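
If the dialogues already live in a Python structure, the format above can be produced with the standard `json` module. The following is a minimal sketch, not part of Potato itself: the helper name `to_json_lines` and the sample dictionary are hypothetical, and it assumes one JSON object per line as in the example data.

```python
import json


def to_json_lines(dialogues):
    """Serialize {id: [turn, ...]} into one JSON object per line."""
    lines = []
    for conv_id, turns in dialogues.items():
        # Each record keeps the turns as a list of strings under "text".
        record = {"id": conv_id, "text": list(turns)}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical in-memory data; speaker names are embedded in each turn.
dialogues = {
    "conv_001": ["Tom: Isn't this awesome?!", "Sam: Yes! I like you!"],
    "conv_002": ["Tom: I am so sorry for that", "Sam: No worries"],
}

print(to_json_lines(dialogues), end="")
```

The returned string can be written to the file listed under data_files.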

Configuration

Basic Setup

yaml

# Data configuration
data_files:
  - data/dialogues.json
 
item_properties:
  id_key: id
  text_key: text
 
# Configure list display
list_as_text:
  text_list_prefix_type: none  # No prefix since speaker names are in text
  alternating_shading: true    # Shade every other turn for readability
 
# Annotation schemes
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the overall sentiment of this conversation?"
    labels:
      - positive
      - neutral
      - negative

Display Options

The list_as_text configuration controls how list items are displayed:

yaml

list_as_text:
  text_list_prefix_type: alphabet  # Prefix type for items
  horizontal: false                # Layout direction
  alternating_shading: false       # Shade alternate turns

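Prefix Types

Option	Example	Use case
`alphabet`	A. B. C.	Pairwise comparison, options
`number`	1. 2. 3.	Sequential turns, ordered lists
`bullet`	• • •	Unordered items
`none`	(no prefix)	Dialogues whose text already includes speaker names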
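
To make the effect of each prefix type concrete, the sketch below mimics the labels an annotator might see. It is an illustration only and not Potato's rendering code: the function `prefix_items` is hypothetical, and the exact bullet glyph is an assumption.

```python
import string


def prefix_items(items, prefix_type="none"):
    """Return items with a label imitating the documented prefix types."""
    prefixed = []
    for i, item in enumerate(items):
        if prefix_type == "alphabet":
            # A., B., C., ... (wraps around after Z in this sketch)
            label = f"{string.ascii_uppercase[i % 26]}. "
        elif prefix_type == "number":
            label = f"{i + 1}. "
        elif prefix_type == "bullet":
            label = "\u2022 "  # assumed glyph
        elif prefix_type == "none":
            label = ""
        else:
            raise ValueError(f"unknown prefix type: {prefix_type}")
        prefixed.append(label + item)
    return prefixed


for line in prefix_items(["First variant", "Second variant"], "alphabet"):
    print(line)
```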

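Layout Options

Option	Description
`horizontal: false`	Vertical layout (default); items are stacked
`horizontal: true`	Side-by-side layout, intended for pairwise comparison
`alternating_shading: true`	Alternates the background color of dialogue turns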

Configuration Examples

Dialogue Annotation

yaml

annotation_task_name: Dialogue Analysis
 
data_files:
  - data/dialogues.json
 
item_properties:
  id_key: id
  text_key: text
 
list_as_text:
  text_list_prefix_type: none
  alternating_shading: true
 
annotation_schemes:
  - annotation_type: span
    name: certainty
    description: Highlight phrases expressing certainty or uncertainty
    labels:
      - certain
      - uncertain
    sequential_key_binding: true
 
  - annotation_type: radio
    name: sentiment
    description: What sentiment does the conversation hold?
    labels:
      - positive
      - neutral
      - negative
    sequential_key_binding: true

Pairwise Text Comparison

yaml

annotation_task_name: Text Comparison
 
data_files:
  - data/pairs.json
 
item_properties:
  id_key: id
  text_key: text
 
list_as_text:
  text_list_prefix_type: alphabet
  horizontal: true
 
annotation_schemes:
  - annotation_type: radio
    name: preference
    description: Which text is better?
    labels:
      - A is better
      - B is better
      - Equal

Working Example

A complete working example is available in project-hub/dialogue_analysis/:

bash

python potato/flask_server.py start project-hub/dialogue_analysis/configs/dialogue-analysis.yaml -p 8000

Sample data format:

json

{"id":"1","text":["Tom: Isn't this awesome?!", "Sam: Yes! I like you!", "Tom: great!", "Sam: Awesome! Let's party!"]}
{"id":"2","text":["Tom: I am so sorry for that", "Sam: No worries", "Tom: thanks for your understanding!"]}
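
Before starting the server, it can help to check that every line of a data file follows this shape. The following is a rough sketch under the assumption that each line is a JSON object with an id and a text list of strings; `validate_line` is a hypothetical helper, not a Potato function.

```python
import json


def validate_line(line):
    """Return a list of problems found in one JSON line (empty means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    if "id" not in record:
        problems.append("missing 'id'")
    text = record.get("text")
    if not isinstance(text, list) or not text:
        problems.append("'text' must be a non-empty list")
    elif not all(isinstance(turn, str) for turn in text):
        problems.append("every item in 'text' must be a string")
    return problems


sample = '{"id":"2","text":["Tom: I am so sorry for that", "Sam: No worries"]}'
print(validate_line(sample))  # prints []
```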

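Tips

Speaker names: When using text_list_prefix_type: none, include the speaker names in the text (e.g., "Tom: Hello")
Span annotation: When span annotation is used with dialogue data, annotators can highlight text in any displayed turn
Prefix selection:
- Use none for dialogues with speaker names embedded in the text
- Use number when the order of the sequence matters
- Use alphabet for pairwise or comparison tasks
Readability: Enable alternating_shading for long dialogues to help annotators keep track of the turn they are reading
Comparison tasks: Use horizontal: true together with the alphabet prefix for side-by-side comparison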
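
The tips above can be condensed into a small lookup. This is only a sketch of the guidance, not a Potato API: `suggest_list_as_text` and its task names ("dialogue", "ordered", "comparison") are hypothetical, while the returned keys are the list_as_text options described in this page.

```python
def suggest_list_as_text(task, long_dialogue=False):
    """Map a task kind to list_as_text options, following the tips."""
    if task == "dialogue":
        # Speaker names are expected inside each turn, so no prefix is added.
        options = {"text_list_prefix_type": "none"}
        if long_dialogue:
            options["alternating_shading"] = True
    elif task == "ordered":
        options = {"text_list_prefix_type": "number"}
    elif task == "comparison":
        options = {"text_list_prefix_type": "alphabet", "horizontal": True}
    else:
        raise ValueError(f"unknown task kind: {task}")
    return options


print(suggest_list_as_text("comparison"))
```

The resulting dictionary mirrors what would be written under list_as_text in the YAML configuration.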
