Note: This post describes Potato 2.1 as it was at release. Some configuration keys and features have been updated in later versions. See the current documentation for up-to-date configuration syntax.

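We're excited to announce the release of Potato 2.1.0, a feature-rich version that brings five major capabilities to the annotation platform. This update focuses on multimodal content display, AI-powered visual annotation, and richer relationship annotation.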

Instance Display System

The headline feature of v2.1 is the new instance_display configuration block. Previously, showing an image next to radio buttons required awkward workarounds, such as creating an image_annotation scheme with min_annotations: 0. Now you can explicitly separate the content you want to display from the annotations you want to collect.

```yaml
instance_display:
  layout:
    direction: horizontal
    gap: 24px
  fields:
    - key: image_url
      type: image
      label: "Image to Classify"
      display_options:
        max_width: 600
        zoomable: true
    - key: description
      type: text
      label: "Context"

annotation_schemes:
  - annotation_type: radio
    name: category
    labels: [nature, urban, people, objects]
```

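Instance display supports 11 content types: text, html, image, video, audio, dialogue, pairwise, code, spreadsheet, document, and pdf. You can combine multiple display fields with any annotation scheme, arrange them horizontally or vertically, and enable span annotation on a text field with span_target: true.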
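A minimal Python sketch of the rules above, assuming the instance_display block has already been parsed from YAML into a dict: every field's type must be one of the 11 listed content types, layout.direction must be horizontal or vertical, and span_target is accepted only on text fields. The function check_instance_display and its default direction are hypothetical; Potato's own validation may behave differently.

```python
# Hypothetical validator for a parsed instance_display block (not Potato's own code).
SUPPORTED_TYPES = {
    "text", "html", "image", "video", "audio", "dialogue",
    "pairwise", "code", "spreadsheet", "document", "pdf",
}
DIRECTIONS = {"horizontal", "vertical"}


def check_instance_display(block: dict) -> list[str]:
    """Return a list of problems; an empty list means the block looks valid."""
    errors = []
    # Assumption: a missing direction defaults to horizontal
    direction = block.get("layout", {}).get("direction", "horizontal")
    if direction not in DIRECTIONS:
        errors.append(f"layout.direction must be one of {sorted(DIRECTIONS)}, got {direction!r}")
    for i, field in enumerate(block.get("fields", [])):
        key = field.get("key", f"<field {i}>")
        ftype = field.get("type")
        if ftype not in SUPPORTED_TYPES:
            errors.append(f"{key}: unsupported type {ftype!r}")
        # span_target turns on span annotation, which applies to text fields
        if field.get("span_target") and ftype != "text":
            errors.append(f"{key}: span_target is only valid on text fields")
    return errors


config = {
    "layout": {"direction": "horizontal", "gap": "24px"},
    "fields": [
        {"key": "image_url", "type": "image", "label": "Image to Classify"},
        {"key": "description", "type": "text", "label": "Context", "span_target": True},
    ],
}
print(check_instance_display(config))  # → []
```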

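A standout feature is per-turn dialogue rating: you can add inline Likert-scale rating widgets to individual dialogue turns, letting annotators rate specific speakers without leaving the conversation view.

Read the full Instance Display documentation →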

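Multi-Field Span Annotation

Span annotation now supports a target_field option, so you can annotate several text fields within the same data instance. This is essential for tasks such as summarization evaluation, where you need to mark entities in both the source document and the summary.

```yaml
annotation_schemes:
  - annotation_type: span
    name: source_entities
    target_field: "source_text"
    labels: [PERSON, ORGANIZATION, LOCATION]

  - annotation_type: span
    name: summary_entities
    target_field: "summary"
    labels: [PERSON, ORGANIZATION, LOCATION]
```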

Output annotations are grouped by field name, making it clear which text field each span belongs to.
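To illustrate what grouping by field means, the following sketch takes a flat list of span records and buckets them under the text field each one targets, recovering the covered text from that field. The record keys (target_field, label, start, end) and the helper group_spans_by_field are assumptions for this example, not Potato's documented output schema.

```python
from collections import defaultdict


def group_spans_by_field(instance: dict, spans: list[dict]) -> dict[str, list[dict]]:
    """Group hypothetical span records by the text field they were annotated in."""
    grouped = defaultdict(list)
    for span in spans:
        field = span["target_field"]
        grouped[field].append({
            "label": span["label"],
            "start": span["start"],
            "end": span["end"],
            # Recover the surface text from the field the span points into
            "text": instance[field][span["start"]:span["end"]],
        })
    return dict(grouped)


instance = {
    "source_text": "Marie Curie worked in Paris.",
    "summary": "Curie lived in Paris.",
}
spans = [
    {"target_field": "source_text", "label": "PERSON", "start": 0, "end": 11},
    {"target_field": "source_text", "label": "LOCATION", "start": 22, "end": 27},
    {"target_field": "summary", "label": "PERSON", "start": 0, "end": 5},
    {"target_field": "summary", "label": "LOCATION", "start": 15, "end": 20},
]
for field, items in group_spans_by_field(instance, spans).items():
    print(field, [(s["label"], s["text"]) for s in items])
# source_text [('PERSON', 'Marie Curie'), ('LOCATION', 'Paris')]
# summary [('PERSON', 'Curie'), ('LOCATION', 'Paris')]
```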

Read the updated Span Annotation documentation →

Span Linking

The new span_link annotation type enables relation extraction by creating typed relationships between annotated spans. This supports tasks such as knowledge graph construction, coreference resolution, and discourse analysis.

```yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - name: "PERSON"
        color: "#3b82f6"
      - name: "ORGANIZATION"
        color: "#22c55e"

  - annotation_type: span_link
    name: relations
    span_schema: entities
    link_types:
      - name: "WORKS_FOR"
        directed: true
        allowed_source_labels: ["PERSON"]
        allowed_target_labels: ["ORGANIZATION"]
        color: "#dc2626"
      - name: "COLLABORATES_WITH"
        directed: false
        allowed_source_labels: ["PERSON"]
        allowed_target_labels: ["PERSON"]
        color: "#06b6d4"
```

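Key features include directed and undirected links, n-ary relations (links among more than two spans), arcs drawn above the text to visualize links, and label constraints that restrict which entity types can participate in each relation type.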
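The label constraints can be read as a check on each proposed link. This Python sketch mirrors the link_types from the YAML example: it rejects a link whose source or target label is not allowed, and it treats undirected links as unordered, so A to B and B to A count as the same relation. It covers binary links only; the functions validate_link and link_key, and the choice to also try the swapped order for undirected links, are hypothetical simplifications rather than Potato's implementation.

```python
# Constraints copied from the link_types in the YAML example above.
LINK_TYPES = {
    "WORKS_FOR": {"directed": True, "sources": {"PERSON"}, "targets": {"ORGANIZATION"}},
    "COLLABORATES_WITH": {"directed": False, "sources": {"PERSON"}, "targets": {"PERSON"}},
}


def validate_link(link_type: str, source: dict, target: dict) -> bool:
    """Return True if the span labels satisfy the link type's label constraints."""
    spec = LINK_TYPES[link_type]
    ok = source["label"] in spec["sources"] and target["label"] in spec["targets"]
    if not ok and not spec["directed"]:
        # Assumption: an undirected link has no fixed orientation, so try the swapped order
        ok = target["label"] in spec["sources"] and source["label"] in spec["targets"]
    return ok


def link_key(link_type: str, source_id: str, target_id: str):
    """Canonical key for duplicate detection; undirected links ignore endpoint order."""
    if LINK_TYPES[link_type]["directed"]:
        return (link_type, source_id, target_id)
    return (link_type, frozenset((source_id, target_id)))


alice = {"id": "s1", "label": "PERSON"}
acme = {"id": "s2", "label": "ORGANIZATION"}
bob = {"id": "s3", "label": "PERSON"}
print(validate_link("WORKS_FOR", alice, acme))  # True
print(validate_link("WORKS_FOR", acme, alice))  # False: direction matters
print(link_key("COLLABORATES_WITH", "s1", "s3") == link_key("COLLABORATES_WITH", "s3", "s1"))  # True
```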

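Read the full Span Linking documentation →

Visual AI Support

Potato 2.1 introduces four new vision endpoints that bring AI-powered assistance to image and video annotation tasks. This marks a major expansion of Potato's AI capabilities from text into the visual domain.

Four Vision Endpoints

YOLO: best for fast, precise object detection with local inference. Supports YOLOv8 variants and YOLO-World for open-vocabulary detection.

```yaml
ai_support:
  enabled: true
  endpoint_type: "yolo"
  ai_config:
    model: "yolov8m.pt"
    confidence_threshold: 0.5
    iou_threshold: 0.45
```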
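In YOLO-style detectors, a confidence threshold and an IoU threshold conventionally control two post-processing steps: boxes scoring below the confidence threshold are discarded, and non-maximum suppression (NMS) then drops any box whose intersection-over-union with a higher-scoring box of the same label exceeds the IoU threshold. The sketch below is a generic, dependency-free illustration of that convention, not Potato's or the YOLO library's actual code; the detection dict format is an assumption.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, confidence_threshold=0.5, iou_threshold=0.45):
    """Drop low-confidence boxes, then apply greedy per-label non-maximum suppression."""
    candidates = [d for d in dets if d["score"] >= confidence_threshold]
    candidates.sort(key=lambda d: d["score"], reverse=True)
    kept = []
    for d in candidates:
        # Keep d unless it overlaps too much with an already-kept box of the same label
        if all(k["label"] != d["label"] or iou(k["box"], d["box"]) <= iou_threshold for k in kept):
            kept.append(d)
    return kept


detections = [
    {"box": (10, 10, 110, 110), "score": 0.92, "label": "person"},
    {"box": (15, 12, 112, 108), "score": 0.81, "label": "person"},  # near-duplicate box
    {"box": (200, 50, 300, 150), "score": 0.40, "label": "car"},    # below confidence
    {"box": (205, 55, 305, 160), "score": 0.77, "label": "car"},
]
for d in filter_detections(detections):
    print(d["label"], d["score"])
# person 0.92
# car 0.77
```

Raising iou_threshold makes suppression more permissive (more overlapping boxes survive); raising confidence_threshold trades recall for fewer spurious suggestions.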

Ollama Vision: runs vision-language models locally through Ollama. Supports LLaVA, Llama 3.2 Vision, Qwen2.5-VL, BakLLaVA, and Moondream.

```yaml
ai_support:
  enabled: true
  endpoint_type: "ollama_vision"
  ai_config:
    model: "llava:latest"
    base_url: "http://localhost:11434"
```

OpenAI Vision: cloud-based visual analysis with GPT-4o and a configurable detail level.

```yaml
ai_support:
  enabled: true
  endpoint_type: "openai_vision"
  ai_config:
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4o"
    detail: "auto"
```

Anthropic Vision: uses Claude's vision capabilities for image understanding and classification.

```yaml
ai_support:
  enabled: true
  endpoint_type: "anthropic_vision"
  ai_config:
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-sonnet-4-20250514"
```

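Image AI Features

For image annotation tasks, visual AI provides four assistance modes:

Detection: finds objects matching your configured labels and draws suggested bounding boxes as dashed overlays
Pre-annotation (auto): automatically detects all objects in the image and creates suggestions for human review
Classification: classifies a selected region or the whole image and reports a confidence score
Hints: gives guidance without revealing exact locations, which is useful for annotator training

```yaml
annotation_schemes:
  - annotation_type: image_annotation
    name: object_detection
    tools: [bbox, polygon]
    labels:
      - name: "person"
        color: "#FF6B6B"
      - name: "car"
        color: "#4ECDC4"
    ai_support:
      enabled: true
      features:
        detection: true
        pre_annotate: true
        classification: false
        hint: true
```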

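Video AI Features

For video tasks, visual AI adds scene detection (identifying scene boundaries and suggesting temporal segments), keyframe detection (finding significant moments), and object tracking (suggesting object positions across frames).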
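The post does not say how scene boundaries are found, so as a stand-in this sketch uses a common, simple technique: compare intensity histograms of consecutive frames and start a new segment wherever their L1 distance exceeds a threshold. Frames are modeled as flat lists of 0-255 grayscale values, and the functions histogram and scene_segments, the 8-bin histogram, and the 0.5 threshold are illustrative assumptions, not Potato's method.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def scene_segments(frames, threshold=0.5):
    """Split frames into inclusive (start, end) segments where the histogram jumps."""
    hists = [histogram(f) for f in frames]
    boundaries = [0]
    for i in range(1, len(frames)):
        # L1 distance between consecutive normalized histograms lies in [0, 2]
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return [(boundaries[j], boundaries[j + 1] - 1) for j in range(len(boundaries) - 1)]


dark = [20] * 100
bright = [230] * 100
frames = [dark, dark, dark, bright, bright]
print(scene_segments(frames))  # [(0, 2), (3, 4)]
```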

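Accept/Reject Workflow

AI suggestions appear as dashed overlays. Annotators can accept a suggestion (double-click), reject it (right-click), accept all, or clear all, which speeds up annotation while keeping a human in the loop.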
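The workflow amounts to a small state model: each AI suggestion stays pending until the annotator accepts it (it becomes a real annotation) or rejects it (it is discarded), and the bulk actions apply the same transitions to every pending suggestion. The SuggestionQueue class and its method names are hypothetical and only illustrate that model; they are not Potato's front-end code.

```python
from dataclasses import dataclass, field


@dataclass
class SuggestionQueue:
    """Hypothetical model of AI suggestions awaiting human review."""
    pending: dict = field(default_factory=dict)      # suggestion id -> suggestion payload
    annotations: list = field(default_factory=list)  # accepted suggestions, now real annotations

    def add(self, sid, suggestion):
        self.pending[sid] = suggestion

    def accept(self, sid):  # e.g. bound to double-click
        self.annotations.append(self.pending.pop(sid))

    def reject(self, sid):  # e.g. bound to right-click
        self.pending.pop(sid, None)

    def accept_all(self):
        self.annotations.extend(self.pending.values())
        self.pending.clear()

    def clear_all(self):
        self.pending.clear()


q = SuggestionQueue()
q.add("s1", {"label": "person", "box": (10, 10, 110, 110)})
q.add("s2", {"label": "car", "box": (205, 55, 305, 160)})
q.add("s3", {"label": "car", "box": (400, 40, 480, 120)})
q.accept("s1")
q.reject("s2")
print([a["label"] for a in q.annotations], list(q.pending))  # ['person'] ['s3']
```

Keeping accepted and pending items in separate collections means nothing reaches the saved annotations without an explicit human action.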

Separate Visual and Text Endpoints

You can configure different AI endpoints for text and visual tasks, so each content type uses the model best suited to it:

```yaml
ai_support:
  enabled: true
  endpoint_type: "ollama"          # Text annotations
  visual_endpoint_type: "yolo"     # Image/video annotations
  ai_config:
    model: "llama3.2"
  visual_ai_config:
    model: "yolov8m.pt"
    confidence_threshold: 0.5
```
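Conceptually, the split configuration routes each annotation scheme to one of two endpoints based on its content type. This sketch shows that routing using the keys from the YAML above; the set of annotation types treated as visual (including the name video_annotation) and the fallback to endpoint_type when visual_endpoint_type is absent are assumptions for illustration, not documented Potato behavior.

```python
# Assumed set of annotation types served by the visual endpoint (illustrative only).
VISUAL_ANNOTATION_TYPES = {"image_annotation", "video_annotation"}


def select_endpoint(ai_support: dict, annotation_type: str) -> tuple[str, dict]:
    """Pick (endpoint_type, config) for a scheme; falls back to the text endpoint (assumption)."""
    if annotation_type in VISUAL_ANNOTATION_TYPES and "visual_endpoint_type" in ai_support:
        return ai_support["visual_endpoint_type"], ai_support.get("visual_ai_config", {})
    return ai_support["endpoint_type"], ai_support.get("ai_config", {})


ai_support = {
    "enabled": True,
    "endpoint_type": "ollama",
    "visual_endpoint_type": "yolo",
    "ai_config": {"model": "llama3.2"},
    "visual_ai_config": {"model": "yolov8m.pt", "confidence_threshold": 0.5},
}
print(select_endpoint(ai_support, "radio"))             # ('ollama', {'model': 'llama3.2'})
print(select_endpoint(ai_support, "image_annotation"))  # ('yolo', {'model': 'yolov8m.pt', 'confidence_threshold': 0.5})
```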

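Read the full Visual AI Support documentation →

Layout Customization

Potato 2.1 adds support for complex custom visual layouts. By default, Potato generates an editable layouts/task_layout.html file, and you can also supply a fully custom HTML template with CSS grid layouts, color-coded options, and styled sections.

```yaml
task_layout: layouts/custom_task_layout.html
```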

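Three example layouts are included in project-hub/layout-examples/:

Content moderation: warning banner, 2-column grid, color coding by severity
Dialogue QA: case metadata, circular Likert ratings, grouped evaluations
Medical review: professional medical styling, structured reports

Custom layouts work with the new instance_display system: display content is rendered above your custom annotation form.

Read the full Layout Customization documentation →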

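Other Improvements

Label Rationales

Rationales join hints, keyword highlighting, and label suggestions as the fourth AI capability. The feature generates balanced explanations of why each label might apply, helping annotators understand the reasoning behind different classifications.

```yaml
ai_support:
  features:
    rationales:
      enabled: true
```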

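Bug Fixes and Testing

50+ new tests for improved reliability
Responsive design improvements across annotation types
Better project-hub organization, plus new layout examples

Upgrading to v2.1

```bash
pip install --upgrade potato-annotation
```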

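Existing v2.0 configurations work without modification: every new feature is opt-in through additional configuration blocks such as instance_display, span_link schemes, and visual AI endpoints.

Getting Started

What's New: complete overview of v2.1 features
Instance Display: multimodal content display
Visual AI Support: AI features for image and video annotation
Span Linking: entity relation annotation
Layout Customization: custom HTML templates

Questions or feedback? Join our Discord or open an issue on GitHub.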