ICL Labeling

AI-assisted in-context learning with human verification, for scalable annotation.

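Potato's ICL (in-context learning) labeling feature provides AI-assisted annotation: it uses high-confidence human annotations as in-context examples that guide an LLM in labeling the remaining data. The system tracks the LLM's confidence and routes predictions back to human annotators for verification.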

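Overview

The ICL labeling system:

Collects high-confidence examples: identifies instances on which annotators agree (e.g., 80%+ agreement)
Labels with an LLM: prompts the LLM with those examples to label unlabeled instances
Tracks confidence: records the LLM's confidence score for each prediction
Verifies accuracy: routes a sample of LLM-labeled instances to human annotators for blind verification
Reports metrics: computes and displays LLM accuracy from the verification results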

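Features

Automatic Example Collection

The system automatically identifies high-confidence examples on which multiple annotators agree:

Configurable agreement threshold (default: 80%)
Minimum number of annotators per instance (default: 2)
Automatic refresh at a configurable interval
Example pools grouped by schema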
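
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not Potato's implementation: the function name `select_consensus_examples` and the in-memory `annotations` mapping are assumptions made for the example, and agreement is assumed to mean the share of annotators who chose the majority label.

```python
from collections import Counter


def select_consensus_examples(annotations, min_agreement=0.8, min_annotators=2):
    """Return (instance_id, label, agreement) for instances that reach consensus.

    `annotations` maps instance_id -> list of labels, one per annotator.
    This shape is hypothetical, not Potato's storage format.
    """
    examples = []
    for instance_id, labels in annotations.items():
        if len(labels) < min_annotators:
            continue  # too few annotators to speak of consensus
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        if agreement >= min_agreement:
            examples.append((instance_id, label, agreement))
    return examples


annotations = {
    "doc1": ["positive", "positive", "positive"],  # 3 of 3 agree
    "doc2": ["positive", "negative"],              # 1 of 2 agree
    "doc3": ["negative"],                          # only one annotator
    "doc4": ["negative"] * 4 + ["positive"],       # 4 of 5 agree
}
examples = select_consensus_examples(annotations)
```

With the default thresholds, `doc1` and `doc4` qualify; `doc2` falls below the agreement threshold and `doc3` has too few annotators.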

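LLM Labeling with Limits

To support iterative improvement rather than bulk labeling, LLM labeling is subject to limits:

Max total labels: caps the total number of LLM predictions
Max unlabeled ratio: labels only a set fraction of the remaining unlabeled data
Pause on low accuracy: pauses labeling automatically when accuracy falls below a threshold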
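
One plausible reading of how these limits combine is sketched below. The helper names are hypothetical, and the way the two caps interact (taking the smaller of the two) is an assumption for illustration rather than documented Potato behavior.

```python
def remaining_label_budget(num_unlabeled, labels_so_far,
                           max_total_labels=100, max_unlabeled_ratio=0.5):
    """How many more instances the LLM may label under both caps (assumed rule)."""
    by_total = max(0, max_total_labels - labels_so_far)   # absolute cap
    by_ratio = int(num_unlabeled * max_unlabeled_ratio)   # share of unlabeled data
    return min(by_total, by_ratio)


def should_pause(correct, verified, min_accuracy_threshold=0.7):
    """Pause when verified accuracy drops below the threshold."""
    if verified == 0:
        return False  # no verification results yet, nothing to judge
    return correct / verified < min_accuracy_threshold


budget = remaining_label_budget(num_unlabeled=60, labels_so_far=0)
paused = should_pause(correct=6, verified=10)
```

Here the ratio cap is the binding one: only half of the 60 unlabeled instances may be labeled, even though the total cap would allow 100.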

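Blind Verification

Verification uses "blind labeling": annotators see the instance as a regular task and are not shown the LLM's prediction:

Configurable sample rate (default: 20% of LLM labels)
Multiple selection strategies: low_confidence, random, mixed
Verification tasks are mixed naturally into regular assignments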

Configuration

ICL labeling requires ai_support to be enabled:

yaml

# AI endpoint configuration (required)
ai_support:
  enabled: true
  endpoint_type: "openai"
  ai_config:
    model: "gpt-4o-mini"
    api_key: "${OPENAI_API_KEY}"
 
# ICL labeling configuration
icl_labeling:
  enabled: true
 
  # Example selection settings
  example_selection:
    min_agreement_threshold: 0.8      # 80% annotators must agree
    min_annotators_per_instance: 2    # Minimum annotations for consensus
    max_examples_per_schema: 10       # Max examples per schema in prompt
    refresh_interval_seconds: 300     # How often to refresh examples
 
  # LLM labeling settings
  llm_labeling:
    batch_size: 20
    trigger_threshold: 5              # Min examples before LLM labeling starts
    confidence_threshold: 0.7         # Min confidence to accept prediction
    batch_interval_seconds: 600
    max_total_labels: 100             # Max instances to label total
    max_unlabeled_ratio: 0.5          # Max portion of unlabeled to label
    pause_on_low_accuracy: true
    min_accuracy_threshold: 0.7
 
  # Human verification settings
  verification:
    enabled: true
    sample_rate: 0.2                  # 20% of LLM labels verified
    selection_strategy: "low_confidence"
    mix_with_regular_assignments: true
    assignment_mix_rate: 0.2
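
As a rough illustration of how `confidence_threshold` behaves, the sketch below splits predictions into accepted and rejected groups. The `Prediction` tuple and the function name are hypothetical, the inclusive comparison is an assumption, and what Potato does with rejected predictions is not covered here.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    instance_id: str
    label: str
    confidence: float


def split_by_confidence(predictions, confidence_threshold=0.7):
    """Accept predictions at or above the threshold; reject the rest."""
    accepted = [p for p in predictions if p.confidence >= confidence_threshold]
    rejected = [p for p in predictions if p.confidence < confidence_threshold]
    return accepted, rejected


predictions = [
    Prediction("doc10", "positive", 0.92),
    Prediction("doc11", "negative", 0.55),
    Prediction("doc12", "positive", 0.70),
]
accepted, rejected = split_by_confidence(predictions)
```

A higher threshold accepts fewer predictions, so raising it is the conservative direction.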

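Selection Strategies

low_confidence: prioritizes verifying the predictions the LLM is least confident about
random: samples at random from all predictions
mixed: 50% low-confidence + 50% random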
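
The three strategies can be sketched as follows. This is an illustrative reading only: the function name, the `(instance_id, confidence)` tuples, and the seeded random generator are assumptions, and Potato's actual sampling may differ in details such as rounding.

```python
import random


def select_for_verification(predictions, sample_rate=0.2,
                            strategy="low_confidence", seed=0):
    """Pick a verification sample from (instance_id, confidence) pairs."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = round(len(predictions) * sample_rate)
    by_confidence = sorted(predictions, key=lambda p: p[1])  # least confident first
    if strategy == "low_confidence":
        return by_confidence[:n]
    if strategy == "random":
        return rng.sample(predictions, n)
    if strategy == "mixed":
        n_low = n // 2                      # half from the low-confidence end
        low = by_confidence[:n_low]
        rest = by_confidence[n_low:]        # avoid picking the same item twice
        return low + rng.sample(rest, n - n_low)
    raise ValueError(f"unknown strategy: {strategy}")


confidences = [0.95, 0.55, 0.80, 0.62, 0.99, 0.71, 0.88, 0.58, 0.93, 0.77]
predictions = [(f"item{i}", c) for i, c in enumerate(confidences)]
lowest = select_for_verification(predictions, 0.2, "low_confidence")
mixed = select_for_verification(predictions, 0.4, "mixed")
```

With `low_confidence`, the sample is concentrated on uncertain predictions, so accuracy measured on it will tend to understate overall accuracy; `random` avoids that bias.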

Admin API

Status Endpoint

http

GET /admin/api/icl/status

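Returns the overall status of the ICL labeler, including the number of examples per schema, the number of predictions made, the verification queue size, and accuracy metrics.

Examples Endpoint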

http

GET /admin/api/icl/examples?schema=sentiment

Returns the high-confidence examples, optionally filtered by schema.

Accuracy Endpoint

http

GET /admin/api/icl/accuracy?schema=sentiment

Returns accuracy metrics based on human verification results.
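
The response format of this endpoint is not shown here, but the underlying calculation is presumably a comparison of LLM labels against the blind human labels. A minimal sketch, assuming a hypothetical list of `(schema, llm_label, human_label)` records:

```python
from collections import defaultdict


def accuracy_by_schema(verifications):
    """Fraction of verified instances where the LLM label matches the human label."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for schema, llm_label, human_label in verifications:
        totals[schema] += 1
        if llm_label == human_label:
            correct[schema] += 1
    return {schema: correct[schema] / totals[schema] for schema in totals}


verifications = [
    ("sentiment", "positive", "positive"),
    ("sentiment", "negative", "negative"),
    ("sentiment", "positive", "negative"),  # LLM and human disagree
    ("sentiment", "negative", "negative"),
    ("topic", "sports", "sports"),
    ("topic", "politics", "business"),      # LLM and human disagree
]
accuracy = accuracy_by_schema(verifications)
```

Schemas with no verified instances simply do not appear in the result, which avoids a division by zero.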

Manual Trigger Endpoint

http

POST /admin/api/icl/trigger
Content-Type: application/json
 
{"schema_name": "sentiment"}

Manually triggers batch labeling for a specific schema.
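
For scripted use, the request above could be built with Python's standard library. This sketch only constructs the request; the helper name is hypothetical, the base URL is taken from the curl examples in this guide, and any authentication the admin API may require is not covered here.

```python
import json
import urllib.request


def build_trigger_request(base_url, schema_name):
    """Build (but do not send) the POST request for the trigger endpoint."""
    body = json.dumps({"schema_name": schema_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/admin/api/icl/trigger",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_trigger_request("http://localhost:8000", "sentiment")

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```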

Usage Workflow

1. Configure Your Project

yaml

ai_support:
  enabled: true
  endpoint_type: "openai"
  ai_config:
    model: "gpt-4o-mini"
    api_key: "${OPENAI_API_KEY}"
 
icl_labeling:
  enabled: true
  example_selection:
    min_agreement_threshold: 0.8
  llm_labeling:
    max_total_labels: 50  # Start small
  verification:
    enabled: true
    sample_rate: 0.3  # Verify 30% initially

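2. Collect Human Annotations

Have annotators label data as usual. Once they reach consensus on an instance (80%+ agreement), that instance becomes available as an example.

3. Monitor Progress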

bash

curl http://localhost:8000/admin/api/icl/status

4. Check Accuracy

bash

curl http://localhost:8000/admin/api/icl/accuracy

5. Iterate

Based on the accuracy:

If accuracy is high (>80%), increase max_total_labels
If accuracy is low, add more human examples before continuing
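
The decision rule above could be captured in a small helper. The function name and the doubling factor are arbitrary choices for illustration; the guidance here only says to increase the limit, not by how much.

```python
def next_label_limit(current_limit, accuracy, high_accuracy=0.8, growth=2):
    """Suggest the next max_total_labels value from verified accuracy."""
    if accuracy > high_accuracy:
        return current_limit * growth  # accuracy is high: scale up
    return current_limit               # otherwise hold and add human examples


suggested = next_label_limit(current_limit=50, accuracy=0.85)
```

When the limit is held, the next step is to collect more human examples rather than to let the LLM continue.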

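Best Practices

Start small: begin with conservative limits (max_total_labels: 50) and evaluate accuracy before scaling up
Verify early: use a higher sample_rate (0.3-0.5) at first to get a reliable accuracy estimate
Monitor actively: check the accuracy metrics regularly through the admin API
Tune thresholds: if LLM accuracy is low:
- Raise min_agreement_threshold to get cleaner examples
- Raise trigger_threshold to collect more examples before labeling starts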
- Raise confidence_threshold to reject uncertain predictions
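Choose a selection strategy:
- low_confidence: best for identifying problematic categories
- random: best for an unbiased accuracy estimate
- mixed: a balance of the two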

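Troubleshooting

LLM Is Not Labeling

Check that ai_support is configured correctly
Verify that there are enough high-confidence examples
Check whether labeling has been paused because of a limit or low accuracy

Low Accuracy

Raise min_agreement_threshold to get cleaner examples
Add more annotation guidelines/instructions
Inspect the examples in use (/admin/api/icl/examples)

Verification Tasks Not Appearing

Verify that verification.enabled is true
Check that mix_with_regular_assignments is true
Check whether the verification queue contains pending items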
