Annotation History

Track every annotation action with timestamps for auditing and analysis.

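Potato tracks annotation actions comprehensively and records fine-grained timestamp metadata for each one. This supports performance analysis, quality assurance, and detailed audit trails.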

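Overview

The annotation history system tracks:

  • Every annotation action: label selections, span annotations, text inputs
  • Precise timestamps: both server-side and client-side
  • Action metadata: user, instance, schema, old and new values
  • Performance metrics: processing times, action rates
  • Suspicious activity: unusually fast or bursty activity patterns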

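Action Tracking

Every annotation change is recorded as an AnnotationAction with the following fields:

Field                      Description
action_id                  Unique UUID for each action
timestamp                  Server-side timestamp
client_timestamp           Browser-side timestamp (if available)
user_id                    User who performed the action
instance_id                Instance being annotated
action_type                Type of action performed
schema_name                Name of the annotation schema
label_name                 Specific label within the schema
old_value                  Previous value (for updates and deletes)
new_value                  New value (for adds and updates)
span_data                  Details of a span annotation
server_processing_time_ms  Server processing time in milliseconds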
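
To make the shape of such a record concrete, here is a minimal sketch of the fields above as a Python dataclass with a dictionary round trip. `ActionRecord` is a hypothetical stand-in written for illustration; it is not Potato's `AnnotationAction`, and the optional/required split and default values are assumptions.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Any, Optional
import uuid


@dataclass
class ActionRecord:
    """Hypothetical stand-in for AnnotationAction (not Potato's class)."""
    action_id: str
    timestamp: datetime
    user_id: str
    instance_id: str
    action_type: str
    schema_name: str
    label_name: Optional[str] = None
    old_value: Any = None
    new_value: Any = None
    span_data: Optional[dict] = None
    client_timestamp: Optional[datetime] = None
    server_processing_time_ms: Optional[float] = None

    def to_dict(self) -> dict:
        # Convert datetimes to ISO 8601 strings so the result is JSON-friendly.
        data = asdict(self)
        data["timestamp"] = self.timestamp.isoformat()
        if self.client_timestamp is not None:
            data["client_timestamp"] = self.client_timestamp.isoformat()
        return data

    @classmethod
    def from_dict(cls, data: dict) -> "ActionRecord":
        # Parse ISO 8601 strings back into datetime objects.
        fields = dict(data)
        fields["timestamp"] = datetime.fromisoformat(fields["timestamp"])
        if fields.get("client_timestamp"):
            fields["client_timestamp"] = datetime.fromisoformat(
                fields["client_timestamp"]
            )
        return cls(**fields)


record = ActionRecord(
    action_id=str(uuid.uuid4()),
    timestamp=datetime(2024, 1, 15, 10, 30, 45, 123456),
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    new_value=True,
)
restored = ActionRecord.from_dict(record.to_dict())
```

Storing timestamps as ISO 8601 strings keeps the serialized form readable and lets the record be rebuilt without loss.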

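Action Types

The system tracks the following action types:

  • add_label - a new label is selected
  • update_label - a label's value changes
  • delete_label - a label is removed
  • add_span - a new span annotation is created
  • update_span - a span annotation is modified
  • delete_span - a span annotation is removed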
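
The six types pair a verb (add, update, delete) with a target (label or span), and the verb can often be read off the old and new values. The sketch below illustrates that relationship. `infer_action_type` is a hypothetical helper, and treating `None` as "no value" is an assumption; Potato may determine the type differently.

```python
from typing import Any


def infer_action_type(old_value: Any, new_value: Any, is_span: bool = False) -> str:
    """Derive an action type string from old/new values (illustrative only)."""
    target = "span" if is_span else "label"
    if old_value is None and new_value is not None:
        verb = "add"        # nothing before, something now
    elif old_value is not None and new_value is None:
        verb = "delete"     # something before, nothing now
    elif old_value is not None and new_value is not None:
        verb = "update"     # value replaced
    else:
        raise ValueError("old_value and new_value cannot both be None")
    return f"{verb}_{target}"


label_added = infer_action_type(None, True)
span_removed = infer_action_type({"start": 3, "end": 9}, None, is_span=True)
```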

Configuration

Annotation history tracking is enabled by default. No additional configuration is required.

Performance Metrics

The system calculates performance metrics from the action history:

python
from potato.annotation_history import AnnotationHistoryManager
 
metrics = AnnotationHistoryManager.calculate_performance_metrics(actions)
 
# Returns:
{
    'total_actions': 150,
    'average_action_time_ms': 45.2,
    'fastest_action_time_ms': 12,
    'slowest_action_time_ms': 234,
    'actions_per_minute': 8.5,
    'total_processing_time_ms': 6780
}
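
As a rough illustration of where these numbers could come from, the sketch below recomputes the same keys from plain dictionaries. It assumes the time-based fields are derived from `server_processing_time_ms` and that `actions_per_minute` is the action count divided by the time between the first and last timestamp. The function `summarize` is hypothetical and is not Potato's implementation.

```python
from datetime import datetime, timedelta


def summarize(actions):
    """Compute summary metrics from a list of action dicts (illustrative only)."""
    if not actions:
        raise ValueError("no actions to summarize")
    times = [a["server_processing_time_ms"] for a in actions]
    stamps = sorted(a["timestamp"] for a in actions)
    # Assumed definition: rate over the span between first and last action.
    elapsed_minutes = (stamps[-1] - stamps[0]).total_seconds() / 60
    rate = len(actions) / elapsed_minutes if elapsed_minutes > 0 else 0.0
    return {
        "total_actions": len(actions),
        "average_action_time_ms": sum(times) / len(times),
        "fastest_action_time_ms": min(times),
        "slowest_action_time_ms": max(times),
        "actions_per_minute": rate,
        "total_processing_time_ms": sum(times),
    }


start = datetime(2024, 1, 15, 10, 0, 0)
sample = [
    {"timestamp": start + timedelta(seconds=offset), "server_processing_time_ms": ms}
    for offset, ms in [(0, 12), (30, 40), (60, 20), (120, 48)]
]
summary = summarize(sample)
```

Under this reading the average is the total processing time divided by the action count, which matches the example output above (6780 / 150 = 45.2).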

Suspicious Activity Detection

The system can detect annotation patterns that may indicate problems:

python
from potato.annotation_history import AnnotationHistoryManager
 
analysis = AnnotationHistoryManager.detect_suspicious_activity(
    actions,
    fast_threshold_ms=500,      # Actions faster than this are flagged
    burst_threshold_seconds=2   # Actions closer than this are flagged
)
 
# Returns:
{
    'suspicious_actions': [...],
    'fast_actions_count': 5,
    'burst_actions_count': 12,
    'fast_actions_percentage': 3.3,
    'burst_actions_percentage': 8.0,
    'suspicious_score': 15.2,
    'suspicious_level': 'Low'
}
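
The two thresholds can be pictured with a small sketch that counts short gaps between consecutive actions. It assumes both "fast" and "burst" are measured as the time since the previous action, which may not match how Potato defines "fast" (it could, for example, use processing time). `count_fast_and_burst` is a hypothetical helper, and the sketch does not attempt to reproduce `suspicious_score`, whose formula is not given here.

```python
from datetime import datetime, timedelta


def count_fast_and_burst(timestamps, fast_threshold_ms=500, burst_threshold_seconds=2):
    """Count gaps below each threshold (illustrative; definitions are assumed)."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    fast_limit = timedelta(milliseconds=fast_threshold_ms)
    burst_limit = timedelta(seconds=burst_threshold_seconds)
    fast = sum(1 for gap in gaps if gap < fast_limit)
    burst = sum(1 for gap in gaps if gap < burst_limit)
    total = len(ordered)

    def percentage(count):
        # Share of all actions, rounded to one decimal place.
        return round(100 * count / total, 1) if total else 0.0

    return {
        "fast_actions_count": fast,
        "burst_actions_count": burst,
        "fast_actions_percentage": percentage(fast),
        "burst_actions_percentage": percentage(burst),
    }


start = datetime(2024, 1, 15, 10, 0, 0)
offsets_ms = [0, 200, 1500, 10000, 20000, 20300, 40000, 60000, 80000, 100000]
counts = count_fast_and_burst([start + timedelta(milliseconds=ms) for ms in offsets_ms])
```

With these definitions every fast action also counts as a burst action, so the burst count is never smaller than the fast count.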

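Suspicious Levels

Score    Level      Interpretation
0-10     Normal     Typical annotation behavior
10-30    Low        Some fast actions; may be acceptable
30-60    Medium     Notable patterns; may warrant review
60-80    High       Concerning patterns; review recommended
80-100   Very High  Likely quality issues; review immediately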
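
The score bands can be expressed as a simple lookup. In the sketch below, each band is assumed to include its lower bound and exclude its upper bound, since the listed ranges share their endpoints. The label strings, in particular "Normal" and "Medium", are assumed spellings. `level_for_score` is a hypothetical helper, not part of Potato.

```python
def level_for_score(score: float) -> str:
    """Map a 0-100 suspicious score to a level name (boundaries are assumed)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 10:
        return "Normal"
    if score < 30:
        return "Low"
    if score < 60:
        return "Medium"
    if score < 80:
        return "High"
    return "Very High"


example_level = level_for_score(15.2)
```

A score of 15.2 falls in the 10-30 band, which agrees with the 'Low' level in the example output earlier on this page.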

API Reference

AnnotationAction

python
from datetime import datetime
 
from potato.annotation_history import AnnotationAction
 
action = AnnotationAction(
    action_id="uuid-here",
    timestamp=datetime.now(),
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    old_value=None,
    new_value=True
)
 
# Serialize to dictionary
data = action.to_dict()
 
# Deserialize from dictionary
action = AnnotationAction.from_dict(data)

AnnotationHistoryManager

python
from datetime import datetime
 
from potato.annotation_history import AnnotationHistoryManager
 
# Create a new action with current timestamp
action = AnnotationHistoryManager.create_action(
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    old_value=None,
    new_value=True
)
 
# Filter actions by time range
filtered = AnnotationHistoryManager.get_actions_by_time_range(
    actions,
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 1, 31)
)
 
# Filter actions by instance
instance_actions = AnnotationHistoryManager.get_actions_by_instance(
    actions, instance_id="doc_001"
)
 
# Calculate performance metrics
metrics = AnnotationHistoryManager.calculate_performance_metrics(actions)
 
# Detect suspicious activity
analysis = AnnotationHistoryManager.detect_suspicious_activity(actions)
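
One detail worth keeping in mind when filtering by time range: `datetime(2024, 1, 31)` means midnight at the start of 31 January. The sketch below uses a hypothetical `filter_by_time_range` helper with inclusive bounds to show the effect. Whether `get_actions_by_time_range` treats its bounds as inclusive is an assumption not confirmed here.

```python
from datetime import datetime


def filter_by_time_range(actions, start_time, end_time):
    """Keep actions whose timestamp lies in [start_time, end_time] (assumed inclusive)."""
    return [a for a in actions if start_time <= a["timestamp"] <= end_time]


sample_actions = [
    {"action_id": "a1", "timestamp": datetime(2024, 1, 10, 9, 0)},
    {"action_id": "a2", "timestamp": datetime(2024, 1, 31, 14, 30)},
    {"action_id": "a3", "timestamp": datetime(2024, 2, 2, 8, 15)},
]

# End bound is midnight at the start of 31 January, so a2 is excluded.
narrow = filter_by_time_range(
    sample_actions, datetime(2024, 1, 1), datetime(2024, 1, 31)
)

# Extending the end bound to the last microsecond of the day includes a2.
whole_month = filter_by_time_range(
    sample_actions, datetime(2024, 1, 1), datetime(2024, 1, 31, 23, 59, 59, 999999)
)
```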

Use Cases

Quality Assurance

Monitor annotator behavior to catch quality issues:

python
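from potato.annotation_history import AnnotationHistoryManager
 
# get_all_users, get_user_actions and flag_for_review are placeholders
# for project-specific helpers; they are not defined here.
for user_id in get_all_users():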
    user_actions = get_user_actions(user_id)
    analysis = AnnotationHistoryManager.detect_suspicious_activity(user_actions)
 
    if analysis['suspicious_level'] in ['High', 'Very High']:
        flag_for_review(user_id, analysis)

Audit Trail

Track changes to meet compliance requirements:

python
import json
 
from potato.annotation_history import AnnotationHistoryManager
 
instance_actions = AnnotationHistoryManager.get_actions_by_instance(
    all_actions, "doc_001"
)
 
audit_log = [action.to_dict() for action in instance_actions]
with open("audit_doc_001.json", "w") as f:
    json.dump(audit_log, f, indent=2)

Time Analysis

Understand when annotation happens:

python
from collections import Counter
 
hours = Counter(action.timestamp.hour for action in all_actions)
print("Peak annotation hours:", hours.most_common(5))

Data Storage

Annotation history is stored in the user state files:

text
output/
  annotations/
    user_state_annotator1.json  # Includes action history
    user_state_annotator2.json

Export Format

Actions are serialized with ISO 8601 timestamps:

json
{
  "action_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:45.123456",
  "user_id": "annotator1",
  "instance_id": "doc_001",
  "action_type": "add_label",
  "schema_name": "sentiment",
  "label_name": "positive",
  "old_value": null,
  "new_value": true,
  "server_processing_time_ms": 23
}
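
An exported record like the one above can be read back with the standard library alone. The sketch below parses the JSON and converts the timestamp with `datetime.fromisoformat`. Because the example timestamp carries no UTC offset, the result is a naive datetime, so the time zone has to be known from context.

```python
import json
from datetime import datetime

exported = """
{
  "action_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:45.123456",
  "user_id": "annotator1",
  "instance_id": "doc_001",
  "action_type": "add_label",
  "schema_name": "sentiment",
  "label_name": "positive",
  "old_value": null,
  "new_value": true,
  "server_processing_time_ms": 23
}
"""

record = json.loads(exported)                         # JSON null/true become None/True
stamp = datetime.fromisoformat(record["timestamp"])   # ISO 8601 string to datetime
has_timezone = stamp.tzinfo is not None               # no offset in the string
```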

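Best Practices

  1. Regular monitoring: review suspicious activity reports on a regular schedule
  2. Threshold tuning: adjust detection thresholds to match task complexity
  3. Export backups: export history periodically for long-term storage
  4. Privacy compliance: account for data retention policies that apply to timestamp data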

Further Reading

For implementation details, see the source code documentation.