Annotation History
Track every annotation action with timestamps for auditing and analysis.
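Potato records every annotation action together with fine-grained timestamp metadata. The resulting history supports performance analysis, quality assurance, and detailed audit trails.
Overview
The annotation history system tracks:
- Every annotation action: label selections, span annotations, text inputs
- Precise timestamps: both server-side and client-side
- Action metadata: user, instance, schema, old and new values
- Performance metrics: processing time, action rate
- Suspicious activity: unusually fast or bursty activity patterns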
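Action Tracking
Every annotation change is recorded as an `AnnotationAction` with the following fields:
| Field | Description |
|---|---|
| `action_id` | Unique UUID for each action |
| `timestamp` | Server-side timestamp |
| `client_timestamp` | Browser-side timestamp (if available) |
| `user_id` | User who performed the action |
| `instance_id` | Instance being annotated |
| `action_type` | Type of action performed |
| `schema_name` | Name of the annotation schema |
| `label_name` | Specific label within the schema |
| `old_value` | Previous value (for updates and deletes) |
| `new_value` | New value (for adds and updates) |
| `span_data` | Details of a span annotation |
| `server_processing_time_ms` | Server processing time in milliseconds |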
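To make the field list above concrete, here is a minimal sketch of a record that carries the same fields. The class name `ActionRecord` is hypothetical, the field types are assumptions inferred from the descriptions, and this is not Potato's actual `AnnotationAction` implementation.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class ActionRecord:
    """Hypothetical stand-in mirroring the documented fields (not Potato's class)."""

    user_id: str
    instance_id: str
    action_type: str
    schema_name: str
    label_name: str
    old_value: Any = None
    new_value: Any = None
    span_data: Optional[dict] = None
    client_timestamp: Optional[datetime] = None
    server_processing_time_ms: Optional[int] = None
    # Generated per record: a unique ID and a server-side timestamp.
    action_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=datetime.now)

    def to_dict(self) -> dict:
        """Serialize the record, rendering datetimes as ISO 8601 strings."""
        data = asdict(self)
        for key in ("timestamp", "client_timestamp"):
            if data[key] is not None:
                data[key] = data[key].isoformat()
        return data


record = ActionRecord(
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    new_value=True,
)
print(record.to_dict()["action_type"])
```

Keeping `old_value` and `new_value` side by side on each record is what lets an audit reconstruct the state of an annotation at any point in time.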
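Action Types
The system tracks the following action types:
- `add_label`: new label selection
- `update_label`: label value changed
- `delete_label`: label removed
- `add_span`: new span annotation created
- `update_span`: span annotation modified
- `delete_span`: span annotation removed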
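The label action types line up with which of `old_value` and `new_value` is populated. The sketch below illustrates that relationship. The function `classify_label_change` is hypothetical, and it assumes that `None` means "no value"; the source does not say how Potato itself decides the action type.

```python
from typing import Any, Optional


def classify_label_change(old_value: Any, new_value: Any) -> Optional[str]:
    """Map an old/new value pair to one of the documented label action types.

    Assumption: None represents "no value". Returns None when there is
    nothing to record (both values missing, or the value is unchanged).
    """
    if old_value is None and new_value is None:
        return None
    if old_value is None:
        return "add_label"
    if new_value is None:
        return "delete_label"
    if old_value != new_value:
        return "update_label"
    return None


print(classify_label_change(None, True))
print(classify_label_change("positive", "negative"))
print(classify_label_change(True, None))
```

The same pattern would apply to the span action types, with `span_data` taking the place of the scalar values.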
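Configuration
Annotation history tracking is enabled by default; no additional configuration is required.
Performance Metrics
The system computes performance metrics from the action history:
```python
from potato.annotation_history import AnnotationHistoryManager

# `actions` is a list of AnnotationAction objects collected elsewhere
metrics = AnnotationHistoryManager.calculate_performance_metrics(actions)
# Returns a dictionary such as:
# {
#     'total_actions': 150,
#     'average_action_time_ms': 45.2,
#     'fastest_action_time_ms': 12,
#     'slowest_action_time_ms': 234,
#     'actions_per_minute': 8.5,
#     'total_processing_time_ms': 6780
# }
```
Suspicious Activity Detection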
The system can detect potentially problematic annotation patterns:
```python
from potato.annotation_history import AnnotationHistoryManager

analysis = AnnotationHistoryManager.detect_suspicious_activity(
    actions,
    fast_threshold_ms=500,      # Actions faster than this are flagged
    burst_threshold_seconds=2   # Actions closer together than this are flagged
)
# Returns a dictionary such as:
# {
#     'suspicious_actions': [...],
#     'fast_actions_count': 5,
#     'burst_actions_count': 12,
#     'fast_actions_percentage': 3.3,
#     'burst_actions_percentage': 8.0,
#     'suspicious_score': 15.2,
#     'suspicious_level': 'Low'
# }
```
Suspicion Levels
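| Score | Level | Interpretation |
|---|---|---|
| 0-10 | Normal | Typical annotation behavior |
| 10-30 | Low | Some fast actions, likely acceptable |
| 30-60 | Medium | Noticeable patterns, may warrant review |
| 60-80 | High | Concerning patterns, review recommended |
| 80-100 | Very High | Likely quality issues, immediate review required |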
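The bands in the table share their boundary values (10 appears in both "0-10" and "10-30"), so any lookup has to choose which side a boundary falls on. The sketch below assumes each upper bound is exclusive, so a score of exactly 10 counts as Low. The function `suspicion_level` is hypothetical and is not part of Potato's API; how Potato computes the score itself is not described here.

```python
def suspicion_level(score: float) -> str:
    """Map a 0-100 suspicion score to a level name from the table above.

    Assumption: upper bounds are exclusive, so a boundary score falls
    into the higher band (for example, 10 is "Low", not "Normal").
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # (exclusive upper bound, level name)
    bands = [(10, "Normal"), (30, "Low"), (60, "Medium"), (80, "High")]
    for upper, level in bands:
        if score < upper:
            return level
    return "Very High"


print(suspicion_level(15.2))
```

Under this reading, the example score of 15.2 shown earlier maps to "Low", which matches the `suspicious_level` value in the sample output.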
API Reference
AnnotationAction
```python
from datetime import datetime

from potato.annotation_history import AnnotationAction

action = AnnotationAction(
    action_id="uuid-here",
    timestamp=datetime.now(),
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    old_value=None,
    new_value=True
)
# Serialize to dictionary
data = action.to_dict()
# Deserialize from dictionary
action = AnnotationAction.from_dict(data)
```
AnnotationHistoryManager
```python
from datetime import datetime

from potato.annotation_history import AnnotationHistoryManager

# Create a new action with current timestamp
action = AnnotationHistoryManager.create_action(
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    old_value=None,
    new_value=True
)
# Filter actions by time range
filtered = AnnotationHistoryManager.get_actions_by_time_range(
    actions,
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 1, 31)
)
# Filter actions by instance
instance_actions = AnnotationHistoryManager.get_actions_by_instance(
    actions, instance_id="doc_001"
)
# Calculate performance metrics
metrics = AnnotationHistoryManager.calculate_performance_metrics(actions)
# Detect suspicious activity
analysis = AnnotationHistoryManager.detect_suspicious_activity(actions)
```
Use Cases
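Quality Assurance
Monitor annotator behavior to catch quality issues:
```python
from potato.annotation_history import AnnotationHistoryManager

# get_all_users, get_user_actions, and flag_for_review are placeholders
# for helpers supplied by your own application.
for user_id in get_all_users():
    user_actions = get_user_actions(user_id)
    analysis = AnnotationHistoryManager.detect_suspicious_activity(user_actions)
    if analysis['suspicious_level'] in ['High', 'Very High']:
        flag_for_review(user_id, analysis)
```
Audit Trail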
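Track changes to meet compliance requirements:
```python
import json

from potato.annotation_history import AnnotationHistoryManager

# Collect every action recorded for one instance
instance_actions = AnnotationHistoryManager.get_actions_by_instance(
    all_actions, "doc_001"
)
# Serialize the actions and write them out as a JSON audit log
audit_log = [action.to_dict() for action in instance_actions]
with open("audit_doc_001.json", "w") as f:
    json.dump(audit_log, f, indent=2)
```
Timing Analysis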
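Understand when annotation work happens:
```python
from collections import Counter

# Count actions per hour of day (0-23) using the server-side timestamp
hours = Counter(action.timestamp.hour for action in all_actions)
print("Peak annotation hours:", hours.most_common(5))
```
Data Storage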
Annotation history is stored in the per-user state files:
```text
output/
  annotations/
    user_state_annotator1.json   # Includes action history
    user_state_annotator2.json
```
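Because each state file is named after its annotator, the files can be discovered from the directory listing alone. The sketch below relies only on the `user_state_<user_id>.json` naming pattern shown above. The helper `find_user_state_files` is hypothetical, and it deliberately does not parse the files, since their internal layout is not described here.

```python
from pathlib import Path

PREFIX = "user_state_"


def find_user_state_files(annotations_dir):
    """Map each user ID to its state file, based on the file naming pattern."""
    result = {}
    for path in sorted(Path(annotations_dir).glob(f"{PREFIX}*.json")):
        # Strip the "user_state_" prefix; .stem has already dropped ".json".
        user_id = path.stem[len(PREFIX):]
        result[user_id] = path
    return result
```

Calling `find_user_state_files("output/annotations")` on the layout above would return entries for `annotator1` and `annotator2`.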
Export Format
Actions are serialized with ISO 8601 timestamps:
```json
{
  "action_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:45.123456",
  "user_id": "annotator1",
  "instance_id": "doc_001",
  "action_type": "add_label",
  "schema_name": "sentiment",
  "label_name": "positive",
  "old_value": null,
  "new_value": true,
  "server_processing_time_ms": 23
}
```
Best Practices
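- Monitor regularly: review suspicious activity reports on a regular schedule
- Tune thresholds: adjust the detection thresholds to match task complexity
- Export backups: periodically export the history for long-term storage
- Privacy compliance: account for timestamps in your data retention policy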
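One way to approach threshold tuning is to look at how far apart an annotator's actions actually are on a given task, then choose a burst threshold near the low end of that distribution. The sketch below is one possible heuristic under that assumption, not a method prescribed by Potato. The helpers `inter_action_gaps` and `suggest_burst_threshold` are hypothetical and work on plain `datetime` values.

```python
from datetime import datetime, timedelta
from statistics import quantiles


def inter_action_gaps(timestamps):
    """Seconds between consecutive actions, after sorting chronologically."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def suggest_burst_threshold(timestamps, percentile=5):
    """Return the given percentile of inter-action gaps, in seconds."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    gaps = inter_action_gaps(timestamps)
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    # quantiles(n=100) returns the 1st through 99th percentile cut points.
    return quantiles(gaps, n=100, method="inclusive")[percentile - 1]


# Example: five actions spaced exactly 10 seconds apart.
start = datetime(2024, 1, 15, 10, 30, 0)
stamps = [start + timedelta(seconds=10 * i) for i in range(5)]
print(suggest_burst_threshold(stamps))
```

A value derived this way could then be passed as `burst_threshold_seconds`. A harder task with naturally slower annotation would yield a larger suggested threshold than a quick binary labeling task.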
Further Reading
For implementation details, see the source code documentation.