Annotation History

Track every annotation action with timestamps for auditing and analysis.

Potato provides comprehensive tracking of all annotation actions with fine-grained timestamp metadata. This enables performance analysis, quality assurance, and detailed audit trails.

Overview

The annotation history system tracks:

  • Every annotation action: Label selections, span annotations, text inputs
  • Precise timestamps: Server-side and client-side timestamps
  • Action metadata: User, instance, schema, old/new values
  • Performance metrics: Processing times, action rates
  • Suspicious activity: Unusually fast or burst activity patterns

Action Tracking

Every annotation change is recorded as an AnnotationAction with:

Field | Description
action_id | Unique UUID for each action
timestamp | Server-side timestamp
client_timestamp | Browser-side timestamp (if available)
user_id | User who performed the action
instance_id | Instance being annotated
action_type | Type of action performed
schema_name | Annotation schema name
label_name | Specific label within the schema
old_value | Previous value (for updates/deletes)
new_value | New value (for adds/updates)
span_data | Span details for span annotations
server_processing_time_ms | Server processing time (ms)

Action Types

The system tracks these action types:

  • add_label - New label selection
  • update_label - Label value changed
  • delete_label - Label removed
  • add_span - New span annotation created
  • update_span - Span annotation modified
  • delete_span - Span annotation removed
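
As a rough illustration of how these six types relate to the old_value and new_value fields, the sketch below infers an action type from a before/after pair. The helper name infer_action_type and the convention that a missing value is represented by None are assumptions made for this example; they are not part of Potato's API.

```python
from typing import Any, Optional


def infer_action_type(
    old_value: Optional[Any], new_value: Optional[Any], is_span: bool = False
) -> str:
    """Hypothetical helper: map an old/new value pair to an action type name."""
    kind = "span" if is_span else "label"
    if old_value is None and new_value is not None:
        return f"add_{kind}"      # nothing before, something after
    if old_value is not None and new_value is None:
        return f"delete_{kind}"   # something before, nothing after
    if old_value is not None and new_value is not None:
        return f"update_{kind}"   # value changed in place
    raise ValueError("no change: both old_value and new_value are None")


print(infer_action_type(None, True))               # add_label
print(infer_action_type("a", "b", is_span=True))   # update_span
```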

Configuration

Annotation history tracking is enabled by default. No additional configuration is required.

Performance Metrics

The system calculates performance metrics from action history:

python
from potato.annotation_history import AnnotationHistoryManager
 
metrics = AnnotationHistoryManager.calculate_performance_metrics(actions)
 
# Returns:
{
    'total_actions': 150,
    'average_action_time_ms': 45.2,
    'fastest_action_time_ms': 12,
    'slowest_action_time_ms': 234,
    'actions_per_minute': 8.5,
    'total_processing_time_ms': 6780
}
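
To make the returned fields concrete, here is a minimal sketch of how metrics of this shape could be derived. It is not Potato's implementation: it assumes each action is a plain dict with a datetime 'timestamp' and a numeric 'server_processing_time_ms', and the function name sketch_performance_metrics is hypothetical.

```python
from datetime import datetime, timedelta


def sketch_performance_metrics(actions):
    """Illustrative only: summarize a list of action dicts.

    Assumes each dict has a datetime 'timestamp' and a numeric
    'server_processing_time_ms'.
    """
    if not actions:
        return {"total_actions": 0}
    times = [a["server_processing_time_ms"] for a in actions]
    stamps = sorted(a["timestamp"] for a in actions)
    elapsed_minutes = (stamps[-1] - stamps[0]).total_seconds() / 60
    return {
        "total_actions": len(actions),
        "average_action_time_ms": sum(times) / len(times),
        "fastest_action_time_ms": min(times),
        "slowest_action_time_ms": max(times),
        # The rate is undefined when all actions share one timestamp.
        "actions_per_minute": (
            len(actions) / elapsed_minutes if elapsed_minutes > 0 else None
        ),
        "total_processing_time_ms": sum(times),
    }


# Three actions, 30 seconds apart, with made-up processing times.
start = datetime(2024, 1, 15, 10, 0, 0)
sample = [
    {"timestamp": start + timedelta(seconds=30 * i), "server_processing_time_ms": ms}
    for i, ms in enumerate([12, 40, 20])
]
metrics = sketch_performance_metrics(sample)
print(metrics)
```

In the example return value above, the total (6780 ms) equals the action count times the average (150 × 45.2), which is the relationship this sketch reproduces.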

Suspicious Activity Detection

The system can detect potentially problematic annotation patterns:

python
from potato.annotation_history import AnnotationHistoryManager
 
analysis = AnnotationHistoryManager.detect_suspicious_activity(
    actions,
    fast_threshold_ms=500,      # Actions faster than this are flagged
    burst_threshold_seconds=2   # Actions closer than this are flagged
)
 
# Returns:
{
    'suspicious_actions': [...],
    'fast_actions_count': 5,
    'burst_actions_count': 12,
    'fast_actions_percentage': 3.3,
    'burst_actions_percentage': 8.0,
    'suspicious_score': 15.2,
    'suspicious_level': 'Low'
}
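
The burst criterion can be pictured as a check on the gap between consecutive timestamps. The sketch below is one plausible reading of burst_threshold_seconds, not the library's actual logic; count_burst_actions is a hypothetical name, and the sample timestamps are made up. The "fast" criterion is left out because this page does not say which duration it measures.

```python
from datetime import datetime, timedelta


def count_burst_actions(timestamps, burst_threshold_seconds=2):
    """Illustrative only: count actions that follow the previous one too closely."""
    ordered = sorted(timestamps)
    burst = 0
    for previous, current in zip(ordered, ordered[1:]):
        if (current - previous).total_seconds() < burst_threshold_seconds:
            burst += 1
    return burst


base = datetime(2024, 1, 15, 10, 30, 0)
offsets = [0, 1, 10, 11.5, 30]  # seconds after base
stamps = [base + timedelta(seconds=s) for s in offsets]

burst_count = count_burst_actions(stamps)
burst_percentage = round(100 * burst_count / len(stamps), 1)
print(burst_count, burst_percentage)
```

The percentage is computed the same way the example output suggests: 12 burst actions out of 150 total gives 8.0.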

Suspicious Levels

Score | Level | Interpretation
0-10 | Normal | Typical annotation behavior
10-30 | Low | Some fast actions, likely acceptable
30-60 | Medium | Notable pattern, may warrant review
60-80 | High | Concerning pattern, review recommended
80-100 | Very High | Likely quality issue, immediate review
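
The score bands can be expressed as a small lookup. Because adjacent bands share an endpoint (10, 30, 60, 80), the sketch below assumes each band includes its lower bound and excludes its upper bound; that boundary rule and the function name suspicious_level are assumptions, not documented behavior.

```python
def suspicious_level(score: float) -> str:
    """Hypothetical mapping from a 0-100 suspicious score to a level name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Assumption: lower bound inclusive, upper bound exclusive.
    bands = [(10, "Normal"), (30, "Low"), (60, "Medium"), (80, "High")]
    for upper, level in bands:
        if score < upper:
            return level
    return "Very High"  # 80 through 100


print(suspicious_level(15.2))  # Low, matching the example output above
```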

API Reference

AnnotationAction

python
from datetime import datetime

from potato.annotation_history import AnnotationAction
 
action = AnnotationAction(
    action_id="uuid-here",
    timestamp=datetime.now(),
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    old_value=None,
    new_value=True
)
 
# Serialize to dictionary
data = action.to_dict()
 
# Deserialize from dictionary
action = AnnotationAction.from_dict(data)

AnnotationHistoryManager

python
from datetime import datetime

from potato.annotation_history import AnnotationHistoryManager
 
# Create a new action with current timestamp
action = AnnotationHistoryManager.create_action(
    user_id="annotator1",
    instance_id="doc_001",
    action_type="add_label",
    schema_name="sentiment",
    label_name="positive",
    old_value=None,
    new_value=True
)
 
# Filter actions by time range
filtered = AnnotationHistoryManager.get_actions_by_time_range(
    actions,
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 1, 31)
)
 
# Filter actions by instance
instance_actions = AnnotationHistoryManager.get_actions_by_instance(
    actions, instance_id="doc_001"
)
 
# Calculate performance metrics
metrics = AnnotationHistoryManager.calculate_performance_metrics(actions)
 
# Detect suspicious activity
analysis = AnnotationHistoryManager.detect_suspicious_activity(actions)

Use Cases

Quality Assurance

Monitor annotator behavior for quality issues:

python
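from potato.annotation_history import AnnotationHistoryManager

# get_all_users, get_user_actions, and flag_for_review are placeholders
# for your own project code; they are not defined on this page.
for user_id in get_all_users():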
    user_actions = get_user_actions(user_id)
    analysis = AnnotationHistoryManager.detect_suspicious_activity(user_actions)
 
    if analysis['suspicious_level'] in ['High', 'Very High']:
        flag_for_review(user_id, analysis)

Audit Trail

Track changes for regulatory compliance:

python
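import json

from potato.annotation_history import AnnotationHistoryManager

# all_actions is the full list of recorded AnnotationAction objects
instance_actions = AnnotationHistoryManager.get_actions_by_instance(
    all_actions, "doc_001"
)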
 
audit_log = [action.to_dict() for action in instance_actions]
with open("audit_doc_001.json", "w") as f:
    json.dump(audit_log, f, indent=2)

Time Analysis

Understand annotation timing patterns:

python
from collections import Counter
 
hours = Counter(action.timestamp.hour for action in all_actions)
print("Peak annotation hours:", hours.most_common(5))

Data Storage

Annotation history is stored in the user state files:

text
output/
  annotations/
    user_state_annotator1.json  # Includes action history
    user_state_annotator2.json

Export Format

Actions are serialized with ISO 8601 timestamps:

json
{
  "action_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:45.123456",
  "user_id": "annotator1",
  "instance_id": "doc_001",
  "action_type": "add_label",
  "schema_name": "sentiment",
  "label_name": "positive",
  "old_value": null,
  "new_value": true,
  "server_processing_time_ms": 23
}
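
An exported record can be read back with the Python standard library alone. The sketch below parses the example record and converts its timestamp with datetime.fromisoformat. It uses no Potato code. Note that the example timestamp carries no UTC offset, so the parsed value is a naive datetime.

```python
import json
from datetime import datetime

exported = """
{
  "action_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:45.123456",
  "user_id": "annotator1",
  "instance_id": "doc_001",
  "action_type": "add_label",
  "schema_name": "sentiment",
  "label_name": "positive",
  "old_value": null,
  "new_value": true,
  "server_processing_time_ms": 23
}
"""

record = json.loads(exported)

# JSON null/true become Python None/True.
timestamp = datetime.fromisoformat(record["timestamp"])
print(timestamp.hour, timestamp.microsecond, timestamp.tzinfo)
```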

Best Practices

  1. Regular monitoring: Check suspicious activity reports periodically
  2. Threshold tuning: Adjust detection thresholds based on task complexity
  3. Export backups: Regularly export history for long-term storage
  4. Privacy compliance: Consider data retention policies for timestamps

Further Reading

For implementation details, see the source documentation.