Multi-Object Tracking Annotation

Multi-Object Tracking (MOT) annotation produces training data for applications such as surveillance, autonomous driving, and sports analytics. This tutorial covers the core MOT annotation concepts and shows how Potato's current video annotation features can support basic tracking workflows.

MOT Annotation Challenges

Maintaining consistent object IDs across frames
Handling occlusions and re-appearances
Tracking through crowded scenes
Managing ID switches and merges

Current Video Annotation Support

Potato currently supports basic video annotation through the video_annotation type. MOT-specific features such as automatic ID management, interpolation, and occlusion handling are not yet implemented, but you can still set up basic video labeling workflows.

Basic Video Annotation Setup

yaml

annotation_task_name: "Video Object Labeling"

data_files:
  - data/videos.json

annotation_schemes:
  - annotation_type: video_annotation
    name: objects
    description: "Label objects in video frames"
    video_path: video
    labels:
      - name: person
      - name: vehicle
      - name: cyclist

Sample Data Format

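Your data/videos.json file should contain one entry per video, each with a unique id and the video's path under the video key (the field named by video_path in the configuration above):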

json

[
  {
    "id": "video_001",
    "video": "/path/to/video.mp4"
  },
  {
    "id": "video_002",
    "video": "/path/to/another_video.mp4"
  }
]
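For more than a handful of videos, it can be easier to generate this file than to write it by hand. The following is a minimal sketch, not part of Potato: the function names build_manifest and write_manifest, the list of accepted file extensions, and the video_NNN ID format are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Assumption: which file extensions count as videos in your project.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm"}


def build_manifest(video_dir, start=1):
    """Return one {"id", "video"} entry per video file found in video_dir."""
    paths = sorted(
        p for p in Path(video_dir).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    # IDs follow the video_001, video_002, ... pattern used in the sample data.
    return [
        {"id": f"video_{i:03d}", "video": str(p)}
        for i, p in enumerate(paths, start=start)
    ]


def write_manifest(entries, out_path):
    """Write the entries as a JSON array, creating parent folders if needed."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(entries, indent=2))
```

A typical call would be write_manifest(build_manifest("videos"), "data/videos.json"), where "videos" is a hypothetical folder holding your clips.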

Manual Tracking Workflow

Without dedicated MOT features, you can still perform tracking annotation manually:

Creating Tracks Manually

Navigate to the frame where an object first appears
Use the video annotation interface to label the object
Include a consistent identifier in your annotation (e.g., "person_001")
Move to subsequent frames and continue labeling with the same identifier
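Because identifiers are typed by hand in this workflow, a small consistency check over the finished annotations can catch typos early. The sketch below is an assumption-heavy illustration: it does not read Potato's export format. It expects you to have already converted your annotations into a list of dicts with the hypothetical keys frame, label, and track_id, and it assumes IDs follow a label_number scheme.

```python
import re

# Assumed ID scheme: lowercase label, underscore, digits (label_number).
ID_PATTERN = re.compile(r"^([a-z]+)_(\d+)$")


def check_track_ids(records):
    """Return a list of human-readable problems found in the records.

    Each record is a dict with the hypothetical keys
    'frame', 'label', and 'track_id'.
    """
    problems = []
    seen = set()  # (frame, track_id) pairs already encountered
    for rec in records:
        frame, label, track_id = rec["frame"], rec["label"], rec["track_id"]
        match = ID_PATTERN.match(track_id)
        if match is None:
            problems.append(f"frame {frame}: malformed ID {track_id!r}")
            continue
        # The ID prefix should agree with the label it is attached to.
        if match.group(1) != label:
            problems.append(
                f"frame {frame}: ID {track_id!r} used with label {label!r}"
            )
        # One object cannot appear twice in the same frame.
        if (frame, track_id) in seen:
            problems.append(
                f"frame {frame}: ID {track_id!r} appears more than once"
            )
        seen.add((frame, track_id))
    return problems
```

An empty result means no problems of these three kinds were found. It does not prove that the tracks themselves are correct.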

Handling Occlusions

When an object becomes occluded:

Note the last frame where the object was visible
When the object reappears, use the same identifier to maintain track continuity
Document occlusion periods in your annotation notes
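If every visible frame of a track is labeled, the occlusion periods can also be recovered from the annotations afterwards and compared against your notes. This is a minimal sketch under that assumption. The function name occlusion_gaps is hypothetical, and its input is simply the list of frame numbers in which one identifier appears.

```python
def occlusion_gaps(visible_frames):
    """Return (first_missing, last_missing) frame ranges for one track.

    visible_frames: frame numbers in which the track's identifier was
    labeled. Assumes every visible frame was annotated, so any missing
    frame between two labeled frames is treated as an occlusion.
    """
    frames = sorted(set(visible_frames))
    gaps = []
    for prev, cur in zip(frames, frames[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps
```

For a track labeled in frames 10 to 12 and again in frames 20 to 21, the function reports a single gap covering frames 13 to 19.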

Proposed MOT Features

The following features would enhance Potato's MOT annotation capabilities and are being considered for future development:

Automatic ID assignment: Auto-increment IDs for new objects
Track interpolation: Linear or cubic interpolation between keyframes
Occlusion handling: Visibility levels (visible, partial, heavy, not_visible)
Trajectory visualization: Show object paths across frames
Track management panel: Merge, split, and manage track IDs
Per-frame attributes: Properties that change frame-to-frame

If you're interested in these features, please reach out to the Potato development team or contribute to the project.
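To make the track interpolation idea concrete, here is a minimal sketch of linear interpolation between two keyframes. It is not Potato code and does not describe how the feature would be implemented there. The function name interpolate_boxes and the (x, y, width, height) box layout are assumptions chosen for illustration.

```python
def interpolate_boxes(frame_a, box_a, frame_b, box_b):
    """Linearly interpolate a box between two keyframes.

    Boxes are (x, y, width, height) tuples (an assumed layout).
    Returns a dict mapping every frame from frame_a to frame_b,
    inclusive, to its interpolated box.
    """
    if frame_b <= frame_a:
        raise ValueError("frame_b must come after frame_a")
    span = frame_b - frame_a
    boxes = {}
    for frame in range(frame_a, frame_b + 1):
        t = (frame - frame_a) / span  # 0.0 at frame_a, 1.0 at frame_b
        boxes[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    return boxes
```

Linear interpolation works well when an object moves at a roughly constant speed between keyframes. Faster or curved motion needs keyframes placed closer together.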

Tips for Manual MOT Annotation

Work in short segments: 100-200 frames at a time
Consistent naming: Use a clear ID scheme (e.g., "person_001", "vehicle_023")
Document your process: Keep notes about occlusions and track decisions
Review passes: Watch forward then backward to catch errors
Use external tools: Consider pre-processing with detection models
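The first two tips are easy to automate with a couple of helper functions. The sketch below is illustrative only: the names frame_segments and make_track_id, the default segment length of 150 frames, and the three-digit padding are assumptions, not Potato settings.

```python
def frame_segments(total_frames, segment_length=150):
    """Split frames 0..total_frames-1 into (first, last) inclusive ranges."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [
        (start, min(start + segment_length, total_frames) - 1)
        for start in range(0, total_frames, segment_length)
    ]


def make_track_id(label, index, width=3):
    """Build a zero-padded identifier such as vehicle_023."""
    return f"{label}_{index:0{width}d}"
```

Zero-padded identifiers sort correctly as plain text, which makes it easier to scan a list of tracks for gaps or duplicates.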

Alternative Approaches

For projects requiring full MOT annotation capabilities:

Hybrid workflow: Use Potato for initial labeling and specialized MOT tools for track management
Pre-annotation: Run object detectors to generate initial bounding boxes, then refine in Potato
Post-processing: Export Potato annotations and apply tracking algorithms externally
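As an illustration of the post-processing approach, the sketch below assigns track IDs to per-frame boxes with a simple greedy intersection-over-union (IoU) matcher. It is a deliberately basic stand-in for a real tracking algorithm, and several things are assumed: boxes have already been exported and converted to (x1, y1, x2, y2) tuples, the function names iou and greedy_track are hypothetical, and the 0.3 threshold is an arbitrary starting point.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_track(frames, iou_threshold=0.3):
    """Assign integer track IDs to boxes, frame by frame.

    frames: list with one entry per frame, each a list of boxes.
    Returns a parallel list of lists of track IDs.
    """
    next_id = 1
    previous = []  # (track_id, box) pairs from the preceding frame
    output = []
    for boxes in frames:
        # Score every (previous track, current box) pair, best first.
        candidates = sorted(
            (
                (iou(prev_box, box), pi, bi)
                for pi, (_, prev_box) in enumerate(previous)
                for bi, box in enumerate(boxes)
            ),
            reverse=True,
        )
        assigned = {}      # current box index -> track ID
        used_prev = set()  # previous tracks already matched
        for score, pi, bi in candidates:
            if score < iou_threshold:
                break
            if pi in used_prev or bi in assigned:
                continue
            assigned[bi] = previous[pi][0]
            used_prev.add(pi)
        ids = []
        for bi in range(len(boxes)):
            if bi not in assigned:  # unmatched box starts a new track
                assigned[bi] = next_id
                next_id += 1
            ids.append(assigned[bi])
        previous = list(zip(ids, boxes))
        output.append(ids)
    return output
```

Because this matcher only compares each frame with the one before it, an object that disappears for even a single frame comes back with a new ID. Re-linking tracks after occlusions still requires manual review or a more capable tracker.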

Next Steps

Learn about video frame annotation
Explore image annotation features
Read about inter-annotator agreement for quality control

For current video annotation documentation, see /docs/features/image-annotation.