Multi-Object Tracking Annotation

An overview of multi-object tracking annotation concepts and how Potato's video annotation capabilities can support basic tracking workflows.

By Potato Team

Multi-Object Tracking (MOT) annotation produces training data for tracking models used in surveillance, autonomous driving, and sports analytics. This tutorial covers core MOT annotation concepts and shows how Potato's current video annotation features can support basic tracking workflows.

MOT Annotation Challenges

  • Maintaining consistent object IDs across frames
  • Handling occlusions and re-appearances
  • Tracking through crowded scenes
  • Managing ID switches and merges

Current Video Annotation Support

Potato currently supports basic video annotation through the video_annotation type. While full MOT-specific features like automatic ID management, interpolation, and occlusion handling are not yet implemented, you can set up basic video labeling workflows.

Basic Video Annotation Setup

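# Display name of the annotation task
annotation_task_name: "Video Object Labeling"

# Input data: one entry per video (see Sample Data Format)
data_files:
  - data/videos.json

annotation_schemes:
  - annotation_type: video_annotation
    name: objects
    description: "Label objects in video frames"
    # Assumed to name the field in each data entry that holds the video path
    video_path: video
    # Object categories available to annotators
    labels:
      - name: person
      - name: vehicle
      - name: cyclist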

Sample Data Format

Your data/videos.json file should contain entries with video paths:

[
  {
    "id": "video_001",
    "video": "/path/to/video.mp4"
  },
  {
    "id": "video_002",
    "video": "/path/to/another_video.mp4"
  }
]
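
If you have many videos, writing this file by hand gets tedious. The following is a minimal sketch, not part of Potato, that builds entries in the layout shown above. The function names `build_video_manifest` and `write_manifest`, the `video_NNN` ID pattern, and the `*.mp4` glob are assumptions made for illustration.

```python
import json
from pathlib import Path


def build_video_manifest(video_paths):
    """Return one {"id", "video"} entry per path, numbered in sorted order."""
    entries = []
    for index, path in enumerate(sorted(str(p) for p in video_paths), start=1):
        # Hypothetical ID scheme: video_001, video_002, ...
        entries.append({"id": f"video_{index:03d}", "video": path})
    return entries


def write_manifest(video_dir, out_path, pattern="*.mp4"):
    """Collect matching files under video_dir and write them as a JSON array."""
    entries = build_video_manifest(Path(video_dir).glob(pattern))
    Path(out_path).write_text(json.dumps(entries, indent=2) + "\n")
    return entries
```

Calling `write_manifest("data/videos", "data/videos.json")` would produce a file with the same structure as the sample, assuming your videos live under `data/videos`.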

Manual Tracking Workflow

Without dedicated MOT features, you can still perform tracking annotation manually:

Creating Tracks Manually

  1. Navigate to the frame where an object first appears
  2. Use the video annotation interface to label the object
  3. Include a consistent identifier in your annotation (e.g., "person_001")
  4. Move to subsequent frames and continue labeling with the same identifier

Handling Occlusions

When an object becomes occluded:

  1. Note the last frame where the object was visible
  2. When the object reappears, use the same identifier to maintain track continuity
  3. Document occlusion periods in your annotation notes
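
Occlusion periods can also be recovered after the fact from the frames on which a track was labeled. The sketch below assumes you label an object on every frame (or every `step`-th frame) while it is visible, so any larger jump between labeled frames is treated as an occlusion. The function name and the input shape are hypothetical and do not reflect Potato's export format.

```python
def occlusion_gaps(labeled_frames, step=1):
    """Return (last_visible, reappeared) frame pairs for each gap in a track.

    labeled_frames: frame indices on which one track ID was labeled.
    step: expected spacing between consecutive labeled frames.
    """
    frames = sorted(set(labeled_frames))
    gaps = []
    for prev, curr in zip(frames, frames[1:]):
        # A jump larger than the labeling stride means the object was not labeled
        # in between, which this sketch interprets as an occlusion.
        if curr - prev > step:
            gaps.append((prev, curr))
    return gaps
```

For example, a track labeled on frames 10, 11, 12, 20, 21, and 30 yields two gaps: one between frames 12 and 20, and one between frames 21 and 30. These pairs are what you would record in your annotation notes.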

Proposed MOT Features

The following features would enhance Potato's MOT annotation capabilities and are being considered for future development:

  • Automatic ID assignment: Auto-increment IDs for new objects
  • Track interpolation: Linear or cubic interpolation between keyframes
  • Occlusion handling: Visibility levels (visible, partial, heavy, not_visible)
  • Trajectory visualization: Show object paths across frames
  • Track management panel: Merge, split, and manage track IDs
  • Per-frame attributes: Properties that change frame-to-frame

If you're interested in these features, please reach out to the Potato development team or contribute to the project.
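
To make the track interpolation idea concrete, here is a minimal sketch of linear interpolation between keyframes. It is not a Potato feature and would run outside the tool. The `(x, y, w, h)` box layout and the `interpolate_track` name are assumptions for illustration.

```python
def interpolate_track(keyframes):
    """Linearly interpolate boxes between consecutive keyframes.

    keyframes: dict mapping frame index -> (x, y, w, h) box.
    Returns a dict with a box for every frame from the first to the last keyframe.
    """
    frames = sorted(keyframes)
    track = {}
    for start, end in zip(frames, frames[1:]):
        box_a, box_b = keyframes[start], keyframes[end]
        span = end - start
        for frame in range(start, end):
            t = (frame - start) / span  # 0.0 at the start keyframe, approaching 1.0
            track[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    if frames:
        # The final keyframe is not covered by the loop above, so copy it as-is.
        track[frames[-1]] = tuple(float(v) for v in keyframes[frames[-1]])
    return track
```

With keyframes at frame 0 and frame 4, the function fills frames 1 through 3 with boxes that move and resize at a constant rate. Linear interpolation works well for steady motion; fast or curved motion needs more closely spaced keyframes.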

Tips for Manual MOT Annotation

  1. Work in short segments: 100-200 frames at a time
  2. Consistent naming: Use a clear ID scheme (e.g., "person_001", "vehicle_023")
  3. Document your process: Keep notes about occlusions and track decisions
  4. Review passes: Watch forward then backward to catch errors
  5. Use external tools: Consider pre-processing with detection models
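
The first two tips are easy to script. The helpers below are a small sketch with hypothetical names: one splits a video into fixed-length frame ranges, the other formats zero-padded track IDs in the style shown above.

```python
def frame_segments(total_frames, segment_length=150):
    """Split frames 0..total_frames-1 into inclusive (first, last) ranges."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [
        (start, min(start + segment_length, total_frames) - 1)
        for start in range(0, total_frames, segment_length)
    ]


def track_id(category, number, width=3):
    """Format a zero-padded track identifier such as 'person_001'."""
    return f"{category}_{number:0{width}d}"
```

A 400-frame video with the default segment length splits into three ranges: frames 0 to 149, 150 to 299, and 300 to 399. The last segment is simply shorter when the frame count is not a multiple of the segment length.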

Alternative Approaches

For projects requiring full MOT annotation capabilities:

  1. Hybrid workflow: Use Potato for initial labeling and specialized MOT tools for track management
  2. Pre-annotation: Run object detectors to generate initial bounding boxes, then refine in Potato
  3. Post-processing: Export Potato annotations and apply tracking algorithms externally
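
As an illustration of the post-processing option, the sketch below links per-frame boxes into tracks with a greedy IoU (intersection over union) match against the previous frame. It is a deliberately simple stand-in for a real tracking algorithm. The input shape, a list of per-frame lists of `(x1, y1, x2, y2)` boxes, is an assumption and not Potato's export format, so you would need to convert your exported annotations first.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames, iou_threshold=0.3):
    """Assign an integer track ID to every box, frame by frame.

    frames: list of per-frame lists of boxes.
    Returns a list of per-frame lists of track IDs, aligned with the input.
    """
    next_id = 1
    previous = []  # (track_id, box) pairs from the preceding frame
    assignments = []
    for boxes in frames:
        # Score every (previous track, current box) pair, best overlap first.
        candidates = sorted(
            (
                (iou(prev_box, box), prev_idx, box_idx)
                for prev_idx, (_, prev_box) in enumerate(previous)
                for box_idx, box in enumerate(boxes)
            ),
            reverse=True,
        )
        ids = [None] * len(boxes)
        used_prev = set()
        for score, prev_idx, box_idx in candidates:
            if score < iou_threshold:
                break  # remaining pairs overlap too little to be the same object
            if prev_idx in used_prev or ids[box_idx] is not None:
                continue
            ids[box_idx] = previous[prev_idx][0]
            used_prev.add(prev_idx)
        # Boxes left unmatched start new tracks.
        for box_idx in range(len(boxes)):
            if ids[box_idx] is None:
                ids[box_idx] = next_id
                next_id += 1
        previous = list(zip(ids, boxes))
        assignments.append(ids)
    return assignments
```

Because this sketch only compares against the immediately preceding frame, an object that disappears and reappears receives a new ID. Handling occlusions requires keeping lost tracks alive for some number of frames, which dedicated tracking algorithms do.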

Next Steps

For current video annotation documentation, see /docs/features/image-annotation.