Multi-Object Tracking Annotation
An overview of multi-object tracking annotation concepts and how Potato's video annotation capabilities can support basic tracking workflows.
Multi-object tracking (MOT) annotation produces training data for things like surveillance, self-driving cars, and sports analytics. This post walks through the core ideas behind MOT annotation and how far Potato's current video features will take you on a basic tracking workflow.
A quick caveat before you read on: Potato does not have dedicated MOT tooling yet. If you need automatic ID management and interpolation today, you will probably want a specialized tool. But for smaller jobs, the manual approach below works fine.
What makes MOT annotation hard
- Keeping object IDs consistent from one frame to the next
- Dealing with objects that get occluded and then come back
- Following objects through crowded scenes
- Sorting out ID switches and merges
What Potato's video annotation does today
Potato handles basic video annotation through the video_annotation type. The MOT-specific features (automatic ID management, interpolation, occlusion handling) are not implemented yet, but you can still set up a basic video labeling workflow.
Basic video annotation setup
annotation_task_name: "Video Object Labeling"
data_files:
  - data/videos.json
annotation_schemes:
  - annotation_type: video_annotation
    name: objects
    description: "Label objects in video frames"
    video_path: video
    labels:
      - name: person
      - name: vehicle
      - name: cyclist
Sample data format
Your data/videos.json file holds entries with video paths:
[
  {
    "id": "video_001",
    "video": "/path/to/video.mp4"
  },
  {
    "id": "video_002",
    "video": "/path/to/another_video.mp4"
  }
]
Tracking by hand
Without dedicated MOT features, you can still track objects manually. It is more tedious, but it works.
Building tracks one frame at a time
- Go to the frame where an object first shows up
- Label it in the video annotation interface
- Give it a consistent identifier in the annotation, like "person_001"
- Step through the following frames and keep labeling it with that same identifier
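To make the bookkeeping concrete, here is a minimal Python sketch of what a hand-built track amounts to once the labels are out of the tool. The record layout (`frame`, `track_id`, `label`) is an assumption made for illustration, not Potato's export format, so adapt the field names to whatever your export actually contains.

```python
from collections import defaultdict

# Hypothetical per-frame records. The field names are assumptions,
# not Potato's actual export schema.
records = [
    {"frame": 10, "track_id": "person_001", "label": "person"},
    {"frame": 11, "track_id": "person_001", "label": "person"},
    {"frame": 11, "track_id": "vehicle_001", "label": "vehicle"},
    {"frame": 12, "track_id": "person_001", "label": "person"},
]


def build_tracks(records):
    """Group per-frame records into tracks keyed by identifier."""
    tracks = defaultdict(list)
    for rec in records:
        tracks[rec["track_id"]].append(rec["frame"])
    # Sort the frames so each track reads in temporal order.
    return {track_id: sorted(frames) for track_id, frames in tracks.items()}


tracks = build_tracks(records)
```

Grouping by identifier like this only works if the identifier was typed the same way in every frame, which is why the consistency step matters.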
Dealing with occlusions
When an object disappears behind something:
- Note the last frame where you could see it
- When it comes back, reuse the same identifier so the track stays continuous
- Jot down the occlusion period in your annotation notes
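If you would rather derive the occlusion periods than write them down by hand, a rough sketch follows. It assumes you already have the sorted list of frames in which a track was labeled (a hypothetical input, not something Potato produces in this form) and treats every gap as a candidate occlusion. A gap can also be a frame you simply forgot to label, so treat the output as a list of things to check.

```python
def occlusion_gaps(frames):
    """Return (last_seen, reappeared) frame pairs where a track has missing frames."""
    ordered = sorted(set(frames))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Consecutive frames differ by exactly 1; anything larger is a gap.
        if cur - prev > 1:
            gaps.append((prev, cur))
    return gaps


visible_frames = [10, 11, 12, 20, 21, 30]
gaps = occlusion_gaps(visible_frames)
# gaps == [(12, 20), (21, 30)]
```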
Features we are thinking about
These would make Potato much better at MOT, and they are on the list for future work:
- Automatic ID assignment, so each new object gets the next free ID
- Track interpolation, linear or cubic, between keyframes
- Occlusion handling with visibility levels (visible, partial, heavy, not_visible)
- Trajectory visualization to show object paths across frames
- A track management panel for merging, splitting, and managing track IDs
- Per-frame attributes for properties that change frame to frame
If any of these matter to you, get in touch with the Potato team or contribute the feature yourself.
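In the meantime, the linear case of track interpolation is simple enough to approximate outside the tool. The sketch below is not a Potato feature. It assumes boxes are `(x, y, w, h)` tuples and that you have labeled two keyframes by hand; the function name and box layout are made up for this example.

```python
def interpolate_boxes(start_frame, start_box, end_frame, end_box):
    """Linearly interpolate (x, y, w, h) boxes for frames strictly between two keyframes."""
    if end_frame <= start_frame:
        raise ValueError("end_frame must come after start_frame")
    span = end_frame - start_frame
    filled = {}
    for frame in range(start_frame + 1, end_frame):
        # t runs from 0 at the first keyframe to 1 at the second.
        t = (frame - start_frame) / span
        filled[frame] = tuple(a + t * (b - a) for a, b in zip(start_box, end_box))
    return filled


filled = interpolate_boxes(0, (0.0, 0.0, 10.0, 10.0), 4, (40.0, 20.0, 10.0, 10.0))
# filled[2] == (20.0, 10.0, 10.0, 10.0)
```

Linear interpolation assumes roughly constant motion between keyframes, so place keyframes wherever an object changes speed or direction.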
Tips for manual MOT annotation
- Work in short segments of 100 to 200 frames at a time.
- Use a clear ID scheme like "person_001" or "vehicle_023" and stick to it.
- Keep notes about occlusions and the track decisions you made.
- Do a review pass: watch the segment forward, then backward, to catch errors.
- Lean on external tools. Pre-processing with detection models saves a lot of clicking.
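Two of these tips are easy to back with a few lines of Python. The sketch below shows one way to generate zero-padded IDs, check that an ID follows the scheme, and split a video into fixed-length segments. The helper names and the default segment length of 150 frames are arbitrary choices for this example, and nothing here is part of Potato.

```python
import re

# Lowercase label, underscore, at least three digits (e.g. "person_001").
ID_PATTERN = re.compile(r"^[a-z]+_\d{3,}$")


def make_id(label, index):
    """Build a zero-padded identifier such as 'vehicle_023'."""
    return f"{label}_{index:03d}"


def is_valid_id(track_id):
    """Check that an identifier follows the label_NNN scheme."""
    return ID_PATTERN.match(track_id) is not None


def segment_ranges(total_frames, segment_length=150):
    """Split frames 0..total_frames-1 into inclusive (start, end) segments."""
    return [
        (start, min(start + segment_length, total_frames) - 1)
        for start in range(0, total_frames, segment_length)
    ]


segments = segment_ranges(400)
# segments == [(0, 149), (150, 299), (300, 399)]
```

Running `is_valid_id` over the identifiers in a finished segment is a cheap way to catch typos like "person_1" before they split a track in two.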
Other ways to go about it
If you need full MOT capabilities now, here are a few routes:
- Run a hybrid workflow: do the initial labeling in Potato, then hand off to a specialized MOT tool for track management.
- Pre-annotate with object detectors to generate starting bounding boxes, then refine them in Potato.
- Export your Potato annotations and run tracking algorithms on them after the fact.
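As an illustration of the last route, here is a minimal sketch of greedy IoU matching, one of the simplest ways to link per-frame boxes into tracks after the fact. It assumes you have already converted your exported annotations into one list of `(x1, y1, x2, y2)` boxes per frame; that input shape, the function names, and the 0.3 threshold are assumptions for this example. It only compares consecutive frames, so it will not carry an ID across an occlusion.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_track(frames, threshold=0.3):
    """Assign integer track IDs by greedy IoU matching between consecutive frames.

    frames: list of lists of boxes, one inner list per frame.
    Returns a list of lists of track IDs with the same shape.
    """
    next_id = 0
    previous = []  # (track_id, box) pairs from the previous frame
    all_ids = []
    for boxes in frames:
        # Score every (previous box, current box) pair, best overlap first.
        pairs = sorted(
            ((iou(prev_box, box), pi, ci)
             for pi, (_, prev_box) in enumerate(previous)
             for ci, box in enumerate(boxes)),
            reverse=True,
        )
        ids = [None] * len(boxes)
        used_prev = set()
        for score, pi, ci in pairs:
            if score < threshold:
                break  # remaining pairs overlap even less
            if pi in used_prev or ids[ci] is not None:
                continue
            ids[ci] = previous[pi][0]
            used_prev.add(pi)
        # Boxes with no good match start new tracks.
        for ci in range(len(boxes)):
            if ids[ci] is None:
                ids[ci] = next_id
                next_id += 1
        previous = list(zip(ids, boxes))
        all_ids.append(ids)
    return all_ids


frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (51, 50, 61, 60)],
    [(52, 50, 62, 60)],
]
track_ids = greedy_track(frames)
# track_ids == [[0, 1], [0, 1], [1]]
```

For crowded scenes a greedy matcher like this produces ID switches quickly, which is where a dedicated tracking library or MOT tool earns its keep.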
Where to go next
- Learn about video frame annotation
- Explore image annotation features
- Read about inter-annotator agreement for quality control
For the full picture on how video annotation works in Potato, see the source documentation.
For current video annotation documentation, see /docs/features/image-annotation.