Skip to content
Docs/Annotation Types

Video Annotation

Annotate temporal segments, classify frames, mark keyframes, and track objects in videos.

Video Annotation

Potato provides comprehensive video annotation capabilities including temporal segmentation, frame classification, keyframe marking, and object tracking.

Annotation Modes

ModeDescriptionUse Case
segmentMark temporal rangesScene detection, speaker turns
frameClassify individual framesFrame-level labeling
keyframeMark important momentsHighlight detection
trackingTrack objects across framesObject tracking
combinedAll modes in one interfaceComplex annotation tasks

Basic Configuration

yaml
annotation_schemes:
  - name: "video_segments"
    description: "Mark segments where the speaker changes"
    annotation_type: "video_annotation"
    labels:
      - name: "Speaker A"
        color: "#FF6B6B"
      - name: "Speaker B"
        color: "#4ECDC4"

Configuration Options

FieldTypeDefaultDescription
namestringRequiredUnique identifier for the annotation
descriptionstringRequiredInstructions shown to annotators
annotation_typestringRequiredMust be "video_annotation"
labelslistRequiredAvailable annotation labels
modestring"segment"Annotation mode
min_segmentsinteger0Minimum required annotations
max_segmentsintegernullMaximum allowed annotations
timeline_heightinteger70Timeline height in pixels
overview_heightinteger40Overview bar height in pixels
zoom_enabledbooleantrueEnable timeline zoom
playback_rate_controlbooleantrueShow playback speed selector
frame_steppingbooleantrueEnable frame-by-frame navigation
show_timecodebooleantrueDisplay frame number and time
video_fpsinteger30Video frame rate for calculations

Label Configuration

Labels can be simple strings or detailed objects with colors and keyboard shortcuts:

yaml
labels:
  - name: "intro"
    color: "#4ECDC4"
    key_value: "1"
  - name: "content"
    color: "#3B82F6"
    key_value: "2"
  - name: "outro"
    color: "#8B5CF6"
    key_value: "3"

Examples by Mode

Segment Mode (Default)

Mark temporal ranges with start and end times:

yaml
annotation_schemes:
  - name: "scene_detection"
    description: "Mark each scene in the video"
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      - name: "indoor"
        color: "#3B82F6"
        key_value: "1"
      - name: "outdoor"
        color: "#22C55E"
        key_value: "2"
      - name: "transition"
        color: "#F59E0B"
        key_value: "3"
    zoom_enabled: true
    playback_rate_control: true

Frame Mode

Classify individual frames:

yaml
annotation_schemes:
  - name: "frame_quality"
    description: "Classify each frame's quality"
    annotation_type: "video_annotation"
    mode: "frame"
    labels:
      - name: "good"
        color: "#22C55E"
      - name: "blurry"
        color: "#F59E0B"
      - name: "occluded"
        color: "#EF4444"
    frame_stepping: true
    show_timecode: true

Keyframe Mode

Mark important moments:

yaml
annotation_schemes:
  - name: "highlights"
    description: "Mark key moments in the video"
    annotation_type: "video_annotation"
    mode: "keyframe"
    labels:
      - name: "goal"
        color: "#22C55E"
      - name: "foul"
        color: "#EF4444"
      - name: "highlight"
        color: "#F59E0B"
    frame_stepping: true

Tracking Mode

Track objects across frames with bounding boxes:

yaml
annotation_schemes:
  - name: "object_tracking"
    description: "Track the ball throughout the video"
    annotation_type: "video_annotation"
    mode: "tracking"
    labels:
      - name: "ball"
        color: "#3B82F6"
      - name: "player"
        color: "#22C55E"
    frame_stepping: true
    video_fps: 30

Combined Mode

Use multiple annotation types in one interface:

yaml
annotation_schemes:
  - name: "comprehensive"
    description: "Full video annotation"
    annotation_type: "video_annotation"
    mode: "combined"
    labels:
      - name: "action"
        color: "#3B82F6"
      - name: "dialogue"
        color: "#22C55E"
      - name: "transition"
        color: "#F59E0B"
    zoom_enabled: true
    playback_rate_control: true
    frame_stepping: true
    show_timecode: true

Video Display

For simple video playback without annotation (e.g., to accompany other annotation types), use the video type:

yaml
annotation_schemes:
  - name: "video_player"
    description: "Watch the video clip"
    annotation_type: "video"
    video_path: "{{video_url}}"
    controls: true
    autoplay: false
    loop: false
    muted: false

Keyboard Shortcuts

KeyAction
SpacePlay/pause
, / .Previous/next frame
[Mark segment start
]Mark segment end
EnterCreate segment
KMark keyframe
CClassify current frame
DeleteRemove selected annotation
1-9Select label
+ / -Zoom in/out

Data Format

Input Data

Your data file should include video paths or URLs:

json
[
  {
    "id": "video_1",
    "video_url": "https://example.com/videos/sample1.mp4",
    "description": "Meeting recording - identify speakers"
  },
  {
    "id": "video_2",
    "video_url": "/data/videos/sample2.mp4",
    "description": "Activity video - label actions"
  }
]

Output Format

Output varies by mode:

Segment Mode:

json
{
  "id": "video_1",
  "annotations": {
    "scene_detection": [
      {
        "start": 0.0,
        "end": 5.2,
        "start_frame": 0,
        "end_frame": 156,
        "label": "indoor"
      }
    ]
  }
}

Frame Mode:

json
{
  "id": "video_1",
  "annotations": {
    "frame_quality": [
      {
        "frame": 120,
        "time": 4.0,
        "label": "good"
      }
    ]
  }
}

Keyframe Mode:

json
{
  "id": "video_1",
  "annotations": {
    "highlights": [
      {
        "frame": 450,
        "time": 15.0,
        "label": "goal",
        "note": "First goal of the match"
      }
    ]
  }
}

Tracking Mode:

json
{
  "id": "video_1",
  "annotations": {
    "object_tracking": [
      {
        "id": "track_1",
        "label": "ball",
        "frames": [
          {"frame": 0, "bbox": {"x": 100, "y": 200, "w": 30, "h": 30}},
          {"frame": 1, "bbox": {"x": 105, "y": 198, "w": 30, "h": 30}}
        ]
      }
    ]
  }
}

Supported Video Formats

  • MP4 (recommended)
  • WebM
  • MOV
  • OGG

Video Object Tracking with Keyframe Interpolation

New in v2.2.0

The tracking mode now supports keyframe interpolation, allowing annotators to mark object positions at key frames and have intermediate frames automatically interpolated. This significantly speeds up object tracking tasks.

yaml
annotation_schemes:
  - name: "tracking"
    description: "Track objects with keyframe interpolation"
    annotation_type: "video_annotation"
    mode: "tracking"
    labels:
      - name: "person"
        color: "#3B82F6"
      - name: "vehicle"
        color: "#22C55E"
    frame_stepping: true
    video_fps: 30

Workflow

  1. Navigate to a keyframe and draw a bounding box around the object
  2. Skip ahead several frames and reposition the bounding box
  3. The system automatically interpolates positions for intermediate frames
  4. Review and adjust interpolated positions as needed

The full set of valid annotation modes is: segment, frame, keyframe, tracking, and combined.

Technical Notes

  • Uses Peaks.js for timeline visualization
  • Standard HTML5 video elements (no additional server dependencies)
  • Timeline supports drag-to-select for segment creation

Best Practices

  1. Use compressed videos - Large files slow down loading
  2. Set appropriate FPS - Match video_fps to your actual video frame rate
  3. Enable frame stepping - Essential for precise frame-level annotation
  4. Use playback rate control - Slow motion helps with detailed work
  5. Provide clear segment definitions - Define what constitutes segment boundaries
  6. Use distinct colors - Make labels visually distinguishable
  7. Consider timeline height - Increase for complex segmentation tasks