intermediate, video

ActivityNet Temporal Localization

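Temporal activity localization in untrimmed videos, based on ActivityNet (Heilbron et al., CVPR 2015). Annotators mark the start and end timestamps of each activity instance. The full task spans 200 activity classes; the configuration below defines labels for 16 of them.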

Configuration File: config.yaml

# ActivityNet Temporal Localization Configuration
# Based on Heilbron et al., CVPR 2015
# Task: Localize activity instances with start/end times in untrimmed videos

annotation_task_name: "ActivityNet Temporal Localization"
task_dir: "."

# Data configuration
data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Annotation schemes
annotation_schemes:
  - name: "activity_segments"
    description: |
      Mark the temporal boundaries of each activity instance in the video.
      Draw segments from when the activity STARTS to when it ENDS.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      # Sports activities
      - name: "playing_basketball"
        color: "#F97316"
        key_value: "1"
      - name: "playing_soccer"
        color: "#22C55E"
        key_value: "2"
      - name: "swimming"
        color: "#3B82F6"
        key_value: "3"
      - name: "running"
        color: "#EF4444"
        key_value: "4"
      - name: "gymnastics"
        color: "#A855F7"
        key_value: "5"

      # Household activities
      - name: "cooking"
        color: "#EC4899"
        key_value: "6"
      - name: "cleaning"
        color: "#06B6D4"
        key_value: "7"
      - name: "gardening"
        color: "#84CC16"
        key_value: "8"

      # Personal care
      - name: "brushing_teeth"
        color: "#14B8A6"
        key_value: "9"
      - name: "doing_makeup"
        color: "#F472B6"
        key_value: "0"

      # Music/Performance
      - name: "playing_guitar"
        color: "#8B5CF6"
      - name: "playing_piano"
        color: "#6366F1"
      - name: "singing"
        color: "#D946EF"

      # Outdoor activities
      - name: "hiking"
        color: "#65A30D"
      - name: "fishing"
        color: "#0891B2"
      - name: "camping"
        color: "#059669"

    zoom_enabled: true
    playback_rate_control: true
    frame_stepping: true
    show_timecode: true
    timeline_height: 80
    video_fps: 30

# User configuration
allow_all_users: true

# Task assignment
instances_per_annotator: 40
annotation_per_instance: 2

# Instructions
annotation_instructions: |
  ## ActivityNet Temporal Localization Task

  Your goal is to identify and localize activity instances in untrimmed videos.

  ### What is Temporal Localization?
  - Finding the precise START and END times of activities
  - Videos may contain multiple activities or none at all
  - Activities may overlap or occur sequentially

  ### How to Annotate:
  1. Watch the video to understand its content
  2. For each activity you identify:
     - Select the activity type from the labels
     - Mark the START time (when activity begins)
     - Mark the END time (when activity ends)

  ### Defining Boundaries:
  - **Start**: First frame where the activity is clearly happening
  - **End**: Last frame where the activity is still happening
  - Include preparation if it's part of the activity
  - Exclude unrelated pauses or interruptions

  ### Activity Categories:

  - **Sports:** basketball, soccer, swimming, running, gymnastics
  - **Household:** cooking, cleaning, gardening
  - **Personal Care:** brushing teeth, doing makeup
  - **Music:** playing guitar, playing piano, singing
  - **Outdoor:** hiking, fishing, camping

  ### Tips:
  - Use slow playback for precise boundaries
  - Zoom the timeline for long videos
  - One video may have multiple instances of the same activity
  - If unsure about boundaries, mark your best estimate
  - Skip segments that don't match any activity class
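
Because `annotation_per_instance: 2` assigns each video to two annotators, it can be useful to check how closely their segment boundaries agree. A common measure in temporal localization is temporal intersection-over-union (tIoU). The sketch below assumes each segment has already been reduced to a `(start_seconds, end_seconds)` pair; the function name `temporal_iou` and that tuple layout are illustrative assumptions, not Potato's output format.

```python
def temporal_iou(seg_a, seg_b):
    """Return the temporal IoU of two (start, end) segments given in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boundaries from two annotators for the same activity instance.
annotator_1 = (72.0, 88.0)
annotator_2 = (74.0, 90.0)
print(round(temporal_iou(annotator_1, annotator_2), 3))
```

A tIoU of 1.0 means the two annotators marked identical boundaries, and 0.0 means their segments do not overlap at all. Which threshold counts as acceptable agreement is a project decision.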

Sample Data: sample-data.json

[
  {
    "id": "anet_001",
    "video_url": "https://example.com/videos/basketball_practice.mp4",
    "duration_seconds": 180,
    "source": "youtube",
    "expected_activity": "playing_basketball",
    "description": "Amateur basketball practice session in a gym"
  },
  {
    "id": "anet_002",
    "video_url": "https://example.com/videos/cooking_tutorial.mp4",
    "duration_seconds": 420,
    "source": "youtube",
    "expected_activity": "cooking",
    "description": "Home cooking tutorial - making pasta from scratch"
  }
]

// ... and 3 more items
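
The `item_properties` block in the configuration names `id` as the ID key and `video_url` as the key that holds the video location, so every item in the data file needs both. The sketch below is a stand-alone pre-flight check and not part of Potato; `find_invalid_items` is a hypothetical helper, and the inline records are trimmed copies of the sample items above.

```python
import json

ID_KEY = "id"            # item_properties.id_key in config.yaml
TEXT_KEY = "video_url"   # item_properties.text_key in config.yaml


def find_invalid_items(items):
    """Return (index, problem) pairs for items missing a key or reusing an id."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in (ID_KEY, TEXT_KEY):
            if not item.get(key):
                problems.append((index, f"missing {key}"))
        item_id = item.get(ID_KEY)
        if item_id in seen_ids:
            problems.append((index, f"duplicate id {item_id}"))
        elif item_id:
            seen_ids.add(item_id)
    return problems


sample = json.loads("""
[
  {"id": "anet_001", "video_url": "https://example.com/videos/basketball_practice.mp4"},
  {"id": "anet_002", "video_url": "https://example.com/videos/cooking_tutorial.mp4"}
]
""")
print(find_invalid_items(sample))  # an empty list means every item passed
```

To check a real file, the inline string could be replaced by loading the file listed under `data_files` with `json.load`.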

Get This Design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/action-recognition/activitynet-temporal-localization
potato start config.yaml

Details

Annotation Types

video_annotation

Domain

Computer Vision, Video Understanding

Use Cases

Activity Recognition, Temporal Localization, Action Detection

Tags

video, activity, temporal, localization, activitynet, untrimmed