ActivityNet Temporal Localization
Temporal activity localization in untrimmed videos. Annotators mark the start and end timestamps of each activity instance. ActivityNet covers 200 activity classes; the configuration below defines 16 labels grouped into five categories.
Configuration File: config.yaml
# ActivityNet Temporal Localization Configuration
# Based on Heilbron et al., CVPR 2015
# Task: Localize activity instances with start/end times in untrimmed videos
annotation_task_name: "ActivityNet Temporal Localization"
task_dir: "."
# Data configuration
data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"
# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
# Annotation schemes
annotation_schemes:
  - name: "activity_segments"
    description: |
      Mark the temporal boundaries of each activity instance in the video.
      Draw segments from when the activity STARTS to when it ENDS.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      # Sports activities
      - name: "playing_basketball"
        color: "#F97316"
        key_value: "1"
      - name: "playing_soccer"
        color: "#22C55E"
        key_value: "2"
      - name: "swimming"
        color: "#3B82F6"
        key_value: "3"
      - name: "running"
        color: "#EF4444"
        key_value: "4"
      - name: "gymnastics"
        color: "#A855F7"
        key_value: "5"
      # Household activities
      - name: "cooking"
        color: "#EC4899"
        key_value: "6"
      - name: "cleaning"
        color: "#06B6D4"
        key_value: "7"
      - name: "gardening"
        color: "#84CC16"
        key_value: "8"
      # Personal care
      - name: "brushing_teeth"
        color: "#14B8A6"
        key_value: "9"
      - name: "doing_makeup"
        color: "#F472B6"
        key_value: "0"
      # Music/Performance
      - name: "playing_guitar"
        color: "#8B5CF6"
      - name: "playing_piano"
        color: "#6366F1"
      - name: "singing"
        color: "#D946EF"
      # Outdoor activities
      - name: "hiking"
        color: "#65A30D"
      - name: "fishing"
        color: "#0891B2"
      - name: "camping"
        color: "#059669"
    zoom_enabled: true
    playback_rate_control: true
    frame_stepping: true
    show_timecode: true
    timeline_height: 80
    video_fps: 30
# User configuration
allow_all_users: true
# Task assignment
instances_per_annotator: 40
annotation_per_instance: 2
# Instructions
annotation_instructions: |
  ## ActivityNet Temporal Localization Task
  Your goal is to identify and localize activity instances in untrimmed videos.

  ### What is Temporal Localization?
  - Finding the precise START and END times of activities
  - Videos may contain multiple activities or none at all
  - Activities may overlap or occur sequentially

  ### How to Annotate:
  1. Watch the video to understand its content
  2. For each activity you identify:
     - Select the activity type from the labels
     - Mark the START time (when activity begins)
     - Mark the END time (when activity ends)

  ### Defining Boundaries:
  - **Start**: First frame where the activity is clearly happening
  - **End**: Last frame where the activity is still happening
  - Include preparation if it's part of the activity
  - Exclude unrelated pauses or interruptions

  ### Activity Categories:
  **Sports:** basketball, soccer, swimming, running, gymnastics
  **Household:** cooking, cleaning, gardening
  **Personal Care:** brushing teeth, doing makeup
  **Music:** playing guitar, playing piano, singing
  **Outdoor:** hiking, fishing, camping

  ### Tips:
  - Use slow playback for precise boundaries
  - Zoom the timeline for long videos
  - One video may have multiple instances of the same activity
  - If unsure about boundaries, mark your best estimate
  - Skip segments that don't match any activity class
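
Because the configuration assigns each video to two annotators (annotation_per_instance: 2), it can be useful to check how closely their segment boundaries agree. The sketch below computes temporal intersection-over-union (IoU), a common overlap measure in temporal localization. It assumes each segment has already been reduced to a (start, end) pair in seconds; the exact structure of Potato's annotation output is not shown here, so the example values and the `temporal_iou` helper are hypothetical.

```python
from typing import Tuple

# Assumption: a segment is a (start_seconds, end_seconds) pair.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Return the intersection over union of two time intervals, in [0, 1]."""
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical segments from two annotators for the same video and label.
annotator_a = (12.0, 45.0)
annotator_b = (15.0, 50.0)
print(f"temporal IoU = {temporal_iou(annotator_a, annotator_b):.3f}")
```

An IoU near 1 means the two annotators marked nearly the same boundaries; a low value suggests the boundary guidance in the instructions may need tightening for that activity.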
Sample Data: sample-data.json
[
  {
    "id": "anet_001",
    "video_url": "https://example.com/videos/basketball_practice.mp4",
    "duration_seconds": 180,
    "source": "youtube",
    "expected_activity": "playing_basketball",
    "description": "Amateur basketball practice session in a gym"
  },
  {
    "id": "anet_002",
    "video_url": "https://example.com/videos/cooking_tutorial.mp4",
    "duration_seconds": 420,
    "source": "youtube",
    "expected_activity": "cooking",
    "description": "Home cooking tutorial - making pasta from scratch"
  }
]
// ... and 3 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/action-recognition/activitynet-temporal-localization
potato start config.yaml
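
Before starting the server, it may help to confirm that every data item carries the fields named under item_properties in the configuration ("id" and "video_url"). The following is a minimal sketch and not part of Potato itself: the `find_invalid_items` helper is hypothetical, and the inline sample mirrors the two items shown in the sample data.

```python
import json
from typing import Any, Dict, List


def find_invalid_items(
    items: List[Dict[str, Any]],
    id_key: str = "id",
    url_key: str = "video_url",
) -> List[str]:
    """Return a human-readable problem list; an empty list means no issues."""
    problems: List[str] = []
    seen_ids = set()
    for index, item in enumerate(items):
        item_id = item.get(id_key)
        if not item_id:
            problems.append(f"item {index}: missing '{id_key}'")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen_ids.add(item_id)
        if not item.get(url_key):
            problems.append(f"item {index}: missing '{url_key}'")
    return problems


# Inline copy of the two sample items, trimmed to the fields being checked.
sample = json.loads("""
[
  {"id": "anet_001", "video_url": "https://example.com/videos/basketball_practice.mp4"},
  {"id": "anet_002", "video_url": "https://example.com/videos/cooking_tutorial.mp4"}
]
""")

problems = find_invalid_items(sample)
print(problems if problems else "all items look valid")
```

To check a real file, the same function can be applied to the result of `json.load` on whichever data file the configuration lists under data_files.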
Related Designs
ActivityNet Captions Dense Annotation
Dense temporal annotation with natural language descriptions. Annotators segment videos into events and write descriptive captions for each temporal segment.
AVA Atomic Visual Actions
Spatio-temporal action annotation in movie clips. Annotators localize people with bounding boxes and label their atomic actions (pose, person-object, person-person interactions) in 1-second intervals.
Charades Indoor Activity Segmentation
Multi-label temporal activity segmentation in indoor home videos. Annotators identify action instances using compositional verb-object labels (e.g., 'opening door', 'sitting on chair') with precise temporal boundaries.