intermediate, video
HowTo100M Instructional Video Annotation
Annotate instructional video clips with step descriptions and visual grounding. Link narrated instructions to visual actions for video-language understanding.
Configuration file: config.yaml
# HowTo100M Instructional Video Annotation Configuration
# Based on Miech et al., ICCV 2019
# Task: Annotate instructional steps and visual grounding
annotation_task_name: "HowTo100M Instructional Video Annotation"
task_dir: "."
data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  - name: "step_segments"
    description: |
      Mark the temporal boundaries of each INSTRUCTIONAL STEP.
      A step is one distinct action or instruction being demonstrated.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      - name: "step"
        color: "#22C55E"
        key_value: "s"
      - name: "intro_outro"
        color: "#94A3B8"
        key_value: "i"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true
    video_fps: 30
  - name: "step_description"
    description: "Describe what is being done in this step (imperative form):"
    annotation_type: text
  - name: "visual_alignment"
    description: "How well does the visual content match what's being said?"
    annotation_type: radio
    labels:
      - "Perfect - visual shows exactly what's narrated"
      - "Good - visual mostly matches narration"
      - "Partial - some mismatch between visual and audio"
      - "Poor - visual doesn't match narration"
      - "No narration in this segment"
  - name: "task_category"
    description: "What category of task is this video?"
    annotation_type: radio
    labels:
      - "Cooking/Food"
      - "Home Repair/DIY"
      - "Crafts/Art"
      - "Beauty/Personal Care"
      - "Fitness/Exercise"
      - "Technology/Software"
      - "Gardening/Outdoor"
      - "Other"
  - name: "step_clarity"
    description: "How clear is this instructional step?"
    annotation_type: radio
    labels:
      - "Very clear - easy to follow"
      - "Clear - understandable"
      - "Somewhat clear - some confusion"
      - "Unclear - hard to follow"
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
annotation_instructions: |
  ## HowTo100M Instructional Video Annotation
  Annotate instructional/tutorial video clips with step boundaries and descriptions.
  ### Task:
  1. Mark the temporal boundaries of each instructional step
  2. Write a brief description of what's being demonstrated
  3. Rate how well the visual matches the narration
  ### What is a step?
  - One distinct action or instruction
  - "Add the flour", "Stir until combined", "Press the button"
  - NOT background talk or transitions
  ### Step descriptions:
  - Use imperative form: "Mix the ingredients" not "The person mixes"
  - Be concise: 3-10 words typically
  - Focus on the ACTION being demonstrated
  ### Visual-Narration Alignment:
  - Perfect: Narrator says "crack the egg" and we see egg cracking
  - Partial: Narrator says "add salt" but we see general cooking
  - Poor: Narrator talks about something not shown
  ### Guidelines:
  - Some clips may have no clear instructional content
  - Mark intro/outro segments separately
  - Narration may be noisy (auto-generated ASR)
  ### Tips:
  - Watch with audio to understand the instruction
  - Steps may overlap with narration timing
  - Focus on what would help someone learn the task
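Because the configuration above asks for two annotations per instance (`annotation_per_instance: 2`) and the `step_segments` scheme collects time segments, one common way to compare two annotators is temporal intersection-over-union (IoU). The following is a minimal Python sketch under stated assumptions: segments are represented as hypothetical `(start_sec, end_sec)` tuples, and nothing here reflects the actual structure of Potato's output files, which would need to be parsed separately.

```python
from typing import Tuple

# Hypothetical representation of one marked step: (start_sec, end_sec).
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Return the intersection-over-union of two time intervals."""
    # Overlap is zero when the intervals are disjoint.
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Two annotators marking the same step with slightly different boundaries.
annotator_1 = (2.0, 6.0)
annotator_2 = (4.0, 8.0)
print(round(temporal_iou(annotator_1, annotator_2), 3))  # 0.333
```

A value near 1.0 suggests the two annotators agree closely on where the step begins and ends; what threshold counts as agreement is a project decision and is not specified by this design.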
Sample data: sample-data.json
[
  {
    "id": "howto_001",
    "video_url": "https://example.com/videos/howto_cooking.mp4",
    "category": "cooking",
    "narration": "First, we're going to add the flour to the bowl",
    "duration": 60
  },
  {
    "id": "howto_002",
    "video_url": "https://example.com/videos/howto_repair.mp4",
    "category": "home_repair",
    "narration": "Now take your screwdriver and remove these screws",
    "duration": 45
  }
]
Get this design
View on GitHub
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/instructional/howto100m-instructional
potato start config.yaml
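Before starting the server, it may be worth checking that every record in the data file carries the two fields named under `item_properties` in the configuration (`id` and `video_url`). The sketch below is one way to do that in Python; `find_problems` is a hypothetical helper written for this example and is not part of Potato. The records are inlined here so the example is self-contained; in practice they would be loaded from whichever file `data_files` points to.

```python
import json
from typing import Any, Dict, List


def find_problems(items: List[Dict[str, Any]],
                  id_key: str = "id",
                  url_key: str = "video_url") -> List[str]:
    """Return human-readable problems found in a list of data records."""
    problems: List[str] = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Both keys must be present and non-empty.
        for key in (id_key, url_key):
            if not item.get(key):
                problems.append(f"item {index}: missing or empty '{key}'")
        # Flag repeated ids, since they would make annotations ambiguous.
        item_id = item.get(id_key)
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# Records copied from the sample data shown on this page.
sample = json.loads("""
[
  {"id": "howto_001",
   "video_url": "https://example.com/videos/howto_cooking.mp4",
   "category": "cooking",
   "narration": "First, we're going to add the flour to the bowl",
   "duration": 60},
  {"id": "howto_002",
   "video_url": "https://example.com/videos/howto_repair.mp4",
   "category": "home_repair",
   "narration": "Now take your screwdriver and remove these screws",
   "duration": 45}
]
""")

print(find_problems(sample))  # []
```

An empty list means no problems were detected by these two checks; the sketch does not verify that the video URLs are reachable.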
Details
Annotation types
radio, text, video_annotation
Domain
Computer Vision, Video-Language, Instructional Video
Use cases
Video-Text Alignment, Step Recognition, Procedural Understanding
Tags
video, instructional, howto, narration, steps, grounding
Found an issue or want to improve this design?
Open an issue
Related designs
YouCook2 Recipe Step Annotation
Annotate cooking videos with recipe step boundaries and descriptions. Segment instructional cooking content into distinct procedural steps.
radio, text
VSTAR Video-grounded Dialogue
Video-grounded dialogue annotation. Annotators watch videos and answer questions requiring situated understanding, write dialogue turns grounded in specific video moments, and mark relevant temporal segments.
video_annotation, text
Charades-STA Temporal Grounding
Ground natural language descriptions to video segments. Given a sentence describing an action, identify the exact temporal boundaries where that action occurs.
radio, video_annotation