HowTo100M Instructional Video Annotation

Annotate instructional video clips with step descriptions and visual grounding. Link narrated instructions to visual actions for video-language understanding.

Configuration file: config.yaml

# HowTo100M Instructional Video Annotation Configuration
# Based on Miech et al., ICCV 2019
# Task: Annotate instructional steps and visual grounding

annotation_task_name: "HowTo100M Instructional Video Annotation"
task_dir: "."

data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "step_segments"
    description: |
      Mark the temporal boundaries of each INSTRUCTIONAL STEP.
      A step is one distinct action or instruction being demonstrated.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      - name: "step"
        color: "#22C55E"
        key_value: "s"
      - name: "intro_outro"
        color: "#94A3B8"
        key_value: "i"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true
    video_fps: 30

  - name: "step_description"
    description: "Describe what is being done in this step (imperative form):"
    annotation_type: text

  - name: "visual_alignment"
    description: "How well does the visual content match what's being said?"
    annotation_type: radio
    labels:
      - "Perfect - visual shows exactly what's narrated"
      - "Good - visual mostly matches narration"
      - "Partial - some mismatch between visual and audio"
      - "Poor - visual doesn't match narration"
      - "No narration in this segment"

  - name: "task_category"
    description: "What category of task is this video?"
    annotation_type: radio
    labels:
      - "Cooking/Food"
      - "Home Repair/DIY"
      - "Crafts/Art"
      - "Beauty/Personal Care"
      - "Fitness/Exercise"
      - "Technology/Software"
      - "Gardening/Outdoor"
      - "Other"

  - name: "step_clarity"
    description: "How clear is this instructional step?"
    annotation_type: radio
    labels:
      - "Very clear - easy to follow"
      - "Clear - understandable"
      - "Somewhat clear - some confusion"
      - "Unclear - hard to follow"

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2

annotation_instructions: |
  ## HowTo100M Instructional Video Annotation

  Annotate instructional/tutorial video clips with step boundaries and descriptions.

  ### Task:
  1. Mark the temporal boundaries of each instructional step
  2. Write a brief description of what's being demonstrated
  3. Rate how well the visual matches the narration

  ### What is a step?
  - One distinct action or instruction
  - "Add the flour", "Stir until combined", "Press the button"
  - NOT background talk or transitions

  ### Step descriptions:
  - Use imperative form: "Mix the ingredients" not "The person mixes"
  - Be concise: 3-10 words typically
  - Focus on the ACTION being demonstrated

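  ### Visual-Narration Alignment:
  - Perfect: Narrator says "crack the egg" and we see the egg being cracked
  - Good: Visual mostly matches the narration
  - Partial: Narrator says "add salt" but we only see general cooking
  - Poor: Narrator talks about something that is not shown
  - No narration: Nobody speaks during this segment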

  ### Guidelines:
  - Some clips may have no clear instructional content
  - Mark intro/outro segments separately
  - Narration may be noisy (auto-generated ASR)

  ### Tips:
  - Watch with audio to understand the instruction
  - Narration timing may not line up with the visual step; the narrator may describe an action before or after it is shown
  - Focus on what would help someone learn the task
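With `annotation_per_instance: 2`, each clip gets step boundaries from two annotators, and it can be useful to check how well they agree. The sketch below is a minimal illustration, assuming each annotator's `step` segments (not `intro_outro`) have already been pulled out of the JSON output as `(start_sec, end_sec)` pairs; that shape, the helper names, and the 0.5 IoU threshold are hypothetical choices, not Potato's output format. It converts seconds to frame indices at the configured `video_fps: 30` and greedily matches steps by temporal intersection-over-union (IoU).

```python
from typing import List, Optional, Tuple

VIDEO_FPS = 30  # matches video_fps in the step_segments scheme

# Hypothetical export shape: (start_sec, end_sec) for one "step" segment.
Segment = Tuple[float, float]


def to_frames(seg: Segment, fps: int = VIDEO_FPS) -> Tuple[int, int]:
    """Convert a (start, end) segment in seconds to rounded frame indices."""
    start, end = seg
    return round(start * fps), round(end * fps)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals (0.0 when disjoint)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_steps(ann_a: List[Segment], ann_b: List[Segment],
                threshold: float = 0.5) -> int:
    """Greedily count steps from annotator A that match an unused step
    from annotator B with IoU >= threshold (one-to-one)."""
    used = set()
    matched = 0
    for seg_a in ann_a:
        best_j: Optional[int] = None
        best_iou = threshold
        for j, seg_b in enumerate(ann_b):
            if j in used:
                continue
            iou = temporal_iou(seg_a, seg_b)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            used.add(best_j)
            matched += 1
    return matched
```

For example, steps `(0.0, 10.0)` and `(1.0, 10.0)` have an IoU of 0.9, and `to_frames((10.0, 25.0))` gives frames 300 to 750 at 30 fps. Greedy matching keeps the sketch short but is not guaranteed optimal; an assignment solver (e.g. the Hungarian algorithm) would give the best one-to-one matching if that matters for your analysis.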

Sample data: sample-data.json

[
  {
    "id": "howto_001",
    "video_url": "https://example.com/videos/howto_cooking.mp4",
    "category": "cooking",
    "narration": "First, we're going to add the flour to the bowl",
    "duration": 60
  },
  {
    "id": "howto_002",
    "video_url": "https://example.com/videos/howto_repair.mp4",
    "category": "home_repair",
    "narration": "Now take your screwdriver and remove these screws",
    "duration": 45
  }
]
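The config loads items from `data.json` (`data_files`) and reads each item's `id` and `video_url` through `item_properties`, so records like the ones above need to be saved under that name with those keys present. A minimal stdlib sketch for checking and writing such a file follows; `validate_items` and `write_data_file` are hypothetical helper names, and the positive-`duration` check is an extra assumption based on the sample fields rather than something the config requires.

```python
import json
from typing import Any, Dict, List

ID_KEY = "id"           # item_properties.id_key in config.yaml
TEXT_KEY = "video_url"  # item_properties.text_key in config.yaml


def validate_items(items: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the items look usable."""
    problems = []
    seen = set()
    for i, item in enumerate(items):
        # Both configured keys must be present and non-empty.
        for key in (ID_KEY, TEXT_KEY):
            if not item.get(key):
                problems.append(f"item {i}: missing or empty '{key}'")
        # Item ids should be unique across the file.
        item_id = item.get(ID_KEY)
        if item_id in seen:
            problems.append(f"item {i}: duplicate id '{item_id}'")
        elif item_id:
            seen.add(item_id)
        # Optional extra check (assumption): duration, if given, is positive.
        duration = item.get("duration")
        if duration is not None and (
            not isinstance(duration, (int, float)) or duration <= 0
        ):
            problems.append(f"item {i}: invalid duration {duration!r}")
    return problems


def write_data_file(items: List[Dict[str, Any]], path: str = "data.json") -> None:
    """Validate items and write them as a JSON list to `path`."""
    problems = validate_items(items)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)
```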

Get this design

Clone or download from the repository.

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/instructional/howto100m-instructional
potato start config.yaml
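Before starting the server, it can help to estimate how many annotators a dataset needs. Assuming `instances_per_annotator: 50` acts as a per-annotator cap and that each of the `annotation_per_instance: 2` labels on an item comes from a different person, a rough lower bound can be computed as below; `annotators_needed` is a hypothetical helper, not part of Potato.

```python
import math


def annotators_needed(num_items: int,
                      annotations_per_item: int = 2,
                      items_per_annotator: int = 50) -> int:
    """Rough lower bound on annotators so every item gets the requested
    number of labels, each from a distinct annotator."""
    if num_items <= 0:
        return 0
    total_labels = num_items * annotations_per_item
    by_capacity = math.ceil(total_labels / items_per_annotator)
    # Each item needs labels from `annotations_per_item` different people.
    return max(by_capacity, annotations_per_item)
```

For 1,000 clips this gives 2,000 labels and therefore 40 annotators. Potato's actual assignment logic may differ, so treat the result as a planning estimate only.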

Details

Annotation types

radio, text, video_annotation

Domain

Computer Vision, Video-Language, Instructional Video

Use cases

Video-Text Alignment, Step Recognition, Procedural Understanding

Tags

video, instructional, howto, narration, steps, grounding


Related designs

YouCook2 Recipe Step Annotation

Annotate cooking videos with recipe step boundaries and descriptions. Segment instructional cooking content into distinct procedural steps.

radio, text

VSTAR Video-grounded Dialogue

Video-grounded dialogue annotation. Annotators watch videos and answer questions requiring situated understanding, write dialogue turns grounded in specific video moments, and mark relevant temporal segments.

video_annotation, text

Charades-STA Temporal Grounding

Ground natural language descriptions to video segments. Given a sentence describing an action, identify the exact temporal boundaries where that action occurs.

radio, video_annotation