SumMe Video Summarization

Create video summaries by selecting key segments that best represent the content. Annotators identify important moments for automatic video summarization research.

Configuration file: config.yaml

# SumMe Video Summarization Configuration
# Based on Gygli et al., ECCV 2014
# Task: Select important segments for video summarization

annotation_task_name: "SumMe Video Summarization"
task_dir: "."

data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "summary_segments"
    description: |
      Mark segments that should be INCLUDED in a summary of this video.
      Select the most important/interesting parts that capture the essence.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      - name: "include_in_summary"
        color: "#22C55E"
        key_value: "s"
      - name: "highly_important"
        color: "#EF4444"
        key_value: "h"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true
    video_fps: 30

  - name: "importance_score"
    description: "Overall, how interesting/important is this video's content?"
    annotation_type: radio
    labels:
      - "5 - Very interesting/important"
      - "4 - Interesting"
      - "3 - Moderately interesting"
      - "2 - Somewhat boring"
      - "1 - Not interesting"

  - name: "video_category"
    description: "What category best describes this video?"
    annotation_type: radio
    labels:
      - "Sports/Action"
      - "Travel/Scenery"
      - "Events/Celebrations"
      - "Animals/Nature"
      - "Tutorial/How-to"
      - "Social/People"
      - "Other"

  - name: "summary_difficulty"
    description: "How difficult was it to select summary segments?"
    annotation_type: radio
    labels:
      - "Easy - clear highlights"
      - "Moderate - some judgment needed"
      - "Difficult - many equally important parts"
      - "Very difficult - no clear structure"

allow_all_users: true
instances_per_annotator: 25
annotation_per_instance: 3

annotation_instructions: |
  ## Video Summarization Task

  Select segments that should be included in a summary of each video.

  ### Goal:
  If the video were shortened to ~15% of its length, what parts should remain?

  ### What makes a good summary segment?
  - Captures key moments or highlights
  - Shows the main subject/action clearly
  - Is visually interesting or informative
  - Would make sense to someone who hasn't seen the full video

  ### Guidelines:
  - Mark all segments you think should be in the summary
  - Use "highly_important" for the absolute best moments
  - Try to select 10-20% of the video
  - Avoid: repetitive content, blurry/unclear footage, transitions

  ### Tips:
  - Watch the whole video first before marking
  - Consider what would be lost if a segment were removed
  - For action videos, focus on peak moments
  - For scenic videos, focus on the best views
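
The 10-20% selection guideline in the instructions above can be checked mechanically once segments are exported. The following is a minimal sketch, not part of the showcase: it assumes each marked segment is available as a hypothetical `(start_seconds, end_seconds)` pair, which may differ from the structure Potato actually writes to `annotation_output/`. The helper names `merge_segments`, `coverage`, and `within_guideline` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds).
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or touching segments so no time is counted twice."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage(segments: List[Segment], duration: float) -> float:
    """Return the fraction of the video covered by the marked segments."""
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / duration


def within_guideline(segments: List[Segment], duration: float,
                     low: float = 0.10, high: float = 0.20) -> bool:
    """Check the 'try to select 10-20% of the video' guideline."""
    return low <= coverage(segments, duration) <= high


# Example: a 180-second video with three marked segments, two of them overlapping.
marked = [(10.0, 20.0), (15.0, 25.0), (100.0, 112.0)]
print(f"{coverage(marked, 180.0):.1%} of the video selected")
print("within 10-20% guideline:", within_guideline(marked, 180.0))
```

Merging before summing matters when "include_in_summary" and "highly_important" segments overlap, since the same seconds would otherwise be counted twice.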

Sample data: sample-data.json

[
  {
    "id": "summe_001",
    "video_url": "https://example.com/videos/user_video_cooking.mp4",
    "category": "cooking",
    "duration": 180
  },
  {
    "id": "summe_002",
    "video_url": "https://example.com/videos/user_video_travel.mp4",
    "category": "travel",
    "duration": 240
  }
]
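
Before starting a task it can help to confirm that every item carries the keys named under `item_properties` in the config (`id` and `video_url`). The sketch below is one way to do that, with the two sample items inlined so it runs on its own. It also derives a per-video summary budget from the ~15% goal, under the assumption that `duration` is given in seconds (the sample does not state the unit). The helper names `check_items` and `summary_budget` are hypothetical.

```python
import json

# The two sample items, inlined so the sketch is self-contained.
SAMPLE = """
[
  {"id": "summe_001",
   "video_url": "https://example.com/videos/user_video_cooking.mp4",
   "category": "cooking", "duration": 180},
  {"id": "summe_002",
   "video_url": "https://example.com/videos/user_video_travel.mp4",
   "category": "travel", "duration": 240}
]
"""

ID_KEY = "id"            # item_properties.id_key in config.yaml
TEXT_KEY = "video_url"   # item_properties.text_key in config.yaml
TARGET_FRACTION = 0.15   # the ~15% goal from the annotation instructions


def check_items(items):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in (ID_KEY, TEXT_KEY):
            if key not in item:
                problems.append(f"item {index}: missing '{key}'")
        item_id = item.get(ID_KEY)
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


def summary_budget(item):
    """Target summary length, assuming 'duration' is in seconds."""
    return item["duration"] * TARGET_FRACTION


items = json.loads(SAMPLE)
print("problems:", check_items(items))
for item in items:
    print(item[ID_KEY], f"budget: {summary_budget(item):.0f} s")
```

Under the seconds assumption, the budgets for the two sample videos are 15% of 180 and 15% of 240, that is, 27 and 36 seconds.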

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/summarization/summe-summarization
potato start config.yaml

Details

Annotation types

radio, video_annotation

Domain

Computer Vision, Video Summarization

Use cases

Video Summarization, Highlight Detection, Importance Scoring

Tags

video, summarization, highlights, importance, user-generated

Related designs

Charades-STA Temporal Grounding

Ground natural language descriptions to video segments. Given a sentence describing an action, identify the exact temporal boundaries where that action occurs.

radio, video_annotation

DiDeMo Moment Retrieval

Localizing natural language descriptions to specific video moments. Given a text query, annotators identify the corresponding temporal segment in the video.

radio, video_annotation

HowTo100M Instructional Video Annotation

Annotate instructional video clips with step descriptions and visual grounding. Link narrated instructions to visual actions for video-language understanding.

radio, text