LSMDC Keyframe Selection

Select representative keyframes from movie clips for video description tasks. Annotators identify frames that best summarize the visual content of each shot.

Configuration File: config.yaml

# LSMDC Keyframe Selection Configuration
# Based on Rohrbach et al., IJCV 2017
# Task: Select representative keyframes from movie clips

annotation_task_name: "LSMDC Keyframe Selection"
task_dir: "."

data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "keyframes"
    description: |
      Select the most REPRESENTATIVE frame(s) from this clip.
      A good keyframe captures the main action or content of the shot.
    annotation_type: "video_annotation"
    mode: "keyframe"
    labels:
      - name: "best_keyframe"
        color: "#22C55E"
        key_value: "k"
      - name: "alternative_keyframe"
        color: "#3B82F6"
        key_value: "a"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true
    video_fps: 24

  - name: "keyframe_quality"
    description: "How representative is the best keyframe of the clip?"
    annotation_type: radio
    labels:
      - "Excellent - captures everything important"
      - "Good - captures main content"
      - "Fair - captures some content"
      - "Poor - no single frame works well"

  - name: "clip_content"
    description: "What does this clip primarily show?"
    annotation_type: radio
    labels:
      - "Person/Character focus"
      - "Action/Movement"
      - "Dialogue scene"
      - "Establishing shot/Environment"
      - "Object focus"
      - "Multiple subjects"

allow_all_users: true
instances_per_annotator: 80
annotation_per_instance: 2

annotation_instructions: |
  ## Keyframe Selection Task

  Select the best frame(s) to represent each movie clip.

  ### What makes a good keyframe?
  - Shows the main subject clearly
  - Captures the key action or moment
  - Is visually clear (not blurry)
  - Could stand alone as a summary of the clip

  ### Guidelines:
  - Select ONE best keyframe per clip
  - Optionally mark alternative keyframes
  - Avoid: blurry frames, transitions, extreme close-ups
  - Prefer: clear faces, complete actions, informative composition

  ### Tips:
  - Use frame stepping to find the exact best frame
  - For dialogue, choose a frame with visible faces
  - For action, choose the peak of the action
  - For establishing shots, choose the most informative view
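With `frame_stepping: true`, annotators can step through the clip frame by frame, and `show_timecode: true` shows where they are. The sketch below shows how a frame index maps to a timecode and back at the configured `video_fps: 24`. It assumes non-drop-frame HH:MM:SS:FF timecodes; the exact format Potato displays is an assumption here. The helper names `frame_to_timecode` and `timecode_to_frame` are hypothetical and not part of Potato.

```python
# Hypothetical helpers (not part of Potato): convert between zero-based frame
# indices and non-drop-frame HH:MM:SS:FF timecodes at an integer frame rate.
# The default of 24 mirrors `video_fps: 24` in config.yaml.

VIDEO_FPS = 24


def frame_to_timecode(frame: int, fps: int = VIDEO_FPS) -> str:
    """Return the HH:MM:SS:FF timecode for a zero-based frame index."""
    if frame < 0:
        raise ValueError("frame index must be non-negative")
    total_seconds, frames = divmod(frame, fps)
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frame(timecode: str, fps: int = VIDEO_FPS) -> int:
    """Inverse of frame_to_timecode."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field must be in [0, {fps})")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


if __name__ == "__main__":
    # A 5-second clip at 24 fps spans frames 0..119.
    print(frame_to_timecode(119))            # 00:00:04:23
    print(timecode_to_frame("00:00:02:12"))  # 60
```

The inverse conversion helps when a chosen keyframe is recorded as a timestamp and the exact frame must be extracted later. The non-drop-frame assumption holds for integer rates such as 24 fps.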

Sample Data: sample-data.json

[
  {
    "id": "lsmdc_001",
    "video_url": "https://example.com/videos/movie_clip_001.mp4",
    "movie": "Sample Movie",
    "clip_duration": 5
  },
  {
    "id": "lsmdc_002",
    "video_url": "https://example.com/videos/movie_clip_002.mp4",
    "movie": "Sample Movie",
    "clip_duration": 8
  }
]
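Before launching a study, it can help to run two checks: confirm the data file has the keys named under `item_properties`, and estimate how many annotators the workload needs. The sketch below is hypothetical; `validate_items` and `min_annotators` are not part of Potato. It reads the two settings as follows, which is an interpretation rather than documented Potato assignment behavior:

- Each item must receive `annotation_per_instance` annotations from distinct annotators.
- Each annotator completes at most `instances_per_annotator` items.

```python
import json
import math

# Hypothetical pre-flight check (not part of Potato). Key names mirror
# `item_properties` in config.yaml; the workload numbers mirror
# `annotation_per_instance` and `instances_per_annotator`.
ID_KEY = "id"
TEXT_KEY = "video_url"
ANNOTATIONS_PER_INSTANCE = 2
INSTANCES_PER_ANNOTATOR = 80


def validate_items(items: list[dict]) -> list[str]:
    """Return human-readable problems with the items (empty list if none)."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in (ID_KEY, TEXT_KEY):
            if key not in item:
                problems.append(f"item {index}: missing '{key}'")
        item_id = item.get(ID_KEY)
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems


def min_annotators(num_items: int) -> int:
    """Lower bound on annotators, assuming each item needs
    ANNOTATIONS_PER_INSTANCE distinct annotators and each annotator
    completes at most INSTANCES_PER_ANNOTATOR items."""
    if num_items == 0:
        return 0
    by_quota = math.ceil(num_items * ANNOTATIONS_PER_INSTANCE / INSTANCES_PER_ANNOTATOR)
    return max(by_quota, ANNOTATIONS_PER_INSTANCE)


SAMPLE = """[
  {"id": "lsmdc_001", "video_url": "https://example.com/videos/movie_clip_001.mp4",
   "movie": "Sample Movie", "clip_duration": 5},
  {"id": "lsmdc_002", "video_url": "https://example.com/videos/movie_clip_002.mp4",
   "movie": "Sample Movie", "clip_duration": 8}
]"""

if __name__ == "__main__":
    items = json.loads(SAMPLE)
    print(validate_items(items))       # []
    print(min_annotators(len(items)))  # 2
```

For the two sample clips the bound is 2. One annotator's quota of 80 would cover all four assignments, but each clip still needs two distinct annotators. For 1,000 clips the bound rises to ceil(2000 / 80) = 25.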

Get This Design

Clone or download the design from the GitHub repository.

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/summarization/lsmdc-keyframe-selection
potato start config.yaml

Details

Annotation Types

radio, video_annotation

Domain

Computer Vision, Film Studies

Use Cases

Keyframe Selection, Video Summarization, Movie Description

Related Designs

MovieScenes Scene Detection

Detect and annotate scene boundaries in movies. Identify where semantic scene changes occur based on location, time, or narrative shifts.

radio, video_annotation

Charades-STA Temporal Grounding

Ground natural language descriptions to video segments. Given a sentence describing an action, identify the exact temporal boundaries where that action occurs.

radio, video_annotation

DiDeMo Moment Retrieval

Localize natural language descriptions to specific video moments. Given a text query, annotators identify the corresponding temporal segment in the video.

radio, video_annotation
