LSMDC Keyframe Selection

Select representative keyframes from movie clips for video description tasks. Annotators identify frames that best summarize the visual content of each shot.

Configuration file: config.yaml

# LSMDC Keyframe Selection Configuration
# Based on Rohrbach et al., IJCV 2017
# Task: Select representative keyframes from movie clips

annotation_task_name: "LSMDC Keyframe Selection"
task_dir: "."

data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "keyframes"
    description: |
      Select the most REPRESENTATIVE frame(s) from this clip.
      A good keyframe captures the main action or content of the shot.
    annotation_type: "video_annotation"
    mode: "keyframe"
    labels:
      - name: "best_keyframe"
        color: "#22C55E"
        key_value: "k"
      - name: "alternative_keyframe"
        color: "#3B82F6"
        key_value: "a"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true
    video_fps: 24

  - name: "keyframe_quality"
    description: "How representative is the best keyframe of the clip?"
    annotation_type: radio
    labels:
      - "Excellent - captures everything important"
      - "Good - captures main content"
      - "Fair - captures some content"
      - "Poor - no single frame works well"

  - name: "clip_content"
    description: "What does this clip primarily show?"
    annotation_type: radio
    labels:
      - "Person/Character focus"
      - "Action/Movement"
      - "Dialogue scene"
      - "Establishing shot/Environment"
      - "Object focus"
      - "Multiple subjects"

allow_all_users: true
instances_per_annotator: 80
annotation_per_instance: 2

annotation_instructions: |
  ## Keyframe Selection Task

  Select the best frame(s) to represent each movie clip.

  ### What makes a good keyframe?
  - Shows the main subject clearly
  - Captures the key action or moment
  - Is visually clear (not blurry)
  - Could stand alone as a summary of the clip

  ### Guidelines:
  - Select ONE best keyframe per clip
  - Optionally mark alternative keyframes
  - Avoid: blurry frames, transitions, extreme close-ups
  - Prefer: clear faces, complete actions, informative composition

  ### Tips:
  - Use frame stepping to find the exact best frame
  - For dialogue, choose a frame with visible faces
  - For action, choose the peak of the action
  - For establishing shots, choose the most informative view
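The guidelines ask for exactly one best keyframe per clip, and `annotation_per_instance: 2` means each clip receives two independent picks. The Python sketch below shows one way to sanity-check those picks offline. It validates a single annotator's marks, converts frame indices to HH:MM:SS:FF timecodes at the configured 24 fps, and treats two annotators as agreeing when their best keyframes fall within a time tolerance. The `KeyframeAnnotation` record, 0-based frame indexing, the timecode format and the 0.5 s tolerance are hypothetical assumptions for illustration. They are not Potato's output format or API.

```python
from dataclasses import dataclass, field

VIDEO_FPS = 24  # mirrors video_fps in config.yaml


@dataclass
class KeyframeAnnotation:
    """Hypothetical record of one annotator's marks on one clip (0-based frames)."""
    best: list[int]  # frames marked best_keyframe
    alternatives: list[int] = field(default_factory=list)  # alternative_keyframe marks


def frame_to_timecode(frame: int, fps: int = VIDEO_FPS) -> str:
    """Convert a 0-based frame index to a non-drop-frame HH:MM:SS:FF timecode."""
    total_seconds, ff = divmod(frame, fps)
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def validate(ann: KeyframeAnnotation) -> list[str]:
    """Check the guideline: exactly one best keyframe; alternatives are optional."""
    problems = []
    if len(ann.best) != 1:
        problems.append(f"expected 1 best_keyframe, got {len(ann.best)}")
    if set(ann.best) & set(ann.alternatives):
        problems.append("a frame is marked both best and alternative")
    return problems


def agree(a: KeyframeAnnotation, b: KeyframeAnnotation,
          tolerance_s: float = 0.5, fps: int = VIDEO_FPS) -> bool:
    """True if the two best keyframes lie within tolerance_s seconds.

    Assumes both annotations already passed validate() (one best frame each).
    """
    return abs(a.best[0] - b.best[0]) / fps <= tolerance_s
```

For example, best keyframes at frames 60 and 70 are about 0.42 s apart at 24 fps, so `agree` returns True, and frame 60 displays as `00:00:02:12`. Measuring agreement in seconds rather than exact frames reflects that neighboring frames in a shot usually look nearly identical.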

Sample data: sample-data.json

[
  {
    "id": "lsmdc_001",
    "video_url": "https://example.com/videos/movie_clip_001.mp4",
    "movie": "Sample Movie",
    "clip_duration": 5
  },
  {
    "id": "lsmdc_002",
    "video_url": "https://example.com/videos/movie_clip_002.mp4",
    "movie": "Sample Movie",
    "clip_duration": 8
  }
]
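Before launching, it can help to check the data records against `item_properties` (`id` and `video_url`) and to estimate how many annotators the settings imply. This Python sketch assumes `clip_duration` is in seconds and reads `instances_per_annotator` as a per-annotator cap on clips. Both are interpretations rather than documented behavior, and the function names are hypothetical.

```python
import json
import math

VIDEO_FPS = 24                        # video_fps in config.yaml
ID_KEY, TEXT_KEY = "id", "video_url"  # item_properties in config.yaml
INSTANCES_PER_ANNOTATOR = 80          # instances_per_annotator (read as a cap)
ANNOTATIONS_PER_INSTANCE = 2          # annotation_per_instance


def load_items(path):
    """Read the JSON list of clip records (e.g. data.json)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_items(items):
    """Return a list of problems: missing required keys or duplicate ids."""
    problems = []
    seen = set()
    for i, item in enumerate(items):
        for key in (ID_KEY, TEXT_KEY):
            if key not in item:
                problems.append(f"item {i}: missing '{key}'")
        item_id = item.get(ID_KEY)
        if item_id is not None and item_id in seen:
            problems.append(f"item {i}: duplicate id '{item_id}'")
        seen.add(item_id)
    return problems


def frame_count(item, fps=VIDEO_FPS):
    """Approximate number of frames, assuming clip_duration is in seconds."""
    return round(item["clip_duration"] * fps)


def min_annotators(n_items):
    """Fewest annotators so every clip gets the required number of distinct annotators."""
    if n_items == 0:
        return 0
    by_budget = math.ceil(n_items * ANNOTATIONS_PER_INSTANCE / INSTANCES_PER_ANNOTATOR)
    # Each clip needs ANNOTATIONS_PER_INSTANCE *different* people.
    return max(ANNOTATIONS_PER_INSTANCE, by_budget)
```

With the two sample records, `check_items` returns an empty list, `frame_count` gives 120 and 192 frames, and `min_annotators(2)` returns 2: one annotator's 80-clip budget would cover the workload, but each clip still needs two distinct people.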

Get this design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/summarization/lsmdc-keyframe-selection
potato start config.yaml

Details

Annotation types

radio, video_annotation

Domain

Computer Vision, Film Studies

Use cases

Keyframe Selection, Video Summarization, Movie Description

Tags

video, keyframe, movie, description, lsmdc, representative

Related designs

MovieScenes Scene Detection

Detect and annotate scene boundaries in movies. Identify where semantic scene changes occur based on location, time, or narrative shifts.

radio, video_annotation

Charades-STA Temporal Grounding

Ground natural language descriptions to video segments. Given a sentence describing an action, identify the exact temporal boundaries where that action occurs.

radio, video_annotation

DiDeMo Moment Retrieval

Localize natural language descriptions to specific video moments. Given a text query, annotators identify the corresponding temporal segment in the video.

radio, video_annotation