Scene Boundary Detection

Identify scene boundaries in documentary and narrative videos. Annotators mark transitions between semantically coherent scenes based on visual, audio, and narrative cues.

Configuration File: config.yaml

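This Potato config reproduces the annotation task. Save it as config.yaml, make sure the file listed under data_files (data.json in this config) exists in the same directory, and run potato start config.yaml to try it.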

yaml

# Scene Boundary Detection Configuration
# Based on BBC Planet Earth Scene Dataset (Sidiropoulos et al., 2011)
# Task: Mark scene transitions in documentary videos

annotation_task_name: "Scene Boundary Detection"
task_dir: "."

data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "scene_boundaries"
    description: |
      Mark the START of each new scene. A scene is a semantically coherent
      segment unified by location, time, characters, or narrative topic.
    annotation_type: "video_annotation"
    mode: "keyframe"
    labels:
      - name: "scene_start"
        color: "#EF4444"
        key_value: "s"
      - name: "gradual_transition"
        color: "#F97316"
        key_value: "g"
      - name: "cut_transition"
        color: "#3B82F6"
        key_value: "c"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true

  - name: "scene_type"
    description: "What type of scene follows this boundary?"
    annotation_type: radio
    labels:
      - "Establishing shot"
      - "Action/Event"
      - "Dialogue/Interview"
      - "Transition/Montage"
      - "Credits/Title"

allow_all_users: true
instances_per_annotator: 30
annotation_per_instance: 2

annotation_instructions: |
  ## Scene Boundary Detection Task

  Mark where each new SCENE begins in the video.

  ### What defines a scene boundary?
  - Change in location or setting
  - Significant time jump
  - Change in main subject/topic
  - Narrative shift

  ### Transition Types:
  - **Cut**: Instantaneous change between shots
  - **Gradual**: Fade, dissolve, or wipe transition

  ### Tips:
  - A scene is NOT the same as a shot (scenes contain multiple shots)
  - Mark the FIRST frame of the new scene
  - Use frame stepping for precision
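
The instructions above ask annotators to mark the first frame of each new scene, and the config collects two annotations per instance. The following is a minimal sketch of how such marks could be post-processed: it turns scene-start timestamps into scene segments and counts how many marks two annotators agree on. The flat list of timestamps in seconds, the function names, and the 1-second tolerance are all assumptions made for illustration; Potato's actual output format is not described here and would need to be mapped to this shape first.

```python
from typing import List, Tuple


def boundaries_to_segments(starts: List[float], duration: float) -> List[Tuple[float, float]]:
    """Turn scene-start timestamps (seconds) into (start, end) scene segments."""
    # Keep only marks inside the video, drop duplicates, and sort them.
    marks = sorted({t for t in starts if 0.0 <= t < duration})
    # The first scene always begins at 0, even if nobody marked it.
    if not marks or marks[0] != 0.0:
        marks.insert(0, 0.0)
    # Each scene ends where the next one starts; the last ends with the video.
    ends = marks[1:] + [duration]
    return list(zip(marks, ends))


def matched_boundaries(a: List[float], b: List[float], tolerance: float = 1.0) -> int:
    """Count marks in `a` that pair one-to-one with a mark in `b` within `tolerance` seconds."""
    remaining = sorted(b)
    matches = 0
    for t in sorted(a):
        # Greedy pairing: take the closest still-unmatched mark from the other annotator.
        best = min(remaining, key=lambda u: abs(u - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            remaining.remove(best)
            matches += 1
    return matches


# Hypothetical marks from two annotators on a 600-second video.
annotator_1 = [42.0, 180.5, 431.0]
annotator_2 = [42.4, 181.0, 300.0]
print(boundaries_to_segments(annotator_1, duration=600.0))
print(matched_boundaries(annotator_1, annotator_2))
```

The greedy pairing is a simplification. It is adequate when scene boundaries are far apart relative to the tolerance, which is usually the case because scenes span multiple shots.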

Sample Data: sample-data.json

json

[
  {
    "id": "scene_001",
    "video_url": "https://example.com/videos/nature_documentary_ep1.mp4",
    "title": "Planet Earth - Mountains Episode",
    "duration_seconds": 600
  },
  {
    "id": "scene_002",
    "video_url": "https://example.com/videos/nature_documentary_ep2.mp4",
    "title": "Planet Earth - Ocean Deep",
    "duration_seconds": 540
  }
]
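
Each item needs the fields that item_properties names in the config: id_key is "id" and text_key is "video_url". The following is a rough sketch of a pre-flight check that could be run before starting the server. The function name find_problems and the specific checks (non-empty fields, unique ids) are assumptions made for illustration and are not part of Potato.

```python
import json
from typing import Dict, List

# Values copied from the item_properties section of the config.
ID_KEY = "id"
TEXT_KEY = "video_url"


def find_problems(items: List[Dict], id_key: str = ID_KEY, text_key: str = TEXT_KEY) -> List[str]:
    """Return human-readable problems; an empty list means the data looks usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Both configured keys must be present and non-empty.
        for key in (id_key, text_key):
            if not item.get(key):
                problems.append(f"item {index}: missing or empty '{key}'")
        # Ids must be unique so annotations can be traced back to one item.
        item_id = item.get(id_key)
        if item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        elif item_id:
            seen_ids.add(item_id)
    return problems


# The two sample items, reduced to the fields the config actually reads.
sample = json.loads("""
[
  {"id": "scene_001", "video_url": "https://example.com/videos/nature_documentary_ep1.mp4"},
  {"id": "scene_002", "video_url": "https://example.com/videos/nature_documentary_ep2.mp4"}
]
""")
print(find_problems(sample))
```

Extra fields such as title and duration_seconds are ignored by this check, since the config does not reference them.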

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/boundary-detection/scene-boundary-detection
potato start config.yaml

Dataset & paper

Sidiropoulos et al., IEEE Transactions on Circuits and Systems for Video Technology, 2011

Citation (BibTeX)

bibtex

@article{sidiropoulos2011temporal,
  title={Temporal video segmentation to scenes using high-level audiovisual features},
  author={Sidiropoulos, Panagiotis and Mezaris, Vasileios and Kompatsiaris, Ioannis and Meinedo, Hugo and Bugalho, Miguel and Trancoso, Isabel},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  volume={21},
  number={8},
  year={2011}
}

Details

Annotation Types

radio, video_annotation

Domain

Computer Vision, Video Understanding

Use Cases

Scene Detection, Video Segmentation, Content Analysis

Related Designs

DiDeMo Moment Retrieval

Localizing natural language descriptions to specific video moments. Given a text query, annotators identify the corresponding temporal segment in the video.

radio, video_annotation

VSTAR Video-grounded Dialogue

Video-grounded dialogue annotation. Annotators watch videos and answer questions requiring situated understanding, write dialogue turns grounded in specific video moments, and mark relevant temporal segments.

video_annotation, text

YouTube Highlights Detection

Detect highlight-worthy moments in domain-specific videos. Annotators identify the most engaging segments for automatic highlight generation.

radio, video_annotation
