TVSum Video Summarization
Frame-level importance scoring for video summarization. Annotators rate 2-second shots on a 1-5 importance scale to identify key moments worth including in a summary.
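To make the 2-second shot unit concrete, here is a minimal sketch of how a video's duration maps to ratable shots. The function name `shot_boundaries` and the choice to let the final shot run short are assumptions for illustration, not part of the design itself.

```python
import math


def shot_boundaries(duration_seconds: float, shot_length: float = 2.0) -> list[tuple[float, float]]:
    """Split [0, duration] into consecutive shots; the last shot may be shorter."""
    duration = float(duration_seconds)
    n_shots = math.ceil(duration / shot_length)
    return [
        (i * shot_length, min((i + 1) * shot_length, duration))
        for i in range(n_shots)
    ]


# A 180-second video (as in the first sample item) yields 90 shots to rate.
shots = shot_boundaries(180)
print(len(shots), shots[0], shots[-1])  # 90 (0.0, 2.0) (178.0, 180.0)
```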
Configuration file: config.yaml
# TVSum Video Summarization Configuration
# Based on Song et al., CVPR 2015
# Task: Rate 2-second shots on importance scale for video summarization
annotation_task_name: "TVSum Video Summarization"
task_dir: "."
# Data configuration
data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"
# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
# Annotation schemes
annotation_schemes:
  # Video player for viewing the content
  - name: "video_player"
    description: "Watch the video and rate the importance of each segment"
    annotation_type: "video"
    video_path: "{{video_url}}"
    controls: true
    autoplay: false
  # Current segment importance rating (1-5 scale as per TVSum methodology)
  - name: "segment_importance"
    description: |
      Rate how important the current 2-second segment is for a video summary.
      Consider: Would someone watching only a summary want to see this moment?
    annotation_type: likert
    size: 5
    min_label: "Not Important"
    max_label: "Very Important"
    labels:
      - "1 - Not important at all"
      - "2 - Slightly important"
      - "3 - Moderately important"
      - "4 - Important"
      - "5 - Very important (must include)"
  # Optional: Mark specific highlight moments
  - name: "highlight_moments"
    description: "Mark any particularly memorable or highlight-worthy moments"
    annotation_type: "video_annotation"
    mode: "keyframe"
    labels:
      - name: "key_moment"
        color: "#22C55E"
        key_value: "k"
      - name: "climax"
        color: "#EF4444"
        key_value: "c"
      - name: "transition"
        color: "#F59E0B"
        key_value: "t"
    frame_stepping: true
    show_timecode: true
# User configuration
allow_all_users: true
# Task assignment
instances_per_annotator: 50
annotation_per_instance: 3
# Instructions
surveyflow:
  on: true
  order:
    - prolific_id
  prolific_id:
    type: text
    question: "Please enter your annotator ID:"
annotation_instructions: |
  ## Video Summarization Task
  Your goal is to help create automatic video summaries by rating the importance
  of video segments.

  ### Instructions:
  1. Watch the video carefully
  2. For each 2-second segment, rate its importance on a 1-5 scale:
     - **1**: Not important - Can be skipped entirely
     - **2**: Slightly important - Background/filler content
     - **3**: Moderately important - Relevant but not essential
     - **4**: Important - Should probably be in a summary
     - **5**: Very important - Must be included in any summary

  ### What makes a segment important?
  - Key events or actions
  - Emotional high points
  - Information that's essential to understanding the video
  - Visually striking or memorable moments

  ### Tips:
  - Compare segments relative to each other within the same video
  - Think: "If I only had 15 seconds, would I include this?"
  - Use keyboard shortcuts for faster annotation
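With `annotation_per_instance: 3`, each video collects ratings from three annotators, which then have to be combined into a single importance score per shot. The sketch below averages the ratings and picks the top shots under a time budget, echoing the "If I only had 15 seconds" tip. The input layout (one list of 1-5 ratings per annotator), the function names, and the greedy selection are all assumptions for illustration. This page does not show Potato's output format, and the greedy pick is a simplification rather than the selection procedure from the TVSum paper.

```python
from statistics import mean


def mean_importance(ratings_by_annotator):
    """Average per-shot ratings across annotators.

    Each inner list holds one annotator's 1-5 ratings, one entry per shot
    (hypothetical layout; adapt to the actual exported annotations).
    """
    n_shots = len(ratings_by_annotator[0])
    if any(len(r) != n_shots for r in ratings_by_annotator):
        raise ValueError("all annotators must rate the same number of shots")
    return [mean(shot) for shot in zip(*ratings_by_annotator)]


def select_summary(scores, budget_seconds=15.0, shot_length=2.0):
    """Greedily keep the highest-scoring shots that fit in the budget.

    Ties are broken in favour of earlier shots; the result is returned
    in temporal order as a list of shot indices.
    """
    max_shots = int(budget_seconds // shot_length)
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:max_shots])


# Made-up ratings for a 10-shot (20-second) clip from three annotators.
ratings = [
    [1, 2, 5, 4, 1, 3, 5, 2, 1, 4],
    [2, 2, 4, 5, 1, 3, 5, 1, 1, 3],
    [1, 3, 5, 4, 2, 2, 4, 2, 1, 4],
]
scores = mean_importance(ratings)
summary = select_summary(scores, budget_seconds=6.0)
print(summary)  # [2, 3, 6]
```

Shot index `i` covers seconds `2*i` to `2*i + 2`, so the selected indices convert directly to clip times.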
Sample data: sample-data.json
[
  {
    "id": "tvsum_001",
    "video_url": "https://example.com/videos/changing_tire.mp4",
    "title": "Changing a Tire Tutorial",
    "category": "HowTo",
    "duration_seconds": 180,
    "description": "A step-by-step tutorial on how to change a flat tire on your car."
  },
  {
    "id": "tvsum_002",
    "video_url": "https://example.com/videos/dog_show.mp4",
    "title": "Best Dog Show Moments",
    "category": "Entertainment",
    "duration_seconds": 240,
    "description": "Highlights from a local dog show competition featuring various breeds."
  }
]
// ... and 3 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/summarization/tvsum-summarization
potato start config.yaml
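Before starting the server, it can help to check that every item in the data file carries the fields named under `item_properties` (`id_key: "id"`, `text_key: "video_url"`). The following is a minimal sketch, assuming the data file is a JSON array of objects like the sample shown earlier; `validate_items` is a hypothetical helper, not part of Potato.

```python
import json


def validate_items(items, id_key="id", text_key="video_url"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen = set()
    for pos, item in enumerate(items):
        for key in (id_key, text_key):
            if not item.get(key):
                problems.append(f"item {pos}: missing or empty '{key}'")
        item_id = item.get(id_key)
        if item_id in seen:
            problems.append(f"item {pos}: duplicate id '{item_id}'")
        elif item_id:
            seen.add(item_id)
    return problems


# Inline copy of two sample items; for a real file, use:
#   with open("data.json") as f: items = json.load(f)
sample = json.loads("""
[
  {"id": "tvsum_001", "video_url": "https://example.com/videos/changing_tire.mp4"},
  {"id": "tvsum_002", "video_url": "https://example.com/videos/dog_show.mp4"}
]
""")
print(validate_items(sample))  # []
```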
Related designs
VBench Video Generation Quality Assessment
Quality assessment of AI-generated videos. Annotators rate generated videos on multiple dimensions (temporal consistency, motion smoothness, aesthetic quality) and compare pairs of generated videos.
Video-ChatGPT - Video QA Display and Evaluation
Video question answering evaluation based on the Video-ChatGPT benchmark (Maaz et al., ACL 2024). Annotators watch a video, review a model-generated response to a question, and evaluate correctness and quality.
ActivityNet Captions Dense Annotation
Dense temporal annotation with natural language descriptions. Annotators segment videos into events and write descriptive captions for each temporal segment.