Breakfast Actions Segmentation
Fine-grained temporal action segmentation of breakfast preparation activities. Annotators label sequences of cooking actions like 'take cup', 'pour milk', 'stir'.
Configuration File: config.yaml
# Breakfast Actions Segmentation Configuration
# Based on Kuehne et al., IJCV 2014
# Task: Fine-grained temporal segmentation of breakfast preparation
annotation_task_name: "Breakfast Actions Segmentation"
task_dir: "."
data_files:
  - data.json
item_properties:
  id_key: "id"
  text_key: "video_url"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  - name: "breakfast_actions"
    description: |
      Segment the video into fine-grained cooking actions.
      Mark each atomic action from start to finish.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      # Object manipulation
      - name: "take"
        color: "#3B82F6"
        key_value: "t"
      - name: "put"
        color: "#1D4ED8"
        key_value: "p"
      # Pouring actions
      - name: "pour"
        color: "#22C55E"
        key_value: "o"
      - name: "spoon"
        color: "#16A34A"
        key_value: "s"
      # Mixing actions
      - name: "stir"
        color: "#F97316"
        key_value: "r"
      - name: "crack"
        color: "#EA580C"
        key_value: "c"
      # Cutting actions
      - name: "cut"
        color: "#EF4444"
        key_value: "u"
      - name: "peel"
        color: "#DC2626"
        key_value: "l"
      # Cooking actions
      - name: "fry"
        color: "#8B5CF6"
        key_value: "f"
      - name: "butter"
        color: "#A855F7"
        key_value: "b"
      # Other
      - name: "squeeze"
        color: "#EC4899"
        key_value: "q"
      - name: "background"
        color: "#6B7280"
        key_value: "g"
    zoom_enabled: true
    playback_rate_control: true
    frame_stepping: true
    timeline_height: 90
  - name: "object_involved"
    description: "What object is involved in this action?"
    annotation_type: text
    placeholder: "e.g., cup, egg, pan, butter, cereal"
allow_all_users: true
instances_per_annotator: 25
annotation_per_instance: 2
annotation_instructions: |
  ## Breakfast Actions Segmentation Task
  Segment cooking videos into atomic actions.
  ### Action Vocabulary:
  - **take**: Pick up an object
  - **put**: Put down an object
  - **pour**: Pour liquid/granules
  - **spoon**: Scoop with spoon
  - **stir**: Mix with stirring motion
  - **crack**: Crack open (eggs)
  - **cut**: Cut with knife
  - **peel**: Remove outer layer
  - **fry**: Cook in pan
  - **butter**: Spread butter
  - **squeeze**: Squeeze (juice)
  - **background**: Non-action segments
  ### Guidelines:
  - Segment ALL frames (no gaps)
  - Each segment = one atomic action
  - Note the object involved
  - Actions can repeat multiple times
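
The "no gaps" guideline can be checked mechanically once segments are exported. The following is a minimal sketch, not part of the design itself: it assumes each segment has already been read into a hypothetical `(start_seconds, end_seconds, label)` tuple, since the exact layout of the annotation output is not shown here. The overlap check is also an assumption, inferred from "each segment = one atomic action".

```python
from typing import List, Tuple

# Hypothetical in-memory form of one segment: (start_seconds, end_seconds, label).
Segment = Tuple[float, float, str]

# Label names copied from the labels list in config.yaml.
LABELS = {
    "take", "put", "pour", "spoon", "stir", "crack",
    "cut", "peel", "fry", "butter", "squeeze", "background",
}


def find_problems(segments: List[Segment], duration: float, tol: float = 1e-6) -> List[str]:
    """Return a list of problems; an empty list means the timeline is fully covered."""
    problems = []
    cursor = 0.0  # end of the region covered so far
    for start, end, label in sorted(segments):
        if label not in LABELS:
            problems.append(f"unknown label {label!r}")
        if end <= start:
            problems.append(f"empty or reversed segment {start}-{end}")
            continue
        if start > cursor + tol:
            problems.append(f"gap {cursor}-{start}")
        elif start < cursor - tol:
            problems.append(f"overlap {start}-{cursor}")
        cursor = max(cursor, end)
    if cursor < duration - tol:
        problems.append(f"gap {cursor}-{duration}")
    return problems
```

For a 120-second video, `find_problems([(0.0, 5.0, "take"), (5.0, 120.0, "background")], 120.0)` returns an empty list, while leaving any stretch unlabeled produces a `gap` entry.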
Sample Data: sample-data.json
[
  {
    "id": "breakfast_001",
    "video_url": "https://example.com/videos/making_cereal.mp4",
    "activity": "cereal",
    "duration_seconds": 120
  },
  {
    "id": "breakfast_002",
    "video_url": "https://example.com/videos/making_pancakes.mp4",
    "activity": "pancake",
    "duration_seconds": 300
  }
]
// ... and 1 more item
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/action-recognition/breakfast-actions
potato start config.yaml
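
Before starting the server, it may be worth confirming that every data item carries the keys named under `item_properties` in the config (`id` and `video_url`). The sketch below is an illustration only: `check_items` is a hypothetical helper, not part of Potato, and the duplicate-id check is an added assumption about what a clean data file should look like.

```python
import json

ID_KEY = "id"           # item_properties.id_key in config.yaml
TEXT_KEY = "video_url"  # item_properties.text_key in config.yaml


def check_items(items):
    """Return a list of problems found in a list of data items (dicts)."""
    problems = []
    seen = set()
    for i, item in enumerate(items):
        for key in (ID_KEY, TEXT_KEY):
            if not item.get(key):
                problems.append(f"item {i}: missing {key!r}")
        item_id = item.get(ID_KEY)
        if item_id in seen:
            problems.append(f"item {i}: duplicate id {item_id!r}")
        elif item_id:
            seen.add(item_id)
    return problems


# The first sample item shown above, embedded so the sketch runs on its own.
SAMPLE = json.loads("""
[
  {
    "id": "breakfast_001",
    "video_url": "https://example.com/videos/making_cereal.mp4",
    "activity": "cereal",
    "duration_seconds": 120
  }
]
""")

assert check_items(SAMPLE) == []
```

To check a file on disk, load it with `json.load` and pass the resulting list to `check_items`. Note that the config lists `data.json` under `data_files`, while the sample shown here is named `sample-data.json`.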
Related Designs
EPIC-KITCHENS Egocentric Action Annotation
Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.
How2Sign Sign Language Multi-Tier Annotation
Multi-tier ELAN-style annotation of continuous American Sign Language videos. Annotators segment sign glosses, mark mouthing patterns, classify sign handedness, and provide English translations aligned to video timelines. Based on the How2Sign large-scale multimodal ASL dataset.
ActivityNet Captions Dense Annotation
Dense temporal annotation with natural language descriptions. Annotators segment videos into events and write descriptive captions for each temporal segment.