YouCook2 Recipe Step Annotation
Annotate cooking videos with recipe step boundaries and descriptions. Segment instructional cooking content into distinct procedural steps.
Configuration file: config.yaml
```yaml
# YouCook2 Recipe Step Annotation Configuration
# Based on Zhou et al., AAAI 2018
# Task: Segment cooking videos into recipe steps with descriptions

annotation_task_name: "YouCook2 Recipe Step Annotation"
task_dir: "."

data_files:
  - data.json

item_properties:
  id_key: "id"
  text_key: "video_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "recipe_steps"
    description: |
      Mark the temporal boundaries of each RECIPE STEP.
      A step is one distinct cooking action or procedure.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      - name: "recipe_step"
        color: "#22C55E"
        key_value: "r"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true
    video_fps: 30

  - name: "step_description"
    description: "Describe this recipe step in one sentence:"
    annotation_type: text

  - name: "step_type"
    description: "What type of cooking action is this step?"
    annotation_type: radio
    labels:
      - "Preparation (cutting, measuring, gathering)"
      - "Cooking (heating, frying, boiling)"
      - "Mixing (combining, stirring, blending)"
      - "Seasoning (adding spices, salt, sauce)"
      - "Plating (arranging, serving, garnishing)"
      - "Other"

  - name: "ingredients_visible"
    description: "Are the main ingredients clearly visible?"
    annotation_type: radio
    labels:
      - "Yes - all ingredients visible"
      - "Partially - some ingredients visible"
      - "No - ingredients not clearly shown"

  - name: "step_difficulty"
    description: "How difficult is this cooking step?"
    annotation_type: radio
    labels:
      - "Easy - basic technique"
      - "Moderate - some skill required"
      - "Difficult - advanced technique"

allow_all_users: true
instances_per_annotator: 40
annotation_per_instance: 2

annotation_instructions: |
  ## YouCook2 Recipe Step Annotation
  Segment cooking videos into distinct recipe steps and describe each.

  ### What is a Recipe Step?
  - One distinct cooking action
  - Has clear beginning and end
  - Can be described in one sentence

  ### Example Steps:
  - "Dice the onions into small pieces"
  - "Add olive oil to the heated pan"
  - "Stir the mixture until smooth"
  - "Bake in the oven for 20 minutes"

  ### Step Description Guidelines:
  - Use imperative form ("Add..." not "Adding...")
  - Include key ingredients/tools mentioned
  - Be specific but concise (5-15 words)
  - Don't include timing unless essential

  ### Boundary Rules:
  - START: When the cook begins the action
  - END: When the action is complete
  - Brief pauses within an action = same step
  - Talking without action = exclude if possible

  ### NOT separate steps:
  - Repeated actions (stirring multiple times = one step)
  - Camera angle changes during same action
  - Brief interruptions

  ### Tips:
  - Watch the whole clip first to understand the recipe
  - Typical recipes have 5-15 major steps
  - Focus on actions, not commentary
  - Some steps may overlap with narration timing
```
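The boundary and description rules in the instructions can also be checked after annotation. The following minimal Python sketch is not part of Potato. It assumes each exported step has already been reduced to a hypothetical `Segment(start, end, label)` record, with times in seconds and a label naming the action. It also uses a hypothetical 2-second threshold for a "brief pause". The imperative-form check is only a rough heuristic: it flags a first word ending in "-ing", with a small exception list. `VIDEO_FPS` mirrors `video_fps: 30` in the config.

```python
from dataclasses import dataclass

VIDEO_FPS = 30        # mirrors video_fps in config.yaml
MAX_PAUSE_SEC = 2.0   # hypothetical threshold for a "brief pause"

# First words that end in "-ing" but are already imperative verbs.
NOT_GERUNDS = {"bring", "string", "sing", "ring", "wring", "sling"}


@dataclass
class Segment:
    """Hypothetical post-processed step: times in seconds plus an action label."""
    start: float
    end: float
    label: str


def merge_brief_pauses(segments, max_pause=MAX_PAUSE_SEC):
    """Merge time-adjacent segments with the same label and a gap <= max_pause."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev is not None and prev.label == seg.label and seg.start - prev.end <= max_pause:
            prev.end = max(prev.end, seg.end)  # extend the copy, not the input
        else:
            merged.append(Segment(seg.start, seg.end, seg.label))
    return merged


def to_frames(seg, fps=VIDEO_FPS):
    """Map a segment's timecodes to the nearest (start_frame, end_frame)."""
    return round(seg.start * fps), round(seg.end * fps)


def check_description(text):
    """Return guideline problems found in a one-sentence step description."""
    problems = []
    words = text.split()
    if not 5 <= len(words) <= 15:
        problems.append(f"expected 5-15 words, got {len(words)}")
    first = words[0].lower() if words else ""
    if first.endswith("ing") and first not in NOT_GERUNDS:
        problems.append("use the imperative form ('Add...' not 'Adding...')")
    return problems


steps = [
    Segment(0.0, 12.0, "stir sauce"),
    Segment(13.5, 20.0, "stir sauce"),  # resumed after a 1.5 s pause
    Segment(20.5, 31.0, "add oil"),
]
for s in merge_brief_pauses(steps):
    print(s, to_frames(s))
```

The merge only joins neighbours that share a label. Two distinct steps that sit back to back therefore stay separate, which a plain gap threshold would not guarantee.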
Sample data: sample-data.json
```json
[
  {
    "id": "youcook_001",
    "video_url": "https://example.com/videos/cooking_pasta.mp4",
    "recipe": "Pasta Carbonara",
    "duration": 300
  },
  {
    "id": "youcook_002",
    "video_url": "https://example.com/videos/cooking_salad.mp4",
    "recipe": "Caesar Salad",
    "duration": 180
  }
]
```

Get this design
Clone or download from the repository
Quick start:
```shell
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/instructional/youcook2-instructional
potato start config.yaml
```
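Before launching, it can help to sanity-check the data and the annotation budget. This minimal Python sketch makes a few assumptions. It uses the two sample items shown earlier, inlined as a string so the snippet runs on its own, and it treats `duration` as seconds. It checks that every item has the `id` and `video_url` keys named in `item_properties`. It then estimates a lower bound on annotators, assuming each item needs 2 distinct annotators (`annotation_per_instance`) and each annotator handles at most 40 items (`instances_per_annotator`). The function names are hypothetical.

```python
import json
import math

# The two sample items from sample-data.json, inlined so the sketch is self-contained.
SAMPLE = """[
  {"id": "youcook_001", "video_url": "https://example.com/videos/cooking_pasta.mp4",
   "recipe": "Pasta Carbonara", "duration": 300},
  {"id": "youcook_002", "video_url": "https://example.com/videos/cooking_salad.mp4",
   "recipe": "Caesar Salad", "duration": 180}
]"""

ID_KEY, TEXT_KEY = "id", "video_url"  # item_properties in config.yaml
ANNOTATIONS_PER_INSTANCE = 2          # annotation_per_instance
INSTANCES_PER_ANNOTATOR = 40          # instances_per_annotator


def load_items(raw):
    """Parse the JSON list and fail fast on items missing the configured keys."""
    items = json.loads(raw)
    for item in items:
        missing = {ID_KEY, TEXT_KEY} - set(item)
        if missing:
            raise ValueError(f"item {item.get(ID_KEY)!r} is missing {sorted(missing)}")
    return items


def min_annotators(n_items):
    """Lower bound on annotators: enough capacity, and distinct people per item."""
    if n_items == 0:
        return 0
    total = n_items * ANNOTATIONS_PER_INSTANCE
    capacity_bound = math.ceil(total / INSTANCES_PER_ANNOTATOR)
    # Each item needs ANNOTATIONS_PER_INSTANCE different annotators.
    return max(capacity_bound, ANNOTATIONS_PER_INSTANCE)


def segment_in_bounds(item, start, end):
    """True if a segment (seconds) is non-empty and lies within the video."""
    return 0 <= start < end <= item["duration"]


items = load_items(SAMPLE)
print(min_annotators(len(items)))  # → 2
```

With only two items, the distinct-annotator requirement dominates. At 1,000 items, the capacity bound takes over: 2,000 annotations / 40 per annotator = 50 annotators.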
Related designs
HowTo100M Instructional Video Annotation
Annotate instructional video clips with step descriptions and visual grounding. Link narrated instructions to visual actions for video-language understanding.
VSTAR Video-grounded Dialogue
Video-grounded dialogue annotation. Annotators watch videos and answer questions requiring situated understanding, write dialogue turns grounded in specific video moments, and mark relevant temporal segments.
Charades-STA Temporal Grounding
Ground natural language descriptions to video segments. Given a sentence describing an action, identify the exact temporal boundaries where that action occurs.