FineGym Dataset: Fine-Grained Gymnastics Action Recognition

FineGym is a hierarchical video dataset (CVPR 2020) for fine-grained gymnastics action recognition. It labels events, sets, and elements (99, 288, or 530 element classes, depending on the subset) with millisecond-level temporal boundaries. This page covers the label taxonomy and provides a runnable Potato config for annotating or extending it.

About this dataset

FineGym is a hierarchical video dataset for fine-grained action understanding, introduced by Shao, Zhao, Dai, and Lin at CVPR 2020 (oral). It targets gymnastics, where actions look nearly identical and differ only in subtle motion, to push action recognition past coarse activity labels.

Annotations use a three-level semantic hierarchy: events (the apparatus, such as balance beam or floor exercise), sets (groups of related sub-actions like leaps, turns, or dismounts), and elements (individual gymnastic skills). Boundaries are marked at both the action and sub-action level with millisecond precision.
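
To make the hierarchy concrete, here is a minimal Python sketch of how one routine and its elements could be represented. The class and field names (`EventInstance`, `Element`, `start_ms`, `end_ms`) are hypothetical and chosen for illustration; they are not FineGym's released annotation format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """One gymnastic skill with millisecond boundaries (hypothetical schema)."""
    label: str
    set_name: str  # the set (group of related sub-actions) it belongs to
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class EventInstance:
    """One action-level segment: a routine on a single apparatus."""
    event: str  # "BB", "FX", "UB", or "VT"
    start_ms: int
    end_ms: int
    elements: List[Element] = field(default_factory=list)

    def add(self, element: Element) -> None:
        # Sub-action boundaries must fall inside the action boundaries.
        if not (self.start_ms <= element.start_ms < element.end_ms <= self.end_ms):
            raise ValueError(f"{element.label!r} lies outside the event segment")
        self.elements.append(element)


# Made-up timings, for illustration only.
routine = EventInstance(event="BB", start_ms=0, end_ms=90_000)
routine.add(Element("split leap", "leaps", 12_400, 13_150))
routine.add(Element("wolf turn", "turns", 20_000, 22_300))
print([(e.set_name, e.label, e.duration_ms) for e in routine.elements])
```

Storing boundaries as integer milliseconds avoids floating-point drift when segments are compared or sorted, and the containment check in `add` mirrors the idea that sub-action boundaries are nested inside action boundaries.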

Two evaluation subsets are standard: Gym99 (99 element classes, roughly balanced) and Gym288 (288 classes, long-tailed); the full element label space is referred to as Gym530. The women's artistic gymnastics events covered are Balance Beam (BB), Floor Exercise (FX), Uneven Bars (UB), and Vault (VT).
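
The difference between a "roughly balanced" and a "long-tailed" label set can be quantified in several ways. One simple option, sketched below, is the ratio of the most frequent class to the least frequent class. The function name `imbalance_ratio` and the label counts are made up for illustration and are not real FineGym statistics.

```python
from collections import Counter
from typing import Iterable


def imbalance_ratio(labels: Iterable[str]) -> float:
    """Return (count of most frequent class) / (count of least frequent class)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return max(counts.values()) / min(counts.values())


# Made-up label lists for illustration only; not real FineGym statistics.
balanced = ["leap"] * 40 + ["turn"] * 35 + ["salto"] * 38
long_tailed = ["leap"] * 200 + ["turn"] * 20 + ["salto"] * 2

print(round(imbalance_ratio(balanced), 2))
print(round(imbalance_ratio(long_tailed), 2))
```

A ratio near 1 suggests a balanced label set, while a large ratio points to a long tail. When extending the taxonomy, a check like this can flag classes that need more annotated examples.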

The Potato config below reproduces the element-segmentation task: segment-mode video annotation for marking element boundaries, plus radio schemes for event, element group, difficulty, and execution quality. Use it to re-annotate FineGym clips, extend the taxonomy, or build a similar fine-grained sports dataset.

Released: CVPR 2020 (oral)
Domain: Women's artistic gymnastics
Hierarchy: Events -> Sets -> Elements
Element classes: 99 / 288 / 530 (Gym99 / Gym288 / Gym530)
Events: Balance Beam, Floor, Uneven Bars, Vault
Temporal precision: Millisecond-level boundaries

Configuration File: config.yaml

This Potato config reproduces the annotation task. Save it as config.yaml and run potato start config.yaml to try it.

yaml

# FineGym Action Segmentation Configuration
# Based on Shao et al., CVPR 2020
# Task: Annotate hierarchical gymnastic actions

annotation_task_name: "FineGym Action Segmentation"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "video_url"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - name: "action_segments"
    description: |
      Mark the temporal boundaries of each ELEMENT (distinct skill/move).
      An element is one complete gymnastic skill.
    annotation_type: "video_annotation"
    mode: "segment"
    labels:
      - name: "element"
        color: "#22C55E"
        key_value: "e"
      - name: "transition"
        color: "#94A3B8"
        key_value: "t"
    frame_stepping: true
    show_timecode: true
    playback_rate_control: true
    video_fps: 25

  - name: "event_type"
    description: "What gymnastic EVENT is this?"
    annotation_type: radio
    # FineGym covers women's artistic gymnastics only (4 events).
    labels:
      - "Floor Exercise (FX)"
      - "Vault (VT)"
      - "Uneven Bars (UB)"
      - "Balance Beam (BB)"

  - name: "element_group"
    description: "What GROUP does the current element belong to?"
    annotation_type: radio
    labels:
      - "Leap/Jump"
      - "Turn/Spin"
      - "Tumbling"
      - "Balance/Hold"
      - "Swing"
      - "Flight/Release"
      - "Mount"
      - "Dismount"
      - "Dance/Choreography"

  - name: "element_difficulty"
    description: "Estimated difficulty of the element:"
    annotation_type: radio
    labels:
      - "A - Basic"
      - "B - Intermediate"
      - "C - Advanced"
      - "D - Superior"
      - "E+ - Elite"
      - "Unsure"

  - name: "execution_quality"
    description: "How well was the element executed?"
    annotation_type: radio
    labels:
      - "Excellent - no visible errors"
      - "Good - minor deductions"
      - "Fair - noticeable errors"
      - "Poor - major errors"
      - "Fall/incomplete"

allow_all_users: true
instances_per_annotator: 40
annotation_per_instance: 2

annotation_instructions: |
  ## FineGym Action Segmentation

  Annotate gymnastic routines with hierarchical action labels.

  ### Hierarchy:
  1. **Event** - The apparatus (Floor, Beam, Bars, or Vault)
  2. **Set** - A group of related elements (e.g., leaps, turns, dismounts)
  3. **Element** - One distinct skill/move

  ### What is an element?
  - A single, complete gymnastic skill
  - Has clear start and end positions
  - Examples: back handspring, split leap, giant swing

  ### Element Groups (examples):
  - **Leap/Jump**: Split leap, straddle jump, tour jeté
  - **Turn/Spin**: Pirouette, wolf turn, fouetté
  - **Tumbling**: Handspring, salto, twist
  - **Balance/Hold**: Scale, handstand, planche
  - **Swing**: Giant, clear hip, stalder
  - **Flight/Release**: Tkatchev, Jaeger, Gienger

  ### Guidelines:
  - Mark each element separately
  - Label the movement between elements as `transition`
  - Use frame-stepping for precise boundaries
  - Gymnastics expertise helpful but not required

  ### Tips:
  - Elements start/end at defined positions
  - Watch for connection bonuses (elements linked together)
  - Slow motion helps identify complex skills
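
Because the config sets `video_fps: 25` and `annotation_per_instance: 2`, each clip ends up with two independent sets of segment boundaries. One common way to compare them is temporal intersection-over-union (IoU), sketched below in Python. The `(start_s, end_s)` tuple layout is an assumption made for this example and may not match the JSON that Potato actually writes, so adapt the loading step to the real output.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start_s, end_s); assumed layout, not Potato's schema

FPS = 25  # matches video_fps in the config above


def to_frame(seconds: float, fps: int = FPS) -> int:
    """Snap a timestamp to the nearest frame index."""
    return round(seconds * fps)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up boundaries for the same element from two annotators.
annotator_1 = (12.40, 13.16)
annotator_2 = (12.48, 13.20)

print(to_frame(annotator_1[0]), to_frame(annotator_1[1]))
print(round(temporal_iou(annotator_1, annotator_2), 3))
```

At 25 fps one frame spans 40 ms, so boundaries marked with frame stepping cannot be finer than that. A threshold on the IoU (the exact value is a project decision) can be used to route low-agreement segments to adjudication.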

Sample Data: sample-data.json

json

[
  {
    "id": "finegym_001",
    "video_url": "https://example.com/videos/gymnastics_floor.mp4",
    "event": "FX",
    "athlete": "Athlete A",
    "competition": "Sample Competition"
  },
  {
    "id": "finegym_002",
    "video_url": "https://example.com/videos/gymnastics_beam.mp4",
    "event": "BB",
    "athlete": "Athlete B",
    "competition": "Sample Competition"
  }
]
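
For more than a handful of clips, a data file in this shape can be generated rather than written by hand. The Python sketch below assigns sequential ids and checks the two fields the config depends on (`id` via `id_key`, `video_url` via `text_key`). The helper name `build_items` and the event-code check are illustrative assumptions, not part of Potato.

```python
import json
from typing import Dict, List

VALID_EVENTS = {"FX", "VT", "UB", "BB"}


def build_items(clips: List[Dict[str, str]], prefix: str = "finegym") -> List[Dict[str, str]]:
    """Assign sequential ids and check the fields the config relies on."""
    items = []
    for index, clip in enumerate(clips, start=1):
        if clip.get("event") not in VALID_EVENTS:
            raise ValueError(f"unknown event code: {clip.get('event')!r}")
        if not clip.get("video_url"):
            raise ValueError("every item needs a video_url")
        items.append({"id": f"{prefix}_{index:03d}", **clip})
    return items


clips = [
    {"video_url": "https://example.com/videos/gymnastics_floor.mp4", "event": "FX"},
    {"video_url": "https://example.com/videos/gymnastics_beam.mp4", "event": "BB"},
]
items = build_items(clips)
payload = json.dumps(items, indent=2)
print(payload)
```

Writing `payload` to `sample-data.json` next to `config.yaml` gives the file that `data_files` points to. Extra fields such as `athlete` or `competition` can be added to each clip dictionary and pass through unchanged.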

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/action-recognition/finegym-action-segments
potato start config.yaml

Dataset & paper

Shao et al., CVPR 2020

Citation (BibTeX)

bibtex

@inproceedings{shao2020finegym,
    title={FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding},
    author={Shao, Dian and Zhao, Yue and Dai, Bo and Lin, Dahua},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    pages={2613--2622},
    year={2020}
}

Details

Annotation Types

radio, video_annotation

Domain

Computer Vision, Sports Analytics, Activity Recognition

Use Cases

Action Segmentation, Hierarchical Classification, Sports Analysis

Related Designs

EPIC-KITCHENS Egocentric Action Annotation

Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.

radio, text

How2Sign Sign Language Multi-Tier Annotation

Multi-tier ELAN-style annotation of continuous American Sign Language videos. Annotators segment sign glosses, mark mouthing patterns, classify sign handedness, and provide English translations aligned to video timelines. Based on the How2Sign large-scale multimodal ASL dataset.

video_annotation, radio

MSAD Multi-Scenario Anomaly Detection

Video anomaly detection across multiple scenarios. Annotators watch surveillance-style videos and mark temporal segments containing anomalous events, classifying the anomaly type.