How2Sign Sign Language Multi-Tier Annotation
Multi-tier ELAN-style annotation of continuous American Sign Language videos. Annotators segment sign glosses, mark mouthing patterns, classify sign handedness, and provide English translations aligned to video timelines. Based on the How2Sign large-scale multimodal ASL dataset.
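One way to picture the parallel tiers is as lists of labelled time spans that share a timeline. The sketch below is only an illustration, not the data model used by ELAN or by the annotation tool: the `Segment` class, the `snap_to_frame` and `overlap` helpers, and the example times are hypothetical, and the 30 fps rate mirrors the `video_fps` value in the configuration.

```python
from dataclasses import dataclass

FPS = 30  # assumed to match video_fps in config.yaml


@dataclass(frozen=True)
class Segment:
    """One labelled span on a tier (hypothetical structure, times in seconds)."""
    tier: str
    label: str
    start: float
    end: float


def snap_to_frame(seconds: float, fps: int = FPS) -> float:
    """Round a time in seconds to the nearest frame boundary."""
    return round(seconds * fps) / fps


def overlap(a: Segment, b: Segment) -> float:
    """Return the shared duration of two segments in seconds (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


# Made-up example: a fingerspelled item and the mouthing that accompanies it.
gloss = Segment("gloss_tier", "fingerspelling", 1.20, 1.95)
mouth = Segment("mouthing_tier", "full-mouth", 1.30, 2.10)

print(snap_to_frame(1.21))
print(round(overlap(gloss, mouth), 2))
```

Keeping each tier as its own list of segments lets a mouthing span start or end at a different time than the gloss it accompanies, which is the behavior the quality guidelines allow.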
Configuration File: config.yaml
# How2Sign Sign Language Multi-Tier Annotation Configuration
# Based on Duarte et al., CVPR 2021
# Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Duarte_How2Sign_A_Large-Scale_Multimodal_Dataset_for_Continuous_American_Sign_Language_CVPR_2021_paper.pdf
# Task: ELAN-style multi-tier annotation of continuous ASL signing
annotation_task_name: "How2Sign Sign Language Multi-Tier Annotation"
task_dir: "."
# Data configuration
data_files:
- sample-data.json
item_properties:
  id_key: "id"
  text_key: "video_url"
# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
# Annotation schemes - ELAN-style parallel tiers aligned to the video timeline
annotation_schemes:
# Tier 1: Gloss segmentation - mark individual sign boundaries and types
- name: "gloss_tier"
  description: |
    Segment the video into individual sign units and classify each sign type.
    Mark the start and end time of each sign, identifying whether it is a
    lexical sign, fingerspelling, classifier construction, pointing, or gesture.
  annotation_type: "video_annotation"
  mode: "segment"
  labels:
  - name: "lexical-sign"
    color: "#3B82F6"
    tooltip: "A standard ASL sign from the lexicon"
  - name: "fingerspelling"
    color: "#EF4444"
    tooltip: "Manual spelling of a word letter by letter"
  - name: "classifier"
    color: "#10B981"
    tooltip: "Classifier predicate showing shape, movement, or location"
  - name: "pointing"
    color: "#F59E0B"
    tooltip: "Indexical pointing to establish or reference a location"
  - name: "gesture"
    color: "#8B5CF6"
    tooltip: "Non-lexical communicative gesture (e.g., shrug, head tilt)"
  show_timecode: true
  video_fps: 30
# Tier 2: Mouthing patterns - mark mouth activity aligned to signs
- name: "mouthing_tier"
  description: |
    Annotate the mouthing patterns visible during signing. Mark segments
    where the signer produces full English-derived mouthings, reduced
    mouthings, or no mouthing at all.
  annotation_type: "video_annotation"
  mode: "segment"
  labels:
  - name: "full-mouth"
    color: "#06B6D4"
    tooltip: "Full English word mouthing visible on lips"
  - name: "reduced-mouth"
    color: "#84CC16"
    tooltip: "Partial or reduced mouthing of an English word"
  - name: "no-mouthing"
    color: "#9CA3AF"
    tooltip: "No visible mouthing component"
  show_timecode: true
  video_fps: 30
# Tier 3: Sign handedness classification
- name: "sign_handedness"
  description: "Classify the dominant handedness pattern for the current sign segment."
  annotation_type: radio
  labels:
  - "one-handed"
  - "two-handed"
  - "body-anchored"
  keyboard_shortcuts:
    one-handed: "1"
    two-handed: "2"
    body-anchored: "3"
# Tier 4: English translation (free text)
- name: "english_translation"
  description: "Provide a fluent English translation of the signed content in this clip."
  annotation_type: text
  textarea: true
# Tier 5: ASL gloss notation (free text)
- name: "gloss_text"
  description: |
    Write the ASL gloss notation for the signed content using standard
    conventions (e.g., UPPERCASE for glosses, fs- prefix for fingerspelling,
    IX for pointing, CL: for classifiers).
  annotation_type: text
  textarea: true
# HTML layout
html_layout: |
  <div style="max-width: 900px; margin: 0 auto;">
    <h3 style="margin-bottom: 8px;">How2Sign: Multi-Tier ASL Annotation</h3>
    <p style="color: #666; font-size: 14px; margin-bottom: 16px;">
      Annotate the signing video across multiple parallel tiers, similar to ELAN annotation.
    </p>
    <div style="text-align: center; margin-bottom: 20px;">
      <video controls width="720" style="max-width: 100%; border-radius: 8px; border: 1px solid #ddd;">
        <source src="{{video_url}}" type="video/mp4">
        Your browser does not support video playback.
      </video>
    </div>
    <div style="background: #f8f9fa; padding: 12px; border-radius: 6px; margin-bottom: 16px; font-size: 13px;">
      <strong>Multi-Tier Instructions:</strong> Use the timeline-aligned tiers below to annotate
      sign glosses, mouthing patterns, handedness, and provide translations. Each tier captures
      a different linguistic dimension of the signing.
    </div>
  </div>
# User configuration
allow_all_users: true
# Task assignment
instances_per_annotator: 30
annotation_per_instance: 2
# Instructions
annotation_instructions: |
  ## How2Sign Multi-Tier ASL Annotation
  This task uses an ELAN-style multi-tier approach to annotate continuous
  American Sign Language videos from the How2Sign dataset.
  ### Tier 1: Gloss Segmentation
  - Segment the video into individual sign units on the timeline
  - For each segment, select the sign type:
    - **Lexical sign**: A standard ASL sign from the lexicon
    - **Fingerspelling**: Manual spelling of a word (e.g., names, technical terms)
    - **Classifier**: Classifier predicates showing shape, size, movement, or location
    - **Pointing**: Indexical pointing to establish referents in signing space
    - **Gesture**: Non-lexical communicative gestures (shrugs, head tilts, etc.)
  ### Tier 2: Mouthing Patterns
  - In parallel, mark the mouthing behavior visible on the signer's face
  - **Full mouthing**: Clear English-derived word shape on the lips
  - **Reduced mouthing**: Partial or abbreviated lip movement
  - **No mouthing**: No visible mouth component for that segment
  ### Tier 3: Sign Handedness
  - For each gloss segment, classify the handedness:
    - **One-handed**: Produced with the dominant hand only
    - **Two-handed**: Requires both hands
    - **Body-anchored**: Contact with the body is a key part of the sign
  ### Tier 4: English Translation
  - Provide a natural, fluent English translation of the entire clip
  ### Tier 5: ASL Gloss Notation
  - Write the gloss sequence using standard ASL transcription conventions:
    - UPPERCASE for sign glosses (e.g., HOUSE, WANT)
    - fs- prefix for fingerspelling (e.g., fs-JOHN)
    - IX for indexical pointing
    - CL: for classifiers (e.g., CL:3-vehicle-move)
  ### Quality Guidelines
  - Align tier boundaries precisely to sign onset and offset
  - When unsure of a sign boundary, use slow-motion playback
  - Mouthing segments should align with gloss segments but need not match their boundaries exactly
  - Translation should be natural English, not word-for-word gloss
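The Tier 5 conventions lend themselves to a light consistency check against the Tier 1 labels. The following is a rough sketch under stated assumptions: `classify_token` and `classify_gloss` are hypothetical helpers, and the regular expressions are only one possible reading of the conventions (for instance, allowing a suffix after `IX`, as in `IX-3`, is an assumption). The instructions give no gloss convention for gestures, so unrecognized tokens come back as `None`.

```python
import re
from typing import List, Optional, Tuple

# Each pattern maps a Tier 5 convention to a Tier 1 label name.
# Prefixed forms are tested before the plain-uppercase fallback.
_PATTERNS = [
    ("fingerspelling", re.compile(r"^fs-[A-Za-z]+$")),   # e.g. fs-JOHN
    ("classifier", re.compile(r"^CL:\S+$")),             # e.g. CL:3-vehicle-move
    ("pointing", re.compile(r"^IX(-\S+)?$")),            # IX, optionally suffixed (assumption)
    ("lexical-sign", re.compile(r"^[A-Z][A-Z0-9-]*$")),  # e.g. HOUSE, WANT
]


def classify_token(token: str) -> Optional[str]:
    """Return the Tier 1 label a gloss token suggests, or None if unrecognized."""
    for label, pattern in _PATTERNS:
        if pattern.match(token):
            return label
    return None


def classify_gloss(gloss_text: str) -> List[Tuple[str, Optional[str]]]:
    """Split a whitespace-separated gloss string and classify each token."""
    return [(token, classify_token(token)) for token in gloss_text.split()]


for token, label in classify_gloss("IX fs-JOHN WANT HOUSE CL:3-vehicle-move shrug"):
    print(f"{token:20} {label}")
```

A check like this could flag clips where the gloss text contains a fingerspelled token but the gloss tier has no `fingerspelling` segment. It would not replace review by a fluent signer.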
Sample Data: sample-data.json
[
  {
    "id": "how2sign_001",
    "video_url": "https://example.com/videos/how2sign/cooking_tutorial_001.mp4",
    "signer_id": "signer_03",
    "topic": "cooking tutorial - making pasta",
    "duration_seconds": 12.4
  },
  {
    "id": "how2sign_002",
    "video_url": "https://example.com/videos/how2sign/travel_guide_001.mp4",
    "signer_id": "signer_07",
    "topic": "travel guide - visiting national parks",
    "duration_seconds": 15.8
  }
]
// ... and 6 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/how2sign-sign-language
potato start config.yaml
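Before starting the server, it can help to confirm that every item carries the fields named under `item_properties`. This is a minimal sketch, not part of Potato itself: `check_items` is a hypothetical helper, and the inline records are trimmed copies of the two sample items shown above.

```python
import json
from typing import Any, Dict, List

ID_KEY = "id"           # item_properties.id_key in config.yaml
TEXT_KEY = "video_url"  # item_properties.text_key in config.yaml


def check_items(items: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        item_id = item.get(ID_KEY)
        if not item_id:
            problems.append(f"item {index}: missing '{ID_KEY}'")
        elif item_id in seen_ids:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen_ids.add(item_id)
        if not item.get(TEXT_KEY):
            problems.append(f"item {index}: missing '{TEXT_KEY}'")
    return problems


# Trimmed copies of the sample records, inlined so the sketch runs on its own.
sample = json.loads("""
[
  {"id": "how2sign_001",
   "video_url": "https://example.com/videos/how2sign/cooking_tutorial_001.mp4"},
  {"id": "how2sign_002",
   "video_url": "https://example.com/videos/how2sign/travel_guide_001.mp4"}
]
""")
print(check_items(sample))  # → []
```

To check the real file, the inline string could be replaced with `json.load` on an open handle to `sample-data.json`.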
Related Designs
EPIC-KITCHENS Egocentric Action Annotation
Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.
DGS Corpus Sign Language Multi-Tier Annotation
Multi-tier ELAN-style annotation of German Sign Language (DGS) corpus videos. Annotators segment sign types, mouth gestures, non-manual signals, classify discourse functions, and provide German translations across parallel tiers aligned to the video timeline.
HowTo100M Instructional Video Annotation
Annotate instructional video clips with step descriptions and visual grounding. Link narrated instructions to visual actions for video-language understanding.