AMI Meeting Multi-Tier Annotation
Multi-tier ELAN-style annotation of multi-party meeting recordings. Annotators segment speaker turns, head gestures, and focus of attention on parallel timeline tiers, then classify dialogue acts and topic segments. Based on the AMI Meeting Corpus.
Configuration File: config.yaml
# AMI Meeting Multi-Tier Annotation Configuration
# Based on Carletta et al., MLMI 2005
# Paper: https://link.springer.com/chapter/10.1007/11677482_3
# Task: ELAN-style multi-tier annotation of multi-party meeting recordings
annotation_task_name: "AMI Meeting Multi-Tier Annotation"
task_dir: "."
# Data configuration
data_files:
- sample-data.json
item_properties:
id_key: "id"
text_key: "video_url"
# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
# Annotation schemes - ELAN-style parallel tiers aligned to the video timeline
annotation_schemes:
# Tier 1: Speaker turn segmentation
- name: "speaker_turn_tier"
description: |
Segment the meeting timeline by speaker turns. Mark who is speaking at
each point, including overlapping speech. Each segment should capture
a continuous turn by one speaker or an overlap between multiple speakers.
annotation_type: "video_annotation"
mode: "segment"
labels:
- name: "speaker-A"
color: "#3B82F6"
tooltip: "Participant A is the primary speaker"
- name: "speaker-B"
color: "#EF4444"
tooltip: "Participant B is the primary speaker"
- name: "speaker-C"
color: "#10B981"
tooltip: "Participant C is the primary speaker"
- name: "speaker-D"
color: "#F59E0B"
tooltip: "Participant D is the primary speaker"
- name: "overlap"
color: "#8B5CF6"
tooltip: "Two or more participants speaking simultaneously"
show_timecode: true
video_fps: 25
# Tier 2: Head gesture segmentation
- name: "head_gesture_tier"
description: |
Annotate visible head gestures of the currently active speaker or the
participant being focused on. Mark the type and duration of each
distinct head movement.
annotation_type: "video_annotation"
mode: "segment"
labels:
- name: "nod"
color: "#22C55E"
tooltip: "Vertical head nod (agreement, acknowledgment, backchannel)"
- name: "shake"
color: "#EF4444"
tooltip: "Horizontal head shake (disagreement, negation)"
- name: "tilt"
color: "#A855F7"
tooltip: "Lateral head tilt (thought, consideration, uncertainty)"
- name: "turn"
color: "#F97316"
tooltip: "Head turn toward a specific person or object"
- name: "neutral"
color: "#9CA3AF"
tooltip: "No notable head movement; neutral or still position"
show_timecode: true
video_fps: 25
# Tier 3: Focus of attention tracking
- name: "focus_of_attention_tier"
description: |
Track where the active speaker or focal participant is directing their
visual attention. Mark the target of gaze or visual focus at each
point in time.
annotation_type: "video_annotation"
mode: "segment"
labels:
- name: "whiteboard"
color: "#06B6D4"
tooltip: "Looking at the whiteboard"
- name: "slides"
color: "#84CC16"
tooltip: "Looking at the projected slides or screen"
- name: "speaker-A"
color: "#3B82F6"
tooltip: "Looking at participant A"
- name: "speaker-B"
color: "#EF4444"
tooltip: "Looking at participant B"
- name: "speaker-C"
color: "#10B981"
tooltip: "Looking at participant C"
- name: "speaker-D"
color: "#F59E0B"
tooltip: "Looking at participant D"
- name: "notes"
color: "#6366F1"
tooltip: "Looking down at personal notes or documents"
- name: "table"
color: "#78716C"
tooltip: "Looking at the table or objects on it"
show_timecode: true
video_fps: 25
# Tier 4: Dialogue act classification
- name: "dialogue_act"
description: "Classify the dialogue act type of the current speaker turn or utterance."
annotation_type: radio
labels:
- "inform"
- "suggest"
- "assess"
- "comment"
- "elicit-inform"
- "elicit-suggest"
- "elicit-assessment"
- "backchannel"
- "stall"
- "fragment"
keyboard_shortcuts:
inform: "1"
suggest: "2"
assess: "3"
comment: "4"
backchannel: "5"
# Tier 5: Topic segment classification
- name: "topic_segment"
description: "Classify the meeting topic or phase that this segment belongs to."
annotation_type: radio
labels:
- "opening"
- "agenda"
- "design-discussion"
- "budget"
- "action-items"
- "closing"
- "off-topic"
keyboard_shortcuts:
opening: "q"
agenda: "w"
design-discussion: "e"
budget: "r"
action-items: "t"
closing: "y"
off-topic: "u"
# HTML layout
html_layout: |
<div style="max-width: 900px; margin: 0 auto;">
<h3 style="margin-bottom: 8px;">AMI Meeting: Multi-Tier Meeting Annotation</h3>
<p style="color: #666; font-size: 14px; margin-bottom: 16px;">
Annotate multi-party meeting recordings across parallel tiers for speaker turns,
head gestures, visual attention, dialogue acts, and topic segments.
</p>
<div style="text-align: center; margin-bottom: 20px;">
<video controls width="720" style="max-width: 100%; border-radius: 8px; border: 1px solid #ddd;">
<source src="{{video_url}}" type="video/mp4">
Your browser does not support video playback.
</video>
</div>
<div style="background: #f8f9fa; padding: 12px; border-radius: 6px; margin-bottom: 16px; font-size: 13px;">
<strong>Multi-Tier Instructions:</strong> Annotate the meeting across five parallel tiers:
speaker turns, head gestures, focus of attention, dialogue acts, and topic segments.
Use the video controls to navigate frame by frame for precise boundary placement.
</div>
</div>
# User configuration
allow_all_users: true
# Task assignment
instances_per_annotator: 30
annotation_per_instance: 2
# Instructions
annotation_instructions: |
## AMI Meeting Multi-Tier Annotation
This task uses ELAN-style multi-tier annotation for multi-party meeting
recordings from the AMI Meeting Corpus.
### Tier 1: Speaker Turn Segmentation
- Segment the meeting timeline by who is speaking:
- **Speaker A/B/C/D**: The identified participant holds the floor
- **Overlap**: Two or more participants speaking simultaneously
- Mark clean turn boundaries at the start and end of each contribution
- Short backchannels (e.g., "mm-hmm") during another speaker's turn
should be marked as overlap if they are audible
### Tier 2: Head Gesture Annotation
- Mark visible head gestures of the focal participant:
- **Nod**: Vertical movement (agreement, acknowledgment)
- **Shake**: Horizontal movement (disagreement, negation)
- **Tilt**: Lateral tilt (consideration, uncertainty)
- **Turn**: Deliberate head turn toward a person or object
- **Neutral**: No notable head movement
### Tier 3: Focus of Attention
- Track the visual attention target of the focal participant:
- **Whiteboard/Slides**: Looking at shared visual resources
- **Speaker A-D**: Looking at a specific participant
- **Notes**: Looking at personal notes or documents
- **Table**: Looking at the table or objects on it
- Gaze shifts should be marked at the moment of transition
### Tier 4: Dialogue Act Classification
- Classify each speaker turn by its communicative function:
- **Inform**: Providing information or facts
- **Suggest**: Making a suggestion or proposal
- **Assess**: Evaluating or judging something
- **Comment**: Personal reaction or remark
- **Elicit-inform/suggest/assessment**: Requesting information, suggestions, or opinions
- **Backchannel**: Minimal response showing attention (mm-hmm, yeah, ok)
- **Stall**: Hesitation or time-buying (well, so, let me think)
- **Fragment**: Incomplete or abandoned utterance
### Tier 5: Topic Segment
- Classify the meeting phase or topic:
- **Opening**: Greetings and meeting start
- **Agenda**: Setting or reviewing the agenda
- **Design discussion**: Core design-related conversation
- **Budget**: Budget or resource-related discussion
- **Action items**: Assigning tasks and next steps
- **Closing**: Wrap-up and meeting end
- **Off-topic**: Social chat or tangential discussion
### Quality Notes
- Focus on one participant at a time for head gesture and attention tiers
- Speaker turns should leave no gaps on the timeline: extend each segment to the start of the next turn, so any silence falls at the end of the previous speaker's turn
- Dialogue acts apply per utterance within a turn, not per entire turn
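Each video tier in the config sets `show_timecode: true` and `video_fps: 25`, and the instructions ask annotators to place boundaries frame by frame. As a rough sketch of how a timeline position in seconds maps to an `HH:MM:SS:FF` timecode at that frame rate (the helper name and rounding choice are illustrative, not part of Potato itself):

```python
# Sketch: convert a timeline position in seconds to an HH:MM:SS:FF
# timecode at the configured frame rate (video_fps: 25 in this design).
# The function name and round-to-nearest-frame choice are assumptions.

def seconds_to_timecode(seconds: float, fps: int = 25) -> str:
    """Render a position in seconds as HH:MM:SS:FF."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps          # frame index within the second
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"

# duration_seconds of the first sample item below is 45.2
print(seconds_to_timecode(45.2))  # → 00:00:45:05
```

At 25 fps each frame spans 40 ms, so a segment boundary can be off by at most half a frame under this rounding.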
Sample Data: sample-data.json
[
{
"id": "ami_001",
"video_url": "https://example.com/videos/ami/IS1009a_segment_001.mp4",
"meeting_id": "IS1009a",
"meeting_type": "scenario",
"num_participants": 4,
"duration_seconds": 45.2
},
{
"id": "ami_002",
"video_url": "https://example.com/videos/ami/IS1009b_segment_001.mp4",
"meeting_id": "IS1009b",
"meeting_type": "scenario",
"num_participants": 4,
"duration_seconds": 38.7
}
]
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/video/ami-meeting-annotation
potato start config.yaml
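Before launching, it can help to sanity-check that every item in sample-data.json carries the keys named under `item_properties` in config.yaml (`id` and `video_url`). A minimal sketch, with the data inlined here in place of reading the file (the helper name is illustrative, not part of Potato):

```python
import json

# Keys named under item_properties in config.yaml.
REQUIRED_KEYS = ("id", "video_url")

def find_problems(items):
    """Describe every item that is missing, or leaves empty, a required key."""
    problems = []
    for i, item in enumerate(items):
        for key in REQUIRED_KEYS:
            if not item.get(key):
                problems.append(f"item {i}: missing or empty '{key}'")
    return problems

# Inline stand-in for json.load(open("sample-data.json")).
items = json.loads("""[
  {"id": "ami_001", "video_url": "https://example.com/videos/ami/IS1009a_segment_001.mp4"},
  {"id": "ami_002", "video_url": ""}
]""")
print(find_problems(items))  # → ["item 1: missing or empty 'video_url'"]
```

Running a check like this before `potato start` avoids items silently rendering with a blank video player.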
Found an issue or want to improve this design? Open an issue on the repository.
Related Designs
CHILDES Child Language Multi-Tier Annotation
Multi-tier ELAN-style annotation of child-adult interaction videos for language acquisition research. Annotators segment utterance boundaries on the timeline, provide morphological and syntactic annotations, and classify communicative context and error types. Based on the CHILDES/TalkBank project.
CMU-MOSEI Multimodal Sentiment Multi-Tier Annotation
Multi-tier ELAN-style annotation of multimodal sentiment and emotion in YouTube opinion videos. Annotators segment visual behaviors and acoustic events on parallel timeline tiers, classify emotions and sentiment polarity, and transcribe speech for the CMU-MOSEI dataset.
DGS Corpus Sign Language Multi-Tier Annotation
Multi-tier ELAN-style annotation of German Sign Language (DGS) corpus videos. Annotators segment sign types, mouth gestures, non-manual signals, classify discourse functions, and provide German translations across parallel tiers aligned to the video timeline.