
MAST Failure Taxonomy

Annotate multi-agent system traces to identify failure modes from the MAST taxonomy, rate severity, pinpoint the first failure step, and describe the failure mechanism.


Configuration File: config.yaml

# MAST Failure Taxonomy
# Based on "Why Do Multi-Agent LLM Systems Fail?" (Cemri et al., arXiv 2025)
# Task: Classify failure modes in multi-agent LLM system traces using the MAST taxonomy

annotation_task_name: "MAST Failure Taxonomy"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

html_layout: |
  <div class="container" style="font-family: Arial, sans-serif; max-width: 1000px; margin: 0 auto;">
    <div style="background: #e8f4fd; padding: 14px; border-radius: 8px; margin-bottom: 14px;">
      <h3 style="margin: 0 0 8px 0; color: #1a5276;">Task</h3>
      <p style="margin: 0; font-size: 15px;">{{text}}</p>
      <span style="display: inline-block; margin-top: 8px; background: #8e44ad; color: #fff; padding: 3px 10px; border-radius: 12px; font-size: 12px;">{{agent_type}}</span>
    </div>
    <div style="background: #fafafa; border: 1px solid #ddd; padding: 14px; border-radius: 8px; margin-bottom: 14px;">
      <h3 style="margin: 0 0 10px 0; color: #2c3e50;">Multi-Agent Trace</h3>
      <div style="font-size: 14px; line-height: 1.8; white-space: pre-wrap;">{{trajectory}}</div>
    </div>
    <div style="background: #fdedec; border: 1px solid #e74c3c; padding: 14px; border-radius: 8px;">
      <h4 style="margin: 0 0 8px 0; color: #922b21;">Outcome</h4>
      <p style="margin: 0; font-size: 14px;">{{outcome}}</p>
    </div>
  </div>

annotation_schemes:
  - name: failure_modes
    description: "Select all failure modes from the MAST taxonomy that apply to this trace."
    annotation_type: multiselect
    labels:
      - "Specification Ambiguity"
      - "Task Decomposition Error"
      - "Resource Misallocation"
      - "Role Confusion"
      - "Miscommunication"
      - "Information Withholding"
      - "Conflicting Actions"
      - "Cascading Errors"
      - "Premature Termination"
      - "Infinite Loop"
      - "Hallucinated Action"
      - "Tool Misuse"
      - "Verification Failure"
      - "No Failure Detected"

  - name: severity
    description: "How severe is the overall failure in this trace?"
    annotation_type: radio
    labels:
      - "Critical"
      - "Major"
      - "Minor"
      - "None"
    keyboard_shortcuts:
      "Critical": "1"
      "Major": "2"
      "Minor": "3"
      "None": "4"

  - name: first_failure_step
    description: "At which step did the first failure occur?"
    annotation_type: radio
    labels:
      - "Step 1"
      - "Step 2"
      - "Step 3"
      - "Step 4"
      - "Step 5"
      - "Step 6"
      - "No Failure"

  - name: failure_description
    description: "Explain the failure: what went wrong, why, and how it affected the outcome."
    annotation_type: text

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
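
Before starting an annotation round, it can help to confirm that every data item supplies the fields the html_layout refers to. The sketch below is a standalone check, not part of Potato: it assumes each `{{name}}` placeholder in the layout should match a key in every item, and the helper name `missing_fields` and the `demo-incomplete` item are hypothetical.

```python
import re

# Matches placeholders such as {{text}} or {{ agent_type }} in a layout string.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def missing_fields(layout: str, items: list[dict], id_key: str = "id") -> dict[str, list[str]]:
    """Map each item id to the layout placeholders that the item does not provide."""
    required = sorted(set(PLACEHOLDER.findall(layout)))
    problems = {}
    for item in items:
        absent = [name for name in required if name not in item]
        if absent:
            problems[str(item.get(id_key, "<no id>"))] = absent
    return problems


# Abbreviated stand-ins for the layout and sample data; real use would load
# config.yaml and sample-data.json instead.
layout = "<p>{{text}}</p><span>{{agent_type}}</span><div>{{trajectory}}</div><p>{{outcome}}</p>"
items = [
    {"id": "mast-001", "text": "...", "agent_type": "CrewAI", "trajectory": "...", "outcome": "..."},
    {"id": "demo-incomplete", "text": "...", "trajectory": "..."},  # hypothetical broken item
]

print(missing_fields(layout, items))
```

An empty result means every item covers the layout. Any listed item would otherwise show an unfilled placeholder to annotators, under the assumption stated above.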

Sample Data: sample-data.json

[
  {
    "id": "mast-001",
    "text": "Write a Python web scraper that extracts product prices from three e-commerce sites and saves results to a CSV file.",
    "agent_type": "CrewAI",
    "trajectory": "[Planner Agent] Step 1: Decompose task into subtasks: (1) build scraper for site A, (2) build scraper for site B, (3) build scraper for site C, (4) merge results into CSV.\n\n[Coder Agent A] Step 2: Writing scraper for site A using BeautifulSoup... Code generated. Extracts prices from div.price-tag elements.\n\n[Coder Agent B] Step 3: Writing scraper for site B using Selenium... Code generated. Extracts prices from span.cost elements.\n\n[Coder Agent C] Step 4: Writing scraper for site C using BeautifulSoup... Code generated. Extracts prices from div.price-tag elements. (Note: Copied structure from Agent A but site C uses span.product-price)\n\n[Merger Agent] Step 5: Combining outputs into CSV... Agent A output: 45 prices. Agent B output: 38 prices. Agent C output: 0 prices (empty list).\n\n[Merger Agent] Step 6: CSV generated with data from sites A and B only. Reports task as complete.",
    "outcome": "PARTIAL FAILURE: CSV file created but missing all data from site C. Agent C copied incorrect selectors from Agent A without adapting to site C's HTML structure. Merger Agent did not flag the empty results."
  },
  {
    "id": "mast-002",
    "text": "Collaboratively write a research literature review on transformer architectures, covering attention mechanisms, efficiency improvements, and applications.",
    "agent_type": "AutoGen",
    "trajectory": "[Coordinator] Step 1: Assigning sections - Writer A: attention mechanisms, Writer B: efficiency improvements, Writer C: applications.\n\n[Writer A] Step 2: Drafting attention section... Covers self-attention, multi-head attention, cross-attention. 800 words generated.\n\n[Writer B] Step 3: Drafting efficiency section... Covers sparse attention, linear attention, FlashAttention. Also includes a paragraph on self-attention basics (duplicating Writer A's content).\n\n[Writer C] Step 4: Drafting applications section... Covers NLP, vision, and speech. References 'the efficient attention methods discussed above' but Writer B's section will appear after Writer C's in final document.\n\n[Editor Agent] Step 5: Merging sections in order: A, C, B. Does not notice the ordering creates a forward reference issue in section C.\n\n[Coordinator] Step 6: Final review - approves document. Total: 2,400 words.",
    "outcome": "FAILURE: Document has duplicated content between sections A and B, and section C references content that appears later in the document. Editor did not catch structural issues."
  }
]

// ... and 6 more items
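
The trajectory strings in the two items shown follow a regular pattern: blocks separated by blank lines, each starting with `[Agent Name] Step N:`. The sketch below splits a trajectory into steps and checks that none exceeds the six options offered by the first_failure_step scheme. It assumes the remaining items use the same pattern, and the helper names `split_steps` and `steps_fit_labels` are hypothetical.

```python
import re

# One trace block: "[Agent Name] Step N: free text".
STEP = re.compile(
    r"\[(?P<agent>[^\]]+)\]\s*Step\s+(?P<step>\d+):\s*(?P<action>.*)", re.DOTALL
)


def split_steps(trajectory: str) -> list[dict]:
    """Split a trajectory string into dicts with agent, step number, and action text."""
    steps = []
    for chunk in trajectory.split("\n\n"):
        match = STEP.match(chunk.strip())
        if match:
            steps.append(
                {
                    "agent": match["agent"],
                    "step": int(match["step"]),
                    "action": match["action"].strip(),
                }
            )
    return steps


def steps_fit_labels(steps: list[dict], max_step: int = 6) -> bool:
    """True if every step number can be selected with the labels "Step 1".."Step 6"."""
    return all(1 <= s["step"] <= max_step for s in steps)


# Shortened excerpt of the mast-001 trajectory.
trajectory = (
    "[Planner Agent] Step 1: Decompose task into subtasks.\n\n"
    "[Coder Agent A] Step 2: Writing scraper for site A using BeautifulSoup...\n\n"
    "[Merger Agent] Step 6: CSV generated with data from sites A and B only."
)

steps = split_steps(trajectory)
for s in steps:
    print(s["step"], s["agent"])
print("fits first_failure_step labels:", steps_fit_labels(steps))
```

A trace with more than six steps would fail this check, which signals that the first_failure_step labels in config.yaml need extending before that trace can be annotated.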

Get This Design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/agentic/mast-failure-taxonomy
potato start config.yaml

Details

Annotation Types

multiselect, radio, text

Domain

Agentic AI, Multi-Agent Systems, Failure Analysis

Use Cases

Failure Classification, System Debugging

Tags

multi-agent, failure-modes, taxonomy, debugging, llm-systems
