Poster in Workshop: Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation

Why Do Multiagent Systems Fail?

Melissa Pan ⋅ Mert Cemri ⋅ Lakshya A Agrawal ⋅ Shuyi Yang ⋅ Bhavya Chopra ⋅ Rishabh Tiwari ⋅ Kurt Keutzer ⋅ Aditya Parameswaran ⋅ Kannan Ramchandran ⋅ Dan Klein ⋅ Joseph E Gonzalez ⋅ Matei Zaharia ⋅ Ion Stoica

Project Page [OpenReview]

Abstract

Despite growing enthusiasm for Multi-Agent Systems (MAS), where multiple LLM agents collaborate to accomplish tasks, their performance gains across popular benchmarks remain minimal compared to single-agent frameworks. This gap highlights the need to analyze the challenges hindering MAS effectiveness. In this paper, we conduct the first comprehensive study of MAS challenges, covering 5 popular MAS frameworks and over 150 tasks. Four expert human annotators studied the MAS execution traces and identified 18 fine-grained failure modes, from which we propose a comprehensive failure taxonomy applicable across systems. We group these fine-grained failure modes into four key categories: (i) specification ambiguities and misalignment, (ii) organizational breakdowns, (iii) inter-agent conflict and coordination gaps, and (iv) weak verification and quality control. To understand whether these failure modes could have been easily avoided, we propose two interventions: improved specification of agent roles and improved orchestration strategies. We find that the identified failures require more involved solutions, and we outline a roadmap for future research in this space. To support better development of MAS, we will open-source our dataset, including the agent conversation traces and human annotations.
