

Poster in Workshop: Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation

AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security

Zikui Cai · Shayan Shabihi · Bang An · Zora Che · Brian Bartoldson · Bhavya Kailkhura · Tom Goldstein · Furong Huang


Abstract: We introduce AegisLLM, an agentic security framework that conceptualizes LLM security as a dynamic, cooperative multi-agent defense. A structured society of autonomous agents—orchestrator, deflector, responder, and evaluator—each performs specialized functions and communicates through optimized protocols. Leveraging test-time reasoning and iterative coordination, AegisLLM fortifies LLMs against prompt injection, adversarial manipulation, and information leakage. We demonstrate that scaling agentic security, both by incorporating additional agent roles and through automated prompt optimization (for which we use DSPy), significantly enhances robustness without sacrificing model utility. Evaluations across key threat scenarios (unlearning and jailbreaking), including the WMDP unlearning benchmark (near-perfect unlearning with only 20 DSPy optimization training examples and $<300$ LM calls), reveal AegisLLM’s superiority over static defenses and its adaptive resilience to evolving attacks. Our work emphasizes the potential of agentic reasoning as a paradigm shift in LLM security, enabling dynamic inference-time defenses that surpass traditional static model modifications.
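The agentic defense loop the abstract describes — an orchestrator routing queries, a deflector refusing restricted ones, a responder answering benign ones, and an evaluator vetting outputs before release — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: all class names, the keyword-based routing, and the escalation logic are assumptions standing in for the LLM-backed agents and DSPy-optimized prompts the paper actually uses.

```python
class Orchestrator:
    """Routes each incoming query to a specialized agent (illustrative keyword routing)."""
    def __init__(self, blocked_topics):
        self.blocked_topics = [t.lower() for t in blocked_topics]

    def route(self, query):
        q = query.lower()
        return "deflect" if any(t in q for t in self.blocked_topics) else "respond"


class Deflector:
    """Refuses queries that touch restricted knowledge (e.g., unlearning targets)."""
    def handle(self, query):
        return "I'm sorry, I can't help with that."


class Responder:
    """Produces a candidate answer for benign queries (placeholder for an LLM call)."""
    def handle(self, query):
        return f"Answer: {query}"


class Evaluator:
    """Vets a candidate response; rejects it if restricted content leaks through."""
    def __init__(self, blocked_topics):
        self.blocked_topics = [t.lower() for t in blocked_topics]

    def approve(self, response):
        r = response.lower()
        return not any(t in r for t in self.blocked_topics)


def aegis_pipeline(query, blocked_topics, max_iters=2):
    """Iterative coordination: route, answer, evaluate; escalate to deflection on failure."""
    orchestrator = Orchestrator(blocked_topics)
    deflector, responder = Deflector(), Responder()
    evaluator = Evaluator(blocked_topics)

    route = orchestrator.route(query)
    for _ in range(max_iters):
        agent = deflector if route == "deflect" else responder
        answer = agent.handle(query)
        if evaluator.approve(answer):
            return answer
        route = "deflect"  # evaluator flagged leakage: retry via the deflector
    return deflector.handle(query)  # fail closed after max_iters
```

The point of the sketch is the division of labor: no single static filter decides the outcome; instead the evaluator can veto the responder at inference time, which is the test-time, multi-agent dynamic the abstract contrasts with static model modifications.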
