Agentic Collaboration as an Information Bottleneck Problem
Shizhe He · Avanika Narayan · Ishan Khare · Christopher Ré · Scott Linderman · Dan Biderman
Abstract
Agentic language model (LM) systems have rapidly become central to modern workflows, powering applications like "Deep Research" and "Claude Code." As contexts grow beyond what even the largest frontier models can process effectively, multi-LM architectures have emerged to overcome context limitations. Beneath their apparent diversity lies a recurring pattern: smaller "compressor" LMs distill raw context into compact text that is then consumed by larger "predictor" LMs that interact with the user. Despite their popularity, the design of compressor-predictor systems remains largely ad hoc, and little guidance exists on how compressor and predictor choices shape downstream performance. In practice, attributing gains to compression versus prediction requires exhaustive pairwise sweeps, which are costly and task-specific. We argue that these agentic system design questions are, at root, information-theoretic. Viewing the compressor LM as a "noisy channel," we introduce a simple estimator of the mutual information between the context and its compression, which quantifies compression quality in a task-independent way. We show that mutual information strongly predicts downstream performance, independent of any specific task. Using this framework, we perform a comprehensive empirical analysis across five datasets and three model families. Results reveal that larger compressors are not only more accurate but also more token-efficient, conveying more bits of information per token: a 7B Qwen-2.5 compressor, for instance, is $1.6\times$ more accurate, $4.6\times$ more concise, and conveys $5.5\times$ more bits of mutual information per token than its smaller counterparts. Across the datasets studied, scaling compressors is substantially more effective than scaling predictors, allowing large on-device compressors to be paired with smaller cloud predictors. When applied to a Deep Research system, these principles enable local compressors as small as 3B parameters to recover 99% of frontier-LM accuracy at 26% of the API cost.
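To make the mutual-information idea concrete, below is a minimal sketch of one way such an estimator could be computed, assuming a likelihood-ratio form $I(X;Z) \approx \mathbb{E}[\log p(Z \mid X) - \log p(Z)]$ scored by an off-the-shelf causal LM. The scoring model, function names, and the estimator form itself are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Sketch: estimate mutual information between a context X and its
# compression Z as the average log-likelihood gain the context provides,
# log p(z|x) - log p(z), under a single scoring LM. Hypothetical setup,
# not the paper's exact estimator.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B"  # any causal LM usable as a scorer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def sequence_logprob(prefix: str, target: str) -> float:
    """Sum of log p(target tokens | prefix and earlier target tokens)."""
    # A fixed start token keeps the marginal case (empty prefix) well defined.
    start_id = (tokenizer.bos_token_id
                if tokenizer.bos_token_id is not None
                else tokenizer.eos_token_id)
    prefix_ids = tokenizer(prefix, add_special_tokens=False).input_ids
    target_ids = tokenizer(target, add_special_tokens=False).input_ids
    ids = torch.tensor([[start_id] + prefix_ids + target_ids])
    logprobs = torch.log_softmax(model(ids).logits, dim=-1)
    start = 1 + len(prefix_ids)  # index of the first target token
    # Logits at position t-1 predict the token at position t.
    token_lp = logprobs[0, start - 1:-1].gather(1, ids[0, start:].unsqueeze(-1))
    return token_lp.sum().item()

def mi_estimate(pairs: list[tuple[str, str]]) -> float:
    """Average log p(z|x) - log p(z) over (context, compression) pairs."""
    gains = [sequence_logprob(x, z) - sequence_logprob("", z) for x, z in pairs]
    return sum(gains) / len(gains)
```

Scoring both the conditional and marginal likelihoods with the same LM keeps tokenizer and model biases shared between the two terms; dividing the estimate by the number of compression tokens (and by $\ln 2$ to convert nats to bits) would yield the bits-per-token quantity the abstract reports.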