Diversified Multinomial Logit Contextual Bandits
Heesang Ann · Taehyun Hwang · Min-hwan Oh
Abstract
Conventional (contextual) MNL bandits model relevance-driven choice but ignore the potential benefit of within-assortment diversity, while submodular/combinatorial bandits encode diversity in rewards but lack structured choice probabilities. We bridge this gap with the *diversified multinomial logit* (DMNL) contextual bandit, which augments MNL choice probabilities with a general submodular diversity function, formalizing the relevance--diversity relationship within a single model. Embedding diversity renders exact MNL assortment optimization intractable. We develop a *white-box* UCB-based algorithm, $\texttt{OFU-DMNL}$, that builds assortments item-wise by maximizing optimistic marginal gains, avoids black-box oracles, and provides end-to-end guarantees. We show that $\texttt{OFU-DMNL}$ achieves a $(1-\tfrac{1}{e+1})$-*approximate* regret bound of $\tilde{O}\big(\tfrac{\sqrt{K}(d+1)}{K+1} \sqrt{T}\big)$, where $d$ is the context dimension, $K$ the maximum assortment size, and $T$ the horizon, and attains an improved approximation factor over standard submodular baselines. Experiments show consistent gains over baselines and, compared with exhaustive enumeration, comparable regret at substantially lower runtime. The DMNL bandit serves as a principled and practical basis for diversity-aware assortment optimization under uncertainty, and the proposed algorithm offers a statistically and computationally efficient solution.
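To make the item-wise construction concrete, the following is a minimal Python sketch of greedy assortment building with optimistic marginal gains. It is an illustration only, not the paper's exact $\texttt{OFU-DMNL}$ procedure: the function names (`greedy_optimistic_assortment`, `diversity_gain`), the UCB bonus form, and the additive combination of optimistic relevance and diversity gain are all assumptions made for exposition.

```python
import numpy as np

def greedy_optimistic_assortment(items, features, theta_hat, V_inv, beta,
                                 diversity_gain, K):
    """Item-wise greedy construction with optimistic marginal gains
    (a conceptual sketch; the exact gain formula is an assumption)."""
    S = []
    for _ in range(K):
        best_item, best_gain = None, -np.inf
        for i in items:
            if i in S:
                continue
            x = features[i]
            # Optimistic (UCB) relevance: point estimate plus an
            # exploration bonus from the inverse design matrix.
            ucb_utility = x @ theta_hat + beta * np.sqrt(x @ V_inv @ x)
            # Add the submodular diversity gain of item i w.r.t. the
            # currently selected set S.
            gain = ucb_utility + diversity_gain(i, S, features)
            if gain > best_gain:
                best_item, best_gain = i, gain
        S.append(best_item)
    return S

# Toy usage with a coverage-style diversity gain (illustrative assumption).
def diversity_gain(i, S, features):
    if not S:
        return 1.0
    sims = [features[i] @ features[j] for j in S]
    return max(0.0, 1.0 - max(sims))

rng = np.random.default_rng(0)
d, n, K = 5, 20, 4
features = rng.normal(size=(n, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)
theta_hat = rng.normal(size=d)
V_inv = np.linalg.inv(np.eye(d) + features.T @ features)
print(greedy_optimistic_assortment(range(n), features, theta_hat, V_inv,
                                   beta=0.5, diversity_gain=diversity_gain, K=K))
```

Because each item is chosen by its optimistic marginal gain with respect to the partial assortment, the construction runs in $O(nK)$ gain evaluations per round, avoiding the exponential cost of enumerating all size-$K$ assortments.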