Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models
Fernando Ruiz-Mazo · Vikas Garg
Abstract
Deploying ML systems with only a few real samples makes operational metrics (such as alert rates or mean scores) highly unstable. Existing uncertainty quantification (UQ) methods fail here: frequentist intervals ignore the deployed predictive rule, Bayesian posteriors assume continual refitting, and conformal methods offer per-example rather than long-run guarantees. We introduce a forecast-first UQ framework that blends the empirical distribution with a frozen pretrained generator using the unique Dirichlet schedule that ensures time-consistent forecasts. Uncertainty is quantified via martingale posteriors: a lightweight, likelihood-free resampling scheme that simulates future forecasts under the deployed rule, yielding sharp, well-calibrated intervals for both current and long-run metrics without retraining or density evaluation. A single hyperparameter, set by a small-$n$ minimax criterion, balances sampling variance against model--data mismatch; for bounded scores, we provide finite-time drift guarantees. We also show how this framework informs optimal retraining decisions. Applicable off-the-shelf to frozen generators (flows, diffusion, autoregressive models, GANs) and linear metrics (means, tails, NLL), it outperforms bootstrap baselines on vision and language benchmarks (WikiText-2, CIFAR-10, and SVHN); e.g., it achieves $\sim$90\% coverage on GPT-2 with 20 samples vs.\ 37\% for the bootstrap. Importantly, our uncertainty estimates are operational under the deployed forecasting rule, agnostic of population parameters, affording practicable estimators for deployment in real-world settings.
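To make the resampling idea concrete, below is a minimal illustrative sketch, not the paper's implementation: it assumes a scalar score metric (the mean), a user-supplied callable `generator_sample` standing in for scores drawn from the frozen generator, and a Dirichlet-process-style mixing weight `alpha` playing the role of the single hyperparameter; the function names, defaults, and the specific Pólya-urn-style update are hypothetical simplifications of the described scheme.

```python
import numpy as np

def predictive_sample(scores, generator_sample, alpha, rng):
    """Draw one score from the blended predictive: with probability
    n/(n+alpha) resample an already-observed score, otherwise draw a
    fresh score from the frozen generator (Dirichlet-style mixing)."""
    n = len(scores)
    if rng.random() < n / (n + alpha):
        return scores[rng.integers(n)]
    return generator_sample(rng)

def martingale_posterior_interval(scores, generator_sample, alpha,
                                  horizon=500, n_paths=200, level=0.9,
                                  seed=0):
    """Simulate forward paths under the deployed forecasting rule and
    return an interval for the long-run mean score. Likelihood-free:
    it only needs samples from the generator, never density values."""
    rng = np.random.default_rng(seed)
    finals = np.empty(n_paths)
    for p in range(n_paths):
        path = list(scores)
        for _ in range(horizon):
            path.append(predictive_sample(np.asarray(path),
                                          generator_sample, alpha, rng))
        finals[p] = np.mean(path)  # plug-in metric under the imputed future
    lo, hi = np.quantile(finals, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi
```

As a toy usage, `martingale_posterior_interval(observed_scores, lambda rng: rng.normal(), alpha=5.0)` would return a 90\% interval for the long-run mean when the frozen generator's scores are approximated by a standard normal; in practice `generator_sample` would score fresh draws from the pretrained model.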