Poster

Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

Haruka Kiyohara ⋅ Ren Kishimoto ⋅ Kosuke Kawakami ⋅ Ken Kobayashi ⋅ Kazuhide Nakata ⋅ Yuta Saito

2024 Poster

Project Page [ Slides] [ Poster] [ OpenReview]

Abstract

**Off-Policy Evaluation (OPE)** aims to assess the effectiveness of counterfactual policies using offline logged data and is frequently utilized to identify the top-$k$ promising policies for deployment in online A/B tests. Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting risk-return tradeoff and *efficiency* in subsequent online policy deployment. To address this issue, we draw inspiration from portfolio evaluation in finance and develop a new metric, called **SharpeRatio@k**, which measures the risk-return tradeoff and efficiency of policy portfolios formed by an OPE estimator under varying online evaluation budgets ($k$). We first demonstrate, in two example scenarios, that our proposed metric can clearly distinguish between conservative and high-stakes OPE estimators and reliably identify the most *efficient* estimator capable of forming superior portfolios of candidate policies that maximize return with minimal risk during online deployment, while existing evaluation metrics produce only degenerate results. To facilitate a quick, accurate, and consistent evaluation of OPE via SharpeRatio@k, we have also implemented the proposed metric in an open-source software. Using SharpeRatio@k and the software, we conduct a benchmark experiment of various OPE estimators regarding their risk-return tradeoff, presenting several future directions for OPE research.

Video

Chat is not available.