TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
Weida Li · Yaoliang Yu · Bryan Kian Hsiang Low
Abstract
We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality of feature rankings is typically assessed with the insertion and deletion metrics. Empirically, we observe that co-optimizing these two metrics is closely related to a joint optimization that selects a subset of features to maximize the local predicted value while minimizing it for the complement. However, we theoretically show that probabilistic values are generally unreliable for solving this joint optimization. Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves. Building upon TreeGrad, we introduce TreeGrad-Ranker, which aggregates the gradients while optimizing the joint objective to produce feature rankings, and TreeGrad-Shap, a parallelizable and numerically stable alternative to Linear TreeShap for computing the Shapley value. In practice, we show that the numerical error of Linear TreeShap can be up to $10^{15}$ times larger than that of TreeGrad-Shap. We also develop TreeProb, which generalizes Linear TreeShap to support all probabilistic values for use as baselines. Empirically, our TreeGrad-Ranker performs significantly better than these baselines on both the insertion and deletion metrics.
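To make the abstract's central object concrete: writing the joint objective as a set function $v$ over the $n$ features (the symbols $v$, $n$, and $\tilde v$ below are our illustrative notation, not the paper's), the multilinear extension and the gradient that TreeGrad evaluates take the standard form
$$
\tilde v(x) \;=\; \sum_{S \subseteq [n]} v(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j),
\qquad
\frac{\partial \tilde v}{\partial x_i}(x) \;=\; \mathbb{E}_{S \sim x_{-i}}\!\big[\, v(S \cup \{i\}) - v(S) \,\big],
$$
where $S \sim x_{-i}$ draws each feature $j \neq i$ into $S$ independently with probability $x_j$. The standard identities $\phi_i^{\mathrm{Shap}} = \int_0^1 \frac{\partial \tilde v}{\partial x_i}(t, \dots, t)\, dt$ and $\phi_i^{\mathrm{Banzhaf}} = \frac{\partial \tilde v}{\partial x_i}\!\big(\tfrac{1}{2}, \dots, \tfrac{1}{2}\big)$ indicate why an $O(L)$-time gradient oracle also yields the Shapley and Banzhaf values (as in TreeGrad-Shap); how the tree structure achieves the $O(L)$ cost is the paper's contribution and is not sketched here.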