1  Introduction

Machine learning models are increasingly used to replace or augment human judgment. The decisions these models support have important ramifications, especially in high-stakes medical and financial settings. As a result, there is growing demand for model explanations that address why a particular prediction was made. While some models are inherently interpretable, others are black boxes that do not lend themselves easily to explanation. A model may be a black box either because it is too complicated to understand or because access to the model is limited; for example, a model is functionally a black box if only its predictions are made available for the purposes of explanation. The ubiquitous use of such black-box models to make decisions has fueled the development of the field of explainable AI (XAI), which encompasses a broad set of methods for explaining a model’s decisions.

Methods derived from the Shapley value are among the most common ways to generate post-hoc explanations of black-box models. The Shapley value is a solution concept from cooperative game theory that quantifies how to fairly distribute the payout received by a group among its players based on each player’s contribution (Shapley 1953). Most Shapley methods treat a model’s features as the players in the game and some aspect of the model, such as its prediction, loss, or variance explained, as the payout. To compute the Shapley value, the explainer specifies a value function, which indicates the payout (e.g., the model’s prediction) received for every possible subset of features. Since most machine learning models cannot handle missing inputs, the value function is responsible for substituting a value for any feature that is not part of the subset; in essence, it must simulate what happens when features are removed from the model (see Janzing, Minorics, and Blöbaum (2019), Merrick and Taly (2020), and Covert (2020)). Numerous options have been proposed: train a separate model for each subset of features, use a fixed value (e.g., zero), use a value from a reference data point, or use the expected value over a collection of reference points. Each approach yields different results, all of which are, strictly speaking, Shapley values, and the choice of value function also shapes how the resulting values should be interpreted. These differences are not merely quantitative; they take on real-world significance when the values are used to explain a model’s decision. For example, Shapley explanations have been proposed as a tool for providing recourse recommendations to subjects of automated decisions, for ensuring algorithmic fairness, and for other applications. However, different Shapley methods are likely to produce explanations that support different conclusions. It is therefore vitally important for practitioners to make an informed decision about which method is appropriate for a given situation.
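To make the role of the value function concrete, the following sketch states the standard Shapley attribution formula together with two of the commonly used value functions mentioned above; the notation (a model f, an instance x being explained, the full feature set N, and a feature subset S) is introduced here purely for illustration and does not prescribe any particular method.

\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
\]

Here a marginal value function fills in the removed features with draws from a reference distribution,
\[
v(S) \;=\; \mathbb{E}_{X_{\bar{S}}}\!\bigl[f(x_S, X_{\bar{S}})\bigr],
\]
while a conditional value function instead conditions on the observed features,
\[
v(S) \;=\; \mathbb{E}\bigl[f(X) \mid X_S = x_S\bigr].
\]
Both choices produce legitimate Shapley values, yet they answer different questions about the model, which is precisely the ambiguity discussed in the remainder of this work.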

Shapley-based methods are used to provide explanations at different levels. Model-level explanations treat the model as an input-output system, while world-level explanations account for relationships that exist in the world. Failure to clearly distinguish between these levels of explanation has fueled two ongoing debates in the literature: which value function is appropriate, and whether features that indirectly influence the model should have non-zero Shapley values. Recent work resolves these debates by suggesting that the correct choice is context-dependent (Chen et al. 2020; Covert 2020). While we fundamentally agree, we arrive at this conclusion in a more principled way by considering these debates from a causal perspective that is grounded in a framework for evaluating model explanations.

In this work, we propose a human-centric framework for evaluating model explanations and review existing Shapley methods in light of this framework. Our primary goal is to equip practitioners with the information necessary to make informed decisions when generating model explanations. Selecting an appropriate method is challenging because it requires evaluating the quality of an explanation. Much of the work in XAI either ignores this problem entirely or leverages an implicit operational definition of correctness. In the following section, we present our framework, which assesses the quality of an explanation based on the premise that correct explanations are ones that align with an explainee’s objectives. Next, we introduce the formalisms used in causal inference, since causal reasoning is essential to evaluating explanations. We then provide a brief introduction to Shapley values and how they are used to generate model explanations. With this context in place, we review existing Shapley methods from a causal perspective and point out the key considerations that practitioners should keep in mind under our proposed evaluation framework. Drawing on the preceding sections, we then provide practical guidance for selecting an appropriate explanation method. In the final section, we take a step back and consider how a causal perspective changes our understanding of the Shapley explanation literature and its implications for future work on Shapley-based model explanations.