Appendix
Formulate, Approximate, Explain
Merrick and Taly (2020) introduced the Formulate, Approximate, Explain (FAE) framework, which is built on the idea that the value function (i.e., how feature removal is simulated) must be chosen explicitly with a specific contrastive explanation in mind. They provide a critique, similar to that of Sundararajan and Najmi (2020), of methods (IME and SHAP) that define the value function using an observational conditional expectation. In particular, they give a motivating example showing how these methods can produce attributions that violate both the dummy and symmetry Shapley value axioms.
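To make the dummy-axiom issue concrete, the sketch below uses a toy construction of our own (not the specific example from Merrick and Taly (2020)): two perfectly correlated binary features and a model that reads only the second one. Under the observational conditional value function, the unused feature receives nonzero credit, whereas a marginal value function assigns it zero.

```python
from itertools import combinations
from math import factorial

# Toy distribution (our own, for illustration): two binary features that are
# perfectly correlated, P(X1 = X2 = 1) = p, and a model that reads only X2,
# so X1 is a dummy feature for the model.
p = 0.7
SUPPORT = [((1, 1), p), ((0, 0), 1 - p)]  # (value, probability) pairs

def model(x1, x2):
    return x2

def v_conditional(S, x):
    """Observational conditional value function v(S) = E[f(X) | X_S = x_S]."""
    if not S:
        return sum(prob * model(*z) for z, prob in SUPPORT)
    # Perfect correlation: conditioning on either feature pins down both.
    return model(*x)

def v_marginal(S, x):
    """Marginal value function: out-of-coalition features are drawn from the
    marginal distribution, breaking the dependence between X1 and X2."""
    total = 0.0
    for z, prob in SUPPORT:
        mixed = tuple(x[i] if i in S else z[i] for i in range(len(x)))
        total += prob * model(*mixed)
    return total

def shapley(v, x):
    """Exact Shapley values by enumerating all coalitions (fine for 2 features)."""
    d = len(x)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += w * (v(set(S) | {i}, x) - v(set(S), x))
    return phi

x = (1, 1)
print(shapley(v_conditional, x))  # approximately [0.15, 0.15]: the dummy X1 gets credit
print(shapley(v_marginal, x))     # approximately [0.0, 0.3]: the dummy axiom is respected
```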
The first step (formulate) in their framework is to pose a contrastive question, which they argue leads to a reference input, or a distribution over references, specific to the use case. They show how different Shapley-based methods can be unified by viewing them as different choices of the reference distribution used to simulate feature removal. Their notion of a reference distribution is roughly analogous to RBShap. They also introduce the idea of single-reference games, which simulate feature absence by replacing values with those of a specific reference input (identical to BShap). In the second step (approximate), the Shapley value is computed by sampling references from the reference distribution and then approximating the average Shapley value for each feature over the resulting set of single-reference games. This approach allows them to compute confidence intervals (assuming the approximation method is unbiased), which quantify the uncertainty in the Shapley value estimates as part of the final step (explain).
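The following sketch illustrates the approximate step under simplifying assumptions of our own: each sampled single-reference game is solved by exact coalition enumeration (feasible only for a handful of features), and the confidence interval is a normal approximation over the sampled references. The function names, the linear model, and the Gaussian reference sampler are illustrative choices, not taken from Merrick and Taly (2020).

```python
import numpy as np
from itertools import combinations
from math import factorial

def single_reference_shapley(f, x, r):
    """Exact Shapley values of the single-reference game
    v(S) = f(x with features outside S replaced by the reference r)."""
    d = len(x)
    phi = np.zeros(d)

    def v(S):
        mask = np.isin(np.arange(d), list(S))
        return f(np.where(mask, x, r))

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

def fae_attributions(f, x, sample_reference, n_refs=200, seed=0):
    """Average single-reference Shapley values over references drawn from the
    reference distribution, with a normal-approximation 95% interval per feature."""
    rng = np.random.default_rng(seed)
    draws = np.array([single_reference_shapley(f, x, sample_reference(rng))
                      for _ in range(n_refs)])
    mean = draws.mean(axis=0)
    half = 1.96 * draws.std(axis=0, ddof=1) / np.sqrt(n_refs)
    return mean, mean - half, mean + half

# Hypothetical usage: a linear model and a standard-normal reference distribution.
f = lambda z: 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]
x = np.array([1.0, -2.0, 0.5])
mean, lo, hi = fae_attributions(f, x, lambda rng: rng.normal(size=3))
print(np.round(mean, 3), np.round(lo, 3), np.round(hi, 3))
```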
Causal Reasoning and Model Fairness
Kilbertus et al. (2018) were the first to make the case that causal reasoning is required to assess model fairness.[1] In the Shapley value literature, Heskes et al. (2020) demonstrate how different causal structures, even when they yield the same observational distribution, can lead to different Shapley values. While this is true in general, they also provide a concrete example that directly addresses the indirect influence debate. They compare different Shapley-based methods across four causal building blocks (chain, fork, confounder, and cycle) in a scenario with two features \(X_1\) and \(X_2\) and a model that depends only on \(X_2\) (as in our earlier example), demonstrating that many Shapley-based methods do not yield attributions with a proper world-causal explanation. For example, marginal SHAP methods are only able to estimate direct effects and therefore yield attributions without a proper world-causal explanation when the underlying structure is a chain (see Figure 3.1 (a)). Observational conditional SHAP methods, on the other hand, account for the dependence between features, but cannot account for the fact that intervening on the same variable in a fork and in a confounder has different distributional implications, and so may also produce attributions that lack a proper world-causal interpretation. In our view, the marginal SHAP attributions have a proper model-causal interpretation even if interpreting them in a world-causal way is improper. Therefore, distinguishing between model and world causality is necessary in order to resolve the indirect influence debate through causal reasoning.
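To illustrate the point that identical observational distributions can still yield different causally grounded Shapley values, the toy example below (our own construction in the spirit of Heskes et al. (2020), not their exact setup) evaluates an interventional value function \(v(S) = E[f(X) \mid do(X_S = x_S)]\) under two causal structures that induce the same joint distribution: a chain \(X_1 \rightarrow X_2\) and its reversal \(X_2 \rightarrow X_1\). The model reads only \(X_2\).

```python
# Toy structural setup (ours, for illustration): two binary features with the
# same joint distribution P(X1, X2) under two different causal structures.
#   chain:   X1 -> X2 with P(X1=1) = 0.5 and P(X2=1 | X1) = 0.8 or 0.2
#   reverse: X2 -> X1 with the conditionals flipped, giving the identical joint
# The model reads only X2, and features are indexed 0 (X1) and 1 (X2).
def model(x1, x2):
    return x2

def v_do_chain(S, x):
    """E[f(X) | do(X_S = x_S)] when X1 causes X2 (chain)."""
    if 1 in S:                      # intervening on X2 fixes the model output
        return model(x[0], x[1])
    if 0 in S:                      # do(X1) propagates downstream to X2
        p_x2 = 0.8 if x[0] == 1 else 0.2
        return p_x2 * model(x[0], 1) + (1 - p_x2) * model(x[0], 0)
    return 0.5                      # E[X2] under the observational distribution

def v_do_reverse(S, x):
    """E[f(X) | do(X_S = x_S)] when X2 causes X1: do(X1) does not move X2."""
    if 1 in S:
        return model(x[0], x[1])
    return 0.5                      # X2 stays at its marginal, E[X2] = 0.5

def shapley_2(v, x):
    """Closed-form Shapley values for a two-player game."""
    empty, s1, s2, full = v(set(), x), v({0}, x), v({1}, x), v({0, 1}, x)
    phi1 = 0.5 * (s1 - empty) + 0.5 * (full - s2)
    phi2 = 0.5 * (s2 - empty) + 0.5 * (full - s1)
    return phi1, phi2

x = (1, 1)
print(shapley_2(v_do_chain, x))    # (0.15, 0.35): X1 is credited for its indirect effect
print(shapley_2(v_do_reverse, x))  # (0.0, 0.5): same joint distribution, different attributions
```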
[1] More specifically, they are interested in unresolved discrimination.