2 Evaluation Framework
In this section, we propose a human-centric framework for evaluating model explanation methods that is grounded in causal reasoning and the premise that explanations should align with an explainee’s objectives. An evaluation framework is necessary because the multiplicity of available methods makes it impractical to generate every possible explanation, effectively forcing practitioners (explainers) to select a subset of methods. Performing this selection in a principled way, however, requires the explainer to define criteria for comparing methods. Since there are innumerable valid criteria, it is helpful to establish a guiding principle from which criteria can be derived. One approach, leveraged by Miller (2018), is to start from the premise that explanations should mirror human explanations. Consequently, model explanations that more closely resemble human explanations are considered better, which implicitly treats human explanations as a gold standard. In our view, this is a valid but objectionable standard. Instead, we propose the principle that every explanation should be aligned with the purpose of the explanation as defined by the explainee. The implication is that explanations that are more closely aligned with the explainee’s objectives are better, or more correct, than explanations that are misaligned. In this way, we avoid the pitfalls of prioritizing human-like explanations while maintaining a human-centric approach.
To aid practitioners in generating human-centric explanations, we propose a four-step process closely modeled after the formulate-approximate-explain (FAE) framework of Merrick and Taly (2020). First, the explainer and explainee specify a set of target explanatory questions that the explanations should answer. Next, the explainer identifies an explanation-generating method aligned with each question. Third, the explainer generates the explanations and provides them to the explainee. In the final step, the explanations are interpreted and evaluated, and the cycle may be repeated.
Every explanation is an answer to a question, so the first step in generating correct model explanations is to specify the question that the explanation should address. In our view, these questions can be elicited from the explainee, but they should be governed by the explainee’s objectives. The explainer’s and explainee’s objectives may not always be aligned (Mittelstadt, Russell, and Wachter 2019), necessitating a choice over whose objectives to prioritize; in our framework, we prioritize the explainee’s. Since there are many potential explainees, each of whom may have different objectives, these questions must be formulated on a per-case basis. A second consideration that the explainer must keep in mind is whether the specified questions are answerable under the relevant constraints of the explanation-generating process. These constraints could involve the explainer’s knowledge of the available methods, the time available for generating explanations, the explainee’s level of domain expertise, the acceptability of the assumptions required to generate the explanation, and many other factors.
Every target explanatory question has an associated level and type. By level, we refer to the idea introduced earlier that explanations can either consider the model independently of the real world or attempt to account for real-world relationships; we address exactly what this means in subsequent sections. A target explanatory question is also one of three types: associative, interventional, or counterfactual. Together, these three types form the “ladder of causality” (Pearl 2009). Although explanation and causality may seem like separate topics, they are highly intertwined (Miller 2018). To make things concrete, consider a model used to predict the risk of default as part of a loan application. The following are all possible explanatory questions:
- How did race influence the model’s decision to approve the application?
- What if the applicant increases their income from \(X\) to \(X'\)?
- Would my loan application have been approved had my income been \(X'\) rather than \(X\)?
Each of these questions implies a different explainee with different objectives: a model auditor assessing the model for potential bias, a bank employee trying to understand the model’s behavior, and the applicant interested in what might have happened under different circumstances.
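To make the distinction between the three question types concrete, the following sketch (a toy example, not part of our framework) poses each question against a small structural causal model with a stand-in `risk_model`; all variable names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy structural causal model: education -> income, and both enter the risk model.
education = rng.normal(12, 2, n)                # years of schooling
income = 3.0 * education + rng.normal(0, 5, n)  # structural equation for income

def risk_model(income, education):
    """Stand-in for a trained default-risk model (higher output = riskier)."""
    return 1.0 / (1.0 + np.exp(-(3.0 - 0.05 * income - 0.1 * education)))

x, x_prime = 30.0, 45.0

# Associative: expected risk among applicants observed with income near x.
near_x = np.abs(income - x) < 1.0
associative = risk_model(income[near_x], education[near_x]).mean()

# Interventional: set income to x' for the whole population, i.e. do(income = x').
interventional = risk_model(np.full(n, x_prime), education).mean()

# Counterfactual: for one applicant observed with income near x, hold their other
# attributes fixed and ask what the prediction would have been at income x'.
i = int(np.argmin(np.abs(income - x)))
factual = risk_model(income[i], education[i])
counterfactual = risk_model(x_prime, education[i])

print(f"associative:    E[risk | income ~ {x}] = {associative:.3f}")
print(f"interventional: E[risk | do(income = {x_prime})] = {interventional:.3f}")
print(f"counterfactual: applicant {i} risk {factual:.3f} -> {counterfactual:.3f}")
```

The associative query only conditions on what was observed, the interventional query fixes income for the entire population, and the counterfactual query alters income for a specific applicant while holding their other attributes fixed.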
Specifying a target question may require multiple iterations, particularly to resolve any lingering ambiguity. For example, the intended level of explanation is not immediately clear from the wording of the questions above. We suggest that it is the explainer’s responsibility to resolve such ambiguity when possible. In situations where the explainer cannot interact directly with the explainee, the explainer may be forced to make assumptions about the explainee’s objectives.
Once the target questions have been specified, the next step is to select an explanation-generating method that addresses each question. To make this selection, the explainer must have a clear understanding of the specific types of questions that each method is capable of addressing. In practice, this step occurs in parallel with the first, as the explainer must keep the association between explanatory methods and the questions they address in mind in order to assess the feasibility of answering the explainee’s target questions. We provide additional details and specific recommendations about which methods are most appropriate in different contexts in our discussion section.
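One way to organize this selection is to maintain an explicit catalog that records, for each candidate method, the levels and question types it addresses and the assumptions it requires, and then filter that catalog against each target question. The sketch below is illustrative only; the catalog entries, their scopes, and their assumptions are hypothetical and are not the recommendations given in our discussion section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetQuestion:
    text: str
    level: str   # "model-only" or "real-world"
    qtype: str   # "associative", "interventional", or "counterfactual"

@dataclass(frozen=True)
class MethodEntry:
    name: str
    levels: frozenset        # levels the method claims to address
    qtypes: frozenset        # question types the method claims to address
    assumptions: tuple = ()  # assumptions the explainee must find acceptable

# Hypothetical catalog entries; a real catalog reflects the explainer's own
# assessment of each method's scope and required assumptions.
CATALOG = [
    MethodEntry("perturbation-based feature attribution",
                frozenset({"model-only"}), frozenset({"associative"}),
                ("perturbed inputs remain plausible",)),
    MethodEntry("counterfactual example search",
                frozenset({"model-only", "real-world"}), frozenset({"counterfactual"}),
                ("a meaningful distance metric over inputs",
                 "real-world constraints encoded as feasibility rules")),
]

def candidate_methods(question, catalog=CATALOG):
    """Return catalog entries whose declared scope covers the target question."""
    return [m for m in catalog
            if question.level in m.levels and question.qtype in m.qtypes]

q = TargetQuestion("Would the loan have been approved had income been X' rather than X?",
                   level="real-world", qtype="counterfactual")
for m in candidate_methods(q):
    print(m.name, "| assumptions:", "; ".join(m.assumptions))
```

Whether or not such a catalog is written down, the selection amounts to a lookup against the question’s level and type, followed by a check that the method’s assumptions are acceptable to the explainee.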
After a method has been identified, the explainer generates the explanation. The explainer must have a sufficiently deep understanding of the method to accurately assess whether the resulting explanation, when interpreted correctly, addresses the target question. There are a variety of ways in which the explanation and the target question may become misaligned. For example, many explanation-generating methods rely on sampling rather than exact computation; if the estimator is biased, the resulting explanations may not address the target question unless additional assumptions are met. As with the selection step, these considerations should be kept in mind when specifying target questions during the first step. Functionally, this means that the explainer may need to communicate, and assess the validity of, any additional assumptions required at this step while the target questions are being specified.
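As one illustration of how sampling can create such a misalignment, the sketch below estimates per-feature contributions for a hypothetical linear model using a Monte Carlo permutation scheme in which “absent” features are filled in from a background sample; the resulting attributions depend on the background distribution and the number of permutations, both of which are assumptions the explainer should surface when the target question is specified.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x):
    """Stand-in for a trained model; columns of x are [income, education, debt]."""
    return 2.0 * x[:, 0] + 1.0 * x[:, 1] - 3.0 * x[:, 2]

background = rng.normal(size=(500, 3))   # background sample used for "absent" features
x_explain = np.array([1.0, 2.0, 0.5])    # instance whose prediction we want to explain

def sampled_attributions(x, model, background, n_perm=200):
    """Monte Carlo permutation estimate of per-feature contributions."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        current = background[rng.integers(len(background))].copy()  # one background draw
        prev = model(current[None, :])[0]
        for j in order:
            current[j] = x[j]            # switch feature j to the explained instance
            new = model(current[None, :])[0]
            phi[j] += new - prev         # marginal contribution of feature j
            prev = new
    return phi / n_perm

estimate = sampled_attributions(x_explain, model, background)
# For this additive model the exact contributions are w_j * (x_j - mean(background_j));
# the gap between the estimate and the exact values shrinks as n_perm and the
# background sample grow, and both choices shape which question the answer addresses.
exact = np.array([2.0, 1.0, -3.0]) * (x_explain - background.mean(axis=0))
print("sampled:", np.round(estimate, 2))
print("exact:  ", np.round(exact, 2))
```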
The final step is for the explainer to provide the explanations to the explainee and, when possible, engage in a dialogue with the explainee about the explanations. This interactive approach is aligned with Miller (2018), who suggests that conversation between explainee and explainer is necessary because XAI is fundamentally a human-agent interaction problem.
Generating correct explanations requires the explainer to have a deep understanding of the available methods. In particular, the explainer must have a mental catalog of the relevant methods, the types of explanatory questions that each addresses, any required assumptions, and other relevant considerations. Causality is central to this effort because the types of questions that typically motivate the desire for model explanations span the rungs of the ladder of causality.