Interpretability is a valuable tool for understanding the relationships a supervised learning model has learned. It gives us insight into how the model works and what factors it considers when making predictions.
What is interpretability?
Interpretability is a concept that is gaining traction in the field of Artificial Intelligence (AI). It refers to the degree to which human beings can understand the cause of a decision made by an AI system. This definition, proposed by Miller (2017), is closely related to explainability, which was popularized in the conceptualization of Explainable Artificial Intelligence (XAI) systems (Turek, 2016).
Kim et al. (2016) proposed another definition for interpretability: the degree to which a machine’s output can be consistently predicted. This definition alludes to the ability of an AI system to explain why it made a particular decision or prediction for a given input or range of inputs. In other words, interpretability is about understanding how an AI system works and why it makes certain decisions. This understanding is essential for building trust in AI systems and ensuring their responsible use.
What makes a model interpretable?
When assessing the interpretability of a model, there are several factors to consider. Firstly, the model’s complexity should be considered; a linear regression with five features is much more interpretable than one with 100 features. Secondly, it’s important to look at how transparent the logic behind the model is; for example, a decision tree used in marketing decisions in investment management should have clear and transparent logic that all stakeholders can easily understand. Finally, resources such as NIST’s reference white paper on Explainable AI can provide further insight into different types of interpretability and help guide decision-makers in choosing an appropriate model for their needs.
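The complexity point can be made concrete with a few lines of code. In a small linear regression, each learned coefficient reads directly as an effect size, which is what makes the model interpretable. This is a minimal sketch with synthetic data; the true coefficients (3.0, -2.0, 0.5) are chosen arbitrarily for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 200 samples, 5 features, a known linear relationship
# plus small noise. The coefficients are made up for illustration.
rng = np.random.default_rng(42)
X = rng.random((200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 3] + rng.normal(0, 0.1, 200)

model = LinearRegression().fit(X, y)
for i, coef in enumerate(model.coef_):
    # Sign and magnitude of each coefficient are directly readable:
    # feature 0 pushes predictions up, feature 1 pushes them down, etc.
    print(f"feature {i}: {coef:+.2f}")
```

With five features this printout is the whole story of the model; with 100 features (or a deep network), no such direct reading exists.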
Why is interpretability important?
Interpretability of models is becoming increasingly important in the modern world, especially for decisions with wide-ranging implications for individuals. A clear example comes from medical imaging: oncologists use computer vision models to assess the presence of cancerous tumors in a patient’s scan and to determine the next course of treatment. These decisions have wide-ranging physical and psychological implications for the patient, who wants clarity about the diagnosis.
When the doctor relies on a sophisticated computer vision model, state-of-the-art classification performance on multiple datasets is neither a sufficient nor a sensible justification to offer individuals facing a potentially fatal illness. The patient and their loved ones must reorient their lives around the diagnosis, and they deserve an understandable account of how it was reached.
How is interpretability limited?
Interpretability has its limitations. Supervised learning is correlational and does not guarantee true cause-and-effect relationships between variables. This can lead to spurious correlations between unrelated attributes that happen to move together by chance.
For example, one could argue that there is a strong correlation between the number of films Nicolas Cage has appeared in each year and the number of people who drowned by falling into a pool in the United States, simply because appearances and drownings rise and fall along the same timeline.
While this may seem like an absurd connection, it serves as an important reminder of the limitations of interpretability and why we must always be cautious when relying on it to understand our models.
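The phenomenon is easy to reproduce. The yearly counts below are made up purely to illustrate the point, not taken from the actual Cage/drownings data; any two series that happen to trend together will show a high correlation coefficient despite having no causal link:

```python
import numpy as np

# Illustrative (made-up) yearly counts that happen to share a trend,
# in the spirit of the Nicolas Cage / pool-drownings example.
films = np.array([1, 2, 2, 3, 4, 3, 2, 1])
drownings = np.array([95, 102, 103, 110, 118, 112, 104, 98])

# Pearson correlation between the two unrelated series.
r = np.corrcoef(films, drownings)[0, 1]
print(f"Pearson correlation: {r:.2f}")  # high, despite no causal link
```

A model trained on such data would happily "explain" drownings in terms of film appearances, which is exactly why interpretability output must be read as correlation, not causation.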
In addition to spurious correlations, interpretability also fails to capture complex nonlinear relationships between variables, which can be difficult or impossible to explain with simple linear models. This means that even if we have a good understanding of how our model works, there may still be hidden patterns or relationships that we cannot detect using interpretability techniques alone. As such, interpretability should not be seen as a replacement for more rigorous testing.
Goals Of Interpretability
There are five broad goals for interpretability suggested by Robert Kirk, Tomáš Gavenčiak, and Stanislav Böhm. None is specific to a particular domain, and there is overlap between the goals. Together, however, all five help evaluate interpretability methods.
The goals are:
Predicting behaviour – This goal is about predicting a model’s behaviour in novel scenarios.
To check whether this goal is achieved, you can test whether users can construct inputs that generate a particular behaviour, rather than predicting the behaviour from a given input. However, this test must be done without query access to the model (otherwise, one could simply optimize inputs for that behaviour).
Assurance of properties – This goal relates to Predicting behaviour, but instead of specific behavioural predictions, it offers broader assurances described as properties of the model.
The most robust version would be formal verification. However, even without an exact proof, visualizations or explanations of behaviour that imply the model has a certain property may be acceptable.
Persuasion of properties – This is similar to Assurance of properties, except that the goal here is to persuade a person rather than to represent the model’s properties truthfully.
Experts sometimes rely on interpretability methods to show a non-specialist, such as an auditor or a company manager, that the model has specific characteristics. However, this approach can be dangerous, since the incentive is persuasion rather than an accurate display of the model’s behaviour and properties. For instance, it can produce compelling but incorrect visualizations, even ones built from data unrelated to the model’s parameters.
Improving model performance – Many methods are designed to give us an understanding we can use to improve the model’s performance. Once interpretability reveals drawbacks such as inefficiency or an imbalanced training set, these can be addressed to improve the model’s output.
As an example, saliency maps (such as those in Visualizing and Understanding Atari Agents and Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization) can show us which parts of the input the model relies on when making a prediction.
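The core idea behind gradient-based saliency can be sketched far more simply than Grad-CAM. For a toy linear scorer f(x) = w · x, the gradient of the score with respect to each input pixel is just the corresponding weight, so |w| reshaped to the image grid serves as the saliency map. Real methods backpropagate through a deep network instead; everything below (grid size, random weights) is an assumption for illustration:

```python
import numpy as np

# Toy linear "classifier" on an 8x8 single-channel image.
# For f(x) = w . x, the per-pixel gradient df/dx_i is just w_i,
# so |w| is the saliency map. Deep models need backprop for this step.
rng = np.random.default_rng(0)
h, w_px = 8, 8
weights = rng.normal(size=h * w_px)   # stand-in for learned parameters
image = rng.random(h * w_px)          # flattened dummy image

score = weights @ image                       # class score
saliency = np.abs(weights).reshape(h, w_px)   # |df/dx| per pixel
print(saliency.shape)  # (8, 8)
```

Pixels with large saliency values are the ones the model's score is most sensitive to, which is what the map visualizes in the deep-learning case.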
Debugging the model – Similar to Improving model performance, especially in deep learning, this goal focuses on finding and fixing incorrect model behaviour. The distinction from Improving model performance is one of correctness rather than quality: we might say a model is implemented and executed correctly if its implementation is fully faithful to the conception in your head, or to some pseudocode, mathematical formula, or design document.
It must be noted that many interpretability techniques do not focus solely on one of the objectives listed above but instead build a broad, generic understanding of the model. Feature visualization techniques, such as the Activation Atlas, are examples of such generic understanding methods. The Activation Atlas paper, for instance, shows how this understanding can be used to produce adversarial examples by overlapping natural images, which falls under Predicting behaviour.
How is interpretability different from model explainability?
Interpretability concerns how accurately a machine learning model can associate a cause with an effect. Explainability concerns the ability of the parameters, often hidden in deep networks, to justify the results.
What is the trade-off between accuracy and interpretability?
A study found that even though opaque methods tend to yield higher accuracies than transparent ones, one can still get relatively accurate predictions by choosing an interpretable model – with only a minimal cost in terms of performance.
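This trade-off is easy to probe informally. The sketch below is illustrative and is not the study referenced above: it compares a transparent model (logistic regression, whose coefficients can be inspected) against an opaque one (a random forest) on a small benchmark dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Transparent vs. opaque model on the same held-out split.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

transparent = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
opaque = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

print(f"Logistic regression accuracy: {transparent.score(X_te, y_te):.3f}")
print(f"Random forest accuracy:       {opaque.score(X_te, y_te):.3f}")
```

On datasets like this, the interpretable model typically lands within a few points of the opaque one, which is the "minimal cost in performance" the text describes; the gap can of course be larger on harder problems.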