AI models need to be ‘interpretable’ rather than just ‘explainable’

Two types of black-box AI

Like many things involving artificial intelligence, there’s a bit of confusion surrounding the black-box problem. Rudin differentiates between two types of black-box AI systems: functions that are too complicated for any human to comprehend, and functions that are proprietary.

The first kind of black-box AI includes deep neural networks, the architecture used in deep learning algorithms. DNNs are composed of layers upon layers of interconnected variables that become tuned as the network is trained on numerous examples. As neural networks grow larger and larger, it becomes virtually impossible to trace how their millions (and sometimes, billions) of parameters combine to make decisions. Even when AI engineers have access to those parameters, they won’t be able to precisely deconstruct the decisions of the neural network.

The second type of black-box AI, the proprietary algorithm, refers to companies that hide the details of their AI systems for various reasons, such as protecting intellectual property or preventing bad actors from gaming the system. In this case, the people who created the AI system might have knowledge of its inner logic, but the people who use it don’t. We interact with all kinds of black-box AI systems every day, including Google Search’s ranking algorithm, Amazon’s recommendation system, Facebook’s Newsfeed, and more. But the more dangerous ones are those that are being used to hand out prison sentences, determine credit scores, and make treatment decisions in hospitals.

While a large part of Rudin’s paper addresses the dangers of neural network black boxes, she also discusses the implications of walled-garden systems that keep their details to themselves.

Explainability vs interpretability

We need to get one more thing out of the way before we dive deeper into the discussion. Most mainstream media outlets covering AI research use the terms “explainable AI” and “interpretable AI” interchangeably. But there’s a fundamental difference between the two.

Interpretable AI refers to algorithms that give a clear explanation of their decision-making processes. Many machine learning algorithms are interpretable. For instance, linear regression models associate a coefficient with each feature of their input data, and decision trees spell out the sequence of rules that leads to each output. You can clearly trace the path that your input data takes when it goes through the AI model.
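To make the idea of tracing a decision concrete, here is a minimal sketch of a linear model whose prediction can be broken down feature by feature. The feature names, coefficients, and intercept are hypothetical values invented for illustration; they do not come from Rudin’s paper or from any real model.

```python
# Hypothetical coefficients for an illustrative house-price model
# (price in thousands). All numbers are invented for this example.
COEFFICIENTS = {"rooms": 20.0, "age_decades": -5.0, "distance_km": -2.0}
INTERCEPT = 50.0


def predict(features):
    """Return the linear prediction: intercept plus coefficient * value per feature."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())


def contributions(features):
    """Return how much each feature adds to (or subtracts from) the prediction."""
    return {name: COEFFICIENTS[name] * value for name, value in features.items()}


house = {"rooms": 3.0, "age_decades": 2.0, "distance_km": 4.0}
price = predict(house)        # 50 + 60 - 10 - 8 = 92.0
parts = contributions(house)  # rooms: 60.0, age_decades: -10.0, distance_km: -8.0
```

Because the prediction is just the intercept plus the listed contributions, anyone reading the output can see exactly which input pushed the result up or down, and by how much.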

In contrast, explainable AI refers to tools that are applied to algorithms that don’t provide a clear explanation of their decisions. Researchers, developers, and users rely on these auxiliary tools and techniques to make sense of the logic used in black-box AI models. For instance, in deep learning-based image classifiers, researchers develop models that create saliency maps that highlight the pixels in the input image that contributed to the AI’s output.

But the explanation model does not necessarily provide a breakdown of the inner logic of the AI algorithm it investigates. “Explanation here refers to an understanding of how a model works, as opposed to an explanation of how the world works,” Rudin writes in her paper.

“Recent work on the explainability of black boxes—rather than the interpretability of models—contains and perpetuates critical misconceptions that have generally gone unnoticed, but that can have a lasting negative impact on the widespread use of ML models in society,” Rudin warns.

The myth of AI’s accuracy-interpretability tradeoff

A popular belief in the AI community is that there’s a tradeoff between accuracy and interpretability: At the expense of being uninterpretable, black-box AI systems such as deep neural networks provide flexibility and accuracy that other types of machine learning algorithms lack.

But this really depends on the problem domain, the kind of data available, and the desired results. “When considering problems that have structured data with meaningful features, there is often no significant difference in performance between more complex classifiers and much simpler classifiers after preprocessing,” Rudin notes.

In her paper, Rudin also observes that in some cases, the interpretability provided by a simpler machine learning model is more valuable than the marginal performance gained from applying a black-box AI system. “In those cases, the accuracy/interpretability trade-off is reversed—more interpretability leads to better overall accuracy, not worse,” she writes.

This is especially true in critical domains such as medicine, where physicians need to know the logic behind an AI-made decision and apply their own insights and opinion to it.

Part of the problem stems from a culture that has pervaded the AI community in the wake of the rise in popularity of deep learning. Many researchers are gravitating toward the “bigger is better” approach, in which there’s hope that bigger deep learning models with more layers and parameters and trained on larger data sets will result in breakthroughs in artificial intelligence. This has led to the widespread application of deep learning in domains where interpretable AI techniques can provide equally accurate results.

“The belief that there is always a trade-off between accuracy and interpretability has led many researchers to forgo the attempt to produce an interpretable model. This problem is compounded by the fact that researchers are now trained in deep learning, but not in interpretable ML,” Rudin writes.

The problem with AI explainability techniques

Explainability methods usually measure how changes to an AI system’s inputs modify its output without peeking inside it. For instance, in the case of an image classifier, researchers make small changes to pixel values and observe how those changes affect the class the AI detects. Based on these observations, they provide a heat map that shows which pixels (or features, in machine learning jargon) are more relevant to the AI.
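The perturb-and-observe procedure can be sketched in a few lines. This is an occlusion-style variant, in which each pixel is replaced with a baseline value rather than nudged slightly, and the `black_box` function is a hypothetical stand-in for a real classifier, written only so the example runs on its own.

```python
def black_box(image):
    """Toy stand-in for an opaque classifier: returns a score for one class.

    Hypothetical: it responds only to the 2x2 block at the centre of a 4x4 image.
    """
    return sum(image[r][c] for r in (1, 2) for c in (1, 2)) / 4.0


def saliency_map(model, image, baseline=0.0):
    """Occlude one pixel at a time and record how much the score changes."""
    original = model(image)
    heat = []
    for r, row in enumerate(image):
        heat_row = []
        for c, _ in enumerate(row):
            perturbed = [list(line) for line in image]  # copy so the input is untouched
            perturbed[r][c] = baseline
            heat_row.append(abs(original - model(perturbed)))
        heat.append(heat_row)
    return heat


image = [[1.0] * 4 for _ in range(4)]
heat = saliency_map(black_box, image)
# The four centre pixels each get a heat of 0.25; every other pixel gets 0.0.
```

Note that `saliency_map` only ever calls the model; it never looks at how the score is computed. The heat map reports where the output is sensitive, not what the model does with those pixels.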

In her paper, Rudin argues that explainability methods do not necessarily provide insights into how the black-box AI model works.

“Explanation models do not always attempt to mimic the calculations made by the original model,” Rudin writes. “Rather than producing explanations that are faithful to the original model, they show trends in how predictions are related to the features.”

This can lead to erroneous conclusions about black-box AI systems and explainability methods. For instance, an investigation into a black-box recidivism AI system found that the software was racially biased. But the method the researchers used to explain the AI’s decisions was a linear model that depended on race while the recidivism system in question was a complicated, nonlinear AI system. While the investigation did shed light on the need for transparency in AI systems that make critical decisions, it did not provide an accurate explanation of how the targeted system worked. For all we know, there might have been many more problematic correlations in the AI that the investigation did not unearth.
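The gap between an explanation model and the system it is meant to explain can be seen in a toy case. The sketch below fits a straight line to a deliberately nonlinear `black_box` function; both the function and the sample points are hypothetical and have nothing to do with the recidivism system mentioned above.

```python
def black_box(x):
    """Toy nonlinear model (hypothetical): the output depends entirely on x."""
    return x * x


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [black_box(x) for x in xs]
slope, intercept = fit_line(xs, ys)  # slope 0.0, intercept 2.0

# Largest gap between the black box and its linear "explanation".
worst_error = max(abs(black_box(x) - (slope * x + intercept)) for x in xs)  # 2.0
```

The fitted slope is zero, so the linear explanation suggests that `x` has no effect on the output, even though the output is determined by `x` alone. The surrogate captures an overall trend, not the calculation the original model performs.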

The problems of AI explanation techniques are also visible in saliency maps for computer vision systems. Most of these techniques will highlight which parts of an image led an image classifier to output a label. But the saliency map for one label does not provide enough information about how the AI system is using the data.

For instance, in one example image, the saliency maps for the “Siberian husky” and “transverse flute” labels are oddly similar. This shows that while the classifier is focusing on the right part of the husky photo, there’s no evidence that it is detecting the right features.

Rudin warns that this kind of practice can mislead users into thinking the explanation is useful. “Poor explanations can make it very hard to troubleshoot a black box,” she writes.

Finally, Rudin notes that explainability techniques not only fail to solve the problem of investigating overly complicated black-box AI, they also exacerbate it by giving us two systems to troubleshoot: the original AI model and the explainability tool.

Corporate greed and black-box AI

There are many cases where companies hide the details of their AI systems for commercial reasons, such as keeping the edge over their competitors. But the problem with this business model is that while it maximizes the profit of the company developing the AI system, it does nothing to minimize the harm and damage it does to the end-user, such as a prisoner getting an excessively long sentence or a needy person being refused their loan.

“There is a conflict of responsibility in the use of black-box models for high-stakes decisions: the companies that profit from these models are not necessarily responsible for the quality of individual predictions,” Rudin writes.

This trend is especially worrying in areas such as banking, health care, and criminal justice. There’s already a body of work and research on algorithmic bias and AI systems that discriminate against certain demographics. But when the algorithms are kept behind walled gardens and only accessible to their developers, there’s little opportunity for an impartial investigation into their inner workings, and most researchers must rely on flawed black-box explanation methods that map inputs to outputs.

Another argument that tech companies often make to defend black-box AI systems is that secrecy prevents malicious actors from reverse-engineering and gaming their algorithms. Rudin also refutes this argument. “The reason a system may be gamed is because it most likely was not designed properly in the first place,” she writes, adding that transparency could in fact help improve a system by revealing its flaws.

This is an approach that is being embraced in other fields of software engineering. An example is security, where open source and transparency are increasingly replacing the “security by obscurity” culture, in which companies hope that hiding the details of their software will keep it secure.

There’s no reason for the AI community not to support the same approach.

Encouraging interpretable AI development

While black-box AI systems often cost a fortune to develop and train, they are usually more accessible than the domain expertise and talent required to develop interpretable AI. This is why many companies opt to use deep learning systems that are trained on large datasets instead of putting effort into creating interpretable systems.

But, Rudin notes, “for high-stakes decisions, analyst time and computational time are less expensive than the cost of having a flawed or overly complicated model.” Companies that have experienced the backlash of their black-box AI systems making unexpected, disastrous decisions can attest to that.

To encourage the development of more interpretable AI systems, Rudin proposes regulation that prevents companies from deploying black-box models where an interpretable model can solve the same problem.

“The onus would then fall on organizations to produce black-box models only when no transparent model exists for the same task,” Rudin writes.

An alternative is to require organizations that introduce black-box models to report the accuracy of interpretable modeling methods. “In that case, one could more easily determine whether the accuracy/interpretability trade-off claimed by the organization is worthwhile,” Rudin writes.

In her paper, Rudin lays out technical details on some of the pathways that can improve the accuracy and development of interpretable AI models in different domains.

A very interesting example is deep learning systems that can provide explanations of their decisions in terms of high-level features instead of pixel-by-pixel heat maps.

“If this commentary can shift the focus even slightly from the basic assumption underlying most work in explainable ML—which is that a black box is necessary for accurate predictions—we will have considered this document a success,” she writes. “If this document can encourage policymakers not to accept black box models without significant attempts at interpretable (rather than explainable) models, that would be even better.”

This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech and what we need to look out for. You can read the original article here.

Story by Ben Dickson

Ben Dickson is the founder of TechTalks. He writes regularly about business, technology and politics. Follow him on Twitter and Facebook.
