A History of Interpretable Machine Learning
September 5, 2024 • David Bau
This course arrives in late 2024, as interest in interpretable machine learning is growing. Interest is being driven by remarkable gains in the capabilities of large-scale generative AI, especially large transformer language models such as ChatGPT and text-to-image diffusion models such as DALL-E. The question of the moment is "how do these huge generative neural networks work?"
Yet the puzzle of interpretability is inherent to all machine learning, and the problem has challenged computer scientists for many years. In this chapter, we take a moment to trace the hundred years of intellectual debate about interpretable machine learning that have preceded the current generative moment in AI.
From Lovelace to Turing to Minsky and Wolpert
Deeply ingrained in our conception of computer science is the belief that programs reflect the insights and understanding of the programmer. This notion was famously articulated by Ada Lovelace who created programs for the Babbage analytical engine.
The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. - Ada Lovelace
Yet Alan Turing disagreed that computer science would forever be tethered to the boundaries of human knowledge; he argued that it would be possible to teach machines how to learn automatically. In his 1950 essay on machine intelligence, Turing pointed out that machine learning would immediately lead to the challenge of human interpretability:
An important feature of a learning machine is that its teacher will often be very largely ignorant of quite what is going on inside, although he may still be able to some extent to predict his pupil's behavior. - Alan Turing (1950, p. 458)
While Turing was fascinated by this inherent human ignorance about the internals of machine-learned systems, that same blindness was one reason for the slow uptake of machine learning methods in the early decades of computer science.
No Free Lunch and the Role of the Programmer
For those who might believe that removing human insight from machine learning is a good thing, it is important to absorb the lesson of the No-Free-Lunch (NFL) theorem of Wolpert and Macready:
All learning methods perform equally well (and equally badly) when averaged over all possible problems. - David Wolpert
In other words, learning algorithms only succeed because they are tuned for particular kinds of problems; those same learning methods are guaranteed to fail on just as many other problems.
A visual constructive proof of the NFL theorem can be found
at an old anonymous blog entry, here.
The NFL theorem means that machine learning provides no refuge from the responsibility of the human programmer. There is no such thing as a system that can "learn everything." Humans are stuck making decisions about what kinds of biases we want our ML systems to have, and what kinds of problems we need our ML systems to solve.
Human Responsibility is Unavoidable
The irreducible nature of the responsibility of the programmer famously led Marvin Minsky to steer AI funding in the 1970s away from machine learning methods such as Rosenblatt's neural networks, towards "good old fashioned" symbolic AI exemplified by the LISP community. Minsky believed it was important to put human intention, logic, and interpretability front and center in the development of artificial intelligence.
Minsky is famous for the following parable about his advice to a young Jerry Sussman:
In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
"What are you doing?", asked Minsky.
"I am training a randomly wired neural net to play Tic-Tac-Toe" Sussman replied.
"Why is the net wired randomly?", asked Minsky.
"I do not want it to have any preconceptions of how to play”, Sussman said.
Minsky then shut his eyes.
"Why do you close your eyes?", Sussman asked his teacher.
"So that the room will be empty."
At that moment, Sussman was enlightened.
Yet in the end, it proved to be Minsky and his fellow early AI pioneers, not the machine learning programmers, who had closed their eyes to the profound challenge of interpretability. By the 1980s, machine learning was becoming so useful that it could not be avoided.
Rather than retaining human insight by avoiding machine learning, computer scientists would need to confront the unsolved puzzle of how to understand self-programmed machine-learned systems. Beginning around 1985, under the leadership of statisticians and psychologists, they embarked on this journey with two very different approaches.
Approach 1: Intrinsically Interpretable Models
The community's first answer to ML interpretability was to make the problem easier by setting aside ML algorithms that were "uninterpretable," like neural networks, and instead focusing on "interpretable machine learning" methods that were designed to be understandable to humans.
As an example, we will examine two of the main approaches to interpretable ML that date from the 1980s that are still widely used today: Decision Trees and Generalized Additive Models. Both these ideas sprang from clever learning algorithms devised by the statistician Leo Breiman, who is behind many of the foundational ideas of machine learning.
Decision Trees
A decision tree is a classification algorithm consisting of a series
of decision steps. Typically, at each step, a single variable is
examined, and a choice between next steps is taken depending on
a threshold value for the variable. Decision trees
are highly interpretable and are often simple enough that they can
be executed "by hand" by people. In the real world, decision
trees are still often used exactly in this way: for example, here is
the standard rapid triage decision tree called START (Simple Triage and Rapid Treatment).
It is designed to be simple enough for EMTs to memorize and apply
in seconds to triage the most serious emergency cases.
Classic Approach: CART Trees
The most popular decision tree learning algorithm is called CART (Breiman et al., 1984), and it works by constructing a decision tree with a simple greedy heuristic:
- For each input variable, a threshold is found that maximizes the "purity" of the separated training samples, where impurity is quantified by the Gini index (in which \( p_k \) denotes the portion of a partition that is occupied by class \( k \); a lower value means a purer partition). \[ 1 - \sum_{k} p_{k}^2 \]
- The tree is built greedily: at each step, a split is added on the variable and threshold that achieve the lowest weighted Gini index, and the resulting partitions are subdivided until the accuracy is as high as desired (see the sketch after this list).
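To make the greedy heuristic concrete, here is a minimal sketch in plain Python with NumPy. The helper names (`gini_impurity`, `best_split`) are invented for illustration and are not from any particular library.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedily find the (feature, threshold) whose split minimizes
    the size-weighted Gini impurity of the two child partitions."""
    n, d = X.shape
    best = (None, None, np.inf)
    for j in range(d):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini_impurity(left) +
                     len(right) * gini_impurity(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best  # (feature index, threshold, weighted impurity)

# Toy example: a single split on feature 0 separates the two classes perfectly.
X = np.array([[0.1, 1.0], [0.2, 0.9], [0.8, 1.1], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # expected: (0, 0.2, 0.0)
```

A full CART learner simply applies `best_split` recursively to each resulting partition until a stopping criterion is met.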
The resulting decision tree boundaries are made of axis-aligned (rectilinear) segments.
You can see the effect concretely in
this colab notebook, by Aurélien Geron,
which illustrates how to use Scikit-learn to run the standard CART decision tree algorithm.
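For reference, a minimal usage sketch of scikit-learn's CART implementation might look like the following (the dataset here is chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small CART-style tree using Gini impurity (scikit-learn's default criterion).
iris = load_iris()
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# The learned tree is small enough to print, read, and execute by hand.
print(export_text(tree, feature_names=list(iris.feature_names)))
```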
Current Research: Optimal Decision Trees
CART decision trees are not optimal; there can be more economical sets of choices that lead to higher-accuracy classifiers than the choices found by this greedy approach. Optimal decision tree algorithms remain an active area of research today. For example, in 2019, Hu, Rudin, and Seltzer published the Optimal Sparse Decision Trees algorithm (OSDT GitHub here), and followup work by Xin et al. relaxes the search to allow users to choose the most interpretable tree out of a set of near-optimal choices (GitHub here).
Decision trees are great for human-executable decision-making, but what if we just need the decisions to be human-understandable without necessarily being human-executable? Can we do better?
Here I'll briefly mention a second interpretable ML approach that is popular in industry.
Generalized Additive Models
Generalized Additive Models (Hastie & Tibshirani, 1986) are models that express a prediction as a sum of univariate functions of the input variables \( x_i \): \[ y = f_1(x_1) + f_2(x_2) + f_3(x_3) + \cdots \]
(This idea is related to the Kolmogorov Arnold representation theorem, which expresses any continuous multivariate function through sums and compositions of univariate functions.)
A sum of univariate functions like this can be very interpretable.
For example, suppose you need to predict somebody's income based on the year,
their age, and their level of education. Rather than using a decision tree
(which is based on discontinuous decisions) or a neural network (which is hard
to understand), you could learn univariate functions \( f_{year}(year) \),
\( f_{age}(age) \) and \( f_{education}(education) \), that, when added up,
make a prediction for \( y \):
As you can see, this kind of model is very interpretable: reading each individual function directly gives you intuition about how changing one variable will affect the output.
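As a toy illustration, here is what such an additive prediction looks like in code. The three component functions below are invented for this example (they are not fitted to any real income data):

```python
# Hypothetical univariate components of an additive income model.
# Each one can be plotted and read on its own, independently of the others.
def f_year(year):
    return 0.5 * (year - 2000)             # steady drift over time

def f_age(age):
    return -0.05 * (age - 45) ** 2 + 40.0  # peak earnings near mid-career

def f_education(edu_years):
    return 4.0 * edu_years                 # roughly linear effect of schooling

def predict_income(year, age, edu_years):
    """An additive prediction: just the sum of the univariate pieces."""
    return f_year(year) + f_age(age) + f_education(edu_years)

# Each term's contribution to the prediction can be attributed to one input.
print(predict_income(2024, 35, 16))
```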
Classic Approach: Backfitting Splines
GAMs were devised by Hastie and Tibshirani in 1986, building on a 1985 algorithm by Breiman and Friedman to compute such models.
The 1985 Breiman paper describes the Backfitting algorithm for finding an additive model: begin with a naive set of \( f_i \), and then iteratively improve them by distributing the residual error to each function.
Typically the individual functions are constrained to be smooth splines (and sometimes other constraints are imposed, such as convexity or monotonicity of the individual \( f_i \) components). This both helps avoid overfitting and helps ensure that the resulting model is more interpretable.
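A bare-bones sketch of the backfitting idea follows; to keep it self-contained, it uses a crude moving-average smoother in place of a proper penalized spline, and the function names are invented for illustration:

```python
import numpy as np

def smooth(x, r, window=15):
    """Crude stand-in for a spline smoother: a moving average of the
    residuals r, ordered by the input variable x."""
    order = np.argsort(x)
    kernel = np.ones(window) / window
    smoothed = np.convolve(r[order], kernel, mode="same")
    out = np.empty_like(smoothed)
    out[order] = smoothed
    return out

def backfit(X, y, n_iters=20):
    """Backfitting: repeatedly refit each univariate component f_j
    to the residual left over by all the other components."""
    n, d = X.shape
    intercept = y.mean()
    F = np.zeros((n, d))               # F[:, j] holds f_j evaluated at X[:, j]
    for _ in range(n_iters):
        for j in range(d):
            residual = y - intercept - F.sum(axis=1) + F[:, j]
            F[:, j] = smooth(X[:, j], residual)
            F[:, j] -= F[:, j].mean()  # keep each component centered
    return intercept, F

# Toy additive data: y = sin(x0) + 0.5 * x1 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)
intercept, F = backfit(X, y)
print("training MSE:", np.mean((y - intercept - F.sum(axis=1)) ** 2))
```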
This colab notebook, by the authors of pyGAM, demonstrates the pyGAM package, which applies backfitting and illustrates the concepts of GAMs very nicely.
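In practice, fitting and inspecting a GAM with pyGAM can look roughly like this (a sketch based on my reading of the pyGAM API; see the notebook above for the authors' own examples):

```python
import numpy as np
from pygam import LinearGAM, s

# Toy data with one smooth spline term per input feature.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + X[:, 2] ** 2 + 0.1 * rng.normal(size=500)

gam = LinearGAM(s(0) + s(1) + s(2)).fit(X, y)

# Inspect each learned univariate component separately.
for i, term in enumerate(gam.terms):
    if term.isintercept:
        continue
    XX = gam.generate_X_grid(term=i)
    pd = gam.partial_dependence(term=i, X=XX)
    print(f"feature {i}: partial dependence ranges from {pd.min():.2f} to {pd.max():.2f}")
```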
Current Research: Neural Additive Models and KANs
Recently, Agarwal et al. (on a team with Rich Caruana and Geoffrey Hinton) proposed Neural Additive Models, which are a type of GAM that uses neural networks instead of spline-smoothed functions for the \( f_i \).
This year, Liu et al. (on a team with Max Tegmark) proposed Kolmogorov-Arnold Networks (KANs), which take the opposite tack: instead of using neural networks to create GAMs, they use spline-based GAMs as a drop-in replacement for neural network MLPs, claiming better interpretability and good scaling.
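A minimal sketch of the neural-additive idea in PyTorch appears below. This is not the authors' reference implementation; the class and variable names are invented for illustration:

```python
import torch
import torch.nn as nn

class NeuralAdditiveModel(nn.Module):
    """One small MLP per input feature; the prediction is the sum of
    their outputs, so each feature's contribution stays inspectable."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.feature_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        ])
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # x: (batch, n_features); feed each column to its own subnetwork.
        contributions = [net(x[:, j:j + 1]) for j, net in enumerate(self.feature_nets)]
        return torch.stack(contributions, dim=-1).sum(dim=-1) + self.bias

# The per-feature subnetworks can be plotted individually, just like classic GAM components.
model = NeuralAdditiveModel(n_features=3)
x = torch.randn(8, 3)
print(model(x).shape)  # torch.Size([8, 1])
```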
In lower-dimensional settings where the input variables have understandable meanings, both decision trees and GAMs are still used in practice to build debuggable, interpretable machine-learned models. It is worth understanding both the classic algorithms and the modern research proposals around these two approaches, because they exemplify what it means for a learned system to be interpretable by humans.
However, as data volumes grew in the late 1990s with the growth of the internet and the digitization of society, it became clear that these early interpretable machine learning approaches did not scale as well as some more opaque methods. For example, in natural language modeling, advocates of decision trees (see Potamianos) were reporting negative results compared to n-gram models.
Approach 2: Post-Hoc Interpretation
Opaque ML models require a different approach to interpretability. When Rumelhart, Hinton, and Williams reopened the door to deep neural networks in their seminal backpropagation paper in 1986, they did not frame their work as "machine learning," because their learning algorithm was not designed to be understandable by programmers. Rather, they were hunting for a plausible computational model of the human brain, and they asked the scientific question of whether this model could learn reasonable algorithmic solutions from training data alone.
For example, they examined the weights of a model trained to classify
symmetric versus nonsymmetric rows of pixels, and they concluded
it was successful because it learned an understandable, correct
algorithm that they could examine and comprehend after-the-fact.
The field of interpretable machine learning underwent an earthquake in 2012 when Krizhevsky et al. published the AlexNet neural network that smashed the competition in the ImageNet object classification challenge. At that moment, the industrial-strength machine learning community collided with the biologically-motivated neural network scientists, and the field came to the realization that big performance gains could be achieved by using models like deep neural networks, which have many layers of internal variables that are not designed to be interpretable.
Yet in the move from intrinsically interpretable machine learning to deep neural networks, the post-hoc interpretability methods of the psychologists became more urgent. This type of exploratory, open-ended approach to understanding computer systems was foreign to the way computer scientists typically approached problems. How post-hoc interpretability began to develop brings us to the second part of this history, which we will cover in the next chapter of the handbook.