Using explanations for finding bias in black-box models

There is no doubt that machine learning (ML) models are being used to solve a wide range of business and even social problems. Every year, ML algorithms become more accurate and more innovative, and consequently applicable to an ever wider range of tasks. From detecting cancer to banking and self-driving cars, the list of ML applications is never-ending.

However, as the predictive accuracy of ML models gets better, their explainability seems to get weaker. Their intricate and obscure inner structure often forces us to treat them as “black boxes”, that is, to accept their predictions on a no-questions-asked basis. Common black boxes are Artificial Neural Networks (ANNs) and ensemble methods. Even seemingly interpretable models can become unexplainable: a Decision Tree, for instance, is hard to interpret once it grows very deep.

Neural Networks, Ensemble methods and most of the recent ML models behave like black-boxes

The need to shed light on the obscurity of black-box models is evident: Articles 15 and 22 of the GDPR (2018), the OECD AI Principles (2019) and the US Senate’s Algorithmic Accountability Act bill (2019) are some examples indicating that ML interpretability, along with ML accountability and fairness, has already become (or should become) an integral characteristic of any application that makes automated decisions.

Since many organizations will be obliged to provide explanations for the decisions of their automated predictive models, there will be a serious need for third-party organizations to perform the interpretability tasks and audit those models on their behalf. This adds a level of integrity and objectivity to the whole audit process, as the explanations are provided by an external party. Moreover, not every organization (especially a startup) has the resources to deal with interpretability issues, making third-party auditors necessary.

However, this raises intellectual property issues, since organizations will not want to disclose details of their models. Therefore, from the wide range of interpretability methods, the model-agnostic approaches (i.e. methods that are oblivious to the model’s internals) are the appropriate choice for this purpose.

Besides explaining the predictions of a black-box model, interpretability can also give us insight into erroneous model behavior caused by undesired patterns in our data. We will examine an example where interpretability helps us identify gender bias in our data, using a model-agnostic method that utilizes surrogate models and Shapley values.

We use the “Default of Credit Card Clients Dataset”, which contains information (demographic factors, credit data, history of payment and bill statements) about 30,000 credit card clients in Taiwan from April 2005 to September 2005. The goal of the models in our examples is to identify the defaulters, i.e. bank customers who will not make the next payment on their credit card.

Gender biased data

The existence of biased datasets is not uncommon. Bias can be introduced by faulty preprocessing or by collecting from a poor data source, producing skewed and tainted samples. Examining the reasons behind a model’s predictions can reveal such bias in the data.


Defaulters based on Gender: The red and blue bars represent the original distributions of female and male customers, while the purple one depicts the newly constructed biased distribution of male customers.

In the “Default of Credit Card Clients Dataset”, 43% of the defaulters are male and 57% are female. This does not constitute a biased dataset, since the non-defaulters follow a similar distribution (39% and 61% respectively).

We distort the dataset by picking 957 male defaulters at random (i.e. one third of the overall male defaulters) and flipping their label. This creates a new biased dataset with a 34% / 66% male/female split among the defaulters and a 41% / 59% split among the non-defaulters. We then take the predictions of a model trained on this biased dataset, treating its structure as unknown, and train a surrogate XGBoost model on those predictions, from which we extract the Shapley values that help us explain the original model. More precisely, we use the Shapley values to pinpoint the most important features, sorting them by absolute value, and then describe them in natural language in the explanations (see the examples below).
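The distortion step can be sketched in a few lines of Python. The counts below (2,871 male and 3,806 female defaulters, so that 957 is one third of the male ones) and the toy record layout are illustrative assumptions, not the exact dataset:

```python
import random

random.seed(0)

# Toy stand-in for the defaulter class of the dataset: 2,871 male and
# 3,806 female defaulters (assumed counts, for illustration only).
data = ([{"sex": "male", "default": 1} for _ in range(2871)]
        + [{"sex": "female", "default": 1} for _ in range(3806)])

# Pick one third of the male defaulters at random and flip their label,
# injecting gender bias into the defaulter class.
male_defaulters = [r for r in data if r["sex"] == "male" and r["default"] == 1]
for r in random.sample(male_defaulters, 957):
    r["default"] = 0

defaulters = [r for r in data if r["default"] == 1]
male_share = sum(r["sex"] == "male" for r in defaulters) / len(defaulters)
print(f"male share of defaulters: {male_share:.0%}")  # prints "male share of defaulters: 33%"
```

After the flip, 1,914 of the 5,720 remaining defaulters are male, which matches the roughly one-third male share described above.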

First, we examine a male customer (ID: 802) for whom the model falsely predicted that he will not default (i.e. a false negative) and then a female customer (ID: 319) for whom the model falsely predicted that she will default (i.e. a false positive).

These two customers are very similar, as the table below indicates: they both delayed the payments of September, August and July, and made the payments of June, May and April.

ID   SEX     PAY_0          PAY_2          PAY_3          PAY_4_5_6
802  male    4-month delay  3-month delay  2-month delay  use of revolving credit
319  female  3-month delay  2-month delay  2-month delay  use of revolving credit

Examining the explanation for the male customer, we can see that the 4-month delay of the last payment (September 2005) had a negative impact of 28%, meaning that it contributed towards predicting that he will default. However, his gender, the repayment status of April and May, and the bill statement amounts of September and May had a positive impact, and resulted in falsely classifying the customer as a non-defaulter.

Reason Codes explanation of the male customer. His gender contributed positively towards predicting ‘No default’
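The percentage impacts quoted in these reason codes can be derived from a prediction’s Shapley values. A minimal sketch, assuming each percentage is the feature’s share of the total absolute attribution (the normalisation scheme, the sign convention and the numbers below are illustrative assumptions, not taken from the actual model):

```python
# Hypothetical Shapley values for one prediction. By the assumed sign
# convention, positive values push towards "no default" and negative
# values towards "default".
shap_values = {
    "PAY_0 (September delay)": -0.42,
    "SEX": 0.18,
    "PAY_6 (April status)": 0.12,
    "BILL_AMT1 (September bill)": 0.10,
}

# Express each attribution as its share of the total absolute attribution.
total = sum(abs(v) for v in shap_values.values())
impacts = {f: round(100 * v / total) for f, v in shap_values.items()}
for feature, pct in impacts.items():
    print(f"{feature}: {pct:+d}%")
```

By construction the absolute percentages sum to 100, which makes reason codes from different customers directly comparable.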

For the female customer, the 3-month delay also contributed negatively, but by a greater percentage than for the male customer (37%). Her gender also had a negative impact of 22%. Moreover, the model considered the 2-month delay of the July payment important, whereas for the male customer, who had the same delay, it was not deemed important.

Global explanations also confirm the gender bias, since the gender feature is the second most important feature overall for the model.

Top 10 most important features of the model
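A global importance ranking like this is typically obtained by averaging the absolute Shapley values over all predictions. A minimal sketch with made-up attribution rows (only the feature names come from the dataset):

```python
# Each row holds the Shapley values of one prediction for the features
# (PAY_0, SEX, LIMIT_BAL). The numbers are made up for illustration.
feature_names = ["PAY_0", "SEX", "LIMIT_BAL"]
shap_matrix = [
    [-0.40,  0.20,  0.05],
    [ 0.35, -0.25, -0.10],
    [-0.30,  0.15,  0.02],
]

# Global importance of a feature = mean absolute Shapley value over rows.
importance = {
    name: sum(abs(row[j]) for row in shap_matrix) / len(shap_matrix)
    for j, name in enumerate(feature_names)
}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)  # prints ['PAY_0', 'SEX', 'LIMIT_BAL']
```

With these toy numbers SEX ranks second, mirroring the biased model described above; the absolute value matters because attributions of opposite signs would otherwise cancel out.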

We repeat the experiments after removing the gender feature from the dataset. Now the male customer is correctly predicted as a defaulter, and the explanations make a bit more sense: the delay of the last payment (September) has a large impact of 49%, as do the delays of the other two payments.

However, the model still falsely predicted that the female customer will default. Again, the delay of the last payment is the most important factor. We could argue that the model is still harsher on this customer: although she paid a small amount for the May payment (863 NT dollars), the model assigned it a negative impact of 8%, whereas for the male customer the zero payment of April had a negative impact of only 4%. This should alert us to check for an underrepresented sample of male defaulters in our dataset and prompt us to fix our data.


It is evident that the explanations helped us identify bias in the data, as well as pinpoint unintended decision patterns of our black-box model. Moreover, even when the gender feature was removed from the training data, the explanations helped us discover bias proxies, i.e. (gender) bias encoded in other features. This should lead us to acknowledge the bias in our data and motivate us to collect a better sample of defaulters.



If the dataset contains real people, it is important to ensure that the model does not discriminate against one group in favour of others. Explanations help us detect bias and motivate us to fix our data.

On Algorithm Accountability: Can we control what we can’t exactly measure?

During the last few months, I have spent (quality) time with people of diverse backgrounds and roles: executives in the banking sector, founders of health and tech startups, and translators, to name a few, discussing the impact of technology and algorithmic decision making on their daily work. Not surprisingly, the weight they attach to the decisions these systems deduce (or, in a broader sense, to their cognitive insights) is growing very fast.

Interestingly, most of the people I talked to had experienced slight or serious bias in a deduced insight, which they could essentially bypass using their own intuition and experience. Thus, it makes perfect sense that they are all concerned about how algorithms work and how they can control them, in order to ensure they form decisions that can be trusted.

And so they posed a nice question for me to think about in my spare time (although such a thing doesn’t exist with a kid, a dog, a cat and another kid in the making). Simply put, the question is: “How can I be in control of this thing that instructs me what to do?”

Intuitively, I’d say this is not an easy task; and I firmly believe that, at least for now, Tom DeMarco’s famous quote “You can’t control what you can’t measure” is not applicable in its entirety.

You see, an algorithm, which typically can be measured and controlled to a certain extent, is not making decisions by itself; it operates within an organisational context that affects its creation. That context, in turn, is not something that can be quantitatively assessed in a straightforward way.

Nevertheless, we should strive to control both the algorithms and the organisations that create them. My view is that only by approaching the problem from both perspectives will we reach a significant level of accountability when things are not working as expected.

Now, to simplify things, we may say that an algorithm is essentially a piece of software that:

  1. Solves a business problem set by the organisation that creates it (the algorithm),
  2. Receives as input data that have been selected and most likely pre-processed, either by a human or by an automated process,
  3. Utilises a model (e.g. an SVM, a deep neural network, a random forest) which processes the data and ultimately makes a decision or suggests an answer/solution to the question/problem set by the organisation.

Subsequently then, what we need is to get insights for every aspect mentioned above.

For starters, the organisation creating the algorithm needs to cater and design for accountability. In other words, it should define when and how an algorithm should be guided (or restrained) given the risk of crucial or expensive errors, or any form of bias (discrimination, unfair denials, or censorship). In defining such processes, it should be guided by principles like responsibility/human involvement, explainability (also known as interpretability, although the two differ), accuracy, auditability and fairness.

Regarding the input data, we primarily need to know about their quality, meaning their accuracy, completeness and uncertainty, as well as their timeliness and representativeness. It is also important to know how the data are handled: what their definitions are, and how they are collected, vetted and edited (manually or automatically).

As for the model itself, we would like to know its parameters, the features or variables used and whether they are weighted. We must also be in a position to evaluate its performance, select the appropriate metrics for this purpose and ensure we operationalise and interpret them appropriately. Last but not least, we should be able to assess its inference, that is, how accurate or error-prone the model is. An important element here is the model creator’s ability to benchmark its results against standard datasets and standard measures of accuracy.
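As a minimal illustration of such an evaluation, one can compute standard error metrics separately per group, much like the gender audit in the first part of this post; the labels and predictions below are made up:

```python
# Made-up ground-truth labels and model predictions for two groups,
# illustrating a per-group audit of a model's error rates (1 = default).
records = [
    # (group, true_label, predicted_label)
    ("male",   1, 0), ("male",   0, 0), ("male",   1, 1), ("male",   0, 0),
    ("female", 0, 1), ("female", 1, 1), ("female", 0, 1), ("female", 0, 0),
]

def false_positive_rate(rows):
    """Share of true negatives that the model flagged as defaulters."""
    negatives = [r for r in rows if r[1] == 0]
    return sum(r[2] == 1 for r in negatives) / len(negatives)

fpr = {g: false_positive_rate([r for r in records if r[0] == g])
       for g in ("male", "female")}
print(fpr)  # a large gap between groups is a red flag
```

The same pattern applies to any metric (false negative rate, precision, accuracy); a persistent gap between groups is exactly the kind of signal an auditor should be required to investigate.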

So, we may say that controlling an algorithm (to a certain extent) is not an impossible task but still requires some level of maturity for the organisation that creates (or utilises) the algorithm.

However, someone has to create a compelling reason for an organisation to cater for accountability. And this someone is us: citizens, clients, voters, news consumers, professionals, and everyone else whose lives are affected by the decisions algorithms make on our behalf.

Sources of inspiration for this blogpost were among others the following:

Beyond Automation, Thomas H. Davenport and Julia Kirby, Harvard Business Review, June 2015

Accountability in Algorithmic Decision Making, Nicholas Diakopoulos, Communications of the ACM, Vol. 59, No. 2, February 2016

The Black Box Society: The Secret Algorithms That Control Money and Information, Frank Pasquale, Harvard University Press, January 2015

The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems. Ethically Aligned Design: A Vision For Prioritizing Wellbeing With Artificial Intelligence And Autonomous Systems, Version 1. IEEE, 2016.