How to make sure your model is fair, accountable, and transparent (FAccT), Part 2

Nick Ball
Published in MLOps.community
Nov 16, 2020

Part 2 of 2

In the first part, we introduced the topic, and covered fairness and accountability. In this second part, we cover transparency, tools, conclusions, and next steps.

Transparency

Transparency means that the decision-making process of the model is understandable by humans. This doesn’t just mean the people who built the model, but anyone for whom such an understanding might be necessary. Full transparency of course means different things to different people, but a good starting point is to realize that most people see things causally. If we want to know “why” a model did something, then there needs to be a clearly identified “reason”, or set of reasons, that can be shown to be the cause of the outputs. Depending on the audience and the stakes, various levels of explanation may be acceptable.

Achieving good transparency can be helped by having an adequate and agreed upon statement of the business problem to be solved, and knowing who is responsible for ensuring the solution is transparent (ideally a specialist in compliance, not the data scientist).

In some cases, a model might be part of a system where no explanation is needed, such as an internal company process, or something whose outputs have no ethical implications (say, turning down the office heating when no one is there). In these cases, the model could be a true black box, with transparency needed only in the I/O and the wider pipeline. In other cases, such as a medical diagnosis or prisoner parole, explanations should obviously be more accessible.

Some considerations for transparency include:

  • Do you need machine learning?
  • Do you need nonlinear ML, and in particular, do you need deep learning?
  • Is a model-agnostic explanation acceptable?
  • Do you need human-interpretable features?
  • What are the regulatory requirements?
  • Do you need reason codes?
  • Do you need to visualize your model?
  • Is the setup you have come up with in fact OK?

Do you need machine learning?

A classical or parametric model may be able to directly describe the data and fully solve the business problem without needing machine learning. In such cases, models are often fully interpretable as-is, and need no extra work beyond explaining what the model represents to the relevant audiences, in terms they can understand.
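
For illustration, here is a minimal sketch (with made-up data) of a classical linear model fit with statsmodels: the coefficient table is the complete decision-making process, so there is nothing hidden to explain.

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: outcome driven by two features plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

X = sm.add_constant(X)      # add an intercept column
fit = sm.OLS(y, X).fit()    # ordinary least squares

# The coefficients, standard errors, and confidence intervals in this
# table are the entire model
print(fit.summary())
```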

Do you need nonlinear ML, and in particular, do you need deep learning?

If a linear model is adequate, use it. If not, then methods such as well-tuned gradient boosted decision trees can solve most business problems, and do so in terms of features that remain human-readable throughout. These translate directly into measures such as variable importance (not perfect but often useful), partial dependence plots, and, if the trees are monotonic, reason codes. If the trees are in an ensemble, the model size may have to be limited.
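
As a hedged sketch in scikit-learn, with synthetic stand-in data: permutation importance gives the (imperfect but often useful) variable importances, and a partial dependence plot shows a single feature’s marginal effect on the prediction.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

# Synthetic stand-in for a real business dataset
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
gbm = GradientBoostingRegressor(random_state=0).fit(X, y)

# Variable importance: how much shuffling each feature hurts performance
imp = permutation_importance(gbm, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean)

# Partial dependence: the marginal effect of feature 0 on the prediction
PartialDependenceDisplay.from_estimator(gbm, X, features=[0])
```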

The interpretation of deep learning, however, remains a topic of research. Various methods such as integrated gradients exist, but anything that requires delving into the internal structure of the model can be tricky. Your stack of TensorFlow keras.layers. … , or the equivalent in PyTorch, etc., may not be as easy to dissect as it was to create and train.
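
To give a flavor, here is a minimal sketch of integrated gradients for a Keras model with a single scalar output on tabular inputs; real use needs care with the choice of baseline and the number of interpolation steps.

```python
import tensorflow as tf

def integrated_gradients(model, baseline, x, steps=50):
    """Attribute model(x) to each input feature, relative to a baseline.

    baseline and x are rank-1 arrays/tensors of shape (n_features,).
    """
    baseline = tf.convert_to_tensor(baseline, tf.float32)
    x = tf.convert_to_tensor(x, tf.float32)
    # Interpolate along the straight line from baseline to x
    alphas = tf.linspace(0.0, 1.0, steps + 1)
    path = baseline[None, :] + alphas[:, None] * (x - baseline)[None, :]
    with tf.GradientTape() as tape:
        tape.watch(path)
        preds = model(path)
    grads = tape.gradient(preds, path)                 # gradient at each step
    avg = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)  # trapezoid rule
    return (x - baseline) * avg                        # one attribution per feature
```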

Is a model-agnostic explanation acceptable?

Deep learning is, however, well known to be the best solution to some problems (images, speech, etc.), so is a setup that leaves the model unprobed but looks at the input and output data OK in your particular situation? If it is, then model-agnostic methods such as LIME, or the more recent SHAP and its visualizations, might be acceptable. They assign a significance to each input for each data point, and these significances, either ranked or quoted directly, can be presented as the reasons why the model made a given decision. This can incorporate deep learning by running the explainer on the inputs and outputs of the deep learning model.
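
As a sketch (using a scikit-learn toy dataset as a stand-in), SHAP’s model-agnostic KernelExplainer sees only the model’s predict function plus some background data, so the model itself, deep or otherwise, stays a black box:

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# Model-agnostic: the explainer only gets a predict function and
# background samples, never the model internals
f = lambda X: clf.predict_proba(X)[:, 1]
explainer = shap.KernelExplainer(f, shap.sample(data.data, 50, random_state=0))

# Local explanation: per-feature contributions for one data point,
# ranked by magnitude to give the top "reasons"
sv = explainer.shap_values(data.data[:1])
print(sorted(zip(data.feature_names, sv[0]), key=lambda t: -abs(t[1]))[:5])
```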

Do you need human-interpretable features?

A set of significances from SHAP might be acceptable, but it means in turn that the features on which they are based must be human-readable if one needs the causal story. This rules out some otherwise productive dimension reduction techniques like SVD or PCA, and probably even things like polynomial expansions. So your feature engineering step needs to be cognizant of creating features that are sufficient to establish that your model is fair and transparent. This is more difficult if the data are unstructured, like images.
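
A small sketch (with hypothetical feature names) of why this matters: after PCA, every derived feature is a blend of all of the originals, so no human-readable reason survives.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
names = ["income", "tenure", "age", "balance"]   # hypothetical features

pca = PCA(n_components=2).fit(X)
for i, comp in enumerate(pca.components_):
    # Each component mixes every original feature: a reason like
    # "component 1 was too low" means nothing to a loan applicant
    print(f"PC{i}:", dict(zip(names, comp.round(2))))
```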

What are the regulatory requirements?

These are subject to the same issue mentioned in the fairness section in Part 1: data rules are both industry-dependent and international. A well-known example is in finance in the US, where the denial of credit requires the reasons to be given. If a model can’t produce these reasons, then it can’t be used, which has limited which ML models can be used for this application.

The solution? Unfortunately, the same as above — there is likely no easy solution to this other than the work of being aware of which jurisdictions your product, models, and data will be subject to, and as with everything else, clearly documenting it.

Do you need reason codes?

Reason codes state directly that your model made its decision or prediction because of some specific condition. There could be a single reason, or there might be several, which could be weighted or ranked. Ideally the reason should be specific to a given data point, i.e., a local explanation, as opposed to the overall significance in the model of each variable over all the data points it was trained on, which is a global explanation.

There is another subtlety, however: while “reasons” can simply be in direct proportion to how significant the variables are in the model for that data point, you may also need them to have directionality, or thresholds. As in: your loan application is less favorable because your income is less than $50,000 per year. If the model has ascribed significance to income, but on closer inspection it has said something like if income is below $28,734 don’t approve, between $28,734 and $44,329 do approve, $44,329 to $48,715 don’t approve, and so on, then saying your income was greater than or less than some number, or even “high” or “low”, is not what the model is actually doing.

In other words, the model should not jump around like this, but should be monotonic. This is easy with simple decision trees, but those may not be performant, and for more complex models such as ensembles or neural networks it can be done, but it is less obvious how. So the question reverts to whether directional reasons are needed for a given situation, and whether performance can be traded off for a simpler model.
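
One way to achieve this, sketched below on made-up loan data with XGBoost (LightGBM has a similar monotone_constraints option): constraining the model to be monotonic in income makes a directional reason like “income too low” truthful.

```python
import numpy as np
import xgboost as xgb

# Made-up loan data: approval tends to rise with income
rng = np.random.default_rng(0)
income = rng.uniform(20_000, 120_000, size=(1_000, 1))
approve = (income[:, 0] + rng.normal(scale=10_000, size=1_000) > 60_000).astype(int)

# "(1)" forces the prediction to be non-decreasing in income, so the
# model cannot flip between approve/deny bands as income rises
model = xgb.XGBClassifier(monotone_constraints="(1)", n_estimators=100)
model.fit(income, approve)

grid = np.linspace(20_000, 120_000, 5).reshape(-1, 1)
print(model.predict_proba(grid)[:, 1])   # rises monotonically with income
```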

Do you need to visualize your model?

A final consideration is that it can often be easier to understand your model if it is somehow visualized. Any detailed visual of the actual model, such as those given by TensorBoard, or a rendering of a whole random forest, is likely only of interest to the advanced technical user, and will need to be translated for other audiences. So while such diagrams can be important, visualizing the explanation is probably simpler. An example might be the various SHAP plots that can be made. As with many of the concepts in F, Acc, and T, the tools are still evolving and improving to do such visualizations well.
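
As a brief sketch of explanation-level visualization with SHAP on a tree model (the diabetes toy dataset stands in for real data):

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

data = load_diabetes()
model = GradientBoostingRegressor(random_state=0).fit(data.data, data.target)

# TreeExplainer computes SHAP values efficiently for tree ensembles
sv = shap.TreeExplainer(model).shap_values(data.data)

# Summary ("beeswarm") plot: one dot per data point per feature,
# showing global importance and direction of effect in one picture
shap.summary_plot(sv, data.data, feature_names=data.feature_names)
```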

Is the setup you have come up with in fact OK?

Once these questions are answered, circle back to the transparency proposal as part of the wider conversation about what is an acceptable solution to the business problem. As with many things, others outside the technical team may raise points not thought of by that team.

Tools

Having now reviewed F, Acc, and T, what about tools to implement them? Unfortunately, the possible stack of tools is too large to examine in any detail in an article like this one. The field is still maturing, and there is a plethora of competing and rapidly developing open source and proprietary products for either one part of the dataflow or end-to-end. There are arguments both for assembling your own stack of best-of-breed tools for different areas, and for unifying things in a platform that aids collaboration between data scientists, engineers, and others.

So the choice of tools depends on where you land on this continuum of desired setups, and your available time, resources, and expertise.

Some examples of possible setups are given in the article Emerging Architectures for Modern Data Infrastructure, in its three Blueprint diagrams. These encompass the end-to-end data science stack and not just MLOps or FAccT, but they give a flavor of the many tool names and interconnections that are possible. You could take a setup like one of these and apply the FAccT concepts to it. Diagram 1 is for business intelligence, diagram 2 is for a large enterprise where MLOps comes from a platform, and diagram 3 uses best-of-breed tools, including many that have been discussed in the MLOps community. There are probably as many ideal stacks as there are people in the field, but these show setups that could be built, forming a useful basis for further work in your own company.

Conclusions

This 2-part blog entry has skimmed the surface of making a model F, Acc, and T. Entire books could be written about each, and there are more topics (e.g., privacy, security, metrics) that could have been included here.

I have not gone into detailed technical recommendations for tools to solve each, because the topic is too wide-ranging and the entry would be many times the length it already is. Nevertheless, I have tried to include some pointers for next steps for anyone tasked with producing a FAccT model. These are mostly in the form of links to other articles that expand on key themes, rather than original research literature or other sites.

The next step for the interested reader would be to relate all of this to their own business problem, and perhaps follow some of the links.

The main points from this article to take away have substantial overlap with many general points about doing good data science in the enterprise setting, but they can be summarized as:

  • Just as a trained model is not useful without production (MLOps), a model in production that is not F, Acc, and T, is not useful either
  • FAccT must therefore be a primary part of any modeling setup
  • For fairness, the main points are: be OK with your ethics, monitor your deployed models, be robust to changing data, and embrace diversity to minimize bias
  • For accountability, version everything, ensure provenance, log as needed, and make sure the dataflow and models are auditable
  • For transparency, determine to what degree a black box is acceptable versus explanations being needed, and be sure all parties agree your setup is sufficiently transparent
  • As with all data science in the enterprise, keep the primary focus on the problem to be solved, and continuously make sure all parties are on the same page with regard to it

A final point is that, as you will have seen, there are a lot of issues to be resolved to make your model F, Acc, and T. This is just part of solving the wider issues of MLOps and data science. It is likely apparent that no one person could address all of this in a company, and at minimum you need a data scientist, a data/ML engineer, and for the parts linking technical execution to the wider business context, a team lead. The author will likely return to this theme in a future blog entry.

Author Bio

Dr. Nick Ball has been doing data science for 20 years. He began in the year 2000 using neural networks to classify galaxies, and continued working with machine learning on large astronomy datasets via two postdoctoral positions in North America. He wrote the first major published refereed review of Data Mining and Machine Learning in Astronomy in 2010, and helped create CANFAR+Skytree, a cloud computing data mining system for astronomy. He then joined Skytree in 2013, working on several dozen customer projects, and continued at Infosys after Skytree was acquired. He then moved on to Oracle and their Data Science Platform, and then to Dotscience as Principal Data Scientist to lead the product roadmap on their MLOps platform.

This gives him experience as a generalist / full-stack data scientist across all areas of the field. He lives in the San Francisco Bay Area.

LinkedIn: https://www.linkedin.com/in/nickballdatascientist

Website: https://nickballdatascience.com

Image Credits

Fat cat: User:Jami430 / Wikimedia Commons, CC BY-SA 4.0

Neurons: User:Mietchen / Wikimedia Commons, CC BY 2.5

Tool use: User:Mike R / Wikimedia Commons, CC BY-SA 3.0
