# (PART) Introduction {-}
# Introduction {#introduction}
## The aim of the book
Predictive models are used to guess (statisticians would say: predict) values of a variable of interest based on values of other variables. As an example, consider prediction of sales based on historical data, prediction of the risk of heart disease based on patient's characteristics, or prediction of political attitudes based on Facebook comments.\index{Predictive ! model}
Predictive models have been used throughout human history. Ancient Egyptians, for instance, used observations of the rising of Sirius to predict the flooding of the Nile. A more rigorous approach to model construction may be attributed to the method of least squares, published more than two centuries ago by Legendre in 1805 and by Gauss in 1809. With time, the number of applications in economics, medicine, biology, and agriculture has grown. The term *regression* was coined by Francis Galton in 1886. Initially, it referred to biological applications, while today it is used for various models that allow prediction of continuous variables. Prediction of nominal variables is called *classification*, and its beginning may be attributed to the works of Ronald Fisher in 1936.\index{Classification}
During the last century, many statistical models that can be used for predictive purposes have been developed. These include linear models, generalized linear models, classification and regression trees\index{Classification and regression trees}, rule-based models, and many others. Developments in the mathematical foundations of predictive models were boosted by the increasing computational power of personal computers and the availability of large datasets in the era of "big data" that we have entered. \index{Generalized linear models}
With the increasing demand for predictive models, model properties such as flexibility, the capability of internal variable selection or feature engineering, and high precision of predictions have become of interest. To obtain robust models, ensembles of models are used. Techniques like bagging, boosting, or model stacking combine hundreds or thousands of simpler models into one super-model. Large deep neural-network models may have over a billion parameters. \index{Ensemble}\index{Boosting}\index{Bagging}
This progress comes at a cost. Complex models may seem to operate like "black boxes". It may be difficult, or even impossible, to understand how thousands of variables affect a model's prediction. At the same time, complex models may not work as well as we would like them to. An overview of real problems with massive-scale black-box models may be found in an excellent book by @ONeil or in her TED Talk "The era of blind faith in big data must end". There is a growing number of examples of predictive models whose performance deteriorated over time or that turned out to be biased in some sense. For instance, IBM's Watson for Oncology was criticized by oncologists for delivering unsafe and inaccurate recommendations [@IBMWatson]. Amazon's system for curriculum-vitae screening was found to be biased against women [@AmazonAI]. The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) algorithm for predicting recidivism, developed by Northpointe (now Equivant), was accused of bias against African-Americans [@COMPAS]. Algorithms behind the Apple Credit Card have been accused of being gender-biased [@AppleCreditCard]. Some tools for sentiment analysis are suspected of age bias [@Diaz2018]. These are examples of models and algorithms that led to serious violations of fairness and ethical principles. An example of a situation in which data drift led to a deterioration in model performance is the Google Flu model, which gave worse predictions after two years than at baseline [@GoogleFLU; @Lazer1203].\index{Fairness}
New regulations, like the General Data Protection Regulation [@EUGDPR], have emerged as a reaction to some of these examples and issues. Also, new civic rights are being formulated [@RightToExpl; @RightToExpl2; @RightToExpl3]. A noteworthy example is the "Right to Explanation", i.e., the right to be provided with an explanation for an output of an automated algorithm [@RightToExpl]. To exercise this right, we need new methods for verification, exploration, and explanation of predictive models.\index{GDPR| see {General Data Protection Regulation}}\index{General Data Protection Regulation}\index{Right to Explanation}
Figure \@ref(fig:UMEPImportance) presents an attempt to summarize how the increase in model complexity affects the relative importance of domain understanding, the choice of a model, and model validation.
In classical statistics, models are often built as a result of a good understanding of the application domain. Domain knowledge helps to create and select the most important variables that can be included in relatively simple models that yield predictive scores. Model validation is based mainly on the evaluation of the goodness-of-fit and hypothesis testing. Statistical hypotheses should be stated before data analysis, and the obtained p-values should not influence the way in which the data were processed or the models were constructed.
Machine learning,\index{Machine learning} on the other hand, exploits the trade-off between the availability of data and domain knowledge. The effort is shifted from a deep understanding of the application domain towards (computationally heavy) construction and fitting of models. Flexible models can use massive amounts of data to select informative variables and filter out uninformative ones. The validation step gains in importance because it provides feedback to the model construction.
How might this approach look in the future? It is possible that the increasing automation of exploratory data analysis (EDA) and of the modelling part of the process will shift the focus towards the validation of a model. In particular, validation will focus not only on how good a model's fit and predictions are, but also on what other risks (like concept drift) or biases may be associated with the model. Model exploration will allow us to understand the analyzed data better and faster.\index{EDA| see {Exploratory Data Analysis}}\index{Exploratory Data Analysis}
(ref:UMEPImportanceDesc) Shift in the relative importance and effort (symbolically represented by the shaded boxes) put in different phases of data-driven modelling. Arrows show feedback loops in the modelling process. (A) In classical statistics, modelling is often based on a deep understanding of the application domain combined with exploratory data analysis (EDA). Most often, (generalized) linear models are used. Model validation includes goodness-of-fit evaluation and hypothesis testing. (B) In machine learning (ML), domain knowledge and EDA are often limited. Instead, flexible models are fitted to large volumes of data to obtain a model offering a good predictive performance. Evaluation of the performance (applying strategies like cross-validation to deal with overfitting)\index{Overfitting} gains in importance, as validation provides feedback to model construction. (C) In the (near?) future, auto-EDA and auto-ML will shift focus even further to model validation, which will include the use of explainable artificial intelligence (XAI) techniques and the evaluation of fairness, ethics, etc. The feedback loop is even longer now, as the results of model validation will also help in domain understanding.\index{Explainable artificial intelligence}\index{XAI|see{Explainable artificial intelligence}}
```{r UMEPImportance, echo=FALSE, fig.cap='(ref:UMEPImportanceDesc)', out.width = '100%', fig.align='center'}
knitr::include_graphics("figure/UMEPImportance.png",
auto_pdf = TRUE)
```
Summarizing, we can conclude that, today, the true bottleneck in predictive modelling is neither the lack of data, nor the lack of computational power, nor inadequate algorithms, nor the lack of flexible models. It is the lack of tools for model *exploration* and, in particular, model *explanation* (obtaining insight into model-based predictions) and model *examination* (evaluation of a model's performance and understanding of its weaknesses). Thus, in this book, we present a collection of methods that may be used for this purpose. As the development of such methods is a very active area of research, with new methods becoming available almost continuously, we do not aim to be exhaustive. Rather, we present the mindset, key concepts and issues, and several examples of methods that can be used in model exploration.
## A bit of philosophy: three laws of model explanation {#three-single-laws}
In 1942, in his story "Runaround", Isaac Asimov formulated *Three Laws of Robotics*: \index{Laws of robotics}
1) a robot may not injure a human being,
2) a robot must obey the orders given it by human beings, and
3) a robot must protect its own existence.
Today’s robots, like cleaning robots, robotic pets, or autonomous cars, are far from being conscious enough to fall under Asimov’s ethics. However, we are more and more surrounded by complex predictive models and algorithms used for decision-making. Artificial-intelligence models are used in health care, politics, education, justice, and many other areas. The models and algorithms have a far larger influence on our lives than physical robots. Yet, applications of such models are left unregulated despite examples of their potential harmfulness. An excellent overview of selected issues is offered in the book by @ONeil.
It is now becoming clear that we need to control the models and algorithms that may affect us. Asimov's laws are being referred to in the context of the discussion around *ethics of artificial intelligence* (https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence). Initiatives to formulate principles for artificial-intelligence development have been undertaken, for instance, in the UK [@AIspring2018]. Following Asimov's approach, we propose three requirements that any predictive model should fulfil:
- **Prediction's validation**. For every prediction of a model, one should be able to verify how strong the evidence is that supports the prediction.
- **Prediction's justification**. For every prediction of a model, one should be able to understand which variables affect the prediction and to what extent.
- **Prediction's speculation**. For every prediction of a model, one should be able to understand how the prediction would change if the values of the variables included in the model changed.
We see two ways to comply with these requirements. One is to use only models that fulfil these conditions by design. These are so-called "interpretable-by-design models" that include linear models, rule-based models, or classification trees with a small number of parameters [@molnar2019]. However, the price of transparency may be a reduction in performance. The other way is to use tools that allow "explaining" predictions of any model, perhaps by using approximations or simplifications. In our book, we focus on the latter approach.
## Terminology {#teminology}
It is worth noting that, when it comes to predictive models, the same concepts have often been given different names in statistics and in machine learning. In his famous article [@twoCultures], Leo Breiman described similarities and differences in perspectives used by the two communities. For instance, in the statistical-modelling literature, one refers to "explanatory variables", with "independent variables", "predictors", or "covariates"\index{Covariate}\index{Explanatory | variable}\index{Prediction}\index{Predictor} often used as equivalents. Explanatory variables are used in a model as a means to explain (predict) the "dependent variable"\index{Dependent variable}, also called "predicted" variable or "response".\index{Response} In machine-learning terminology, "input variables" or "features"\index{Feature}\index{Input variable}\index{Variable} are used to predict the "output"\index{Output variable} or "target" variable.\index{Target variable} In statistical modelling, models are "fit" to the data that contain "observations",\index{Observation} whereas in the machine-learning world a model is "trained" on a dataset that may contain "instances"\index{Instance} or "cases"\index{Case}. When we talk about numerical constants that define a particular version of a model, in statistical modelling, we refer to model "coefficients", while in machine learning it is more customary to refer to model "parameters"\index{Model ! parameter}\index{Model ! coefficient}. In statistics, it is common to say that model coefficients are "estimated",\index{Model ! estimation} while in machine learning it is more common to say that parameters are "trained"\index{Model ! training}.\index{Statistical modelling}
To the extent possible, in our book we try to consistently use the statistical-modelling terminology. However, the reader may find references to a "feature" here and there. Somewhat inconsistently, we also introduce the term "instance-level" explanation. Instance-level explanation methods are designed to extract information about the behaviour of a model related to a specific observation (or instance). On the other hand, "dataset-level" explanation techniques allow obtaining information about the behaviour of the model for an entire dataset.\index{Dataset-level explanation}
We consider models for dependent variables that can be continuous or categorical. The values of a continuous variable can be represented by numbers with an ordering that makes some sense (ZIP codes or phone numbers are not considered continuous variables, while age or the number of children are). A continuous variable does not have to be continuous in the mathematical sense; counts (number of floors, steps, etc.) will be treated as continuous variables as well. A categorical variable can assume only a finite set of values that are not numbers in the mathematical sense, i.e., it makes no sense to subtract or divide these values.
In this book, we treat models as "black boxes". We don't assume anything about their internal structure or complexity. We discuss the specificity of such an approach in a bit more detail in the next section.
## Black-box models and glass-box models {#glassblack}
Usually, the term "black-box"\index{Black-box} model is used for models with a complex structure that is hard to understand by humans. This usually refers to a large number of model coefficients or complex mathematical transformations. As people vary in their capacity to understand complex models, there is no strict threshold for the number of coefficients that makes a model a black box. In practice, for most people, this threshold is probably closer to 10 than to 100.\index{Glass-box model}\index{Black-box model}
A "glass-box" (sometimes also called a "white-box" or a "transparent-box") model, which is opposite to a black-box one, is a model that is easy to understand (though maybe not by every person). It has a simple structure and a limited number of coefficients.\index{Model | understanding}\index{Transparent-box model}\index{White-box model| see {glass-box model}}
The most common classes of glass-box models are decision or regression trees (see an example in Figure \@ref(fig:BILLCD8)), or models with an explicit compact structure. As an example of the latter, consider a model for obesity based on the body-mass index (BMI), with BMI defined as the mass (in kilograms) divided by the square of height (in meters). Subjects are classified as *underweight* if their BMI < 18, as *normal* if their BMI lies in the interval [18, 25], and as *overweight* if their BMI > 25. The compact form of the model makes it easy to understand, for example, how a change in BMI changes the predicted obesity class.
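To illustrate how compact such a model is, the sketch below encodes the BMI-based classification as a plain R function; the function name and the example values are ours and serve only as an illustration of the thresholds given above.

```r
# A minimal sketch of the BMI-based obesity model described above;
# the function name and the example values are illustrative only.
bmi_class <- function(mass_kg, height_m) {
  bmi <- mass_kg / height_m^2
  if (bmi < 18) {
    "underweight"
  } else if (bmi <= 25) {
    "normal"
  } else {
    "overweight"
  }
}

bmi_class(70, 1.75)   # BMI of about 22.9, so the predicted class is "normal"
```

With such a representation, it is immediately clear, for instance, that increasing the mass from 70 to 80 kilograms for a height of 1.75 meters (BMI of about 26.1) changes the predicted class from *normal* to *overweight*.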
The structure of a glass-box model is, in general, easy to understand. It may be difficult to collect the necessary data, build the model, fit it to the data, or perform model validation, but once the model has been developed, its interpretation and mode of working are straightforward.
Why is it important to understand a model's structure? There are several important advantages. If the structure is transparent, we can easily see which explanatory variables are included in the model and which are not. Hence, for instance, we may be able to question a model from which a particular explanatory variable is excluded. Also, in the case of a model with a transparent structure and a limited number of coefficients, we can easily link changes in the model's predictions with changes in particular explanatory variables. This, in turn, may allow us to challenge the model on the grounds of domain knowledge if, for instance, the effect of a particular variable on predictions is inconsistent with previously-established results. Note that linking changes in the model's predictions to changes in particular explanatory variables may be difficult when there are many variables and/or coefficients in the model. For instance, a classification tree with hundreds of nodes is difficult to understand, as is a linear regression model with hundreds of coefficients.
(ref:BILLCD8Desc) An example of a decision-tree model for risk stratification of melanoma patients, developed by @BILLCD8. The model is based on two explanatory variables, Breslow thickness and the presence of tumor-infiltrating lymphocytes. These two variables classify patients into three groups with different probabilities of survival.
```{r BILLCD8, echo=FALSE, fig.cap='(ref:BILLCD8Desc)', out.width = '60%', fig.align='center'}
knitr::include_graphics("figure/wbBILL8model.png")
```
Note that some glass-box models, like the decision-tree model presented in Figure \@ref(fig:BILLCD8), satisfy by design the explainability laws introduced in Section \@ref(three-single-laws). In particular, regarding *prediction's validation*, we see how many patients fall into a given category in each node. With respect to *prediction's justification*, we know which explanatory variables are used in every decision path. Finally, regarding *prediction's speculation*, we can trace how changes in particular variables will affect the model's prediction. We can, of course, argue whether the model is good or not, but its structure is clearly transparent.\index{Model | validation}\index{Prediction’s justification}
Comprehending the performance of black-box models presents more challenges. The structure of a complex model, such as, for example, a neural-network model, may be far from transparent. Consequently, we may not understand which features influence the model's decisions and by how much. As a result, it may be difficult to decide whether the model is consistent with our domain knowledge.
In our book, we present tools that can help in extracting the information necessary for the evaluation of models in a model-agnostic fashion, i.e., in the same way regardless of the complexity of the analyzed model.
## Model-agnostic and model-specific approach {#agnosticspecific}
Interest in model interpretability is as old as statistical modelling itself. Some classes of models have been developed over a long period of time or have attracted intensive research. Consequently, those classes of models are equipped with excellent tools for model exploration, validation, or visualisation. For example:\index{Model-agnostic approach}\index{Model-specific approach}
* There are many tools for diagnostics and evaluation of linear models (see, for example, @Galecki2013 or @Faraway02practicalregression). Model assumptions are formally defined (normality, linear structure, homogeneity of variance) and can be checked by using normality tests or plots (like normal qq-plots), diagnostic plots, tests for model structure, tools for identification of outliers, etc.; a minimal sketch of such diagnostics is shown after this list. A similar situation applies to generalized linear models (see, for example, @Dobson2002).\index{Linear | model}\index{Linear | regression}\index{Model | assumptions}
* For more advanced models with an additive structure, like the proportional hazards model, many tools can be used for checking model assumptions (see, for example, @rms or @sheather2009modern).\index{Proportional hazards model}
* Random forest models are equipped with the out-of-bag method of evaluating performance and several tools for measuring variable importance [@R-randomForest]. Methods have been developed to extract information about possible interactions from the model structure [@randomForestExplainer;@ehrlinger2016ggrandomforests]. Similar tools have been developed for other ensembles of trees, like boosting models (see, for example, @xgboostExplainer or @EIXkarbowiak).
* Neural networks enjoy a large collection of dedicated model-explanation tools that use, for instance, the layer-wise relevance-propagation technique [@BachLWRP], the saliency-maps technique [@SaliencyMaps], or a mixed approach. A summary can be found in @samek2017explainable and @alber2018innvestigate.\index{Neural network}
* The Bidirectional Encoder Representations from Transformers (BERT) family of models has led to high-performing models in natural language processing. The exBERT method [@hoover2019exbert] is designed to visualize the activation of attention heads in these models.
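As an illustration of the first point above, the standard model-specific diagnostics for a linear model are available directly in base R. The sketch below uses a built-in dataset and an arbitrary model formula chosen only for illustration.

```r
# A minimal sketch of model-specific diagnostics for a linear model;
# the dataset (cars) and the model formula are arbitrary illustrative choices.
model_lm <- lm(dist ~ speed, data = cars)

summary(model_lm)     # coefficient estimates, tests, and goodness-of-fit measures
par(mfrow = c(2, 2))
plot(model_lm)        # residuals vs. fitted, normal Q-Q, scale-location, and leverage plots
```

Such diagnostics are tied to the assumptions of the linear model and cannot be applied, for instance, to a random forest or a neural network.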
Of course, the list of model classes with dedicated collections of model-explanation and/or diagnostics methods is much longer. This variety of model-specific approaches does lead to issues, though. For instance, one cannot easily compare explanations for two models with different structures. Also, every time a new architecture or a new ensemble of models is proposed, one needs to look for new methods of model exploration. Finally, no tools for model explanation or diagnostics may be immediately available for brand-new models.
For these reasons, in our book we focus on model-agnostic techniques. In particular, we prefer not to assume anything about the model structure, as we may be dealing with a black-box model with an unspecified structure. Note that often we do not have access to model coefficients, but only to a specified Application Programming Interface (API) that allows querying remote models as, for example, in Microsoft Cognitive Services [@MicrosofrCognitiveServices]. In that case, the only operation that we may be able to perform is the evaluation of a model on a specified set of data.\index{API|see{Application Programming Interface}}\index{Application Programming Interface}\index{Microsoft Cognitive Services}
However, while we do not assume anything about the structure of the model, we will assume that the model operates on a $p$-dimensional vector of explanatory variables/features and, for a single observation, it returns a single value (score/probability), which is a real number. This assumption holds for a broad range of models for data such as tabular data, images, text data, videos, etc. It may not be suitable for models with memory, like sequence-to-sequence models [@seq2seq] or Long Short-Term Memory models [@lstm], in which the output depends also on the sequence of previous inputs, nor for generative models that output text or images.
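To make this assumption concrete, the sketch below shows what we mean by evaluating a model on a specified set of data: regardless of whether the underlying model is a linear model or a random forest, it is reduced to a function that takes a data frame of explanatory variables and returns a vector of numeric scores. The wrapper names and the choice of dataset and models are ours and serve only as an illustration.

```r
# A minimal sketch of the model-agnostic view: a model is reduced to a function
# mapping explanatory variables to a numeric score. The dataset (mtcars) and
# the two models are arbitrary choices made only for illustration.
library(randomForest)

model_lm <- lm(mpg ~ ., data = mtcars)
model_rf <- randomForest(mpg ~ ., data = mtcars)

# the only operation we rely on: evaluate a model on a specified set of data
predict_lm <- function(newdata) as.numeric(predict(model_lm, newdata))
predict_rf <- function(newdata) as.numeric(predict(model_rf, newdata))

predict_lm(mtcars[1:3, ])   # three real-valued predictions
predict_rf(mtcars[1:3, ])   # same interface, different black box
```

All methods discussed in this book operate on such a prediction function, which is why they can be applied in the same way to models of very different structure.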
## The structure of the book {#bookstructure}
This book is split into four major parts. In the first part, *Introduction*, we introduce notation, datasets, and models used in the book. In the second part, *Instance-level Exploration*, we present techniques for exploration and explanation of a model's predictions for a single observation. In the third part, *Dataset-level Exploration*, we present techniques for exploration and explanation of a model for an entire dataset. In the fourth part, *Use-case*, we apply the methods presented in the previous parts to an example in which we want to assess the value of a football player. The structure of the second and the third part is presented in Figure \@ref(fig:UMEPpiramide).\index{Dataset-level exploration}
(ref:UMEPpiramideCaption) Model exploration methods presented in the book. The left-hand side (corresponding to the second part of the book) focuses on instance-level exploration, while the right-hand side (corresponding to the third part of the book) focuses on dataset-level exploration. Consecutive layers of the stack correspond to a deeper level of model exploration. The layers are linked to the laws of model explanation introduced in Section \@ref(three-single-laws).
```{r UMEPpiramide, echo=FALSE, fig.cap='(ref:UMEPpiramideCaption)', out.width = '100%', fig.align='center'}
knitr::include_graphics("figure/UMEPpiramide.png",
auto_pdf = TRUE)
```
In more detail, the first part of the book consists of Chapters \@ref(modelDevelopmentProcess)--\@ref(dataSetsIntro). In Chapter \@ref(modelDevelopmentProcess), we provide a short introduction to the process of data exploration and model construction, together with notation and definition of key concepts that are used in consecutive chapters. Moreover, in Chapters \@ref(doItYourselfWithR) and \@ref(doItYourselfWithPython), we provide a short description of R and Python tools and packages that are necessary to replicate the results presented in the book. Finally, in Chapter \@ref(dataSetsIntro), we describe two datasets that are used throughout the book to illustrate the presented methods and tools.
The second part of the book focuses on instance-level explainers and consists of Chapters \@ref(breakDown)--\@ref(summaryInstanceLevel). Chapters \@ref(breakDown)--\@ref(shapley) present methods that allow decomposing a model's predictions into contributions corresponding to each explanatory variable. In particular, Chapter \@ref(breakDown) introduces the break-down (BD) method for additive attributions for predictive models, while Chapter \@ref(iBreakDown) extends this method to attributions that include interactions. Chapter \@ref(shapley) describes Shapley Additive Explanations (SHAP) [@SHAP], an alternative method for decomposing a model's predictions that is closely linked with Shapley values developed originally for cooperative games by @shapleybook1952. Chapter \@ref(LIME) presents a different approach to the explanation of single-instance predictions. It is based on a local approximation of a black-box\index{Black-box} model by a simpler glass-box one. In this chapter, we discuss the Local-Interpretable Model-agnostic Explanations (LIME) method [@lime]. These chapters correspond to the second layer of the stack presented in Figure \@ref(fig:UMEPpiramide).\index{SHAP| see {Shapley Additive Explanations}}\index{Shapley Additive Explanations}
In Chapters \@ref(ceterisParibus)--\@ref(localDiagnostics), we present methods based on ceteris-paribus (CP) profiles. The profiles show the change in model-based predictions induced by a change in a single explanatory variable. The profiles are introduced in Chapter \@ref(ceterisParibus), while Chapter \@ref(ceterisParibusOscillations) presents a CP-profile-based measure that summarizes the impact of a selected variable on the model’s predictions. The measure can be used to determine the order of variables in model exploration. It is particularly important for models with large numbers of explanatory variables. Chapter \@ref(localDiagnostics) focuses on model diagnostics. It describes local-stability plots that are useful for investigating the sources of a poor prediction for a particular observation.
The final chapter of the second part, Chapter \@ref(summaryInstanceLevel), compares various methods of instance-level exploration.
The third part of the book focuses on dataset-level exploration and consists of Chapters \@ref(modelLevelExploration)--\@ref(residualDiagnostic). The chapters present methods in the same order as shown in the right-hand side of Figure \@ref(fig:UMEPpiramide). In particular, Chapter \@ref(modelPerformance) presents measures that are useful for the evaluation of the overall performance of a predictive model. Chapter \@ref(featureImportance) describes methods that are useful for the evaluation of an explanatory-variable's importance. Chapters \@ref(partialDependenceProfiles) and \@ref(accumulatedLocalProfiles) introduce partial-dependence and accumulated-dependence methods for univariate exploration of a variable's effect. These methods correspond to the third (from the top) layer of the right-hand side of the stack presented in Figure \@ref(fig:UMEPpiramide). Chapter \@ref(residualDiagnostic) summarizes diagnostic techniques based on model residuals. The final chapter of this part of the book is Chapter \@ref(summaryModelLevel), which summarizes global techniques for model exploration.
The book concludes with Chapter \@ref(UseCaseFIFA), which presents a worked-out example of the model-development process in which we apply all the methods discussed in the second and third parts of the book.
To make the exploration of the book easier, each chapter of the second and the third part of the book has the same structure:
* Section *Introduction* explains the goal of the method(s) presented in the chapter.
* Section *Intuition* explains the general idea underlying the construction of the method(s) presented in the chapter.
* Section *Method* shows mathematical or computational details related to the method(s). This section can be skipped if you are not interested in the details.
* Section *Example* shows an exemplary application of the method(s) with discussion of results.
* Section *Pros and cons* summarizes the advantages and disadvantages of the method(s). It also provides some guidance regarding when to use the method(s).
* Section *Code snippets* shows the implementation of the method(s) in R and Python. This section can be skipped if you are not interested in the implementation.
## What is included in this book and what is not {#whatisinthebook}
The area of model exploration and explainability is growing quickly and comes in many different flavors. Instead of showing every existing method (is that even possible?), we have selected a subset of consistent tools that form a good starting toolbox for model exploration. We focus mainly on what model-exploration and model-explanation tools can accomplish rather than on particular methods. We believe that, by providing knowledge about the potential of model-exploration methods and about the language of model explanation, we will help the reader improve the process of data modelling.
Taking this goal into account, **in this book, we do show**
* how to determine which explanatory variables affect a model's prediction for a single observation. In particular, we present the theory and examples of methods that can be used to explain a single prediction, like break-down plots, ceteris-paribus profiles, local-model approximations, or Shapley values;
* techniques to examine predictive models as a whole. In particular, we review the theory and examples of methods that can be used to explain model performance globally, like partial-dependence plots or variable-importance plots;
* charts that can be used to present the key information in a quick way;
* tools and methods for model comparison;
* code snippets for R and Python that explain how to use the described methods.
On the other hand, **in this book, we do not focus on**
* any specific model. The techniques presented are model-agnostic and do not make any assumptions related to the model structure;
* data exploration. There are very good books on this topic by, for example, @r4ds2019 or @McKinney2012, or the excellent classic by @tukey1977;
* the process of model building. There are also very good books on this topic by, for instance, @MASSbook, @James20147, or @Efron2016;
* any particular tools for model building. These are discussed, for instance, by @Kuhn2013. \index{Model | building}
## Acknowledgements {#thanksto}
This book has been prepared by using the `bookdown` package [@R-bookdown], created thanks to the amazing work of Yihui Xie. A live version of this book is available at the GitHub repository https://github.com/pbiecek/ema. If you find any error, typo, or inaccuracy in the book, we will be grateful for your feedback at this website.
Figures and tables have been created mostly in the R language for statistical computing [@RcoreT] with numerous libraries that support predictive modelling. Just to name a few packages frequently used in this book: `randomForest` [@randomForest], `ranger` [@rangerRpackage], `rms` [@rms], `gbm` [@gbm], or `caret` [@caret]. For statistical graphics, we have used the `ggplot2` package [@ggplot2]. For model governance, we have used `archivist` [@archivist]. Examples in Python were added thanks to the fantastic work of Hubert Baniecki and Wojciech Kretowicz, who develop and maintain the `dalex` library. Most of the presented examples concern models built in the `sklearn` library [@scikitlearn]. The `plotly` library [@plotly] is used to visualize the results.\index{package | archivist}\index{package | DALEX}
We would like to thank everyone who contributed with feedback, found typos, or ignited discussions while the book was being written, including GitHub contributors: Rees Morrison, Alicja Gosiewska, Kasia Pekala, Hubert Baniecki, Asia Henzel, Anna Kozak, Agile Bean, Wojciech Kretowicz, Tuomo Kalliokoski, and Xiaochi Liu. We would like to acknowledge the anonymous reviewers, whose comments helped us to improve the contents of the book. We thank Jeff Webb, Riccardo De Bin, Patricia Martinkova, and Ziv Shkedy for their encouraging reviews. We are very grateful to John Kimmel from Chapman & Hall/CRC Press for his editorial assistance and patience.
Przemek's work on model interpretability started during research trips within the RENOIR (H2020 grant no. 691152) secondments to Nanyang Technological University (Singapore) and the University of California, Davis (USA). He would like to thank Prof. Janusz Holyst for the chance to take part in this project. Przemek would also like to thank Prof. Chris Drake for her hospitality. This book would have never been created without the perfect conditions that Przemek found at Chris's house in Woodland. Last but not least, Przemek would like to thank colleagues from the MI2DataLab and Samsung Research and Development Institute Poland for countless inspiring discussions related to Responsible Artificial Intelligence and Human Oriented Machine Learning.
Tomasz would like to thank colleagues from the Data Science Institute of Hasselt University and from the International Drug Development Institute (IDDI) for their support, which allowed him to find the time to work on the book.