
Traffic noise and children's health: New insights from a machine learning algorithm?

Jan Spilski 1 University Kaiserslautern, Center for Cognitive Science, Erwin-Schrödinger-Straße, Building 57, D-67663 Kaiserslautern, Germany

Christoph Giehl 2 University Kaiserslautern, Center for Cognitive Science, Erwin-Schrödinger-Straße, Building 57, D-67663 Kaiserslautern, Germany

Hendriek Boshuizen 3 RIVM, Rijksinstituut voor Volksgezondheid en Milieu, 3720 VB Bilthoven, the Netherlands

Albert Wong 4 RIVM, Rijksinstituut voor Volksgezondheid en Milieu, 3720 VB Bilthoven, the Netherlands

Kirstin Bergström 5 University Kaiserslautern, Center for Cognitive Science, Erwin-Schrödinger-Straße, Building 57, D-67663 Kaiserslautern, Germany

Thomas Lachmann 6 University Kaiserslautern, Center for Cognitive Science, Erwin-Schrödinger-Straße, Building 57, D-67663 Kaiserslautern, Germany; Centro de Investigación Nebrija en Cognición (CINC), Universidad Nebrija, Madrid, Spain

Maria Klatte 7 University Kaiserslautern, Center for Cognitive Science, Erwin-Schrödinger-Straße, Building 57, D-67663 Kaiserslautern, Germany

1 Jan.Spilski@sowi.uni-kl.de 2 Christoph.Giehl@sowi.uni-kl.de 3 hendriek.boshuizen@rivm.nl 4 albert.wong@rivm.nl 5 mail1@example.com 6 lachmann@rhrk.uni-kl.de 7 klatte@rhrk.uni-kl.de

ABSTRACT

Studies on the influence of traffic noise on children's health are usually very comprehensive and collect data on a large number of variables in comparatively large samples. In the NORAH Study, for example, almost 700 variables were considered, including 91 variables related to traffic noise. Following a theory-based approach, the statistical evaluation of these data focused on a limited number of variables that were included in the regression models as predictors, mediators, moderators, or confounders. In contrast, machine learning (ML) methods are able to consider the complete scope of variables in an analysis. Random forest models are one type of ML method suited to dealing with possible multicollinearity of predictors or non-linear relationships. Although these methods can offer advantages, they have hardly been used in research on traffic noise and children's health. In the EU project EqualLife, random forest models are computed in order to obtain information on the importance of individual exposome factors (e.g., traffic noise) for children's health. In the present paper, we compare the results of a regression model and a random forest model using the NORAH Study as an example. Possible advantages and disadvantages of the methods are discussed.

1. INTRODUCTION

Traffic noise exposure can have an impact on children's health (e.g., [1]). Children are vulnerable to the effects of traffic noise because they are less able than adults to anticipate and cope with environmental noise [2]. Nevertheless, the evidence is less clear for children's health than for children's cognition; Clark and Paunovic [3] accordingly pointed out that the available results are few and inconsistent.

The EU project EqualLife (Early environmental quality and life-course mental health) aims to improve this evidence base. Using existing data sets covering more than 200,000 children, the project is currently investigating the effects of traffic noise, alongside many other influencing variables, on the mental health of children in Europe (for more information on the EqualLife project, see [4]).

However, selecting the essential variables for analysis from very large and heterogeneous data sets is a major challenge for researchers. For example, almost 700 variables were considered in the NORAH data set, including 91 variables related to traffic noise. Following a theory-based approach, the statistical evaluation of these data focused on a limited number of variables that were included in the regression models as predictors, mediators, moderators, or confounders. Even with this limited number of variables, however, the use of regression models quickly became challenging, for example because of multicollinearity among predictors.

In contrast, machine learning (ML) methods are able to consider the complete scope of variables in an analysis. Random forest models [5] are one type of ML method suited to dealing with possible multicollinearity of predictors or non-linear relationships. Although these methods can offer advantages, they have hardly been used in research on traffic noise and children's health.

For this reason, we present the results of random forest models for one selected health-related outcome (intake of medically prescribed drugs) from the NORAH data set [6]. Our contribution is twofold. First, we illustrate how a random forest approach can be used to discover the most important predictors of the selected outcome. The goal is to exploit the wide range of potential influencing factors included in the NORAH Study rather than being limited to a few predictors. Exploring further relevant predictors of aircraft noise effects can then provide impetus for additional theoretical considerations. Second, we ask whether effects previously found in theory-based models can also be discovered by this “theory-free” analysis approach. To this end, we compare the random forest results to those of previously reported theory-based regression analyses of the NORAH data [7] in order to weigh the strengths and weaknesses of both approaches.

2. METHOD

2.1. Participants

The present data set was collected within the NORAH Study. Detailed information on sampling is provided by Klatte et al. [6]. The children's mean age was 8 years and 4 months (SD = 5 months). Complete data on children's health and living environment at home were available for 1,118 children and their parents.

2.2. Assessment of Traffic Noise Exposure and Room Acoustics

Road traffic and railway noise levels (LAeq, Lden, LAmax) were estimated for the children's school and home addresses using a combination of information provided by local authorities (e.g., traffic flow data, number of train journeys; for details see [8]). Aircraft noise levels at the children's school and home addresses (LAeq, Lden, LAmax) and the number above threshold (NAT) in bands of 5 dB(A) were calculated for the 12 months preceding data collection (for details see [8]). Note that NORAH focused on identifying aircraft noise effects; accordingly, of the 91 metrics used, 58 relate to aircraft noise exposure and 33 to road and railway noise. Descriptive statistics for selected metrics at the home address are shown in Table 1.
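The NAT bands can be illustrated with a minimal Python sketch. It assumes that each aircraft event is summarised by its maximum level (LAmax) and that a band counts events with b ≤ LAmax < b + 5, matching band labels such as "≥60 <65 dB(A)" in Table 1. The helper name nat_by_band and the event levels are hypothetical; how NORAH aggregated these counts over the 12-month period is described in [8] and is not reproduced here.

```python
def nat_by_band(event_lamax, lower=60, upper=85, width=5):
    """Count events per level band [b, b + width) between lower and upper dB(A).

    Hypothetical helper (illustration only): assumes each event is
    summarised by its LAmax value.
    """
    counts = {}
    for b in range(lower, upper, width):
        counts[(b, b + width)] = sum(1 for level in event_lamax if b <= level < b + width)
    return counts


# Hypothetical LAmax values (dB(A)) of individual overflights at one address
events = [58.0, 61.2, 64.9, 65.0, 71.3, 76.8, 79.9, 84.0, 86.5]
for (lo, hi), n in nat_by_band(events).items():
    print(f">={lo} <{hi} dB(A): {n}")
```

Events below the lowest band (58.0) or above the highest band (86.5) are simply not counted, which is how banded counts differ from a single "number above threshold".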

Room acoustics were determined for the children's classrooms. For example, sound insulation was estimated according to German standards (VDI 2719, 1987 [9]), and reverberation times were classified according to DIN 18041:2004 [10], which served as a guideline for the room-acoustic requirements for classrooms. For more information about room acoustics in the NORAH Study, see Klatte et al. [11].

Table 1: Descriptive statistics for selected traffic noise metrics (NORAH Study, home address)

Metric | Aircraft M (range) | Road traffic M (range) | Railway M (range)
LAeq,06-22, dB(A) | 49.15 (36.40-60.80) | 53.24 (36.20-77.30) | 45.94 (30.00-76.80)
LAeq,22-06, dB(A) | 42.47 (30.00-55.10) | 45.05 (30.10-67.80) | 46.58 (30.00-78.10)
Lden,00-24, dB(A) | 51.74 (40.80-63.60) | 54.86 (37.40-78.50) | 51.24 (30.00-84.20)
LAmax,06-22, dB(A) | 67.12 (51.00-83.00) | - | -
LAmax,22-06, dB(A) | 57.95 (39.00-76.00) | - | -
LAmax,00-24, dB(A) | - | 62.32 (35.00-96.00) | 61.81 (30.00-97.00)
NAT day 06-22, ≥60 <65 dB(A) | 45.52 (0.04-165.23) | - | -
NAT day 06-22, ≥65 <70 dB(A) | 28.77 (0.01-180.37) | - | -
NAT day 06-22, ≥70 <75 dB(A) | 11.60 (0.00-121.21) | - | -
NAT day 06-22, ≥75 <80 dB(A) | 3.90 (0.00-40.73) | - | -
NAT day 06-22, ≥80 <85 dB(A) | 0.88 (0.00-20.76) | - | -
NAT night 22-06, ≥60 <65 dB(A) | 4.61 (0.00-15.37) | - | -
NAT night 22-06, ≥65 <70 dB(A) | 3.05 (0.00-14.79) | - | -
NAT night 22-06, ≥70 <75 dB(A) | 1.33 (0.00-12.31) | - | -
NAT night 22-06, ≥75 <80 dB(A) | 0.41 (0.00-5.31) | - | -
NAT night 22-06, ≥80 <85 dB(A) | 0.08 (0.00-2.08) | - | -

Notes. Taken from Spilski et al., 2019. Table 1 shows a sample of selected traffic noise metrics. A total of 91 traffic noise metrics was included in the random forest calculations.

2.3. Assessment of Living Environment

Population density and average imperviousness (excluding green space) were calculated as "objective" measures of the residential environment alongside the noise metrics. For more details see Spilski et al. [7]. Other variables concerning the living environment (e.g., lots of nature, many places to play, good air) were assessed with questionnaires (see also Table 2 and [11]).

2.4. Materials and Procedure

The following description of the procedure has been published elsewhere (e.g., [6, 12]). Children were questioned in groups comprising whole classes. The scales used in the children's questionnaire included, among others, health-related quality of life (KINDL-R [13]), home environment, annoyance, and noise at home [14]. The children took the parent questionnaire home, where their parents filled it out. The parent questionnaire contained questions concerning children's well-being, living environment, health-related outcomes (e.g., intake of medically prescribed drugs), and potential confounding factors (e.g., SES). More details concerning the procedure, the questionnaires, and the psychometric evaluation are provided by Klatte et al. [11]. Table 2 lists some of the non-acoustic variables and indices that were included in the prediction of children's intake of medically prescribed drugs.

Table 2: Selected non-acoustic NORAH variables that were included in the statistical analyses

Factor | Variable | Measurement
Living environment (LE) | population density (degree of urbanization) | people per km²
LE | average imperviousness (gray space) | percentage of sealed areas around the home addresses (INSPIRE grid)
LE | lots of nature; bad air due to industry/trade; traffic noise (air, road, railway) disturbs; many places to play | parental judgements, 4-point scale (1 = is not true at all; …; 4 = is exactly right)
LE | where I live - good air; where I live - beautiful houses | child judgements, 4-point scale (1 = is not true at all; …; 4 = is exactly right)
Socio factors (demographic & social, SF) | gender | 0 = boy, 1 = girl
SF | age | numeric age
SF | socio-economic status (SES) | household index, value range 3 to 21
SF | migration background | 0 = no migration background, 1 = migration background
SF | crowding at home | size of home in m² divided by number of people in the household
SF | parental support for school (index, 4 items); friends (index, 2 items) | parental judgements, 5-point scale (1 = never; …; 5 = very often)
SF | class climate, prosocial (index, 6 items) | child judgements, 4-point scale (1 = is not true at all; …; 4 = is exactly right)
Wellbeing & health (WH) | psychological (index, 6 items); physical (index, 3 items) | parental judgements, 5-point scale (1 = never; …; 5 = very often)
WH | poor sleep due to (aircraft, family) noise | child judgements, 4-point scale (1 = is not true at all; …; 4 = is exactly right)

2.5. Statistical Analysis

Machine Learning Algorithm

For the statistical analysis, we included 570 predictor variables (including 91 traffic noise exposure metrics; examples are given in Tables 1 and 2) in the random forest models. The random forest algorithm is a widely used ensemble learning method based on decision trees [5]. This supervised machine learning model can be used for both classification and regression problems, depending on the type of outcome variable. In the reanalysis of the NORAH data, the random forest algorithm was used for a regression task. Random forests combine multiple individual decision trees, each of which is fitted on a bootstrap sample of the data. Moreover, a different random subset of the variables is considered at each split within a tree. In the regression setting, variance reduction serves as the splitting criterion to determine the optimal split. The final prediction is obtained by averaging the predictions of the individual trees. By combining many decision trees, prediction accuracy can be substantially increased compared with a single decision tree [5]. The analysis was implemented with the "randomForest" R package [15], in combination with the "iml" [16] and "partykit" [17] R packages to obtain additional measures of model quality.
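The steps described above (bootstrap samples, a random subset of variables per split, variance reduction as the splitting criterion, averaging over trees) can be made concrete with a minimal pure-Python sketch. This is not the randomForest implementation used in the analysis: for brevity, each tree is reduced to a single split (a "stump") rather than a deeper tree, and all function names are hypothetical. The parameter name mtry mirrors the corresponding randomForest argument.

```python
import random
from statistics import mean


def sum_sq(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y, features):
    """Return (feature, threshold, gain) with the largest variance reduction."""
    parent = sum_sq(y)
    best = (None, None, 0.0)
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            gain = parent - (sum_sq(left) + sum_sq(right))
            if gain > best[2]:
                best = (j, t, gain)
    return best


def fit_stump(X, y, mtry, rng):
    """Hypothetical one-split tree that considers only a random subset of mtry variables."""
    features = rng.sample(range(len(X[0])), mtry)
    j, t, _ = best_split(X, y, features)
    if j is None:  # no split reduces variance: predict the node mean
        m = mean(y)
        return lambda row: m
    left_mean = mean(yi for row, yi in zip(X, y) if row[j] <= t)
    right_mean = mean(yi for row, yi in zip(X, y) if row[j] > t)
    return lambda row: left_mean if row[j] <= t else right_mean


def fit_forest(X, y, n_trees=100, mtry=1, seed=0):
    """Fit stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw n rows with replacement
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return lambda row: mean(tree(row) for tree in trees)
```

With two predictors and mtry = 1, roughly half of the stumps only get to split on an irrelevant variable, which illustrates how the random variable subsets decorrelate the trees before their predictions are averaged.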

Besides predicting the outcome variable, random forests can also be used to identify relevant predictor variables from a larger set of variables. By producing variable importance measures, random forests offer a nonlinear and nonparametric approach to variable selection. In this study, we used the permutation method to compute variable importance. The relative importance of a predictor is derived from the difference in prediction error before and after randomly permuting the values of that predictor. A variable is important if permuting its values increases the prediction error. Variables with large importance values contribute to the prediction and are therefore associated with the outcome variable, whereas variables that are unrelated to the outcome have importance values close to zero.
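The permutation idea can be sketched in a few lines of Python. This is a simplified illustration, not the randomForest computation: it permutes each column of a whole data set and reports the mean increase in squared error for an arbitrary prediction function, whereas randomForest evaluates the permutation on each tree's out-of-bag observations and, by default, normalizes the averaged differences by their standard deviation across trees. The function names and the toy model are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Hypothetical helper: mean increase in MSE after permuting each predictor column."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the association between predictor j and y
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy example: the outcome depends on x0 only; x1 is irrelevant.
X = [[float(i), float(i % 3)] for i in range(30)]
y = [2.0 * row[0] for row in X]
print(permutation_importance(lambda row: 2.0 * row[0], X, y))
```

Because the toy model ignores x1, permuting that column leaves every prediction unchanged and its importance is exactly zero, while permuting x0 inflates the error.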

Variable importance can be expressed either as the mean decrease in prediction accuracy (referred to as "MeanDecreaseAccuracy", MDA), which is obtained by randomly permuting the values of the predictor variable in a so-called out-of-bag sample, or as the mean decrease in the Gini coefficient (referred to as "MeanDecreaseGini", MDG), which accumulates the decrease in node impurity achieved by splits on the predictor during tree growing [18]. As described earlier, each tree in a random forest is grown on a bootstrap sample of the observations, and a random subset of variables is considered at each split. The observations drawn into the bootstrap sample are "in-the-bag" for that tree; the out-of-bag sample thus contains the observations not included in the bootstrap sample of a specific tree [19]. The MDA can be interpreted as the loss of explained variance of the outcome, averaged over all trees, if the predictor were omitted from the model. The Gini coefficient, as applied in random forests, can be interpreted as a measure of how much a predictor contributes to the homogeneity of the nodes in the resulting forest, with larger gains in node homogeneity across the trees representing higher variable importance (in the regression setting, node impurity is measured by the residual sum of squares rather than the Gini index). The MDG can thus be interpreted as the loss of node homogeneity across the trees if the predictor were omitted from the model [20].
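The distinction between in-bag and out-of-bag observations can be illustrated with a short Python sketch (the helper name is hypothetical). For a bootstrap sample of size n drawn with replacement, each observation is left out with probability (1 - 1/n)^n, which approaches e^-1 ≈ 0.368 for large n, so roughly a third of the observations are available as out-of-bag data for evaluating each tree.

```python
import random


def bootstrap_with_oob(n, rng):
    """Hypothetical helper: draw one bootstrap sample of n row indices; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
    out_of_bag = sorted(set(range(n)) - set(in_bag))  # rows never drawn for this tree
    return in_bag, out_of_bag


rng = random.Random(42)
n = 10_000
in_bag, oob = bootstrap_with_oob(n, rng)
print(f"out-of-bag share: {len(oob) / n:.3f}")  # close to 0.368 for large n
```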

Besides identifying relevant predictor variables in the data set, variable importance measures also improve the interpretability of the random forest analysis. This paper focuses on the outcome intake of medically prescribed drugs and uses the MDA as the importance metric. To obtain robust prediction trees, 1,000 randomization trials with a maximum of 5 nodes per tree were performed.

3. RESULTS

First, we report the results of the random forest models. In the second step, these results are discussed in the context of previously reported results [7].

3.1. Results of the Random Forest Models

A total of 570 predictors were included in the random forest analysis. Figure 1 shows the results of the random forest analysis for the 20 most important predictors. Not surprisingly, diagnosed diseases such as asthma or allergies predict the intake of medically prescribed drugs; these two variables are the best predictors of medication use.

The traffic noise related predictors of the outcome intake of medically prescribed drugs are highlighted in grey. Among the traffic noise metrics, only aircraft noise exposure emerged as an important predictor. Both average aircraft noise exposure levels and the number of flight events (NAT) were found to be relevant predictors. The most important were the LAeq for the period between 8 a.m. and 2 p.m. and the number of aircraft noise events between 10 p.m. and 6 a.m. with levels of at least 75 and below 80 dB(A). Thus, both night and day metrics are important for predicting the intake of medically prescribed drugs. Notably, all important aircraft noise metrics relate to the home address, whereas school-related levels did not prove to be important. For the school environment, however, a room acoustics parameter was found to be a relevant predictor. Apart from aircraft noise exposure, comparatively few living environment and social variables emerged as important predictors.

Figure 1: Results of random forest algorithm – Top 20 variable importance plot, MDA = Mean Decrease Accuracy

As indicated in Tables 1 and 2, the predictors can be clustered into four groups: 1. TNA = traffic noise & acoustics, 2. LE = living environment, 3. WH = wellbeing & health, 4. SF = socio factors (social & demographic). In the random forest for the outcome intake of medically prescribed drugs, traffic noise exposure and room acoustics metrics made up the largest share of the top 20 predictors (70 %), followed by wellbeing & health related variables (15 %); see Table 3.


Table 3: Frequency of predictors being among the top 20 predictors

Group | Frequency | Relative frequency
1 TNA = traffic noise & acoustics | 14 | .70
2 WH = wellbeing & health | 3 | .15
3 SF = socio factors (social & demographic) | 2 | .10
4 LE = living environment | 1 | .05
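The relative frequencies in Table 3 are each group's count divided by the 20 top-ranked predictors. A minimal Python check follows; the list of group labels is reconstructed from the counts in Table 3, not taken from the actual variable ranking.

```python
from collections import Counter

# Group label of each top-20 predictor, reconstructed from the counts in Table 3
top20_groups = ["TNA"] * 14 + ["WH"] * 3 + ["SF"] * 2 + ["LE"] * 1

counts = Counter(top20_groups)
relative = {group: count / len(top20_groups) for group, count in counts.most_common()}
print(relative)  # {'TNA': 0.7, 'WH': 0.15, 'SF': 0.1, 'LE': 0.05}
```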

3.2. Random Forest Results in the Context of Previous Regression Results

To place our re-analysis in the context of previous results, we use the regression models reported by Spilski et al. [7]. They calculated mediation and moderated regression models using IBM SPSS 25 and the PROCESS 3.3 macro by Hayes [21]. The focus of these analyses was the impact of aircraft noise exposure on health. The LAeq at the home address between 6 a.m. and 10 p.m. served as the aircraft noise exposure metric. Intake of medically prescribed drugs was one of the health-related outcomes analysed. Aircraft noise annoyance was included as a mediator. All models also accounted for further influences (e.g., age, gender, socioeconomic status (SES), road traffic and railway noise at home).

A total of three hypotheses were tested. Hypothesis 1 tested a simple mediation. For the outcome intake of medically prescribed drugs, the results showed neither a direct effect of aircraft noise exposure nor a mediation effect (p > .05; 95% CI includes 0). Hypotheses 2 and 3 extended the mediation model by moderation. For hypothesis 2, population density (degree of urbanization) was used as a moderator. Spilski et al. (2019) found a significant interaction of aircraft noise exposure (X) and degree of urbanization (W): W×X: b = -.122, SE = .051, p = .018; 95% CI: -.223/-.021. This result indicates that the likelihood of taking medically prescribed drugs increased with increasing aircraft noise exposure in areas of medium urbanization, but not in highly urbanized areas.

For hypothesis 3, the model was extended by a second moderator, the degree of imperviousness. The degree of imperviousness (Z) proved to be another significant moderator of the direct relationship between aircraft noise exposure (X) and intake of medically prescribed drugs (Z×X: b = -.003, SE = .001, p < .01; 95% CI: -.004/-.001). According to Spilski et al. (2019), this result indicates that “The probability of taking medically prescribed drugs increased with increasing aircraft noise exposure in areas with a low degree of imperviousness (-1SD), but not in medium (0SD) or high degree of imperviousness (+1SD)”.
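To make the moderation logic explicit, the model for hypothesis 3 can be written in a generic conditional-process form following Hayes [21]. This is a simplified sketch under stated assumptions, not the exact specification used in [7]: covariates are collapsed into one sum, the X → M path is shown unmoderated, and, assuming the outcome is coded as a dichotomy, the outcome equation is written on the log-odds scale. M denotes aircraft noise annoyance, b denotes the M → Y path in Hayes' notation (not the b values reported above), and dropping the Z terms gives the model for hypothesis 2.

```latex
\begin{aligned}
M &= i_M + a X + \textstyle\sum_k h_k C_k \\
\operatorname{logit} P(Y = 1) &= i_Y + c'_1 X + c'_2 W + c'_3 Z + c'_4 XW + c'_5 XZ + b M + \textstyle\sum_k g_k C_k \\
\theta_{X \to Y}(W, Z) &= c'_1 + c'_4 W + c'_5 Z
\end{aligned}
```

The conditional direct effect θ is the derivative of the outcome equation with respect to X while M is held constant. Because the reported W×X and Z×X coefficients are negative, θ decreases as urbanization or imperviousness increases, which matches the simple-slope pattern quoted above: an association at low or medium values of the moderator that disappears at high values.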

The variables that served as moderators in these regressions were also included in the random forest models. However, their estimated importance relative to other variables was quite low. These moderators were neither among the top 20 most important predictors nor identified by indicators of interaction strength within the models.

4. CONCLUSIONS

While data sets in studies on aircraft noise effects often include a wide range of potential predictor variables, regression models quickly reach their limits because of statistical problems such as multicollinearity. The aim of the present paper was to show how random forest models can be used to discover the most important predictors of a selected variable (intake of medically prescribed drugs). To this end, we included 570 predictor variables (including 91 traffic noise exposure metrics) in the random forest analyses. The results showed that various aircraft noise exposure metrics were important predictors of the intake of medically prescribed drugs: thirteen aircraft noise exposure metrics were among the top 20 predictors. These covered different time periods and metric types (LAeq, LAmax, NAT), and included the LAeq,06-22 dB(A) used by Spilski et al. [7].

These findings indicate that random forest analyses can be suitable for identifying important noise metrics as predictors. Potentially, they can even help to advance the discussion on more suitable noise metrics; for example, the random forest results indicated that LAeq,08-14 or the night-time NAT (≥75 <80 dB(A)) are even better predictors than LAeq,06-22, which could inform future regression models. Surprisingly, only aircraft noise exposure metrics were important, whereas road and railway noise exposure variables were not. Moreover, only aircraft noise exposure metrics for the home address were important predictors; aircraft noise exposure metrics for the school address were less important for predicting the intake of medically prescribed drugs. These results show that random forest models can serve as a tool for identifying the (relative) importance of predictors that could not be included in one and the same regression model. To avoid multicollinearity, a multitude of sequentially conducted regressions would otherwise have been necessary, which would have inflated the type I error rate.

Although the random forest approach was able to identify aircraft noise exposure as an important predictor of the intake of medically prescribed drugs, the comparison with theory-based results also revealed potential weaknesses of the approach. Previously reported theory-based regression analyses of the NORAH data had found that degree of urbanization and degree of imperviousness moderate aircraft noise effects [7]. However, these moderation effects were not reflected in the overall interaction strength in the random forest analyses. Theory-based research therefore remains important for revealing effects that exploratory methods may not identify.

Taken together, random forest analyses can be used to identify important predictors in large data sets holistically, without being constrained by the statistical limitations of regression models. However, beyond predictor importance, regression models remain the preferred method for making statements about the statistical significance of a predictor. As an outlook, and as an area where random forest and regression models may complement each other, future analyses can benefit from the ability of random forest models to explore and map non-linear relationships. So-called “partial dependence plots” for individual predictors can provide a better understanding of potential turning points in non-linear relationships.

5. ACKNOWLEDGEMENTS

This study is part of the NORAH research project and the EU Horizon 2020 project EqualLife. NORAH was commissioned by the Environment & Community Center / Forum Airport & Region, Kelsterbach, Germany. EqualLife is funded by the European Commission under Horizon 2020, grant agreement ID 874583.

6. REFERENCES

1. Stansfeld, S. & Clark, C. Health Effects of Noise Exposure in Children. Current Environmental Health Reports, 2, 171–178 (2015). doi: 10.1007/s40572-015-0044-1

2. van Kamp, I. & Davies, H. Noise and health in vulnerable groups: A review. Noise & Health, 15, 153–159 (2013). doi: 10.4103/1463-1741.112361

3. Clark, C. & Paunovic, K. WHO Environmental Noise Guidelines for the European Region: A Systematic Review on Environmental Noise and Quality of Life, Wellbeing and Mental Health. Int J Environ Res Public Health, 15(11) (2018).

4. van Kamp, I., Persson Waye, K., Kanninen, K., Gulliver, J., Bozzon, A., Psyllidis, A … Early environmental quality and life-course mental health effects: The Equal-Life project. Environmental Epidemiology, 6, e183 (2021).

5. Breiman, L. Random Forests. Machine Learning, 45, 5–32 (2001). doi: 10.1023/A:1010933404324

6. Klatte, M., Spilski, J., Mayerl, J., Möhler, U., Lachmann, T. & Bergström, K. Effects of aircraft noise on reading and quality of life in primary school children in Germany: Results from the NORAH study. Environment and Behavior, 49(4), 390–424 (2017).

7. Spilski, J., Rumberg, M., Berchterhold, M., Bergström, K., Möhler, U., Kurth, D., Lachmann, T. & Klatte, M. Effects of aircraft noise and living environment on children's wellbeing and health. Proceedings of the 23rd International Congress on Acoustics, pp. 7080–7087. Aachen, Germany, September 2019. doi: 10.18154/RWTH-CONV-239176

8. Möhler, U., Liepert, M., Mühlbacher, M., Beronius, A., Nunberger, M., Braunstein, G. & Bartel, R. Verkehrslärmwirkungen im Flughafenumfeld: Erfassung der Verkehrsgeräuschexposition [Effects of traffic noise in areas surrounding airports: Report of methods of noise exposure assessment] (2014). Retrieved from http://www.laermstudie.de/ergebnisse/basismodul-akustik/

9. VDI 2719. Schalldämmung von Fenstern und deren Zusatzeinrichtungen [Sound isolation of windows and their auxiliary equipment]. Berlin, Germany: Beuth (1987).

10. DIN 18041. Hörsamkeit in kleinen bis mittelgroßen Räumen [Acoustic quality in small to medium-sized rooms]. Berlin, Germany: Beuth (2004).

11. Klatte, M., Bergström, K., Spilski, J., Mayerl, J. & Meis, M. Wirkungen chronischer Fluglärmbelastung auf kognitive Leistungen und Lebensqualität bei Grundschulkindern. Endbericht [Effects of chronic aircraft noise exposure on cognitive performance and quality of life in primary school children. Final report]. NORAH Verkehrslärmwirkungen im Flughafenumfeld–Endbericht (2014).

12. Spilski, J., Bergström, K., Mayerl, J., Möhler, U., Lachmann, T. & Klatte, M. Do we need different metrics to predict the effects of aircraft noise on children's well-being and health? Proceedings of INTER-NOISE 2020. Seoul, Republic of Korea, August 2020.

13. Ravens-Sieberer, U., Klasen, F., Bichmann, H., Otto, C., Quitmann, J. & Bullinger, M. Erfassung der gesundheitsbezogenen Lebensqualität von Kindern und Jugendlichen [Measuring the health-related quality of life of children and adolescents]. Gesundheitswesen, 75(10), 667–678 (2013).

14. Ising, H., Pleines, F. & Meis, M. Beeinflussung der Lebensqualität von Kindern durch militärischen Fluglärm [Influence of military aircraft noise on the quality of life of children]. Umweltbundesamt, Institut für Wasser-, Boden- und Lufthygiene [Federal Environment Agency, Institute for Water, Soil and Air Hygiene]. BaWoLu-Hefte, 5 (1999).

15. Breiman, L. & Cutler, A. Package 'randomForest', v.4.7-1. Breiman and Cutler's Random Forests for Classification and Regression (2022). https://cran.r-project.org/web/packages/randomForest/randomForest.pdf (28.04.2022).

16. Molnar, C. & Schratz, P. Package 'iml', v.0.10.1. Interpretable Machine Learning (2020). https://cran.r-project.org/web/packages/iml/iml.pdf (28.04.2022).

17. Hothorn, T., Seibold, H. & Zeileis, A. Package 'partykit', v.1.2-15. A Toolkit for Recursive Partitioning (2021). https://cran.r-project.org/web/packages/partykit/partykit.pdf (28.04.2022).

18. Louppe, G., Wehenkel, L., Sutera, A. & Geurts, P. Understanding variable importances in forests of randomized trees. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani & K.Q. Weinberger (Eds.), Advances in Neural Information Processing Systems, 26 (2013).

19. James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning. Springer, pp. 316–321 (2013).

20. Han, H., Guo, X. & Hu, H. Variable selection using Mean Decrease Accuracy and Mean Decrease Gini based on Random Forest. 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 219–224 (2016). doi: 10.1109/ICSESS.2016.7883053

21. Hayes, A. F. Introduction to mediation, moderation, and conditional process analysis: A regression-based approach, 2nd ed. New York, NY: The Guilford Press (2018).