Polynomial Chaos-Based Procedural Generation of Synthetic Training Data in Machine Learning for Automated Acoustic Monitoring

Ömer F. Yildiz 1 , Sören Keuchel 2 , Olgierd Zaleski 3

Novicos GmbH Veritaskai 8, 21079 Hamburg, Germany

Peter Gross 4 , Julian B. Storch 5 , Matthias Weigold 6

Institute of Production Management, Technology and Machine Tools Otto-Berndt-Straße 2, 64287 Darmstadt, Germany

ABSTRACT

In additive manufacturing processes such as powder bed fusion, acoustic monitoring, which ensures timely process termination in case of failure, is commonly performed by ear and is therefore highly susceptible to human bias. Solutions based on machine learning algorithms need large datasets for training purposes, which are not readily available. Additionally, capturing high-quality audio samples and providing the respective material parts are expensive in terms of both time and cost. To overcome this problem, this work proposes a method by which the required synthetic datasets are obtained by way of procedural generation. Here, synthetic data implies the substitution of measured audio data by equivalent virtual and artificial samples from 3D acoustic simulations. In order to cover process variations as well as the variability of multiple input parameters, a design of experiments based on the theory of generalized polynomial chaos is conducted. Additionally, the polynomial chaos method is extended through the use of a decision tree so that the prevalence of specific critical events may be accounted for.

1. INTRODUCTION

Additive manufacturing processes, such as powder bed fusion (PBF) with a laser beam on metal, enable the construction of complex structures directly without further assembly [1]. PBF places fewer constraints on the final geometry of components and is therefore a popular choice for lightweight construction. However, the melting and solidification mechanisms involved result in defects that prevent the potential of PBF from being fully exploited. These defects include tears and fissures [2] due to high temperature

1 yildiz@novicos.de

2 keuchel@novicos.de

3 zaleski@novicos.de

4 p.gross@ptw.tu-darmstadt.de

5 julianbastin.storch@stud.tu-darmstadt.de

6 m.weigold@ptw.tu-darmstadt.de

inter.noise, 21-24 August, Scottish Event Campus, Glasgow

gradients and high internal stress, delamination [3] due to insufficient and poor connection during the layering process, thermally induced warpage [4] appearing particularly in areas of low stiffness, and plastic deformation caused by collision with the coater. Other defects such as porosity and balling [5] may not be as serious or critical, but they do not satisfy the quality requirements demanded in aerospace and aviation, thus justifying early termination of the process nonetheless. Figure 1 contains cross-sectional photographs of some exemplary material defects occurring during PBF. To save


Figure 1: Cross-sectional photograph of exemplary material defects occurring during PBF: (a) tears and fissures, (b) delamination, (c) pores, and (d) balling.

resources and reduce rejections, process and quality control systems are deployed which typically rely on optical sensors [6–8]. Apart from reliability issues [9], the control systems are usually not automated and require machine operators to decide whether or not the process should be stopped. Experienced operators can perceive deviations from common system noise, which, in turn, motivates the engineering of an acoustics-based control. To decrease human subjectivity, the authors aim to develop a control system that is based on a machine learning (ML) algorithm trained specifically for the purpose of automated acoustic monitoring. The success of any ML algorithm is determined by both the quality and the quantity of appropriate training data, which in this case translates into accurate measurement of an extraordinarily large set of audio samples [10]. Naturally, there are too few adequate datasets [11–13]. Alternatively, one may turn to synthetic training data, meaning data obtained numerically from acoustics simulations of equivalent three-dimensional (3D) models using established state-of-the-art methods such as the boundary element method (BEM) or the finite element method (FEM). This work therefore focuses on how a suitable design of experiments (DoE) can be generated in a procedural or systematic manner and is structured as follows: In Section 2, polynomial chaos expansion (PCE) is introduced and extended by a decision tree; together they serve as a basis for the procedural generation of a DoE. In Section 3, an example shows how the proposed method can be applied to an analytical, auxiliary function. Finally, in Section 4 conclusions are drawn and an outlook on future research is given.

2. GENERALIZED POLYNOMIAL CHAOS

For synthetic datasets to supplement or replace measured ones, they need to satisfy several conditions to be considered useful in the context of ML algorithms. First, the data must contain a diverse set of errors and defects, process and environmental parameters, and a reasonable share of failed as well as successful PBF runs. Second, the training data must reflect the prevalence of certain parameters over others as well as their variability, since the defects do not occur every time and vary in intensity. Lastly, considering the limited computational resources, one is pressured to simulate those cases that best describe the internal dynamics. Each item in such a dataset can be seen as part of a DoE that is procedurally generated. For this purpose, a PCE-based approach is proposed.

2.1. Polynomial Chaos Expansion

The PCE method originated in the domain of uncertainty quantification (UQ) and has been widely used for non-intrusive analysis of stochastic systems subject to random inputs [14–16]. As such, it is primarily used to quantify the variability of the system output $f(\xi)$ with $\xi = (\xi_1, \ldots, \xi_N)$ as the vector of $N$ input random variables $\xi_i$. Extending the theory to the multivariate case is straightforward, which is why, for the sake of clarity, PCE is introduced here using only the univariate case. If the distribution of the input random variable $\xi$ is known, then PCE can be understood as the best polynomial approximation of the real output $f(\xi)$:

$$
f(\xi) = \sum_{i=0}^{P} \hat{f}_i \, \varphi_i(\xi) \tag{1}
$$

Here, $P$ is the order of approximation, $\varphi_i(\cdot)$ are the orthogonal basis functions, and $\hat{f}_i$ are the PCE coefficients that describe the stochastic properties of $f(\xi)$. For example, once the coefficients $\hat{f}_i$ are determined, statistical metrics including expectation, variance, global and local sensitivities, skewness, and kurtosis are readily obtained. The expectation is simply equal to the first coefficient, meaning $E(f(\xi)) = \hat{f}_0$, and the remaining metrics are obtained in a similar fashion. As such, the PCE coefficients are analogous to Fourier coefficients, but instead of representing time-periodic signals in the frequency domain, these coefficients can be said to be representations in the “stochastic” domain. The basis functions $\varphi_i$ are determined by the distribution of $\xi$: In the case of a Gaussian distributed random variable, i.e. $\xi \sim \mathcal{N}(\mu, \sigma)$ with mean $\mu$ and standard deviation $\sigma$, Hermite polynomials must be chosen for an optimal fit. On the other hand, should the input be uniformly distributed, i.e. $\xi \sim \mathcal{U}(a, b)$ with lower and upper bounds $a$ and $b$, respectively, Legendre polynomials are used. The two families of polynomials are depicted in Figure 2a and Figure 2b, respectively. In any case, these

[Figure 2 panels: (a) Hermite Polynomials, $H_0$ to $H_5$ plotted over $\xi$; (b) Legendre Polynomials, $L_0$ to $L_5$ plotted over $\xi$; (c) Growth of Quadrature Nodes, node count over $N$ for tensor product and sparse grids with $P = 3$ and $P = 4$.]

Figure 2: Overview of the first six (a) Hermite polynomials and (b) Legendre polynomials for a Gaussian and a uniformly distributed random variable $\xi$, respectively. (c) Growth of the number of quadrature nodes $\xi_k$ with order of approximation $P$ and number of random variables $N$.

polynomial basis functions are always orthogonal with respect to the inner product, meaning

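$$
\begin{aligned}
\langle \varphi_m(\xi), \varphi_n(\xi) \rangle &= \int_{\Omega} \varphi_m(\xi)\, \varphi_n(\xi)\, \rho(\xi)\, \mathrm{d}\xi && (2) \\
&= \gamma_m \delta_{mn}, && (3)
\end{aligned}
$$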

where $\Omega$ is the support, $\rho(\xi)$ is the probability density function (PDF) of the input random variable, $\delta_{mn}$ is the Kronecker delta, and $\gamma_m$ is the univariate norm, which can be precomputed. For Gaussian distributions $\gamma_m = m!$, and for uniform distributions $\gamma_m = 1/(2m+1)$. Essentially, the only unknowns are the PCE coefficients $\hat{f}_i$ of Equation 1. Obtaining the coefficients requires projection onto the orthogonal basis, i.e.

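$$
\begin{aligned}
\hat{f}_i &= \frac{\langle f(\xi), \varphi_i(\xi) \rangle}{\langle \varphi_i(\xi), \varphi_i(\xi) \rangle} && (4) \\
&= \frac{1}{\gamma_i} \int_{\Omega} f(\xi)\, \varphi_i(\xi)\, \rho(\xi)\, \mathrm{d}\xi && (5) \\
&\approx \frac{1}{\gamma_i} \sum_{k=1}^{P+1} f(\xi_k)\, \varphi_i(\xi_k)\, \omega_k. && (6)
\end{aligned}
$$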

In this case, $f(\cdot)$ describes the output of a 3D acoustics simulation, so the integral in Equation 5 cannot be solved analytically and must be replaced by the numerical integration of Equation 6. In the univariate case, sufficient accuracy is obtained with $P + 1$ quadrature nodes $\xi_k$ and weights $\omega_k$. However, in the multivariate case the number of nodes increases exponentially with the number of input random variables as $(P + 1)^N$, making the evaluation of $\hat{f}_i$ costly. This is shown in Figure 2c for an increasing number of variables $N$. Luckily, neither the coefficients of Equation 4 nor the full PCE representation of $f(\cdot)$ in Equation 1 need to be evaluated; only the nodes $\xi_k$ and weights $\omega_k$ have to be extracted for the procedural generation of training data.
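As an illustration of Equations 4 to 6, the following minimal sketch projects a simple test function onto Hermite polynomials for a standard Gaussian input. It assumes NumPy and its probabilists' Hermite module; the function name `pce_coefficients` and the test function $f(\xi) = \xi^2$ are hypothetical stand-ins chosen for this example and do not come from the original work, where $f$ would be a 3D acoustics simulation.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def pce_coefficients(f, order):
    """Project f(xi), xi ~ N(0, 1), onto Hermite polynomials (Equation 6)."""
    # P + 1 Gauss-Hermite nodes. hermegauss integrates against exp(-x^2 / 2),
    # so the weights are rescaled to integrate against the standard normal PDF.
    nodes, weights = He.hermegauss(order + 1)
    weights = weights / math.sqrt(2.0 * math.pi)

    coeffs = np.empty(order + 1)
    for i in range(order + 1):
        selector = np.zeros(i + 1)
        selector[i] = 1.0                       # picks the i-th basis function
        phi_i = He.hermeval(nodes, selector)    # basis function at the nodes
        gamma_i = math.factorial(i)             # norm for the Gaussian case
        coeffs[i] = np.sum(f(nodes) * phi_i * weights) / gamma_i
    return coeffs


# Hypothetical test function: xi^2 = He_0(xi) + He_2(xi).
coeffs = pce_coefficients(lambda xi: xi**2, order=3)
mean = coeffs[0]  # the expectation equals the first coefficient
```

For this test function the first coefficient reproduces the expectation $E(\xi^2) = 1$, in line with the statement that the mean equals $\hat{f}_0$.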

2.2. Procedural Generation of Training Data

Procedural generation refers to an automated method of designing DoEs around specific 3D acoustics simulations. This takes into account which combination of errors and defects, together with their corresponding severity, shall be included in the final simulation model. In turn, the simulations provide the synthetic audio data with which the ML algorithm is trained. Once the errors and defects $\xi = (\xi_1, \ldots, \xi_N)$ to be modeled are known, PCE immediately provides a DoE, which comprises the full set of quadrature nodes $\{\xi_k\}_{1}^{(P+1)^N}$ of Equation 6 in the multivariate case. However, only the most important nodes, rather than all of them, have to be simulated to capture the system behavior: This is achieved by ranking the nodes $\xi_k$ by their corresponding multivariate weights $\omega_k$ in descending order. Figure 3a, Figure 3b, and Figure 3c show the full set of nodes as well as the corresponding Smolyak sparse grid for increasing orders of approximation $P$ and two stochastically independent random variables $\xi_1 \sim \mathcal{N}(0, 1)$ and $\xi_2 \sim \mathcal{U}(-1, 1)$. As can be seen in all plots of Figure 3, the locations of the nodes are determined by the distributions of $\xi_1$ and $\xi_2$. Furthermore, the size of the glyphs that mark the node locations is scaled linearly with the magnitude of the corresponding weights.
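The ranking of nodes by weight can be sketched as follows for the two variables $\xi_1 \sim \mathcal{N}(0, 1)$ and $\xi_2 \sim \mathcal{U}(-1, 1)$ used in Figure 3. This is only a sketch under stated assumptions: it builds the full tensor-product grid with NumPy (not the Smolyak sparse grid), and the function name `ranked_tensor_grid` as well as the choice of keeping the eight highest-ranked nodes are hypothetical.

```python
import itertools
import math

import numpy as np
from numpy.polynomial import hermite_e, legendre


def ranked_tensor_grid(order):
    """Full tensor grid for xi_1 ~ N(0, 1), xi_2 ~ U(-1, 1), sorted by weight."""
    # One-dimensional Gauss rules with P + 1 nodes each; the weights are
    # rescaled so that each rule integrates against the respective PDF.
    x1, w1 = hermite_e.hermegauss(order + 1)
    w1 = w1 / math.sqrt(2.0 * math.pi)
    x2, w2 = legendre.leggauss(order + 1)
    w2 = w2 / 2.0

    # Multivariate nodes are all combinations; weights are the products.
    nodes = np.array(list(itertools.product(x1, x2)))
    weights = np.array([a * b for a, b in itertools.product(w1, w2)])

    ranking = np.argsort(-weights)  # descending order of weight
    return nodes[ranking], weights[ranking]


nodes, weights = ranked_tensor_grid(order=3)
top_nodes = nodes[:8]  # hypothetical budget: simulate only the top 8 of 16
```

Each row of `top_nodes` would then define one simulation case of the DoE.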

2.3. Extension via Decision Trees

The individual nodes in the set $\{\xi_k\}_{1}^{(P+1)^N}$ of the DoE are chosen in accordance with the distribution of the corresponding input random variables $\xi = (\xi_1, \ldots, \xi_N)$. Since not every defect occurs in every PBF run, not every defect has to be considered in every DoE. Additionally, certain defects cannot be modeled using continuous distributions, but are rather binary in nature. For example, delamination can be understood as a binary event wherein layers are either touching or not touching. To solve this problem, the authors propose a decision tree as outlined in Figure 4. On the left side of Figure

[Figure 3 panels: (a) $N = 2$, $P = 3$; (b) $N = 2$, $P = 4$; (c) $N = 2$, $P = 5$. Horizontal axis: $\xi_1 \sim \mathcal{N}(0, 1)$; vertical axis: $\xi_2 \sim \mathcal{U}(-1, 1)$. Legend: Quadrature Nodes; Sparse Grids: Positive Weights; Sparse Grids: Negative Weights.]

Figure 3: Quadrature nodes for $N = 2$ stochastically independent random variables $\xi_1 \sim \mathcal{N}(0, 1)$ and $\xi_2 \sim \mathcal{U}(-1, 1)$ for differing orders of approximation (a) $P = 3$, (b) $P = 4$, (c) $P = 5$.

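[Figure 4 node labels, as recovered from the figure:
volume ($p = 1.0$, $\xi_V \sim \mathcal{U}$), temp. ($p = 1.0$, $\xi_T \sim \mathcal{N}$), pressure ($p = 1.0$, $\xi_P \sim \mathcal{N}$), humidity ($p = 1.0$, $\xi_H \sim \mathcal{N}$), ...;
delam. ($p = 0.01$), fissure ($p = 0.1$, $\xi_F \sim \mathcal{N}$), none ($p = 0.85$), ...;
defect 1A ($p = 0.05$, $\xi_{1A} \sim \mathcal{U}$), defect 2A ($p = 0.1$, $\xi_{2A} \sim \mathcal{N}$), defect 3A ($p = 0.05$, $\xi_{3A} \sim \mathcal{N}$), ...;
defect 1B ($p = 0.15$, $\xi_{1B} \sim \mathcal{N}$), defect 2B ($p = 0.1$, $\xi_{2B} \sim \mathcal{U}$), defect 3B ($p = 0.075$, $\xi_{3B} \sim \mathcal{U}$), ...;
defect 1C ($p = 0.2$, $\xi_{1C} \sim \mathcal{N}$), defect 2C ($p = 0.55$, $\xi_{2C} \sim \mathcal{U}$), defect 3C ($p = 0.4$, $\xi_{3C} \sim \mathcal{U}$), ...]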

Figure 4: Decision tree for procedural generation of multiple DoEs with different parameters. Green items represent environmental parameters that are always present, red ones critical parameters that decide whether the PBF is successful, and blue ones are common defects that impair the quality.

4, colored in green, are the so-called environmental parameters that are always present, such as the volume of the product, the average temperature in the chamber, the pressure level, or the humidity. The environmental parameters can be modeled using continuous distributions, e.g. $\xi_V \sim \mathcal{U}$ or $\xi_T \sim \mathcal{N}$. Marked in red are the critical parameters that characterize successful or failed PBF runs. Lastly, the blue items represent common non-critical defects that impair the quality of the PBF to varying degrees. The decision tree is applied in the following sense: Starting at the leftmost item, only movement down the decision tree is allowed, meaning a lateral movement to an adjacent column of Figure 4. In the case of a fork, the next item is chosen according to its prevalence $p \in [0, 1]$. Thus, running through to the bottom of the decision tree returns a DoE. For ML purposes, multiple run-throughs of the tree are required, each of them returning a DoE with the corresponding set of simulation cases $\{\xi_k\}_{1}^{(P+1)^N}$.
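One run-through of such a decision tree could be sketched as below. The item names and prevalences are taken from the node labels of Figure 4, but their arrangement into forks, the treatment of the prevalences as relative weights at each fork, and all identifiers (`ENVIRONMENT`, `CRITICAL_FORK`, `DEFECT_FORK`, `walk_tree`, `random_variables`) are assumptions made for this illustration only; the actual tree layout is not recoverable from the text.

```python
import random

# Each item: (name, prevalence p, distribution family or None for binary events).
# Environmental parameters are always present (p = 1.0).
ENVIRONMENT = [
    ("volume", 1.0, "U"),
    ("temp.", 1.0, "N"),
    ("pressure", 1.0, "N"),
    ("humidity", 1.0, "N"),
]
# Hypothetical forks: exactly one item is chosen per fork.
CRITICAL_FORK = [("delam.", 0.01, None), ("fissure", 0.1, "N"), ("none", 0.85, None)]
DEFECT_FORK = [("defect 1A", 0.05, "U"), ("defect 1B", 0.15, "N"), ("defect 1C", 0.2, "N")]


def walk_tree(rng):
    """Run through the tree once and return the list of visited items."""
    path = list(ENVIRONMENT)
    for fork in (CRITICAL_FORK, DEFECT_FORK):
        prevalences = [p for _, p, _ in fork]
        # Assumption: prevalences act as relative weights at the fork.
        path.append(rng.choices(fork, weights=prevalences, k=1)[0])
    return path


def random_variables(path):
    """Keep only items that carry a continuous input random variable."""
    return [(name, dist) for name, _, dist in path if dist is not None]


rng = random.Random(0)
path = walk_tree(rng)
variables = random_variables(path)
order = 3
node_count = (order + 1) ** len(variables)  # size of the full tensor-product DoE
```

Repeating `walk_tree` yields different variable sets, and each set defines its own DoE of $(P + 1)^N$ nodes before any ranking by weight.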

3. EXAMPLE: THE ISHIGAMI FUNCTION

The Ishigami function is an analytical function which has been widely used as a benchmarking tool for sensitivity analysis due to its non-monotonicity and non-linearity [17]. It is defined as

$$
I_{ab}(\xi) = \sin(\xi_1) + a \sin^2(\xi_2) + b\, \xi_3^4 \sin(\xi_1), \tag{7}
$$

with typical values being $a = 7$ and $b = 1$. Here, $\xi = (\xi_1, \xi_2, \xi_3)$ is a vector of three stochastically independent and identically distributed random variables with $\xi_i \sim \mathcal{U}(-\pi, \pi)$. In a future real scenario, Equation 7 is of course to be replaced by the 3D acoustics simulation with many more input parameters. Figure 5a shows the fast convergence of PCE with appreciable results even for $P = 4$, as evidenced by comparing the corresponding PDF with that of the established Monte Carlo simulation (MCS) method. For $P = 4$ and $N = 3$ the number of nodes becomes $(P + 1)^N = 125$, which is smaller by three orders of magnitude than the number of MCS nodes, i.e. $125 < N_{\mathrm{MCS}} = 2 \times 10^5$. The fast convergence

[Figure 5 panels: (a) Probability Density Function, $\rho$ over $I_{ab}$ for $P = 2$, $4$, $6$, $8$ and MCS; (b) Cumulative Distribution Function over $I_{ab}$ for $P = 2$, $4$, $6$, $8$ and MCS; (c) Stochastic Collocation, $I_{ab}$ for the true function and for 10%, 20%, 40%, and 60% of the nodes.]

Figure 5: a

of PCE is also clearly seen in the cumulative distribution function (CDF) of Figure 5b. As mentioned in the previous section though, computing the PCE is not necessary for training the ML algorithm. This example serves only to demonstrate the impact of carefully choosing the nodes of the DoE. However, computing the PCE may still be advantageous for ML purposes. Consider the case where only some of the nodes, i.e. the nodes corresponding to the largest weights as shown in Figure 3c, have already been computed. Then, instead of using Equation 6, the PCE can be obtained via statistical regression using the already simulated nodes. The typical formulation $Ax = b$ only needs to be translated using PCE:

$$
\underbrace{\begin{bmatrix}
\varphi_0(\xi_1) & \varphi_1(\xi_1) & \cdots & \varphi_P(\xi_1) \\
\varphi_0(\xi_2) & \varphi_1(\xi_2) & \cdots & \varphi_P(\xi_2) \\
\vdots & \vdots & \ddots & \vdots \\
\varphi_0(\xi_{P+1}) & \varphi_1(\xi_{P+1}) & \cdots & \varphi_P(\xi_{P+1})
\end{bmatrix}}_{=A}
\underbrace{\begin{bmatrix}
\hat{f}_0 \\ \hat{f}_1 \\ \vdots \\ \hat{f}_P
\end{bmatrix}}_{=x}
=
\underbrace{\begin{bmatrix}
f(\xi_1) \\ f(\xi_2) \\ \vdots \\ f(\xi_{P+1})
\end{bmatrix}}_{=b}
\tag{8}
$$

The results when using only 10%, 20%, 40%, and 60% of all nodes with an order of $P = 8$ are given in Figure 5c. Little more than half of all nodes already yields a PCE that is a good approximation of the true Ishigami function. It is conceivable that such an approximation could then function as a surrogate model and replace the 3D acoustics simulation altogether, which would accelerate obtaining the final training dataset even further.
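The regression formulation of Equation 8 can be sketched in the univariate case as follows. This is a minimal sketch assuming NumPy, a Legendre basis on $[-1, 1]$, and an ordinary least-squares solve; the function name `fit_pce_by_regression`, the polynomial stand-in for the simulation, and the choice of keeping 8 of 12 nodes are hypothetical and are not the setup used for Figure 5c.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_pce_by_regression(nodes, values, order):
    """Solve A x = b in the least-squares sense for the PCE coefficients."""
    # Row k of A holds phi_0(xi_k), ..., phi_P(xi_k) for the Legendre basis.
    A = legendre.legvander(nodes, order)
    coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coeffs


def f(xi):
    """Hypothetical stand-in for an expensive simulation: 1*L0 + 2*L1 + 3*L2."""
    return 1.0 + 2.0 * xi + 3.0 * 0.5 * (3.0 * xi**2 - 1.0)


# Pretend that only the nodes with the largest weights were simulated.
all_nodes, all_weights = legendre.leggauss(12)
keep = np.argsort(-all_weights)[:8]
nodes = all_nodes[keep]

coeffs = fit_pce_by_regression(nodes, f(nodes), order=5)
```

Because the stand-in function is itself a polynomial of degree two, the regression recovers its Legendre coefficients from the reduced node set; for a real simulation output the fit would only be approximate.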

4. CONCLUSIONS & OUTLOOK

The theory of generalized polynomial chaos (PC) allows for a comprehensive, yet straightforward way of designing a DoE for a 3D acoustical model subject to input uncertainties with varying prevalence. PCE is sophisticated in the sense that it takes into consideration the variability of multiple input parameters and selects only the most important parameter values according to a specific quadrature rule. In contrast to MCS, PCE requires far fewer nodes to accurately capture the stochastic properties of the output function and gives greater insight into the model at hand since more statistical metrics are readily available. Also useful is the ability to construct, via the stochastic collocation method (SCM), intermediate surrogate models that approximate the full surrogate model obtained using the stochastic Galerkin method (SGM) and pseudo-spectral projection. Additionally, the employed decision tree alleviates some of the shortcomings of PCE relative to MCS, particularly if the number of random input variables happens to become very large during the design phase of the 3D acoustical model. However, some points remain unclear and justify further investigation: First, it remains to be shown how well the proposed procedural generation of synthetic data actually performs when training the specific ML algorithm. Second, future work must include an analysis of the inevitable discrepancies between different algorithms. Third, the examples given in this work treated the inputs as stochastically independent from one another. Naturally, as the knowledge about the noise occurring in acoustic monitoring grows, the PCE method may be extended to include stochastically dependent random variables using copulas.

ACKNOWLEDGEMENTS

The authors appreciate and would like to thank Holger Merschroth of the Institute of Production Management, Technology and Machine Tools for his support.

REFERENCES

[1] M. Leary, L. Merly, F. Torti, M. Mazur, and M. Brandt. Optimal topology for additive manufacture: A method for enabling additive manufacture of support-free optimal structures. Materials & Design, 63:678–690, 2014.

[2] B. Vrancken. Study of Residual Stresses in Selective Laser Melting. Dissertation, Catholic University of Leuven, Leuven, Belgium, 2016.

[3] L. Thijs, F. Verhaeghe, T. Craeghs, J. V. Humbeeck, and J.-P. Kruth. A study of the microstructural evolution during selective laser melting of Ti-6Al-4V. Acta Materialia, 58(9):3303–3312, 2010.

[4] J.-P. Kruth, M. Badrossamay, E. Yasa, J. Deckers, L. Thijs, and J. V. Humbeeck. Part and material properties in selective laser melting of metals. 16th International Symposium on Electromachining, ISEM 2010, 2010.

[5] S. K. Everton, M. Hirsch, P. Stavroulakis, R. K. Leach, and A. T. Clare. Review of in-situ process monitoring and in-situ metrology for metal additive manufacturing. Materials & Design, 95:431–445, 2016.

[6] B. K. Foster, E. W. Reutzel, A. R. Nassar, B. T. Hall, S. W. Brown, and C. J. Dickman. Optical, layerwise monitoring of powder bed fusion. 26th Annual International Solid Freeform Fabrication Symposium - An Additive Manufacturing Conference, SFF 2015, pages 295–307, 2020.

[7] G. Tapia and A. Elwany. A review on process monitoring and control in metal-based additive manufacturing. Journal of Manufacturing Science and Engineering, 136(6), 2014.

[8] T. Craeghs, S. Clijsters, J. P. Kruth, F. Bechmann, and M. Ebert. Detection of process failures in layerwise laser melting with optical process monitoring. Physics Procedia, 39:753–759, 2012.

[9] H. Krauss. Qualitätssicherung beim Laserstrahlschmelzen durch schichtweise thermografische In-Process-Überwachung. Dissertation, Technische Universität München, München, 2016.

[10] H. Purohit, R. Tanabe, K. Ichige, T. Endo, Y. Nikaido, K. Suefusa, and Y. Kawaguchi. MIMII dataset: Sound dataset for malfunctioning industrial machine investigation and inspection. 2019.

[11] M. He and D. He. Deep learning based approach for bearing fault diagnosis. IEEE Transactions on Industry Applications, 53(3):3057–3065, 2017.

[12] W. Ren, G. Wen, S. Liu, Z. Yang, B. Xu, and Z. Zhang. Seam penetration recognition for GTAW using convolutional neural network based on time-frequency image of arc sound. 2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA), 2018.

[13] S. Grollmisch, D. Johnson, and J. Liebetrau. Visualizing neural network decisions for industrial sound analysis. SMSI 2020 - Measurement Science, 2020.

[14] D. Xiu. Numerical Methods for Stochastic Computations: A Spectral Method Approach. Princeton University Press, 2010.

[15] T. J. Sullivan. Introduction to Uncertainty Quantification. Springer International Publishing AG Switzerland, 2015.

[16] J. Feinberg and H. P. Langtangen. Chaospy: An open source tool for designing methods of uncertainty quantification. Journal of Computational Science, 11:46–57, 2015.

[17] T. Ishigami and T. Homma. An importance quantification technique in uncertainty analysis for computer models. First International Symposium on Uncertainty Modeling and Analysis, 1990.