Equivalent source method based near-field acoustic holography using machine learning

S. K. Chaitanya [1], Indian Institute of Technology Madras, Chennai, India
Siddharth Sriraman [2], SSN College of Engineering, Chennai, India
Srinath Srinivasan [3], SSN College of Engineering, Chennai, India
K. Srinivasan [4], Indian Institute of Technology Madras, Chennai, India

[1] chaitanya.acharya.007@gmail.com  [2] sid.sriraman5@gmail.com  [3] srinath.ksrini@gmail.com  [4] ksri@iitm.ac.in

ABSTRACT
The equivalent source method (ESM) is a commonly used method for sound source localization. It represents the sound field with equivalent sources spread over the source plane (or region), whose pressure fields are usually planar, cylindrical, or spherical harmonics. However, these harmonic fields are derived for the Sommerfeld boundary condition, i.e., with no reflection or reverberation. Data-driven methods can help perform sound source localization in a reverberant environment when no prior information about the surroundings is available. In this paper, the performance of linear regression (LR) with Adam, linear regression with L-BFGS, and multilayer perceptrons (MLP) with one and two hidden layers is studied. Simulations are conducted for two monopoles in rooms with different absorption coefficients, and the results are compared with one-norm convex optimization (L1CVX). Overall, LR with L-BFGS gave the best results. Also, for the room with low reverberation time, LR with L-BFGS localized the sources better than L1CVX.

1. INTRODUCTION
Sound source localization and characterization are of great importance in noise control and in developing quieter machines. The field broadly uses two approaches for sound source localization: beamforming [1], [2] and acoustic holography [3], [4]. Both methods use an array of microphones to measure the acoustic pressure. In beamforming, the phase lag between the microphones is used to localize the source, but the method cannot predict the source strength [1]. In acoustic holography, the pressure measurements are used both to localize the acoustic sources and to predict their source strength [5]. Acoustic holography received a boost from Fourier methods [6], and more recently equivalent source methods [7], [8] have become popular for sound source localization.

For Fourier-based methods, a Fourier transformation must be available for the geometry under study. Although several methods have been suggested for irregularly shaped objects, they involve other techniques such as the inverse boundary element method [9]. Fourier methods also require all the major sources to lie within the array and suffer from truncation errors [3]. These complexities have made ESM a more favorable alternative; it uses an array of equivalent sources (generally monopoles) to represent a sound field. ESM mostly uses spherical harmonics derived for the Sommerfeld boundary condition, and these approximations are not valid in rooms with reflections.

Machine learning [10] has found wide application in recent years across multiple fields because of its ability to relate multiple features [11], [12]. Machine learning models can therefore be used to relate positions, source strengths, and measured pressures. In that regard, linear regression and MLPs with one and two hidden layers are studied here.

In the current study, we aim:
• To study the performance of the machine learning methods: linear regression, MLP with one hidden layer, and MLP with two hidden layers.
• To study the effect of the reverberation time (by varying the sound absorption coefficients) on the reconstruction of the pressure field from multiple monopoles.
• To compare the results with those of L1CVX.

2. THEORY
According to the equivalent source method, an arbitrary sound field can be expressed as a net summation of the fields from elementary sound sources as

p(\mathbf{r}, \omega) = \rho c \sum_{i=1}^{I} f(\mathbf{r}_0)\, \psi_i(\mathbf{r}, \mathbf{r}_0),   (1)

where $p$ is the pressure, $f$ is the source strength, $\psi_i$ is the $i$-th basis field, $\rho$ is the density of the medium, $c$ is the speed of sound in the medium, $\mathbf{r}$ denotes the coordinates of the measurement points, and $\mathbf{r}_0$ denotes the coordinates of the sources. The sound fields from the elementary sound sources can be planar, cylindrical, or spherical harmonics. The most commonly used elementary sound source is a monopole, a spherical harmonic given by Equation (2),

\psi(\mathbf{r}, \theta, \varphi) = h_n(k r)\, Y_n^l(\theta, \varphi),   (2)

where $h_n(kr)$ is the spherical Hankel function of order $n$ with wavenumber $k$, and $Y_n^l$ is the spherical harmonic of order $l$ and degree $n$. Equation (1) can be expressed as a linear system of equations as

\mathbf{p}_{mes} = \mathbf{A}_{mes}\, \mathbf{f},   (3)

where $\mathbf{p}_{mes}$ is the measured pressure vector, $\mathbf{A}_{mes}$ is the matrix of basis-field-induced pressures at the measurement points, and $\mathbf{f}$ is the vector of field weights. It should be noted that the spherical harmonic equation is derived for the Sommerfeld boundary condition with no reflection. This approximation fails when ESM is applied in closed rooms. Machine learning methods can help relate the source strengths and the measured pressure in such situations. The inverse problem can be written as

\hat{\mathbf{f}} = NN(\mathbf{p}_{mes}),   (4)

where $NN(\cdot)$ denotes the learned inverse model. The error in reconstruction is calculated as

\mathbf{p}_{err} = |\mathbf{p}_{rec} - \mathbf{p}_{theo}|,   (5)

where $\mathbf{p}_{rec}$ is the SPL calculated at the reconstruction points using the equivalent sources and $\mathbf{p}_{theo}$ is the SPL calculated theoretically at the reconstruction points.

2.1. Linear Regression
Since, in ESM, the field weights and the pressure fields are related linearly, linear regression should be a good approximation for the parameters. The regression relation between the field weights and the pressure fields is

\hat{\mathbf{f}} = \mathbf{W}\, \mathbf{p}_{mes} + \mathbf{b},   (6)

with $\hat{\mathbf{f}}$ the quantitative response, $\mathbf{p}_{mes}$ the predictor variable, $\mathbf{W}$ the matrix of regression coefficients, and $\mathbf{b}$ the bias. The training data, which comprises field weights and the pressure fields generated by the equivalent sources, is used to calculate the regression coefficients and the bias based on a ridge loss function (mean squared error with an L2 penalty on the weights). Figure 1a shows the network configuration of the linear regression model. Equation (6) can be solved by optimization techniques such as gradient descent and conjugate gradient. This paper uses a first-order optimizer (Adam) and a second-order optimizer (L-BFGS). First-order optimizers search for the direction of gradient descent and proceed in that direction at each step, whereas second-order optimizers use curvature information along with the gradient direction to reach the optimum faster. However, because of the inverse operations involved, second-order optimization is usually more expensive.

2.2. Multilayer perceptron
The MLP is formulated similarly to LR; however, the linear relation between the field weights and the measured pressure is replaced with alternating linear transformations and nonlinear activation functions applied sequentially, called hidden layers. As for the linear regression model, the training data for the MLP comprises field weights and the pressure fields generated by the equivalent sources. Figure 1b shows the network configuration of the multilayer perceptron model. The models were implemented and tested in Python, using the scikit-learn [13] library for linear regression and the TensorFlow [14] library for the multilayer perceptron.

Figure 1: (a) model of linear regression and (b) model of multilayer perceptron.
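As an illustration of Equations (3)-(6) and of the models in Sections 2.1 and 2.2, the sketch below assembles a free-field monopole transfer matrix, generates synthetic (field weight, pressure) training pairs, fits linear regression with L-BFGS on a ridge loss (here via SciPy rather than scikit-learn), and trains linear regression with Adam and MLPs with one and two hidden layers in TensorFlow/Keras. This is a minimal sketch, not the authors' implementation: the grid sizes, the real/imaginary stacking of the complex pressures, the ridge weight, the 128-unit hidden layers, and the training epochs are illustrative assumptions, and the free-field data generation only stands in for the COMSOL room simulations used in the paper.

```python
# Minimal sketch, assuming free-field monopole basis fields; the paper's
# training pressures are simulated in COMSOL for a reverberant room.
import numpy as np
import tensorflow as tf
from scipy.optimize import minimize

c, freq = 343.0, 500.0
k = 2.0 * np.pi * freq / c

# Equivalent-source grid at z = 0 and measurement grid at z = 0.18 m (assumed sizes).
xs = np.linspace(-0.2, 0.2, 9)
src = np.array([(x, y, 0.0) for x in xs for y in xs])               # 81 sources
mic = np.array([(x, y, 0.18) for x in np.linspace(-0.2, 0.2, 6)
                             for y in np.linspace(-0.2, 0.2, 5)])   # 30 mics

# A_mes of Eq. (3): monopole transfer matrix, A_ij = exp(-jk r_ij) / (4 pi r_ij).
r = np.linalg.norm(mic[:, None, :] - src[None, :, :], axis=-1)
A = np.exp(-1j * k * r) / (4.0 * np.pi * r)

# Synthetic training pairs: random field weights f, pressures p = A f.
rng = np.random.default_rng(0)
n_train = 10_000
f_true = rng.standard_normal((n_train, src.shape[0]))
p_mes = f_true @ A.T
X = np.hstack([p_mes.real, p_mes.imag])      # stack real/imag parts (an assumption)
y = f_true
d_in, d_out = X.shape[1], y.shape[1]

# Linear regression of Eq. (6) trained with L-BFGS on a ridge loss (Sec. 2.1).
lam = 1e-3                                   # assumed ridge weight
def ridge_loss_and_grad(theta):
    W = theta[:d_in * d_out].reshape(d_in, d_out)
    b = theta[d_in * d_out:]
    res = X @ W + b - y
    n = res.size
    loss = (res ** 2).sum() / n + lam * (W ** 2).sum()
    grad_W = 2.0 * (X.T @ res) / n + 2.0 * lam * W
    grad_b = 2.0 * res.sum(axis=0) / n
    return loss, np.concatenate([grad_W.ravel(), grad_b])

opt = minimize(ridge_loss_and_grad, np.zeros(d_in * d_out + d_out),
               jac=True, method="L-BFGS-B")
W_lbfgs = opt.x[:d_in * d_out].reshape(d_in, d_out)
b_lbfgs = opt.x[d_in * d_out:]

# LR with Adam (no hidden layer) and MLPs with 1 and 2 hidden layers (Sec. 2.2).
def build_model(n_hidden):
    layers = [tf.keras.layers.Dense(128, activation="relu") for _ in range(n_hidden)]
    layers.append(tf.keras.layers.Dense(d_out))           # linear output layer
    return tf.keras.Sequential(layers)

models = {}
for n_hidden in (0, 1, 2):                                 # 0 -> plain linear regression
    m = build_model(n_hidden)
    m.compile(optimizer="adam", loss="mse")
    m.fit(X, y, epochs=20, batch_size=256, verbose=0)
    models[n_hidden] = m

# Error in the spirit of Eq. (5): re-radiate the estimated field weights and
# compare SPL against the reference field (here evaluated at the mic points).
f_hat = X[:1] @ W_lbfgs + b_lbfgs
p_rec = f_hat.astype(complex) @ A.T
spl = lambda p: 20.0 * np.log10(np.abs(p) / 20e-6)
p_err = np.abs(spl(p_rec) - spl(p_mes[:1]))
```

Replacing the free-field data generation with simulated room responses, as done in the paper, changes only the data-generation step; the inverse models themselves are unchanged.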
3. NUMERICAL SIMULATION
In order to study the effect of acoustic reflection on the reconstruction, a room of size 3×4×5 m³ is chosen. The wall absorption coefficients (alpha) are varied to obtain different reverberation times, computed with the Sabine formula $T_{60} = 0.161\,V/(\alpha S)$, where $V = 60$ m³ is the room volume and $S = 94$ m² is the total wall surface area; the resulting values are shown in Table 1.

Table 1: Reverberation time of the rooms
Sl. no | Absorption coefficient (alpha) | Reverberation time (s)
1      | 0.15                           | 0.6851
2      | 0.5                            | 0.2055
3      | 0.95                           | 0.1082

The two monopoles are placed at (+0.1, +0.1) m and (−0.1, −0.1) m in the room. The equivalent source array and the measurement array are shown in Figure 2. The measurement points are distributed with an average spacing of 0.073 m, and the equivalent sources with an average spacing of 0.025 m. The distance between the measurement plane and the equivalent source plane is 0.18 m. The general arrangement of the measurement plane, equivalent source plane, and reconstruction plane is shown in Figure 3. The training data for the different rooms are generated by simulating, in COMSOL, the pressure field produced on the measurement plane by multiple sources of random strengths in the equivalent source plane. Fifty thousand examples were generated, of which forty thousand were used for training and the remaining ten thousand for validation. The effect of the training dataset size on performance is evaluated by studying the reconstruction error on the validation data. The performance of the methods at different source frequencies is also studied.

Figure 2: (a) 30 pressure measurement points on the measurement plane and (b) monopole sources on the equivalent source plane.

Figure 3: Schematic representation of the source, equivalent source plane, reconstruction plane, and the measurement plane.

4. RESULTS
4.1. Optimization of training data set size
Machine learning models generally perform better with more data, but training on a large amount of data is computationally more expensive. It is therefore essential to find the optimal size of the training dataset, and one way to do so is to measure the error for different training data sizes. Figures 4a and 4b show, respectively, the SPL error on the testing data and the training time of the machine learning models for varying training dataset sizes. The errors flatten after a training set size of 10,000 examples, so 10,000 training examples should suffice to train the models. Since LR with L-BFGS uses second-order optimization, it showed the lowest errors and the highest training time. MLP (DNN) methods with additional hidden layers require optimizing more parameters, which leads to the higher errors observed; this is often termed the 'curse of dimensionality.'
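The training-set-size study of Section 4.1 can be sketched as a simple learning-curve loop: retrain the same model on growing subsets and record the validation error and training time that Figure 4 reports. The sketch below uses random stand-in data generated through an assumed linear forward map (the paper uses COMSOL-simulated pressures); the subset sizes, the plain linear-regression model, and the 40,000/10,000 split are illustrative choices loosely following Section 3, not the authors' exact procedure.

```python
# Minimal learning-curve sketch for the training-set-size study (Sec. 4.1).
import time
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(1)
n_total, n_mic, n_src = 50_000, 30, 81                      # assumed sizes
A_stub = rng.standard_normal((2 * n_mic, n_src))            # stand-in forward operator
y_all = rng.standard_normal((n_total, n_src))               # stand-in field weights
X_all = y_all @ A_stub.T + 0.01 * rng.standard_normal((n_total, 2 * n_mic))
X_val, y_val = X_all[40_000:], y_all[40_000:]               # last 10,000 for validation

for n in (1_000, 2_500, 5_000, 10_000, 20_000, 40_000):
    model = tf.keras.Sequential([tf.keras.layers.Dense(n_src)])   # linear regression
    model.compile(optimizer="adam", loss="mse")
    t0 = time.time()
    model.fit(X_all[:n], y_all[:n], epochs=20, batch_size=256, verbose=0)
    val_mse = model.evaluate(X_val, y_val, verbose=0)
    print(f"n = {n:6d}   validation MSE = {val_mse:.4f}   "
          f"training time = {time.time() - t0:.1f} s")
```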
Figure 4: (a) Mean and standard deviation of the pressure reconstruction error versus the number of training examples and (b) training time versus the number of training examples for the machine learning models.

4.2. Results of the two-monopole simulation
Two monopoles were placed at (+0.1, +0.1) m and (−0.1, −0.1) m in the room, and the measurement plane was 0.18 m away from the equivalent source plane. Figure 5 shows the source strengths reconstructed by the machine learning methods and by L1CVX for rooms with different absorption coefficients; only the results at 500 Hz are shown for brevity. L1CVX could not localize the sources well in the presence of reflections, and its performance degraded as the absorption coefficient decreased, because less absorptive walls cause more reflection. Of all the machine learning methods, LR with L-BFGS performed best, as can be seen from the higher source strengths concentrated around the source positions. Figure 6 compares the pressure fields reconstructed by the different methods with the pressure field simulated in COMSOL at 500 Hz; the fields from L1CVX and from LR with L-BFGS were closest to the simulated field. Figure 7 shows the performance of the inverse methods as a function of frequency. In general, all methods performed well at low frequencies, and the performance decreased with increasing frequency. LR with L-BFGS performed the best of all the machine learning methods, and better than L1CVX, owing to its second-order optimization. The poorer performance of the MLPs with one and two hidden layers can be attributed to the 'curse of dimensionality,' which led to higher errors.

Figure 5: Source strengths calculated for the two-monopole simulation using various inverse methods and varying absorption coefficient (alpha) of the room walls, at 500 Hz.

Figure 6: Reconstructed pressure fields along with the simulated pressure field for the two-monopole simulation using various inverse methods and varying absorption coefficient (alpha) of the room walls, at 500 Hz.

Figure 7: Reconstructed pressure (SPL) error for room walls with absorption coefficients of (a) 0.15, (b) 0.5, and (c) 0.95.

5. CONCLUSION
From the numerical simulations, it can be concluded that 10,000 training examples are sufficient to train the models. L1CVX was not good at localizing the sources in rooms with reflections. As the frequency increased, the performance of all the methods deteriorated, and the performance also worsened with decreasing wall absorption coefficient. Of all the machine learning methods tested, LR with L-BFGS performed best: it localized the source region and predicted the pressure field much better than the other methods across the frequencies.

6. REFERENCES
[1] J. Hald and J. J. Christensen, “Brüel and Kjær Technical Review: Beamforming,” no. 1, 2004.
[2] G. Raman, R. Ramachandran, K. Srinivasan, and R. Dougherty, “Advances in experimental aeroacoustics,” Int. J. Aeroacoustics, vol. 12, no. 5–6, pp. 579–637, 2013, doi: 10.1260/1475-472X.12.5-6.579.
[3] J. D. Maynard, E. G. Williams, and Y. Lee, “Nearfield acoustic holography: I. Theory of generalized holography and the development of NAH,” J. Acoust. Soc. Am., vol. 78, no. 4, pp. 1395–1413, 1985, doi: 10.1121/1.392911.
[4] P. Gerstoft, C. F. Mecklenbräuker, W. Seong, and M. Bianco, “Introduction to compressive sensing in acoustics,” J. Acoust. Soc. Am., vol. 143, no. 6, pp. 3731–3736, 2018, doi: 10.1121/1.5043089.
[5] G. Ping, Z. Chu, Z. Xu, and L. Shen, “A refined wideband acoustical holography based on equivalent source method,” Sci. Rep., vol. 7, pp. 1–9, 2017, doi: 10.1038/srep43458.
[6] E. G. Williams, Fourier Acoustics. London: Academic Press, 1999.
[7] J. Hald, “A comparison of iterative sparse equivalent source methods for near-field acoustical holography,” J. Acoust. Soc. Am., vol. 143, no. 6, pp. 3758–3769, 2018, doi: 10.1121/1.5042223.
[8] S. K. Chaitanya and K. Srinivasan, “Equivalent source method based near field acoustic holography using multipath orthogonal matching pursuit,” Appl. Acoust., vol. 187, p. 108501, 2022, doi: 10.1016/j.apacoust.2021.108501.
[9] E. G. Williams, Fourier Acoustics. Academic Press, 1999.
[10] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015, doi: 10.1038/nature14539.
[11] G. Sharma, K. Umapathy, and S. Krishnan, “Trends in audio signal feature extraction methods,” Appl. Acoust., vol. 158, p. 107020, 2020, doi: 10.1016/j.apacoust.2019.107020.
[12] A. M. Tripathi and A. Mishra, “Self-supervised learning for Environmental Sound Classification,” Appl. Acoust., vol. 182, p. 108183, 2021, doi: 10.1016/j.apacoust.2021.108183.
[13] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.
[14] M. Abadi et al., “TensorFlow: A System for Large-Scale Machine Learning,” 2016.