inter.noise, 21-24 August, Scottish Event Campus, Glasgow

Machine-learning-based estimation of absorption coefficients from transfer functions modeled by virtual sources

Yukiko Okawa 1
Tokyo Denki University
5 Senju-Asahi-Cho, Adachi-ku, Tokyo, JAPAN

Haruka Matsuhashi 2
Tokyo Denki University
5 Senju-Asahi-Cho, Adachi-ku, Tokyo, JAPAN

Izumi Tsunokuni 3
Tokyo Denki University
5 Senju-Asahi-Cho, Adachi-ku, Tokyo, JAPAN

Yusuke Ikeda 4
Tokyo Denki University
5 Senju-Asahi-Cho, Adachi-ku, Tokyo, JAPAN

Yasuhiro Oikawa 5
Waseda University
3-4-1 Ookubo, Shinjuku-ku, Tokyo, JAPAN

1 21fmi03@ms.dendai.ac.jp
2 21fmi18@ms.dendai.ac.jp
3 21udc02@ms.dendai.ac.jp
4 yusuke.ikeda@mail.dendai.ac.jp
5 yoikawa@waseda.jp

ABSTRACT

In the field of room acoustics, it is important to determine the absorption coefficients of the wall surfaces, which serve as boundary conditions for modeling the room acoustic field. However, measuring the acoustic impedances of an entire room is not easy because it requires many measurement points near the wall surfaces. Recently, methods that estimate the acoustic impedance and absorption coefficients by combining measurement and simulation have been proposed. However, these methods require a large number of measurement points to obtain sufficient estimation accuracy. In this study, we propose a method for estimating the sound absorption coefficients using machine learning in which the number of microphones is increased virtually. First, the transfer functions at the virtual microphones are estimated from a small number of measured transfer functions by modeling the sound field with sparse equivalent sources. Subsequently, the transfer functions at both the virtual and actual microphones are used as the training data for machine learning. To evaluate the estimation accuracy of the proposed method, we conducted two-dimensional simulation experiments based on the boundary element method.

1. INTRODUCTION

Sound field simulation is useful in room acoustics and sound visualization for understanding acoustic phenomena in a room. Sakamoto et al. performed a sound field simulation for an actual concert hall [1]. Raghuvanshi et al. performed sound field simulations using the geometry of 3D objects in a virtual space for acoustic rendering in virtual reality games [2]. To perform sound field simulations based on an actual sound field, we need the room geometry and the sound absorption coefficients (acoustic impedances) of the boundaries. Recently, measuring room geometry has become easier owing to the development of simultaneous localization and mapping (SLAM) and laser range finders. On the other hand, measuring sound absorption coefficients is still difficult because it requires many measurement points near the boundaries, such as over the entire surface of each wall.

In recent years, methods of estimating sound absorption coefficients that combine numerical simulations and actual measurements have been proposed [3]. Antonello et al. proposed a method for estimating acoustic impedances from a small number of sound pressure measurements using finite-difference time-domain (FDTD) simulations [4, 5]. Nava et al. estimated the acoustic impedances using the inverse boundary element method from a large number of signals measured at random locations in a room [6, 7]. With this method, when the room geometry and the sound source positions are known in advance, the acoustic impedance of each boundary element can be estimated. Foy et al. proposed a machine-learning-based method that estimates the mean absorption coefficients with a convolutional neural network (CNN) and a multilayer perceptron (MLP) using room impulse responses (RIRs) as training data. In this method, for simple room geometries, the only information required is the RIR inside the room.
In the experiments with actual measurements, the estimation accuracy was comparable to that of conventional methods.

In this study, to reduce the number of measurement points and improve the estimation accuracy, we propose a method for estimating sound absorption coefficients using machine learning with virtual microphone signals based on the equivalent source method.

2. METHOD

The overview of the proposed method is shown in Fig. 1. In this study, it is assumed that the room geometry and sound source locations are approximately known. First, a small number of measured RIR signals are used as the input. Next, RIR signals at a larger number of virtual microphones than actual microphones are obtained by modeling the sound field using the actual RIR signals. Then, the phases and amplitudes at the virtual microphones are used as the training data for the machine learning that estimates the sound absorption coefficients. In the following, we explain the estimation of the RIR signals at the virtual microphones and the network design for machine learning.

Figure 1: Overview of proposed method. The main flow of the proposed method is as follows.
Step 1: Measurement of a small number of microphone signals.
Step 2: Estimation of microphone signals at virtual microphones using the measured signals.
Step 3: Training on the phases and amplitudes of the virtual microphone signals.
Step 4: Estimation of absorption coefficients using regression learning.

2.1. Modeling transfer functions using virtual sources

Figure 2: Overview of equivalent source method and image source method.
In the equivalent source method, the actual sound source is represented as a superposition of point sources (virtual sources). The image source method is used to represent reflected sounds. The reflected sounds are assumed to arrive from the position that is plane-symmetrical to the actual sound source with respect to the wall, and the virtual sources are placed around that position.

The virtual microphone signals are estimated from the measured RIR signals using the equivalent source method and the image source method [8–10]. With the equivalent source method, the sound field is represented by a linear sum of point sources (virtual sources), each of which satisfies the wave equation. Virtual sources that represent the direct sound are placed around the location of the actual sound source. Similarly, virtual sources that represent the reflected sound are placed around the image source obtained from the room geometry. Fig. 2 shows the overview of the equivalent source method and the image source method. Based on the equivalent source method, the transfer function measured by the $m$-th microphone at position $\mathbf{x}'_m$ can be expressed as

$$ y_m = \sum_{n=1}^{N} D(\mathbf{x}'_m, \mathbf{x}_n)\, w_n, \tag{1} $$

where $D$ and $w_n$ are the transfer function of the $n$-th virtual source at position $\mathbf{x}_n$ and its weight coefficient, respectively. The transfer function $D$ of a virtual source is given by the following Green's function:

$$ D(\mathbf{x}'_m, \mathbf{x}_n) = \frac{1}{4\pi} \frac{e^{-ik|\mathbf{x}'_m - \mathbf{x}_n|}}{|\mathbf{x}'_m - \mathbf{x}_n|}, \tag{2} $$

where $i$ is the imaginary unit and $k$ is the wavenumber. Since Eq. (1) holds for each of the $M$ microphones, the transfer functions at all microphones can be written as

$$ \mathbf{y} = \mathbf{D}\mathbf{w}, \tag{3} $$

where

$$ \mathbf{y} = [y_1, \cdots, y_M]^{T}, \tag{4} $$

$$ \mathbf{D} = \begin{bmatrix} D(\mathbf{x}'_1, \mathbf{x}_1) & \cdots & D(\mathbf{x}'_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ D(\mathbf{x}'_M, \mathbf{x}_1) & \cdots & D(\mathbf{x}'_M, \mathbf{x}_N) \end{bmatrix}, \tag{5} $$

$$ \mathbf{w} = [w_1, \cdots, w_N]^{T}. \tag{6} $$

However, the vector of weight coefficients $\mathbf{w}$ is unknown. The number of virtual sources is significantly larger than the number of actual and image sources.
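As an illustration of Eqs. (1)–(6), the following Python sketch builds the matrix D from the Green's function of Eq. (2) and evaluates y = Dw. It is a minimal sketch, not the authors' implementation: the function name `transfer_matrix`, the microphone and source coordinates, and the weights are hypothetical, and the wavenumber assumes the 500 Hz frequency and 340 m/s sound speed listed in Table 1.

```python
import numpy as np

def transfer_matrix(mic_pos, src_pos, k):
    """Return the M x N matrix D of Eq. (5), using the Green's function of Eq. (2)."""
    mic_pos = np.asarray(mic_pos, dtype=float)
    src_pos = np.asarray(src_pos, dtype=float)
    # Pairwise distances |x'_m - x_n| with shape (M, N).
    r = np.linalg.norm(mic_pos[:, None, :] - src_pos[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

# Wavenumber for 500 Hz and a sound speed of 340 m/s (values from Table 1).
k = 2.0 * np.pi * 500.0 / 340.0

# Hypothetical positions in metres: three microphones and four virtual sources,
# two near a loudspeaker at the origin and two near its image source.
mic_pos = np.array([[-0.2, 2.0], [0.0, 2.0], [0.2, 2.0]])
src_pos = np.array([[0.0, 0.0], [0.05, 0.0], [0.0, -2.0], [0.05, -2.0]])

# Hypothetical sparse weights: only one source per cluster is active.
w = np.array([1.0, 0.0, 0.7, 0.0], dtype=complex)

D = transfer_matrix(mic_pos, src_pos, k)
y = D @ w  # Eq. (3): transfer functions at all microphones
print(D.shape, y.shape)
```

Only the virtual sources with non-zero weights contribute to y in this example.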
The weight vector $\mathbf{w}$ is therefore obtained by solving an optimization problem that exploits the spatial sparsity of the virtual sources:

$$ \underset{\mathbf{w}}{\text{minimize}} \ \|\mathbf{w}\|_1 \quad \text{subject to} \quad \|\mathbf{y} - \mathbf{D}\mathbf{w}\|_2 \le \epsilon, \tag{7} $$

where $\epsilon$ is the error tolerance. With the estimated weights, the transfer functions at the virtual microphones can be calculated from the virtual sources. The signal at the $m'$-th virtual microphone at position $\mathbf{x}'_{m'}$ is given by

$$ y_{m'} = \sum_{n=1}^{N} D(\mathbf{x}'_{m'}, \mathbf{x}_n)\, w_n. \tag{8} $$

In this study, the transfer functions including the direct sound and the primary reflected sounds were estimated at the virtual microphones.

2.2. Network design

In the proposed method, regression learning with an MLP is used. The input to the network is a vector of the phases and amplitudes of the transfer functions at the virtual microphones estimated using Eq. (8). The hidden layers are fully connected layers with ReLU activation functions. In the following simulation experiments, the transfer functions were obtained through numerical simulation using the boundary element method (BEM). Next, we describe the simulation experiments using the proposed method.

3. SIMULATION EXPERIMENTS

3.1. Experiment conditions

The simulation conditions are shown in Table 1. We simulated a two-dimensional closed room (3 m × 3 m). For simplicity, only the sound absorption coefficients of the equally divided elements on a single wall were estimated. The arrangement of the simulation experiment is shown in Fig. 3. In this experiment, the walls other than the target wall were assumed to be fully absorbing; that is, their absorption coefficients were set to one. To estimate the signals at the virtual microphones, we placed the virtual sources (equivalent sources) at the positions of the loudspeaker and the image source. Three actual microphones and six virtual microphones were placed at 0.2 m intervals (Fig. 3). Gaussian noise (SNR 30 dB) was added to the microphone signals, and the dataset consisted of 100 RIRs.
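The estimation of the virtual microphone signals (Eqs. (7) and (8)) under the noise condition above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the source does not say which solver was used, so the constrained problem of Eq. (7) is replaced here by its penalized (LASSO) form solved with iterative soft thresholding (ISTA). The geometry, the regularization weight `lam`, the real-valued reflection factor sqrt(1 - alpha), and all function names are hypothetical, and the measurement is synthesized from two point sources rather than by BEM.

```python
import numpy as np

def green(mic_pos, src_pos, k):
    """Green's function of Eq. (2) for all microphone/source pairs."""
    r = np.linalg.norm(mic_pos[:, None, :] - src_pos[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

def add_noise(y, snr_db, rng):
    """Add complex Gaussian noise scaled to the requested SNR in dB."""
    noise = rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)
    gain = np.sqrt(np.mean(np.abs(y) ** 2)
                   / (np.mean(np.abs(noise) ** 2) * 10.0 ** (snr_db / 10.0)))
    return y + gain * noise

def lasso_cost(D, y, w, lam):
    """Penalized cost 0.5 * ||y - Dw||_2^2 + lam * ||w||_1."""
    return 0.5 * np.linalg.norm(y - D @ w) ** 2 + lam * np.sum(np.abs(w))

def ista(D, y, lam, n_iter=500):
    """Iterative soft thresholding for complex-valued weights."""
    lipschitz = np.linalg.norm(D, 2) ** 2  # squared largest singular value
    w = np.zeros(D.shape[1], dtype=complex)
    for _ in range(n_iter):
        z = w + D.conj().T @ (y - D @ w) / lipschitz  # gradient step
        mag = np.maximum(np.abs(z), 1e-12)
        w = z * np.maximum(1.0 - (lam / lipschitz) / mag, 0.0)  # shrink magnitudes
    return w

rng = np.random.default_rng(0)
k = 2.0 * np.pi * 500.0 / 340.0  # 500 Hz, 340 m/s (Table 1)

# Hypothetical geometry in metres: loudspeaker at the origin and its image source.
speaker = np.array([[0.0, 0.0]])
image = np.array([[0.0, -2.0]])
offsets = np.array([[dx, dy] for dx in (-0.05, 0.0, 0.05)
                    for dy in (-0.05, 0.0, 0.05)])
sources = np.vstack([speaker + offsets, image + offsets])  # 18 candidate sources

actual_mics = np.array([[x, 2.0] for x in (-0.2, 0.0, 0.2)])
virtual_mics = np.array([[x, 2.0] for x in (-0.5, -0.3, -0.1, 0.1, 0.3, 0.5)])

# Synthetic measurement: direct sound plus one reflection (assumed real factor).
alpha = 0.5
refl = np.sqrt(1.0 - alpha)
y_clean = (green(actual_mics, speaker, k)
           + refl * green(actual_mics, image, k))[:, 0]
y = add_noise(y_clean, 30.0, rng)

D = green(actual_mics, sources, k)
lam = 0.01 * np.max(np.abs(D.conj().T @ y))  # hypothetical regularization weight
w = ista(D, y, lam)

y_virtual = green(virtual_mics, sources, k) @ w  # Eq. (8)
print(np.abs(y_virtual), np.angle(y_virtual))
```

The printed amplitudes and phases correspond to the kind of feature vector that Section 2.2 feeds to the MLP. How well `y_virtual` matches the true field depends on the geometry and on `lam`, so this sketch makes no accuracy claim.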
We compared the estimation accuracy for three conditions: three actual microphones, six actual microphones, and six virtual microphones.

Table 1: Simulation conditions for training data
Condition                              Value
Frequency [Hz]                         500
Sound speed [m/s]                      340
Grid width [m]                         0.05
Absorption coefficient                 0.4–0.6
Step of absorption coefficient         0.05
Max iteration                          1000
Learning rate                          0.001
Optimization method of the network     Adam

Figure 3: Arrangement of the simulation experiment. The room size was 3 m × 3 m, and the distance between the actual sound source and the microphone array was 2 m. The signals of six virtual microphones were estimated from three actual microphones. The interval between microphones was 0.2 m.

3.2. Result

Figure 4 shows the probability of estimation errors of the sound absorption coefficients with three actual microphones, six actual microphones, and six virtual microphones. First, comparing the two conditions with actual microphones, the estimation accuracy with six actual microphones was higher than that with three actual microphones. Thus, the estimation accuracy improved as the number of microphones increased. Next, we compare the estimation accuracy with six virtual microphones and three actual microphones. As shown in Fig. 4, the probability of errors in the range 0–0.05 was slightly higher with six virtual microphones. Table 2 shows the root mean squared error (RMSE) for the three conditions. The RMSE is defined by

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (a_n - \hat{a}_n)^2}, \tag{9} $$

where $a_n$ represents the true absorption coefficients, $\hat{a}_n$ represents the estimated absorption coefficients, and $N$ denotes the number of coefficients. As the table shows, the use of six actual microphones improved the RMSE by 0.006 compared with three actual microphones. Moreover, six virtual microphones improved the RMSE by 0.002 compared with three actual microphones.
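The RMSE of Eq. (9) and the relative estimation error used in Fig. 4 can be computed as in the following Python sketch. The true and estimated values are hypothetical illustration data, not results from the experiments. The function names `rmse` and `relative_error` are assumptions, as is the use of the absolute value in the relative error.

```python
import numpy as np

def rmse(a_true, a_est):
    """Root mean squared error of Eq. (9)."""
    a_true = np.asarray(a_true, dtype=float)
    a_est = np.asarray(a_est, dtype=float)
    return float(np.sqrt(np.mean((a_true - a_est) ** 2)))

def relative_error(a_true, a_est):
    """Ratio of the absolute difference to the true value (assumed form for Fig. 4)."""
    a_true = np.asarray(a_true, dtype=float)
    a_est = np.asarray(a_est, dtype=float)
    return np.abs(a_true - a_est) / a_true

# Hypothetical values on the 0.4-0.6 grid with a 0.05 step (Table 1).
a_true = np.array([0.40, 0.45, 0.50, 0.55, 0.60])
a_est = np.array([0.45, 0.45, 0.45, 0.55, 0.65])

print(rmse(a_true, a_est))            # approximately 0.0387
print(relative_error(a_true, a_est))
```

Because the RMSE averages squared differences, a few large errors raise it more than many small ones, which is why it complements the error histogram of Fig. 4.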
Although the gain was smaller than that obtained by adding actual microphones, virtually increasing the number of microphones achieved a slight improvement in the estimation accuracy. The improvement obtained by virtually increasing the microphone signals (proposed method) depends on the estimation accuracy at the virtual microphones. The estimation accuracy of the virtual microphone signals tends to degrade as the distance from the actual microphones increases. To improve the accuracy of the proposed method, the estimation accuracy at the virtual microphones must be improved.

Figure 4: Probability of estimation errors for each method. The estimation error is defined as the ratio of the difference between the true and estimated values to the true value. Blue indicates the results with six actual microphones. Red indicates the results with three actual microphones. Yellow indicates the results with six virtual microphones. The vertical axis indicates the estimation errors, and the horizontal axis indicates the probability of the errors.

Table 2: Comparison of RMSE with three conditions.
Mic. condition    Number of mics    RMSE
Actual mic.       3                 0.049
Actual mic.       6                 0.043
Virtual mic.      6                 0.047

4. CONCLUSIONS

We proposed an estimation method for sound absorption coefficients using machine learning and virtual sources. The simulation experiments indicated that the estimation accuracy of the sound absorption coefficients can be slightly improved by using virtual microphone signals. In future research, we will improve the estimation accuracy of sound absorption coefficients by improving the estimation of the virtual microphone signals and by investigating effective networks for machine learning.

ACKNOWLEDGEMENTS

This work was partially supported by JSPS KAKENHI Grant Numbers 19K12049 and 22K12099.

REFERENCES

[1] Shinichi Sakamoto, Hiroshi Nagatomo, Ayumi Ushiyama, and Hideki Tachibana. Calculation of impulse responses and acoustic parameters in a hall by the finite-difference time-domain method. Acoustical Science and Technology, 29(4):256–265, July 2008.
[2] Nikunj Raghuvanshi and John Snyder. Parametric directional coding for precomputed sound propagation. ACM Trans. Graph., 37(4), July 2018.
[3] N. Bertin, S. Kitić, and R. Gribonval. Joint estimation of sound source location and boundary impedance with physics-driven cosparse regularization. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6340–6344, 2016.
[4] Niccolò Antonello, Toon van Waterschoot, Marc Moonen, and Patrick Naylor. Identification of surface acoustic impedances in a reverberant room using the FDTD method. Pages 114–118, September 2014.
[5] N. Antonello, T. van Waterschoot, M. Moonen, and P. A. Naylor. Evaluation of a numerical method for identifying surface acoustic impedances in a reverberant room. In Euronoise 2015, pages 185–190, 2015.
[6] Gabriel Pablo Nava et al. On the in situ estimation of surface acoustic impedance in interiors of arbitrary shape by acoustical inverse methods. Acoustical Science and Technology, 3:100–109, 2009.
[7] Gabriel Pablo Nava, Yosuke Yasuda, Yoichi Sato, and Shinichi Sakamoto. In situ estimation of acoustic impedance on the surfaces of realistic interiors: An inverse approach. Proceedings of Meetings on Acoustics, 2(1):015001, 2007.
[8] G. H. Koopmann et al. A method for computing acoustic fields based on the principle of wave superposition. J. Acoust. Soc. Am., 86(5):2433–2438, 1989.
[9] I. Tsunokuni et al. Extrapolation of spatial transfer functions for primary reflections with equivalent sources. In 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE), pages 34–38, 2020.
[10] Izumi Tsunokuni, Kakeru Kurokawa, Haruka Matsuhashi, Yusuke Ikeda, and Naotoshi Osaka. Spatial extrapolation of early room impulse responses in local area using sparse equivalent sources and image source method. Applied Acoustics, 179:108027, 2021.