Volume: 44, Part: 2

Performance Evaluation of Selective Fixed-filter Active Noise Control Based on Different Convolutional Neural Networks

Zhengding Luo 1, Dongyuan Shi 2, Woon-Seng Gan 3
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Libin Zhang 4
Huawei Technologies Co., Ltd.
Qirui Huang 5
Huawei International Pte. Ltd.

ABSTRACT

Due to its rapid response time and high degree of robustness, the selective fixed-filter active noise control (SFANC) method appears to be a viable candidate for widespread use in a variety of practical active noise control (ANC) systems. In comparison to conventional fixed-filter ANC methods, SFANC can select pre-trained control filters for different types of noise. Deep learning technologies can therefore be used in SFANC methods to enable a more flexible selection of the most appropriate control filter for attenuating various noises. Furthermore, with the assistance of a deep neural network, the selection strategy can be learned automatically from noise data rather than through trial and error, which significantly simplifies ANC design and improves its practicability. This paper therefore investigates the performance of SFANC based on different one-dimensional and two-dimensional convolutional neural networks. Additionally, we conducted comparative analyses of several network training strategies and found that fine-tuning could improve selection performance.

1. INTRODUCTION

Acoustic noise problems are becoming more prevalent as the quantity of industrial equipment increases [1]. The attenuation of low-frequency noise is quite difficult and expensive for passive noise control techniques such as enclosures, barriers, and silencers. Different from passive techniques, active noise control (ANC) involves the electro-acoustic generation of a sound field to cancel an unwanted existing sound field [2].
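The superposition principle behind ANC can be illustrated with a toy sketch (not from the paper; an idealized case in which the disturbance is perfectly known):

```python
import numpy as np

fs = 16000                          # sampling rate in Hz
t = np.arange(fs) / fs              # one second of time samples
d = np.sin(2 * np.pi * 100 * t)     # unwanted 100 Hz tone (the disturbance)
y = -d                              # ideal anti-noise: equal amplitude, opposite phase
e = d + y                           # residual sound field after superposition
```

In practice the anti-noise must be produced through a secondary path with its own delay and frequency response, which is why adaptive algorithms such as FxLMS are needed.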
Moreover, ANC can offer a lower-cost alternative for the control of low-frequency noise, and it thus attracts much interest from industry.

1 luoz0021@e.ntu.edu.sg  2 dongyuan.shi@ntu.edu.sg  3 ewsgan@ntu.edu.sg  4 zhanglibin@huawei.com  5 huang.qirui@huawei.com

inter.noise, 21-24 August, Scottish Event Campus, Glasgow

Figure 1: Block diagram of the CNN-based SFANC algorithm.

When dealing with different types of noise, traditional ANC systems typically use adaptive algorithms to adjust the control filter coefficients so as to minimize the error signal [3]. Among adaptive algorithms, the filtered-X least mean square (FxLMS) and filtered-X normalized least mean square (FxNLMS) algorithms are commonly used, since they compensate for the delay introduced by the secondary path and thereby increase system robustness [4]. However, owing to the inherently slow convergence and poor tracking ability of least mean square (LMS) based algorithms [5], FxLMS and FxNLMS are less capable of dealing with rapidly varying or non-stationary noise. Their slow responses may impair customers' perception of the noise reduction effect [6]. Fixed-filter ANC methods [7] can be adopted to tackle slow convergence: the control filter coefficients are pre-trained rather than adaptively updated. However, a pre-trained control filter is only suitable for a specific noise type, which degrades noise reduction performance for other types of noise. To rapidly select different pre-trained control filters for different noise types, a selective fixed-filter active noise control (SFANC) method based on frequency-band matching was proposed in [8].
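The FxLMS update described above can be sketched as a minimal single-channel, time-domain loop. This is an illustrative implementation, not the paper's code; `s` stands for an assumed FIR estimate of the secondary path:

```python
import numpy as np

def fxlms(x, d, s, L=64, mu=0.01):
    """Minimal single-channel FxLMS sketch.

    x: reference signal, d: disturbance at the error microphone,
    s: FIR impulse response estimate of the secondary path.
    Returns the error signal e(n) and the converged control filter w.
    """
    w = np.zeros(L)                        # control filter W(z)
    xp = np.convolve(x, s)[:len(x)]        # filtered reference x'(n) = x(n) * s(n)
    xbuf = np.zeros(L)                     # taps of x feeding W(z)
    xpbuf = np.zeros(L)                    # taps of the filtered reference
    ybuf = np.zeros(len(s))                # anti-noise entering the secondary path
    e = np.zeros(len(x))
    for n in range(len(x)):
        xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
        xpbuf = np.roll(xpbuf, 1); xpbuf[0] = xp[n]
        ybuf = np.roll(ybuf, 1); ybuf[0] = w @ xbuf   # anti-noise y(n)
        e[n] = d[n] - s @ ybuf             # residual after the secondary path
        w += mu * e[n] * xpbuf             # LMS update on the filtered reference
    return e, w
```

For a tonal reference, the error power decays as the filter converges; FxNLMS differs only in normalizing `mu` by the filtered-reference power at each step.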
Though the SFANC method of [8] selects the most suitable pre-trained control filter in response to different noise types, several critical parameters of the method can only be determined through trial and error. Given this limitation, deep learning techniques, particularly convolutional neural networks (CNNs) [9-12], appear to be powerful tools for classifying noises in SFANC methods. Automatic learning of the SFANC algorithm's critical parameters through deep learning would broaden its applications in real-world scenarios. With the learning ability of CNN models, the SFANC algorithm can automatically learn its parameters from noise datasets and select the best control filter for different noise types without resorting to extra human effort [13]. Additionally, a CNN model implemented on a co-processor can decouple the computational load from the real-time noise controller. Therefore, in this paper, we compare the performance of several one-dimensional (1D) CNNs and two-dimensional (2D) CNNs in the SFANC method. Different network training strategies are also tried in order to choose the best one for training the networks. Experiments show that the CNN-based SFANC method not only achieves faster responses than FxLMS and FxNLMS but also exhibits good robustness. It is thus expected to be useful for attenuating dynamic noises such as traffic and urban noise.

2. CNN-BASED SFANC ALGORITHM

The overall architecture of the CNN-based SFANC algorithm is depicted in Figure 1. Throughout the control process, the real-time controller conducts filtering to generate anti-noise while simultaneously sending the primary noise to a co-processor (e.g., a mobile phone). Given the primary noise, the co-processor employs a pre-trained CNN to produce the index of the most appropriate control filter and delivers it to the real-time controller.

Figure 2: Markov model of the ANC process.
The controller then adjusts the control filter coefficients based on the received filter index. Notably, if the network is a 1D CNN, its input is the raw waveform [14]; if the network is a 2D CNN, its input is the Log Mel-spectrogram [15].

2.1. Concise Explanation of SFANC

An ANC process can be abstracted as a first-order Markov chain [16], as shown in Figure 2, where $w_o(n)$ represents the optimal control filter to attenuate the disturbance $d(n)$. To achieve the best noise reduction performance, the best control filter $w_o(n)$ can be selected from a pre-trained filter set $\{w_i\}_{i=1}^{C}$. Hence, the SFANC method can be represented as

$$w_o = \underset{w \in \{w_i\}_{i=1}^{C}}{\arg\min}\; \mathbb{E}\left[ d(n) - x^{\mathrm{T}}(n)\, w(n) * s(n) \right]^2, \qquad (1)$$

where the $\arg\min(\cdot)$ operator returns the input value that minimizes the output; $*$, $x(n)$, and $s(n)$ denote linear convolution, the reference signal, and the impulse response of the secondary path, respectively. The reference signal is assumed to be the same as the primary noise. In practice, $d(n)$ is typically regarded as a linear combination of $x(n)$. Thus, Equation 1 is equivalent to

$$w_o = \underset{w \in \{w_i\}_{i=1}^{C}}{\arg\max}\; P\left[ w \mid d(n) \right] = \underset{w \in \{w_i\}_{i=1}^{C}}{\arg\max}\; P\left[ w \mid x(n) \right], \qquad (2)$$

which means that the selected control filter is the one with the maximum posterior probability given the reference signal $x(n)$. Moreover, according to Bayes' theorem [17], the posterior probability can be replaced with a conditional probability:

$$w_o = \underset{w \in \{w_i\}_{i=1}^{C}}{\arg\max}\; P\left[ x(n) \mid w \right], \qquad (3)$$

which predicts the most suitable control filter directly from the primary noise $x(n)$. A classifier model $\hat{P}\left[ x(n) \mid w, \Theta \right]$ can be developed to approximate $P[x(n) \mid w]$ from the pre-recorded sampling set $\{x_j(n), w_j\}_{j=1}^{N}$, where $\Theta$ denotes the parameters of the classifier and can be obtained through maximum likelihood estimation (MLE) [18]:

$$\Theta = \arg\max_{\Theta}\; \frac{1}{N} \sum_{j=1}^{N} \log \hat{P}\left[ x_j(n) \mid w_j, \Theta \right]. \qquad (4)$$

Therefore, deep learning approaches can be utilized to learn the classifier model from the training set $\{x_j(n), w_j\}_{j=1}^{N}$.

2.2. CNN-based SFANC Algorithm

Motivated by the work in [13], this paper compares several 1D CNNs and 2D CNNs for classifying noises in the time domain and frequency domain, respectively.

Figure 3: Architecture of the proposed 1D CNN. The configuration of each convolution layer is denoted as (kernel size, channels, stride, padding).

Figure 4: The frequency bands of the noise tracks for pre-training control filters.

A min-max operation first normalizes the input of the network:

$$\hat{x}(n) = \frac{x(n)}{\max[x(n)] - \min[x(n)]}, \qquad (5)$$

where $\max[\cdot]$ and $\min[\cdot]$ return the maximum and minimum values of $x(n)$. This rescales the input range into $(-1, 1)$ while retaining the negative part of the signal, which carries phase information; phase information is critical for ANC applications.

A lightweight 1D CNN, illustrated in Figure 3, is proposed. The 1-second noise waveform first passes through a convolutional layer with configuration (80, 128, 4, 38), followed by batch normalization, ReLU, and max pooling; two residual blocks, a further max pooling layer, adaptive average pooling, and a fully connected layer then output the filter index. Each residual block comprises two convolutional layers with configuration (3, 128, 1, 1), each followed by batch normalization and ReLU. A shortcut connection adds the block's input to its output, since residual architectures have been demonstrated to be easy to optimize [19]. Additionally, the network uses a broad receptive field (RF) in the first convolutional layer and narrow RFs in the remaining convolutional layers to fully exploit both global and local information.
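Two operations above can be sketched together: the min-max normalization of Equation 5, and the oracle selection of Equation 1, which picks the pre-trained filter with the lowest residual power. This is a simplified, illustrative reading of the equations with hypothetical names:

```python
import numpy as np

def min_max_normalize(x):
    """Equation 5: scale by the peak-to-peak range.  Only a positive
    factor is applied, so the sign (phase information) is preserved."""
    rng = np.max(x) - np.min(x)
    return x / rng if rng > 0 else x

def select_filter(x, d, s, filters):
    """Equation 1: return the index of the control filter w in `filters`
    that minimizes the residual power of d(n) - (x * w * s)(n)."""
    n = len(x)
    powers = []
    for w in filters:
        y = np.convolve(x, w)[:n]       # anti-noise y(n)
        ys = np.convolve(y, s)[:n]      # anti-noise through the secondary path
        powers.append(np.mean((d - ys) ** 2))
    return int(np.argmin(powers))
```

For a signal that crosses zero, dividing by the peak-to-peak range keeps every sample strictly inside (-1, 1), as the paper states.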
2.3. Training of CNNs

The primary and secondary paths used in the training stage of the control filters are band-pass filters with a frequency range of 20 Hz to 7,980 Hz. Broadband noises covering the 15 frequency ranges shown in Figure 4 are used to pre-train 15 control filters. The FxLMS algorithm is adopted to obtain the optimal control filters for these broadband noises owing to its low computational complexity. The 15 pre-trained control filters are then saved in the control filter database.

A noise dataset including synthetic and real noise tracks is used in this work. Specifically, 80,000 synthetic noise tracks and 80,000 real noise tracks are used for training, 2,000 real noise tracks for validation, and 2,000 real noise tracks for testing. The synthetic noise tracks are randomly generated with various frequency bands, amplitudes, and background noise levels. The SFANC system's sampling rate is 16,000 Hz, so each noise track of 1-second duration consists of 16,000 samples. Each 1-second noise track is taken as primary noise to generate the disturbance. The class label of a noise track corresponds to the index of the control filter that achieves the best noise reduction performance on that disturbance.

3. EXPERIMENTS

The Adam algorithm was employed to optimize the networks during training, with the number of training epochs set to 30. Glorot initialization [20] was used to avoid exploding or vanishing gradients. Additionally, to prevent overfitting, the weights of the CNNs were subjected to ℓ2 regularization with a coefficient of 0.0001.

3.1. Comparison of Different Training Schemes

Four different training schemes are compared for training the proposed 1D CNN, with results summarized in Table 1. According to Table 1, first training on the synthetic noise tracks and then fine-tuning on the real noise tracks achieves the highest testing accuracy.
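The winning scheme (pre-train on synthetic data, then fine-tune on real data) can be sketched with a tiny softmax classifier standing in for the CNN. All data and names here are illustrative, not the paper's:

```python
import numpy as np

def train(X, y, W=None, epochs=100, lr=0.5):
    """One training stage; passing the previous W warm-starts the model,
    which is exactly the fine-tuning step."""
    n_cls = int(y.max()) + 1
    if W is None:
        W = np.zeros((X.shape[1], n_cls))
    onehot = np.eye(n_cls)[y]
    for _ in range(epochs):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
        W -= lr * X.T @ (p - onehot) / len(X)       # cross-entropy gradient step
    return W

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(500, 8))                   # stand-in "synthetic" dataset
y_syn = (X_syn[:, 0] > 0).astype(int)
X_real = X_syn + 0.3 * rng.normal(size=X_syn.shape)  # stand-in "real" dataset
y_real = y_syn
W = train(X_syn, y_syn)                             # stage 1: pre-train on synthetic data
W = train(X_real, y_real, W=W, lr=0.1)              # stage 2: fine-tune on real data
```

As is common practice, the fine-tuning stage uses a smaller learning rate than the pre-training stage.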
Note that simultaneously using the synthetic and real datasets for training does not obtain a superior testing accuracy, since the characteristics of synthetic and real noises are quite different. As discussed above, in the SFANC system the CNN models can first be trained on the synthetic dataset and then fine-tuned on the real noise dataset.

Table 1: The performance of different network training schemes.

    Training Scheme                                    Testing Accuracy
    Only using the synthetic dataset                   46.4%
    Only using the real dataset                        94.6%
    Fine-tuning method*                                95.3%
    Simultaneously using synthetic and real datasets   94.5%

    * Training first on the synthetic dataset and then fine-tuning on the real dataset.

3.2. Comparison of Different Networks

Based on the above fine-tuning training scheme, we compared several different 1D networks utilizing raw acoustic waveforms: the proposed 1D CNN, M3 [21], M5 [21], M11 [21], M18 [21], and M34-res [21]. Some lightweight 2D networks, including ShuffleNet v2 [22], MobileNet v2 [23], and an attention network [24], are also compared in the SFANC method. The performance of these networks on the real testing dataset is summarized in Table 2.

Table 2: Comparisons of different networks used in the SFANC system.

    Network                              Testing Accuracy    Network Parameters
    1D Convolutional Neural Networks
    Proposed 1D Network                  95.3%               0.21M
    M3 Network                           93.7%               0.22M
    M5 Network                           94.9%               0.56M
    M11 Network                          94.5%               1.79M
    M18 Network                          93.8%               3.69M
    M34-res Network                      94.4%               3.99M
    2D Convolutional Neural Networks
    ShuffleNet v2                        95.5%               0.25M
    MobileNet v2                         95.6%               2.89M
    Attention Network                    94.9%               4.95M

As shown in Table 2, the proposed 1D network obtains the highest classification accuracy of 95.3% with the fewest network parameters among the 1D networks. As for the 2D networks, ShuffleNet v2 achieves a classification accuracy similar to MobileNet v2 while requiring far fewer parameters.
Considering both testing accuracy and parameter count, ShuffleNet v2 performs best among the 2D networks. Compared with the proposed 1D network, ShuffleNet v2 obtains a slight improvement in classification accuracy but requires slightly more parameters. Hence, the proposed 1D network and ShuffleNet v2 perform best at classifying noises in the SFANC system. Both lightweight networks can be implemented on mobile platforms, but an acoustic model operating directly on the raw waveform is more convenient [25]; the proposed 1D network is therefore preferred.

3.3. Non-stationary Noise Cancellation

This section uses the SFANC algorithm based on the proposed 1D network, the FxLMS algorithm, and the FxNLMS algorithm to attenuate a recorded aircraft noise. The aircraft noise is non-stationary, has a frequency range of 50 Hz to 14,000 Hz, and does not belong to the training dataset. The step size of the FxLMS and FxNLMS algorithms is set to 0.0001, and the control filter length is 1,024 taps. The noise reduction results of the different ANC methods on the aircraft noise are shown in Figure 5. From Figure 5, we observe that the SFANC method responds to the aircraft noise much faster than the FxLMS and FxNLMS algorithms, and it consistently outperforms both throughout the noise reduction process. In particular, during 1 s-2 s, the averaged noise reduction level achieved by the SFANC algorithm is about 7 dB and 8 dB higher than that of FxLMS and FxNLMS, respectively. The results on the aircraft noise therefore confirm that the SFANC method can rapidly select the most suitable pre-trained control filter for a given noise type.
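The averaged noise reduction level of every 1 s can be computed as the disturbance-to-error power ratio per window. The paper does not spell out the formula, so this standard definition is an assumption:

```python
import numpy as np

def nr_level_db(d, e, fs=16000, window_s=1.0):
    """Averaged noise reduction level in dB over consecutive windows:
    10 * log10(disturbance power / residual error power) per window."""
    n = int(fs * window_s)
    levels = []
    for i in range(0, min(len(d), len(e)) - n + 1, n):
        pd = np.mean(d[i:i + n] ** 2)   # disturbance power in this window
        pe = np.mean(e[i:i + n] ** 2)   # residual error power in this window
        levels.append(10.0 * np.log10(pd / pe))
    return np.array(levels)
```

Under this definition, halving the error amplitude relative to the disturbance corresponds to a noise reduction level of about 6 dB.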
In contrast, the adaptive algorithms respond slowly to the aircraft noise because of their gradual coefficient updates.

Figure 5: (a)-(c): Error signals of the different ANC algorithms (SFANC, FxLMS, and FxNLMS, each with ANC off then on); (d): averaged noise reduction level of every 1 second, on the aircraft noise.

4. CONCLUSIONS

Active noise control (ANC) technologies have been widely used to deal with low-frequency noises. However, adaptive ANC algorithms are typically limited by slow convergence. In this paper, CNNs are used to automatically select the best pre-trained control filter for different noises. Moreover, lightweight CNNs implemented on a co-processor can decouple the computational load from the real-time noise controller. Numerical simulations show that the CNN-based SFANC method improves response time while maintaining low computational complexity and high robustness. Additionally, the effectiveness of the proposed 1D network and the fine-tuning training strategy is confirmed in the SFANC method. In future work, we will explore more efficient and robust ANC algorithms based on deep learning.

REFERENCES

[1] Colin N. Hansen. Understanding Active Noise Cancellation. CRC Press, 1999.
[2] Sen M. Kuo and Dennis R. Morgan. Active noise control: a tutorial review. Proceedings of the IEEE, 87(6):943-973, 1999.
[3] Rishabh Ranjan, Tatsuya Murao, Bhan Lam, and Woon-Seng Gan. Selective active noise control system for open windows using sound classification. In INTER-NOISE and NOISE-CON Congress and Conference Proceedings, volume 253, pages 1921-1931. Institute of Noise Control Engineering, 2016.
[4] Feiran Yang, Jianfeng Guo, and Jun Yang.
Stochastic analysis of the filtered-X LMS algorithm for active noise control. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:2252-2266, 2020.
[5] Rishabh Ranjan and Woon-Seng Gan. Natural listening over headphones in augmented reality using adaptive filtering techniques. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(11):1988-2002, 2015.
[6] Dongyuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, and Xiaoyi Shen. Active noise control based on the momentum multichannel normalized filtered-x least mean square algorithm. In INTER-NOISE and NOISE-CON Congress and Conference Proceedings, volume 261, pages 709-719. Institute of Noise Control Engineering, 2020.
[7] Chuang Shi, Rong Xie, Nan Jiang, Huiyong Li, and Yoshinobu Kajikawa. Selective virtual sensing technique for multi-channel feedforward active noise control systems. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8489-8493. IEEE, 2019.
[8] Dongyuan Shi, Woon-Seng Gan, Bhan Lam, and Shulin Wen. Feedforward selective fixed-filter active noise control: Algorithm and implementation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:1479-1492, 2020.
[9] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436-444, 2015.
[10] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9, 2015.
[11] Zhengding Luo, Qinghua Gu, Gege Qi, Song Liu, Yuesheng Zhu, and Zhiqiang Bai. A robust single-sensor face and iris biometric identification system based on multimodal feature extraction network. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), pages 1237-1244, 2019.
[12] Zhengding Luo, Qinghua Gu, Guoxiong Su, Yuesheng Zhu, and Zhiqiang Bai. An adaptive face-iris multimodal identification system based on quality assessment network. In MultiMedia Modeling, pages 87-98. Springer International Publishing, 2021.
[13] Dongyuan Shi, Bhan Lam, Kenneth Ooi, Xiaoyi Shen, and Woon-Seng Gan. Selective fixed-filter active noise control based on convolutional neural network. Signal Processing, 190:108317, 2022.
[14] Zhengding Luo, Dongyuan Shi, and Woon-Seng Gan. A hybrid SFANC-FxNLMS algorithm for active noise control based on deep learning. IEEE Signal Processing Letters, 29:1102-1106, 2022.
[15] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4510-4520, 2018.
[16] Paulo A. C. Lopes and Moisés S. Piedade. A Kalman filter approach to active noise control. In 2000 10th European Signal Processing Conference, pages 1-4. IEEE, 2000.
[17] Steven M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Inc., 1993.
[18] Sjoerd van Ophem and Arthur P. Berkhoff. Multi-channel Kalman filters for active noise control. The Journal of the Acoustical Society of America, 133(4):2105-2115, 2013.
[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
[20] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249-256. JMLR Workshop and Conference Proceedings, 2010.
[21] Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, and Samarjit Das. Very deep convolutional neural networks for raw waveforms.
In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 421-425. IEEE, 2017.
[22] Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), pages 116-131, 2018.
[23] Sainath Adapa. Urban sound tagging using convolutional neural networks. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 5-9, 2019.
[24] Zhengding Luo, Junting Li, and Yuesheng Zhu. A deep feature fusion network based on multiple attention mechanisms for joint iris-periocular biometric recognition. IEEE Signal Processing Letters, 28:1060-1064, 2021.
[25] Erfan Loweimi, Peter Bell, and Steve Renals. On the robustness and training dynamics of raw waveform models. In INTERSPEECH, pages 1001-1005, 2020.