Volume: 44 Part: 2

Subjective evaluation for sharp sound image construction based on reverberation control with surround sound system using parametric and electro-dynamic loudspeakers

Yuna Harada (1)
Graduate School of Information Science and Engineering, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

Yuting Geng (2), Kenta Iwai (3)
College of Information Science and Engineering, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

Masato Nakayama (4)
Faculty of Design Technology, Osaka Sangyo University
3-1-1 Nakagaito, Daito, Osaka 574-8530, Japan

Takanobu Nishiura (5)
College of Information Science and Engineering, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

ABSTRACT
Three-dimensional sound-field reproduction systems can provide a high sense of presence. These systems commonly use electro-dynamic loudspeakers. Such loudspeakers tend to construct diffuse sound images due to their wide directivity and high reverberation. In contrast, parametric array loudspeakers can construct sharp sound images due to their sharp directivity and low reverberation. However, it is difficult to provide reverberation presence when using parametric array loudspeakers because of this sharp directivity. We previously proposed a sharp sound image construction method that is based on reverberation control with a surround sound system using parametric and electro-dynamic loudspeakers. With this method, a sharp sound image is rendered using parametric array loudspeakers, and reverberation presence is provided by electro-dynamic loudspeakers, emitting reverberation signals synthesized with reverberation control filters. Through objective experiments, we confirmed that this method can construct a sharp sound image with reverberation presence. In this paper, we propose a sharp sound image construction method, which is based on our previous method, to construct a sharp sound image at any desired position.
We conducted subjective evaluations to confirm whether listeners can perceive the reverberation presence provided with the proposed method. From the subjective evaluation, we confirmed that the reverberation presence can be perceived.

(1) is0426sh@ed.ritsumei.ac.jp
(2) geng@fc.ritsumei.ac.jp
(3) iwai18sp@fc.ritsumei.ac.jp
(4) nakayama@ise.osaka-sandai.ac.jp
(5) nishiura@is.ritsumei.ac.jp

inter.noise 21-24 AUGUST SCOTTISH EVENT CAMPUS GLASGOW

1. INTRODUCTION

With the development of video technology, three-dimensional (3D) sound-field reproduction systems that can provide a high sense of presence have attracted attention. Conventional systems commonly use electro-dynamic loudspeakers (EDLs) [1]; therefore, the sound image becomes diffuse when constructing a narrow sound image such as a point source [2]. Systems with parametric array loudspeakers (PALs) [3] have also been proposed, such as the acoustic planetarium [4] and multichannel surround systems with PALs and EDLs [5]. PALs can produce sharper directivity than EDLs by exploiting the straight-line propagation of ultrasound. PALs can construct a sharp sound image [5]; however, it is difficult to present reverberation with them [2]. It is also difficult to reproduce low-frequency sounds with PALs [3].

In the acoustic planetarium [4], sound images can be rendered on walls and ceilings by the reflected sounds emitted from PALs. Since it is difficult to present reverberation with PALs, it is conceivable to add reverberation with EDLs. However, the acoustic planetarium requires reflective objects such as walls and ceilings, and it is difficult to construct a 3D sound source at any position in the air.

In surround sound systems with PALs and EDLs [5], loudspeakers are placed around the listener. PALs are used to create a sharp sound image in the air, and EDLs are used to reproduce low-frequency sounds, which are difficult to reproduce with PALs. Such surround systems can achieve sharp sound images with high sound quality.
However, these systems do not take reverberation into account and thus cannot reproduce it, so the sense of presence degrades.

We previously proposed a method for 3D sound-field reproduction that achieves both a sharp sound image and reverberation presence by combining PALs and EDLs in a surround sound system [6]. With this method, a sharp sound image is rendered using a single PAL, and reverberation presence is provided by EDLs. In this paper, we propose a sharp sound image construction method, based on our previous method, for constructing a sharp sound image [2] at any desired position. With the proposed method, two PALs are used for sharp sound image rendering instead of a single PAL. As in our previous method, the EDLs provide reverberation presence by emitting reverberation signals synthesized with reverberation control filters. The reverberation control filters are designed to reproduce the reverberation time [7] and direct-to-reverberation ratio (DRR) [7] of the target field in other environments. To demonstrate the effectiveness of the proposed method, we carried out objective and subjective evaluations on the reverberation, sharpness, and direction of a sound image.

2. PROPOSED SURROUND SOUND SYSTEM WITH PALS AND EDLS

With the proposed method, we focus on the reverberation time and DRR [7] among room acoustical parameters because they represent reverberation characteristics. The method achieves both sharp sound images and reverberation presence by combining PALs and EDLs in a surround sound system. Sharp sound images are rendered using PALs, and reverberation presence is provided by EDLs. We design filters to reproduce the reverberation time and DRR of the target field in other environments by generating indirect sounds. Therefore, the proposed method can reproduce the sharp sound image of the target field.

Figure 1 shows an overview of the proposed method. In this figure, $l$ ($l = 1, 2, \ldots, L$) denotes the index of control points, $k$ ($k = 1, 2, \ldots, K$) denotes the index of PALs, $n$ ($n = 1, 2, \ldots, N$) denotes the index of EDLs, MIC$_l$ denotes the microphone at the $l$-th control point, PAL$_k$ denotes the $k$-th PAL, and EDL$_n$ denotes the $n$-th EDL.

Figure 1: Overview of the proposed method

The proposed method consists of three major steps.

Step 1 involves designing the amplitude-modulated (AM) wave. The AM wave $v_{\mathrm{AM}}(t)$ is expressed as

$$v_{\mathrm{AM}}(t) = \{1 + m \cdot v_S(t)\}\, v_C(t), \quad (1)$$

where $m$ ($0 < m \le 1$) denotes the amplitude modulation factor, $v_S(t)$ denotes the input signal, and $v_C(t)$ denotes the carrier wave.

Step 2 involves rendering the sharp sound image at the desired position by amplitude panning using two nearby PALs with indexes $k_L$ and $k_R$, where the $k_L$-th PAL is on the left and the $k_R$-th PAL is on the right. The gain factors for the PALs are expressed on the basis of the position of the listening point as

$$W_{k_L} = \frac{\sin\frac{\phi}{2} - \sin\theta}{\sqrt{2\left(\sin^2\frac{\phi}{2} + \sin^2\theta\right)}}, \qquad W_{k_R} = \frac{\sin\frac{\phi}{2} + \sin\theta}{\sqrt{2\left(\sin^2\frac{\phi}{2} + \sin^2\theta\right)}}, \quad (2)$$

where $W_k$ denotes the gain factor for the $k$-th PAL, $\phi$ denotes the angle between the two PALs, and $\theta$ denotes the angle between the listening position and sound image. The signal emitted from the $k$-th PAL, $v_{\mathrm{PAL}_k}(t)$, is expressed as

$$v_{\mathrm{PAL}_k}(t) = \begin{cases} W_k\, v_{\mathrm{AM}}(t), & \text{if } k = k_L, k_R, \\ 0, & \text{otherwise}. \end{cases} \quad (3)$$

Step 3 involves reproducing the reverberation of the target field with $N$ EDLs by using the designed reverberation control filters [6]. As shown in Figure 1, reverberation is controlled at $L$ points near the head so that listeners can move their heads. The input signal is convolved with the filters.
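Steps 1 and 2 (Equations 1 to 3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. It assumes that Equation 2 takes the power-normalized form W = (sin(φ/2) ± sin θ) / sqrt(2(sin²(φ/2) + sin² θ)), with the plus sign assigned to the right PAL; the square root and the sign assignment are a reading of the printed equation, not something the text confirms. The function names, the 40 kHz carrier, the 1 kHz test tone, the value m = 0.7, and the zero-based PAL indices are all hypothetical choices made for the example.

```python
import numpy as np


def am_wave(v_s, v_c, m=0.7):
    """Equation 1: v_AM(t) = {1 + m * v_S(t)} * v_C(t), with 0 < m <= 1."""
    if not 0.0 < m <= 1.0:
        raise ValueError("modulation factor m must satisfy 0 < m <= 1")
    return (1.0 + m * np.asarray(v_s)) * np.asarray(v_c)


def panning_gains(phi, theta):
    """Assumed reading of Equation 2 (angles in radians).

    Returns (W_kL, W_kR). The square root and the sign assignment
    (plus for the right PAL) are assumptions, not confirmed by the source.
    """
    a = np.sin(phi / 2.0)
    b = np.sin(theta)
    norm = np.sqrt(2.0 * (a ** 2 + b ** 2))
    return (a - b) / norm, (a + b) / norm


def pal_signals(v_am, num_pals, k_left, k_right, phi, theta):
    """Equation 3: PALs k_L and k_R emit W_k * v_AM(t); all others emit 0.

    PAL indices are zero-based here, unlike the one-based k in the text.
    """
    w_left, w_right = panning_gains(phi, theta)
    out = np.zeros((num_pals, len(v_am)))
    out[k_left] = w_left * v_am
    out[k_right] = w_right * v_am
    return out


if __name__ == "__main__":
    fs = 96_000          # sampling frequency [Hz], as listed in Table 1
    f_carrier = 40_000   # hypothetical ultrasonic carrier frequency [Hz]
    t = np.arange(int(0.01 * fs)) / fs
    v_s = np.sin(2 * np.pi * 1_000 * t)      # hypothetical 1 kHz input v_S(t)
    v_c = np.sin(2 * np.pi * f_carrier * t)  # carrier wave v_C(t)
    v_am = am_wave(v_s, v_c, m=0.7)
    # Image 10 degrees to the right of centre, PALs 60 degrees apart.
    signals = pal_signals(v_am, num_pals=4, k_left=1, k_right=2,
                          phi=np.deg2rad(60.0), theta=np.deg2rad(10.0))
    print(signals.shape)
    print(panning_gains(np.deg2rad(60.0), np.deg2rad(10.0)))
```

Under this reading the two gains satisfy W_kL² + W_kR² = 1 for every θ, they are equal at θ = 0, and the image collapses onto the right PAL when θ = φ/2, which is the behaviour expected of constant-power amplitude panning.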
The signal processed by the filter, $v_{\mathrm{EDL}_n}(t)$, is expressed as

$$v_{\mathrm{EDL}_n}(t) = h_n(t) * v_S(t), \quad (4)$$

where $h_n(t)$ denotes the reverberation control filter for the $n$-th EDL, and $*$ stands for the convolution operator. Finally, we construct a sharp sound image and reproduce reverberation by emitting $v_{\mathrm{PAL}_k}(t)$ and $v_{\mathrm{EDL}_n}(t)$ from the PALs and EDLs, respectively.

3. OBJECTIVE EVALUATION

3.1. Conditions for Objective Evaluation

In this evaluation, the following four sound fields were compared.

– Target field: The sound image is constructed using a single EDL in the target field.
– Synthesized field (EDL-VBAP): The sound image is constructed by vector base amplitude panning (VBAP) using EDLs [1] in the synthesized field.
– Synthesized field (PAL-VBAP): The sound image is constructed by VBAP using PALs [8] in the synthesized field.
– Synthesized field (Proposed method): The sound image is constructed by VBAP using PALs and reverberation is controlled using EDLs in the synthesized field.

Figure 2 shows the arrangement of the sound fields under each condition when rendering the sound image. Table 1 lists the measurement conditions of the impulse response in the target field and synthesized field, and Table 2 lists the equipment used in the evaluation experiments. Considering the effect of reflections from the listener's head, we placed a dummy head in the measurement environment. As shown in Figure 2, the dummy head was placed at a fixed position facing the 0-deg. direction. The sound image was constructed in four directions (0, 30, 60, and 90 degs.).
Eight microphones were used to record the sounds for evaluation: six were placed at the control points, and two were placed near the left and right ears. The impulse responses were measured using the time-stretched pulse (TSP) signal [9].

Figure 2: Arrangements of each sound field for evaluation. Panels: (a) Target field, (b) Synthesized field (EDL-VBAP), (c) Synthesized field (PAL-VBAP), (d) Synthesized field (Proposed method). Legend: PAL, EDL, microphone, dummy head, position of sound image.

Table 1: Evaluation conditions
Item | Target field | Synthesized field
Reverberation time T60 | 948 ms | 350 ms
Ambient noise level L_A | 38.8 dB | 27.0 dB
Sound source | TSP signal (0 – 8 kHz), white noise (0 – 8 kHz)
Sampling frequency | 96 kHz

Table 2: Experimental equipment
PAL | MITSUBISHI, MSP-50E
EDL | FOSTEX, FE83En
Microphone | SONY, ECM-88B
Dummy head microphone | NEUMANN, KU100
Microphone amplifier | RME, OctaMic II
Power amplifier | VICTOR, PS-A2002

3.2. Results of Objective Evaluation

The sounds in each field were analyzed and evaluated from the following three aspects.

Reverberation: We evaluated reverberation with early decay time (EDT) [7] and DRR. Tables 3 and 4 show the mean absolute errors of EDT and DRR between the target field and each synthesized field. Figures 3 and 4 show the EDT and DRR when the sound image is constructed in the 0-deg. direction. From Tables 3 and 4, the EDT and DRR in the synthesized field (Proposed method) were closer to the target field than in the synthesized field (PAL-VBAP). However, Figure 3 shows that the EDT in the synthesized field (Proposed method) was larger than in the target field. This is because we took the whole band of 0 – 8 kHz into account in the design of the reverberation control filters. Due to the difficulty of low-frequency reproduction with PALs, the energy of the direct sound in practice is lower than predicted.
Therefore, the ratio of direct sound emitted by the PALs to indirect sound emitted by the EDLs becomes lower than predicted, which leads to a larger EDT with the proposed method. To improve the accuracy of reverberation presence, we should consider the frequency dependence of EDT when designing the reverberation control filters.

Sharpness: We evaluated sharpness with the inter-aural cross-correlation coefficient (IACC) [10], which indicates whether a sharp sound image can be constructed. Figure 5 shows the IACC for each direction of sound image presentation in each sound field. The IACC in the synthesized field (Proposed method) was smaller than in the synthesized field (PAL-VBAP). This is because reverberation is reproduced from the EDLs in the synthesized field (Proposed method). Also, the IACC in the synthesized field (PAL-VBAP) when the sound image was constructed in the 60-deg. direction was smaller than that of the other sound fields. This is because the directivity of a PAL differs from that of an EDL; thus, it is necessary to set the loudspeakers in an optimal arrangement.

Direction: We evaluated direction with the inter-aural level difference (ILD) [11]. Figure 6 shows the ILD for each direction of sound image presentation in each sound field. The ILD in the synthesized field (PAL-VBAP) and the synthesized field (Proposed method) changed markedly when the direction of sound image presentation was changed. This result indicates that sound localization is improved by using PALs.

Table 3: Mean absolute error of EDT [ms]
Method | 0 degs. | 30 degs. | 60 degs. | 90 degs.
EDL-VBAP | 232 | 247 | 268 | 405
PAL-VBAP | 464 | 449 | 515 | 622
Proposed method | 978 | 263 | 248 | 60

Table 4: Mean absolute error of DRR [dB]
Method | 0 degs. | 30 degs. | 60 degs. | 90 degs.
EDL-VBAP | 2.4 | 2.0 | 1.8 | 1.8
PAL-VBAP | 13.5 | 10.7 | 12.1 | 14.4
Proposed method | 2.0 | 7 | 14 | 15

Figure 3: Results on EDT for reverberation (θ = 0 degs.). Horizontal axis: measurement points (MIC). Legend: Target field, Synthesized field (EDL-VBAP), Synthesized field (PAL-VBAP), Synthesized field (Proposed method).

Figure 4: Results on DRR for reverberation (θ = 0 degs.). Horizontal axis: measurement points (MIC). Legend as in Figure 3.

Figure 5: Results on IACC for sharpness. Horizontal axis: angle [deg.] (0, 30, 60, 90). Legend as in Figure 3.

Figure 6: Results on ILD for direction. Horizontal axis: angle [deg.] (0, 30, 60, 90); vertical axis: ILD [dB]. Legend as in Figure 3.

4. SUBJECTIVE EVALUATION

4.1. Conditions for Subjective Evaluation

In the subjective evaluation, we evaluated the reverberation, sharpness, and direction of a sound image by comparing listeners' perceptions of the sound images rendered in each sound field, as shown in Figure 2. A Japanese voice sample (/ikioi/) was used as the sound source for the evaluation of reverberation, and white noise (0 – 8 kHz) was used as the sound source for the evaluation of sharpness and direction. Other experimental conditions and equipment were the same as those of the objective evaluation. Each sound image was presented twice in random order of the four directions (0, 30, 60, and 90 degs.). The number of participants was six (two females and four males). They were requested to maintain a straight posture and keep their eyes closed to shut out visual information. Head movement was allowed within the 0- to 90-deg. direction range.

4.2. Results of Subjective Evaluation

The sounds in each field were evaluated by the participants from the following three aspects.

Reverberation: The length of reverberation of the sound image in the synthesized fields was evaluated on a five-point scale in comparison with the target field. Table 5 shows the evaluation indexes for reverberation of the sound image compared with the target field. Figure 7 shows the experimental results of reverberation perception with each field.
As shown in this figure, the sound image constructed in the synthesized field (Proposed method) was perceived to have a longer reverberation than the target field. This is consistent with the EDT result (Figure 3).

Table 5: Evaluation index for reverberation
Length of reverberation | Score
Much longer | 5
Slightly longer | 4
About the same | 3
Slightly shorter | 2
Much shorter | 1

Table 6: Evaluation index for sharpness
Sharpness of sound image | Score
Much sharper | 5
Slightly sharper | 4
About the same | 3
Slightly lacks sharpness | 2
Lacks sharpness | 1

Figure 7: Results on reverberation perception. Vertical axis: average score (short to long); conditions: EDL-VBAP, PAL-VBAP, Proposed method.

Figure 8: Results on sharpness perception. Vertical axis: average score (diffuse to sharp); conditions: EDL-VBAP, PAL-VBAP, Proposed method.

Figure 9: Results on direction perception. Panels: (a) Target field, CAR = 100 %; (b) Synthesized field (EDL-VBAP), CAR = 75 %; (c) Synthesized field (PAL-VBAP), CAR = 63 %; (d) Synthesized field (Proposed method), CAR = 38 %. Axes: presented angle [deg.] and answered angle [deg.] (0, 30, 60, 90).

Sharpness: The sharpness of the sound image in the synthesized fields was evaluated on a five-point scale in comparison with the target field. Table 6 lists the evaluation indexes for sharpness of the sound image compared with the target field. Figure 8 shows the experimental results of sharpness perception with each field. As shown in this figure, the perceived sound images were more diffuse in the synthesized field (Proposed method) than those in the target field. This suggests that the combination of EDLs in the synthesized field (Proposed method) affects sharpness perception. Also, the large variance in the synthesized field (Proposed method) indicates that there were individual differences in perception.
These individual differences might have been caused by differences in the head position of the participants.

Direction: The direction of the sound image was evaluated with the correct answer rate (CAR). Figure 9 shows the experimental results of direction perception with each field. The CAR in the synthesized field (Proposed method) was the lowest. This suggests that the combination of EDLs in the synthesized field (Proposed method) affects direction perception. In addition, the CAR in the synthesized field (PAL-VBAP) was lower than in the synthesized field (EDL-VBAP). This is because the directivity of a PAL differs from that of an EDL; thus, it is necessary to set the loudspeakers in an optimal arrangement.

5. CONCLUSIONS

We proposed a sharp sound image construction method that is based on reverberation control with a surround sound system using PALs and EDLs, and evaluated it through objective and subjective evaluations. From the results of the subjective evaluation, it is necessary to reconsider the measurements for designing the reverberation control filters. We should also consider the directivity of PALs for better sharpness and direction perception. We will add low-frequency reproduction to EDLs for better sound quality.

ACKNOWLEDGEMENTS

This work was partly supported by the Ritsumeikan Global Innovation Research Organization (R-GIRO), and JSPS KAKENHI Grant Numbers JP19H04142 and JP21H03488.

REFERENCES

[1] V. Pulkki. Virtual sound source positioning using vector base amplitude panning. Journal of the Audio Engineering Society, 45(6):456–466, 1997.
[2] E. Tan, W. Gan, and C. Chen. Spatial sound reproduction using conventional and parametric loudspeakers. In Proceedings of 2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 1–9. IEEE, 2012.
[3] W. Gan, J. Yang, and T. Kamakura. A review of parametric acoustic array in air. Applied Acoustics, 73(12):1211–1219, 2012.
[4] D. Ikefuji, H. Tsujii, S. Masunaga, M. Nakayama, T. Nishiura, and Y. Yamashita. Reverberation steering and listening area expansion on 3-D sound field reproduction with parametric array loudspeaker. In Proceedings of 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 1–5. IEEE, 2014.
[5] N. Shimada, K. Iwai, M. Nakayama, and T. Nishiura. High-presence sharp sound image based on sound blending using parametric and dynamic loudspeakers. Journal of Signal Processing, 24(4):171–174, 2020.
[6] Y. Harada, K. Iwai, M. Nakayama, and T. Nishiura. 3-D sound field reproduction with reverberation control on surround sound system by combining parametric and electro-dynamic loudspeakers. In Proceedings of INTER-NOISE and NOISE-CON Congress and Conference Proceedings, volume 263(5), pages 1083–1094. Institute of Noise Control Engineering, 2021.
[7] H. Kuttruff. Room acoustics. CRC Press, 2016.
[8] S. Aoki, M. Toba, and N. Tsujita. Sound localization of stereo reproduction with parametric loudspeakers. Applied Acoustics, 73(12):1289–1295, 2012.
[9] N. Aoshima. Computer-generated pulse signal applied for sound measurement. The Journal of the Acoustical Society of America, 69(5):1484–1488, 1981.
[10] S. Sato and Y. Ando. On the apparent source width (ASW) for bandpass noises related to the IACC and the width of the interaural cross-correlation function (W IACC). The Journal of the Acoustical Society of America, 105(2):1234–1234, 1999.
[11] S.T. Birchfield and R. Gangishetty. Acoustic localization by interaural level difference. In Proceedings of 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05), page IV: 1109. IEEE, 2005.