
Distance control of virtual sound source based on switching electro-dynamic and parametric loudspeaker arrays

Ayano Hirose 1
Graduate School of Information Science and Engineering, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

Haonan Wang 2
College of Information Science and Engineering, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

Masato Nakayama 3
Faculty of Design Technology, Osaka Sangyo University
3-1-1 Nakagaito, Daito, Osaka 574-8530, Japan

Takanobu Nishiura 4
College of Information Science and Engineering, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

ABSTRACT

High-presence sound-field reproduction technology has been gaining attention. We previously proposed a virtual sound source (VSS) construction method based on wave field synthesis (WFS) using a parametric loudspeaker (PAL) array. This method can construct a VSS at any position without relocating the PALs. However, it is difficult to construct a VSS near a PAL array with this method. In this paper, we propose a method for controlling the distance of a VSS on the basis of WFS using an electro-dynamic loudspeaker (EDL) array and a PAL array. An EDL array can construct VSSs near it, and a PAL array can construct VSSs far from it. Therefore, with this method, we place an EDL array and a PAL array parallel to each other and switch between them depending on the desired VSS position to achieve VSS distance control. The threshold for switching between the loudspeaker arrays was determined through objective and subjective preliminary evaluations. We demonstrated the effectiveness of the proposed method through evaluation experiments.

1 is0471hf@ed.ritsumei.ac.jp

2 h-wang@fc.ritsumei.ac.jp

3 nakayama@ise.osaka-sandai.ac.jp

4 nishiura@is.ritsumei.ac.jp

inter.noise, 21-24 AUGUST, SCOTTISH EVENT CAMPUS, GLASGOW

1. INTRODUCTION

Attention has been focused on high-presence sound-field reproduction technology [1], which can provide highly realistic sensations. Multi-channel surround systems, such as 5.1 ch. surround [2] and 22.2 ch. surround [3], and the Acoustic Planetarium [4] using parametric array loudspeakers (PALs) [5, 6] have been proposed for reproducing a sound field with high presence. In a multi-channel surround system, multiple loudspeakers are placed above, below, left, and right of the listener, and a virtual sound source (VSS) is constructed between adjacent loudspeakers by amplitude panning. However, the listening area is limited by the placement of the loudspeakers. In the Acoustic Planetarium, the sound radiated from each PAL is reflected off walls and ceilings, creating a VSS on the reflective surfaces, so the listener perceives the sound image at a reflective surface. However, this method requires reflective objects such as walls and ceilings, and such objects would have to be moved to construct a VSS in mid-air.

To solve this problem, a method for constructing a VSS on the basis of wave field synthesis (WFS) using a loudspeaker array was proposed [7, 8]. With this method, loudspeakers are arranged in a single line facing the frontal direction, so a VSS can be constructed at the desired position without repositioning the loudspeakers. However, the VSS may be constructed inaccurately owing to the radiation characteristics of the loudspeakers used. With an electro-dynamic loudspeaker (EDL) array, it is difficult to construct a VSS far from the array because the sound pressure decreases as the distance from the loudspeaker increases. With a PAL array, in contrast, it is difficult to construct a VSS near the array because demodulation is insufficient in the vicinity of the PALs.

We propose a method for constructing a VSS at the desired position with high accuracy on the basis of WFS using an EDL array and a PAL array. With this method, an EDL array and a PAL array are placed parallel to each other, one above the other. An EDL array is suitable for constructing a VSS near it, while a PAL array is suitable for constructing a VSS far from it [9]. Therefore, by exploiting the characteristics of each loudspeaker array, VSS distance control is achieved by switching the loudspeaker array to be used. Objective and subjective evaluations were conducted on VSSs constructed using the proposed method and on those constructed using only an EDL array or only a PAL array to confirm the effectiveness of the proposed method.


Figure 1: Overview of proposed method (Step 1: design of driving filter from the desired position and input signal; Step 2: distance control of VSS based on switching loudspeaker array, using the desired position $(x_\mathrm{s}, y_\mathrm{s})$ and the switching threshold to select the EDL array or, after amplitude modulation, the PAL array)

2. PROPOSED METHOD

2.1. Overview of Proposed Method

Figure 1 shows an overview of the proposed method. We construct VSSs on the $xy$ plane: VSSs near the loudspeaker arrays are constructed with the EDL array, and VSSs far from them with the PAL array. In Step 1 of Figure 1, driving functions for constructing a VSS at the desired position $\boldsymbol{r}_\mathrm{s} (= [x_\mathrm{s}\ y_\mathrm{s}]^\mathrm{T})$ are designed on the basis of WFS. In Step 2, the loudspeaker array (EDL or PAL) to be used is selected in accordance with the VSS construction position. When a VSS is constructed using the EDL array, the radiated signals are obtained by convolving the input signal with the designed driving functions. When the PAL array is used, the radiated signals are the amplitude-modulated versions of these convolved signals. In this way, the distance of the VSS can be controlled.

2.2. Design of Driving Filter on Basis of Wave Field Synthesis

WFS is a sound-field reproduction technique that reproduces the spatial wavefronts of sound on the basis of physical acoustic models [10]. Figure 2 shows the coordinate system for WFS, where $\boldsymbol{r} (= [x\ y]^\mathrm{T})$ is the position vector of an arbitrary point in the sound field in the Cartesian coordinate system, $\boldsymbol{r}_0 (= [x_0\ 0]^\mathrm{T})$ is the position vector of a point on the infinite secondary-source line, and $y = y_\mathrm{ref}$ is the reference line [8]. The driving function $D_\mathrm{cont}(\boldsymbol{r}_0, \omega)$ for constructing a focused VSS [10] at $\boldsymbol{r}_\mathrm{s}$ is calculated using the following equation.

$$D_\mathrm{cont}(\boldsymbol{r}_0, \omega) = -\sqrt{2\pi |y_\mathrm{ref} - y_0|}\, \frac{jk}{2}\, \frac{y_0 - y_\mathrm{s}}{|\boldsymbol{r}_0 - \boldsymbol{r}_\mathrm{s}|}\, H_1^{(1)}\!\left(k |\boldsymbol{r}_0 - \boldsymbol{r}_\mathrm{s}|\right), \tag{1}$$

where $\omega$ is the angular frequency, $j$ is the imaginary unit $\sqrt{-1}$, $k\ (= \omega / c)$ is the wavenumber, and $H_1^{(1)}(\cdot)$ is the Hankel function of the first kind and first order. However, the driving function $D_\mathrm{cont}(\boldsymbol{r}_0, \omega)$ calculated in Equation (1) assumes an infinitely long, continuous linear source as the secondary source in WFS theory, so it must be approximated by a discrete, finite-length loudspeaker array when implemented in a real environment. Discretization of the secondary source can be achieved by spatial sampling of the driving function [11]. Equation (2) gives the driving function $D_\mathrm{dis}(x_i, \omega)$ after spatial sampling.

$$D_\mathrm{dis}(x_i, \omega) = D_\mathrm{cont}(\boldsymbol{r}_i, \omega) \cdot \frac{1}{\Delta x} \sum_{i=-\infty}^{\infty} \delta(x - i\Delta x), \tag{2}$$

where $i$ is the index of the secondary source, $\boldsymbol{r}_i (= [x_i\ 0]^\mathrm{T})$ is the position vector of the $i$-th secondary source, and $\delta(\cdot)$ is the Dirac delta function. However, spatial aliasing occurs when the continuous secondary source is replaced with discrete loudspeakers. Spatial aliasing is a phenomenon in which extra wavefronts are generated at frequencies above the spatial aliasing frequency, disturbing the wavefronts so that the sound field cannot be accurately reproduced [8]. It manifests as spectral overlap in the wavenumber domain. Local sound field synthesis [8] suppresses this spectral overlap, making it possible to reproduce the desired sound field even at frequencies above the spatial aliasing frequency.
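To make Equations (1) and (2) concrete, the following Python sketch evaluates the focused-source driving function at the sampled loudspeaker positions for a single frequency, using the values in Table 1. It is an illustration under stated assumptions, not the authors' implementation: the array is assumed to be centred on $x = 0$ (the paper indexes loudspeakers as $i = 1, \dots, I$ without stating an offset), the constant weighting of Equation (2) is omitted because it scales all channels equally, and the spatial aliasing frequency is estimated with the common conservative rule $f_\mathrm{al} = c/(2\Delta x)$, which the paper does not state. The function names are hypothetical.

```python
import numpy as np
from scipy.special import hankel1

C = 343.0     # sound velocity c [m/s] (Table 1)
DX = 0.22     # loudspeaker interval Δx [m] (Table 1)
N_LS = 16     # number of loudspeakers I (Table 1)
Y_REF = 2.5   # reference line y_ref [m] (Table 1)


def array_positions(n_ls=N_LS, dx=DX):
    """x_i for i = 1..I, shifted so the array is centred on x = 0 (assumption)."""
    i = np.arange(1, n_ls + 1)
    return (i - (n_ls + 1) / 2) * dx


def d_cont(x0, omega, xs, ys, y_ref=Y_REF, c=C):
    """Equation (1) evaluated for secondary sources on the line y0 = 0."""
    k = omega / c                                # wavenumber
    y0 = 0.0
    dist = np.hypot(x0 - xs, y0 - ys)            # |r0 - rs|
    return (-np.sqrt(2 * np.pi * abs(y_ref - y0))
            * (1j * k / 2)
            * (y0 - ys) / dist
            * hankel1(1, k * dist))


def aliasing_frequency(dx=DX, c=C):
    """Conservative rule of thumb f_al = c / (2 Δx); not stated in the paper."""
    return c / (2 * dx)


# Spatial sampling (Eq. (2)): evaluate D_cont only at the loudspeaker positions.
x_i = array_positions()
omega = 2 * np.pi * 1000.0                       # example frequency: 1 kHz
D_dis = d_cont(x_i, omega, xs=0.0, ys=1.0)
print(D_dis.shape, round(aliasing_frequency(), 1))   # prints: (16,) 779.5
```

Under this estimate, $\Delta x = 0.22$ m gives roughly 780 Hz, so most of the 0-8 kHz input band in Table 1 lies above the spatial aliasing frequency.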

Truncating the secondary source corresponds to applying a rectangular window to it in the Cartesian coordinate system [12]. Equation (3) gives the driving function $D(x_i, \omega)$ of the secondary source after truncation.

$$D(x_i, \omega) = D_\mathrm{dis}(x_i, \omega) \cdot \begin{cases} 1, & \text{if } 1 \le i \le I, \\ 0, & \text{otherwise,} \end{cases} \tag{3}$$


where $I$ is the number of loudspeakers. The area in which the sound field can be reproduced is limited by the resulting truncation error [13]. Figure 3 shows the area in which the sound field can be reproduced after discretization and truncation of the secondary sound sources. When the secondary sound sources are distributed from $x_1$ to $x_I$, the region where the sound field can be accurately reproduced is limited to the area in front of the lines connecting $\boldsymbol{r}_\mathrm{s}$ with the two ends $x_1$ and $x_I$ of the loudspeaker array.
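The truncation window of Equation (3) and the reproduction area of Figure 3 can be checked with a little geometry. The sketch below is our reading of Figure 3, not code from the paper: a point counts as inside the accurate area if it lies beyond the focused VSS and between the two lines drawn from the array ends $x_1$ and $x_I$ through $\boldsymbol{r}_\mathrm{s}$. The array ends at $\pm 1.65$ m are an assumption, derived from $I = 16$ and $\Delta x = 0.22$ m for an array centred on $x = 0$. The helper names are hypothetical.

```python
import numpy as np


def rect_window(i, n_ls):
    """Equation (3): weight 1 for 1 <= i <= I, 0 otherwise."""
    i = np.asarray(i)
    return ((i >= 1) & (i <= n_ls)).astype(float)


def cross2(u, v):
    """z-component of the 2D cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]


def in_reproduction_area(p, rs, x_1, x_I):
    """True if point p lies beyond the focused VSS rs, between the two lines
    drawn from the array ends (x_1, 0) and (x_I, 0) through rs (our reading of Fig. 3)."""
    p = np.asarray(p, dtype=float)
    rs = np.asarray(rs, dtype=float)
    a = rs - np.array([x_1, 0.0])    # direction of the line from (x_1, 0) through rs
    b = rs - np.array([x_I, 0.0])    # direction of the line from (x_I, 0) through rs
    q = p - rs                       # p relative to the focus
    s = np.sign(cross2(a, b))        # orientation, so x_1 > x_I also works
    return bool(s * cross2(a, q) >= 0 and s * cross2(q, b) >= 0)


x_ends = (-1.65, 1.65)               # ends of a centred 16 x 0.22 m array (assumption)
print(rect_window([0, 1, 16, 17], 16))
print(in_reproduction_area((0.0, 2.5), (0.0, 1.0), *x_ends))   # prints: True
```

The listener position on the reference line, $(0, 2.5)$, falls inside the area for $\boldsymbol{r}_\mathrm{s} = [0.0\ 1.0]^\mathrm{T}$, whereas points between the array and the focus, or far off to the side, do not.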

Finally, the time-domain driving function $d(x_i, t)$ is obtained by an inverse Fourier transform, which can be expressed as

$$d(x_i, t) = \mathrm{IDFT}\left[D(x_i, \omega)\right], \tag{4}$$


where 𝑡 is the time index and IDFT[∙] is the inverse discrete Fourier transform.
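The following is a hedged sketch of Equation (4). It evaluates the driving function on the DFT frequency grid at 96 kHz and inverse-transforms it per loudspeaker with NumPy's inverse real FFT. The DC bin is set to zero because $H_1^{(1)}(0)$ diverges. The circular shift that makes the non-causal focused-source filters causal is our assumption, as are the FFT length and the centred array; the paper does not state how it handles these. Truncation (Equation (3)) is implicit because only the $I = 16$ loudspeaker positions are evaluated.

```python
import numpy as np
from scipy.special import hankel1

FS = 96_000                                   # sampling frequency [Hz] (Table 1)
C, DX, N_LS, Y_REF = 343.0, 0.22, 16, 2.5     # c, Δx, I, y_ref (Table 1)


def d_cont(x0, omega, xs, ys, y_ref=Y_REF, c=C):
    """Equation (1) for secondary sources on the line y0 = 0."""
    k = omega / c
    y0 = 0.0
    dist = np.hypot(x0 - xs, y0 - ys)         # |r0 - rs|
    return (-np.sqrt(2 * np.pi * abs(y_ref - y0)) * (1j * k / 2)
            * (y0 - ys) / dist * hankel1(1, k * dist))


def driving_filters(x_i, xs, ys, n_fft=4096, fs=FS):
    """Time-domain driving filters d(x_i, t), one row per loudspeaker (Eq. (4))."""
    f = np.fft.rfftfreq(n_fft, d=1 / fs)      # bin frequencies 0 .. fs/2
    D = np.zeros((len(x_i), len(f)), dtype=complex)
    # DC bin left at zero: H_1^(1)(0) diverges.
    D[:, 1:] = d_cont(x_i[:, None], 2 * np.pi * f[None, 1:], xs, ys)
    d = np.fft.irfft(D, n=n_fft, axis=1)      # inverse DFT per loudspeaker
    # Focused-source filters are non-causal; a circular shift by n_fft // 2
    # samples acts as a modelling delay (our assumption).
    return np.roll(d, n_fft // 2, axis=1)


x_i = (np.arange(1, N_LS + 1) - (N_LS + 1) / 2) * DX   # centred array (assumption)
d = driving_filters(x_i, xs=0.0, ys=1.0)
print(d.shape)                                # prints: (16, 4096)
```

The 4096-sample length is chosen so that the longest travel time from an array end to the VSS at $\boldsymbol{r}_\mathrm{s} = [0.0\ 1.0]^\mathrm{T}$ (about 1.93 m, roughly 540 samples at 96 kHz) fits within half the filter without wrapping around.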

2.3. Distance Control of Virtual Sound Source on Basis of Switching Loudspeaker Array

The designed driving function $d(x_i, t)$ is convolved with the input signal $v_\mathrm{s}(t)$. The convolved signal $\hat{v}_\mathrm{s}(x_i, t)$ can be expressed as

$$\hat{v}_\mathrm{s}(x_i, t) = v_\mathrm{s}(t) * d(x_i, t), \tag{5}$$

where ∗ denotes the convolution operator.
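Equation (5) amounts to one linear convolution per loudspeaker channel. The sketch below uses SciPy's FFT-based convolution along the time axis. Two choices here are illustrative assumptions: the 8th-order Butterworth low-pass that approximates the 0-8 kHz white-noise input of Table 1, and the pure-delay placeholder filters that stand in for the driving functions.

```python
import numpy as np
from scipy.signal import butter, fftconvolve, sosfilt

FS = 96_000   # sampling frequency [Hz] (Table 1)


def band_limited_noise(n_samples, f_max=8_000.0, fs=FS, seed=0):
    """White noise low-passed to about 0-8 kHz (Table 1); the 8th-order
    Butterworth filter is our choice, not specified in the paper."""
    rng = np.random.default_rng(seed)
    sos = butter(8, f_max, btype="low", fs=fs, output="sos")
    return sosfilt(sos, rng.standard_normal(n_samples))


def convolve_input(v_s, d):
    """Eq. (5): row i of the result is v_s(t) * d(x_i, t)."""
    return fftconvolve(v_s[np.newaxis, :], d, mode="full", axes=1)


# Placeholder filters: 16 channels of a pure 512-sample delay.
d = np.zeros((16, 1024))
d[:, 512] = 1.0
v_s = band_limited_noise(9600)           # 0.1 s at 96 kHz
v_hat = convolve_input(v_s, d)
print(v_hat.shape)                       # prints: (16, 10623)
```

Because the single input row broadcasts against the 16 filter rows, all channels are computed in one call.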

As described above, the loudspeaker array is switched depending on the VSS construction position. The threshold for the VSS distance control is $y_\mathrm{t}$. When $y_\mathrm{s} < y_\mathrm{t}$, the EDL array is used, and the radiation signal is the convolved signal $\hat{v}_\mathrm{s}(x_i, t)$. When $y_\mathrm{s} \ge y_\mathrm{t}$, the PAL array is used, and amplitude modulation is applied to the convolved signal $\hat{v}_\mathrm{s}(x_i, t)$. Equations (6) and (7) give the radiation signals $v_\mathrm{EDL}(x_i, t)$ and $v_\mathrm{PAL}(x_i, t)$ from the EDL and PAL, respectively.

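$$v_\mathrm{EDL}(x_i, t) = \begin{cases} \hat{v}_\mathrm{s}(x_i, t) & \text{if } y_\mathrm{s} < y_\mathrm{t}, \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$

$$v_\mathrm{PAL}(x_i, t) = \begin{cases} 0 & \text{if } y_\mathrm{s} < y_\mathrm{t}, \\ \{1 + m \cdot \hat{v}_\mathrm{s}(x_i, t)\}\, v_\mathrm{C}(t) & \text{otherwise,} \end{cases} \tag{7}$$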


where $m$ is the modulation index and $v_\mathrm{C}(t)$ is the carrier wave.

The radiation signals are generated by the above process, with the loudspeaker array switched in accordance with the VSS construction position. In this way, VSS distance control is achieved on the basis of WFS using the EDL and PAL arrays.
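The switching rule of Equations (6) and (7) can be sketched as follows. Several values here are hypothetical and chosen only for illustration: the threshold $y_\mathrm{t} = 1.25$ m, the 40 kHz sinusoidal carrier, the modulation index $m = 0.8$, and the peak normalisation of the convolved signals. The paper determines $y_\mathrm{t}$ through preliminary evaluations and does not state the other parameters in this section.

```python
import numpy as np

FS = 96_000           # sampling frequency [Hz] (Table 1)
F_CARRIER = 40_000.0  # hypothetical carrier frequency [Hz]; not given in the paper


def radiation_signals(v_hat, ys, y_t, m=0.8, fs=FS, f_c=F_CARRIER):
    """Eqs. (6)-(7): return (v_EDL, v_PAL), each shaped like v_hat (channels x samples)."""
    silent = np.zeros_like(v_hat)
    if ys < y_t:                          # VSS near the arrays: EDL array only
        return v_hat, silent
    t = np.arange(v_hat.shape[-1]) / fs
    v_c = np.sin(2 * np.pi * f_c * t)     # carrier wave v_C(t)
    # Peak normalisation keeps 1 + m * v_hat >= 0 for m <= 1 (our choice).
    peak = np.max(np.abs(v_hat))
    v_norm = v_hat / peak if peak > 0 else v_hat
    return silent, (1 + m * v_norm) * v_c   # PAL array only, amplitude-modulated


rng = np.random.default_rng(1)
v_hat = rng.standard_normal((16, 960))    # stand-in for the convolved signals
y_t = 1.25                                # hypothetical threshold [m]
v_edl, v_pal = radiation_signals(v_hat, ys=0.5, y_t=y_t)     # EDL branch
v_edl2, v_pal2 = radiation_signals(v_hat, ys=2.0, y_t=y_t)   # PAL branch
```

As in Equations (6) and (7), exactly one array radiates for a given VSS position. The crossfading mentioned as future work in Section 5 could replace this hard `if` with complementary gains.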

Figure 2: Coordinate system for WFS (labels: arbitrary point in the sound field, secondary sound source)

Figure 3: Area in which sound field can be reproduced after discretization and truncation of secondary sound sources (labels: accurate sound-field reproduction area, far away, near, secondary sound source)

3. OBJECTIVE EVALUATION ON SOUND-PRESSURE DISTRIBUTION

3.1. Conditions for Objective Evaluation

We conducted an objective evaluation on sound-pressure distribution to confirm the accuracy of constructing a VSS at the desired position. This evaluation was conducted for the following three cases.


• Real: Real sound sources (Fostex, FE83En) are placed at the VSS construction positions
• WFS-EDL: VSSs are constructed using an EDL array of 16 EDLs at an interval of 0.22 m
• WFS-PAL: VSSs are constructed using a PAL array of 16 PALs at an interval of 0.22 m

Tables 1 and 2 list the experimental conditions and equipment used in this experiment, respectively. Figure 4 shows the equipment arrangement for this experiment. Microphones were placed at intervals of 0.1 m, and the sound pressure at each position was calculated.
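The section does not say how the sound pressure at each microphone position was computed. One plausible approach, shown here purely as an assumption, is to take the mean-square power of each recording and express it in dB relative to the loudest grid point. The function name and the toy data are hypothetical.

```python
import numpy as np


def power_map_db(recordings):
    """recordings: array (n_y, n_x, n_samples), one microphone signal per grid point.
    Returns mean-square power per position in dB, normalised so the maximum is 0 dB."""
    power = np.mean(np.asarray(recordings, dtype=float) ** 2, axis=-1)
    return 10 * np.log10(power / power.max())


rec = np.ones((2, 3, 100))          # toy data: six grid positions
rec[0, 0] *= 10.0                   # one position 20 dB louder than the rest
print(np.round(power_map_db(rec), 1))
```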

Table 1: Experimental conditions

Ambient noise level | $L_\mathrm{A} = 25.3$ dB
Sampling frequency | 96 kHz
Environment | Office room ($T_{60} = 0.65$ s)
Sound velocity | $c = 343$ m/s
Interval of loudspeakers | $\Delta x = 0.22$ m
Reference line | $y_\mathrm{ref} = 2.5$ m
Number of loudspeakers | $I = 16$
Input signal | White noise (0-8 kHz)
Position of virtual sound source | $\boldsymbol{r}_\mathrm{s} = [0.0\ 0.5]^\mathrm{T}$, $[0.0\ 1.0]^\mathrm{T}$, $[0.0\ 1.5]^\mathrm{T}$, $[0.0\ 2.0]^\mathrm{T}$

Table 2: Experimental equipment

Ultrasonic transducer | SPL (Hong Kong) Limited, UT1007-Z325R
Electro-dynamic loudspeaker | Pinbotronix, AT37YF2655MUI5112
Microphone | SONY, ECM-88B
Power amplifier | YAMAHA, P4050 (EDL); YAMAHA, IPA8200 (PAL)
Microphone amplifier | RME, OctaMic Ⅱ
D/A converter | RME, M-32 DA
A/D converter | RME, M-32 AD
Audio interface | RME, MADIface USB

Figure 4: Equipment arrangement (EDL array and PAL array; legend: PAL/EDL, observation point; horizontal axis: x [m], from -1.5 to 1.5)

3.2. Experimental Results of Objective Evaluation

Figure 5 shows the results of the objective evaluation on sound-pressure distribution. Figure 5 (b) shows the WFS-EDL results. When the VSS was constructed at $\boldsymbol{r}_\mathrm{s} = [0.0\ 0.5]^\mathrm{T}$ or $[0.0\ 1.0]^\mathrm{T}$, the sound pressure was high at the VSS construction position. At $\boldsymbol{r}_\mathrm{s} = [0.0\ 1.5]^\mathrm{T}$ and $[0.0\ 2.0]^\mathrm{T}$, the sound pressure near the loudspeaker array was higher than that at the VSS. This is because the sound pressure of an EDL is highest near the loudspeaker and decreases as the distance from the loudspeaker increases. Figure 5 (c) shows the WFS-PAL results. When the VSS was constructed at $\boldsymbol{r}_\mathrm{s} = [0.0\ 0.5]^\mathrm{T}$ or $[0.0\ 1.0]^\mathrm{T}$, the sound pressure at the VSS construction position was low. This is because demodulation is insufficient in the vicinity of the PALs. At $\boldsymbol{r}_\mathrm{s} = [0.0\ 1.5]^\mathrm{T}$ and $[0.0\ 2.0]^\mathrm{T}$, the sound pressure at the VSS construction position was the highest. This is because PALs propagate sound to positions far from them with little attenuation of sound pressure. These results show that a VSS can be constructed near an EDL array and far from a PAL array. Therefore, the distance of the VSS can be controlled by switching the loudspeaker array according to the VSS construction position.

Figure 5: Experimental results of objective evaluation on sound-pressure distribution (columns: (a) Real, (b) WFS-EDL, (c) WFS-PAL; rows: $\boldsymbol{r}_\mathrm{s} = [0.0\ 0.5]^\mathrm{T}$, $[0.0\ 1.0]^\mathrm{T}$, $[0.0\ 1.5]^\mathrm{T}$, $[0.0\ 2.0]^\mathrm{T}$; axes: x [m] and power [dB]; marker: position of VSS)

4. SUBJECTIVE EVALUATION ON PERCEPTION OF SOUND-IMAGE LOCATION

4.1. Experimental Conditions for Subjective Evaluation

To confirm the effectiveness of the proposed method, we conducted a subjective evaluation on the perceived location of VSSs. The experimental conditions and equipment were the same as in Tables 1 and 2. Two participants (one man and one woman) were asked to stand upright, one at a time, at position $(x, y) = (0, 2.5)$ on the reference line shown in Figure 4. The VSS at each construction position was presented twice, after which the participants indicated the perceived position of the VSS, and the correct response rate was calculated.
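Two quantities come out of this procedure: the response counts per presented position (the circle sizes in a plot like Figure 6) and the correct response rate. Both reduce to tallying (presented, answered) pairs. The function and the example answers below are hypothetical and do not reproduce the participants' data.

```python
from collections import Counter


def tally(responses):
    """responses: list of (presented_y, answered_y) pairs in metres.
    Returns per-pair counts (circle sizes in a bubble plot) and the
    correct response rate over all responses."""
    counts = Counter(responses)
    correct = sum(n for (presented, answered), n in counts.items()
                  if presented == answered)
    return counts, correct / len(responses)


# Hypothetical answers: two presentations to each of two participants at y_s = 1.5 m.
counts, rate = tally([(1.5, 1.5), (1.5, 1.5), (1.5, 1.5), (1.5, 1.0)])
print(rate)   # prints: 0.75
```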

4.2. Experimental Results of Subjective Evaluation

Figure 6 shows the results of this evaluation. Figures 6 (a) and (b) show the results when all VSSs were constructed under the WFS-EDL and WFS-PAL conditions, respectively. In Figure 6 (c), for the proposed method, the VSSs were constructed at $\boldsymbol{r}_\mathrm{s} = [0.0\ 0.5]^\mathrm{T}$ and $[0.0\ 1.0]^\mathrm{T}$ under the WFS-EDL condition and at $\boldsymbol{r}_\mathrm{s} = [0.0\ 1.5]^\mathrm{T}$ and $[0.0\ 2.0]^\mathrm{T}$ under the WFS-PAL condition. The size of each circle represents the number of correct or incorrect responses.

Figure 6 (a) shows that the VSSs near the EDL array could be correctly perceived, but the VSSs far from this array tended to be perceived near it. This is because the sound emitted from an EDL attenuates as the distance from it increases, making it difficult to construct a VSS far from the array. Figure 6 (b) shows that the VSSs far from the PAL array could be correctly perceived, whereas those near the array tended to be perceived near the participant. This is because demodulation is insufficient near the PALs, so the VSS is perceived at a position in the vicinity of the listener, where the sound pressure is higher than at its construction position. Figure 6 (c) shows that, with the proposed method, the sound image was perceived at the VSS construction position for all positions, with a high correct response rate. These results confirm that the proposed method can control the distance of VSSs.

5. CONCLUSIONS

We proposed a VSS distance control method based on WFS using an EDL array and a PAL array to construct a VSS at the desired position with high accuracy. With this method, the EDL array and PAL array are placed parallel to each other, and VSS distance control is achieved by switching between these arrays.


Objective and subjective evaluations were conducted to confirm the effectiveness of the proposed method. In the objective evaluation, the sound-pressure distribution was evaluated to determine where a VSS can be constructed when an EDL array or a PAL array is used. The objective evaluation confirmed that an EDL array is suitable for constructing a VSS near it and a PAL array for constructing a VSS far from it. The subjective evaluation confirmed that switching the loudspeaker array improves the accuracy of VSS-position perception.

In the future, we will introduce crossfading for more seamless switching between the loudspeaker arrays. We will also investigate a method of generating radiation signals that takes into account the amplitude-frequency characteristics of each loudspeaker.

Figure 6: Experimental results of subjective evaluation on perception of sound-image location ((a) WFS-EDL, (b) WFS-PAL, (c) Proposed method; legend: correct response, incorrect response; horizontal axis: presented position [m])

6. ACKNOWLEDGEMENTS

This work was partly supported by the Ritsumeikan Global Innovation Research Organization (RGIRO), and JSPS KAKENHI Grant Numbers JP19H04142 and JP21H03488.

7. REFERENCES

1. A. Ando, "Theory of Three-Dimensional Sound Field Reproduction," IEICE, SP, Fundamentals Review, vol. 3, no. 4, pp. 33–46, 2010.

2. ITU-R Rec. BS.775-3, "Multichannel stereophonic sound system with and without accompanying picture," ITU, 2012.

3. K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama, and A. Ando, "A 22.2 Multichannel Sound System for Ultrahigh-Definition TV (UHDTV)," SMPTE Motion Imaging Journal, vol. 117, no. 3, pp. 40–49, 2008.

4. Y. Sugibayashi, S. Kurimoto, D. Ikefuji, M. Morise, and T. Nishiura, "Three-dimensional acoustic sound field reproduction based on hybrid combination of multiple parametric loudspeakers and electrodynamic subwoofer," Applied Acoustics, vol. 73, no. 12, pp. 1282–1288, 2012.

5. P. J. Westervelt, "Parametric acoustic array," The Journal of the ASA, vol. 35, no. 4, pp. 535–537, 1963.

6. W. S. Gan, J. Yang, and T. Kamakura, "A review of parametric acoustic array in air," Applied Acoustics, vol. 73, no. 12, pp. 1211–1219, 2012.

7. J. Ahrens, "Analytic Methods of Sound Field Synthesis," Springer, 2012.

8. J. Ahrens and S. Spors, "An analytical approach to local sound field synthesis using linear arrays of loudspeakers," ICASSP, pp. 65–68, 2011.

9. S. Sayama, M. Nakayama, and T. Nishiura, "Virtual sound source construction based on wave field synthesis using multiple parametric array loudspeakers," INTER-NOISE 2020, pp. 12–9–338, E-Congress, Aug. 2020.

10. S. Spors, H. Wierstorf, M. Geier, and J. Ahrens, "Physical and Perceptual Properties of Focused Sources in Wave Field Synthesis," AES 127th Convention, Paper Number 7914, 2009.

11. J. Ahrens and S. Spors, "Spatial Sampling Artifacts of Focused Sources in Wave Field Synthesis," International Conference on Acoustics, pp. 1556–1559, 2009.

12. S. Spors and J. Ahrens, "Spatial sampling artifacts of wave field synthesis for the reproduction of virtual point sources," AES 126th Convention, Paper Number 7714, 2009.

13. H. Wierstorf, M. Geier, A. Raake, and S. Spors, "Perception of Focused Sources in Wave Field Synthesis," Journal of the AES, vol. 61, no. 1/2, pp. 5–16, 2013.
