
Proceedings of the Institute of Acoustics

 

 

Distance control of virtual sound source based on switching electro dynamic and parametric loudspeaker arrays

 

Ayano Hirose1, Graduate School of Information Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan


Haonan Wang2, College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan


Masato Nakayama3, Faculty of Design Technology, Osaka Sangyo University, 3-1-1 Nakagaito, Daito, Osaka 574-8530, Japan


Takanobu Nishiura4, College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

 

ABSTRACT

 

High-presence sound-field reproduction technology has been gaining attention. We previously proposed a virtual sound source (VSS) construction method that is based on wave field synthesis (WFS) using a parametric loudspeaker (PAL) array. This method can construct a VSS at any position without relocating the PALs. However, it is difficult to construct a VSS near a PAL array with this method. In this paper, we propose a method for controlling the distance of a VSS on the basis of WFS using an electro-dynamic loudspeaker (EDL) array and PAL array. When using an EDL array, VSSs can be constructed near it, and when using a PAL array, VSSs can be constructed far from it. Therefore, with this method, we place an EDL array and PAL array parallel to each other and switch between them for VSS construction depending on the desired position to achieve VSS distance control. The threshold of switching the loudspeaker array was determined through objective and subjective preliminary evaluations. We demonstrated the effectiveness of the proposed method through evaluation experiments.

 

1. INTRODUCTION

 

Attention has been focused on high-presence sound-field reproduction technology [1], which can provide highly realistic sensations. Multi-channel surround systems, such as 5.1 ch. surround [2] and 22.2 ch. surround [3], and Acoustic Planetarium [4] using parametric array loudspeakers (PALs) [5, 6] have been proposed for reproducing a sound field with high presence. In a multi-channel surround system, multiple loudspeakers are placed above, below, left, and right of the listener, so that a virtual sound source (VSS) is constructed between each loudspeaker by amplitude panning. However, the listening area is limited by the placement of the loudspeakers. In Acoustic Planetarium using PALs, the sound radiated from the PAL is reflected off walls and ceilings, creating a VSS on the reflective surfaces. Therefore, the listener perceives an acoustic sound image from a reflective surface. However, this method requires reflective objects such as walls and ceilings, which must be moved to construct a VSS in the air.

 

To solve this problem, a method for constructing a VSS on the basis of wave field synthesis (WFS) using a loudspeaker array was proposed [7, 8]. With this method, loudspeakers are placed in a single line, facing the frontal direction, so it is possible to construct a VSS at the desired position without repositioning the loudspeakers. However, this method may inaccurately construct a VSS due to the radiation characteristics of the loudspeakers used. When using an electro-dynamic loudspeaker (EDL) array, it is difficult to construct a VSS far from this array because the sound pressure decreases as the distance from the loudspeaker increases. When using a PAL array, however, it is difficult to construct a VSS near this array because the demodulation is insufficient in the vicinity of the PAL.

 

We propose a method for constructing a VSS at the desired position with high accuracy on the basis of WFS using an EDL array and a PAL array. With this method, the EDL array and PAL array are placed parallel to each other, one above the other. An EDL array is suitable for constructing a VSS near it, while a PAL array is suitable for constructing a VSS far from it [9]. Therefore, focusing on the characteristics of each loudspeaker array, VSS distance control is achieved by switching the loudspeaker array to be used. Objective and subjective evaluations were conducted on VSSs constructed using the proposed method and on those constructed using only an EDL array or only a PAL array to confirm the effectiveness of the proposed method.

 

 

Figure 1: Overview of proposed method

 

2. PROPOSED METHOD

 

2.1. Overview of Proposed Method

 

Figure 1 shows an overview of the proposed method. VSSs are constructed on the 𝑥𝑦 plane: the EDL array is used for VSSs near the arrays, and the PAL array is used for VSSs far from them. In Step 1 of Figure 1, driving functions are designed to construct a VSS at the desired position 𝒓s = [𝑥s 𝑦s]T on the basis of WFS. In Step 2, the loudspeaker array (EDL or PAL) to be used is selected in accordance with the VSS construction position. When an EDL array is used to construct a VSS, the radiated signals are the input signal convolved with the designed driving functions. When a PAL array is used to construct a VSS, the radiated signals are the amplitude-modulated versions of these convolved signals.

 

Thus, it is possible to control the distance of the VSS.

 

2.2. Design of Driving Filter on Basis of Wave Field Synthesis

 

WFS is a sound-field reproduction technique that reproduces the spatial wavefronts of sound on the basis of physical acoustic models [10]. Figure 2 shows the coordinate system for WFS, where 𝒓(= [𝑥  𝑦]T) is the position vector of a certain position in the sound field in the Cartesian coordinate system, 𝒓0(= [𝑥0  0]T) is the position vector of a position on an infinite line, and 𝑦 = 𝑦ref is the reference line [8]. We can calculate the driving function 𝐷count(𝒓0, 𝜔) to construct the focused VSS [10] at 𝒓s using the following equation.

 

 

where 𝜔 is the angular frequency, 𝑗 is the imaginary unit √−1, 𝑘 is the wavenumber, and 𝐻₁⁽¹⁾ is the first-order Hankel function of the first kind. However, the driving function 𝐷count(𝒓0, 𝜔) calculated in Equation (1) assumes an infinitely long, continuous line source as the secondary source, as in WFS theory, so it must be approximated by a discrete, finite-length loudspeaker array when implemented in a real environment. Discretization of the secondary source can be achieved by spatial sampling of the driving function [11]. Equation (2) shows the formula for calculating the driving function 𝐷dis(𝑥i , 𝜔) after spatial sampling.

 

 

where 𝑖 is the index of the secondary source, 𝒓i (= [𝑥i  0]T) is the position vector of the secondary source, and 𝛿 is the delta function. However, spatial aliasing occurs when the secondary source is replaced with discrete loudspeakers. Spatial aliasing is a phenomenon in which extra wavefronts are generated at frequencies above the spatial aliasing frequency, and the wavefronts are disturbed so that the sound field cannot be accurately reproduced [8]. Spatial aliasing is manifested as spectral overlap in the wavenumber domain. Local sound field synthesis [8] suppresses the spectral overlap, which makes it possible to reproduce the desired sound field even in bands above the spatial aliasing frequency.
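As a rough illustration of where spatial aliasing may begin, the following sketch evaluates a commonly used conservative estimate, in which half a wavelength equals the loudspeaker spacing. The estimate itself, the speed of sound of 343 m/s, and the function name are assumptions made for illustration and are not taken from this paper; the spacing of 0.22 m is the array interval reported in Section 3.1. For focused sources the actual onset of aliasing artifacts also depends on the listening position, so the value should be read only as an order of magnitude.

```python
def spatial_aliasing_frequency(spacing_m: float, speed_of_sound: float = 343.0) -> float:
    """Conservative estimate: aliasing starts when half a wavelength equals the spacing."""
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    return speed_of_sound / (2.0 * spacing_m)


# Loudspeaker interval from Section 3.1 (0.22 m); 343 m/s is an assumed speed of sound.
f_alias = spatial_aliasing_frequency(0.22)
print(f"{f_alias:.1f} Hz")  # prints "779.5 Hz"
```

Under these assumptions, components of the input signal above roughly 780 Hz would be the ones affected by spatial aliasing.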

 

The truncation of a secondary source corresponds to applying a rectangular window to the secondary source in the Cartesian coordinate system [12]. Equation (3) shows the driving function 𝐷(𝑥i , 𝜔) of the secondary source after the truncation.

 

 

where 𝐼 is the number of loudspeakers. The area where the sound field can be reproduced is limited by the resulting truncation error [13]. Figure 3 shows the area in which the sound field can be reproduced after discretization and truncation of the secondary sound sources. When the secondary sound sources are distributed from 𝑥1 to 𝑥I , the region where the sound field can be accurately reproduced is limited to the area in front of the lines connecting 𝒓s with the two ends 𝑥1 and 𝑥I of the loudspeaker array.
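The geometric restriction described above can be sketched as a simple point-in-wedge test. This is one reading of the text, under the assumptions that the array lies on 𝑦 = 0, that the reproducible area is the wedge beyond the VSS bounded by the two lines from the array ends through 𝒓s, and that the array is centred at 𝑥 = 0 (so 16 loudspeakers at 0.22 m span −1.65 m to 1.65 m). The function name and these values are hypothetical and are not taken from Figure 3.

```python
def in_reproduction_area(listener, source, x_first, x_last):
    """Return True if `listener` lies beyond `source` inside the wedge bounded by
    the lines from both array ends (on y = 0) through the source position."""
    x, y = listener
    xs, ys = source
    if ys <= 0 or y <= ys:
        return False  # the listener must be farther from the array than the VSS
    scale = y / ys
    # x-coordinates of the two bounding lines at the listener's distance y
    edge_a = x_first + (xs - x_first) * scale
    edge_b = x_last + (xs - x_last) * scale
    return min(edge_a, edge_b) <= x <= max(edge_a, edge_b)


# Assumed array ends for a centred 16-element array with 0.22 m spacing.
X_FIRST, X_LAST = -1.65, 1.65
print(in_reproduction_area((0.0, 2.5), (0.0, 2.0), X_FIRST, X_LAST))  # True
print(in_reproduction_area((1.0, 2.5), (0.0, 2.0), X_FIRST, X_LAST))  # False
```

Under this reading, the wedge becomes narrower at a given listening distance as the VSS moves away from the array, which is why an off-centre listener can fall outside it.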

 

Finally, the driving function 𝑑(𝑥i, 𝑡) in the time domain is calculated by inverse Fourier transform, which can be expressed as

 

 

where 𝑡 is the time index and IDFT[∙] is the inverse discrete Fourier transform.
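The step from a frequency-domain driving function to time-domain filters can be sketched as follows. Note that this is not the driving function of Equation (1): as a stand-in, the sketch uses a simplified delay-and-gain focusing rule, in which each loudspeaker is delayed so that all wavefronts reach 𝒓s at the same time. The sampling frequency, filter length, speed of sound, 1/√distance gain, and centred array geometry are all assumptions made for illustration.

```python
import numpy as np


def focusing_filters(speaker_x, source, fs=48000, n_taps=1024, c=343.0):
    """Time-domain filters for a simplified delay-and-gain focusing rule.

    speaker_x: x-coordinates of loudspeakers on the line y = 0
    source:    (x_s, y_s) position of the virtual sound source
    """
    speaker_x = np.asarray(speaker_x, dtype=float)
    xs, ys = source
    dist = np.hypot(speaker_x - xs, ys)      # distance from each loudspeaker to the VSS
    delay = (dist.max() - dist) / c          # the farthest loudspeaker fires first
    gain = 1.0 / np.sqrt(dist)               # assumed amplitude weighting
    omega = 2.0 * np.pi * np.fft.rfftfreq(n_taps, d=1.0 / fs)
    # Stand-in frequency-domain driving function: gain times a pure delay
    D = gain[:, None] * np.exp(-1j * omega[None, :] * delay[:, None])
    d = np.fft.irfft(D, n=n_taps, axis=1)    # inverse DFT gives the time-domain filters
    return d, delay


# Assumed geometry: 16 loudspeakers, 0.22 m spacing, centred at x = 0.
speaker_x = (np.arange(16) - 7.5) * 0.22
d, delay = focusing_filters(speaker_x, source=(0.0, 0.5))
print(d.shape)  # (16, 1024)
```

Each row of `d` is a filter whose main peak sits at approximately the delay assigned to that loudspeaker; a full implementation would replace `D` with the sampled and truncated driving function 𝐷(𝑥i , 𝜔).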

 

2.3. Distance Control of Virtual Sound Source on Basis of Switching Loudspeaker Array

 

The designed driving function 𝑑(𝑥i , 𝑡) is convolved with the input signal 𝑣(𝑡). The convolved signal (𝑥i, 𝑡) can be expressed as

 

 

where ∗ denotes the convolution operator.

 

As described above, the loudspeaker array is switched depending on the position for VSS construction. The threshold for the VSS distance control is 𝑦t. When 𝑦s < 𝑦t, the EDL array is used, and the radiation signal is the convolved signal (𝑥i, 𝑡). When 𝑦s ≥ 𝑦t, the PAL array is used, and amplitude modulation is applied to the convolved signal (𝑥i, 𝑡). Equations (6) and (7) show the radiation signals 𝑣EDL(𝑥i , 𝑡) and 𝑣PAL(𝑥i , 𝑡) from the EDL and PAL, respectively.

 

 

where 𝑚 is the modulation factor and 𝑣C(𝑡) is the carrier wave.

 

The radiation signal is generated by the above process, and the loudspeaker array is switched in accordance with the VSS construction position. Therefore, it is possible to achieve VSS distance control on the basis of the WFS using the EDL and PAL arrays.
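The switching procedure described above can be sketched as follows, under several assumptions that are not stated in this paper: the amplitude modulation is taken to be conventional double-sideband modulation of the form (1 + 𝑚 × signal) × carrier, the carrier is a 40 kHz sinusoid, the sampling frequency is 192 kHz, and the threshold 𝑦t = 1.25 m is a hypothetical placeholder. The function and variable names are also illustrative.

```python
import numpy as np


def radiation_signals(v, d, y_s, y_t, fs, m=0.7, f_c=40000.0):
    """Select the array from the VSS distance y_s and build its radiation signals.

    v: input signal (1-D array); d: driving filters, one row per loudspeaker.
    Returns the array label ("EDL" or "PAL") and the signals, one row per loudspeaker.
    """
    if fs <= 2.0 * f_c:
        raise ValueError("fs must exceed twice the carrier frequency")
    # Convolve the input signal with each driving filter
    conv = np.stack([np.convolve(v, d_i) for d_i in d])
    if y_s < y_t:
        return "EDL", conv                   # near VSS: radiate the convolved signal
    peak = np.max(np.abs(conv))
    if peak > 0:
        conv = conv / peak                   # keep the envelope 1 + m * conv positive
    t = np.arange(conv.shape[1]) / fs
    carrier = np.sin(2.0 * np.pi * f_c * t)  # assumed sinusoidal ultrasonic carrier
    return "PAL", (1.0 + m * conv) * carrier # far VSS: amplitude-modulated signal


fs = 192000
v = np.sin(2.0 * np.pi * 1000.0 * np.arange(192) / fs)  # 1 ms of a 1 kHz tone
d = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])        # two toy delay filters
y_t = 1.25                                               # hypothetical threshold in metres
label_near, sig_near = radiation_signals(v, d, y_s=0.5, y_t=y_t, fs=fs)
label_far, sig_far = radiation_signals(v, d, y_s=2.0, y_t=y_t, fs=fs)
print(label_near, label_far)  # EDL PAL
```

The hard switch at 𝑦t mirrors the description in this section; any smoothing between the two arrays would need to be added on top of this sketch.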

 

 

Figure 2: Coordinate system for WFS


 

Figure 3: Area in which sound field can be reproduced after discretization and truncation of secondary sound sources

 

3. OBJECTIVE EVALUATION ON SOUND-PRESSURE DISTRIBUTION

 

3.1. Conditions for Objective Evaluation

 

We conducted an objective evaluation on sound-pressure distribution to confirm the accuracy of constructing a VSS at the desired position. This evaluation was conducted for the following three cases.

  • Real: Real sound sources (Fostex, FE83En) are placed at the VSS construction positions
  • WFS-EDL: VSSs are constructed using an EDL array of 16 EDLs at an interval of 0.22 m
  • WFS-PAL: VSSs are constructed using a PAL array of 16 PALs at an interval of 0.22 m

 

Tables 1 and 2 list the experimental conditions and equipment used in this experiment, respectively. Figure 4 shows the equipment arrangement for this experiment. Microphones were placed at intervals of 0.1 m, and the sound pressure at each position was calculated.
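The paper does not state how the sound pressure at each microphone position was computed. One common approach, sketched below as an assumption, is to take the root-mean-square value of the recorded pressure signal and express it as a sound pressure level relative to 20 µPa. The sketch assumes that the recordings are already calibrated in pascals; the function name and the test tone are illustrative.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in air, 20 µPa


def sound_pressure_level(p):
    """Sound pressure level in dB re 20 µPa of a pressure signal given in pascals."""
    rms = np.sqrt(np.mean(np.square(p)))
    return 20.0 * np.log10(rms / P_REF)


# Illustrative test tone: a 1 kHz sine with an RMS value of 1 Pa.
fs = 48000
t = np.arange(fs) / fs
p = np.sqrt(2.0) * np.sin(2.0 * np.pi * 1000.0 * t)
spl = sound_pressure_level(p)
print(f"{spl:.1f} dB")  # prints "94.0 dB"
```

Applying such a function to the recording at every grid position would give a level map comparable in form to a sound-pressure distribution plot.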

 

3.2. Experimental Results of Objective Evaluation

 

Figure 5 shows the results of the objective evaluation on sound-pressure distribution. Figure 5 (b) shows the WFS-EDL results. When the VSS was constructed at 𝒓s = [0.0 0.5]T or [0.0 1.0]T, the sound pressure at the construction position was high. At 𝒓s = [0.0 1.5]T and [0.0 2.0]T, the sound pressure near the loudspeaker array was higher than that at the VSS position. This can be explained by the fact that, for an EDL, the sound pressure is highest near the loudspeaker and decreases as the distance from the loudspeaker increases. Figure 5 (c) shows the WFS-PAL results. When the VSS was constructed at 𝒓s = [0.0 0.5]T or [0.0 1.0]T, the sound pressure at the construction position was low. This can be explained by the insufficient demodulation in the vicinity of the PALs. At 𝒓s = [0.0 1.5]T and [0.0 2.0]T, the sound pressure at the VSS construction position was the highest. This can be explained by the PALs propagating sound over a long distance with little attenuation of sound pressure. These results indicate that a VSS can be constructed near the array when an EDL array is used and far from the array when a PAL array is used. Therefore, the distance of the VSS can be controlled by switching the loudspeaker array to be used depending on the position of the VSS construction.

 

Table 1: Experimental conditions

 

Table 2: Experimental equipment

 

Figure 4: Equipment arrangement

 

Figure 5: Experimental results of objective evaluation on sound-pressure distribution

 

4. SUBJECTIVE EVALUATION ON PERCEPTION OF SOUND-IMAGE LOCATION

 

4.1. Experimental Conditions for Subjective Evaluation

 

To confirm the effectiveness of the proposed method, we conducted a subjective evaluation on the location perception of VSSs. The experimental conditions and equipment used in this evaluation were the same as in Tables 1 and 2. Two participants (one man and one woman) were asked to separately stand upright at position (𝑥, 𝑦) = (0,2.5) on the reference line shown in Figure 4. The VSS at each construction position was presented twice, then the participants were asked to indicate the position of the VSS, and the correct response rate was calculated.

 

4.2. Experimental Results of Subjective Evaluation

 

Figure 6 shows the results of this evaluation. Figures 6 (a) and (b) show all VSSs constructed under the WFS-EDL and WFS-PAL conditions, respectively. As shown in Figure 6 (c), the VSSs were constructed at 𝒓s = [0.0 0.5]T , [0.0 1.0]T under the WFS-EDL condition and at 𝒓s = [0.0 1.5]T, [0.0 2.0]T under the WFS-PAL condition. The size of the circle represents the number of correct and incorrect responses.

 

Figure 6 (a) shows that the VSSs near the EDL array were correctly perceived, but those far from the array tended to be perceived near it. This can be explained by the fact that the sound emitted from an EDL attenuates with distance, which makes it difficult to construct a VSS far from the array. Figure 6 (b) shows that the VSSs far from the PAL array were correctly perceived, while those near the array tended to be perceived near the participant. This is because demodulation is insufficient near the PALs, so the VSS is perceived at a position where the sound pressure is higher than at its construction position, that is, in the vicinity of the listener. Figure 6 (c) shows that the sound image was perceived at the VSS construction position for all positions, and a high percentage of correct responses was obtained. These results confirm that the proposed method can control the distance of VSSs.

 

5. CONCLUSIONS

 

We proposed a VSS-distance control method that is based on WFS using an EDL array and PAL array to construct a VSS at the desired position with high accuracy. With this method, the EDL array and PAL array are placed parallel to each other. VSS distance control is achieved by switching between these arrays.

 

Objective and subjective evaluations were conducted to confirm the effectiveness of the proposed method. In the objective evaluation, the sound-pressure distribution was evaluated to confirm the positions at which a VSS can be constructed when an EDL array or PAL array is used. The objective evaluation confirmed that an EDL array is suited to constructing a VSS near the array and that a PAL array is suited to constructing a VSS far from the array. The subjective evaluation confirmed that switching the loudspeaker array improves the accuracy of VSS-position perception.

 

In the future, we will introduce crossfading for more seamless switching of the loudspeaker array. We will also investigate a method of generating radiation signals by taking into account the frequency amplitude characteristics of each loudspeaker.

 

Figure 6: Experimental results of subjective evaluation on perception of sound-image location

 
6. ACKNOWLEDGEMENTS

 

This work was partly supported by the Ritsumeikan Global Innovation Research Organization (RGIRO), and JSPS KAKENHI Grant Numbers JP19H04142 and JP21H03488.

 

7. REFERENCES

 

  1. A. Ando, "Theory of Three-Dimensional Sound Field Reproduction," IEICE, SP, Fundamentals Review, vol. 3, no. 4, pp. 33–46, 2010.
  2. ITU-R Rec. BS. 775-3, "Multichannel stereophonic sound system with and without accompanying picture," ITU, 2012.
  3. K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama, and A. Ando, "A 22.2 Multichannel Sound System for Ultrahigh-Definition TV (UHDTV)," SMPTE Motion Imaging Journal, vol. 117, no. 3, pp. 40–49, 2008.
  4. Y. Sugibayashi, S. Kurimoto, D. Ikefuji, M. Morise, and T. Nishiura, "Three-dimensional acoustic sound field reproduction based on hybrid combination of multiple parametric loudspeakers and electrodynamic subwoofer," Applied Acoustics, vol. 73, no. 12, pp. 1282–1288, 2012.
  5. P. J. Westervelt, "Parametric acoustic array," The Journal of the ASA, vol. 35, no. 4, pp. 535–537, 1963.
  6. W. S. Gan, J. Yang, and T. Kamakura, "A review of parametric acoustic array in air," Applied Acoustics, vol. 73, no. 12, pp. 1211–1219, 2012.
  7. J. Ahrens, "Analytic Methods of Sound Field Synthesis," Springer, 2012.
  8. J. Ahrens and S. Spors, "An analytical approach to local sound field synthesis using linear arrays of loudspeakers," ICASSP, pp. 65–68, 2011.
  9. S. Sayama, M. Nakayama and T. Nishiura, "Virtual sound source construction based on wave field synthesis using multiple parametric array loudspeakers," INTER-NOISE 2020, pp. 12–9–338, E–Congress, Aug 2020.
  10. S. Spors, H. Wierstorf, M. Geier, and J. Ahrens, "Physical and Perceptual Properties of Focused Sources in Wave Field Synthesis," AES 127th Convention, Paper Number: 7914, 2009.
  11. J. Ahrens and S. Spors, "Spatial Sampling Artifacts of Focused Sources in Wave Field Synthesis," International Conference on Acoustics, pp. 1556–1559, 2009.
  12. S. Spors and J. Ahrens, "Spatial sampling artifacts of wave field synthesis for the reproduction of virtual point sources," AES 126th Convention, Paper Number: 7714, 2009.
  13. H. Wierstorf, M. Geier, A. Raake, and S. Spors, "Perception of Focused Sources in Wave Field Synthesis," Journal of the AES, vol. 61, no. 1/2, pp. 5–16, 2013.

 


1 is0471hf@ed.ritsumei.ac.jp

2 h-wang@fc.ritsumei.ac.jp

3 nakayama@ise.osaka-sandai.ac.jp

4 nishiura@is.ritsumei.ac.jp