
Amplitude-decorrelation avoiding spectral artifacts of sound field synthesis in the anechoic chamber or free sound field

Franz Zotter 1 , Matthias Frank 2

University of Music and Performing Arts, Institute of Electronic Music and Acoustics, Inffeldgasse 10, 8010 Graz

Oliver Bayer 3 , Julia Pinkas 4 , Gregor-Johannes Müller 5

BMW Group, Entwicklung Gesamtfahrzeug - Akustik, Schwingungen Knorrstraße 147, 80788 München

ABSTRACT Sounds that are meant to appear in directions between the loudspeakers will typically activate neighboring loudspeakers with the same signal when spatially rendered. The resulting interference causes frequency- and position-dependent cancellations in the superimposed sound field. Keeping the number of active loudspeakers uniformly small is often sufficient when unwanted cancellations are concealed by the few acoustic reflections of a studio environment. However, cancellations will remain a problem when synthesizing sound fields under anechoic or free-field conditions. Our contribution investigates short filters for amplitude decorrelation. The filters supply neighboring loudspeakers via frequency responses that are mostly mutually exclusive and thereby avoid cancellation in the superimposed sound field, at the expense of a spread that is required to obtain a spectrally balanced synthesis. We present listening experiments investigating to what extent the proposed filter strategy improves the spectral stability within a reasonable range of motion for a seated listener in the anechoic chamber.

1. INTRODUCTION

Rendering of variable-direction sounds on surrounding loudspeakers set up at discrete directions including height often employs amplitude panning [1]. Common candidates for panning are vector-base amplitude panning [2] or Ambisonics [3]. While wave-field synthesis [4–7] offers more powerful sound field control, it requires dense loudspeaker spacings that are often not feasible in surround-with-height audio systems, with sophisticated exceptions that can be found in recent literature [8–10].

1 zotter@iem.at

2 frank@iem.at

3 Oliver.BA.Bayer@bmw.de

4 Julia.Pinkas@bmw.de

5 Gregor-Johannes.Mueller@bmw.de

inter.noise, 21-24 August, Scottish Event Campus, Glasgow

In this paper, we consider amplitude-panning-based rendering because of its less rigid requirements on the density and shape of the loudspeaker layout, and its less harmful processing when rendering becomes time-varying. While amplitude-based rendering is common in typical audio applications, studio and cinema sound [1], its application in the free field or anechoic environments is not as common, and in anechoic measurement environments, strict criteria often have to be fulfilled. Recent and comprehensive work by Kuntz et al. [11–13] therefore offers a profound analysis of the ability of state-of-the-art amplitude-panning methods to render spatially smooth interaural cues, third-octave levels, and smooth time-variant rendering in the anechoic chamber. When experienced listeners compare typical audio production and listening environments such as studios and cinemas with audio reproduction in the anechoic chamber or free field, they may notice some differences that appear to be more disturbing in the more controlled, anechoic environment. This may seem counter-intuitive, but in the anechoic environment, interference and comb filtering are unlikely to be spectrally concealed by residual reflections. A similar problem can be observed in the efforts that are necessary to accomplish binaural Ambisonic rendering [14–18], when compared to loudspeaker-based rendering in studio spaces or when using binaural room impulse responses or reverberated virtual loudspeakers [19]. In loudspeaker-based synthesis, this effect has been observed in Ambisonics by Daniel (spectral unbalance) [20] and Solvang (spectral impairment) [21], and it will be particularly strong when more loudspeakers are activated than necessary for the angular resolution provided by the Ambisonic order. In such cases, phasiness and a roll-off at high frequencies will be observed around the center of the loudspeaker playback facility.

Bessel arrays [22] have been used as a way to obtain an interference of several discrete loudspeakers that maintains an all-pass behavior across the entire frequency range. The literature on audio widening and decorrelation, e.g. [23–29], describes ways to render sounds with frequency-varying direction or group delay. From the above-mentioned interference-diversifying or avoidance strategies, we will deal with a simplified amplitude-based decorrelation strategy in this paper. With the proposed strategy, destructive interference is not ruled out entirely but made much less likely. After analyzing the superimposed sound fields of two or more loudspeakers and presenting the filter-design strategy, we evaluate and optimize the proposed technique via listening experiments, and provide a comparison for loudspeaker subsets of different density and for different directional resolutions / Ambisonic orders.

2. CAN TIME OR LEVEL DIFFERENCES DEFEAT DESTRUCTIVE INTERFERENCES?

Two sources playing a coherent sinusoidal signal received with the same amplitude will produce hyperbola-shaped destructive and constructive interference ridges in a free sound field. Applying a time delay to one of the sources will mainly shift the interference pattern but cannot avoid interference. Therefore, phase-based approaches will in general also only be able to shift the destructive interference patterns. By contrast, attenuation of one of the sources will largely avoid destructive interference, as full cancellation requires equal amplitudes. For instance, if two sources are activated with the amplitudes 1 and 1/3, the superimposed amplitude will range from 2/3 (destructive) to 4/3 (constructive); the difference between the two therefore only amounts to 6 dB, rather than a full cancellation of theoretically −∞ dB. When the amplitude-based decorrelation filters alternate the attenuation of the even or odd loudspeaker in a pair over frequency, destructive interference is highly unlikely to occur in the field, see Fig. 1. The sound pressure levels shown are measured in a semi-anechoic environment, therefore common floor reflections of both loudspeakers are visible at about 360 Hz, 1.1 kHz, 1.8 kHz, 2.5 kHz, etc.
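The 6 dB figure of this example can be checked with a few lines of Python. This is a minimal sketch; the function name `interference_range_db` is hypothetical and only illustrates the arithmetic for two coherent sinusoids received with the given amplitudes.

```python
import math

def interference_range_db(a1: float, a2: float) -> float:
    """Level difference in dB between fully constructive and fully
    destructive superposition of two coherent sinusoids."""
    constructive = a1 + a2
    destructive = abs(a1 - a2)
    if destructive == 0.0:
        return math.inf  # equal amplitudes allow full cancellation
    return 20.0 * math.log10(constructive / destructive)

# amplitudes 1 and 1/3: superposition ranges from 2/3 to 4/3
print(round(interference_range_db(1.0, 1.0 / 3.0), 2))  # 6.02
# equal amplitudes: unbounded notch depth
print(interference_range_db(1.0, 1.0))  # inf
```

The closer the two amplitudes are, the deeper the possible notch, which is why attenuating one source of a pair limits the worst-case cancellation.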

[Figure 1 panels: (a) loudspeaker pair; (b) loudspeaker pair with filters. Axes: frequency in Hz (500 to 8k) and microphone position in cm (−95 to 95); color scale: 12th-oct. SPL in dB (0 to −25).]

Figure 1: 12th-octave-smoothed measured sound pressure levels of a loudspeaker pair in a semi-anechoic chamber, measured in 10 cm steps, at loudspeaker height (1.1 m), and a distance of 5 m, without filters, and with power-complementary filters as in Fig. 3.

However, in a surrounding loudspeaker setup, we are dealing with more than two loudspeakers, e.g. in Ambisonics or VBAP with spread. Fig. 2a shows a situation with $N$th-order Ambisonics using $L = 2(N + 1)$ loudspeakers and a virtual sound source between the frontal loudspeakers. Even with the optimal Ambisonics order $N = 11$ for the $L = 24$ channel layout, there will be interference limiting the sweet area at high frequencies. With half the order, $N = 5$, the lateral limitation becomes more pronounced, see Fig. 2b, and yields a low-pass behavior. When attenuating every odd loudspeaker as in Fig. 2c, the situation improves, thus an amplitude-based strategy appears promising, at least.

[Figure 2 panels: (a) L = 24, N = 11; (b) L = 24, N = 5; (c) L = 24, N = 5, with filters. Color scale: SPL in dB (0 to −20).]

Figure 2: Sound fields with full and reduced Ambisonics order and filtering that attenuates odd channels for $f = 1$ kHz, when panning with max-$r_\mathrm{E}$.
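A situation of this kind can be explored with a rough free-field simulation. The following Python sketch is only an illustration under several assumptions that are not taken from the figure: loudspeakers are modeled as point sources on a ring with a radius of 1.25 m, the speed of sound is 343 m/s, panning uses a sampling decoder with max-$r_\mathrm{E}$ weights $a_m = \cos\frac{m\pi}{2(N+1)}$, and the filters are replaced by fixed gains $\sqrt{2}$ (pass band) and $\sqrt{2 g^2}$ with $g^2 = 0.1$ (stop band) at the evaluated frequency. All function names are hypothetical.

```python
import cmath
import math

C = 343.0      # speed of sound in m/s (assumed)
RADIUS = 1.25  # ring radius in m (assumed)

def max_re_gains(order, src_angle, ls_angles):
    """2D Ambisonic panning gains with max-rE weights (sampling decoder)."""
    gains = []
    for phi in ls_angles:
        g = 1.0
        for m in range(1, order + 1):
            a_m = math.cos(m * math.pi / (2 * (order + 1)))
            g += 2.0 * a_m * math.cos(m * (phi - src_angle))
        gains.append(g / len(ls_angles))
    return gains

def pressure(gains, ls_angles, x, y, freq):
    """Complex pressure at (x, y) from free-field point sources on the ring."""
    k = 2.0 * math.pi * freq / C
    p = 0.0j
    for g, phi in zip(gains, ls_angles):
        dist = math.hypot(RADIUS * math.cos(phi) - x, RADIUS * math.sin(phi) - y)
        p += g * cmath.exp(-1j * k * dist) / dist
    return p

L = 24
# half-step offset: the source at 0 rad lies between the two frontal loudspeakers
ls_angles = [2.0 * math.pi * (l + 0.5) / L for l in range(L)]
# stand-in for the filters at one frequency: even channels pass, odd ones are attenuated
weights = [math.sqrt(2.0) if l % 2 == 0 else math.sqrt(0.2) for l in range(L)]

def level_db(gains, y, freq=1000.0):
    """Level at lateral offset y (in m) relative to the level at the center."""
    ref = abs(pressure(gains, ls_angles, 0.0, 0.0, freq))
    return 20.0 * math.log10(abs(pressure(gains, ls_angles, 0.0, y, freq)) / ref)

g11 = max_re_gains(11, 0.0, ls_angles)
g5 = max_re_gains(5, 0.0, ls_angles)
g5_filt = [w * g for w, g in zip(weights, g5)]

# columns: N = 11, N = 5, N = 5 with attenuated odd channels
for y in (0.0, 0.1, 0.2, 0.3):
    print(y, [round(level_db(g, y), 1) for g in (g11, g5, g5_filt)])
```

The printed levels depend on the assumed geometry and frequency, so they are meant for qualitative comparison of the three cases rather than as a reproduction of the plotted sound fields.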

3. FILTER DESIGN

The book by Schüßler [30, Sec. 2.8.2] describes a suitable filter design that we will exploit to obtain a pair of power-complementary amplitude-decorrelating filters. It employs the polynomial function

$$P_{Q,K}(x) = (1 - x)^K \, \frac{1}{(K-1)!} \, \frac{\mathrm{d}^{K-1}}{\mathrm{d}x^{K-1}} \sum_{n=0}^{Q-1} x^n \tag{1}$$

that features a $K$-fold zero at $x = 1$, and $P_{Q,K} - 1$ has a $(Q - K + 1)$-fold zero at $x = 0$. The substitution of the variable $x = \frac{1 - \cos\omega}{2}$ provides a zero-phase frequency response with $2K$ zeros at $\omega = \pi$ and a flatness of the degree $2(Q - K + 1)$ for unity gain at $\omega = 0$. We use it to define a magnitude-squared frequency response peaking at 2,

$$|H(\omega)|^2 = 2 \left[ g^2 + (1 - g^2) \, P_{Q,K}\!\left( \frac{1 - \cos\omega}{2} \right) \right], \tag{2}$$

with $g^2 = 0.1$ to adjust the pass-to-stop-band ratio $\frac{|H(0)|^2}{|H(\pi)|^2} = \frac{g^2 + (1 - g^2)}{g^2} = \frac{1}{g^2}$. Evaluated at the bins $\omega = \frac{2\pi}{N_\mathrm{FFT}} n$ of an $N_\mathrm{FFT}$-point DFT, the square root thereof can be made minimum-phase via the real-valued cepstrum. The resulting minimum-phase, time-domain impulse response $h[n]$ is finite and can be truncated to $Q + 1$ samples. We obtain two complementary comb filters by inserting $T - 1$ zeros between the samples of $h[n]$, and for $h_2[n]$ by additional modulation with $(-1)^n$,

$$h_1[nT] = h[n], \qquad h_2[nT] = (-1)^n \, h[n]. \tag{3}$$

The corresponding responses with the settings $K = 5$, $Q = 9$ that we will use in our study are shown in Fig. 3. (The algorithmic setting for the DFT was $N_\mathrm{FFT} = 64$; it is of minor importance as long as it is chosen larger than $Q$.) Owing to the narrow transition between the high and low bands, the frequency responses make destructive cancellation in the field rather unlikely, i.e. it can only occur in narrow bands of about $\pm\frac{\pi}{6}$ around $\omega T = (k + \frac{1}{2})\pi$. Alg. 1 lists the pseudo code.
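The properties stated for Eqs. (1) and (2) can be checked numerically. The Python sketch below carries out the derivative in Eq. (1) term by term, which gives $P_{Q,K}(x) = (1-x)^K \sum_{m=0}^{Q-K} \binom{m+K-1}{K-1} x^m$, and evaluates Eq. (2) for $K = 5$, $Q = 9$, $g^2 = 0.1$. The function names are hypothetical, and the half-band-modulated twin is assumed to correspond to a shift of $\omega$ by $\pi$.

```python
import math

def p_qk(x, Q, K):
    """P_{Q,K}(x) of Eq. (1), with the derivative carried out term by term."""
    series = sum(math.comb(m + K - 1, K - 1) * x**m for m in range(Q - K + 1))
    return (1.0 - x) ** K * series

def mag_squared(omega, Q=9, K=5, g2=0.1):
    """|H(omega)|^2 of Eq. (2); g2 stands for g^2."""
    x = (1.0 - math.cos(omega)) / 2.0
    return 2.0 * (g2 + (1.0 - g2) * p_qk(x, Q, K))

# pass-to-stop-band ratio, which should come out as 1 / g^2
print(mag_squared(0.0) / mag_squared(math.pi))

# summed power of the response and its twin shifted by pi
for omega in (0.0, 0.25 * math.pi, 0.5 * math.pi, 0.75 * math.pi):
    total = mag_squared(omega) + mag_squared(omega + math.pi)
    print(round(total, 6))
```

For $Q = 2K - 1$, the polynomial satisfies $P_{Q,K}(x) + P_{Q,K}(1 - x) = 1$, so the summed power should stay constant at $2(1 + g^2)$ for every frequency, which is the power-complementary property the filter pair relies on.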

[Figure 3 panels: (a) filter responses: magnitude |H_{1,2}(ω)| in dB and group delay in T samples over the frequency period 0 to 2π, and impulse responses h_{1,2}[nT]/h_{1,2}[0] over time in multiples of T samples; (b) block diagram: Ambisonic decoder with the filters h_1[n] and h_2[n] on its output channels.]

Figure 3: Magnitude response and group delay of the power-complementary comb-filter pair $H_{1,2}(\omega)$ over a frequency period $0 \le \omega T \le 2\pi$, its finite, minimum-phase impulse response $h_{1,2}[nT]$ normalized to $h[0]$, and block diagram for a rendering application with $L = 4(N + 1)$ loudspeakers.

4. PERCEPTUAL EVALUATION

Listening experiments were carried out in an anechoic room at the Institute of Electronic Music and Acoustics (IEM) and in a new semi-anechoic chamber at BMW. At IEM, 24 Genelec 8020 loudspeakers were equi-angularly arranged on a horizontal ring with a radius of 1.25 m, at ear height. At BMW, 30 Fohhn AT-09 loudspeakers were equi-angularly arranged at ear height. Participants were seated 20 cm left off-center at IEM and 50 cm off-center at BMW.

Algorithm 1 Power-complementary minimum-phase FIR filters (pseudo code w. Matlab functions)

function [h1, h2] = filterpair(K, T, g)            % generate power-complementary comb filters
    Q = 2*K - 1;                                   % calculate Q
    NFFT = 64;                                     % initialize constant
    p = ones(1, Q);                                % coefficients of sum_{n=0}^{Q-1} x^n
    for i = 1:K-1
        p = polyder(p);                            % derive K-1 times
    end
    p = p / factorial(K-1);                        % scale by 1/(K-1)!
    for i = 1:K
        p = conv(p, [-1, 1]);                      % multiply K times with (1 - x)
    end
    w = [0:NFFT/2, -NFFT/2+1:-1]' * 2*pi/NFFT;     % bins of an NFFT-pt DFT
    h = g + (1 - g) * polyval(p, (1 - cos(w))/2);  % magnitude-squared frequency resp. (g in the role of g^2 of Eq. (2))
    h = sqrt(2*h);                                 % scale by 2, evaluate zero-phase magnitude response
    h = ifft(h, 'symmetric');                      % evaluate real-valued zero-phase impulse response sequence
    [~, h] = rceps(h);                             % obtain minimum-phase impulse response sequence
    h1 = h(1:Q+1);                                 % store Q+1 non-zero samples in h1
    h2 = h1;                                       % and copy them to h2
    h2(2:2:end) = -h2(2:2:end);                    % complementary response by half-band modulation
    h1 = upsample(h1, T);                          % insert T-1 zeros between original samples
    h2 = upsample(h2, T);                          % insert T-1 zeros between original samples
end
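For readers without Matlab, the steps of Alg. 1 can be sketched in plain Python. This is a hedged re-implementation, not the code used in the study: it uses a naive DFT instead of an FFT, folds the real cepstrum by hand where Alg. 1 calls `rceps`, evaluates the polynomial in closed form, and takes `T` as an integer number of samples. All function names are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT, sufficient for the short lengths used here."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def minimum_phase(mag):
    """Minimum-phase impulse response whose DFT magnitude equals `mag`
    (real, positive, symmetric), obtained by folding the real cepstrum."""
    n = len(mag)
    cep = [c.real for c in dft([math.log(v) for v in mag], inverse=True)]
    folded = [0.0] * n
    folded[0] = cep[0]
    folded[n // 2] = cep[n // 2]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]  # move the anti-causal part to the causal side
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spectrum, inverse=True)]

def comb_pair(h, T):
    """Eq. (3): place h[n] at multiples of T, alternate the signs for h2."""
    h1 = [0.0] * (len(h) * T)
    h2 = [0.0] * (len(h) * T)
    for i, v in enumerate(h):
        h1[i * T] = v
        h2[i * T] = -v if i % 2 else v
    return h1, h2

def p_qk(x, Q, K):
    """Closed form of the polynomial in Eq. (1)."""
    series = sum(math.comb(m + K - 1, K - 1) * x**m for m in range(Q - K + 1))
    return (1.0 - x) ** K * series

K, Q, g2, nfft = 5, 9, 0.1, 64
mag = [math.sqrt(2.0 * (g2 + (1.0 - g2)
                        * p_qk((1.0 - math.cos(2.0 * math.pi * k / nfft)) / 2.0, Q, K)))
       for k in range(nfft)]
h_full = minimum_phase(mag)
h = h_full[:Q + 1]            # keep Q + 1 samples as in Alg. 1
h1, h2 = comb_pair(h, T=4)    # T = 4 samples is an arbitrary example value

# fraction of energy discarded by the truncation, for inspection
print(sum(v * v for v in h_full[Q + 1:]) / sum(v * v for v in h_full))
```

Before truncation, the DFT magnitude of `h_full` matches `mag` at the DFT bins by construction; the printed energy fraction indicates how much the truncation to $Q + 1$ samples removes.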

L / N / T | A1         | A2            | A3           | A4         | A5          | A6
I IEM     | 24 / 5 / − | 24 / 5 / 1.25 | 24 / 5 / 2.5 | 24 / 5 / 5 | 24 / 5 / 10 | 24 / 5 / 20
I BMW     | 32 / 7 / − | 32 / 7 / 1.25 | 32 / 7 / 2.5 | 32 / 7 / 5 | 32 / 7 / 10 | 32 / 7 / 20

(a) exp. 1 - variation of T = {−, 1.25, 2.5, 5, 10, 20} ms, pulses and static pink noise

L / N / T | B1          | B2         | B3         | B4
II IEM    | 24 / 11 / − | 24 / 5 / − | 12 / 5 / − | 24 / 5 / 2.5
II BMW    | 32 / 15 / − | 32 / 7 / − | 16 / 7 / − | 32 / 7 / 2.5
III IEM   | 12 / 5 / −  | 12 / 2 / − | 6 / 2 / −  | 12 / 2 / 2.5
III BMW   | 16 / 7 / −  | 16 / 3 / − | 8 / 3 / −  | 16 / 3 / 2.5

(b) exp. 2 (static pink noise) and exp. 3 (moving pink noise) - varying L / N, T = {−, 2.5} ms

Table 1: Conditions of the three experiments concerning loudspeaker number $L$, Ambisonic order $N$, and time constant $T$ of the filters in ms (− denotes inactive filtering).

The signals for the experiment were either pink noise or a sequence of Gauß pulses repeating every half second. These signals were rendered using 2D Ambisonic panning with max-$r_\mathrm{E}$ weights, using the orders $N = \{2, 5, 11\}$ for the subsets of $L = \{6, 12, 24\}$ loudspeakers at IEM, and the orders $N = \{3, 7, 15\}$ for the subsets of $L = \{8, 16, 32\}$ loudspeakers at BMW, either statically at 0° (frontal) or with a 4 s triangular motion trajectory between ±60°. Note that at BMW, the 2 rear channels of the 32 rendered channels were discarded, while the others were fed directly to the 30 available loudspeakers; this was done to accommodate the subsampling scheme by the factors 2 and 4. For an Ambisonic order $N$, it is known from [21] that the optimal loudspeaker number would be $L = 2(N + 1)$, and that taking twice as many, $L = 4(N + 1)$, will cause spectral impairment. With the perfectly power-complementary filters designed above and depicted in Fig. 3, we should be able to circumvent the spectral impairment with $L = 4(N + 1)$. Tab. 1 summarizes the condition sets of the experiments below. Twelve participants took part in the experiment (10 at IEM, 2 at BMW), and in the analysis below, results for the different experimental setups (IEM / BMW) were pooled. Listeners were familiarized with the pink-noise rendering according to the conditions B2 and B4 of the condition set II, Tab. 1b.

4.1. Experiment 1: optimal time constant

In the first experiment, we want to find the optimal time constant from the candidates $T = \{1.25, 2.5, 5, 10, 20\}$ ms when all available loudspeakers are driven with the sub-optimal Ambisonic order $N = 5$ (IEM) or $N = 7$ (BMW), respectively, see condition set I in Tab. 1a. While the largest time constant might be most effective, it might also destroy impulse fidelity. Therefore, the experiment involved a condition with a 41-sample Gauß pulse $s[n] = e^{-n^2/2}$ rendered frontally at 0° using condition set I in Tab. 1a to see which time constant still delivers reasonable impulse fidelity. To evaluate the desired effect of avoiding spectral impairment, a frontal pink-noise signal is rendered unfiltered and as a filtered version with the complementary filters $h_{1,2}[n]$ using the different time constants of condition set I. Participants were advised to lean to the right (the center) to check where the spectra of the different conditions are most similar, and then rate the loss of consistency heard at a normal head position. The participants were presented two MUSHRA-like [31] multi-stimulus comparison GUI screens to report their comparative ratings: one for the effectiveness of the conditions in avoiding spectral impairment, spatial changes, and phasiness, and the other one to rate the impulse fidelity. Each listener was presented a randomized order of the conditions A1...A6 to compare on screen, per task, and the two tasks were also presented in randomized order. Playback was looped, and conditions could be seamlessly switched and also re-arranged on screen according to the listener's own ratings.
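The trade-off between the candidate time constants can be made concrete with two derived quantities: the comb response repeats over $0 \le \omega T \le 2\pi$, i.e. with a frequency period of $1/T$, and the $Q + 1$ taps spaced $T$ apart span a duration of $Q \cdot T$. The short Python sketch below tabulates both for $Q = 9$; the function names are hypothetical.

```python
def comb_period_hz(t_ms: float) -> float:
    """Frequency period of the comb response in Hz: omega*T spans 2*pi over 1/T."""
    return 1000.0 / t_ms

def response_span_ms(t_ms: float, q: int = 9) -> float:
    """Time in ms between the first and last of the Q + 1 taps spaced T apart."""
    return q * t_ms

for t_ms in (1.25, 2.5, 5.0, 10.0, 20.0):
    print(t_ms, comb_period_hz(t_ms), response_span_ms(t_ms))
# 1.25 800.0 11.25
# 2.5 400.0 22.5
# 5.0 200.0 45.0
# 10.0 100.0 90.0
# 20.0 50.0 180.0
```

A larger $T$ alternates between the even and odd loudspeakers more finely over frequency, but spreads the impulse response over a longer time, which is the reason why impulse fidelity is rated separately.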

Results While the unfiltered condition and the filtered one with $T = 1.25$ ms did not produce any audible impairment of impulse fidelity, see Fig. 4a, condition A3 with $T = 2.5$ ms was the first one to produce a slight but distinct degradation of impulse fidelity for the Gauß impulse sequence. Each further increase was associated with a distinct and monotonic degradation of the perceived impulse fidelity. Concerning the effectiveness in avoiding noticeable, spatially varying spectral artifacts in pink noise, any time-constant setting of the filtered conditions A2...A6 was effective and was rated better than the unfiltered condition A1, see Fig. 4b. Apart from that, there was no clear distinction between the different time settings A2...A6 in the median ratings. From the informal reports of some listeners, we concluded that condition A2, despite its excellent impulse fidelity, exhibited a spectral change of the Gauß impulse signal. Correspondingly, we decided against A2 with $T = 1.25$ ms and for A3 with $T = 2.5$ ms as the final setting, accepting a slight degradation of impulse fidelity.

[Figure 4 panels: (a) impulse fidelity; (b) avoidance of spectral impairment. Axes: conditions A1 to A6 and ratings from 0 to 100 (impulse fidelity, spectral balance).]

Figure 4: Median ratings and their confidence intervals of the conditions with time constants $T = \{−, 1.25, 2.5, 5, 10, 20\}$ ms (A1...A6) reveal losses to be expected in impulse fidelity, versus the effectiveness in avoiding spectral impairment.

4.2. Experiment 2: effectiveness for loudspeaker subsets / Ambisonic orders (static)

With $T = 2.5$ ms from experiment 1, we conducted the next experiment for a static pink-noise sound source appearing at 0° (frontal). Conditions are listed in Tab. 1b. Condition sets II and III assume different maximum numbers of available loudspeakers, and within each set, a reduction to half the number is considered as condition B3. In either II or III, B1 is the consistent reference condition, B2 is the spectrally impaired condition, B3 is the properly sub-sampled condition, and B4 is the condition in which the proposed filters provide an alternating sub-sampling across frequencies. Listeners were presented two multi-stimulus comparison tasks with conditions B1...B4, one for each of II and III. They were instructed to again lean to the right (the center) to find a location where the spectra heard would be most similar, and then comparatively rate the change by which the spectra degrade at the given off-center listening position when comparing the conditions. The condition sets II and III were presented in random order, and their 4 conditions were also randomly arranged on screen, per listener. As before, listeners could seamlessly switch between the looped playback of the conditions and could have the conditions sorted by their current ratings.

[Figure 5 panels: (a) Experiment 2: Spectral balance; (b) Experiment 3: Continuity. Axes: conditions B1 to B4 and ratings from 0 to 100.]

Figure 5: Mean ratings and their confidence intervals for spectral balance and continuity for different loudspeaker-subset conditions without filtering and with filtering using $T = 2.5$ ms.

Results For statistical analysis of the responses, the condition sets II and III were pooled; the results are shown in Fig. 5a. They indicate that the condition B1 with a higher number of loudspeakers and a higher Ambisonic order is generally preferred; however, it was not rated significantly better than the proposed B4 ($p = 0.1975$). The spectrally impaired condition B2 is always rated worst ($p < 0.027$). In the comparison between fixed sub-sampling B3 and B4 with frequency-dependent sub-sampling by the proposed filters, B4 is significantly preferred in terms of spectral balance ($p = 0.040$).

4.3. Experiment 3: effectiveness for loudspeaker subsets / Ambisonic orders (moving)

For moving sounds, the renderer should not only accomplish a consistently rendered coloration, but also a consistently rendered distance and speed in the perceived motion trajectory. The third experiment evaluated this motion-rendering consistency using the conditions listed in Tab. 1b but with moving noise. Listeners were asked to rate the smoothness of the motion rendering of the Tab. 1b conditions B1...B4 in the condition sets II and III. Smoothness was defined as the absence of fluctuations in terms of spectral changes, distance changes, and the definition of the motion trajectory. They were given the hint to close their eyes if uncertain, and not to forget to check whether their ratings would hold after lateral head displacement. The conditions in either of the sets II and III comprised the hidden reference rendering condition B1, the spectrally impaired condition B2, the properly sub-sampled condition B3, and the condition B4 in which the proposed filters provide an alternating sub-sampling across frequencies.

Results The statistical analysis of the pooled responses for II and III in Fig. 5b shows that the sub-sampled condition B3 is preferred over the spectrally impaired condition B2 ($p = 0.015$), and that the frequency-dependent sub-sampling by the proposed filters, B4, outperforms all three other conditions ($p < 0.012$). The results for the spectrally impaired condition B2 are similar to the reference condition B1 ($p = 0.890$). Continuity even increases when activating fewer loudspeakers at reduced resolution (B3) compared to B1 ($p = 0.048$), which lies in a similar range as observed in the literature [32, Fig. 5.4] and might relate to modulation speed. Another cause may be the distance fluctuations that some listeners informally reported for the directionally well-resolved condition B1.

5. CONCLUSIONS

With spherical microphone array recordings, it is often necessary to subsample a given high-density loudspeaker playback facility and only use a part of the available loudspeakers in order to avoid spectral impairment. We could show that amplitude-based decorrelation filters that alternate between stopping and passing even and odd loudspeaker signals across frequencies can be a good alternative. With the frequency period defined by the time constant $T = 2.5$ ms, these filters will not only avoid spectral impairment with twice as many active loudspeakers as needed: there will even be a benefit in continuity when rendering in the anechoic environment. In particular, it appears that the maximum achievable directional resolution $180^\circ / (N + 1)$ as defined by the Ambisonic order $N$ does not result in the best playback strategy for motion trajectories with $L = 2(N + 1)$ loudspeakers. An $L = 4(N + 1)$ loudspeaker setup with the proposed alternating filtering delivered a smoother rendering in the anechoic environments of our experiments.

REFERENCES

[1] Sascha Spors, Hagen Wierstorf, Alexander Raake, Frank Melchior, Matthias Frank, and Franz Zotter. Spatial sound with loudspeakers and its perception. Proceedings of the IEEE, 101(9), 2013.
[2] Ville Pulkki. Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc., 45(6):456–466, 1997.
[3] Franz Zotter and Matthias Frank. Ambisonics - A practical 3D audio theory for recording, studio production, sound reinforcement, and virtual reality. SpringerOpen, 2019.
[4] Agustinus J. Berkhout, Diemer de Vries, and Peter Vogel. Acoustic control by wave field synthesis. J. Acoust. Soc. Am., 93(5), 1993.
[5] Jens Ahrens. Analytic Methods of Sound Field Synthesis. Springer, 2012.
[6] Hagen Wierstorf. Perceptual assessment of sound field synthesis. PhD thesis, TU Berlin, 2014.
[7] Gergely Firtha. A Generalized Wave Field Synthesis Theory. PhD thesis, MIT Budapest, 2019.
[8] Fiete Winter. Local Sound Field Synthesis. PhD thesis, University of Rostock, 2019.
[9] Pierre Grandjean, Alain Berry, and Philippe-Aubert Gauthier. Sound field reproduction by combination of circular and spherical higher-order ambisonics: Part I - a new 2.5-D driving function for circular arrays. J. Audio Eng. Soc., 69(3), 2021.
[10] Pierre Grandjean, Alain Berry, and Philippe-Aubert Gauthier. Sound field reproduction by combination of circular and spherical higher-order ambisonics: Part II - hybrid system. J. Audio Eng. Soc., 69(3), 2021.
[11] Matthieu Kuntz and Bernhard U. Seeber. Sound field synthesis: Simulation and evaluation of auralized interaural cues over an extended area. In Proc. Euronoise, Madeira, Portugal, October 2021.
[12] Matthieu Kuntz, Norbert Kolotzek, and Bernhard U. Seeber. Gemessener Schalldruckpegel im Lautsprecherarray für verschiedene Schallfeldsyntheseverfahren. In Fortschritte der Akustik, DAGA, Vienna, August 2021.
[13] Matthieu Kuntz, Norbert Kolotzek, and Bernhard U. Seeber. Investigating the smoothness of moving sources reproduced with panning methods. In Fortschritte der Akustik, DAGA, Stuttgart, March 2022.
[14] Benjamin Bernschütz, Arnau Vázquez Giner, Christoph Pörschmann, and Johannes M. Arend. Binaural reproduction of plane waves with reduced modal order. Acta Acustica u. Acustica, 100(5), 2014.
[15] Zamir Ben-Hur, Fabian Brinkmann, Jonathan Sheaffer, Stefan Weinzierl, and Boaz Rafaely. Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J. Acoust. Soc. Am., 141(6), 2017.
[16] Markus Zaunschirm, Christian Schörkhuber, and Robert Höldrich. Binaural rendering of ambisonic signals by head-related impulse response time alignment and a diffuseness constraint. J. Acoust. Soc. Am., 143(6):3616–3627, 2018.
[17] Christian Schörkhuber, Markus Zaunschirm, and Robert Höldrich. Binaural rendering of ambisonic signals via magnitude least squares. In Fortschritte der Akustik - DAGA, Munich, March 2018.
[18] Isaac Engel, Dan F. M. Goodman, and Lorenzo Picinali. Assessing HRTF preprocessing methods for ambisonics rendering through perceptual models. Acta Acustica, 6(4), 2022.
[19] Isaac Engel, Craig Henry, Sebastia V. Amengual Gari, Philip W. Robinson, and Lorenzo Picinali. Perceptual implications of different ambisonics-based methods for binaural reverberation. J. Audio Eng. Soc., 149(2), 2021.
[20] Jérôme Daniel. Représentation des champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia. PhD thesis, Université Paris 6, 2001.
[21] Audun Solvang. Spectral impairment of 2D higher-order ambisonics. J. Audio Eng. Soc., 56(4), 2008.
[22] Ronald M. Aarts and A. J. E. M. Janssen. On analytic design of loudspeaker arrays with uniform radiation characteristics. J. Acoust. Soc. Am., 107(1), 2000.
[23] Gary S. Kendall. The decorrelation of audio signals and its impact on spatial imagery. Computer Music Journal, 19(4), 1995.
[24] Franz Zotter, Matthias Frank, Georgios Marentakis, and Alois Sontacchi. Phantom source widening with deterministic frequency-dependent time delays. In Proc. of the DAFx-11, Paris, September 2011.
[25] Franz Zotter and Matthias Frank. Efficient phantom source widening. Archives of Acoustics, 38(1):27–37, 2013.
[26] Franz Zotter, Matthias Frank, Matthias Kronlachner, and Jung-Woo Choi. Efficient phantom source widening and diffuseness in ambisonics. In EAA Symposium on Auralization and Ambisonics, Berlin, March 2014.
[27] Franz Zotter and Matthias Frank. Phantom source widening by filtered sound objects. In prepr. 9728, 142nd AES Conv., Berlin, May 2017.
[28] Elliot Kermit Canfield-Dafilou and Jonathan S. Abel. A group delay-based method for signal decorrelation. In prepr. 9991, 144th AES Conv., Milan, Italy, May 2018.
[29] Matthias Blochberger, Franz Zotter, and Matthias Frank. Sweet area size for the envelopment of a recursive and a non-recursive diffuseness rendering approach. In Proc. ICSA, Ilmenau, September 2019.
[30] Hans W. Schüßler. Digitale Signalverarbeitung 2: Entwurf diskreter Systeme. Springer, 2010.
[31] ITU-R BS.1534-3. Method for the subjective assessment of intermediate quality level of audio systems. International Telecommunication Union, Tech. Rep., 2015.
[32] Matthias Frank. Phantom Sources using Multiple Loudspeakers in the Horizontal Plane. PhD thesis, University of Music and Performing Arts, Graz, 2013.