
Proceedings of the Institute of Acoustics

 

Virtual bass expansion for cross-talk cancellation systems

 

Daniel Wallace, Audioscenic, Southampton, England
Marcos Simon, Audioscenic, Southampton, England
Roberto Grilli, Audioscenic, Southampton, England
Davi Carvalho, Audioscenic, Southampton, England
Filippo Fazi, Audioscenic, Southampton, England
University of Southampton

 

1 INTRODUCTION

 

Cross-talk cancellation (CTC) allows loudspeaker systems to reproduce binaural audio. This enables immersive 3D audio that conventionally can only be consumed over headphones. Soundbars suited to CTC, as well as small speaker systems such as those found in laptops, are often restricted by size and cost, which leads to a reduced bass response because of their small drivers. CTC performance also diminishes with decreasing frequency, which leads to poor overall performance at low frequencies. Virtual bass expansion (VBE) systems have been developed to extend the bandwidth of such speakers by exploiting the ‘missing fundamental’ psychoacoustic effect. VBE has been used successfully in car audio systems as well as in phone and laptop speakers [1] [2]. Audio producers can also benefit from commercially available VBE plugins [3].

 

This paper describes the research and implementation of VBE algorithms to improve the low-frequency performance of a domestic CTC system. Performance is assessed with a subjective listening test using music and film audio, matching the intended use case of the system. The proposed method is evaluated against a reference and an anchor, and is also compared with a conventional bass enhancement method using equalization (EQ). Section 2 describes the CTC system used. Section 3 describes the chosen VBE method, and Section 4 explains its implementation with CTC. Sections 5 and 6 describe the subjective test and discuss the results. Conclusions are presented in Section 7.

 

2 CROSS-TALK CANCELLATION

 

Binaural audio is recorded using a dummy head with a microphone in each ear. The resulting signal more closely represents what the recording space would sound like to a listener who was physically present. Reproducing binaural audio therefore requires each ear to receive only its corresponding channel of the binaural signal. Headphones ensure this easily, but it can also be achieved with an array of loudspeakers. With loudspeakers, however, each channel reaches both ears, so playing binaural audio over loudspeakers creates cross-talk between the left channel and the right ear, and vice versa. A cross-talk cancellation system removes this cross-talk so that each ear receives only its intended binaural signal.

 

Figure 1, taken from the 2020 Audioscenic whitepaper [4], shows a block diagram of how this can be implemented. H is a 2x2 filter matrix designed to be the inverse of the transmission path matrix, which contains the direct and cross-talk paths from the loudspeakers to each ear. The resulting frequency response at each ear is maximally flat for the same-side channel and reduced in magnitude for the opposite-side channel. This process is very inefficient at lower frequencies. There, the difference between the direct and cross-talk path lengths is small compared to the wavelength, so the two paths are nearly identical and the transmission path matrix is close to singular. Inverting it then demands large loudspeaker outputs (high array effort). Thus, both the CTC algorithm and the small drivers used in a CTC soundbar are less efficient at lower frequencies. A VBE approach implemented in such systems could therefore increase the perceived bass while also improving low-frequency spatial perception.


 

Figure 1: Cross-talk cancellation scheme using filter matrix H [4]
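The low-frequency conditioning problem can be illustrated numerically. The sketch below is not Audioscenic's filter design. It assumes a free-field point-source model (each path is e^{-jkr}/r), a hypothetical soundbar geometry (drivers 0.2 m apart, listener 0.6 m away) and an unregularized inverse H = C^{-1}. Under these assumptions, both the condition number of the 2x2 transmission path matrix C and the norm of H (a proxy for array effort) grow as frequency falls.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

# Hypothetical geometry in metres: x runs along the soundbar, y towards the listener.
SPEAKERS = [(-0.1, 0.0), (0.1, 0.0)]      # left and right drivers
EARS = [(-0.0875, 0.6), (0.0875, 0.6)]    # left and right ears, listener 0.6 m away


def plant_matrix(freq_hz, speakers=SPEAKERS, ears=EARS):
    """Free-field point-source model: C[i, j] is the path from speaker j to ear i."""
    k = 2 * np.pi * freq_hz / SPEED_OF_SOUND
    C = np.empty((2, 2), dtype=complex)
    for i, (ex, ey) in enumerate(ears):
        for j, (sx, sy) in enumerate(speakers):
            r = np.hypot(ex - sx, ey - sy)
            C[i, j] = np.exp(-1j * k * r) / r
    return C


def ctc_filters(C):
    """Unregularized CTC filter matrix H = C^-1, so that C @ H is the identity."""
    return np.linalg.inv(C)


for f in (100.0, 500.0, 2000.0):
    C = plant_matrix(f)
    H = ctc_filters(C)
    # cond(C) and ||H|| (a proxy for array effort) both grow as frequency falls.
    print(f"{f:6.0f} Hz  cond(C) = {np.linalg.cond(C):5.1f}  ||H|| = {np.linalg.norm(H):5.2f}")
```

Practical CTC filters are usually regularized to cap this effort. That step is left out here so that the ill-conditioning stays visible.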

 

3 VIRTUAL BASS EXPANSION

 

Virtual bass expansion (VBE) algorithms generate harmonics from the low-frequency components of an audio signal, which improves the bass impression through the ‘missing fundamental’ psychoacoustic effect. This effect describes how a series of harmonics is perceived as having the same pitch as its fundamental tone, even when the fundamental itself is absent from the signal. The algorithm processes only frequency components below a crossover frequency, below which the speaker drivers become inefficient. This effectively overcomes the low-frequency limit of small speakers by ‘translating’ low frequencies into higher harmonics that the speaker can reproduce more efficiently. When properly implemented, the algorithm can extend the perceived low-frequency range of a speaker by up to around 1.5 octaves [5].
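The following is a minimal illustration of the missing fundamental. It assumes a 50 Hz fundamental rendered only through its 2nd to 5th harmonics. Autocorrelation serves here as a crude stand-in for periodicity pitch; this is an assumption made for illustration, not a model of the auditory system. The spectrum has no energy at 50 Hz, yet the waveform still repeats every 20 ms.

```python
import numpy as np


def missing_fundamental_tone(f0, harmonics, fs=8000, dur=1.0):
    """Sum of equal-amplitude cosine harmonics of f0, with no component at f0 itself."""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.cos(2 * np.pi * h * f0 * t) for h in harmonics)


def autocorr_pitch(x, fs, fmin=30.0, fmax=400.0):
    """Periodicity estimate: strongest autocorrelation peak between lags 1/fmax and 1/fmin."""
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)              # zero-pad so the autocorrelation is linear, not circular
    r = np.fft.irfft(np.abs(spec) ** 2)[:n]   # autocorrelation for lags 0..n-1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return fs / lag


x = missing_fundamental_tone(50.0, harmonics=[2, 3, 4, 5])
spectrum = np.abs(np.fft.rfft(x))             # 1 s at 8 kHz, so bin k is k Hz
print("magnitude at 50 Hz:", spectrum[50])     # numerically zero
print("periodicity pitch:", autocorr_pitch(x, 8000))  # 50.0
```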

 

Harmonics can be generated using either a time-domain or a frequency-domain method. The former uses a non-linear device (NLD) to generate controlled distortion in the form of overtones (harmonics). The frequency-domain method uses a phase vocoder (PV), which applies a short-time Fourier transform to locate the fundamental frequency and then generates its harmonics [6]. NLDs have been shown to be better suited to transient signals and PVs to steady-state signals. This has led to hybrid VBE methods that use both ways of generating harmonics [7] [8]. Because of its simplicity and speed, the NLD approach was chosen as more suitable for a real-time application with a domestic CTC system. Most NLDs are level dependent: the generated harmonics are relatively stronger at higher input levels, which makes the ‘missing fundamental’ effect more pronounced. Low input levels are better suited to conventional bass boosting with equalization (EQ).
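The level dependence of an NLD can be seen with a simple memoryless polynomial. The sketch below uses a hypothetical NLD, y = x + 0.3x² + 0.2x³ (not the NLD used in this work), and measures the harmonic amplitudes produced by a 100 Hz sine at different input levels. For a sine of amplitude A, the 2nd harmonic grows as 0.15A² and the 3rd as 0.05A³, while the fundamental grows roughly as A. The harmonics therefore become relatively stronger as the level rises.

```python
import numpy as np


def poly_nld(x):
    """Hypothetical memoryless NLD: y = x + 0.3 x^2 + 0.2 x^3."""
    return x + 0.3 * x**2 + 0.2 * x**3


def harmonic_amplitudes(nld, amp, f0=100.0, fs=8000, n_harm=3):
    """Drive the NLD with a 1 s sine of amplitude amp; return amplitudes of harmonics 1..n_harm."""
    t = np.arange(fs) / fs                    # 1 s, so rfft bin k sits at k Hz
    y = nld(amp * np.sin(2 * np.pi * f0 * t))
    spec = np.abs(np.fft.rfft(y)) * 2 / len(y)  # single-sided amplitude spectrum
    return np.array([spec[int(round(k * f0))] for k in range(1, n_harm + 1)])


for amp in (0.1, 0.5, 1.0):
    h = harmonic_amplitudes(poly_nld, amp)
    print(f"amp={amp:.1f}  2nd/1st={h[1] / h[0]:.3f}  3rd/1st={h[2] / h[0]:.3f}")
```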

 

Overall, audio quality should not be significantly compromised. However, intermodulation distortion (IMD) can occur when multiple tones pass through an NLD. This distortion is usually inaudible, but its audibility depends on the level of the tones and how they are harmonically related. Whether multiple tones are present at the NLD input is determined by the content of the input signal and the input bandwidth of the NLD. Placing a filter bank before the NLD can help reduce IMD, but this requires additional computation and is not used in this paper [5]. Finally, because the processed signal consists of harmonics at least one octave above the original components, the CTC might be expected to remain equally effective for content one octave lower in frequency.
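The following sketch shows how IMD arises. It assumes a hypothetical square-law NLD and two bass tones at 60 Hz and 85 Hz. Because 2cos(a)cos(b) = cos(a - b) + cos(a + b), the output contains difference (25 Hz) and sum (145 Hz) products alongside the 2nd harmonics at 120 Hz and 170 Hz. Neither product is a harmonic of either input tone.

```python
import numpy as np

FS = 8000  # Hz; with a 1 s signal, each rfft bin k corresponds to k Hz


def tone_amplitudes(y, freqs_hz):
    """Amplitude of y at the given integer frequencies (1 s signal sampled at FS)."""
    spec = np.abs(np.fft.rfft(y)) * 2 / len(y)
    return {f: float(spec[f]) for f in freqs_hz}


t = np.arange(FS) / FS
x = np.cos(2 * np.pi * 60 * t) + np.cos(2 * np.pi * 85 * t)  # two non-harmonically related tones

# Hypothetical square-law NLD: x^2 = 1 + 0.5cos(2a) + 0.5cos(2b) + cos(a - b) + cos(a + b)
y = x ** 2
print(tone_amplitudes(y, [25, 120, 145, 170]))
```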

 

4 IMPLEMENTATION

 

Figure 2 shows a basic signal flow diagram for an NLD-based approach. The output of the VBE is fed directly into the input of the CTC filters. The crossover filter splits the input so that only the low frequencies are processed. Frequencies above the crossover frequency, which the loudspeaker can reproduce efficiently, are passed through unprocessed.

 

Figure 2: Virtual bass framework
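The signal flow of Figure 2 can be sketched offline as follows. This is an illustration under stated assumptions, not the real-time implementation. Ideal FFT brick-wall filters stand in for the crossover and band-pass filters, the polynomial NLD is hypothetical, and the parameter names (`f_cross`, `f_bp_high`, `gain`) and their values are assumptions. The 150 Hz crossover is chosen to match the frequency below which the VBE signal in the listening test contained no content.

```python
import numpy as np

FS = 8000  # Hz, sample rate assumed for the sketch


def band(x, lo=None, hi=None, fs=FS):
    """Ideal (brick-wall) FFT filter keeping lo <= f < hi; an offline illustration only."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    keep = np.ones(f.shape, dtype=bool)
    if lo is not None:
        keep &= f >= lo
    if hi is not None:
        keep &= f < hi
    return np.fft.irfft(spec * keep, n=len(x))


def nld(x):
    """Hypothetical polynomial NLD producing harmonics up to the 5th."""
    return x + 0.5 * x**2 + 0.3 * x**3 + 0.2 * x**4 + 0.1 * x**5


def vbe(x, f_cross=150.0, f_bp_high=600.0, gain=1.0):
    """Crossover -> NLD on the low band -> band-pass -> sum with the untouched high band."""
    low = band(x, hi=f_cross)
    high = band(x, lo=f_cross)                            # passed through unprocessed
    harmonics = band(nld(low), lo=f_cross, hi=f_bp_high)  # high-pass flank at the crossover
    return high + gain * harmonics                        # this sum would feed the CTC filters


t = np.arange(FS) / FS
x = 0.8 * np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 1000 * t)
amps = np.abs(np.fft.rfft(vbe(x))) * 2 / FS
print({f: round(float(amps[f]), 4) for f in (50, 150, 200, 250, 1000)})
```

With a 50 Hz tone plus a 1 kHz tone at the input, the output contains no 50 Hz component and the 1 kHz tone passes unchanged. The bass is carried by the 3rd to 5th harmonics (150, 200 and 250 Hz).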

 

A band-pass filter after the NLD shapes the generated harmonics and reduces potential intermodulation distortion. The cutoff frequency of its high-pass flank is equal to the crossover frequency. This protects the speaker from high-level bass, so higher overall input levels can drive the speaker without risking low-frequency distortion. The order and cutoff frequency of the low-pass flank are chosen to shape the spectral envelope of the harmonics so that it sounds as natural as possible. If the higher-order harmonics are too strong, the result sounds harsh, with an unnatural buzzing.
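The effect of the band-pass flanks on the generated harmonics can be estimated from analog Butterworth magnitude responses. In this sketch, which is an assumption made for illustration, the high-pass flank is a 4th-order response at a 150 Hz crossover. The low-pass cutoff (400 Hz) and its order are hypothetical tuning parameters. Raising the low-pass order attenuates the upper harmonics more steeply, which is one way to control the harshness described above.

```python
import numpy as np


def butter_mag(f, fc, order, kind):
    """Magnitude of an analog Butterworth low- or high-pass response at frequency f."""
    ratio = f / fc if kind == "low" else fc / f
    return 1.0 / np.sqrt(1.0 + ratio ** (2 * order))


def harmonic_weights(f0, n_harm, f_cross=150.0, f_lp=400.0, hp_order=4, lp_order=2):
    """Band-pass gain (high-pass flank at the crossover, tunable low-pass flank)
    applied to harmonics 2..n_harm of a bass fundamental f0."""
    f = f0 * np.arange(2, n_harm + 1, dtype=float)
    return butter_mag(f, f_cross, hp_order, "high") * butter_mag(f, f_lp, lp_order, "low")


for lp_order in (2, 6):
    w = harmonic_weights(50.0, 10, lp_order=lp_order)
    print(lp_order, np.round(20 * np.log10(w), 1))  # dB gain for harmonics 2..10 of 50 Hz
```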

 

The choice of NLD determines whether even and/or odd harmonics are generated, and their relative strengths. Common NLDs include the full-wave and half-wave rectifiers, the full-wave integrator and the clipper [9] [5]. NLDs that produce only even harmonics cause a pitch-doubling effect, because the even harmonic series of a fundamental is identical to the full harmonic series of a fundamental at twice the frequency. Odd harmonics are therefore preferred, as they preserve an unambiguous fundamental. The chosen NLD is the Arc-Tangent Square Root (ATSR) function proposed in [10], defined as

 

$$
y(x) = a \arctan(\beta x) + \psi \sqrt{1 - (\zeta x)^2} - \psi
$$
where the parameters 𝑎, 𝛽, 𝜓 and 𝜁 can be adjusted to change the harmonic strengths. The function combines odd and even nonlinear functions to generate all harmonics, which is closer to natural harmonic generation. In [11], this NLD was compared against nine other NLDs and was found to be the strongest of the suitable harmonic generators. Further research comparing 13 NLDs showed that ATSR gives a strong bass impression with minimal distortion while generating all harmonics [12]. ATSR is implemented as an NLD using a Taylor series polynomial expansion. The polynomial approximation gives greater control over the harmonic strengths, since its coefficients can be tuned directly. As shown in [10], the polynomial coefficients are calculated using

 

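$$
y(x) \approx \sum_{n=0}^{\lfloor (N-1)/2 \rfloor} \frac{(-1)^n a \beta^{2n+1}}{2n+1} x^{2n+1} + \psi \sum_{n=0}^{\lfloor N/2 \rfloor} \binom{1/2}{n} (-1)^n \zeta^{2n} x^{2n} - \psi
$$

where 𝑁 is the polynomial order and $\binom{1/2}{n}$ is the generalized binomial coefficient.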

The first and second terms generate odd and even harmonics respectively. These are summed, and the constant 𝜓 is subtracted. To ensure convergence, we set 𝛽 = 0.9 and 𝜁 = 0.9. Parameters 𝑎 and 𝜓 can be varied to control the strengths of the odd and even harmonics respectively. The decay rate between successive harmonics is independent of 𝑎 and 𝜓, so the spectral envelope, and with it the perceived timbre, can be preserved while the harmonic strengths are adjusted. A polynomial order of around 10 is suitable.
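The Taylor expansion can be sketched as follows. This sketch assumes the ATSR takes the form y = a·arctan(βx) + ψ·√(1 - (ζx)²) - ψ, which is consistent with the description above. The defaults a = ψ = 2.5 are hypothetical, while β = ζ = 0.9 and an order of 10 follow the text. The odd coefficients come from the arctan series and the even coefficients from the square-root series. Scaling 𝑎 or 𝜓 therefore rescales only the odd or only the even coefficients, leaving the decay between successive coefficients unchanged.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def atsr(x, a=2.5, beta=0.9, psi=2.5, zeta=0.9):
    """Closed-form ATSR, assuming y = a*atan(beta*x) + psi*sqrt(1 - (zeta*x)^2) - psi."""
    return a * np.arctan(beta * x) + psi * np.sqrt(1.0 - (zeta * x) ** 2) - psi


def atsr_coefficients(a=2.5, beta=0.9, psi=2.5, zeta=0.9, order=10):
    """Taylor coefficients c[k] of x**k (k = 0..order) for the assumed ATSR form."""
    c = np.zeros(order + 1)
    binom_half = 1.0  # generalized binomial coefficient C(1/2, n), starting at n = 0
    for n in range(order // 2 + 1):
        if 2 * n + 1 <= order:
            # arctan(u) = sum (-1)^n u^(2n+1) / (2n+1): odd powers -> odd harmonics
            c[2 * n + 1] = a * (-1) ** n * beta ** (2 * n + 1) / (2 * n + 1)
        if n > 0:
            binom_half *= (0.5 - (n - 1)) / n
            # sqrt(1 - u^2) = sum C(1/2, n) (-u^2)^n: even powers -> even harmonics
            c[2 * n] = psi * binom_half * (-1) ** n * zeta ** (2 * n)
    # c[0] stays 0: the constant psi of the square-root series cancels the "- psi" term
    return c


x = np.linspace(-1.0, 1.0, 201)
c = atsr_coefficients(order=10)
print("max |poly - closed form| on [-1, 1]:", np.max(np.abs(P.polyval(x, c) - atsr(x))))
```

The truncated polynomial matches the closed form closely for small inputs. The error grows towards |x| = 1, where the omitted terms are largest.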

 

5 SUBJECTIVE LISTENING TEST

 

The performance of the proposed method was assessed subjectively through a listening test. The VBE was tuned to the CTC soundbar used, an Audioscenic prototype with 2-inch drivers.

 

Each participant sat facing the CTC system and listened to four versions of the same audio sample: a reference, an anchor, a bass-boosted signal and a VBE signal. The reference was the unprocessed signal, and listeners scored the other three signals relative to it on the 7-grade bipolar comparison scale shown in Figure 3. The bass-boosted signal had a gain of +6 dB applied below 150 Hz using a graphic EQ. The anchor was passed through a 4th-order high-pass filter. The VBE signal, produced by the proposed algorithm, likewise contains no frequencies below the cutoff. The experiment was repeated three times with different audio. The stimuli were two excerpts of bass-heavy music and one from the opening scene of Mad Max: Fury Road (2015). The samples were chosen to have sufficient and varied low-frequency content to ensure a comprehensive test. They were normalized to an RMS level of -6 dBFS, which ensured that the CTC system used equal array effort for each sample and allowed a fair comparison between the processed signals. Only the reference sample was identified to the listener. Listeners could play each sample as many times as needed for scoring and were given no time limit to complete the test.
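The RMS normalization step can be sketched as follows. The paper does not state its dBFS convention, so this sketch assumes RMS is measured relative to a full-scale amplitude of 1.0. Under the AES17 convention, where a full-scale sine reads 0 dBFS, the result would differ by about 3 dB. For reference, the +6 dB bass boost corresponds to an amplitude factor of 10^(6/20) ≈ 2.0.

```python
import numpy as np


def rms_dbfs(x):
    """RMS level in dB relative to a full-scale amplitude of 1.0 (assumed convention)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))


def normalize_rms(x, target_dbfs=-6.0):
    """Scale x so that its RMS level equals target_dbfs."""
    return x * 10 ** ((target_dbfs - rms_dbfs(x)) / 20)


rng = np.random.default_rng(0)
excerpt = 0.05 * rng.standard_normal(48000)             # placeholder for an audio excerpt
print(round(float(rms_dbfs(normalize_rms(excerpt))), 2))  # -6.0
```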

 

Listeners were asked to grade two attributes of the samples relative to the reference: bass impression and spatial immersion. Bass impression refers to how strongly the listener perceives low-frequency sound. Spatial immersion refers to the perceived 3D binaural performance provided by the CTC. A total of six listeners, both expert and non-expert, were tested.


 

Figure 3: 7-grade bipolar comparison scale

 

6 RESULTS


 

Figure 4: Radar Chart of Results

 

Figure 4 shows radar charts of the averaged scores obtained in the subjective test. The bass impression results show that VBE scored comparably to or higher than the +6 dB condition. Most listeners correctly identified the anchor. Scores for the film stimulus were least affected by processing, whereas both songs scored much higher with VBE.

 

Spatial immersion appeared to be affected by the low-frequency content of the test signals: the +6 dB condition scored much lower than both the anchor and VBE, neither of which contained frequency components below 150 Hz.

 

Overall, the effectiveness of VBE appears to depend on the content of the input signal, and it may be difficult to find a ‘one size fits all’ VBE tuning for a given system that suits every audio signal. The anchor could have been made mono so that it served as a spatial anchor as well as a spectral one. This might have helped listeners identify the spatial qualities of the audio samples and given results more indicative of spatial immersion.

 

7 CONCLUSION

 

In this paper, a virtual bass expansion algorithm has been implemented in a CTC system and tested subjectively using music and film. It has been shown that, for equal array effort, VBE can be expected to give a higher bass impression than conventional bass boosting. This offers a way to improve the low-frequency performance of CTC systems that use small speakers while retaining a high bass impression and spatial immersion. This is a preliminary study of the interaction between VBE and CTC, and more extensive research and experimental work could be carried out in the future.

 

8 ACKNOWLEDGEMENTS

 

The author thanks Audioscenic Limited for funding the internship that developed into this work.

 

9 REFERENCES

 

  1. Subaru, “Subaru Crosstrek Owners Manual: Sound settings.” [Online]. Available: https://www.sucross.com/sound_settings-347.html. [Accessed 13 October 2022].

  2. The Verge, “Dirac Bass will trick you into perceiving deeper bass from your phone’s speakers,” 2018. [Online]. Available: https://www.theverge.com/2018/12/18/18144289/dirac-bass-smartphone-audio-speakers-ces-2019. [Accessed 13 October 2022].

  3. D. Ben-Tzur and M. Colloms, “The Effect of the MaxxBass Psychoacoustic Bass Enhancement System on Loudspeaker Design,” 1999.

  4. Audioscenic, AS-WP-001: Audioscenic Virtua Technology – The New Sound Dimension, 2020.

  5. E. Larsen and R. M. Aarts, Audio Bandwidth Extension: Application of Psychoacoustics, Signal Processing and Loudspeaker Design, 2004.

  6. M. Bai, “Synthesis and implementation of virtual bass system with a phase-vocoder approach,” Journal of the Audio Engineering Society, vol. 54, no. 11, 2006.

  7. E.-L. T. and W.-S. Gan, “A psychoacoustic bass enhancement system with improved transient and steady-state performance,” 2012.

  8. M. J. Hawksford and A. J. Hill, “A hybrid virtual bass system for optimized steady-state and transient performance,” 2010.

  9. B. T. Gerstle, “Tunable virtual bass enhancement,” 2009.

  10. W.-S. Gan, S. F. Wong, and L. Nay Oo, “Generalized harmonic analysis of Arc-Tangent Square Root (ATSR) nonlinear device for virtual bass system,” January 2010.

  11. W.-S. Gan and Nay Oo, “Analytical and perceptual evaluation of nonlinear devices for virtual bass system,” May 2010.

  12. M. J. Hawksford and Nay Oo, “Perceptually-motivated objective grading of nonlinear processing in virtual bass systems,” Journal of the Audio Engineering Society, vol. 59, no. 11, 2011.