Proceedings of the Institute of Acoustics, Volume 43, Part 2

Multi-domain analyses of acoustic room correction systems

P. Thakkar, GP Acoustics (UK), Maidstone, Kent, UK

1 INTRODUCTION

At low frequencies, where the wavelengths of sound are comparable to the physical dimensions of the enclosing room, room modes cause several audible artefacts through pressure irregularities and extended decay times at specific frequencies. While careful selection of room dimensions and layouts helps minimise their audibility at some listening positions[1]–[3], multiple room modes may still be excited simultaneously to varying degrees. Common acoustic treatments, such as absorbers and diffusors, are often impractical in this frequency range because of their size and limited operating range. As a result, acoustic technologies such as room equalisation and room correction are popular choices for mitigating the detrimental effects of low-frequency room modes in professional and domestic spaces alike. In this paper, three objective performance metrics are derived to contrast the performance of the two types of technologies in rectangular rooms across the time, frequency, and spatial domains.

2 IDEAL LOW-FREQUENCY ROOM PERFORMANCE METRICS

2.1 Steady-state Performance and Steady-state Deviation (SSD)

Room modes manifest themselves as severe resonances (peaks) and anti-resonances (notches) in the in-room transfer functions of the source loudspeakers. The variations can be as much as 25 dB, resulting in significant colouration at the listening positions. The steady-state deviation metric (SSD) in equation 1 is the deviation of the RMS steady-state pressure response, over multiple listening seats, from a flat line coinciding with the mean pressure level within the analysis window of 20–100 Hz; lower values indicate flatter responses in the range of interest.
The assumption is that a flat-line response with no peaks or dips represents a room transfer function with no room modes or associated audible artefacts.

Figure 1: In-room and anechoic response of the KEF LS50 Meta loudspeaker (1m on-axis)

$$\mathrm{SSD} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(H_{rms}(k) - H_{average}\right)^{2}} \tag{1}$$

where $H_{rms}(k)$ is the RMS average SPL of the pressure response across all seating positions at frequency bin $k$, $H_{average}$ is the average SPL, and $N$ is the total number of frequency bins between 20 and 100 Hz. The RMS average SPL, $H_{rms}$, is given by

$$H_{rms}(k) = 20\log_{10}\sqrt{\frac{1}{M}\sum_{i=1}^{M}\left|p_i(k)\right|^{2}} \tag{2}$$

where $p_i(k)$ is the pressure response at the $i$th seating position and $k$th frequency bin, and $M$ is the total number of seating positions. The RMS average is used rather than the complex average, as the latter would imply superposition of the room transfer functions, which is not an accurate representation of the perceived room acoustic behaviour. Finally, the flat-line average level is calculated as

$$H_{average} = \frac{1}{N}\sum_{k=1}^{N} H_{rms}(k) \tag{3}$$

2.2 Time Domain Performance and Modal Decay Time (MT60)

Room modes cause energy to build up between reflecting surfaces. This phenomenon, combined with the ineffective damping of low-frequency sound energy offered by common room treatments and furnishings, causes the stored energy to decay at a significantly slower rate at mode frequencies than at nearby frequencies. Figure 2 shows an example of this behaviour. In terms of subjective impact, the extended decay times alter the temporal response of a loudspeaker in a room. For instance, short bass notes, such as those of a kick drum or a bass guitar, may ring on at one or more closely spaced mode frequencies. Furthermore, pitch shifting may occur when a slowly decaying mode replaces a rapidly decaying mode at a nearby frequency, causing the perceived pitch to differ from the original. Therefore, the ideal low-frequency transient response of a loudspeaker-room coupled system should feature a consistent decay across all frequencies in the modal region.
Modal decay time (MT60) is analogous to the reverberation time (T60) used to characterise diffuse sound fields. In the low-frequency region, energy decay is dominated by individual room modes around their centre frequencies. MT60 measures the time taken for the sound pressure level of a single mode to decay by 60 dB from its initial value.

Figure 2: Waterfall plot of a subwoofer in-room. Note the prolonged ringing at mode frequencies

The typical approach for estimating MT60 involves calculating a Short-Time Fourier Transform (STFT) of the room impulse response to obtain exponential decay envelopes at the individual mode frequencies. Schroeder-integrated energy-time curves are then obtained from the squared decay envelopes. Finally, the MT60 times are derived from the slope of the energy-time decay curves. Schroeder backward integration for a time response $h(t)$ is given by

$$E(t) = \int_{t}^{\infty} h^{2}(\tau)\,d\tau \tag{4}$$

where $h(t)$ is the exponential decay envelope. The energy-time decay curve $ETC(t)$ for $E(t)$ is given by

$$ETC(t) = 10\log_{10} E(t) \tag{5}$$

The slope or decay rate $m$ of the energy-time curve $ETC(t)$ is obtained from a least-squares line fit (linear regression) given by

$$(m, c) = \arg\min_{m,\,c} \int_{t_1}^{t_2}\left[ETC(t) - (m t + c)\right]^{2} dt \tag{6}$$

where $m$ is the optimal decay rate, $c$ is the y-intercept of the linear regression line and $[t_1, t_2]$ is the numerically determined fitting interval. Finally, the MT60 is calculated as

$$MT60 = -\frac{60}{m} \tag{7}$$

Several practical limitations of the line fitting approach described in equation 6 arise from the fitting interval $[t_1, t_2]$. In practice, the regression line should only be fitted to the exponential decay portion (a straight line on a log scale) of the energy-time envelope. Therefore, the initial onset at the beginning of the envelope and the noise-floor bias at its end should be excluded from the fitting interval. However, where the useful dynamic range of the measured data is severely limited, alternative fitting approaches such as non-linear decay-plus-noise models or a dedicated noise compensation stage before ETC calculation may be more appropriate[4]–[6].
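The estimation chain described above (Schroeder backward integration of a decay envelope, followed by a least-squares line fit) can be sketched in a few lines. This is only an illustrative sketch, not the implementation used in the paper: the synthetic single-mode envelope, the relation between Q factor and amplitude decay (exp(-pi * f_n * t / Q), the standard result for a second-order resonance), the sampling rate and the -5 dB to -35 dB fitting limits are all assumptions made for the example.

```python
import numpy as np

def schroeder_etc_db(envelope, dt):
    """Backward-integrate the squared envelope and return the decay curve in dB."""
    # E(t): integral of h^2 from t to the end of the record (reverse cumulative sum).
    energy = np.cumsum(envelope[::-1] ** 2)[::-1] * dt
    # Normalised so that the curve starts at 0 dB; this only shifts the intercept.
    return 10.0 * np.log10(energy / energy[0])

def mt60_from_etc(t, etc_db, upper_db=-5.0, lower_db=-35.0):
    """Fit a line between two levels (assumed limits) and extrapolate to 60 dB of decay."""
    fit = (etc_db <= upper_db) & (etc_db >= lower_db)
    slope, _intercept = np.polyfit(t[fit], etc_db[fit], 1)  # slope in dB per second
    return -60.0 / slope

# Synthetic envelope of one mode (assumed values: 24.5 Hz, Q = 20, noise-free).
fs = 1000.0
t = np.arange(0.0, 4.0, 1.0 / fs)
f_n, q_n = 24.5, 20.0
delta = np.pi * f_n / q_n                 # amplitude decay constant in 1/s
envelope = np.exp(-delta * t)

mt60 = mt60_from_etc(t, schroeder_etc_db(envelope, 1.0 / fs))
expected = np.log(1e6) / (2.0 * delta)    # analytic 60 dB decay time of the energy
print(f"fitted MT60 = {mt60:.3f} s, analytic = {expected:.3f} s")
```

On noise-free synthetic data the fit recovers the analytic decay time closely; with measured impulse responses the result depends on the STFT settings and on how the fitting interval is chosen.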
Additionally, the stationarity assumed by the FFT within each STFT analysis window may distort the true decay envelope of the mode peaks to some extent[7]. Nevertheless, the proposed method is sufficiently reliable for relative comparisons, such as the ones presented in this paper.

2.3 Spatial Domain Performance and Mean Seat-to-Seat Variation (MSSV)

Room mode transfer functions vary strongly with source and listener positions throughout the room. Even small head movements can produce significant variations. Whilst it is reasonable to assume that the source positions remain fixed in a listening setup, no such assumption can be made about listener positions. For instance, in a group listening environment such as a home cinema or listening room, listeners less than a metre apart may experience vastly different low-frequency performance from the same source loudspeakers. Therefore, to ensure that all listeners within a specific listening area perceive the same low-frequency performance, seat-to-seat pressure variations should be minimised as much as possible. Mean seat-to-seat variation (MSSV) is the average variance of the SPLs observed across a fixed number of seating positions within a listening area, given by

$$\mathrm{MSSV} = \frac{1}{N}\sum_{k=1}^{N}\frac{1}{M}\sum_{i=1}^{M}\left(\left|H(i,k)\right| - H_{average}(k)\right)^{2} \tag{8}$$

where $|H(i,k)|$ is the SPL at the $i$th seat at frequency bin $k$, $H_{average}(k)$ is the average SPL across all $M$ receivers at frequency bin $k$, and $N$ and $M$ are the total numbers of frequency bins and receivers, respectively. Note the difference from equation 1, where $H_{average}$ is a single value calculated over the entire RMS pressure response, whereas $H_{average}(k)$ is an amplitude average per frequency bin. This metric is also known as the mean spatial variation (MSV)[8].
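To make the distinction between the two seat-based metrics concrete, the sketch below computes SSD and MSSV from an array of complex seat pressures. It is a minimal illustration under stated assumptions: the function names are hypothetical, the variance is normalised by the number of seats (a population variance), and levels are in dB relative to an arbitrary reference, which cancels in both metrics.

```python
import numpy as np

def rms_average_spl(p):
    """RMS-average SPL per frequency bin; p has shape (seats, bins)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.abs(p) ** 2, axis=0)))

def ssd(p):
    """Deviation of the RMS-average response from its own mean level (dB)."""
    h_rms = rms_average_spl(p)
    return float(np.sqrt(np.mean((h_rms - np.mean(h_rms)) ** 2)))

def mssv(p):
    """Variance of SPL across seats, averaged over frequency bins (dB^2)."""
    spl = 20.0 * np.log10(np.abs(p))
    return float(np.mean(np.var(spl, axis=0)))

# Hypothetical data: two seats, two bins; seat 2 is twice as loud at every frequency.
p = np.array([[1.0, 1.0], [2.0, 2.0]], dtype=complex)
print("SSD  =", ssd(p))    # the seat average is flat across frequency
print("MSSV =", mssv(p))   # but the seats differ from each other
```

In this toy case the averaged response is perfectly flat, so SSD is zero, while MSSV is not: the two metrics capture different failures and neither substitutes for the other.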
3 ACOUSTIC ROOM CORRECTION SYSTEMS

3.1 Digital Room Equalisation

For room equalisation systems, the correction approach involves pre-processing the excitation signal to reduce or limit the influence of the physical room acoustics on the direct sound field reaching the listeners. However, because room transfer functions vary strongly with source and receiver positions, room equalisation systems typically only aim to improve the system performance over a pre-determined listening area. Examples include the parametric equalisation of individual room modes [4], [5] and simple adjustments such as the tone and low-frequency tilt controls found on amplifiers and AV receivers.

Modal equalisation is a form of parametric equalisation that operates on individual room modes by pre-filtering the primary signal to reduce the net excitation of each mode and increase its decay rate. A modal resonance can be represented as a low-pass filter in the s-domain, given by

$$H_{m}(s) = \frac{\omega_{fn}^{2}}{s^{2} + \dfrac{\omega_{fn}}{Q_{fn}}\,s + \omega_{fn}^{2}} \tag{9}$$

where $Q_{fn}$ is the unequalised Q factor of the modal resonance and $\omega_{fn} = 2\pi f_n$ is the angular mode frequency. $Q_{fn}$ is inversely proportional to the exponential modal decay rate $m$ in equations 6 and 7. Therefore, by reducing $Q_{fn}$, a room mode at frequency $f_n$ can be proportionately controlled. The general form of a modal equaliser is given by

$$H_{eq}(s) = \frac{s^{2} + \dfrac{\omega_{fn}}{Q_{fn}}\,s + \omega_{fn}^{2}}{s^{2} + \dfrac{\omega_{fn}}{Q_{target}}\,s + \omega_{fn}^{2}} \tag{10}$$

where $Q_{target}$ is the target Q factor of the mode after equalisation. In normal use, $Q_{target} < Q_{fn}$, such that the modal decay rate is increased after equalisation. As such, the equaliser described by equation 10 is a notch filter with peak attenuation $K$, given by

$$K = 20\log_{10}\left(\frac{Q_{target}}{Q_{fn}}\right) \tag{11}$$

Figure 3 illustrates a simple case of a modal equaliser operating on a single mode peak at 100 Hz. The ratio of the equalised to unequalised Q factors is 0.3, corresponding to a peak attenuation of approximately -10 dB ($20\log_{10} 0.3 \approx -10.5$ dB) and a 70% reduction of the modal decay time. This simple scheme can be extended to equalise multiple mode peaks by cascading multiple independent notch filters.
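The single-mode example above can be checked numerically. The sketch below evaluates a second-order modal equaliser on the imaginary axis, assuming the common biquad form in which the numerator carries the unequalised Q factor and the denominator the target Q factor; the function name is hypothetical and the values (100 Hz, Q of 20 reduced to 6) are those quoted for Figure 3.

```python
import numpy as np

def modal_equaliser_response(f, f_n, q_fn, q_target):
    """Complex response of an assumed second-order modal equaliser at frequency f (Hz)."""
    s = 2j * np.pi * np.asarray(f, dtype=float)   # s = j * omega
    w_n = 2.0 * np.pi * f_n
    numerator = s**2 + (w_n / q_fn) * s + w_n**2        # cancels the original resonance
    denominator = s**2 + (w_n / q_target) * s + w_n**2  # replaces it with a damped one
    return numerator / denominator

f_n, q_fn, q_target = 100.0, 20.0, 6.0
k_db = 20.0 * np.log10(abs(modal_equaliser_response(f_n, f_n, q_fn, q_target)))
decay_reduction = 1.0 - q_target / q_fn   # decay time scales with Q at a fixed frequency

print(f"peak attenuation = {k_db:.2f} dB, decay time reduction = {decay_reduction:.0%}")
```

At the mode frequency the response reduces to the ratio of the two Q factors, which is why the attenuation and the decay-time reduction are tied together: a deeper cut always buys a shorter decay, and vice versa.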
Physically, modal equalisation is approximately analogous to adding virtual resonant absorbers that attenuate targeted mode peaks to a fixed degree. Pressure nulls in the room transfer functions are left untreated under this equalisation scheme.

Figure 3: Modal equaliser operating on a single mode at 100Hz. $Q_{fn} = 20$, $Q_{target} = 6$, peak attenuation = -10dB

3.2 Physical Room Correction

The room correction approach concerns the entire acoustic environment instead of a targeted listening area. The goal of these systems is to eliminate room modes by complete suppression or cancellation, such that they no longer affect the direct sound field produced by the primary systems. Modal suppression and cancellation are achieved by deploying multiple carefully positioned sound sources throughout the acoustic environment[8], [11]. At mode frequencies, the distances travelled by the incident waves before reflection are multiples of the half wavelength, resulting in constructive summation and a pattern of pressure nodes and antinodes along the related dimensions of the rectangular room. By placing sources at the pressure nodes of a particular mode, the energy transfer to that mode, and therefore its net excitation, is minimised. Naturally, higher-order modes sharing the same nodal line are suppressed simultaneously. Similarly, by placing two identical sources symmetrically about the nodal line, net zero excitation of that mode and of the related higher-order modes is achieved through cancellation. Figure 4 illustrates this concept inside an ideal one-dimensional duct.

Figure 4: Simulated pressure response at x=5m inside an ideal 1D duct of length 5m. Sources are assumed to be ideal monopoles with a free field sensitivity of 90dB

In practical applications, careful placement schemes featuring two or four identical and identically driven sources can be implemented to achieve meaningful broadband control below 100 Hz, as shown in Figure 5[8], [12].
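The suppression and cancellation mechanisms can be illustrated with the rigid-duct mode shapes cos(n*pi*x/L). The sketch below sums the mode shape over identically driven sources as a simple measure of net modal excitation; the function name and the source positions are hypothetical choices for illustration, and the 5 m length matches the duct of Figure 4.

```python
import numpy as np

def net_coupling(order, source_positions, length):
    """Sum of the rigid-duct mode shape cos(n*pi*x/L) over identically driven sources."""
    x = np.asarray(source_positions, dtype=float)
    return float(np.sum(np.cos(order * np.pi * x / length)))

L = 5.0
single_end = [0.0]       # one source at an end wall: a pressure antinode of every mode
single_mid = [L / 2.0]   # one source at the node of the first mode (suppression)
pair_sym = [1.0, 4.0]    # hypothetical pair placed symmetrically about the midpoint

for n in (1, 2, 3):
    print(n,
          round(net_coupling(n, single_end, L), 3),
          round(net_coupling(n, single_mid, L), 3),
          round(net_coupling(n, pair_sym, L), 3))
```

Both the midpoint source and the symmetric pair leave the odd-order modes unexcited, while the even-order modes are still driven; this is why practical schemes combine several source positions to cover more modes.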
Figure 5: Source placement schemes for modal control. All sources are assumed to be identical and identically driven

4 MODAL DECOMPOSITION ROOM MODEL AND CONTROL CASE

4.1 Low-frequency Modal Decomposition Model

A modal decomposition model was chosen over its counterpart, the image source model, primarily for its computation speed. In terms of accuracy, both are comparable up to 125 Hz, where the modal density is still relatively low; beyond this, both differ significantly from real in-room responses[2]. At low frequencies, the in-room loudspeaker behaviour can be modelled as a linear combination of the room resonances. Equation 12 describes the sound pressure field in a rectangular enclosure with rigid boundaries; a full derivation of this expression is given in Nelson and Elliott[13].

$$p(\bar{r}) = \sum_{n} \frac{j\omega\rho_0\,\Psi_n(\bar{r})}{V\left(k_n^{2} - k^{2}\right)} \sum_{l} q_l\,\Psi_n(\bar{s}_l) \tag{12}$$

Here $p$, the complex pressure amplitude at the receiver position $\bar{r}$ within the enclosure volume $V$, is given by a linear combination of the enclosure eigenfunctions $\Psi_n(\bar{r})$. $\omega$ is the angular frequency of excitation, $\rho_0$ is the ambient fluid density, $k$ is the wavenumber (given by $k = \omega / c_0$, where $c_0$ is the speed of sound) and $k_n$ is the $n$th eigenfrequency expressed as a wavenumber. The magnitude of excitation of each eigenfunction is determined by the summation on the right of the expression, where $q_l$ describes the volume velocity of source $l$ and $\Psi_n(\bar{s}_l)$ is the value of the $n$th eigenfunction at the source position $\bar{s}_l$. Note that for this cuboid case, $n$ is short for $n_x n_y n_z$. The sources were assumed to be ideal monopoles with a uniform free-field pressure sensitivity of 90 dB across all frequencies. The eigenfunction $\Psi_n$ for a cuboid room of dimensions $L_x \times L_y \times L_z$, evaluated at a position $(x, y, z)$, is given by

$$\Psi_n(x, y, z) = \epsilon_n \cos\left(\frac{n_x \pi x}{L_x}\right)\cos\left(\frac{n_y \pi y}{L_y}\right)\cos\left(\frac{n_z \pi z}{L_z}\right) \tag{13}$$

The eigen-wavenumbers $k_n$ and scaling factors $\epsilon_n$ are found from equations 14 and 15, respectively. The scaling factors $\epsilon_n$ ensure that the volume integral of each squared eigenfunction is normalised to the room volume. These are

$$k_n = \pi\sqrt{\left(\frac{n_x}{L_x}\right)^{2} + \left(\frac{n_y}{L_y}\right)^{2} + \left(\frac{n_z}{L_z}\right)^{2}} \tag{14}$$

$$\epsilon_n = \sqrt{\epsilon_{n_x}\epsilon_{n_y}\epsilon_{n_z}}, \qquad \epsilon_{n_i} = \begin{cases} 1, & n_i = 0 \\ 2, & n_i > 0 \end{cases} \tag{15}$$

In addition, real room boundaries have some absorption or damping characteristics.
To include this, and to ensure that the solution for pressure is not infinite at mode frequencies, equation 12 needs to be adjusted as shown in equation 16.

$$p(\bar{r}) = \sum_{n} \frac{j\omega\rho_0\,\Psi_n(\bar{r})}{V\left(k_n^{2} - k^{2} + j\,\dfrac{k\,k_n}{Q_n}\right)} \sum_{l} q_l\,\Psi_n(\bar{s}_l) \tag{16}$$

Here $Q_n$ represents the Q factor of the $n$th eigenfunction or mode and $j$ is the imaginary unit $\sqrt{-1}$. For lightly damped enclosures, $Q_n$ is relatively large. For the case studies presented in this paper, it was set to 20 for all modes up to 100 Hz. The frequency limits for the numerical room model and for the calculation of the metrics described in section 2 are 20–100 Hz.

4.2 Control Case

This paper analyses room correction systems for stereo performance in a lightly damped rectangular room of dimensions 7.0 x 5.0 x 2.7 m. The primary stereo pair is positioned at 1.0 x 1.0 x 1.2 m and 1.0 x 4.0 x 1.2 m for the L and R channels, respectively. A targeted listening area of six seats is centred on the typical stereo listener position derived from an equilateral triangle arrangement, as shown in Figure 6. The seated ear height is assumed to be 1.2 m, as per ITU-R BS.1116-3[14]. The sources and listening seats are positioned to represent a typical home cinema setup with two rows of seating. The RMS average of the pressure responses obtained using equation 16 and the corresponding aggregate time-frequency behaviour across all six seats are shown in Figures 7 and 8, respectively. The control room acoustic parameters and the corresponding performance metrics are shown in Table 1.

Figure 6: Rectangular room with stereo primary speakers and a 6-seat listening area

Figure 7: RMS pressure average across all 6 seats.
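A damped modal summation for the control room can be sketched as follows. This is a hedged illustration rather than the model used for the paper's figures: it assumes a speed of sound of 343 m/s and an air density of 1.21 kg/m^3, an exp(j*omega*t) time convention, unit volume velocity per source instead of a 90 dB sensitivity calibration, modes summed up to 200 Hz, and a hypothetical seat position, since the actual seat coordinates are only given graphically in Figure 6.

```python
import itertools
import numpy as np

C0 = 343.0                         # assumed speed of sound (m/s)
RHO0 = 1.21                        # assumed air density (kg/m^3)
DIMS = np.array([7.0, 5.0, 2.7])   # room dimensions from section 4.2 (m)
Q_MODE = 20.0                      # modal Q factor from section 4.1

def mode_frequency(n):
    """Rigid-wall eigenfrequency (Hz) of mode order n = (nx, ny, nz)."""
    return 0.5 * C0 * np.sqrt(np.sum((np.asarray(n) / DIMS) ** 2))

def eigenfunction(n, pos):
    """Cosine mode shape scaled so that its mean square over the room is 1."""
    scale = np.sqrt(np.prod([1.0 if ni == 0 else 2.0 for ni in n]))
    return scale * np.prod(np.cos(np.asarray(n) * np.pi * np.asarray(pos) / DIMS))

def mode_orders(f_max):
    """All mode orders, including the (0, 0, 0) bulk mode, up to f_max (Hz)."""
    ranges = [range(int(2.0 * f_max * length / C0) + 1) for length in DIMS]
    return [n for n in itertools.product(*ranges) if mode_frequency(n) <= f_max]

def pressure(f, receiver, sources, q=1.0, f_max=200.0):
    """Complex pressure at the receiver for identically driven monopoles."""
    k = 2.0 * np.pi * f / C0
    total = 0j
    for n in mode_orders(f_max):
        k_n = 2.0 * np.pi * mode_frequency(n) / C0
        drive = sum(q * eigenfunction(n, s) for s in sources)
        total += eigenfunction(n, receiver) * drive / (k_n**2 - k**2 + 1j * k * k_n / Q_MODE)
    return 1j * 2.0 * np.pi * f * RHO0 * total / np.prod(DIMS)

stereo = [(1.0, 1.0, 1.2), (1.0, 4.0, 1.2)]   # L and R positions from section 4.2
seat = (4.5, 2.5, 1.2)                         # hypothetical seat, not taken from Figure 6
table_modes = [(1, 0, 0), (2, 0, 0), (1, 0, 1), (0, 2, 0), (1, 2, 0), (2, 2, 0)]

for n in table_modes:
    print(n, round(float(mode_frequency(n)), 1), "Hz")
level_db = 20.0 * np.log10(abs(pressure(49.0, seat, stereo)))
```

With these assumptions the eigenfrequencies of the six listed mode orders agree with the mode frequencies reported for the control room. The symmetric placement of the stereo pair about the room width also gives zero net drive to modes of odd width order, which is consistent with their absence from the reported mode list.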
Room correction case: OFF (control)

Mode frequency (Hz)    Mode order (L W H)    MT60 (s)
24.5                   1 0 0                 1.91
49.0                   2 0 0                 0.96
68.1                   1 0 1                 0.69
68.6                   0 2 0                 0.71
72.8                   1 2 0                 0.82
84.3                   2 2 0                 0.56

SSD (dB)               7.32
MSSV (dB²)             14.27

Table 1: Control room acoustic parameters and performance metrics

Figure 8: Spectrogram plot for the RMS pressure average in Fig 7 (Magnitude normalised and IR zero padded by 0.15s)

5 RESULTS

5.1 Digital Room Equalisation

A modal equaliser was set up to attenuate all modes in Table 1 by 6 dB, which corresponds to halving their respective pressure amplitudes. Six independent notch filters, given by equation 10, were cascaded and the result was applied to the excitation signals for the primary L and R channels. The results are summarised in Table 2 and Figure 9.

Mode frequency (Hz)    Mode order (L W H)    MT60 (s)
24.5                   1 0 0                 0.96
49.0                   2 0 0                 0.48
68.1                   1 0 1                 0.54
68.6                   0 2 0                 0.53
72.8                   1 2 0                 0.63
84.3                   2 2 0                 0.23

SSD (dB)               7.41
MSSV (dB²)             14.27

Table 2: Control room acoustic parameters and performance metrics after room equalisation

Figure 9: Spectrogram plot for the RMS pressure average after digital room equalisation (Magnitude normalised and IR zero padded by 0.15s)

Clear improvements in modal decay times are achieved across all modes. However, the steady-state performance is essentially unchanged (SSD of 7.41 dB against 7.32 dB) and the spatial performance is unaffected (MSSV of 14.27 dB² in both cases). This result is expected, as equalisation of mode peaks alone is insufficient to achieve a flat in-room response where pressure nulls below 100 Hz still dominate the flatness metric. For this reason, commercial modal equalisation systems often add an inverse-filtering based magnitude equalisation stage to improve the overall flatness (SSD) of the equalised responses at the selected seats. As for the spatial performance, or MSSV, equalising the excitation signal reduces the net excitation of the targeted modes; however, this reduction is uniform across all seats and consequently does not improve the underlying seat-to-seat variation.
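The argument that a shared equaliser cannot change the seat-to-seat variation can be demonstrated directly: a filter applied to the excitation signal adds the same dB offset to every seat at a given frequency, and a variance is unaffected by a common offset. The sketch below uses randomly generated seat responses and an illustrative bell-shaped cut, both hypothetical; only the 6 dB depth and the control-case decay times come from the tables above.

```python
import numpy as np

def mssv(spl_db):
    """Mean over frequency of the variance of SPL across seats (rows = seats)."""
    return float(np.mean(np.var(spl_db, axis=0)))

rng = np.random.default_rng(0)
freqs = np.arange(20.0, 101.0)                                  # 1 Hz bins, 20-100 Hz
seat_spl = 80.0 + 5.0 * rng.standard_normal((6, freqs.size))    # hypothetical seat SPLs (dB)

# One equaliser feeds every seat, so its gain is the same dB offset in every row.
eq_gain_db = -6.0 * np.exp(-0.5 * ((freqs - 49.0) / 2.0) ** 2)  # illustrative 6 dB cut at 49 Hz
equalised_spl = seat_spl + eq_gain_db

print("MSSV before:", round(mssv(seat_spl), 3))
print("MSSV after: ", round(mssv(equalised_spl), 3))

# A 6 dB cut at the mode centre corresponds to roughly halving the Q factor,
# and the decay time of an isolated mode scales with Q.
q_ratio = 10.0 ** (-6.0 / 20.0)
print("predicted MT60 at 24.5 Hz:", round(1.91 * q_ratio, 2), "s")
```

The halving of decay time predicted from the Q ratio matches the two isolated axial modes at 24.5 Hz and 49.0 Hz; the closely spaced modes around 68 Hz to 73 Hz deviate from this simple scaling, presumably because their decay envelopes overlap in the time-frequency analysis.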
5.2 Physical Room Correction

The room correction system was based on the four-midpoints scheme illustrated in Figure 5. Four identical and identically driven monopole sources were positioned halfway up the height and at the midpoints of the four enclosing room boundaries. The primary L/R channels were high-pass filtered to form a 2.4 channel bass-managed system. The results are summarised in Table 3 and Figure 10.

Mode frequency (Hz)    Mode order (L W H)    MT60 (s)
24.5                   1 0 0                 0.12
49.0                   2 0 0                 0.11
68.1                   1 0 1                 0.13
68.6                   0 2 0                 0.15
72.8                   1 2 0                 0.39
84.3                   2 2 0                 0.56

SSD (dB)               5.23
MSSV (dB²)             4.20

Table 3: Control room acoustic parameters and performance metrics after physical room correction

Figure 10: Spectrogram plot for the RMS pressure average after physical room correction (Magnitude normalised and IR zero padded by 0.15s)

As seen above, the performance of the control room after physical room correction is significantly better across all three domains: SSD falls from 7.32 dB to 5.23 dB, MSSV from 14.27 dB² to 4.20 dB², and the MT60 of five of the six modes is reduced. This improvement is attributed to two key features. First, sources placed at the midpoints of the length and height dimensions coincide with the pressure nodes of the fundamental and all odd-order axial modes along those dimensions, resulting in complete modal suppression at those frequencies (24.5 Hz and 68.1 Hz). Second, by virtue of the symmetrical placement of sources within the room, the excitation generated at even-order axial modes by one pair of laterally opposing sources is balanced and cancelled by the second pair of sources, which couple to those modes with inverted polarity. This form of complete modal cancellation occurs at 49 Hz, 68.6 Hz and 72.8 Hz. At 84.3 Hz, all four sources coincide with pressure antinodes of the same polarity, resulting in significant excitation of the mode, which decays at the same rate as in the control case. Note that the MT60 times for the suppressed modes in Table 3 are largely determined by the STFT time-frequency resolution parameters.
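The suppression and cancellation pattern described above can be verified from the mode shapes alone. The sketch below sums the unnormalised cosine mode shape over four identically driven sources at the wall midpoints and at half height; the source coordinates are inferred from that description of the four-midpoints scheme and should be read as an assumption about the exact layout.

```python
import numpy as np

DIMS = np.array([7.0, 5.0, 2.7])   # room dimensions from section 4.2 (m)
LX, LY, LZ = DIMS

# Assumed coordinates: midpoints of the four walls, halfway up the room height.
sources = np.array([
    [0.0,      LY / 2.0, LZ / 2.0],
    [LX,       LY / 2.0, LZ / 2.0],
    [LX / 2.0, 0.0,      LZ / 2.0],
    [LX / 2.0, LY,       LZ / 2.0],
])

modes = {24.5: (1, 0, 0), 49.0: (2, 0, 0), 68.1: (1, 0, 1),
         68.6: (0, 2, 0), 72.8: (1, 2, 0), 84.3: (2, 2, 0)}

def net_coupling(order):
    """Sum of the unnormalised mode shape over all four identically driven sources."""
    shapes = np.prod(np.cos(np.asarray(order) * np.pi * sources / DIMS), axis=1)
    return float(np.sum(shapes))

for f_mode, order in modes.items():
    print(f"{f_mode:5.1f} Hz {order}: net coupling = {net_coupling(order):+.3f}")
```

Under these assumptions the net coupling vanishes for the first five modes and reaches its maximum magnitude for the (2, 2, 0) mode at 84.3 Hz, where all four sources sit at antinodes of the same polarity.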
Finally, the overall bass level of the corrected system is significantly higher than that of the control system because the room is driven by four sources instead of two. This can be addressed by adjusting the pre-amp gain for each of the four sources.

Modal suppression and cancellation achieved by the physical room correction scheme eliminate the pressure undulations (peaks and nulls) caused by room modes below 84.3 Hz across all six seats. This results in a reduction in the steady-state deviation metric. Furthermore, by suppressing the modes themselves, the same improvement is achieved across the entire volume of the room, i.e., the mean seat-to-seat variation (MSSV) is lower. The RMS pressure averages for the three cases are shown in Figure 11.

Figure 11: Post-correction RMS pressure averages across all six seats

6 DISCUSSION

The previous sections summarise the motivation and performance of the two acoustic room correction systems across three domains of interest: time, frequency, and space. By simulating representative examples, it has been shown that modal equalisation systems can provide meaningful improvements in the time domain by pre-filtering the primary excitation signals of a typical stereo setup. However, while subsequent magnitude equalisation stages may further optimise their steady-state performance, equalising the primary signals alone is insufficient to improve their spatial domain performance and consistency. On the other hand, the room correction system achieved modal suppression and/or cancellation at most of the targeted frequencies. Consequently, the corrected sound field was largely free of resonant behaviour across all three domains. The critical limitation of the multi-source placement scheme is its lack of robustness to changes in the relative positions of the constituent sources.
Minor deviations from their optimised positions may unbalance the suppression and cancellation mechanisms such that the net excitations of the affected modes are no longer close to zero. The residual modal excitation would then degrade the performance of the correction system across all three metrics. Furthermore, such correction schemes require multiple identical sources, making them expensive and space-consuming, especially in small domestic listening spaces.

7 CONCLUSION

This paper objectively contrasts the acoustic mechanisms underpinning room equalisation and room correction systems. Three performance metrics have been derived: modal decay time (MT60), steady-state deviation (SSD), and mean seat-to-seat variation (MSSV), which characterise the performance of the correction systems in the time, frequency, and spatial domains, respectively. Only the room correction system, which uses additional sources for spatial control of the resultant sound field, has been shown to offer meaningful improvements across all three domains. Finally, while the chosen room model is only valid for rectangular rooms, the metrics presented can readily be used to assess the performance of non-rectangular rooms, provided the full complex room transfer functions are available by alternative means.

8 ACKNOWLEDGEMENTS

The author would like to thank his friends and colleagues at KEF R&D for their help and support.

9 REFERENCES

[1] P. Thakkar and J. Oclee-Brown, “Statistical optimisation of room dimensions and layout for critical listening applications,” presented at Reproduced Sound 2021, Nov. 2021. doi: 10.25144/13802.
[2] T. J. Cox, P. D’Antonio, and M. R. Avis, “Room sizing and optimization at low frequencies,” J. Audio Eng. Soc., vol. 52, no. 6, pp. 640–651, Jun. 2004.
[3] R. Petrolli, P. D’Antonio, J. Storyk, J. Hargreaves, and T. Betcke, “Non-cuboid iterative room optimizer,” Dec. 2020.
[4] M. Karjalainen, P. Antsalo, A. Mäkivirta, T. Peltonen, and V. Välimäki, “Estimation of modal decay parameters from noisy response measurements,” J. Audio Eng. Soc., vol. 50, no. 11, pp. 867–878, 2002.
[5] R. Dragonetti, C. Ianniello, and R. Romano, “A study about the improvement of the dynamic range of Schroeder plots by the method of the two impulse responses product,” Aug. 2005.
[6] A. Venturi, A. Farina, and L. Tronchin, “On the effects of pre-processing of impulse responses in the evaluation of acoustic parameters on room acoustics,” Proc. Meet. Acoust., vol. 19, no. 1, p. 015006, Jun. 2013. doi: 10.1121/1.4800277.
[7] R. Magalotti and D. Ponteggia, “Use of wavelet transform for the computation of modal decay times in rooms,” presented at the Audio Engineering Society Convention 147, Oct. 2019.
[8] T. Welti, “Optimal configurations for subwoofers in rooms considering seat-to-seat variation and low frequency efficiency,” Oct. 2012.
[9] R. Wilson, M. D. Capp, and J. R. Stuart, “The loudspeaker-room interface – controlling excitation of room modes,” May 2003.
[10] A. Mäkivirta, P. Antsalo, M. Karjalainen, and V. Välimäki, “Modal equalization of loudspeaker-room responses at low frequencies,” J. Audio Eng. Soc., vol. 51, no. 5, pp. 324–343, 2003.
[11] A. Celestinos and S. B. Nielsen, “Controlled Acoustic Bass System (CABS): A method to achieve uniform sound field distribution at low frequencies in rectangular rooms,” J. Audio Eng. Soc., vol. 56, no. 11, pp. 915–931, 2008.
[12] T. Welti, “How many subwoofers are enough?,” presented at the Audio Engineering Society Convention 112, Apr. 2002.
[13] P. A. Nelson and S. J. Elliott, Active Control of Sound. London: Academic, 1992.
[14] ITU-R BS.1116, “Methods for the subjective assessment of small impairments in audio systems.” Available: https://www.itu.int/rec/R-REC-BS.1116/en