Objective measurements and subjective assessment of speech intelligibility in rooms with Active Acoustic Enhancement Systems

Łukasz Błasiński 1, Anna Pastusiak 2, Jędrzej Kociński 3, Maciej Buszkiewicz 4
Department of Acoustics, Faculty of Physics, Adam Mickiewicz University, Poznań, ul. Uniwersytetu Poznańskiego 2, 61-614 Poznań, Poland

ABSTRACT
The Speech Transmission Index (STI) remains one of the most important parameters for assessing speech intelligibility indoors. Achieving sufficient STI values is crucial not only in classrooms and offices but also in cultural centers and halls. As Active Acoustic Enhancement Systems (AAES) are becoming increasingly common, we decided to verify how such systems affect speech perception. Objective parameters were measured in three rooms equipped with AAES. A listening test was designed to verify whether the subjectively assessed intelligibility is consistent with the results of the objective measurements. The speech material, presented to 180 normal-hearing subjects, was obtained by convolving 50-element nonsense word lists with the rooms’ impulse responses. The measured RT values varied from 0.91 to 4.21 s and the STI from 0.54 to 0.64. Despite the significant differences in RT, the STI values decreased only slightly. RT and EDT were highly correlated with intelligibility, in contrast to the C50 parameter. The results show that comparing the results of the subjective examination with the calculated objective parameters is crucial for a comprehensive evaluation of speech intelligibility.

1. INTRODUCTION
The conditions under which speech is presented may involve background noise, but indoors an even more important problem is reverberation, whose value is strongly related to speech intelligibility. Speech intelligibility [1] refers to how well the meaning of speech is transmitted to the listeners.
In general, the higher the reverberation time, the lower the speech intelligibility, due to the overlap-masking effect [2]. Providing sufficient speech quality remains the most important aim of broadly understood room acoustics. This issue, apart from being relevant, is complex and hard to analyze in detail. Halls need not only to fulfill their regular function but also to provide comfortable listening conditions when the room's role changes (so-called ‘multi-use’). However, modifying acoustic parameters indoors often involves high costs and may negatively affect the visual attributes that are so important in, e.g., auditoria or theaters. One of the available solutions is the Active Acoustic Enhancement System (AAES); such systems are becoming more and more widespread in architectural acoustics. Involving powerful DSP technologies, precise tuning techniques and high-quality electro-acoustic systems, they allow the acoustic parameters of halls and venues to be changed far more extensively and controllably than with passive techniques (e.g., absorbing panels and drapes). As a result, e.g., the energy of early reflections and the reverberation time may be increased while maintaining the natural sound of the room [3, 4]. The three most popular types of AAES are: in-line (which synthesizes reflections based on the direct sound), regenerative (where reflections are added to the venue’s original reflections) and ‘hybrid regenerative’, which combines both methods. Although the history of AAES and their technical parameters are quite well described, the influence of different settings on speech perception measured in laboratory conditions (subjective assessment) remains unknown.

1 lukasz.blasinski@amu.edu.pl 2 anna.pastusiak@amu.edu.pl 3 jedrzej.kocinski@amu.edu.pl 4 maciej.buszkiewicz@amu.edu.pl

The most natural method for subjective estimation of speech intelligibility remains the listening test [5-7].
The type of material used can be very diverse, consisting of short words, syllables (with or without meaning) or sentences. The results of a subjective assessment of speech transmission quality should depend, to the maximum extent, on the physical parameters of the tested communication channel, and not on the structure of the language test itself. Using logatome (nonsense word) lists eliminates information at the semantic level. Such lists are also available in Polish [8]. They are more difficult for listeners but also more reliable, which is related to the low redundancy of logatomes (compared to, e.g., words or digits): using nonsense words has the advantage of removing the higher language processing that people use to understand words with degraded quality and so provides a less biased measure [9]. In effect, the influence of cognitive association is limited, and hearing acuity becomes more important than lexical prediction based on the speech context or the participant’s vocabulary. Consequently, speech intelligibility decreases as the amount of useful (meaningful) information provided is reduced [10]. To avoid unnecessary fatigue affecting speech perception on the one hand, and to provide sufficient accuracy on the other, lists of 50 or 100 logatomes are typically used [11]. Test lists should be phonetically balanced, which means that initial consonants, vowels and final consonants are distributed according to their statistical occurrence in a given language.

Apart from the subjective assessment of speech perception, objective methods may also be used to describe the acoustical situation in different enclosures, defining parameters such as reverberation time, clarity, early decay time, speech transmission index and others. The mutual relationships between the objective parameters and the subjective assessment still remain ambiguous to some extent.
In this paper, speech intelligibility in different conditions defined by AAES settings was assessed using the results of objective measurements (impulse responses convolved with the test material) and a subjective method of speech intelligibility evaluation (the Polish logatome test). The results obtained were then compared to find a relationship between logatome intelligibility and objective measures of the room: speech transmission index, reverberation time, early decay time and clarity.

2. MATERIALS AND METHODS

2.1. Objective measures

2.1.1. Reverberation time (RT)
RT is usually in the range of 0.7-2.0 s, which enables good intelligibility of speech (or music) and prevents the overlapping of sounds that could occur with RT values that are too long. Listeners can follow the decrease in sound pressure level only until the room’s noise level is reached. In effect, the subjective perception of reverberation depends on the excitation and the noise level, and is consistent with the Early Decay Time parameter rather than with the reverberation time itself [12].

2.1.2. Early decay time (EDT)
EDT is derived from the decay curve between 0 dB and 10 dB below the initial level. It is the parameter most closely related to modulation reduction (and hence to reduced speech recognition), because it relates more than the other reverberation parameters to the initial, highest-level part of the decaying energy [13].

2.1.3. Speech Transmission Index (STI)
STI is one of the most important objective parameters used for assessing speech intelligibility indoors. It takes into account not only the acoustic conditions of the location but also the characteristics of the entire transmission channel [14]. The STI is calculated from the measured Modulation Transfer Function (MTF) [15] and is based on the assumption that distortion of the amplitude modulation of the informative signal is the crucial factor in the decrease of speech intelligibility.
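To illustrate how an MTF relates to an impulse response, the following Python sketch applies Schroeder's relation to a squared impulse response and maps the resulting modulation depths to a 0-1 transmission index through the apparent signal-to-noise ratio, clipped to ±15 dB. It is a simplified, single-band illustration under stated assumptions, not the procedure used in this study: octave-band filtering, band weighting, background noise and level-dependent masking are all omitted, and the function names and the synthetic exponential decay are hypothetical.

```python
import numpy as np

# Modulation frequencies commonly used for STI: 14 one-third-octave steps.
MOD_FREQS_HZ = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                         3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def mtf_from_impulse_response(h, fs, mod_freqs=MOD_FREQS_HZ):
    """Noise-free MTF via Schroeder's relation:
    m(F) = |sum h^2(t) exp(-j 2 pi F t)| / sum h^2(t)."""
    energy = np.asarray(h, dtype=float) ** 2
    t = np.arange(energy.size) / fs
    phasors = np.exp(-2j * np.pi * np.outer(mod_freqs, t))
    return np.abs(phasors @ energy) / energy.sum()


def transmission_index(m):
    """Map modulation depth m to a 0..1 index via the apparent SNR."""
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1.0 - 1e-6)
    snr_db = 10.0 * np.log10(m / (1.0 - m))
    return (np.clip(snr_db, -15.0, 15.0) + 15.0) / 30.0


def single_band_index(h, fs):
    """Mean transmission index over all modulation frequencies (one band only)."""
    return float(np.mean(transmission_index(mtf_from_impulse_response(h, fs))))


def synthetic_decay(rt_s, fs):
    """Hypothetical test signal: ideal exponential decay reaching -60 dB at rt_s."""
    t = np.arange(int(3.0 * rt_s * fs)) / fs
    return np.exp(-np.log(1000.0) * t / rt_s)


if __name__ == "__main__":
    fs = 2000  # a low rate suffices for this envelope-only illustration
    for rt in (1.0, 2.5, 4.0):
        print(rt, round(single_band_index(synthetic_decay(rt, fs), fs), 3))
```

For an ideal exponential decay, the index in this sketch falls monotonically as the decay time grows, which is the modulation-reduction mechanism the STI is built on. Measured impulse responses, in particular those with a double-slope decay, need not follow this idealized behavior.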
The quality of the transmitted signal is rated on a scale from 0 (bad intelligibility) to 1 (excellent intelligibility), which corresponds to the amount of modulation retained across all combinations of octave bands and modulation frequencies. An STI over 0.6 is treated as ‘good’.

2.1.4. Clarity (C50, C80)
The balance between the sound arriving early and the sound arriving late in the impulse response determines the clarity of sound indoors [16]. In general, late reflections are unfavorable for understanding speech because they may cause speech sounds to merge, making speech unclear. However, if the delay does not exceed a certain time limit, the reflections contribute positively to intelligibility. In other words, C50 (introduced by Marshall [17]) is the logarithmic ratio of early-arriving to late-arriving sound energy, where “early” means “during the first 50 ms” and “late” means “after the first 50 ms”; for C80 the limit is 80 ms [18]. 50 ms is the critical time limit separating useful from detrimental reflections. In this paper we use weighted C50.

2.2. Measurement set-up
To assess the acoustical parameters describing reverberation time and speech clarity in the tested halls, stereophonic room impulse responses (RIR) were recorded. Three halls in three cities (Minsk Mazowiecki, Pulawy and Szczecin) were considered. All of them have similar characteristics and an installed AAES. In accordance with the ISO standards [19], an omnidirectional Neotek Dodecahedron DO12-PLUS sound source and iSEMCom EMX7150 omnidirectional microphones were used. Both the measurements and their analysis were performed using the EASERA (AFMG Technologies GmbH) software. The interaural cross-correlation coefficient (IACC) was measured with a Neumann KU100 dummy head. In the next step, the speech material used in the listening test was convolved with the obtained IRs. In the subjective evaluation part, sound files were played from a computer and presented binaurally through Sennheiser HDA201 headphones and an SR46OH DOD preamplifier.
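The parameters defined in Section 2.1 and the convolution step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EASERA processing used in the study: the impulse response is taken as a single broadband channel that starts at the arrival of the direct sound, no octave-band filtering or weighting is applied, and all function names are hypothetical.

```python
import numpy as np


def clarity_db(h, fs, early_ms=50.0):
    """Early-to-late energy ratio in dB (C50 for early_ms=50, C80 for 80)."""
    energy = np.asarray(h, dtype=float) ** 2
    split = int(round(early_ms * 1e-3 * fs))
    return 10.0 * np.log10(energy[:split].sum() / energy[split:].sum())


def schroeder_curve_db(h):
    """Backward-integrated energy decay curve, normalized to 0 dB at t = 0."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / edc[0])


def decay_time_s(h, fs, upper_db, lower_db):
    """Fit a line to the decay curve between two levels and extrapolate
    to a 60 dB decay. (0, -10) gives EDT; (-5, -35) gives T30."""
    curve = schroeder_curve_db(h)
    idx = np.flatnonzero((curve <= upper_db) & (curve >= lower_db))
    slope_db_per_s, _ = np.polyfit(idx / fs, curve[idx], 1)
    return -60.0 / slope_db_per_s


def convolve_with_room(speech, h):
    """Convolve dry speech with an impulse response and peak-normalize.
    For long recordings an FFT-based convolution would be much faster."""
    wet = np.convolve(np.asarray(speech, dtype=float), np.asarray(h, dtype=float))
    return wet / np.max(np.abs(wet))


if __name__ == "__main__":
    fs = 1000
    t = np.arange(3 * fs) / fs
    h = np.exp(-np.log(1000.0) * t / 1.0)  # hypothetical ideal decay, RT = 1 s
    print("EDT [s]:", decay_time_s(h, fs, 0.0, -10.0))
    print("T30 [s]:", decay_time_s(h, fs, -5.0, -35.0))
    print("C50 [dB]:", clarity_db(h, fs))
```

For a purely exponential decay, EDT and T30 coincide. They diverge when the decay curve has more than one slope, which is why both are reported separately.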
The output sound pressure levels of the headphones were calibrated beforehand using a Brüel & Kjaer 2203 level meter and a Brüel & Kjaer 4152 artificial ear. The speech test was presented at a level of 65 dB SPL, without masking signals in the background, in a room compliant with ANSI S3.1-1999 [20].

2.3. Subjects
A group of 180 volunteers took part in the experiment. Their ages ranged from 19 to 49 (mean value: 24.5). Before the test, each subject underwent pure tone audiometry (PTA) using a GSI 61 clinical audiometer and HDA200 headphones. Mean pure-tone thresholds (averaged across the frequencies 0.5, 1, 2 and 4 kHz, PTA4) are presented in Figure 1. All listeners had normal hearing according to the WHO criteria [21]. A maximum of six listeners took part in the experiment in the same session.

Figure 1: Participants' mean pure-tone audiometry thresholds (PTA4)

2.4. Intelligibility test
The main part of the experiment was the speech intelligibility (SI) measurement. All subjects were informed that they would be listening to meaningless word-like structures. To familiarize participants with the type of speech material, a practice presentation preceded each actual measurement. As in the majority of studies employing logatome tests [22-24], an open-set format was used: listeners were asked to reproduce the perceived test items (in this case in writing) on a specially provided form. Three lists were presented to the subjects. Each contained 50 different logatomes. The silence gaps between the logatomes in the recordings lasted seconds to allow participants to write down the test items they heard. Binary scoring was used in the assessment of logatome intelligibility: only correctly written logatomes (disregarding spelling mistakes) were counted as correctly understood. The overall level of the speech material presented over headphones was 65 dB SPL (previously calibrated), similar to the typical sound level in the investigated enclosures.
Apart from the recognition test, participants rated their listening effort after completing each list on a 7-point scale from 1 (no effort) to 7 (extreme effort).

3. RESULTS
In each of the three rooms, the objective parameters were measured according to the ISO standards (see Section 2.2). Their mean values are given in Table 1.

Table 1: Objective parameters measured in tested enclosures.

Room       RT [s]   STI    EDT [s]   C50 [dB]
Puławy     0.75     0.69   0.66      5.10
Puławy     1.08     0.65   0.80      4.80
Puławy     1.43     0.64   0.87      3.80
Minsk      0.91     0.64   0.83      1.50
Minsk      1.24     0.61   0.98      0.80
Minsk      1.61     0.57   1.24      0.10
Szczecin   2.50     0.56   1.68      0.40
Szczecin   3.13     0.55   2.09      0.40
Szczecin   4.21     0.54   2.88      0.10

It is worth mentioning that the differences between the obtained RTs and EDTs are related to the double-slope decay curve, typical for rooms with an operating AAES. As all three measured rooms (in Pulawy, Minsk and Szczecin) have similar capacity, a similar number of seats and a similar AAES installed, it is possible to analyze the obtained results together. Specific relationships between objective and subjective measures are presented in the following sections.

3.1. Logatome intelligibility vs. RT
Logatome intelligibility, measured as the number of correctly written test items in different reverberation conditions for the three halls (Minsk, Pulawy and Szczecin), is presented in Figure 2.

Figure 2: Logatome intelligibility [%] as a function of RT [s]

There is a strong correlation between RT and logatome intelligibility, as the coefficient of determination (R²) reaches 0.85. The relationship between intelligibility and RT is:

SI = -8.25 * RT + 78.26    (1)

where SI is logatome intelligibility [%] and RT is reverberation time [s].

3.2. Logatome intelligibility vs. EDT
Figure 3 presents logatome intelligibility as a function of EDT. Linear fitting was used to outline the relationship between EDT and SI (similarly to RT and SI).
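As a rough illustration of such a linear fit, the sketch below computes a least-squares line and its coefficient of determination with NumPy, and evaluates Equation (1) at the RT values of Table 1. The per-condition intelligibility scores are not tabulated in this paper, so the scores used here are placeholders generated from Equation (1) itself; the fit therefore simply recovers the published coefficients with R² = 1 and says nothing about the real scatter of the data. The function names are hypothetical.

```python
import numpy as np

# Reverberation times from Table 1 [s]
RT_TABLE1_S = np.array([0.75, 1.08, 1.43, 0.91, 1.24, 1.61, 2.5, 3.13, 4.21])


def si_from_rt(rt_s):
    """Equation (1): SI [%] = -8.25 * RT [s] + 78.26."""
    return -8.25 * np.asarray(rt_s, dtype=float) + 78.26


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder scores: generated from Equation (1), NOT measured data.
    placeholder_si = si_from_rt(RT_TABLE1_S)
    print(fit_line(RT_TABLE1_S, placeholder_si))
    print(si_from_rt(2.5))
```

With measured scores in place of the placeholders, `fit_line` would return the slope, intercept and R² of the kind reported in this section. Evaluated at RT = 2.5 s, Equation (1) gives an SI of about 57.6%.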
A significant correlation may be observed (R² = 0.89):

SI = -21.00 * EDT + 87.4    (2)

where SI is logatome intelligibility [%] and EDT is early decay time [s].

Figure 3: Logatome intelligibility [%] as a function of EDT [s]

3.3. Logatome intelligibility vs. C50
Figure 4 presents the intelligibility of the speech material as a function of C50:

SI = 4.25 * C50 + 55.15    (3)

where SI is logatome intelligibility [%] and C50 is clarity [dB]. In general, intelligibility is lower for low C50 values. The R² parameter reaches only 0.62, meaning a poor correlation.

Figure 4: Logatome intelligibility [%] as a function of C50 [dB]

3.4. Logatome intelligibility vs. STI
The strong correlation (R² = 0.88) between subjective logatome intelligibility and the objective STI parameter is presented in Figure 5.

Figure 5: Logatome intelligibility [%] as a function of STI

3.5. Listening effort
Apart from the intelligibility test, all participants were asked to rate their perceived listening effort on a scale from 1 (no effort) to 7 (extreme effort) after each logatome list. The declared listening effort across all tested RTs is presented in Figure 6.

Figure 6: Declared listening effort across all proposed RTs

The relationship between SI and listening effort is presented in Figure 7.

Figure 7: Declared listening effort and logatome intelligibility

4. DISCUSSION

4.1. Logatome intelligibility measurements
The results of the measurements show that there is a significant correlation between logatome recognition and RT: intelligibility decreases with increasing reverberation time. Interestingly, for the three longest RTs (2.5 s and above), changes in RT do not imply a reduction in intelligibility. It stays at a constant level of around 50%, which is higher than that observed by Kocinski et al. [18] for the same speech material (50% vs.
30-35% in the same RT ranges). Similar tendencies may be observed for the EDT. A slight tendency may also be observed for C50: logatome intelligibility is low for lower C50 values and increases as C50 increases. As shown, C50 is only weakly correlated with logatome intelligibility. This is in line with the results obtained by others (e.g., [18]). Most important is the relationship between the subjective and objective (STI) parameters describing speech intelligibility. In this research, a strong correlation of R² = 0.88 was found. The obtained relationship between logatome recognition and STI is not consistent with the data presented by Houtgast and Steeneken: although the curve shape is similar, a shift in intelligibility values may be observed, and in this research intelligibility is significantly lower for all tested STIs (over 0.54). One of the reasons may be that a Dutch logatome test was used in the cited article. It is possible that differences related to the speech material influenced the obtained data, especially since the STI vs. intelligibility results gathered in our experiment are closely in line with those of Kocinski et al. [18], who also used Polish logatomes. It should be noted that both the intelligibility expressed by the STI and the SI for a reverberation time of 2.5 seconds are lower than expected. Judging by the course of the regression line, an SI value of around 60% was to be expected.

4.2. Listening effort
Although a slight trend may be observed (mean listening effort increases with increasing RT, and higher listening effort is associated with lower logatome intelligibility), no statistical significance was found (p>0.05). One possible reason is that the 7-point scale is not sensitive enough to differences in RT values.
It is interesting to note that even though the STI falls below 0.6 for the four longest reverberation times (which means only a ‘fair’ quality rating according to IEC 60268-16), the listening effort (LE) scores are not significantly different from those obtained at lower RTs (with STI>0.6). Changing to a less favorable environment still does not significantly affect cognitive effort. Moreover, it is possible that the results would have been different if another way of evaluating listening effort had been used, e.g., an adaptive method, presenting logatome samples that differ in RT in an AB-type test, or asking for ratings of shorter, randomly mixed parts of the logatome lists. Even though measuring listening effort on a 7-point scale in such a range of RTs neither yields spectacular results nor reflects differences in the comfort of signal perception under different acoustic conditions (which, in general, are unambiguous), the authors believe that eliminating subjective effort assessment from the speech intelligibility test battery is not justified. However, redefining the study protocol in this respect should be the subject of further research.

There are also other limitations of the conducted experiment. Another potential problem is that in real communication the basic units are sentences [25, 26]. On the one hand, logatome tests have low redundancy and may be used multiple times. On the other hand, it could be more natural to measure intelligibility in different acoustic conditions using material similar to everyday speech (sentence tests). A further limitation is that very little literature is available for comparison, as no results of speech intelligibility measurements performed in AAES-based environments have been published.
Moreover, although speech intelligibility under different reverberant conditions has been studied in general, the results are not fully in agreement and it is difficult to draw unequivocal conclusions [18].

5. CONCLUSIONS
In this paper, the relationship between speech intelligibility expressed by the STI and by the SI (% of correctly understood logatomes) in various reverberation conditions was examined. The reverberation conditions used in the test were generated using AAES. 180 listeners with normal hearing took part in the test. For short reverberation times (shorter than 2.5 seconds), the values of the STI and SI are similar for rooms with and without AAES. For the tested rooms with reverberation longer than 2.5 seconds, neither the STI nor the SI values decreased as the reverberation time increased. The STI value is approximately 0.55 and the SI value is around 50%. Using AAES to generate short reverberation times results in speech intelligibility conditions similar to those in rooms with natural reverberation. Using AAES to generate long reverberation times (greater than 2.5 s) does not adversely affect the level of speech intelligibility and allows for better intelligibility than in rooms with long natural reverberation. At the same time, over the entire tested range of reverberation times there is a correlation between the lengthening of the reverberation time and the decrease in speech intelligibility. Consequently, a long reverberation time produced by AAES can be used creatively during theatrical, opera or other performances without adversely affecting the understanding of the content. Given the increasing popularity of AAES and the importance of speech intelligibility during performances, further research in this area should be carried out. It would also be interesting to compare the Dutch and Polish logatome tests under the same reverberation conditions.
Although no statistical significance was found, one can notice that the effort declared by the listeners increases slightly as the reverberation time increases. For the three shortest reverberation times, the declared effort is slightly lower than 3, and for all other reverberation times it is slightly higher than 3 (on a seven-level scale). This suggests that the effort needed to understand logatomes does not depend significantly on reverberation time.

6. REFERENCES
1. Houtgast, T., Steeneken, H.J.M. The MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. J. Acoust. Soc. Am., 77, 1069–1077 (1985)
2. Bradley, J.S., Sato, H., Picard, M. On the importance of early reflections for speech in rooms. Journal of the Acoustical Society of America, 113, 3233–3244 (2003)
3. Bakker, R., Gillian, S. The history of Active Acoustic Enhancement Systems. Proceedings of the Institute of Acoustics, 36(2), (2014)
4. Poletti, M.A. Active Acoustic Systems for the Control of Room Acoustics. Proceedings of the International Symposium on Room Acoustics, Melbourne, Australia, August 2010
5. Steeneken, H.J.M. On measuring and predicting speech intelligibility. Doctoral thesis, University of Amsterdam (1992)
6. Steeneken, H.J.M., Houtgast, T. Validation of the revised STI method. Speech Communication, 38, 413–425 (2002)
7. Sato, H., Morimoto, M., Wada, M. Relationship between listening difficulty and acoustical objective measures in reverberant sound fields. Journal of the Acoustical Society of America, 123(4), 2087–2093 (2008)
8. Brachmański, S., Staroniewicz, P. Phonetic structure of a test material used in subjective measurements of speech quality. Speech and Language Technology, 3, 71–80 (1999)
9. Danhauer, J., Doxle, P., Lucks, L. Effects of Noise on NST and NU 6 Stimuli. Ear & Hearing, 6, 266-269 (1985)
10. Stickney, G.S., Assmann, P.F. Acoustic and linguistic factors in the perception of bandpass-filtered speech.
J Acoust Soc Am, 109(3), 1157-1165 (2001)
11. Howard, D.M., Angus, J.A.S. Acoustics and Psychoacoustics, 5th Edition, Routledge, 2017
12. Ahnert, W., Tennhardt, H.P. Acoustics for Auditoriums and Concert Halls. In Ballou, G.M. (editor) Handbook for Sound Engineers, Focal Press, 2008
13. Kuttruff, M. Room Acoustics, 5th Ed., Spon Press, 2009
14. Houtgast, T., Steeneken, H.J.M. A multi-language evaluation of the RASTI-method for estimating speech intelligibility. Acustica, 54(4), 185-199 (1984)
15. Houtgast, T., Steeneken, H.J.M. The Modulation Transfer Function in Room Acoustics as a Predictor of Speech Intelligibility. The Journal of the Acoustical Society of America, 54, 557-557 (1973)
16. Cox, T.J., Davies, W.J., Lam, Y.W. The Sensitivity of Listeners to Early Sound Field Changes in Auditoria. Acta Acustica united with Acustica, 79(1), 27-41 (1993)
17. Marshall, L.G. An acoustic measurement program for evaluating auditoriums based on the early/late sound energy ratio. Journal of the Acoustical Society of America, 96(4), 2251–2261 (1994)
18. Kociński, J., Ozimek, E. Logatome and sentence recognition related to acoustic parameters of enclosures. Archives of Acoustics, PAN, 42(3), 385-394 (2017)
19. PN-EN-ISO3382, Acoustics – Measurement of room acoustic parameters – Part 1 and Part 2 (2010)
20. ANSI/ASA S3.1-1999, Maximum permissible ambient noise levels for audiometric test rooms. Washington, USA: American National Standards Institute (2008)
21. Report of the informal working group on prevention of deafness and hearing impairment programme planning, Geneva, 18-21 June 1991. Geneva: World Health Organization; 1991. Available from: http://www.who.int/iris/handle/10665/58839 (accessed: 27.04.2022)
22. Doyle, K.J., Danhauer, J.L., Edgerton, B.J. Features from Normal and Sensorineural Listeners’ Nonsense Syllable Test Errors. Ear & Hearing, 2(3), 117-121 (1981)
23. Butts, F.M., Ruth, R.R., Schoeny, Z.G. Nonsense Syllable Test (NST) Results and Hearing Loss.
Ear & Hearing, 8(1), 44-48 (1987)
24. Dillon, C., Pisoni, D.B., Cleary, M., Carter, A.K. Nonword imitation by children with cochlear implants: consonant analyses. Arch Otolaryngol Head Neck Surg, 130(5), 587-591 (2004)
25. Ozimek, E., Kutzner, D., Sęk, A.P., Wicher, A., Szczepaniak, O. The Polish sentence test for speech intelligibility evaluations measurements. Archives of Acoustics, 31(4), 431–438 (2006)
26. Plomp, R., Mimpen, A.M. Improving the reliability of testing the speech reception threshold for sentences. Audiology, 18, 43–53 (1979)