A virtual reality tool to aid in soundscapes in the built environment (SiBE) through machine learning

Semiha Yilmazer (syilmaze@purdue.edu)
Ray W. Herrick Laboratories, Purdue University, 177 S. Russell Street, West Lafayette, IN 47907, USA

Patricia Davies (daviesp@purdue.edu)
Ray W. Herrick Laboratories, Purdue University, 177 S. Russell Street, West Lafayette, IN 47907, USA

Cengiz Yilmazer (censem64@hotmail.com)
CSY R&D and Architecture Engineering, Bilkent, Ankara, 06800, Turkey

ABSTRACT
Can we use virtual-reality tools to design audio-visual elements and optimize them for the user's benefit? A sound simulation algorithm was developed in MATLAB to extract the room impulse response of the test environment in a virtual environment (IRV) and compare it with the real environment (IRR). In the developed program, the image source model (ISM) was used to perform real-time acoustic source generation and real-time signal recording, image functions, data processing, and metric calculation. For the audio and psychoacoustic analysis, a nature sound (Bird) and a human-generated sound (Speech) were used as the original sound signals. A B&K 4292 Omni-power loudspeaker, a B&K 2716 power amplifier, and a B&K 2250 hand-held analyzer were used to measure the impulse response in the real environment. In addition to the impulse response, psychoacoustic metrics such as loudness, sharpness, roughness, and fluctuation strength were calculated in both environments. The results showed that the sound simulation, based on the image source model, validates the virtual environment and can be used to characterize virtual reality development environments.

1. INTRODUCTION
Manipulation has been used for centuries to influence, direct, convince, and even trick people. Architectural space accommodates many elements that affect the space's visual perception, such as form, color, material, and light. Naturally, people can be impressed by a created space through its manipulation. Dimensional, thermal, and auditory perception also affect our spatial perception. Can people's perceptions of spaces be manipulated by changing acoustic stimuli? Can we use the results of such manipulations to design better spaces?

The use of virtual reality aims to allow users to manipulate and experience an environment before it is physically modified or built. Would they change the virtual environment in the same way they would wish to alter the environment in the real world? Virtual reality is an immersive world in which people can explore and interact with a world created using three-dimensional computer technologies. Can we use virtual-reality tools to design audio-visual elements and optimize them for the user's benefit? Can we imagine a system to create a virtual environment where we could overlap more than one sound object, or remove the sounds that we do not wish to hear, even when we are exposed to them by the environment surrounding us? To answer these questions, we can turn to acoustics in architectural design, which provides aspects such as sonic-environment manipulation, visual-audio interactions, and the characterization and analysis of the space.
Within sound quality assessment, psychoacoustic metrics such as loudness, roughness, sharpness, and fluctuation strength are used to help optimize the sounds within an environment and make them more pleasing [1]. This study enables the acoustic characterization of a virtual reality development tool for measuring auditory perception based on user behavior. It aims to collect acoustic measurements from the real environment, create the virtual environment using the developed sound simulation program, and compare the results to validate the new virtual environment tool.

2. METHOD

2.1. Settings
For the audio and psychoacoustic analysis, we used a nature sound (Bird) and a human-generated sound (Speech) as the original sound signals. Both files have a sampling rate of 44100 Hz. The duration of the bird signal is 2.45 s, while that of the speech signal is 9.94 s.

Figure 1: The spectrograms of the original sound signals: (a) Bird and (b) Speech (F).

Figure 1 shows the spectrograms of the original signals used in the experimental setup. As seen in Figure 1 (a), the bird sound runs from left to right, with the high-pitched notes near the top. It consists of short phrases, separated by pauses approximately one-third of a second long; the phrases repeat, except for the first ones. Like the bird spectrogram, the female speech spectrogram on the right shows how the Fourier transform of the signal varies with time. The magnitude of the frequency components is represented by colors running from cold to warm: the warmer the color, the higher the magnitude. The color-mapped spectrogram is of a female voice saying, "One of the universal rules of happiness is, always be wary of any helpful item that weighs less than its operating manual." Voicing runs from left to right, and the lines represent bands of energy at very low frequencies, corresponding to the energy in the first and second harmonics. Figure 1 (b) shows that the pauses correspond to non-voicing, the silences between the vocalizations. Each vertical line corresponds to a single pulse of the vocal folds, a single puff of air moving through the glottis. The bird sound has its greatest energy at high-frequency components, from 5 kHz to 10 kHz, while the speech energy is concentrated between 125 Hz and 1 kHz. Some voiced fricatives occasionally raise the magnitude of the phrases up toward 10 kHz, e.g., "alwa-ys" around 4 s and "le-ss th-an" around 8 s; these segments show aspects of regular vocal-fold vibration. Thus, [s] has a higher average frequency than [l], and both [s] and [t] are higher than the other sounds. These bird and speech signals were selected to cover the range of sounds occurring in the built environment and to ensure a correct conversion and simulation into the virtual environment.
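Spectrograms such as those in Figure 1 can be produced with a few lines of MATLAB. The snippet below is a minimal sketch; the file names, window length, and overlap are illustrative assumptions, not the exact settings used for Figure 1.

```matlab
% Minimal spectrogram sketch for the two stimuli (cf. Figure 1).
% File names, window length, and overlap are illustrative assumptions.
[bird,   fsB] = audioread('bird.wav');      % 44.1 kHz stimulus, 2.45 s
[speech, fsS] = audioread('speech_f.wav');  % 44.1 kHz stimulus, 9.94 s

win  = hann(1024);   % ~23 ms analysis window at 44.1 kHz
nOvl = 768;          % 75 % overlap
nFft = 2048;

figure;
subplot(1,2,1);
spectrogram(bird(:,1), win, nOvl, nFft, fsB, 'yaxis');
title('(a) Bird');
subplot(1,2,2);
spectrogram(speech(:,1), win, nOvl, nFft, fsS, 'yaxis');
title('(b) Speech (F)');
```

A shorter analysis window would trade frequency resolution for a clearer view of the individual vocal-fold pulses discussed above.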
2.2. Ray W. Herrick Laboratories (HLAB) Conference Room
In the sound simulation we plan to study working spaces such as offices and conference rooms. We selected a conference room at the Ray W. Herrick Laboratories (HLAB) as a case study; this room was chosen for its shape, form, and materials.

Figure 2: Yellow rendered areas show the location of the HLAB conference room: (top) plan of the conference room; (bottom) section of the building. The conference room is behind the two shaded offices, across from a corridor.

The HLAB conference room dimensions are 1130 x 630 x 300 (height), with a volume of 193 m³. The room is on the second floor of the building (Figure 2), between the staircase and the offices, and all of its walls are interior walls. The building materials and finishes are carpet (floor), gypsum wall panels (two walls), heavy glass (two walls), and suspended acoustical panels (ceiling). All measurements were taken after 5 pm, when background sound was low. The space nevertheless has continuous background noise (Leq: 38 dB) from the HVAC system.

Following ISO 3382-1:2009 [2], a B&K (Type 4292) standard dodecahedron Omni-power sound source driven by a B&K (Type 2716) power amplifier was used to generate the acoustic signal. The impulse responses at the various measurement points were captured by a B&K (Type 4189) microphone incorporated into a B&K (Type 2250) hand-held analyzer (Figure 3). The sampling frequency of the recorded multi-spectrum impulse was 48000 Hz, and the frequency interval of interest was 20 Hz to 18 kHz. A computer was used to generate the sweep signals; since the sweep provides the highest peak-to-noise ratio (PNR), the results from these measurements were used in post-processing. Up to five pre-averages were applied over multiple measurements, with an impulse response length of 1.5-2 s.

Figure 3: Typical configuration for building acoustics measurements: the sound source, generator, analyzer, and PC for reporting (left); impulse response measurement (right).

The Ray W. Herrick Acoustics Laboratory at Purdue University provided the full-scale measurement equipment.

2.3. Test Stimuli
We selected two source positions, S1 in the middle of the room and S2 at a corner. Four receiver locations, R1 to R4, were placed in the corners, one meter away from the associated walls (Figure 4). With the bird and speech sound signals as independent variables, 16 alternatives were created for analysis (e.g., B_S1R1: bird sound at source S1 and receiver location R1; S_S2R2: speech sound at source S2 and receiver location R2).

Figure 4: Three-dimensional view from SketchUp (left) and source (red dot) and receiver (blue dot) locations in the room sound simulation (right).

The source and receiver locations were selected not only to reflect positions of significance in the different uses of the room but also to examine possible multi-rate decay patterns. Thus, the first source position (S1) represents a typical talker position, while the others were chosen to investigate different decay patterns.

2.4. Sound Simulation Program and Psychoacoustic Metrics
The sound simulation program generates the impulse response of the enclosed shoebox-shaped room for given source and receiver locations using the image source model (ISM), and uses this impulse response for auralization (Figure 5). The inputs to the program, written in MATLAB, are the room dimensions, the absorption coefficients of the materials, the source location(s), the receiver location(s), the number of loops, and a gain for the impulse response. The sampling frequency is 44100 Hz.
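The core of such a program can be illustrated with a compact image source model. The sketch below is a minimal, frequency-independent version of the Allen-Berkley image method for a shoebox room (a single broadband absorption coefficient per surface, integer-sample delays, no air absorption). It is an illustration of the technique, not the authors' implementation; the "number of loops" input of the program presumably corresponds to the maximum image order used here.

```matlab
% Minimal image-source-model sketch for a shoebox room (frequency-
% independent absorption, integer-sample delays, no air absorption).
% This is an illustration of the technique, not the authors' code.
function h = ism_shoebox(L, alpha, src, rcv, fs, c, maxOrder)
% L        : room dimensions [Lx Ly Lz] in m
% alpha    : 3x2 absorption coefficients; alpha(d,1) is the wall at 0,
%            alpha(d,2) the wall at L(d), for axis d = x, y, z
% src, rcv : source / receiver positions [x y z] in m
% fs       : sampling rate in Hz;  c : speed of sound in m/s
% maxOrder : maximum image index per axis ("number of loops")

beta  = sqrt(1 - alpha);                        % pressure reflection coefficients
nTaps = ceil((2*maxOrder + 2)*norm(L)/c*fs) + 1;
h     = zeros(nTaps, 1);

for qx = 0:1
 for qy = 0:1
  for qz = 0:1
   for mx = -maxOrder:maxOrder
    for my = -maxOrder:maxOrder
     for mz = -maxOrder:maxOrder
       % image source position (Allen & Berkley formulation)
       img = [(1-2*qx)*src(1) + 2*mx*L(1), ...
              (1-2*qy)*src(2) + 2*my*L(2), ...
              (1-2*qz)*src(3) + 2*mz*L(3)];
       d = norm(img - rcv);
       % accumulated wall reflections along each axis
       g = beta(1,1)^abs(mx-qx) * beta(1,2)^abs(mx) * ...
           beta(2,1)^abs(my-qy) * beta(2,2)^abs(my) * ...
           beta(3,1)^abs(mz-qz) * beta(3,2)^abs(mz);
       n = round(d/c*fs) + 1;                   % arrival sample index
       if n <= nTaps
           h(n) = h(n) + g/(4*pi*max(d, 0.1));  % spherical spreading
       end
     end
    end
   end
  end
 end
end
end
```

Calling this with the room dimensions, per-surface absorption coefficients, and the source/receiver positions of Sections 2.2 and 2.3 yields an impulse response comparable in structure to the simulated IRV in Figure 5; a frequency-dependent implementation would instead build the response per octave band and sum band-filtered contributions.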
When correctly computed and combined, psychoacoustic metrics (loudness, sharpness, roughness, and fluctuation strength) usually go a long way in predicting the adverse reactions, such as annoyance, and sometimes the positive sensations (e.g., pleasantness, eventfulness) produced by environmental sounds. Sound files were analyzed with the developed virtual-reality tool (Figure 6), and psychoacoustic metrics were extracted. Loudness was calculated with the method for non-stationary sources described in ISO 532-1 [3]. Sharpness was calculated following the method described in DIN 45631/A1 [4]. For N5 (the loudness exceeded 5% of the time), the 95th percentile of the time-varying loudness was calculated (Table 1).

Figure 5: An example of the impulse responses from the S1 source and R1 receiver location: (left) in the real conference room (IRR) and (right) in the virtual conference room (IRV).

Real-time acoustic source generation, real-time signal recording, image functions, acoustic data processing, and metric calculation (psychoacoustic metrics, spectrograms, etc.) were performed in MATLAB R2021b.

Figure 6: Loudness time histories: (a) Bird signal and (b) Speech signal.

Table 1: Metrics for the Bird and Speech (F) stimuli: loudness (N, sone), sharpness (S, acum), roughness (R, asper), fluctuation strength (F, vacil), loudness exceeded 5% of the time (N5, sone), and A-weighted sound pressure level (LeqA, dBA).

             N (sone)  S (acum)  R (asper)  F (vacil)  N5 (sone)  LeqA (dBA)
Bird           63.64     4.33      0.02       1.07       57.62      82.94
Speech (F)     40.13     1.60      0.08       0.24       68.79      75.07

3. RESULTS
Psychoacoustic studies show that spectral manipulation of the indoor environment in which sound propagates influences the listener's perception. To observe the effects of spectral manipulation, it is necessary to use sounds spanning multiple octaves (24 Bark).

The original bird sound signal has high-frequency components, with harmonic sounds and pitches at high frequencies (Figure 1). In Figure 7 (a), however, the pattern is deformed in the real environment, as if there were frequency-dependent absorption; the spectral character of the original bird sound has deteriorated in the space. Figure 7 (b) shows that a similar deterioration is created in the virtual environment. Moreover, different forms and characters can be obtained.

In the conference room, the floor and ceiling materials used in the real environment, carpet and absorptive suspended ceiling panels, are highly absorbent, with absorption coefficients (alpha) of 0.37 and 0.90, respectively. Since both environments should have the same characteristics during the perceptual experiments, a filter function is used in the sound simulation program to ensure that the sound energy decay curves match.

Figure 7: Spectrograms: (a) Bird sound in the conference room and (b) Bird sound in the virtual conference room.

Figure 8: Convolved bird sound with the impulse response: (a) in the real environment (IRR) and (b) in the virtual environment (IRV).
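The convolved signals behind Figures 8 and 10 are obtained by convolving the dry stimulus with the measured (IRR) or simulated (IRV) impulse response. The sketch below illustrates this step; the file names and the simple peak normalization are illustrative assumptions (the simulation program instead applies a gain and a filter function to match the measured energy decay).

```matlab
% Auralization sketch (cf. Figures 8 and 10): convolve the dry stimulus
% with the measured (IRR) and simulated (IRV) impulse responses.
% File names are illustrative assumptions.
[dry, fs]  = audioread('bird.wav');        % dry stimulus, 44.1 kHz
[irr, fsR] = audioread('IRR_S1R1.wav');    % measured impulse response (48 kHz)
[irv, fsV] = audioread('IRV_S1R1.wav');    % simulated impulse response (44.1 kHz)

irr = resample(irr(:,1), fs, fsR);         % bring the measured IR to the stimulus rate
irv = resample(irv(:,1), fs, fsV);

wetR = conv(dry(:,1), irr);                % convolved sound with IRR
wetV = conv(dry(:,1), irv);                % convolved sound with IRV

% Simple peak normalization for listening; not the gain/filter matching
% step used in the simulation program.
wetR = wetR / max(abs(wetR));
wetV = wetV / max(abs(wetV));

audiowrite('Conv_IRR_S1R1_Bird.wav', wetR, fs);
audiowrite('Conv_IRV_S1R1_Bird.wav', wetV, fs);
```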
Speech sounds mainly contain low-frequency components, as seen in the spectrograms in Figure 9. They are therefore mostly affected by the components in the frequency ranges where reflection is strong, according to the spectrum of the impulse response. Consequently, the virtual environment (VE) simulation handles them more readily, because their general character is governed by reflection.

Figure 9: Spectrograms: (a) Speech sound in the conference room and (b) Speech sound in the virtual conference room.

Figure 10: Convolved speech sound with the impulse response: (a) in the real environment (IRR) and (b) in the virtual environment (IRV).

Table 2: Metrics for the Bird and Speech (F) stimuli: loudness (N, sone), sharpness (S, acum), roughness (R, asper), fluctuation strength (F, vacil), loudness exceeded 5% of the time (N5, sone), and A-weighted sound pressure level (LeqA, dBA). RE: real environment; VE: virtual environment. B_S1R1: bird sound signal at source S1 and receiver location R1; S_S1R1: speech sound signal at source S1 and receiver location R1.

        B_S1R1        B_S1R2        B_S1R3        B_S1R4        B_S2R1        B_S2R2        B_S2R3        B_S2R4
        RE     VE     RE     VE     RE     VE     RE     VE     RE     VE     RE     VE     RE     VE     RE     VE
N      12.51  11.89  12.20  11.86  12.20  12.23  11.84  12.29  10.96   8.27  13.34  14.82   9.32   9.28   9.18   9.79
S       3.68   3.14   3.91   3.16   3.83   3.14   3.86   3.14   3.91   3.19   3.89   3.18   3.80   3.19   3.73   3.17
R       0.06   0.08   0.08   0.08   0.05   0.08   0.07   0.08   0.06   0.05   0.07   0.08   0.06   0.03   0.06   0.08
F       0.01   0.51   0.01   0.49   0.01   0.51   0.01   0.52   0.02   0.46   0.01   0.01   0.01   0.46   0.01   0.41
N5     20.39  20.35  21.13  20.04  20.18  20.91  20.76  21.19  19.50  14.16  22.30  25.32  15.44  15.90  15.20  16.75
LeqA   63.72  63.39  64.37  63.21  63.55  63.89  64.43  64.14  63.21  58     65.25  67.03  59.49  60.18  59.12  60.75

        S_S1R1        S_S1R2        S_S1R3        S_S1R4        S_S2R1        S_S2R2        S_S2R3        S_S2R4
        RE     VE     RE     VE     RE     VE     RE     VE     RE     VE     RE     VE     RE     VE     RE     VE
N      25.19  21.65  25.24  21.62  26.27  22.52  25.45  22.57  23.83  21.64  26.91  21.97  20.58  17.90  22.33  17.72
S       1.04   1.10   1.10   1.10   1.08   1.10   1.08   1.10   1.07   1.10   1.11   1.09   0.99   1.13   0.92   1.13
R       0.04   0.05   0.04   0.04   0.04   0.05   0.04   0.05   0.04   0.05   0.05   0.04   0.04   0.04   0.04   0.04
F       0.22   0.22   0.22   0.21   0.22   0.21   0.22   0.22   0.22   0.22   0.23   0.21   0.22   0.20   0.21   0.21
N5     45.24  40.90  42.34  40.04  45.05  42.02  45.24  43.22  40.38  41.06  47.78  43.44  37.27  34.81  35.42  34.38
LeqA   69.06  63.44  67.95  63.40  68.90  64.18  67.78  64.22  67.22  63.46  69.88  64.02  65.31  61.87  66.59  62.04
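The single-number values in Tables 1 and 2 are reductions of time histories: N5 is the loudness exceeded 5% of the time (the 95th percentile of the time-varying loudness), and LeqA is the equivalent level of the A-weighted pressure. The sketch below indicates how such numbers are formed; it assumes a signal calibrated in pascals and the availability of MATLAB Audio Toolbox functions (acousticLoudness for ISO 532-1 loudness and weightingFilter for A-weighting), whereas the developed tool implements the standards directly. The file name is illustrative.

```matlab
% Sketch of the N5 and LeqA reductions reported in Tables 1 and 2.
% Assumes a signal calibrated in Pa and the Audio Toolbox functions
% acousticLoudness / weightingFilter (an assumption, not the authors' code).
[p, fs] = audioread('Conv_IRR_S1R1_Bird.wav');       % illustrative file name
p = p(:,1);                                          % assumed calibrated to Pa

% Time-varying loudness (ISO 532-1, Zwicker method) and N5
Nt = acousticLoudness(p, fs, 'TimeVarying', true);   % loudness in sone versus time
NtSorted = sort(Nt, 'descend');
N5 = NtSorted(max(1, round(0.05*numel(Nt))));        % value exceeded 5% of the time

% A-weighted equivalent sound pressure level LeqA
aw   = weightingFilter('A-weighting', 'SampleRate', fs);
pA   = aw(p);                                        % A-weighted pressure signal
p0   = 20e-6;                                        % reference pressure, Pa
LeqA = 10*log10(mean(pA.^2)/p0^2);

fprintf('N5 = %.2f sone, LeqA = %.2f dBA\n', N5, LeqA);
```

Sharpness, roughness, and fluctuation strength follow analogously from their respective definitions (sharpness per DIN 45631/A1, as noted in Section 2.4).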
Sound propagation in an enclosed space changes with the locations of the source and receiver, and depends on the surface materials and on frequency. As seen in Table 2, the similarity of the fading/sound attenuation shows that the receiver-transmitter simulation is quite accurate. For example, N5 is RE: 21.13 sone and VE: 20.04 sone in B_S1R2, while it is RE: 20.18 sone and VE: 20.91 sone in B_S1R3; the R1 and R2 receivers are at the same distance from the S1 source (B: bird sound; S1: source in the middle). Likewise, N5 is RE: 22.30 sone and VE: 25.32 sone in B_S2R2, while it is RE: 15.44 sone and VE: 15.90 sone in B_S2R3; the R2 receiver is 4.60 m away from the S2 source, while the R3 receiver is 9.30 m away (B: bird sound; S2: source at the corner).

The RE and VE values show the same relative differences in loudness and LeqA (dBA) between locations, but the loudness values of RE and VE are not identical even at exactly the same location. For example, in B_S2R2, N5 is 22.30 sone in RE and 25.32 sone in VE. This deviation is most likely due to the absorption values of the materials used in the virtual environment (e.g., the ceiling alpha value is taken as 0.90, the value given for the absorptive suspended panel); they have not yet been transferred to the simulation with a perfect match and accuracy.

The higher-frequency content of the bird sound is reflected in the sharpness values, S [acum], in Table 2 when compared with the speech sound: sharpness is around 3.50 for the bird sound and around 1.10 for the speech sound. The higher fluctuation strength, F [vacil], of the speech sound is consistent with its greater temporal changes in loudness (e.g., F is 0.22 for the speech sound and 0.01 for the bird sound).

4. CONCLUSION
In this study, an initial validation of virtual environment tools for designing audio-visual elements has been obtained. It has been shown that psychoacoustic parameters, which are to be optimized for the user's benefit, can be used through sound manipulation. The results showed that the sound simulation, based on the image source model (ISM), validates the virtual environment and can be used for the characterization of virtual reality development environments.

Although only a limited number of psychoacoustic metrics were used in this preliminary study, we believe that a successful virtual-reality representation of indoor acoustic environments has been achieved at this phase. The developed virtual reality tool is not yet perfect and is still under development. The furniture in the room is not included in the simulation and is a known additional cause of deviation between the results of the real and virtual environments.

Even though some parts of the simulation program are still lacking, the analysis results have shown that the psychoacoustic parameters are quite close to correct and can be manipulated, specifically and powerfully, in the room-impulse-response-based simulation environment.

It has been shown that, through audio-visual manipulation, we can imagine a system to create a virtual environment where we could overlap more than one sound object or remove the sounds that we do not prefer to hear, even when we are exposed to them due to the environment surrounding us. We plan to perfect the simulation environment and initiate the subject evaluation study in our future work.

5. ACKNOWLEDGEMENTS
The Scientific and Technological Research Council of Turkey (TÜBİTAK) funded this research project (Research No: B191900072); their contribution is gratefully acknowledged.

6. REFERENCES
1. Zwicker, E. and Fastl, H. Psychoacoustics: Facts and Models, 3rd Edition, Springer, 2006.
2. ISO 3382-1:2009, Acoustics — Measurement of Reverberation Time of Rooms with Reference to other Acoustical Parameters, International Organization for Standardization, Geneva, Switzerland, 2009.
3. ISO 532-1, Acoustics — Methods for calculating loudness — Part 1: Zwicker method, International Organization for Standardization, Geneva, Switzerland, 2017.
4. DIN 45631/A1, Calculation of loudness level and loudness from the sound spectrum – Zwicker method – Amendment 1: Calculation of the loudness of time-variant sound, Beuth Verlag, 2010.