
Effect of spatial masking release on perception of vehicle's approaching sound in headphone music listening

Reika TAKAKURA 1
Graduate School of Music, Kunitachi College of Music, Japan
5-5-1, Kashiwa-cho, Tachikawa-city, Tokyo, 190-8520

Masanobu MIURA 2
Kunitachi College of Music, Japan
5-5-1, Kashiwa-cho, Tachikawa-city, Tokyo, 190-8520

ABSTRACT

Listening to music through headphones or earphones masks auditory information necessary for safety, so pedestrians who do so are at high risk of being involved in a severe accident. To improve this situation, the authors aim to realize a listening environment in which pedestrians can enjoy music through headphones while still perceiving ambient sounds. As a step toward this goal, the authors investigate the effect of the localization of music played through headphones on the perceptibility of a vehicle's approaching sound. The approaching sounds of several types of vehicles are presented to listeners, who are asked to respond when they perceive the approaching vehicle, and the time of the response is measured. The results showed that when a car approached from the left rear at a speed of 30 km/h and passed by on the left side, the listeners noticed the approaching sound 5.5 m to 10.4 m earlier when the music played from the headphones was localized at the vertex than when the original music was played. When a motorcycle approached from the right rear at 30 km/h and passed by on the right side, the listeners noticed the approaching sound 6.7 m to 20.1 m earlier under the same comparison.

1. INTRODUCTION

When pedestrians listen to music through headphones outdoors, the music can obstruct surrounding sounds, such as those of approaching cars and motorcycles, and thereby lead to serious accidents. Auditory masking by the music is thought to make it difficult to perceive approaching sounds that would be audible if no music were being played through the headphones. One way to avoid this situation is to lower the sound pressure level of the music so that auditory information about the surroundings can be obtained. However, people tend to listen to music at 70 dB or higher in noisy real-world environments [1], which makes it difficult to recognize environmental sounds. Earphones that take in external sound and bone-conduction earphones have become popular in recent years, but the former are expected to suffer from problems such as noise generated in the mechanism that takes in external sound, and the latter do not use conventional air-conducted sound, which causes discomfort to the user. Therefore, the ultimate goal of this research is to realize a new listening environment in which pedestrians can perceive surrounding sounds while listening to music using conventional headphones.

One of the clues to this goal is the phenomenon known as spatial release from masking (SRM) [2]. The conditions under which SRM occurs in a natural environment indicate the best positional relationship between the target sound and the noise [3]. According to the literature, for a vehicle approaching from the rear, the best position for the music is interpreted as being near the vertex, which may help the listener avoid dangerous situations. Therefore, in this report, we investigated the effect of music localization on the perceptibility of vehicle approaching sounds using the approaching sounds of cars and motorcycles.

1 takakura.reika.ox9@st.kunitachi.ac.jp
2 miura.masanobu@kunitachi.ac.jp

2. INVESTIGATION OF THE OPTIMAL HEAD-RELATED TRANSFER FUNCTION FOR EACH INDIVIDUAL

2.1. Aims

According to a survey on music localization, it is necessary to provide each individual with an optimal head-related transfer function (HRTF) in consideration of individual characteristics in order to optimize the localization perception [4]. Furthermore, by using the HRTF of each individual, it is possible to propose a safe listening environment tailored to the individual. Therefore, the purpose of this report is to determine the HRTF closest to each listener from multiple HRTFs using DOMISO (see 2.2), which has been proposed as a method that can obtain high localization accuracy in a short time with little burden on the listener.

2.2. Overview of DOMISO and determination of HRTFs

DOMISO is a method that creates virtual sound images using multiple HRTFs and selects, by a tournament method, the sound image that is perceived as best localized [4]. The HRTFs used in this report are the data of 105 people measured at the Research Institute of Electrical Communication, Tohoku University [5]. The DOMISO procedure used in this report is as follows.

(1) Randomly select 32 HRTFs from the 105 HRTFs.
(2) Using the 32 HRTFs, create 32 sound images that move discretely through the horizontal plane (0°, 90°, 180°, 270°) and an elevation angle of 90°, a trajectory specific to this report. The trajectory of the sound images is shown in Figure 1. Pink noise (2 s, 16 bit, 48 kHz sampling) is convolved with the HRTFs to synthesize the sound images.
(3) Ten students (S1-S10) of the university are taught in advance the trajectory of the sound image to be presented.
(4) Two of the virtual sound images synthesized from the 32 HRTFs are presented to each listener, and the listener is asked to select the one whose localization better matches the sound image trajectory in (3).
(5) This selection is performed in a tournament fashion, and (4) is repeated until the sound image with the best localization wins. The HRTF used to synthesize the winning sound image is judged to be the HRTF closest to that of the listener.
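The tournament in steps (1), (4), and (5) can be sketched as follows. This is a minimal illustration rather than the DOMISO implementation itself: the function name, the pairing of neighbouring entrants, and the `prefer` callback (which stands in for presenting two sound images to a listener and collecting the choice) are assumptions made for this example.

```python
import random


def select_hrtf_by_tournament(candidate_ids, prefer, n_entrants=32, seed=None):
    """Single-elimination tournament over randomly drawn HRTF candidates.

    prefer(a, b) is a hypothetical callback: it would present the two sound
    images to the listener and return the ID judged to be better localized.
    """
    rng = random.Random(seed)
    entrants = rng.sample(list(candidate_ids), n_entrants)  # step (1)
    n_comparisons = 0
    while len(entrants) > 1:
        next_round = []
        # Pair neighbouring entrants; each pair is one comparison, step (4).
        for a, b in zip(entrants[0::2], entrants[1::2]):
            next_round.append(prefer(a, b))
            n_comparisons += 1
        if len(entrants) % 2 == 1:
            next_round.append(entrants[-1])  # odd entrant advances unopposed
        entrants = next_round  # step (5): repeat until one entrant remains
    return entrants[0], n_comparisons


# Stand-in listener for illustration only: always prefers the lower ID.
winner, n_comparisons = select_hrtf_by_tournament(range(105), prefer=min, seed=0)
```

With 32 entrants the tournament needs 16 + 8 + 4 + 2 + 1 = 31 paired comparisons per listener, which is consistent with the low listener burden described for the method.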

Fig.1: Trajectories of sound images used in the localization experiment. (The sound images are presented in the order of 1st to 5th, with the front of the listener at 0°.)

3. SURVEY ON THE PERCEPTION OF AUTOMOBILE APPROACHING SOUND WHILE LISTENING TO MUSIC

3.1. Aims

The purpose of this study is to investigate how the localization of music affects the perceptibility of the sound of an approaching automobile.

3.2. Overview

We investigated the perceptibility of the sound of an approaching car while listening to music through headphones. The music was localized using the HRTF and presented from each of five positions: 0°, 90°, 180°, and 270° in the horizontal plane, and the vertex of the head. At the same time, the sound of a car approaching from the left rear and passing by on the left side (90° in the horizontal plane) was presented.

3.3. Approaching sound of a car

Three types of vehicles were used in this study: one hybrid vehicle, one gasoline engine vehicle, and one diesel engine vehicle. Table 1 shows the vehicle types used. The automobiles were driven at 30 km/h, and their approaching sounds were recorded at a sampling frequency of 44.1 kHz with 24-bit quantization using a SAMREC dummy head microphone recorder (Type 2700 Series, Southern Acoustics).

Table 1 List of vehicles used

ID   Car name                 Engine type
C1   TOYOTA YARiS             Hybrid
C2   TOYOTA COROLLA FIELDER   Gasoline
C3   TOYOTA HIACE             Diesel

3.4. Experimental stimuli

The music used in the experiment consisted of excerpts from the chorus (15 s long) of two different J-Pop songs (M1, M2). The original stereo signal of each piece was mixed down to a monaural signal, which was then convolved with the stereo HRTFs of each participant obtained in Section 2 for 0°, 90°, 180°, and 270° in the horizontal plane and for the vertex. The original sound source was also included in the experimental stimuli in order to compare the music localized at the five positions with the non-localized sound source. The total number of stimuli used in the experiment was 6 localization conditions (5 sound images + 1 original sound source) × 2 music pieces × 3 approaching car sounds = 36 stimuli.
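The stimulus synthesis described above can be illustrated with a short sketch. Several details are assumptions made for this example and are not stated in the report: the mono signal is taken as the average of the two channels, the HRTF is applied in the time domain as a pair of head-related impulse responses, and the impulse responses shown are toy values rather than measured data.

```python
from itertools import product


def downmix_to_mono(left, right):
    """Average the two stereo channels (assumed downmix rule)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def convolve(signal, impulse_response):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def localize(left, right, hrir_left, hrir_right):
    """Return the (left ear, right ear) signals for one localization position."""
    mono = downmix_to_mono(left, right)
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (not measured HRIRs): the right ear receives the
# sound one sample later and at half the amplitude.
toy_hrir_left = [1.0, 0.0, 0.0]
toy_hrir_right = [0.0, 0.5, 0.0]
ear_left, ear_right = localize([1.0, 2.0], [3.0, 4.0], toy_hrir_left, toy_hrir_right)

# Enumerate the stimulus set: 6 localization conditions x 2 pieces x 3 cars.
conditions = ["Original", "0", "90", "180", "270", "Vertex"]
stimuli = list(product(conditions, ["M1", "M2"], ["C1", "C2", "C3"]))
```

For signals of realistic length, an FFT-based convolution would normally replace the double loop; the direct form is used here only to keep the example self-contained.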

3.5. Experimental Method

Subjects: Five students (S1-S5) from the same university as in 2.2 were asked to listen to the stimuli played on a PC while wearing headphones in a sitting position.

Stimulus playback method: The experimental stimuli were played using Audacity, which was displayed on the left side of the PC screen. Since pedestrians walking along a road are unlikely to concentrate on the sound of an approaching car while listening to music, a video of walking along a sidewalk was displayed on the right side of the PC screen to distract the participants' attention, and the participants were given the simple task of silently counting the cars and bicycles passing by in the video. The auditory stimuli and the walking video were played back simultaneously. The participants were instructed to pause playback by pressing the P key when they heard the sound of an approaching car, and the experimenter recorded the time displayed on the screen.

Stimulus playback level: The sound pressure level of the music was adjusted by loudness matching so that the music heard through the headphones and a 1 kHz pure tone heard through the loudspeakers were equal in loudness. The level of the 1 kHz pure tone was set at L_A = 75 dB. For the approaching sound of a car, the reference level was set 6 dB below the peak sound pressure level of the approaching sound at the time of recording, and the approaching sound played back from the headphones was adjusted by loudness matching, in the same way as described above, so that its peak was equal in loudness to that reference. The resulting sound pressure levels were L_A = 47 dB for C1, L_A = 55 dB for C2, and L_A = 64 dB for C3.
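As a reminder of what these level differences mean in terms of signal amplitude, the following sketch converts a level change in decibels to a sound pressure ratio. It is a generic worked example, not the calibration procedure used in the experiment, and the function name is chosen for illustration.

```python
def db_to_pressure_ratio(level_change_db):
    """Sound pressure (amplitude) ratio corresponding to a level change in dB."""
    return 10.0 ** (level_change_db / 20.0)


# Lowering a signal by 6 dB roughly halves its amplitude.
gain_minus_6_db = db_to_pressure_ratio(-6.0)

# The loudest car (C3, 64 dB) exceeds the quietest (C1, 47 dB) by 17 dB,
# i.e. about a sevenfold difference in sound pressure.
ratio_c3_to_c1 = db_to_pressure_ratio(64.0 - 47.0)
```

The same conversion shows why the hybrid vehicle C1 is the hardest case: its peak level lies 28 dB below the 75 dB reference used for the music.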

3.6. Results

3.6.1. Method of obtaining traveling position

Using the video taken while the approaching sound was being recorded, the distance corresponding to the time at which the subject noticed the approaching sound was calculated. Since the dummy head recording and the video recording were not synchronized, they first had to be aligned. To do so, we extracted the audio track from the recorded video and aligned its start time with that of the dummy head recording by using the cross-correlation between the two signals. Then, using cones set up at 10 m intervals as landmarks in the video, the times at which the vehicle passed each point from 0 m to 60 m were recorded. A curve representing the position of the traveling vehicle was obtained by spline interpolation, with distance on the vertical axis and the passing time of the car or motorcycle on the horizontal axis. From this curve, the distance corresponding to the time at which the participants noticed the approaching sound was calculated.
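The two steps described above, alignment by cross-correlation and interpolation of position against time, can be sketched as follows. The report does not state which spline was used, so a natural cubic spline is assumed here; the function names and the cone passing times are illustrative values, not measured data.

```python
import bisect


def best_lag(reference, other, max_lag):
    """Lag (in samples) at which other[n + lag] best matches reference[n]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(other):
                score += r * other[m]  # unnormalized cross-correlation
        if score > best_score:
            best, best_score = lag, score
    return best


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the second-order coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] = c[n] = 0 is the natural boundary condition.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        j = max(0, min(n - 1, bisect.bisect_right(xs, x) - 1))
        dx = x - xs[j]
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))

    return evaluate


# Step 1: the video audio is the dummy head signal delayed by two samples.
dummy_head = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
video_audio = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
lag_samples = best_lag(dummy_head, video_audio, max_lag=4)

# Step 2: hypothetical passing times (s) at the cones placed every 10 m,
# here for a vehicle moving at a constant 30 km/h.
cone_times = [0.0, 1.2, 2.4, 3.6, 4.8, 6.0, 7.2]
cone_positions = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
position_at = natural_cubic_spline(cone_times, cone_positions)
position_when_noticed = position_at(3.0)  # response time of 3.0 s
```

In this constant-speed example the spline reduces to a straight line, so a response at 3.0 s corresponds to a position of about 25 m; with real passing times the curve also captures any variation in speed between cones.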


3.6.2. Comparison of means and standard deviations for each localization position for all experimental participants

The comparison consisted of two factors: vehicle and music. In order to compare the position of the vehicle for these two factors, the mean and standard deviation over all participants were calculated for each localization position under five conditions: three conditions (C1, C2, C3) for the cars used (Car) and two conditions (M1, M2) for the music pieces (Music). The results are shown in Figures 2 through 6. In all conditions, the traveling position of the car at the moment the approaching sound was noticed was shorter for the vertex than for the Original. The traveling positions for the other localization positions were also shorter than for the Original, so it can be said that changing the localization position of the music makes it easier to notice the approaching sound of a car.

Fig.2: Mean and standard deviation for all participants in C1 by localization position

Fig.3: Mean and standard deviation for all participants in C2 by localization position

Fig.4: Mean and standard deviation for all participants in C3 by localization position

Fig.5: Mean and standard deviation for all participants in M1 by localization position

Fig.6: Mean and standard deviation for all participants in M2 by localization position

(Figs. 2 to 6: the horizontal axis is the condition of location for music: Original, 0°, 90°, 180°, 270°, Vertex.)

3.6.3. Results of Statistical Analysis

In order to investigate whether the distance at which people become aware of the sound of an approaching car differs significantly depending on the localization of the music, a Kruskal-Wallis test was conducted for each of the five conditions described above. The results are shown in Table 2, which confirms that there is a significant difference among the music localizations in the C2, C3, and M2 conditions. In contrast, no significant differences were found in the C1 and M1 conditions. Next, for the conditions in which significant differences were found, we used a multiple comparison test (Tukey test) to determine which localization positions differed. The results are shown in Table 3. For the vertex, a significant difference was found in the M2 condition.
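For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic from rank sums. It is a minimal illustration on made-up numbers, not the analysis code or data of this report; ties receive average ranks, and the tie correction factor is omitted for brevity.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        # Find the run of tied values starting at index i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Made-up distances (m) for three groups, for illustration only.
h_statistic = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

Under the usual chi-square approximation, H is compared with a chi-square distribution with (number of groups - 1) degrees of freedom. With the six localization conditions compared here that is 5 degrees of freedom, for which the 5% critical value is about 11.07.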

Table 2 Results of Kruskal-Wallis test

Conditions   C1      C2        C3       M1      M2
p            0.058   0.009**   0.023*   0.160   0.004**

*: p <.05 , **: p <.01

Table 3 Results of Tukey test

Localization   C2       C3       M2
0°             0.043*   0.357    0.045*
90°            0.046*   0.016*   0.006**
180°           0.288    0.212    0.045*
270°           0.217    0.022*   0.001**
Vertex         0.376    0.176    0.01*

*: p <.05 , **: p <.01

4. SURVEY ON PERCEPTION OF MOTORCYCLE APPROACHING SOUND WHILE LISTENING TO MUSIC

4.1. Aims

The purpose of this study is to investigate how the localization of music affects the perceptibility of the sound of an approaching motorcycle.

4.2. Overview

We investigated the perceptibility of the sound of an approaching motorcycle while listening to music through headphones. The music was localized using the HRTF and presented from each of five positions: 0°, 90°, 180°, and 270° in the horizontal plane, and the vertex of the head. At the same time, the sound of a motorcycle approaching from the right rear and passing by on the right side (270° in the horizontal plane) was presented.

4.3. Approaching sounds of motorcycles used

Two motorcycle models were used in this study. Table 4 shows the motorcycle models used. The motorcycles were driven at 30 km/h, and their approaching sounds were recorded at a sampling frequency of 44.1 kHz with 24-bit quantization using a SAMREC dummy head microphone recorder (Type 2700 Series, Southern Acoustics).

Table 4 List of motorcycles used

ID   Motorcycle name   Engine type            Displacement
B1   DIO               Single cylinder OHC    49cc
B2   MAGUNA250         V-twin engine DOHC     249cc

4.4. Experimental stimuli

The music used in the experiment consisted of excerpts from the chorus (15 s long) of two different J-Pop songs (M1, M2). The original stereo signal of each piece was mixed down to a monaural signal, which was then convolved with the stereo HRTFs of each participant obtained in Section 2 for 0°, 90°, 180°, and 270° in the horizontal plane and for the vertex. The original sound source was also included in the experimental stimuli in order to compare the music localized at the five positions with the non-localized sound source. The total number of stimuli used in the experiment was 6 localization conditions (5 sound images + 1 original sound source) × 2 music pieces × 2 approaching motorcycle sounds = 24 stimuli.

4.5. Experimental method

Stimulus playback level: The sound pressure level of the music was adjusted by loudness matching so that the music heard through the headphones and a 1 kHz pure tone heard through the loudspeakers were equal in loudness. The level of the 1 kHz pure tone was set at L_A = 75 dB. For the approaching sound of a motorcycle, the reference level was set 6 dB below the peak sound pressure level of the approaching sound at the time of recording, and the approaching sound played back from the headphones was adjusted by loudness matching, in the same way as described above, so that its peak was equal in loudness to that reference. The resulting sound pressure levels were L_A = 55 dB for B1 and L_A = 60 dB for B2.

4.6. Results

4.6.1. Method of obtaining traveling position

The same method as described in 3.6.1 was used to calculate the travelling position of the corresponding motorcycle based on the time when the approaching sound of the motorcycle was noticed.

4.6.2. Comparison of means and standard deviations per localization position for all experimental participants

The comparison consisted of two factors: motorcycle and music. In order to compare the position of the vehicle across the localization positions of the music for these factors, comparisons were made under four conditions: two conditions (B1, B2) for the motorcycles used (Bike) and two conditions (M1, M2) for the music pieces (Music). First, the mean values and standard deviations for each localization position were obtained from the responses of four of the five participants (S5-S9), excluding the participant who could not recognize the approaching sound of the motorcycle. The results are shown in Figures 7 to 10. In all conditions, the traveling position of the motorcycle at the moment the approaching sound was noticed was shorter for the vertex than for the Original. The traveling positions for the other localization positions were also shorter than for the Original, indicating that changing the localization position of the music makes it easier to notice the approaching sound of a motorcycle.

Fig.7: Mean and standard deviation for all participants in B1 by localization position

Fig.8: Mean and standard deviation for all participants in B2 by localization position

Fig.9: Mean and standard deviation for all participants in M1 by localization position

Fig.10: Mean and standard deviation for all participants in M2 by localization position

4.6.3. Results of Statistical Analysis

In order to investigate whether there were statistically significant differences among the music localizations, a Kruskal-Wallis test was conducted for each of the four conditions described above. The results are shown in Table 5: significant differences were found among the music localizations in the B2, M1, and M2 conditions, but not in the B1 condition. Next, for the conditions in which significant differences were found, we used a multiple comparison test (Tukey test) to determine which localization positions differed. The results are shown in Table 6. For the vertex, no significant difference was observed in any of the comparisons with the Original.

Table 5 Results of the Kruskal-Wallis test

Conditions   B1      B2        M1       M2
p            0.074   0.006**   0.038*   0.013*

*: p <.05 , **: p <.01

Table 6 Results of Tukey test

Localization   B2        M1      M2
0°             0.116     0.831   0.244
90°            0.012*    0.309   0.101
180°           0.015*    0.361   0.151
270°           0.002**   0.208   0.059
Vertex         0.059     0.477   0.24

*: p <.05 , **: p <.01

5. DISCUSSION

5.1. Good location of music and auditory information necessary for safety

In the experiment using the approaching sound of a car, Figures 2 to 6 confirm that the subjects noticed the approaching sound 5.5 m to 10.4 m (0.7 s to 1.2 s) earlier when the music was localized at the vertex than with the Original. In the experiment using the approaching sound of a motorcycle, Figures 7 to 10 show that the subjects noticed the approaching sound 6.7 m to 20.1 m (0.8 s to 2.4 s) earlier when the music was localized at the vertex than with the Original. In addition, the Tukey test, which examined differences between localization positions, found no significant difference for the vertex in Table 6, which shows the results of the experiment using the approaching sound of a motorcycle. Table 3, which shows the results of the experiment using the approaching sound of a car, however, confirms a significant difference for the vertex in the M2 condition. Therefore, localizing the music at the vertex may make the approaching sound easier to notice, although the size of the effect may depend on the music and the approaching sound.
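The correspondence between the distances and the times quoted above can be checked with a short calculation. The sketch assumes that the vehicle travels at a constant 30 km/h over the interval in question; the function name is chosen for illustration.

```python
SPEED_KMH = 30.0  # driving speed of the cars and motorcycles


def lead_time_s(distance_m, speed_kmh=SPEED_KMH):
    """Time gained (s) when a sound is noticed distance_m earlier at constant speed."""
    return distance_m / (speed_kmh / 3.6)  # 30 km/h is about 8.33 m/s


# Distances reported for the vertex condition: car (5.5, 10.4), motorcycle (6.7, 20.1).
lead_times = {d: round(lead_time_s(d), 1) for d in (5.5, 10.4, 6.7, 20.1)}
```

Rounded to one decimal place, the four distances give 0.7 s, 1.2 s, 0.8 s, and 2.4 s, in agreement with the values stated in the text.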

5.2. Car Comparison

Table 2 shows that C2 and C3 showed significant differences under the car condition, whereas C1 showed no significant difference among the six localizations of music. This may be due to the quietness of C1. However, Table 2 also shows that the p-value for C1 is 0.058, which suggests a trend toward significance. In other words, although there were differences depending on the characteristics of the music piece, it is thought that changing the localization position of the music could make the approaching sound of even a quiet car noticeable while listening to music with headphones.

5.3. Motorcycle Comparison

Table 5 shows that in the motorcycle condition, B1 showed no significant difference, whereas B2 showed a significant difference. This is considered to be because B2 is more easily perceived than B1 owing to its larger displacement and louder engine noise while moving.

5.4. Music Comparison

In the experiment using the sound of an approaching car, Table 2 shows that M2 showed a significant difference, while M1 showed no significant difference among the localizations of music. The distances for M1 were lower than those for M2; it is thought that, because the distance was already low for the Original in M1, no difference arose between the Original and the other localization positions. In the experiment using the sound of an approaching motorcycle, Table 6 shows that there was no significant difference in the distance comparison with the Original for either music piece. This is considered to be because both music pieces used in this report were pop tunes, with no difference in the composition of the instruments.

6. CONCLUSIONS

The purpose of this report was to investigate the influence of the localization of music played from headphones on the ease of noticing the sound of an approaching car or motorcycle. The distance of the vehicle was calculated from the time at which the subjects became aware of the sound of the approaching car or motorcycle, and the differences between localization positions were examined. The results showed that when a car approached from the left rear at a speed of 30 km/h and passed by on the left side (90° in the horizontal plane), the subjects noticed the approaching sound 5.5 m to 10.4 m, or about 0.7 s to 1.2 s, earlier when the music played from the headphones was localized at the vertex than with the original. When a motorcycle approached from the right rear at 30 km/h and passed by on the right side (270° in the horizontal plane), the subjects noticed the approaching sound 6.7 m to 20.1 m, or 0.8 s to 2.4 s, earlier. In the future, we would like to conduct tests with multiple genres of music and to introduce a dispersed localization of the sound image rather than single-point localization.

7. ACKNOWLEDGEMENTS

We would like to express our deepest gratitude to the students of the Kunitachi College of Music, Japan, and all those involved for their cooperation in the experiments described in this report.

8. REFERENCES

1. Hamamura, M., Iwamiya, S., "Survey on the use of portable audio devices by university students", Acoustical Society of Japan, 69(7), pp. 331-339 (2013, in Japanese).
2. Saberi, K., Dostal, L., Sadralodabai, T., Bull, V., and Perrott, D. R., "Free-field release from masking", J. Acoust. Soc. Am., 90(3), pp. 1355-1370 (1991).
3. Nakanishi, J., Unoki, M., and Akagi, M., "Effect of ITD and component frequencies on perception of alarm signals in noisy environments", Journal of Signal Processing, 10(4), pp. 231-234 (2006).
4. Saito, K., Iwaya, Y., Suzuki, Y., "Sound Localization with individualized HRTF selected by tournament matches", Proc. of the 4th Forum on Information Technology, pp. 381-383 (2005, in Japanese).
5. Summary of HRTF measurements taken at Research Institute of Electrical Communication, Tohoku University, "https://www.ais.riec.tohoku.ac.jp/lab/db-hrtf/index.html" (referred on 2021.10).