
Manchester soundscape experiment online 2020: an overview

Maria Luiza de Ulhôa Carvalho 1
School of Visual Arts of the Federal University of Goiás UFG - Av. Esperança, s/n - Campus Samambaia, Goiânia, GO, 74690-900, Brazil
PhD student of Acoustics Research Centre, University of Salford

William J. Davies 2
Acoustics Research Centre, University of Salford, Newton Building, The Crescent, Salford M5 4WT, UK

Bruno M. Fazenda 3
Acoustics Research Centre, University of Salford, Newton Building, The Crescent, Salford M5 4WT, UK

ABSTRACT

This paper presents results from the Manchester Soundscape Experiment Online 2020. It consisted of an online virtual reality (VR) experiment, where participants rated 12 different scenarios with questions. The selected Manchester locations were Piccadilly Gardens, Market Street, Peel Park, and a bus stop. Each site was visited and recorded with a 360 camera and a sound field microphone on three crowd densities (empty, medium, and busy). Audio stimuli were converted from first-order ambisonic recordings to head-tracked binaural using Facebook 360 Spatial Workstation software. The questions included the eight soundscape attributes from ISO 12913, three emotional self-assessment manikins, and demographic information. A total of 63 nationalities composed the group of 155 participants. Results from Piccadilly Gardens and Market Street demonstrated that the uneventful to eventful scale significantly increased with the number of people in the scene. This tendency also happened at Peel Park and Market Street for the arousal emotional state. Additionally, the chaotic to calm scale at Peel Park decreased, and the monotonous to vibrant scale at Piccadilly Gardens increased with the number of individuals in the scene. The other sites and conditions may have differences in the scores of the semantic scales, but they were not as significant or did not follow the same tendency. Future research will include verifying these findings in laboratory conditions alongside measurements of brain activity via electroencephalogram.

1 luizaled@ufg.br

2 W.Davies@salford.ac.uk

3 B.M.Fazenda@salford.ac.uk

worm 2022

1. INTRODUCTION

Given the difficulties of face-to-face experiments during the pandemic, some researchers adapted their methodology to an online approach. We present a soundscape virtual reality (VR) experiment implemented online in 2020. A soundscape comprises all surrounding sounds manifested in indoor or outdoor environments within a context that may include social, personal, collective, climatic, functional, and ecological factors, among other elements. Specifically, this paper studies different urban soundscapes with varying numbers of people in Greater Manchester, UK, through self-report assessments using the ISO soundscape attributes [1] and the Emotional Self-Assessment Manikins (SAM) [2].

When experts and lay urban agents participate in the design, better solutions can be achieved [3], becoming more efficient and gaining urban policy legitimacy. Additionally, virtual reality (VR) has proven ecological validity when representing urban scenes [4]. Specifically, the virtual sonic environment can be reproduced in different formats, from mono to spatial audio [5], changing the perception of how immersive and realistic it is. Undoubtedly, VR lets anyone experience places while reducing the time and expense of travel, which can aid urban design through simulations or real-life reproductions.

Within the initial steps of planning a city, the project brief integrates details of the physical aspects of the environment, historical elements of the urban fabric, the road transport system, existing urban infrastructure, and other information, including urban noise. For the acoustic environment, the ISO 12913-2:2018 [1] standard introduces and describes the perceived soundscape quality by eight attributes: eventful, vibrant, pleasant, calm, uneventful, monotonous, annoying, and chaotic. These semantic scales, in theory, are traced back to the arousal-valence scales of Russell [6], who inspired Axelsson and colleagues [7] to develop the mentioned attributes. In Part 3 of the standard, ISO 12913-3:2019 [8], each attribute has an additional definition and some terms are correlated to different sound environments.

To better understand subjective responses to soundscapes, an investigation proceeded towards how the sonic environment triggered emotional states using the Emotional Self-Assessment Manikins (SAM) [2]. Valence, arousal, and dominance are the three scales of SAM, each presented with graphic representations. Valence is a pleasant or likeable feeling that can be sonically represented by sounds of joyful laughter or birds singing. Arousal relates to how excited, alert, or agitated the person is and may relate to the sounds of a cheerful group. Dominance reports how in control you feel toward the experience. In soundscape studies, the feeling of dominance can be hard to correlate, but an example would be a quiet library that compels you not to speak, reflecting a dominating sonic environment. Finally, soundscape attributes and emotional states together form the subjective responses assessed in this paper, which may indicate desirable soundscapes for possible urban sound design applications.

2. METHODOLOGY

Figure 1 illustrates the general pipeline for the methodology of this experiment.

2.1. Site selection and field recordings

After a survey to identify sites representing the vibrant, calm, monotonous, and chaotic soundscape scales [9], each location was visited at three levels of human agglomeration (empty, medium, and busy), referred to as “crowd density”. The urban places were Piccadilly Gardens, Peel Park, a bus stop, and Market Street, visited from January to December of 2019 on days with no precipitation forecast.


Figure 1: Pipeline of the experimental methodology.

Figure 2 illustrates a map of Greater Manchester, UK, with the studied locations (2a) and the locations plotted over the two-dimensional soundscape coordinates: eventfulness and pleasantness or calmness and vibrancy (2b). Participants were divided into two groups, to reduce the experimental time for the online experiment: one with the bus stop and Peel Park, and the other with Market Street and Piccadilly Gardens.


Figure 2: Map a) of Greater Manchester, U.K. with Peel Park [1], the bus stop [2], Market Street [3], and Piccadilly Gardens [4] (Map adapted from Google Maps, 2022), and b) the locations plotted over the two-dimensional soundscape coordinates.

Field recordings were made with a Ricoh Theta S 360° video camera and an ST250 sound field microphone plugged into a ZOOM H6 Handy Recorder. Audio recordings were captured in B-format in four channels, with W for omnidirectional, X for front-back, Y for left-right, and Z for up-down. A sound level meter, type BSWA 308, was used to register a one-minute sample of A-weighted equivalent continuous sound pressure level ($L_{Aeq,60}$) to adjust sound levels from field to laboratory reproductions.

Once all equipment was in place, “Filming in Progress” signs were placed with the equipment for ethical reasons. Audio and camera were initiated, and the researcher clapped two to three times in front of the equipment for future audiovisual alignment. Recordings lasted from 10 to 12 minutes. Meanwhile, researchers blended in with other people on-site or hid behind surrounding obstacles.

2.2. Audio-visual processing before FB360 Spatial Workstation

Initially, the time of the clap in each recording was identified. The audio and video were then synchronized in time using the free version of Lightworks x64 (version 14.5.0.0) software. Then, recordings were analysed in sections of 30 seconds for foreground and background sounds representing local characters [10] and soundmarks [11] of each location. The stimulus duration of 30 seconds followed previous work [12]. With the extracted samples, audio files were calibrated to the field sound levels using a High-frequency Head and Torso Simulator (HATS) with PULSE software, both from Brüel & Kjær.
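The level adjustment described above amounts to scaling each excerpt so that its equivalent continuous level matches the value measured on site. The Python sketch below illustrates only that arithmetic. It assumes the samples are already expressed in pascals, it omits the A-weighting filter, and the function names (`equivalent_level_db`, `calibration_gain`), the test tone, and the 65 dB target are hypothetical. It is not the HATS and PULSE procedure used in the study.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 µPa)


def equivalent_level_db(samples):
    """Unweighted equivalent continuous level (dB re 20 µPa) of pressure samples."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square / (P_REF ** 2))


def calibration_gain(samples, target_db):
    """Linear gain that brings the excerpt's level to target_db."""
    return 10.0 ** ((target_db - equivalent_level_db(samples)) / 20.0)


# Hypothetical excerpt: one second of a 1 kHz tone with 0.02 Pa RMS at 48 kHz.
fs = 48000
tone = [0.02 * math.sqrt(2) * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]

# Hypothetical field measurement of 65 dB that the excerpt should reproduce.
gain = calibration_gain(tone, target_db=65.0)
calibrated = [gain * s for s in tone]
```

A real calibration chain would also account for the playback system and the A-weighting curve; the sketch only shows how a level difference in decibels maps to a linear gain.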

Additionally, the clap helped obtain the azimuth angle of the X signal (front-back) to align the 360° video with the ambisonic audio using the FB360 Spatial Workstation. Derived from spherical harmonics, each of the four audio signals can be understood through the simplified components presented in Equations 1 to 4 [13]:

$$W = \frac{S}{\sqrt{2}} \qquad (1)$$

$$X = S \cdot \cos\theta \cdot \cos\phi \qquad (2)$$

$$Y = S \cdot \sin\theta \cdot \cos\phi \qquad (3)$$

$$Z = S \cdot \sin\phi \qquad (4)$$

where S is the source signal, θ is the azimuth angle, and φ is the elevation angle. Here, Equation 2 determined the azimuth angle θ.
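To make the role of the four B-format components concrete, the following Python sketch encodes a mono signal at a known direction using the standard first-order relations (W scaled by 1/√2, X and Y depending on azimuth and elevation, Z depending on elevation only) and then recovers the azimuth from the averaged products of W with X and Y. This recovery step is a common single-source estimate chosen here for illustration, not necessarily the calculation the authors performed; the function names and the synthetic clap are hypothetical.

```python
import math


def encode_b_format(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order B-format channels (W, X, Y, Z)."""
    theta = math.radians(azimuth_deg)
    phi = math.radians(elevation_deg)
    w = [s / math.sqrt(2) for s in signal]
    x = [s * math.cos(theta) * math.cos(phi) for s in signal]
    y = [s * math.sin(theta) * math.cos(phi) for s in signal]
    z = [s * math.sin(phi) for s in signal]
    return w, x, y, z


def estimate_azimuth_deg(w, x, y):
    """Estimate the azimuth of a single dominant source from summed W*X and W*Y."""
    wx = sum(a * b for a, b in zip(w, x))
    wy = sum(a * b for a, b in zip(w, y))
    return math.degrees(math.atan2(wy, wx))


# Hypothetical clap: a short decaying burst placed at an azimuth of 30 degrees.
clap = [math.exp(-n / 50.0) * math.sin(0.9 * n) for n in range(500)]
w, x, y, z = encode_b_format(clap, azimuth_deg=30.0, elevation_deg=0.0)
estimated = estimate_azimuth_deg(w, x, y)
```

With real recordings the estimate would be computed over the short window containing the clap, since other sources and reverberation bias the result.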

2.3. Audio-visual processing using FB360 Spatial Workstation

The Facebook 360 Spatial Workstation is a free software pack designed for spatial audio integration to 360° video and virtual reality (VR) applications (https://facebookincubator.github.io/facebook-360-spatial-workstation/). The pack was used to convert the spatial audio from FuMa (WXYZ) to ambiX (WYZX) (Audio 360 Encoder), rotate and align the audio with the video (Spatialiser), visualize the video (Audio 360 Video Player), and encode the audio with the video (Audio 360 Encoder).

First, the spatial audio was converted from FuMa (WXYZ) to ambiX (WYZX) using the Audio 360 Encoder. The name ambiX abbreviates “Ambisonics exchangeable” format, which has recently seen increased use among spatial audio users for its advantages in handling higher-order Ambisonic audio and rendering longer files [14].
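For first-order material, the FuMa to ambiX conversion reduces to a channel reorder and a gain change on W. The Python sketch below shows this under the usual conventions: ambiX uses ACN channel order (W, Y, Z, X) with SN3D normalisation, while FuMa stores W attenuated by 1/√2. The function names and sample values are hypothetical, and the code is an illustration of the format difference rather than a description of what the Audio 360 Encoder does internally.

```python
import math


def fuma_to_ambix(frame):
    """Convert one first-order FuMa frame (W, X, Y, Z) to ambiX (W, Y, Z, X)."""
    w, x, y, z = frame
    # FuMa keeps W at -3 dB (factor 1/sqrt(2)); SN3D keeps W at unit gain.
    return (w * math.sqrt(2.0), y, z, x)


def convert_stream(frames):
    """Apply the per-frame conversion to a sequence of FuMa frames."""
    return [fuma_to_ambix(frame) for frame in frames]


# Hypothetical sample frames in FuMa order (W, X, Y, Z).
fuma_frames = [(0.5, 0.1, 0.2, 0.3), (-0.25, 0.0, 0.4, -0.1)]
ambix_frames = convert_stream(fuma_frames)
```

The first-order directional channels keep the same gain in both conventions, which is why only W is rescaled; higher orders would need per-channel factors.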

Since the Audio 360 Video Player and Spatializer are plug-ins, they were installed inside a digital audio workstation (DAW). The azimuth angle previously calculated was inserted in the Source Yaw window of the Spatializer while the video was played back simultaneously in the Audio 360 Video Player. Once ready, the audio file was rendered by the DAW. Next, the Audio 360 Encoder encoded the audio and video files to a single file consisting of the monoscopic video and head-tracked binaural audio for upload to the YouTube platform. These rendered 3D audio videos and other video publications are accessible through the link https://bit.ly/3s0WaL3 .

With the uploaded videos, a web-based questionnaire was built with JotForm, an online form company. The form began with a consent form. Then, general questions were asked about demographic information (gender, age, nationality, and residency), auditory health (evidence of hearing loss and tinnitus), and digital settings (the audio and video system used during the experiment). The experimental questions were answered after watching each video and comprised: (1) “What is the dominant sound source you just heard in the video?”; (2) “Please, slide to the word that best describes the sounds you just heard. To the left (-10) is negative and to the right (10) is positive.”, for the paired words Monotonous/Vibrant, Unpleasant/Pleasant, Chaotic/Calm, and Uneventful/Eventful; and (3) “Please, slide to the figure that best describes how you feel regarding the sounds you just heard.”, for the SAM figures that represent Arousal (Calm/Excited), Valence (Unhappy/Happy), and Dominance (Controlled/Controlling). Even though Part 2 of the soundscape standard, ISO 12913-2:2018 [1], determines that the opposite of pleasant is annoying, we preferred to use the term “unpleasant” to establish a more contrasting lexical taxonomy given that words were in pairs.

3. RESULTS

The 155 participants came from 63 countries, with 75 individuals in Group 1 (Peel Park and the bus stop) and 80 in Group 2 (Piccadilly Gardens and Market Street). Group 1 consisted of 49% women, 48% men, and 3% who were non-binary or preferred not to say, aged 21 to 68 years (38±12). Group 2 included 51% women, 45% men, and 4% who were non-binary or preferred not to say, with an age range from 21 to 67 (38±11). Significant group differences were tested with the statistical package IBM SPSS Statistics 27®.

3.1. Crowd density

To test the differences among the three conditions of crowd density (empty, medium, and busy) at each location, Friedman’s non-parametric test was used with pairwise comparisons. From the seven semantic ratings (four soundscape attributes and three SAM emotional states) of the 12 tested conditions (four locations in three crowd densities), only six results changed significantly and gradually with the increase of people. They were as follows: chaotic/calm and arousal at Peel Park; uneventful/eventful and monotonous/vibrant at Piccadilly Gardens; and uneventful/eventful and arousal at Market Street. For the soundscape attributes, the paired taxonomy ranged from negative ten, -10, for chaotic/uneventful/monotonous to positive ten, 10, for calm/eventful/vibrant, respectively. Considering the SAM scales, the emotional state of arousal was represented by manikins with a calming aspect rating as negative ten, -10, to an excited/alert appearance scored as positive ten, 10. Table 1 lists the statistical results of the semantic scales that had significant differences with the increase in crowd density, and Figures 3 to 5 illustrate the boxplots with the median for each of these semantic scales.
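As an illustration of the test named above, the sketch below computes the tie-corrected Friedman chi-square for three related conditions in plain Python. With three conditions the statistic has two degrees of freedom, so the p-value has the closed form exp(-χ²/2). The ratings are invented, the function names are hypothetical, and the pairwise post hoc procedure run in SPSS is not reproduced; the `bonferroni` helper only shows how a set of pairwise p-values would be adjusted.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman_test(ratings):
    """Tie-corrected Friedman chi-square and p-value for k = 3 conditions.

    ratings holds one [empty, medium, busy] row per participant.
    """
    n, k = len(ratings), len(ratings[0])
    if k != 3:
        raise ValueError("the closed-form p-value below only holds for k = 3")
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in ratings:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for value in set(row):
            t = row.count(value)
            tie_term += t * (t * t - 1)
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    chi2 /= 1.0 - tie_term / (n * k * (k * k - 1))
    p_value = math.exp(-chi2 / 2.0)  # chi-square survival function, 2 d.o.f.
    return chi2, p_value


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical ratings on the -10 to 10 scale from four participants.
ratings = [[-4, 0, 6], [-2, 1, 5], [-7, -3, 2], [0, 3, 8]]
chi2, p = friedman_test(ratings)
```

Because the test works on within-participant ranks, it suits slider ratings whose distributions need not be normal, which is consistent with the non-parametric choice reported above.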

At Peel Park, the increased number of people significantly diminishes the perception of calmness towards a perception of chaos (Figure 3a). This is understandable given that urban parks represent an ecosystem with predominantly natural elements such as green areas, different types of birds, and small animals. From a sustainability perspective, these locations need a reduced presence of people to preserve natural urban areas for the local fauna and flora, and to provide possible places for human restoration [15]. In addition, the rating of Peel Park as a calm place corroborates the initial site selection of Peel Park as a calm soundscape indicated in Figure 2b.


A sense of calmness also appeared on the arousal scale at Peel Park in the empty and medium conditions, with few or no people present (Figure 3b). Likewise, the change in this emotional state was significant with the increase of people on Market Street. There, however, the medians varied from a more neutral score in the empty condition to an excited and alert state in the busy condition (Figure 5b).

Table 1: Friedman's test χ²(2) results of the semantic scales that had significant differences with the increase of crowd density. *Significance thresholds were adjusted by the Bonferroni correction.

| Location | Semantic scale | Friedman's test χ²(2) | Overall p-value* for group differences | Pairwise p-value*: Empty vs medium | Pairwise p-value*: Empty vs busy | Pairwise p-value*: Medium vs busy |
|---|---|---|---|---|---|---|
| Peel Park | Chaotic to Calm | 58.161 | <0.001 | 0.000021 | 4.4298E-13 | 0.011 |
| Peel Park | Arousal | 54.175 | <0.001 | 0.001 | 2.0137E-12 | 0.002 |
| Piccadilly Gardens | Uneventful to Eventful | 80.129 | 0.00E+00 | 0.000024 | 0.00E+00 | 0.000118 |
| Piccadilly Gardens | Monotonous to Vibrant | 61.517 | <0.001 | 0.012 | 1.3056E-13 | 0.000009 |
| Market Street | Uneventful to Eventful | 86.457 | 0.00E+00 | 2.688E-09 | 0.00E+00 | 0.043 |
| Market Street | Arousal | 35.806 | <0.001 | 0.027 | 7.4895E-08 | 0.009 |

The increase in the eventful ratings with the number of people at Piccadilly Gardens (Figure 4a) and Market Street (Figure 5a) corroborates the description of eventful places in Part 3 of the soundscape standard, ISO 12913-3 [8]. Furthermore, the shift in the perceived soundscape quality from monotonous to vibrant at Piccadilly Gardens (Figure 4b) when more individuals are in the scene confirms the initial site selection representing a vibrant scenario in Manchester (Figure 2b). However, Piccadilly Gardens was considered a vibrant soundscape only in the busy condition.

Meanwhile, the other results for the semantic scales had some significant pairwise differences between two of the crowd densities but not across all three. Two cases of different behaviour are worth pointing out: the monotonous (-10) to vibrant (10) scale at the bus stop and the dominance (controlled at -10 to controlling at 10) scale at Market Street. Reported medians across crowd densities for these semantic terms were similar: -2 and -1, respectively. The bus stop outcome was monotonous independently of the number of individuals in the scene, corroborating the previous selection of the site as a monotonous place (Figure 2b). From another perspective, the consistency of the dominance scale regardless of the number of individuals at Market Street needs further investigation to determine whether the indifferent rating reflects a lack of understanding of the term or the absence of a sense of safety due to the proximity to people.


Figure 3: Boxplots with medians of a) Chaotic (-10) to Calm (10) scales, and b) Arousal scale from calm (-10) to excited/alert (10) at Peel Park.


Figure 4: Boxplots with medians of a) Uneventful (-10) to Eventful (10) scales, and b) Monotonous (-10) to Vibrant (10) scales at Piccadilly Gardens.


Figure 5: Boxplots with medians of a) Uneventful (-10) to Eventful (10) scales, and b) Arousal scale from calm (-10) to excited/alert (10) at Market Street.

4. CONCLUSIONS

Running an online soundscape experiment during the pandemic had many challenges, from successfully implementing the stimuli online to giving enough assistance to participants so they could complete it via the web. The benefits are that recruitment is not limited to local individuals, there can be a higher number of participants, and experimental completion can be flexible to the schedule of participants. Some downsides are the lack of controlled laboratory conditions, such as a constant audio and video reproduction system for all participants, efficient acoustic laboratory settings, and consistent contact between researcher and participant to facilitate the experiment. Nevertheless, we managed to accomplish it and investigate whether the different soundscape locations at different crowd densities changed people's subjective responses using 3D audiovisual stimuli online.

The main findings were that as crowd density increased, so did the eventful responses at Piccadilly Gardens and Market Street, corroborating the ISO 12913-3 [8] description of eventful places. Also, the number of people in the scene increased the ratings on the arousal scale at Peel Park and Market Street and the monotonous/vibrant scale at Piccadilly Gardens but decreased the ratings on the chaotic/calm scale at Peel Park. Additionally, our initial survey to identify the soundscape locations representing the monotonous, calm, and vibrant attributes was confirmed by the current results for the bus stop in all crowd densities, Peel Park in empty and medium conditions, and Piccadilly Gardens in the busy condition. Other group differences, such as those among nationalities and hearing health, were identified, suggesting further investigation. Future research will include verifying these findings in laboratory conditions alongside measurements of brain activity via electroencephalogram.

5. ACKNOWLEDGEMENTS

The authors are grateful to Maria Luiza de Ulhôa Carvalho’s sponsors (the Federal University of Goiás, and the Coordination for the Improvement of Higher Education Personnel, CAPES, Brazil) and for the support of the Acoustics Research Centre of the University of Salford, UK.

6. REFERENCES

1. ISO 12913-2. PD ISO/TS 12913-2:2018 - Acoustics - Soundscape - Part 2: Data collection and reporting requirements. Vol. 44, Issue 0, pp. 1–37, 2018.

2. Bradley, M. M., & Lang, P. J. Measuring emotion: the self-assessment manikin and the semantic differential. Journal of behavior therapy and experimental psychiatry, 25(1), 49-59 (1994).

3. Berger, M., & Bill, R. Combining VR visualization and sonification for immersive exploration of urban noise standards. Multimodal technologies and interaction, 3(2), 34 (2019).

4. Maffei, L., Masullo, M., Pascale, A., Ruggiero, G., & Romero, V. P. Immersive virtual reality in community planning: Acoustic and visual congruence of simulated vs real world. Sustainable Cities and Society, 27, 338-345 (2016).

5. Rumsey, Francis. Spatial audio. Routledge, 2012.

6. Russell, J. A. A circumplex model of affect. Journal of personality and social psychology, 39(6), 1161 (1980).

7. Axelsson, Ö., Nilsson, M. E., & Berglund, B. A principal components model of soundscape perception. The Journal of the Acoustical Society of America, 128(5), 2836-2846 (2010).

8. ISO 12913-3. PD ISO/TS 12913-3:2019 - Acoustics - Soundscape - Part 3: Data analysis. pp. 1–30, 2019.

9. Carvalho, M. L., Davies, W. J., & Fazenda, B. Investigation of emotional states in different urban soundscapes through laboratory reproductions of 3D audiovisual samples. Proceedings of 14th INTERNATIONAL POSTGRADUATE RESEARCH CONFERENCE 2019: Contemporary and Future Directions in the Built Environment, pp. 326-338. Salford, U.K., December 2019.

10. Coelho, J. B. Approaches to urban soundscape management, planning, and design. Soundscape and the built environment, pp. 197-214 (2016).

11. Schafer, R. M. The soundscape: Our sonic environment and the tuning of the world. Simon and Schuster, 1993.

12. Berglund, B., & Nilsson, M. E. On a tool for measuring soundscape quality in urban residential areas. Acta acustica united with acustica, 92(6), 938-944 (2006).

13. Hong, J. Y., He, J., Lam, B., Gupta, R., & Gan, W. S. Spatial audio for soundscape design: Recording and reproduction. Applied sciences, 7(6), 627 (2017).

14. Nachbar, C., Zotter, F., Deleflie, E., & Sontacchi, A. AmbiX - A suggested ambisonics format. Proceedings of Ambisonics Symposium, pp. 1-11. Lexington, U.S., June 2011.

15. Payne, S. R. The production of a perceived restorativeness soundscape scale. Applied acoustics, 74(2), 255-263 (2013).
