Proceedings of the Institute of Acoustics, Volume 45, Part 1

Exploring the impact of erroneous labels on synthetic aperture sonar classifier performance

I. D. Gerg, Penn State Applied Research Laboratory, State College, Pennsylvania, USA
B. E. Cowen, Penn State Applied Research Laboratory, State College, Pennsylvania, USA
Corresponding author email: isaac.gerg@psu.edu

1 INTRODUCTION

Synthetic aperture sonar (SAS) is a standard imaging modality used to generate high-resolution imagery of the seafloor. This imagery is often used by analysts and automated systems to locate objects of interest. Deep learning is the state-of-the-art approach to automated SAS image analysis, but it requires abundant labeled data to yield acceptable performance. Acquiring ground-truth SAS image labels is expensive because it requires the coordination of divers and surface ships. Instead, labels are typically created by topside analysts manually examining the SAS imagery, sometimes requiring the consultation of additional collated modalities to resolve object ambiguity1. Overall, this process can result in erroneous labels making their way into the training dataset of a deep learning algorithm. This work aims to determine the effects of training with erroneous labels on SAS classifier performance.

Motivation. To gain insight into the robustness and reliability of deep learning SAS classifiers when faced with corrupted labels during the training phase, a common real-world scenario.

Contributions. We conduct extensive experiments using multiple deep learning models, including state-of-the-art convolutional neural networks (CNNs) and a vision transformer (ViT), in which we artificially corrupt the labels of an existing real-world SAS dataset to measure the resilience of each method to label errors. We evaluate two model parameter initialization schemes: ImageNet pre-training (training a neural network on the ImageNet dataset2 before using it for this task) and no pre-training (model parameters are initialized randomly). To ensure a robust assessment, we train each model ten times to account for variability in the results due to the random selection of the train/validation split and the random initialization of model parameters (used in the no pre-training scenario). Following the training process, the models are evaluated on SAS data containing no label errors, allowing us to assess their generalization capabilities and resilience to label noise. On a real-world SAS dataset, we show that (1) ImageNet pre-training increases model resilience to erroneous labels, (2) the area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) have different failure modes when erroneous labels are present, and (3) network architecture plays a significant role, with some architectures exhibiting more resilience to erroneous labels than others.

Figure 1: Sample SAS images from our dataset are shown, with images (a)-(c) illustrating objects of interest, images (d)-(e) illustrating background features, and images (f)-(h) illustrating confusers. Confusers are particularly challenging as they can be easily mistaken for objects of interest when labeling the data based solely on sonar imagery without any in situ ground-truth information1. This underscores the importance of accurate ground-truth labels in the development and evaluation of SAS classification models.
2 PREVIOUS WORK

The impact of erroneous labels on synthetic aperture sonar (SAS) classifier performance has yet to be extensively investigated in the literature, to the best of our knowledge. However, the effect of label noise on classifier performance is a well-known challenge in image-based machine learning3, including the sub-fields of remote sensing4 and medical image analysis5. Researchers have explored the impact of erroneous labels on classification performance and have proposed methods to address this issue. One approach has been to develop robust classifiers that can tolerate a certain level of label noise. For example, some studies have investigated the use of robust loss functions and training schemes6,7,8,9,10,11, which can identify and down-weight the contribution of noisy samples during the training process. By reducing the influence of mislabeled samples, these classifiers can achieve better generalization and improved performance on clean test data. Another line of research has focused on detecting and correcting erroneous labels in the training data before classifier training. This can be achieved through various techniques, such as unsupervised algorithms explicitly modeling label noise12 and implicit methods using a pre-trained autoencoder13. By correcting or filtering out mislabeled samples before training, the classifier can be trained on a cleaner dataset, leading to improved performance.

3 METHODOLOGY

3.1 Dataset and Preprocessing

The training and test datasets consist of image chips of background and targets from the MUSCLE system14. The chips were generated by scanning SAS images with the Mondrian detector15 and saving chips that obtained a sufficient detector score. Each chip is 335×335 pixels in size. The images were collected by MUSCLE from 2008 to 2018. We divide the dataset roughly in half, assigning the first five years of the collection to the training/validation data and the remaining five years to the test data. The training/validation set comprises 27,748 images, of which 1,385 are targets. The test set comprises 21,188 images, of which 639 are targets. We use a 50:50 random split to create training and validation sets for network training. Before training, all imagery was autofocused using a parametric quadratic phase error model optimized for maximum contrast (i.e., mean-normalized standard deviation) using a grid search. We perform data augmentation on-the-fly during the training of each network. It consists of random crops of the image to 256×256 pixels and random image flipping about the range axis. Consequently, the input image size fed to each network is 256×256 pixels; a centered crop is used during validation and testing.

Each image chip is a single-look-complex (SLC) image which we preprocess to a real-valued image normalized to the interval [0, 1]. This is accomplished by: (1) converting the SLC to a real-valued image by keeping the magnitude portion and discarding the phase, and (2) applying dynamic range compression to the resulting magnitude image to obtain the image used for network training. The dynamic range compression method used is a derivative of the rational tone mapping operator of16, in which fDRC is the function performing dynamic range compression, g is the real-valued magnitude-only SAS image, and fmedian is the median value computed over the entire image.
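To make the preprocessing pipeline concrete, the sketch below converts an SLC chip to a normalized magnitude image. The exact dynamic range compression formula is not reproduced in this text, so the rational Schlick-style mapping g / (g + k · fmedian) used here, along with the constant k and the function names, is an illustrative assumption rather than the authors' exact implementation.

```python
import numpy as np

def drc_schlick_like(slc: np.ndarray, k: float = 1.0) -> np.ndarray:
    """Convert a single-look-complex (SLC) chip to a real-valued image in [0, 1].

    Sketch of the preprocessing described in Section 3.1:
    (1) keep the magnitude and discard the phase, then
    (2) apply a rational (Schlick-style) dynamic range compression driven by
        the image median. The mapping g / (g + k * median(g)) is one plausible
        variant, not necessarily the exact formula used in the paper.
    """
    g = np.abs(slc)                       # magnitude-only image
    f_median = np.median(g)               # median over the entire chip
    out = g / (g + k * f_median + 1e-12)  # rational mapping; output lies in [0, 1)
    return out.astype(np.float32)

# Example usage on a synthetic 335x335 complex chip:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slc = rng.normal(size=(335, 335)) + 1j * rng.normal(size=(335, 335))
    img = drc_schlick_like(slc)
    print(img.min(), img.max())  # values fall within [0, 1)
```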
Additionally, we constructed a new evaluation dataset from our test dataset, which we call the "synthetic dataset." This dataset was constructed using gradient domain blending of three target shapes from real SAS images and real test set background images (more details in17). It comprises 60,819 images, each 256×256 pixels in size. This dataset creates a set of target/background combinations, some of which are not explicitly seen in the training set, allowing for performance assessment in new environments. Due to its construction, all images in this dataset are of targets. Figure 7 of the Appendix shows a montage of randomly selected images from the synthetic dataset.

3.2 Evaluated Network Architectures and Configurations

We assessed the performance of six advanced deep-learning networks encompassing diverse design strategies. These strategies include optimizing performance on ImageNet, learning through contrasting text and images, and maintaining computational efficiency.

DenseNet121 (2017)18. This model is from a class of convolutional neural networks (CNNs) that utilizes a unique dense connectivity pattern between layers. DenseNet models, such as DenseNet121, are designed to address the vanishing gradient problem, reduce the number of parameters, and improve feature reuse in deep neural networks. The '121' in DenseNet121 refers to the total number of layers in the architecture.

CLIP_ViT-B_32 (2021)19. This model was developed by OpenAI as part of their Contrastive Language Image pre-training (CLIP) series. The primary goal of CLIP models is to create versatile AI systems that can understand and generate meaningful outputs based on the correlation between visual and textual data. The 'ViT-B_32' in its name refers to the use of the Vision Transformer (ViT) architecture20 with a base configuration and an image patch size of 32×32 pixels. While the original model was designed to jointly learn from both images and text, it is possible to adapt the image backbone of the model for image classification tasks as we do here. We do this by removing the text-processing components of the model while retaining the pre-trained weights from the full text/image dataset. This converts the model into an image feature extractor.

EfficientNetV2M (2021)21. This model builds upon the success of the original EfficientNet models by focusing on further improving model efficiency and training speed. The 'M' in EfficientNetV2M denotes the medium-sized variant of the EfficientNetV2 series.

MobileNetV3Small (2021)22. This model builds upon the success of the MobileNetV123 and MobileNetV224 architectures, with the goal of achieving higher performance while maintaining a low computational cost, making it suitable for resource-constrained environments such as mobile devices. The 'Small' in MobileNetV3Small denotes the smaller-sized variant of the MobileNetV3 series, which is optimized for even lower computational requirements and memory footprint.

ResNetRS152 (2021)25. This model builds upon the well-established ResNet architecture26 while incorporating improvements in the architecture for better performance and efficiency. The 'RS' in ResNetRS denotes re-scaling of the original ResNet, while '152' refers to the number of layers in the model, making it a deeper variant within the ResNetRS series. ResNetRS is distinct in its design focus, which concentrated on disentangling the effects of model architecture, training methods, and scaling strategies.
ConvNeXtSmall (2022)27. The ConvNeXt models are designed to challenge the notion that Vision Transformers (ViTs) have completely superseded traditional ConvNets in terms of performance and scalability across various computer vision tasks. The ConvNeXt family aims to demonstrate that pure ConvNets, when incorporating key design components inspired by Vision Transformers, can achieve competitive results. The 'Small' in ConvNeXtSmall denotes a smaller-sized variant within the ConvNeXt series, optimized for lower computational requirements and memory footprint.

3.3 Experimental Setup and Label Corruption Schemes

We examined six state-of-the-art deep learning architectures trained on our SAS dataset. Our experimental setup was designed to analyze the impact of erroneous labels on the performance of these classifiers. To simulate the effects of label noise, we intentionally corrupted the class labels of our training set. This corruption was accomplished by converting target labels to background labels and vice versa. We defined the degree of label corruption as the target label error proportion (TLEP). Positive TLEP indicates an excess of target labels as a result of erroneous conversion from background labels to target labels (i.e., false positives). Conversely, negative TLEP indicates a deficit of target labels due to the incorrect conversion from target labels to background labels. For example, if the number of targets in the dataset is 100, a TLEP of -10% would switch ten target labels to background, yielding a new total of ninety target labels. Likewise, a TLEP of 20% would switch twenty background labels to target, yielding a new total of 120 target labels. The TLEPs we investigated were -50%, -40%, -30%, -20%, -10%, 0% (indicating no corruption), 10%, 20%, 30%, 40%, and 50%. Only the training set labels were corrupted, while the validation labels remained pristine. This was done to provide an upper bound on performance, as it simulates the best-case scenario in which the validation data are completely reliable but the training set is not. In total, 1,210 models were fully trained for this work: eleven TLEP levels × ten training runs × eleven network configurations (six with ImageNet pre-training and five trained from scratch).

We trained each classifier for a total of 150 epochs using the Adam optimization algorithm, starting with an initial learning rate of 10⁻⁵. If there was no improvement in the validation loss for ten successive epochs, we implemented a learning rate decay, reducing the learning rate by a factor of ten and continuing training. For the final evaluation of the test sets, we chose the model from the training process that achieved the lowest validation loss. All models were trained using binary focal loss28 with a minibatch size of thirty-two.

Each network (with the exception of CLIP_ViT-B_32) was trained using two different initialization schemes. The first scheme initialized the network with random weights using the definitions given in their respective sources, and the second scheme initialized weights from a model trained on the ImageNet dataset (a scenario we refer to as ImageNet pre-training). Since the CLIP_ViT-B_32 model is trained using text and imagery together, we only evaluated a pre-trained version of this model; our training dataset contains no text captions, so we are unable to train it from scratch.
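To make the TLEP corruption scheme described above concrete, the following is a minimal sketch; labels are encoded as 1 for target and 0 for background, and the function name, seeding, and data layout are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def corrupt_labels(labels: np.ndarray, tlep: float, seed: int = 0) -> np.ndarray:
    """Flip training labels according to a target label error proportion (TLEP).

    labels: array of 0 (background) and 1 (target).
    tlep:   e.g. -0.10 flips 10% of the target labels to background, while
            +0.20 flips a number of background labels equal to 20% of the
            target count over to target.
    """
    rng = np.random.default_rng(seed)
    corrupted = labels.copy()
    n_targets = int((labels == 1).sum())
    n_flip = int(round(abs(tlep) * n_targets))

    if tlep < 0:    # deficit: targets erroneously labeled as background
        idx = rng.choice(np.flatnonzero(labels == 1), size=n_flip, replace=False)
        corrupted[idx] = 0
    elif tlep > 0:  # excess: backgrounds erroneously labeled as target
        idx = rng.choice(np.flatnonzero(labels == 0), size=n_flip, replace=False)
        corrupted[idx] = 1
    return corrupted

# Example: 100 targets; TLEP = -10% leaves 90 target labels, +20% yields 120.
y = np.array([1] * 100 + [0] * 900)
print(int(corrupt_labels(y, -0.10).sum()))  # 90
print(int(corrupt_labels(y, +0.20).sum()))  # 120
```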
Classification performance was based on two metrics: AUCROC and AUCPR. AUCROC is a widely used metric for binary classification problems that quantifies the trade-off between true positive rate and false positive rate as the classification threshold is varied. AUCPR, on the other hand, plots precision against recall and is particularly informative for imbalanced datasets where the positive class is rare, a common scenario in the SAS target detection task. The synthetic dataset was evaluated solely using mean accuracy since it contains only positive (target) labels. It is defined as

Mean Accuracy = (1/T) Σ_{t=1}^{T} (1/N) Σ_{i=1}^{N} 1[ŷ_i(τ_t) = y_i],

where T is the total number of thresholds, N is the total number of images, ŷ_i(τ_t) is the predicted label for image i at threshold τ_t, y_i is the true label (always the target class for this dataset), and 1[·] is the indicator function that evaluates to 1 if the condition is true and 0 otherwise.

4 RESULTS

For all box and whisker plots: the box edges represent the lower and upper quartiles, the line inside indicates the median, and the triangle indicates the mean. The box encapsulates approximately 50% of the data, while the whiskers extend to 1.5 times the inter-quartile range, showcasing the data range excluding outliers. Outliers beyond this range, sometimes called fliers, are denoted as circles. For all plots where statistical testing is performed, we denote significance with * in the plot; we define significance as p < 0.01.

4.1 Network-Agnostic Results

Figure 2 presents a comprehensive performance summary across various TLEPs and pre-training schemes. Averaged results from the evaluated networks are shown, which reveal insights into the performance differences. For AUCROC, erroneously labeling backgrounds as targets (TLEP > 0) results in worse performance than the alternative. Conversely, for AUCPR, erroneously labeling targets as background (TLEP < 0) leads to worse performance than the alternative. Mean accuracy on the synthetic dataset shows a similar trend, where erroneously labeling targets as background (TLEP < 0) leads to lower performance than the alternative. Notably, the ImageNet pre-training scheme yields improved results over no pre-training (i.e., random weight initialization) for all TLEP levels.

In Figure 2, statistical testing was performed on each TLEP group using a Mann-Whitney U test. This test was used because of the unequal number of samples in the ImageNet pre-training and no pre-training groups (60 vs. 50 samples, respectively; recall that CLIP_ViT-B_32 is ImageNet pre-trained only). Our results show that each TLEP group exhibited a statistically significant improvement with ImageNet pre-training over no pre-training.

Observation 1: ImageNet pre-training improves classifier performance across most TLEPs and metrics compared to no pre-training. This finding underscores the importance of leveraging pre-training techniques to enhance the resilience of SAS classifiers to label noise and improve their overall effectiveness.

Observation 2: Erroneously labeling targets as background leads to inferior performance in terms of AUCPR and mean accuracy, while erroneously labeling backgrounds as targets results in poorer performance for AUCROC.

4.2 Network-Specific Results

Figures 3, 4, and 5 illustrate the performance values obtained from multiple deep network configurations across various TLEPs. Each network was trained ten times for each TLEP with a different random seed used before each training run. Each random seed yields a different train and validation split, and also varies the weight initialization for the non-pre-trained models.
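For reference, the three metrics reported in these figures can be computed per training run roughly as shown below. This is a hedged sketch using scikit-learn; the uniform threshold grid for mean accuracy and the use of average precision as the AUCPR estimate are assumptions, as the paper does not specify these implementation details.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate_run(y_true, scores, synthetic_scores, thresholds=None):
    """Compute AUCROC and AUCPR on the test set, and mean accuracy on the
    synthetic (all-target) set, for a single trained model.

    scores / synthetic_scores are predicted target probabilities in [0, 1].
    """
    aucroc = roc_auc_score(y_true, scores)
    aucpr = average_precision_score(y_true, scores)  # estimate of the area under the PR curve

    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)  # illustrative grid: T = 101 thresholds
    # Every synthetic image is a target, so accuracy at threshold tau is the
    # fraction of images scored above tau; mean accuracy averages over tau.
    mean_acc = float(np.mean([(synthetic_scores > t).mean() for t in thresholds]))
    return aucroc, aucpr, mean_acc

# Example with random scores:
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
scores = rng.random(1000)
print(evaluate_run(y_true, scores, rng.random(500)))
```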
Statistical testing was performed on each TLEP group by selecting the model with the largest mean of each metric and performing a Wilcoxon signed-rank test between this model and the remaining models. Next, a Bonferroni correction was applied to account for multiple comparisons.

Figure 3 shows AUCROC results of all the classifiers evaluated over a range of TLEPs. Erroneously labeling targets as background is more benign than labeling backgrounds as targets, regardless of ImageNet pre-training. Overall, ImageNet pre-training results in better performance than no pre-training, but its performance gains diminish as background samples are erroneously labeled as targets. The ConvNeXtSmall model exhibits statistically significant best performance for a given TLEP in two scenarios, both of which use ImageNet pre-training.

Figure 4 shows AUCPR results of all the classifiers evaluated over a range of TLEPs. Using ImageNet pre-training results in slightly better performance when backgrounds are erroneously labeled as targets than when targets are erroneously labeled as backgrounds. Also, performance over the label error proportions has an inverted U-shape, meaning performance degrades as the absolute proportion of label errors increases.

Figure 5 shows mean accuracy on the synthetic dataset of all the classifiers evaluated over a range of TLEPs. With ImageNet pre-training, labeling targets erroneously as backgrounds is more detrimental to performance than labeling backgrounds erroneously as targets. Overall, better performance is obtained with ImageNet pre-training than with no pre-training. We see that DenseNet121 obtains statistical significance for a given TLEP in five scenarios, all of which use no pre-training. However, its performance is eclipsed by EfficientNetV2M using ImageNet pre-training.

Figure 6 shows the distribution of each metric across all TLEPs for both initialization strategies and the various models. We see that DenseNet121, EfficientNetV2M, and ConvNeXtSmall stand out as models that exhibit statistically significant superior performance for some metrics and configurations, indicating these models are robust across a variety of TLEPs. Statistical analysis was conducted on each group by first selecting the model with the highest mean for each metric, then performing a Wilcoxon signed-rank test between this model and the others, followed by the application of a Bonferroni correction to adjust for multiple comparisons.

Observation 3: The ability to withstand label errors varies considerably among networks, both in terms of average performance and consistency in performance. Notably, ConvNeXtSmall and DenseNet121 stand out by demonstrating statistically significant superior performance compared to other networks under certain TLEPs.

Observation 4: When evaluating across all TLEPs collectively, certain models exhibit statistically significant improvement in robustness to erroneous labels: ConvNeXtSmall has the highest AUCROC regardless of pre-training and the highest AUCPR with pre-training; EfficientNetV2M has the highest mean accuracy and commendable AUCPR amongst pre-trained models; DenseNet121 has the highest AUCPR and mean accuracy amongst models trained from scratch.

Figure 2: A comprehensive summary of performance averaged over all the networks evaluated across different TLEPs and pre-training schemes. The figures illustrate the trends and impact of label errors on overall performance measured from several state-of-the-art deep networks.
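The per-TLEP significance testing described above could be implemented along the following lines. This is an illustrative sketch using SciPy, assuming each model contributes a paired vector of per-run metric values; the function name and data layout are assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_to_best(metric_by_model, alpha=0.01):
    """Compare the best-scoring model against the others.

    metric_by_model maps a model name to an array of paired metric values
    (e.g., one value per training run). The model with the largest mean is
    compared against every other model with a Wilcoxon signed-rank test, and
    p-values are Bonferroni-corrected for the number of comparisons.
    """
    best = max(metric_by_model, key=lambda m: np.mean(metric_by_model[m]))
    others = [m for m in metric_by_model if m != best]

    results = {}
    for m in others:
        _, p = wilcoxon(metric_by_model[best], metric_by_model[m])
        p_corrected = min(1.0, p * len(others))  # Bonferroni correction
        results[m] = (p_corrected, p_corrected < alpha)
    return best, results

# Example with made-up AUCROC values for three models over ten paired runs:
rng = np.random.default_rng(2)
scores = {
    "ConvNeXtSmall": 0.97 + 0.01 * rng.random(10),
    "DenseNet121": 0.95 + 0.01 * rng.random(10),
    "MobileNetV3Small": 0.90 + 0.02 * rng.random(10),
}
print(compare_to_best(scores))
```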
We see that AUCROC exhibits different failure characteristics than AUCPR and mean accuracy (see Observations 1 and 2). These results highlight the trade-off one must consider when selecting a performance metric in the midst of label noise.

5 CONCLUSION

Our study offers a comprehensive investigation into the impact of label noise on the performance of training deep-learning models for SAS image classification. Through empirical analysis, we have quantified the influence of erroneous labels on classifier performance, revealing distinct patterns for different evaluation metrics (see Observations 1-4 in the Results section). Finally, by shedding light on the specific effects of label noise and identifying models that exhibit greater resilience to such noise, this study lays a foundation for developing more robust and reliable SAS classification methods.

Figure 3: AUCROC for the evaluated networks over a range of TLEPs. Each configuration was run ten times using bootstrapping. We see that performance across networks varies greatly (see Observations 2 and 3).

6 ACKNOWLEDGMENTS

The authors thank the NATO Centre for Maritime Research & Experimentation (CMRE) for providing the data used in this work; the NATO Allied Command Transformation funded the data collection. This work was supported by the Office of Naval Research under grant N00014-20-1-2406, and in part by high-performance computer time and resources from the DoD High Performance Computing Modernization Program. Finally, the authors would like to express their gratitude to OpenAI's GPT-4 for its assistance in drafting parts of this document.

Figure 4: AUCPR for the evaluated networks over a range of TLEPs. Each configuration was run ten times using bootstrapping. We see that performance across networks varies greatly (see Observations 2 and 3).

Figure 5: Mean accuracy on the synthetic dataset for the evaluated networks over a range of TLEPs. Each configuration was run ten times using bootstrapping. We see that performance across networks varies greatly (see Observations 2 and 3).

Figure 6: A comprehensive analysis of performance, aggregated over all TLEPs, is presented for different models and initialization methods. Again, there are significant performance differences across models which are metric-dependent (see Observations 3 and 4).

7 REFERENCES

1. Oscar Bryan, Roy Edgar Hansen, Tom S. F. Haines, Narada Warakagoda, and Alan Hunter. Challenges of labelling unknown seabed munition dumpsites from acoustic and optical surveys: A case study at Skagerrak. Remote Sensing, 14(11):2619, 2022.
2. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
3. Görkem Algan and Ilkay Ulusoy. Image classification with deep learning in the presence of noisy labels: A survey. Knowledge-Based Systems, 215:106771, 2021.
4. Tom Burgert, Mahdyar Ravanbakhsh, and Begüm Demir. On the effects of different types of label noise in multi-label remote sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 60:1–13, 2022.
5. Davood Karimi, Haoran Dou, Simon K. Warfield, and Ali Gholipour. Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65:101759, 2020.
6. Xingjun Ma, Yisen Wang, Michael E. Houle, Shuo Zhou, Sarah Erfani, Shutao Xia, Sudanthi Wijewickrema, and James Bailey. Dimensionality-driven learning with noisy labels. In International Conference on Machine Learning, pages 3355–3364. PMLR, 2018.
7. Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. arXiv preprint arXiv:2010.01412, 2020.
8. Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. Co-teaching: Robust training of deep neural networks with extremely noisy labels. Advances in Neural Information Processing Systems, 31, 2018.
9. Geoff Pleiss, Tianyi Zhang, Ethan Elenberg, and Kilian Q. Weinberger. Identifying mislabeled data using the area under the margin ranking. Advances in Neural Information Processing Systems, 33:17044–17056, 2020.
10. Aritra Ghosh and Andrew Lan. Contrastive learning improves model robustness under label noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2703–2708, 2021.
11. Zhilu Zhang and Mert Sabuncu. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems, 31, 2018.
12. Eric Arazo, Diego Ortego, Paul Albert, Noel O'Connor, and Kevin McGuinness. Unsupervised label noise modeling and loss correction. In International Conference on Machine Learning, pages 312–321. PMLR, 2019.
13. Yunhao Yang and Andrew Whinston. Identifying mislabeled images in supervised learning utilizing autoencoder. In Proceedings of the Future Technologies Conference (FTC) 2021, Volume 2, pages 266–282. Springer, 2022.
14. Andrea Bellettini and Marc Pinto. Design and experimental results of a 300-kHz synthetic aperture sonar optimized for shallow-water operations. IEEE Journal of Oceanic Engineering, 34(3):285–293, 2009.
15. David P. Williams. The Mondrian detection algorithm for sonar imagery. IEEE Transactions on Geoscience and Remote Sensing, 56(2):1091–1102, 2018.
16. Christophe Schlick. Quantization techniques for visualization of high dynamic range pictures. In Photorealistic Rendering Techniques, pages 7–20. Springer, 1995.
17. Isaac D. Gerg and Vishal Monga. Preliminary results on distribution shift performance of deep networks for synthetic aperture sonar classification. In OCEANS 2022, Hampton Roads, pages 1–9, 2022.
18. Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
19. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
20. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
21. Mingxing Tan and Quoc Le. EfficientNetV2: Smaller models and faster training. In International Conference on Machine Learning, pages 10096–10106. PMLR, 2021.
22. Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1314–1324, 2019.
23. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
24. Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
25. Irwan Bello, William Fedus, Xianzhi Du, Ekin Dogus Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, and Barret Zoph. Revisiting ResNets: Improved training and scaling strategies. Advances in Neural Information Processing Systems, 34:22614–22627, 2021.
26. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
27. Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022.
28. Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.

A RANDOM SELECTION OF IMAGES FROM THE SYNTHETIC DATASET

Figure 7: A subset of sixty-four randomly selected images from the synthetic dataset, showcasing the diversity and variability of the images generated via the gradient domain blending procedure of17. Images selected by random.shuffle(image_list); image_list[:64].