

Improving the realistic rendering of artificial sonar images using Cycle Generative Adversarial Networks

 

Zamirddine Mari

 

Department of DGA Techniques Navales, Toulon, VAR, 83000, FRANCE; zamirddine.mari@def.gouv.fr

 

Eric Monteux and Yann-Hervé Hellouvry

 

SCALIAN-DS, Rennes, Bretagne, 35700, FRANCE; eric.monteux@scalian.com; yann-herve.hellouvry@scalian.com

 

Lionel Pibre and Jérôme Pasquet

 

Université de Montpellier, TETIS-Inrae, AgroParisTech, Cirad, CNRS, Montpellier, Hérault, 34090, FRANCE; lionel.pibre@univ-montp3.fr; jerome.pasquet@univ-montp3.fr

 

The advances of recent decades in marine technologies have enabled the development of acoustic imaging systems capable of generating high-resolution imagery of the underwater environment. Based on deep learning approaches, the collected data are used to develop Automatic Target Recognition (ATR) algorithms that detect suspicious objects on the seafloor and classify each as an object of interest (e.g., a mine) or not. However, because obtaining labelled underwater images demands time and effort, applying deep-learning-based approaches in the underwater environment remains a challenge due to the scarcity of training data. This paper presents work on improving the realism of synthetic sonar data generated by a simulator capable of massively producing labelled images. The simulator is based on numerical modeling of the propagation, reverberation and reflection of acoustic waves in an artificial underwater environment. The gap in realism with respect to real sonar images is reduced by applying Cycle Generative Adversarial Networks. We then study the successive addition of a growing quantity of these refined synthetic data to a limited-size set of real training sonar images, and the impact of this data augmentation on the performance of a CNN-based object detector.

 

1. INTRODUCTION

 

The maritime operation of revealing the presence of suspicious objects on the seabed is an activity of prime importance, whether for defense purposes or for civil activities such as the exploitation of marine resources. This is particularly true, for example, in the field of offshore wind farms in the North Sea, an area well known for the presence of unexploded ordnance (UXO) dating from past military conflicts. The advent of increasingly high-resolution sonar sensors, combined with the development of autonomous underwater vehicles (AUVs), makes it possible to collect sonar images in large quantities, which need to be processed by automatic algorithms dedicated to the detection and classification of mines and other dangerous objects (Figure 1).

 

In recent years, the development of these algorithms, called ATR for Automatic Target Recognition, has relied on deep learning techniques. These techniques are the most efficient image classification methods, but to take advantage of this efficiency it is essential to have large computing capacity and, above all, a large quantity of sample images representing, in a balanced way, the diversity of cases in the field under study.

 

 

Figure 1: Example of side-scan sonar image containing a mine-like target

 

In the field of the detection and classification of objects in sonar images, building a training image base produced from sea trials and containing a good representation of the data of the problem proves to be a real challenge. This is primarily due to the means used to acquire sonar images. The technological advances of recent years in AUVs and sonar imaging certainly make it possible to survey large expanses of seabed with resolution levels close to one centimeter. However, access to these means carries a significant financial cost, and their implementation requires a good level of experience and technical skill. The other problem is the capacity to acquire, in balanced proportions, and then annotate in the images the diversity of the targets and their multiple configurations of viewing angle and distance to the sonar, deposited with precise positioning on the diversity of seabed types. Very cluttered and uneven backgrounds present a complex texture in the images that causes the algorithms the most difficulty. Cases of objects placed on these types of bottoms should therefore be well represented in the base of training examples. However, they are technically more difficult to acquire, because the operations of depositing and recovering these objects are more delicate on such bottoms than on flat, unobstructed ones. Finally, the marine environment can often be hostile, especially when the weather is unfavourable; it is common for difficult weather conditions to cause the delay or abandonment of an image survey operation. Given all these difficulties, training image databases in the field of object detection and classification in sonar images may suffer from too low a volume of data, or too great an imbalance in relation to the diversity of objects of interest and seabed types.

 

The simulation of digital images is a field of research that can prove to be an interesting alternative for contributing to the construction of deep learning image sets. High-performance computing techniques applied to the mathematical and numerical modeling of underwater acoustics allow the production of synthetic sonar images in very large quantities, with the advantage of automatically associating each with a ground truth corresponding to the exact position of all objects of interest in the images. However, it is currently not possible to use these images directly for training object detection and classification algorithms, because their rendering differs too greatly from reality. In an attempt to overcome this problem and benefit from the advantages of the massive production of annotated computer-generated images, the literature [1] offers avenues aimed at improving the quality of the images using a function developed during a learning phase. This function takes as input paired data, for example a real image and a simulated image containing the same semantics, and tries to learn the succession of transformations needed to pass from the simulated image to the real image. The approach applied in this article is called CycleGAN [2]. It makes it possible to transform an image from one representation into another without requiring a preliminary pairing. In our case, the objective is to transform a synthetic image produced by an underwater acoustic signal simulator into a more realistic image that respects the semantics and content of the image and the characteristic noise of sonar images.

 

The rest of this paper is organized as follows. Section II describes the CycleGAN technique and the sonar image simulator used in this study, and presents the methods dedicated to the evaluation of the different simulated images. Section III describes the approach using CycleGAN to transform simulated images into realistic sonar images, and how the final simulated images are used to train an object detector. Section IV presents the experimental results. Finally, Section V summarizes the conclusions of the paper.

 

2. BACKGROUND

 

A. SONAR IMAGE SIMULATOR

 

Simulating a sonar imaging system requires modelling a set of complex underwater physical processes that influence image formation, such as multipath sound propagation, scattering and diffraction phenomena, and target characteristics. Several approaches can be used in sonar simulation, but one of the most widely used is ray tracing. It is a high-frequency approximation that estimates the signal response by emitting acoustic rays from the sound source and tracing them through 3D space, recording their interactions with objects and surfaces to calculate the contribution of each ray. Among the works based on this technique, we can cite Bell [3], which adapts optical ray tracing to side-scan sonar imagery. Coiras and Groen [4] use the frequency domain to produce synthetic aperture sonar frames; to create the acoustic image, they calculate the Fourier transform of the acoustic pulse used to insonify the scene. Saç et al. [5] also apply the frequency domain to the ray-tracing calculation to simulate a forward-looking sonar. Three main quantities are taken into account when a ray strikes an object in 3D space: the Euclidean distance to the sonar axis, the intensity of the signal returned by a Lambert illumination model together with the surface normal, and reflection and shadow phenomena. A different approach from ray tracing, called tube tracing, was introduced by Guériot and Sintes [6]. It is based on a volume approach in which the energy interacting with the scene and collected by the sonar in reception is propagated as series of acoustic tubes always orthogonal to the current sonar view. Reverberation and surface irregularities of objects are also handled.
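As an illustration of how such a ray tracer scores each ray, the sketch below combines the three quantities just listed: the range to the sonar, Lambertian backscatter, and the surface normal. It is a minimal sketch under simplifying assumptions; the function name, the spreading and absorption models and all values are ours, not those of any of the cited simulators.

```python
import numpy as np

def ray_return(hit_point, surface_normal, sonar_pos, absorption=0.1):
    """Illustrative per-ray contribution for a ray-traced sonar simulator."""
    # Vector from the hit point back towards the sonar (the receive path).
    to_sonar = sonar_pos - hit_point
    r = np.linalg.norm(to_sonar)                # slant range
    to_sonar = to_sonar / r

    # Lambert's law: backscattered intensity scales with the cosine of
    # the angle between the surface normal and the direction to the sonar.
    cos_theta = max(0.0, float(np.dot(surface_normal, to_sonar)))

    # Two-way spherical spreading (~1/r^4 in intensity) and a simple
    # exponential absorption term along the two-way path.
    spreading = 1.0 / max(r, 1e-6) ** 4
    attenuation = np.exp(-2.0 * absorption * r)
    return cos_theta * spreading * attenuation

# Example: a flat bottom patch 30 m below and 40 m across from the sonar.
print(ray_return(np.array([40.0, 0.0, -30.0]),
                 np.array([0.0, 0.0, 1.0]),
                 np.array([0.0, 0.0, 0.0])))
```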

 

This article presents a sonar image simulator called SIMSON (Figure 2), based on ray tracing and developed jointly by the research centre of ENSTA Bretagne and the SCALIAN DS company. SIMSON makes it possible to model a side-scan or sector-scan imaging sonar, and to define a 3D model of the seabed and of the objects placed on it, each associated with an acoustic absorption coefficient. To optimize the calculations, SIMSON favors a stochastic method, known as sonel mapping, which uses a Monte Carlo approach to select the type of interaction between each ray and the surfaces it strikes, while handling multipath and diffraction effects. Simulated images are produced at very high speed thanks to a full GPU implementation based on the CUDA toolkit and the NVIDIA OptiX ray tracer.
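The sonel-mapping idea can be illustrated with a toy Russian-roulette choice of ray/surface interaction; the probabilities and names below are illustrative assumptions and do not reflect SIMSON's actual implementation.

```python
import random

def choose_interaction(p_specular, p_diffuse, p_absorbed):
    """Toy Monte Carlo selection of the interaction a ray undergoes when
    it strikes a surface; probabilities are per-material assumptions."""
    assert abs(p_specular + p_diffuse + p_absorbed - 1.0) < 1e-9
    u = random.random()
    if u < p_specular:
        return "specular_reflection"   # mirror-like bounce, ray continues
    if u < p_specular + p_diffuse:
        return "diffuse_scattering"    # Lambertian re-emission
    return "absorbed"                  # ray terminated

print(choose_interaction(0.3, 0.5, 0.2))
```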

 

 

Figure 2: Screenshot of the SIMSON software’s user interface

 

B. CYCLEGAN

 

From a general point of view, generative adversarial networks, or GANs [7], are a specific type of deep neural network that can synthesize new images. The general principle consists in training two deep convolutional neural networks simultaneously and antagonistically: a generative model G and a discriminative model D. The generative model learns to generate new images from an input image set, while the discriminative model evaluates how plausibly a generated image belongs to that set; the error is then back-propagated to optimize both models simultaneously. The first approach using an input image to synthesize new images was the pix2pix method [8]. It learned to transfer the style of an image from one domain to another based on paired training samples. More recently, Zhu et al. [9] introduced the CycleGAN method, or Cycle-Consistent GAN, which learns transformations from one domain to another and vice versa from unpaired sets of images. For two domains X and Y, CycleGAN learns the transformations G: X→Y and F: Y→X. For the transformation G: X→Y and its discriminator Dy, G tries to generate images G(x) that resemble the real images y in the Y domain, while Dy tries to distinguish the generated samples G(x) from the real samples y. A similar adversarial loss is defined for the mapping F: Y→X and its discriminator Dx. The novelty was the idea that these transformations should be inverses of each other, i.e., that both should be bijections. This is achieved through a new cycle-consistency loss, which encourages F(G(x)) ≈ x and G(F(y)) ≈ y, thus forcing the two transformations G and F to perform geometrically consistent image-to-image translation between the two domains without paired data. In other words, CycleGAN can translate from one domain to another without a one-to-one mapping between the source and target domains. The advantage is that it allows a large set of new image data to be created without requiring prior image annotation. In the rest of the paper, the method is used to make the sonar images from the SIMSON simulator more realistic: domain X is represented by sonar images from the SIMSON simulator, and domain Y by real sonar images.
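For reference, the objective optimized by CycleGAN [9] combines the two adversarial losses with the cycle-consistency term, where λ weights the latter (Zhu et al. use λ = 10):

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\!\left[\lVert F(G(x)) - x \rVert_1\right]
+ \mathbb{E}_{y \sim p_{\text{data}}(y)}\!\left[\lVert G(F(y)) - y \rVert_1\right]

\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
+ \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
+ \lambda\, \mathcal{L}_{\text{cyc}}(G, F)
```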

 

C. YOLO V5

 

Standing for You Only Look Once, YOLO is an object detection method well known for its accuracy, speed, and ability to detect objects in a single pass over the image. Initially released in 2016, the algorithm continued to evolve under different teams up to its fifth version, the one we use in this paper. The first three versions, YOLOv1 [10], YOLOv2 [11] and YOLOv3 [12], were all published by Joseph Redmon. The third version is recognized as a milestone, with great improvements in performance and speed obtained by providing multi-scale features (FPN) [13] and the Darknet-53 backbone network, and by replacing the soft-max loss with a binary cross-entropy loss. After this version, YOLOv4 [14] was released in 2020 by a new research team that explored further improvements, including the backbone and what they call bags of freebies and bags of specials. A month later, YOLOv5 [15] was released by yet another team, which dramatically reduced the model size, increased the speed [14] and provided a full implementation in Python (PyTorch). In the rest of this paper, YOLOv5 is used to evaluate to what extent adding simulated sonar images, whose realism has been improved by CycleGAN, to a small set of real images improves the performance of object detection on real sonar images.

 

3. METHODOLOGY

 

A. DATASETS AND EXPERIMENTAL SETUP

 

Real sonar image set

 

The real sonar image dataset is based on a SAS sonar image set containing 879 views of different mine-type objects represented on different seafloors, grouped into two categories: flat seafloors (different types of seabed sediments) and complex seafloors (sand ripples, boulders). Figure 3 below shows the 4 types of targets contained in the dataset, and Figure 4 presents their proportions in this dataset.

 

 

Figure 3: Example sonar mugshots of targets contained in the real image dataset

 

SIMSON simulator output images

 

The simulation consists in moving a sonar, based on the Klein 5000 sonar model, through a 3D environment presenting different types of seafloor on which 3D models of 4 different types of targets are deposited. These seafloors are artificially generated by superposing slabs, each defining a single type of material (sediment, gravel or sand) characterized by a level of granularity and backscatter. The terrain irregularities and landforms, as well as the sand ripples characteristic of natural seabeds, are reproduced through noise modules defining each slab; slab types are then composed by superposition to create troughs and elevations of terrain. The presence of isolated or clustered rocks is rendered using 3D models of rocks. A schematic sketch of this slab-superposition principle is given below.
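The sketch composes two noise-defined material slabs under a large-scale relief mask; the noise model and all parameter values are our own assumptions, not SIMSON's terrain modules.

```python
import numpy as np

def value_noise(shape, scale, rng):
    """Cheap value noise: bilinear interpolation of a coarse random grid."""
    coarse = rng.random((shape[0] // scale + 2, shape[1] // scale + 2))
    ys = np.linspace(0, coarse.shape[0] - 1.001, shape[0])
    xs = np.linspace(0, coarse.shape[1] - 1.001, shape[1])
    y0, x0 = ys.astype(int), xs.astype(int)
    fy, fx = ys - y0, xs - x0
    top = coarse[y0][:, x0] * (1 - fx) + coarse[y0][:, x0 + 1] * fx
    bot = coarse[y0 + 1][:, x0] * (1 - fx) + coarse[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy)[:, None] + bot * fy[:, None]

rng = np.random.default_rng(0)
shape = (512, 512)
# Each "slab" is one material with its own granularity and backscatter
# level (illustrative values, not the simulator's parameters).
sediment = 0.3 + 0.05 * value_noise(shape, 64, rng)   # smooth, dark
gravel   = 0.5 + 0.20 * value_noise(shape, 8, rng)    # grainy, brighter
relief   = value_noise(shape, 128, rng)               # troughs and elevations

# Superpose the slabs: the large-scale relief decides which material
# dominates at each point of the artificial seabed.
seabed = np.where(relief > 0.5, gravel, sediment)
```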

 

 

Figure 4: The proportion of different targets contained in the real image dataset

 

The simulation settings allow the sonar trajectory to vary in altitude relative to the bottom, in distance from a given target, and in heading relative to the bottom. In addition, it is possible to vary the burial level of the target and its viewing angle relative to the sonar sensor. This process has enabled the generation of nearly 20,000 synthetic sonar image rails. Each rail represents a bottom area of approximately 500 metres by 150 metres with a resolution of 10 centimetres along-track and across-track (Figure 5). The simulation also produces and associates with each image a precise ground truth giving the location and delimitation of the echo and shadow of each target, as well as the type of bottom on which the target is placed.

 

 

Figure 5: Example of simulated side-scan sonar image of approximately 500 meters by 150 meters

 

For the rest of the study, the simulator provides us with a synthetic sonar dataset composed of around 180,000 views of targets representing the 4 target types of the real image set, spread over the 2 types of seafloor: flat and complex (Figure 6 and Figure 7).

 

 

Figure 6: Example sonar mugshots of targets contained in the simulated image dataset

 

Real versus synthetic images comparison (FID score/t-SNE) before and after histogram matching

 

Before considering the use of the CycleGAN method to try to improve the realism of the images from the simulator, we evaluated the gap between these images and our set of real images.

 

 

Figure 7: The proportion of different targets contained in the simulated image dataset

 

Firstly, by construction, the real images have a resolution five times higher than that of the computer-generated images. Secondly, based on our analysis of three general characteristics of an image (the distribution of pixel values, the texture elements and the shapes), we note that the computer-generated images are much brighter than the real images and have higher contrast. The background texture characteristic of speckle noise in sonar images has a different visual appearance, accentuated by the difference in resolution between the two types of images.

 

Finally, we note that the shapes and contours of the echo and shadow of the targets are more pronounced in the simulated images. To support our analysis of the difference in similarity between the real images and those directly output by the simulator, we measured the distance between the two datasets using the FID score [16] (Table 1 below) and plotted the real and simulated distributions using the t-SNE method [17] (Figure 12). If we focus on the red points (synthetic dataset) and the green points (real dataset), we can see that the result confirms the quite marked differentiation observed in the visual analysis of the images.
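For reference, the FID is the Fréchet distance between Gaussians fitted to the feature embeddings of the two datasets [16]. A minimal sketch, assuming the embeddings (e.g., Inception activations) have already been extracted, is:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(features_a, features_b):
    """Frechet Inception Distance between two (N, d) arrays of embeddings."""
    mu_a, mu_b = features_a.mean(axis=0), features_b.mean(axis=0)
    cov_a = np.cov(features_a, rowvar=False)
    cov_b = np.cov(features_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):     # numerical noise can produce tiny
        covmean = covmean.real       # imaginary parts; drop them
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a + cov_b - 2.0 * covmean))
```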

 

Before applying the CycleGAN method, we decided to evaluate the contribution of a pre-processing step called histogram matching [18], which aims to bring the distribution of pixel values of the simulated images closer to that of the real data.
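A minimal sketch of this pre-processing using scikit-image's match_histograms, with random placeholder arrays standing in for one simulated and one real image:

```python
import numpy as np
from skimage.exposure import match_histograms

# Placeholder arrays; in practice these are loaded from the two datasets.
rng = np.random.default_rng(0)
simulated = rng.normal(0.7, 0.10, (512, 512))  # brighter, higher contrast
reference = rng.normal(0.4, 0.05, (512, 512))  # darker real-image statistics

# Remap the simulated grey levels so that their cumulative histogram
# follows that of the real reference image [18].
matched = match_histograms(simulated, reference)
```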

 

 

Table 1: Real and simulated datasets comparison based on the FID score

 

As we can see in Figure 8, this process effectively brings the visual rendering of the simulated images closer to that of the real images.

 

 

Figure 8: Visualization of a sonar image from the SIMSON simulator (a), a simulated sonar image after histogram matching (b) and a real sonar image (c)

 

However, the difference in contrast between the objects and the background, as well as the background texture, is still very marked. Figure 9 shows the resulting histogram transformation of a simulated sonar image.

 

 

 

Figure 9: Visualization of the histograms of a real sonar image (a), a sonar image from the SIMSON simulator (b) and a simulated sonar image after histogram matching (c)

 

The analysis of the FID scores of the different sets of images, in Table 3, shows that the histogram matching applied to the simulated data reduces the dissimilarity with the real data only negligibly.

 

The t-SNE visualizations in Figure 12 also allow us to observe the impact of histogram matching (blue) on the similarity of the simulated (red) and real (green) datasets. As can be seen, the histogram matching method is not very effective in bringing the simulated dataset closer to the real dataset in the representation space: the data after histogram matching overlap the simulated data.
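A minimal sketch of how such a t-SNE projection [17] can be produced with scikit-learn, using random placeholder features in place of the actual image embeddings:

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder feature sets standing in for the real, simulated, and
# histogram-matched datasets (in practice, image embeddings).
rng = np.random.default_rng(0)
real      = rng.normal(0.0, 1.0, (200, 64))
simulated = rng.normal(2.0, 1.0, (200, 64))
matched   = rng.normal(1.8, 1.0, (200, 64))
features = np.vstack([real, simulated, matched])

# 2D embedding; nearby points indicate similar feature vectors,
# as in Figures 12 and 14.
embedding = TSNE(n_components=2, perplexity=30.0, init="pca",
                 random_state=0).fit_transform(features)
print(embedding.shape)   # (600, 2)
```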

 

B. SYNTHETIC RENDERING IMPROVEMENT USING CYCLEGAN

 

To improve the realistic rendering of our simulator images, we followed the process illustrated in Figure 10. As mentioned above, we started by applying a histogram matching pre-processing to these images to produce an intensity dynamic close to that of a real image. We then applied the CycleGAN training procedure. As mentioned in Section II-B, the strength of this procedure is that it does not require any prior annotation of the training images. We therefore proceeded as follows to build our two sets of unpaired real and simulated images. First, we produced 1790 real images of size 512x512 pixels by extracting two randomly positioned image rectangles around each of our 879 target views presented in Section III-A (a sketch of this extraction is given below). Then, we selected the same number of simulated images of size 512x512 pixels. As a basis for the implementation, we used the TensorFlow 2 implementation, which uses the U-Net generator model [19]. The training was carried out for 100 epochs, with the result saved every 5 epochs. The results of the training and tests are discussed in Section IV.
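A minimal sketch of the crop-extraction step, assuming each target is given by its pixel coordinates; the function name and sampling details are our own assumptions:

```python
import numpy as np

def random_crops_around_target(image, target_xy, n_crops=2, size=512,
                               rng=None):
    """Extract n_crops size x size rectangles, each randomly positioned so
    that the target at pixel (x, y) stays inside the crop. Assumes the
    image is at least `size` pixels in each dimension."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    x, y = target_xy
    crops = []
    for _ in range(n_crops):
        # Choose a top-left corner such that the crop contains the target
        # and stays within the image bounds.
        x0 = rng.integers(max(0, x - size + 1), min(w - size, x) + 1)
        y0 = rng.integers(max(0, y - size + 1), min(h - size, y) + 1)
        crops.append(image[y0:y0 + size, x0:x0 + size])
    return crops

# Example on a dummy sonar rail with a target at pixel (1200, 300).
rail = np.zeros((1500, 5000), dtype=np.float32)
patches = random_crops_around_target(rail, (1200, 300))
print([p.shape for p in patches])   # [(512, 512), (512, 512)]
```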

 

 

Figure 10: Schematic representation of the refined simulated sonar image generation pipeline

 

C. APPLICATION TO OBJECT DETECTION

 

As explained in Section II-C, which introduces the YOLO detector, we use an implementation of this detector to evaluate to what extent adding simulated sonar images, enhanced by CycleGAN, to a small set of real images improves the performance of object detection on real sonar images. For our experiments, we used a CycleGAN trained for 5 epochs. The simulated images used for model training are those considered most realistic by the CycleGAN discriminator.

 

To do this, we train the detector successively on 8 different sets of images of size 512x512 pixels, each trained with 3-fold cross-validation. Table 2 presents the distribution of the 8 image sets. Performance evaluation for each of the 8 training image sets was performed on a set of real images belonging to each cross-validation fold. For each model, we used a validation base composed of 64 real images and a test base composed of 165 real images. Each of these images contains targets precisely located by a bounding box. The evaluation was based on the calculation of detector precision and recall. Recall measures how well the detector identifies all true positives: of all targets actually present, it tells us how many the detector correctly detected. It is calculated by dividing the number of correctly labelled bounding boxes by the total number of true bounding boxes.

 

Precision tells us how many of all the detections the detector made are actual targets. It is calculated by dividing the number of correctly labelled bounding boxes by the total number of bounding boxes predicted by the detector. Recall and precision use the Intersection over Union (IoU) as a threshold to determine what constitutes a detection. The IoU measures the percentage of overlap between the predicted bounding box area and the ground truth area, to determine whether, and to what extent, the correct object is actually detected. If a correctly labelled object does not have a sufficiently high IoU, it is counted as a false negative. The YOLO detector also uses a confidence threshold to determine when a prediction should be considered valid. The results of the training and testing are discussed in the next section.
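A minimal sketch of these computations, using greedy IoU matching and the common 0.5 IoU threshold (the thresholds actually applied in our experiments are those of the YOLOv5 evaluation, not this sketch):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """A prediction is a true positive if it overlaps a not-yet-matched
    ground-truth box with IoU above the threshold."""
    matched, tp = set(), 0
    for pred in predictions:
        for i, gt in enumerate(ground_truths):
            if i not in matched and iou(pred, gt) >= iou_threshold:
                matched.add(i)
                tp += 1
                break
    fp = len(predictions) - tp          # spurious detections
    fn = len(ground_truths) - tp        # missed targets
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truths else 0.0
    return precision, recall

print(precision_recall([(10, 10, 60, 60)], [(12, 8, 58, 62)]))  # (1.0, 1.0)
```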

 

 

Table 2: Table describing the different detector training datasets

 

4. EXPERIMENTAL RESULTS

 

A. CYCLEGAN RESULTS

 

In Figure 11, we can compare a real image with a simulator image and the result after passing through the CycleGAN. As we can see, the clean echo in the simulator image has been deconstructed by the CycleGAN. However, according to the t-SNE representation (Figure 12) and the FID score (Table 3), the synthetic image after CycleGAN has come closer to the real images in terms of similarity.

 

Unfortunately, the approach is rather unstable. We believe this is due to the difficulty of gathering, for training, enough real and simulated images sharing the same context.

 

 

Figure 11: Visualization of real sonar image (a), a simulated sonar image after histogram matching (b) and a sonar image after the CycleGAN transformation (c)

 

Considering Figure 13, at epoch 10 we do not observe any particular problems, but at epoch 20 we see the appearance of inappropriate shadows. At epoch 30, we note the appearance of inappropriate echoes, and also the disappearance of the target's shadow. At epoch 40, the appearance of inappropriate shadows and echoes is even more dramatic.

 

 

Figure 12: The feature vectors t-SNE visualization of real datasets, a simulated dataset after histogram matching and a simulated dataset after CycleGAN

 

 

Table 3: FID score comparison between real dataset and simulated datasets after different processes.

 

Moreover, by applying a t-SNE representation to the CycleGAN outputs from epoch 1 to epoch 15, we notice that from epoch 5 onward there is no further improvement. Indeed, as Figure 14 shows, after epoch 5 the output of CycleGAN no longer moves closer to the real datasets.

 

 

Figure 13: Visualization of a simulated sonar image after the CycleGAN transformation at 10 epochs (a), 20 epochs (b), 30 epochs (c) and 40 epochs (d)

 

 

Figure 14: Visualization of t-SNE feature vectors of real datasets as well as CycleGAN simulated datasets from epoch 1 to epoch 15

 

B. ANALYSING THE DETECTOR PERFORMANCE AGAINST THE DIFFERENT DATASET CONFIGURATIONS

 

 

Figure 15: Area under the precision-recall curve of the different datasets of the experiment

 

Figure 15 presents, for each of the training sets of our experiment, the area under the precision-recall curve as the detection threshold varies between 0 and 1.

 

The precision-recall curve helps us find the best compromise between high precision and high recall, since both cannot be maximized at the same time. The area under the curve is a good metric for evaluating which training set achieves the best compromise. The training sets are ordered starting with the one giving the largest area under the curve. As can be seen, the results improve up to the training base "Real images + 4x simulated images"; when more simulated images are added, the results decrease. This is explained by the fact that, even though the simulated data have been improved, they remain far from the real images. Moreover, the simulated images used to train the model are those considered most realistic by the CycleGAN discriminator; thus, the more simulated images are used during training, the less realistic the images added to the training base become. The precision-recall curves for each of the image sets in our experiment are shown in Figure 16. The first observation is that the original real data set has the worst AUC value.
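A minimal sketch of how such an area under the precision-recall curve can be obtained by sweeping the confidence threshold over matched detections; the variable names are ours, not YOLOv5 internals:

```python
import numpy as np
from sklearn.metrics import auc

def pr_auc(confidences, is_true_positive, n_ground_truths):
    """Area under the precision-recall curve. `confidences` holds one
    score per detection; `is_true_positive` flags, after IoU matching,
    whether each detection hit a ground-truth box."""
    order = np.argsort(-np.asarray(confidences))
    hits = np.asarray(is_true_positive, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    precision = tp / (tp + fp)
    recall = tp / n_ground_truths
    # Prepend the (recall=0, precision=1) endpoint before integrating.
    return auc(np.concatenate([[0.0], recall]),
               np.concatenate([[1.0], precision]))

# Example: 4 detections, 3 ground-truth targets.
print(pr_auc([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 1], 3))
```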

 

It can be established that adding simulated images, or images transformed from the original images, to the training set does tend to improve the performance of the detector. However, we do not observe a linear increase of the area under the curve with the size of the training dataset.

 

 

Figure 16: Representation of the precision-recall curves for each of the image sets in the experiment

 

Several factors may explain this. Firstly, the small number of real target images available for this study meant that all these images, whatever the type of target and seabed, had to be presented together during the training of the CycleGAN. Ideally, specific CycleGAN training datasets should be constructed for each target type, or even each seabed type.

 

Secondly, the realism of the simulator output images should also be improved, in order to reduce the discrepancy with the real images before the CycleGAN comes into play.

 

5. CONCLUSION

 

In this work, we have presented an approach that combines a massive sonar image generation tool, based on numerical modelling of underwater acoustic phenomena, with the GAN technique called CycleGAN. This approach offers interesting research avenues for augmenting the training databases of ATR algorithms, which may suffer from difficult access to training data. Indeed, we have shown that the performance of a detector can be improved by combining a small amount of real data with simulated data that can be massively produced. However, our approach has several limitations. As shown in Section IV-A, the output of our CycleGAN can exhibit failure cases, including inappropriate echoes and shadows or even the disappearance of target echoes. We believe that some of these errors could be mitigated by creating simulator image generation and CycleGAN refinement pipelines specific to a given target type, but this would require acquiring a larger number of real images of each of these targets. In future work, we plan to experiment with this approach. We also plan to experiment with improvements to the CycleGAN architecture and to investigate more domain-specific parameterisations of sonar images.

 

ACKNOWLEDGMENTS

 

This work was part of the research activities of the French Armaments Procurement Agency (Direction Générale de l'Armement, DGA). It exploits the results of two projects supported by the DGA. The first is the DGA RAPID SIMSON project (SONar image simulation), led by SCALIAN DS and ENSTA Bretagne (École nationale supérieure de techniques avancées Bretagne). The second is a study contract between DGA Techniques navales and the UMR TETIS (Unité Mixte de Recherche pour les Territoires et l'Environnement par la Télédétection et l'Information Spatiale).

 

REFERENCES

 

  1. M. Cha et al., "Improving SAR automatic target recognition using simulated images under deep residual refinements," in Proc. IEEE ICASSP, 2018.
  2. J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE ICCV, 2017.
  3. J. M. Bell, "Application of optical ray tracing techniques to the simulation of sonar images," Optical Engineering, 36(6):1806-1813, 1997.
  4. E. Coiras and J. Groen, "Simulation and 3D reconstruction of side-looking sonar images," in S. Silva (ed.), Advances in Sonar Technology, chap. 1, InTech, 2009, pp. 1-15.
  5. H. Saç, K. Leblebicioğlu, and G. Bozdağı Akar, "2D high-frequency forward-looking sonar simulator based on continuous surfaces approach," Turkish Journal of Electrical Engineering and Computer Sciences, 23(1):2289-2303, 2015.
  6. D. Guériot and C. Sintes, "Forward looking sonar data simulation through tube tracing," in Proc. MTS/IEEE OCEANS Conference, 2010, pp. 1-6.
  7. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.
  8. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE CVPR, 2017.
  9. J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE ICCV, 2017.
  10. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proc. IEEE CVPR, pp. 779-788, Las Vegas, NV, United States, 2016.
  11. J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proc. IEEE CVPR, pp. 7263-7271, Honolulu, HI, United States, 2017.
  12. J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.
  13. T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE CVPR, pp. 2117-2125, Honolulu, HI, United States, 2017.
  14. A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: optimal speed and accuracy of object detection," 2020, https://arxiv.org/abs/2004.10934.
  15. Ultralytics, "YOLOv5," February 2021, https://github.com/ultralytics/yolov5.
  16. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Advances in Neural Information Processing Systems, pp. 6626-6637, 2017.
  17. L. J. P. van der Maaten and G. E. Hinton, "Visualizing high-dimensional data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579-2605, Nov. 2008.
  18. X. D. Yang, Q. Xiao, and H. Raafat, "Direct mapping between histograms: an improved interactive image enhancement method," in Proc. IEEE International Conference on Systems, Man, and Cybernetics, vol. 1, pp. 243-247, 1991.
  19. The TensorFlow implementation of CycleGAN: https://www.tensorflow.org/tutorials/generative/cyclegan