Optimization of underwater acoustic detection of marine mammals and ships using CNN

Benedicte Dommergues, Erica Cruz and Guilherme Vaz

blueOASIS, Lisbon, Mafra, 2770-071, PORTUGAL; bdommergues@blueoasis.pt; ecruz@blueoasis.pt; gvaz@blueoasis.pt

Proc. Mtgs. Acoust. 47, 070012 (2022); doi: 10.1121/2.0001608. Published by the Acoustical Society of America.

Due to the intensification of the exploitation of the oceans, it is becoming crucial to monitor marine life better, and at a larger scale, before potentially impacting operations, and to monitor the evolution of the ambient noise in known habitats. This is usually done with hydrophones deployed underwater in the vicinity of the site of interest. However, the processing of the collected data is a laborious task, where an expert must listen to and look at each time frame to ensure the quality of the results. Automated sound identification has boomed with the progress of machine learning, and in particular computer vision. Indeed, models based on Convolutional Neural Networks (CNN) can now identify marine mammals or other ambient sounds in record time. However, the generalization of such models to multiple sites is barely studied. This paper trains a simple CNN with the ShipsEar and DOSITS datasets to identify large vessels, small vessels, dolphins, and background noise. The model is then validated with a sample of the REP(MUS) dataset. Three types of optimizations are proposed to improve the model's performance on REP(MUS): increasing the data variability, tuning the time discretization, and a Bayesian optimization of the hyperparameters.

1. INTRODUCTION

The development of the maritime industry, the increase in ship traffic and the growing interest in offshore renewable energies significantly threaten marine life. Radiated noise from these activities, such as pile-driving or the continuous noise from shipping, can prevent sensitive species such as whales and dolphins from communicating, leading to changes in social behavior, prey detection or habitat. Loud sounds can even cause hearing damage and loss of orientation.
It is therefore crucial to accurately, and at a large scale, monitor marine life before operations with a possible impact, and to monitor the evolution of the ambient noise in known habitats. A traditional technique used to detect and classify the presence of marine life and human activity is passive acoustic monitoring with hydrophones, also called passive sonars, deployed underwater. The hydrophones record the ambient noise, which is either monitored in real time or post-processed once the equipment has been recovered. In both cases, the analysis must be conducted by an expert who is trained to detect characteristic features in the signal by looking at the spectrogram and power spectral density and listening to the recording. This method, albeit accurate, is very laborious because each site and each time frame must be assessed individually.

Machine Learning (ML) offers a unique opportunity to automate this identification step and accelerate the processing of very large datasets. A model that can identify both ships and marine life would facilitate the monitoring of larger areas over longer periods, allowing project developers to better assess their potential impacts before the implementation of their activities. Such a model, to be effective, should be (i) simple enough to be used and tuned by non-ML experts; (ii) able to detect and classify anthropogenic sources and marine life to facilitate environmental monitoring; (iii) general enough to be non-site-specific, i.e., not needing additional training to work accurately on an unknown monitoring site; (iv) small enough to be implemented on underwater equipment for real-time monitoring.

This paper studies how a simple Convolutional Neural Network (CNN) trained to identify small vessels, large vessels, dolphin whistles and background noise on a specific dataset can be generalised to identify the same classes on another dataset. The training dataset contains 90 min of ship recordings in the vicinity of a harbour, the ShipsEar dataset [1], and dolphin recordings downloaded from Discovery of Sound in the Sea [2]. The validation dataset corresponds to 30 min of the recordings collected during the REP(MUS) 2021 exercise [3]. Three types of optimisation methods to facilitate the generalisation are used: data enhancement, time discretisation variations and a Bayesian method to optimise the hyperparameters of the CNN.

This paper is structured as follows: after this introduction, section 2 presents the current use of ML for sound identification and how it led to the choice of a specific CNN. Section 3 presents the datasets used for training and validation and details the pre-processing method to facilitate the feature extraction. Section 4 presents the results of the trained algorithm on the different datasets. Finally, section 5 details the methods of optimisation and their impacts on the performance of the model on REP(MUS), a dataset completely "unknown" to the model.

2. CHOICE OF THE CNN ARCHITECTURE

A. THE CHALLENGES OF SOUND IDENTIFICATION

Sound waves can travel up to thousands of kilometers in seawater. This means that a single point of measurement will contain not only local data but also data from remote regions. This is both an advantage and a drawback, since it allows a wide area to be monitored with a single device, but also leads to complex recordings where noises from multiple sources overlap. Cruz et al. [4] present the three sources of noise emitted by ships: propeller, hull and machinery.
Each noise source varies in frequency and amplitude depending on the vessel size, sailing speed and equipment. The large range and the overlap of the characteristic frequencies make it difficult to match the signal to a specific feature in the spectrogram. In fact, the power spectrum and data from the Automatic Identification System (AIS) of ships are often used in addition, to validate the identification made from the sound recording and spectrogram.

Furthermore, the amplitude of the signal plays an important role in the identification. A distant ship will emit sounds with very low amplitude. The signal is then mixed with many other sources of ambient noise, leading to a spectrogram with no recognizable features. The distinction between a distant ship and "background" noise is therefore often impossible without looking at the AIS data or power spectrum. On the other hand, a nearby ship can be so "loud" that it completely masks the presence of other noise sources like dolphin vocalisations or clicks, which can otherwise be very clearly distinguished on the spectrogram, as presented in Figure 2. Such are the challenges of identifying underwater activities that must be parameterised to train a ML model.

B. STATE OF THE ART OF AUTOMATED NOISE IDENTIFICATION

Numerous studies have been conducted in the field of underwater acoustics to automatically identify ships and/or mammals [5-8]. Those studies, albeit relatively old (more than 20 years) and sometimes using outdated methods, show the value of using the spectrogram and Power Spectral Density (PSD) as input data to a neural network, where characteristic features can be automatically extracted. Indeed, the PSD shows the energy at each frequency, a key feature to differentiate ships. The PSD is, however, often calculated on the total signal, hence removing all temporal information. It would therefore not be a useful approach for real-time detection. The spectrogram, on the other hand, shows the emitted frequencies in time and the sound pressure level (SPL) at each time step and each frequency. This greatly facilitates the identification of dolphin clicks and whistles, as can be seen in Figure 2. One advantage of the spectrogram comes from the fact that the data can be treated as an image, and powerful modern computer vision techniques, such as CNNs, can be used to detect features and classify the data.

Some studies consider above-water sounds [9, 10], such as children playing, dogs barking or a siren, using the UrbanSound8K dataset. Other studies focus on underwater sounds [11-13] such as whales and some man-made activities. In particular, Belghith et al. [13] classify, with an accuracy of 66.4%, sounds from wind, rain, cetaceans (whistles and clicks), fish, benthic invertebrates, vessels and sonar. This accuracy is quite low, and the work of the present paper proposes to go beyond it, by using a better performing model [14], applying it to underwater acoustics and verifying its generalisation.

C. CHOSEN CNN AND HYPER-PARAMETERS

In the current work, the combination of the mel spectrogram of the sound signal and its first and second derivatives is used as input to a simple CNN classifier. The CNN architecture is chosen to be as simple as possible to facilitate its adoption by non-ML experts, following the work of Salamon [14] and Cordeira [15]. The CNN, as presented in Figure 1, consists of three convolutional layers followed by two fully connected layers. The first convolutional layer uses 24 filters, a kernel size of (5,5), strides of (1,1), with padding and ReLU as activation function. The second layer uses 48 filters and otherwise the same parameters. The third layer is similar to the second but without padding. Between each convolutional layer, a max pooling layer is added. The output is then flattened, followed by two consecutive dropout and dense layers, the last one being activated by the Softmax function. The training is done using SGD as optimizer and categorical cross-entropy as loss function. The CNN is built using the Python Keras [16] and TensorFlow [17] libraries. Four classes are considered: BG for background, SV for small vessel, LV for large vessel and DL for dolphin.
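As an illustration, a minimal Keras sketch of this architecture could look as follows. The pooling sizes, the dense layer width and the initial dropout rate are not specified in the text and are assumptions here; only the filter counts, kernel sizes, strides, padding, activations, optimizer and loss follow the description above.

```python
# Minimal sketch of the CNN described above, built with Keras/TensorFlow.
# Pool sizes, dense width and dropout rate are assumptions; the rest
# follows the description in the text. Input shape (60, 50, 3) matches
# the pre-processed windows of section 3B.
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(60, 50, 3), n_classes=4):
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(24, (5, 5), strides=(1, 1), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(48, (5, 5), strides=(1, 1), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(48, (5, 5), strides=(1, 1), padding="valid", activation="relu"),
        layers.Flatten(),
        layers.Dropout(0.5),                    # assumed rate (tuned later, section 5C)
        layers.Dense(64, activation="relu"),    # assumed width
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),  # BG, SV, LV, DL
    ])
    model.compile(optimizer=keras.optimizers.SGD(),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```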
Figure 1: Architecture of the CNN model.

3. DATASETS AND PREPROCESSING

A. DATASET DESCRIPTION

ShipsEar [1] is an open-source dataset composed of vessel sounds and ambient noise recordings. Each recording is associated with a vessel type; these were grouped into two categories: small vessel for fishing boats, trawlers, mussel boats, tugboats, dredgers, motorboats, pilot boats and sailboats, and large vessel for passenger ferries, ocean liners and Ro-Ro vessels. The distinction between large and small vessels was made to match the DNVGL classification. Typical spectrograms of a large vessel sailing and a small vessel sailing are given in Figure 2, with their most characteristic emission frequencies highlighted by the orange and green boxes.

The dolphin recordings were downloaded from the Discovery of Sound in the Sea website [2]. This website was mainly developed by the University of Rhode Island and contains many recordings of marine mammals but also fish, invertebrates, and other natural or anthropogenic noises. Recordings with dolphin whistles of multiple species were picked to create the dataset used in this work. It should be noted that the recordings are particularly clear and do not contain other ambient noise. A typical dolphin spectrogram is given in Figure 2. The blue box highlights the frequency range where dolphin whistles (curved, thin lines) and clicks (vertical lines) are observed.

The REP(MUS) files were recorded on the peninsula of Setubal, in Portugal, during a NATO exercise in September 2021 [3], in an area where a resident population of bottlenose dolphins occurs. Three hydrophones were installed a few days before and operated during the exercise, during which many vessels were sailing. Three recordings of 10 min, containing both ships and dolphins, were selected and post-processed for the purpose of this work. Only a simplified label is available (vessel, dolphin whistle or background) due to the lack of complete AIS data to validate them.

The ShipsEar and dolphin datasets were combined to form one dataset, hereafter referred to as the combination dataset. Only 70% of the combination dataset was used during the training of the model, while the rest was saved for validation.

Figure 2: Typical spectrograms. Left) small vessel. Middle) large vessel. Right) dolphin.

The REP(MUS) dataset was used exclusively for validation, representing the "unknown" dataset. The data split is represented in Figure 3.

Figure 3: Split of the datasets for training and validation. Blue) ShipsEar. Green) Dolphins. Orange) REP(MUS).

The dolphin class is significantly smaller than the other classes. The classic augmentation methods of pitch shifting and time stretching were applied consecutively to increase the size of the class. Examples of the outputs of such functions are given in Figure 4. The time stretching coefficient was chosen randomly between -1.9 and 1.9, while the pitch coefficient was chosen randomly between -2 and 2.

Figure 4: Example of pitch and time stretching. Left) Before augmentation. Right) After augmentation.
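As a rough illustration of this augmentation step, the sketch below applies librosa's pitch shifting and time stretching with randomly drawn coefficients. Note that librosa's time_stretch expects a strictly positive rate, so the sampling range used here (around 1.0) is an assumption about how the coefficients quoted above map onto the library call; the file name is hypothetical.

```python
# Sketch of the dolphin-class augmentation, assuming librosa as the audio
# backend. librosa.effects.time_stretch requires a positive rate, so the
# stretch factor is drawn around 1.0 (an assumed mapping of the quoted
# coefficients); the pitch shift is drawn in semitone steps between -2 and 2.
import numpy as np
import librosa

rng = np.random.default_rng(seed=0)

def augment(y, sr):
    rate = rng.uniform(0.8, 1.2)        # assumed mapping of the stretch coefficient
    steps = rng.uniform(-2.0, 2.0)      # pitch shift in (fractional) semitones
    y = librosa.effects.time_stretch(y, rate=rate)
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)
    return y

y, sr = librosa.load("dolphin_whistle.wav", sr=52734)  # hypothetical file
y_aug = augment(y, sr)
```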
B. DATA PRE-PROCESSING

The data must be prepared so that features can be extracted by the CNN. To do so, multiple steps are followed:

• Loading all data with the same sampling rate (sr), equal to the lowest rate of the datasets used for training (step A in Figure 5). Here sr = 52734 Hz.

• Conversion of the recording to a spectrogram using a Short Time Fourier Transform (STFT). A Hann window of length Hl = 2048 samples and hop Hh = 1024 samples is used. Given the sampling rate, the Hann window size corresponds to 39 ms (step B in Figure 5).

• Conversion of the spectrogram to the mel scale with 60 bands. The mel spectrogram is therefore an array of dimension (60, N), with N ≈ D · sr / Hh, where D is the duration of the signal in seconds (steps C and D in Figure 5).

• Calculation of a first pseudo-derivative along the time axis, c'_{i,k} = c_{i,k+1} - c_{i,k}, where c represents a coefficient of the matrix, i.e., the SPL, i represents the i-th band of the mel spectrogram and k represents the k-th time frame of the mel spectrogram (steps E and F in Figure 5). The process is repeated by applying the same method to the first pseudo-derivative to obtain a second pseudo-derivative.

• Combination of the mel spectrogram and the two pseudo-derivatives. The dimension of the data is now (60, N, 3).

• Cutting the data into small windows of 1 s, which corresponds to the time scale a marine mammal observer uses to detect dolphins. The dimension of each window is now (60, 50, 3) (step G in Figure 5). This windowing process will be referred to as the time window, to differentiate it from the Hann window.

• Scaling each window by subtracting the average SPL of the window and then dividing by its standard deviation (step H in Figure 5).

In total, 5 parameters can be tuned. An overview of the pre-processing steps is presented in Figure 5.

Figure 5: Example of the pre-processing steps. A) Waveform. B) STFT. C) Spectrogram. D) Mel spectrogram. E) First derivative. F) Second derivative. G) First window of 1 s. H) First window scaled.
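A condensed sketch of this pipeline, assuming librosa for the signal processing, could look as follows; the exact framing and edge handling of the original implementation are not specified in the text, so the reconstruction below is indicative only.

```python
# Sketch of the pre-processing chain (steps A-H), assuming librosa.
# Window/hop/band values follow the text; the pseudo-derivative is the
# simple first difference described above, and edge handling is an assumption.
import numpy as np
import librosa

SR, N_FFT, HOP, N_MELS, WIN = 52734, 2048, 1024, 60, 50

def preprocess(path):
    y, _ = librosa.load(path, sr=SR)                       # step A
    mel = librosa.feature.melspectrogram(
        y=y, sr=SR, n_fft=N_FFT, hop_length=HOP,
        window="hann", n_mels=N_MELS)                      # steps B-C
    mel_db = librosa.power_to_db(mel)                      # step D (SPL-like scale)
    d1 = np.diff(mel_db, axis=1)                           # step E: first pseudo-derivative
    d2 = np.diff(d1, axis=1)                               # step F: second pseudo-derivative
    n = d2.shape[1]
    x = np.stack([mel_db[:, :n], d1[:, :n], d2], axis=-1)  # (60, N, 3)
    windows = []
    for k in range(0, n - WIN + 1, WIN):                   # step G: 1 s time windows
        w = x[:, k:k + WIN, :]
        w = (w - w.mean()) / (w.std() + 1e-8)              # step H: per-window scaling
        windows.append(w)
    return np.array(windows)                               # (n_windows, 60, 50, 3)
```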
4. TRAINING RESULTS AND INITIAL VALIDATION

A. VALIDATION WITH THE COMBINATION DATASET

As presented in Figure 3, the model is first trained with the training set of the combination dataset, then validated with the validation set of the combination dataset, and finally validated on the REP(MUS) dataset. The training set is split again into three folds to use a 3-fold cross-validation. This method helps to avoid the bias due to the choice of the test split within the training set. The model is thus trained three times, using each of the three test sets once.

Table 1: Confusion matrix.

A maximum of 50 epochs with early stopping was set. The patience, i.e. the number of epochs without improvement before the training stops, was set to 5. As can be seen in Figure 6, the 50 epochs were not reached. The best accuracy and loss were 0.98 and 0.2. The loss is still very high, meaning that the model is often "uncertain" about its predictions. It should be noted that a longer training was tested, without early stopping, and the loss was not improved. The accuracy, however, is excellent, but accuracy alone cannot be considered the main metric to judge the performance of the model, especially since the classes are unbalanced.

Figure 6: Training history for each fold.

A further analysis is therefore required, and the best trained model is used to analyse the validation split of the combination dataset. The confusion matrix and the precision, recall and F1-scores are given in Table 1. As a reminder: the accuracy corresponds to the percentage of items identified as their true label, regardless of their class; the confusion matrix gives insight into how the model interprets each class and whether specific classes are prone to errors; the precision P presents, per class, the number of true positives over the total number of predicted positives, P = TP/(TP + FP); the recall R presents, per class, the number of true positives over the total number of actual positives, R = TP/(TP + FN); and the F1-score combines the two metrics in one, F1 = 2PR/(P + R). The results are encouraging: all metrics are above 0.95, and only the class SV is occasionally incorrectly predicted. However, this could also be a sign that the validation set is too close to the training set, and that there is not enough variability in the training set, which would lead to a very poor generalisation to the REP(MUS) dataset.
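For reference, these metrics can be computed directly from the predicted classes; the short sketch below uses scikit-learn (an assumption, since the paper does not name its metrics implementation) with hypothetical label arrays standing in for the validation split.

```python
# Sketch of the evaluation metrics discussed above, using scikit-learn.
# y_true and y_pred are hypothetical; in practice y_pred would be the
# argmax of the CNN's softmax output over the validation windows.
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

classes = ["BG", "SV", "LV", "DL"]
y_true = np.array([0, 1, 2, 3, 1, 2, 0, 3])   # hypothetical validation labels
y_pred = np.array([0, 1, 2, 3, 2, 2, 0, 3])   # hypothetical model predictions

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=classes, digits=2))
```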
B. VALIDATION WITH THE REP(MUS) DATASET

As described in Section 3A, only a simplified label is available for the REP(MUS) dataset (vessel, dolphin and background). However, whenever possible, validation with AIS data and the PSD is used to further refine the label. The chosen recordings of REP(MUS) were put together as one single file to facilitate the visualisation of the results. Figure 7 presents, on the left, the classification of each window, showing the percentage of each class, and on the right, a comparison between the label and the classification with the highest probability.

Three sections can be distinguished, corresponding to the three recordings. In the first section, large vessels were identified with a high probability. Thanks to the log of the AIS data at the time of the recording, a cargo vessel was identified in the vicinity of the hydrophone, confirming the classification of the model. In the second section, a cargo vessel was identified as well, but at a greater distance according to the AIS data and the PSD, which might explain why the classification is less certain. In the third recording, the model identifies a large vessel as well. However, no AIS data was recorded at this time. Looking at the power spectrogram and listening to the recording, a potential fishing vessel was identified. It should be noted that with small ships without AIS, an identification is never completely certain.

Looking at the dolphin classification, the second recording contains multiple short and separated whistles, whereas the third one contains one distinct segment identified as a dolphin. The model is constantly over-predicting the presence of dolphins, but often with a low probability, in competition with the distant cargo vessel. Finally, there are not enough points from the background data set to draw any conclusion for that class.

Figure 7: Validation for the REP(MUS) dataset.

5. OPTIMISATION OF THE MODEL

The next section is dedicated to the optimisation of the model trained on the combination dataset, to improve the classification on the REP(MUS) dataset, as an example of an "unknown", generalised and real-life dataset. Three types of optimisation are proposed: on the training dataset, on the time discretisation during the pre-processing, and on the model hyperparameters. In each case, the resulting model was also validated with the validation set of the combination dataset, but those results are not presented here, to focus on the generalisation performance.

A. OPTIMISATION OF THE TRAINING DATASET

A first step is to refine the dolphin label by removing the small window frames where no whistles can be heard or seen in the spectrogram. Secondly, as explained in section 3, the training recordings are very clean, without background or ambient noise apart from the sounds specific to the class. Those recordings are therefore not realistic compared to a real-life monitoring campaign such as REP(MUS). To facilitate the classification, each dolphin file was therefore merged with vessel files, creating a single file with both dolphin and ship sounds, as presented in Figure 8.

Figure 8: Superposition of ship and dolphin recordings.

Figure 9: Classification with realistic recordings.

The model was trained again from scratch and evaluated. This process was repeated in a second iteration to increase the variability of the dolphin class as well as its size. The results of both training and evaluation are given in Figure 9. It can be seen that training the model on realistic recordings significantly decreases the false positives for the dolphins. However, this comes at the cost of decreasing the precision of the vessel classification. The confusion between large and small vessels might come from the fact that dolphin recordings were merged almost exclusively with large vessel files. The common features of those files, with and without the dolphin sounds, must impact the classification.

B. OPTIMISATION OF THE TIME DISCRETIZATION

Time discretization is often the culprit when a model is not converging or not giving accurate results. In the current case, three parameters can be tested: the Hann window length, the Hann window hop and the time window length.

i. Influence of the Hann window length

The Hann window was decreased to 20 ms and 30 ms. Generally speaking, the longer the window, the more information is contained in one column of the final input data. A larger Hann window will therefore facilitate the distinction between the frequencies of a signal, at the cost of decreasing its precision for impulses. In terms of detection of dolphins, this means the high frequencies of the whistles should be more distinct and more easily detected with a longer window. Figure 10 presents the spectrogram of a dolphin recording with different Hann window sizes. As expected, as the window length increases, the dolphin clicks (vertical lines highlighted in green) become more and more fuzzy. It can also be noted that all frequencies show a higher amplitude. Looking at the influence of the window on the classification, Figure 11 shows that increasing the window size strongly helps the differentiation between the ships and the dolphins.

Figure 10: Influence of the Hann window size on the spectrogram. Left) Hann window of 20 ms. Middle) Hann window of 30 ms. Right) Hann window of 39 ms.

Figure 11: Classification of each time window for various Hann window lengths. Left: 20 ms, Middle: 30 ms, Right: 39 ms.

ii. Influence of the Hann window hop

The hop of the window was first decreased from half of the window size to a quarter, and then increased so that there is no overlap. The higher the overlap, the more points the spectrogram has, and each point contains information that is also partially contained in the neighboring points. The influence of the hop on the classification is presented in Figure 12. Both figures show that a compromise must be found between having enough points to capture the details of the signal and an overlap so large that the information is lost in the discretisation.

Figure 12: Influence of the overlap of the Hann window on the classification. Left: 1/4, Middle: 1/2, Right: 0.
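This trade-off can be inspected directly by re-running the STFT stage with different window and hop lengths; the short sketch below (assuming librosa, with window lengths chosen to approximate 20, 30 and 39 ms at sr = 52734 Hz, and a hypothetical input file) simply shows how the discretisation changes the shape of the resulting mel spectrogram.

```python
# Quick look at how the Hann window length and hop change the discretisation
# (assuming librosa; window lengths approximate 20, 30 and 39 ms at 52734 Hz).
import librosa

sr = 52734
y, _ = librosa.load("dolphin_whistle.wav", sr=sr)      # hypothetical file

for win_ms in (20, 30, 39):
    n_fft = int(sr * win_ms / 1000)
    for frac in (0.25, 0.5, 1.0):                      # hop of 1/4, 1/2 and one full window
        hop = max(1, int(n_fft * frac))
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_fft=n_fft, hop_length=hop, window="hann", n_mels=60)
        print(f"window {win_ms} ms, hop {hop} samples -> shape {mel.shape}")
```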
iii. Influence of the time window

The time window does not affect the data within the matrix but cuts the signal into shorter or longer steps. Too-small steps will not allow the recognition of the characteristic marks of the dolphin whistles. For example, in Figure 13, the curved line of the whistle is cut too early to enable a clear classification. However, too-large steps might hide the dolphin whistle in the middle of other features, and the dolphin would not be spotted by the model. In addition, this decreases the number of samples for training. A compromise must therefore be found here as well. Time windows of 0.8 s, 1 s and 1.2 s were tested to frame the empirical value of 1 s. As presented in Figure 14, 1 s seems to provide the best results for the classification of REP(MUS).

Figure 13: Influence of the window splits on the spectrogram.

Figure 14: Influence of the window splits on the classification. Left: 0.8 s, Middle: 1 s, Right: 1.2 s.

C. OPTIMISATION OF THE HYPERPARAMETERS

Finally, the hyperparameters of the CNN are tuned to improve the model's performance. The library bayes_opt is used to set up a Bayesian optimisation [18]. Three main parameters are optimised: the dropout rate, which corresponds to the fraction of hidden neurons dropped during training to reduce the risk of over-fitting; the learning rate, which defines how much the model is changed in response to the estimated error when the model weights are updated; and the kernel regularizer, which reduces the risk of over-fitting by adding penalties to a layer's weights.

The optimisation process consists of training the model numerous times while varying the hyperparameters within provided bounds. Each time, an objective scoring function is calculated, and the hyperparameters are chosen so that the output of the scoring function, the fitness, is reduced. The Bayesian approach uses the past evaluations to form a probabilistic model that maps the hyperparameters to be tuned, which allows them to be chosen in an informed way.

The optimisation was first run with the loss on the test dataset as scoring function. After 16 optimisation loops, the best parameters were found (dropout = 0.12, kernel regularizer = 0.00066, learning rate = 0.0044), and the model was trained again. As expected, the loss is significantly decreased, as can be seen in Figure 15: the final loss was approximately 0.06. However, this did not help the classification of the REP(MUS) dataset in any way: only vessels were identified.

To push the optimisation towards the classification of the REP(MUS) dataset, a new scoring function was used, based this time on the loss of the evaluation on the REP(MUS) dataset. It should be noted that REP(MUS) was not used for training, but only to provide the value of the scoring function. After finding the new best hyperparameters, the model was trained again on the combination dataset. The loss during training was similar to the previous set-up, but the results were slightly better, since dolphins were detected. The model architecture was therefore optimised and performed very well on the dataset that is close to the training dataset, but this did not help the generalisation of the model and its application to real-life data. A training set that includes actual monitoring data from multiple sites would most likely significantly improve the results and the generalisability of the model.

Figure 15: Decreased loss with Bayesian optimization.
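A minimal sketch of this tuning loop with the bayes_opt library could look as follows. The search bounds and the train_and_evaluate helper are illustrative assumptions; since bayes_opt's BayesianOptimization maximises its objective, the validation loss is negated.

```python
# Sketch of the hyperparameter search with bayes_opt. The bounds and the
# train_and_evaluate() helper are hypothetical; bayes_opt maximises, so the
# scoring function returns the negative validation loss.
from bayes_opt import BayesianOptimization

def train_and_evaluate(dropout, kernel_reg, learning_rate):
    """Hypothetical helper: build the section-2C CNN with these hyperparameters,
    train it on the combination dataset and return the validation loss
    (on the test split, or on REP(MUS) for the second scoring function)."""
    raise NotImplementedError

def score(dropout, kernel_reg, learning_rate):
    return -train_and_evaluate(dropout, kernel_reg, learning_rate)  # maximise -loss

optimizer = BayesianOptimization(
    f=score,
    pbounds={"dropout": (0.05, 0.5),           # assumed search bounds
             "kernel_reg": (1e-5, 1e-2),
             "learning_rate": (1e-4, 1e-2)},
    random_state=1,
)
optimizer.maximize(init_points=4, n_iter=12)   # 16 evaluations in total
print(optimizer.max)                           # best fitness and hyperparameters
```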
6. CONCLUSION

A small, simple CNN was trained to detect ships and dolphins. The model was validated with two datasets. One, the combination dataset, was very close to the training dataset and corresponded to very clear signals. The other dataset, REP(MUS), corresponded to real-life monitoring data. The model performed very well on the combination dataset, despite a high loss. A Bayesian optimisation of the model hyperparameters was able to significantly decrease the loss. However, this did not help with the performance of the model on the REP(MUS) dataset. Instead, pre-processing the training data to transform the recordings into more realistic files enabled a better performance of the model on REP(MUS).

More data augmentation techniques should be used to further improve the performance, such as frequency masking, noise addition, and merging of the dolphin recordings with more varied ship files. Training the model with a dataset that is already more realistic, coming from various monitoring sites, would be an alternative to the data enhancement, but would require a significant monitoring and labeling effort. Finally, it should be noted that the time discretisation has a significant impact on the classification. The Hann window length, Hann hop and time window size are parameters that should always be chosen with care. To conclude, the use of a simple CNN seems to be a very efficient and easy-to-set-up way to monitor one site, but might not be suitable for generalisation to multiple sites.

7. ACKNOWLEDGMENT

The authors would like to thank the Instituto Hidrográfico for providing access to the REP(MUS) dataset.

REFERENCES

[1] David Santos-Domínguez, Soledad Torres-Guijarro, Antonio Cardenal-López, and Antonio Pena-Gimenez. ShipsEar: An underwater vessel noise database. Applied Acoustics, 113:64-69, 2016.

[2] University of Rhode Island. Discovery of Sound in the Sea. https://dosits.org/.

[3] NATO. Robotic Experimentation Prototyping - Maritime Unmanned Systems (REP(MUS)) Exercise. https://www.act.nato.int/articles/rep-mus.

[4] E. Cruz, T. Lloyd, J. Bosschers, F.H. Lafeber, P. Vinagre, and G. Vaz. Study on inventory of existing policy, research and impacts of continuous underwater noise in Europe. 2021.

[5] Jianguo Huang, Jianping Zhao, and Yiqing Xie. Source classification using pole method of AR model. In 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 567-570. IEEE, 1997.

[6] Richard Campbell Bennett et al. Classification of Underwater Signals Using a Back-Propagation Neural Network. PhD thesis, Naval Postgraduate School, 1997.

[7] B.P. Howell and S. Wood. Passive sonar recognition and analysis using hybrid neural networks. In Oceans 2003. Celebrating the Past... Teaming Toward the Future (IEEE Cat. No. 03CH37492), volume 4, pages 1917-1924. IEEE, 2003.

[8] Chunyu Kang, Xinhua Zhang, Anqing Zhang, and Hongwen Lin. Underwater acoustic targets classification using Welch spectrum estimation and neural networks. In International Symposium on Neural Networks, pages 930-935. Springer, 2004.

[9] Justin Salamon, Christopher Jacoby, and Juan Pablo Bello. A dataset and taxonomy for urban sound research. In MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, pages 1041-1044, 2014.
[10] Jon Nordby. Environmental sound classification on microcontrollers using convolutional neural networks. Master's thesis, Norwegian University of Life Sciences, May 2019.

[11] Heriberto A. Garcia, Trenton Couture, Amit Galor, Jessica M. Topple, Wei Huang, Devesh Tiwari, and Purnima Ratilal. Comparing performances of five distinct automatic classifiers for fin whale vocalizations in beamformed spectrograms of coherent hydrophone array. Remote Sensing, 12(2):1-25, 2020.

[12] Matt Harvey et al. Acoustic detection of humpback whales using a convolutional neural network. Google AI Blog, 2018.

[13] Emna Hachicha Belghith, Francois Rioult, and Medjber Bouzidi. Acoustic diversity classifier for automated marine big data analysis. In Proceedings - International Conference on Tools with Artificial Intelligence (ICTAI), pages 130-136, 2018.

[14] Justin Salamon and Juan Pablo Bello. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 24(3):279-283, 2017.

[15] A. Cordeira and G. Vaz. Marine acoustic signature recognition using convolutional neural networks. Submitted to Journal of OMAE, 2021.

[16] François Chollet et al. Keras. https://keras.io, 2015.

[17] Martín Abadi, Ashish Agarwal, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[18] J. Snoek, H. Larochelle, and R. Adams. Practical Bayesian optimization of machine learning algorithms. 2012.