
Towards wayside wheel flat detection and classification based on psychoacoustic quantities

Jonas Egeler 1

Möhler + Partner Ingenieure AG Prinzstraße 49, 86153 Augsburg, Germany

Christine Huth 2

Möhler + Partner Ingenieure AG Prinzstraße 49, 86153 Augsburg, Germany

ABSTRACT We present the results of psychoacoustic analyses performed on a dataset comprising audio recordings of 250 cargo train passings. Regions with perceptible flat spots are labelled. Common psychoacoustic quantities are calculated for signal frames of both classes “flat spot” and “no flat spot” and statistical analyses are performed to find quantities which qualify best for a separation of both classes. The effects of frame size and sample rate are examined. Based on the results of listening tests, an acoustic classification criterion for the annoyance of wheel flats is proposed.

1. INTRODUCTION

Somewhat similar to the acoustic problems that were uncovered when combustion engines in cars were replaced with electric motors, the retrofitting of all DB Cargo freight wagons with low-noise brakes caused other disturbing noise emissions to become more prominent. Among these, the periodic beating sounds originating from flattened wheel treads attract the most attention. As there is no sufficiently validated solution for wayside wheel flat detection that meets the requirements of focus on acoustic relevance, precise detection and economic feasibility, the Umweltbundesamt (German Federal Environmental Agency) has initiated a research project to determine an acoustic maintenance criterion for flat spots [8]. The results of the psychoacoustic analyses that were performed as part of this project are presented in this contribution.

At lower speeds, wheel out-of-roundness is acoustically noticeable in the form of periodic hits on the rail, which increase in intensity as the train approaches and decrease in intensity as the train moves away from the listener. At higher speeds, the beating sounds merge into an amplitude-modulated noise. The goal was to find psychoacoustic quantities that distinctly characterize the sound of wheel flats for application in a detection system.

Audio recordings of 250 cargo train passings were analysed. Regions with perceptible flat spots were labelled by trained listeners. Common psychoacoustic quantities were calculated for signal windows of both classes “flat spot” and “no flat spot” and statistical analyses were performed to find the quantities that best separate the two classes. To find optimal design parameters for a future detection system, the effects of frame size and sample rate were examined. In addition to detection, the acoustic rating of wheel flats was another focus of the study. Based on the results of listening tests, an acoustic classification criterion for the annoyance of wheel flats is proposed.

1 jonas.egeler@mopa.de

2 christine.huth@mopa.de

inter.noise, 21-24 August, Scottish Event Campus, Glasgow

2. RELATED RESEARCH

Numerous studies have investigated methods for wayside wheel flat detection. Most of these methods are based on data from sensors measuring the rail dynamics, including accelerometers, strain gauges and fibre-optic sensors [1]. In many cases, multiple sensors are combined into arrays to capture a complete wheel rotation [2, 3]. Various signal analysis approaches have been proposed in the literature: peak detection [2], wavelet coefficient analysis [3], spectral kurtosis analysis [4], envelope spectrum analysis [5] and cepstrum analysis [6]. Most of the studies mentioned above focus on condition monitoring and safety applications.

Acoustic emissions of wheel flats have received comparatively little attention. However, the fact that humans can hear flat spots suggests an approach based on acoustic signals. Dernbach et al. investigated a machine learning approach to flat spot detection based on different feature representations (raw audio, log-mel spectrograms and MFCC) and different classifiers (SVM, CNN and U-Net) [7]. They report a maximum model performance of $F_1 = 0.78$ with signal features calculated from audio frames of 5 s length, sampled at 8192 Hz.

In this contribution, we further investigate methods to identify flat spots in acoustic signals, with a special focus on limiting factors of the detection system such as a limited sample rate or the required time precision (which is limited by the window size for spectral calculations). To obtain a complete picture of the acoustic characteristics of wheel flats, six acoustic quantities are compared in their ability to discriminate signal regions with wheel flats from regions without wheel flats in audio recordings of train passings. As the goal is to detect audible flat spots, special attention is given to psychoacoustic signal features, which model human perception. The effects of window size, which is essential for the localisation precision, and of sample rate, which determines the amount of frequency information in the signal, are examined, as are the effects of speed normalisation as proposed in [7].

3. DATA

The study is based on measurements which were performed at km 53.6 of the railway track 5510 “Munich – Rosenheim”. Between 13 January and 9 February 2020, data from two acceleration sensors (one for each rail) and two microphones (7.5 m and 25 m distance from the track) were recorded. All signals were recorded at 51200 Hz. Additional data, e.g. wagon-specific speeds, were provided by a Wheel-Monitoring-System (WMS) of MBBM Rail Technologies. During the measurement period, 2955 train passings were recorded, comprising 1130 cargo trains and 1825 passenger trains. For the analysis, N = 250 audio recordings of cargo train passings from the closer microphone (7.5 m) were randomly picked. Table 1 summarizes the relevant statistical characteristics of the dataset. To obtain a ground truth about the prevalence of audible wheel flats, the audio recordings were labelled in a controlled listening environment (original level − 10 dB). Regions where the characteristic sound of a wheel flat was perceptible were marked.

Table 1: Statistical properties of the dataset

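| Property | Values |
|---|---|
| Train speed $v$ | $v_{\max} = 150$ km/h, $v_{\min} = 59$ km/h, $v_{\mathrm{mean}} = 110$ km/h, $\sigma_v = 20$ km/h |
| Train length $l$ | $l_{\max} = 664$ m, $l_{\min} = 15$ m, $l_{\mathrm{mean}} = 375$ m, $\sigma_l = 167$ m |
| Labels | $N_{\mathrm{labels}} = 363$, $t_{\mathrm{labeled}} = 318$ s (7.7 % of total), $t_{\mathrm{mean}} = 0.9$ s, $\sigma_{\mathrm{label}} = 0.3$ s |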

4. FEATURE CALCULATION

For feature calculation, the raw signals of the train passings are first cut to the geometric passing duration via the axle times recorded by the WMS. Then, the signals are downsampled to $f_{s,n} = 40960$ Hz or 8192 Hz. Passing speed normalisation is optionally performed by resampling the signals to a new sampling rate:

$$ f'_{s,n} = \frac{f_{s,n} \cdot v_n}{\frac{1}{N} \sum_{n=1}^{N} v_n}, \tag{1} $$

while keeping the playback rate at $f_{s,n}$, which is essentially a pitch scaling operation. After the speed normalisation, all train passings happen at a virtual passing speed, which is chosen to be the mean passing speed of all passings. An optional amplitude normalisation is performed by scaling the raw signals $p_m$ to have equal mean effective sound pressures:

$$ p'_m = p_m \cdot \frac{\sqrt{\frac{1}{N} \sum_{n=1}^{N} \frac{1}{M} \sum_{m=1}^{M} p_{n,m}^2}}{\sqrt{\frac{1}{M} \sum_{m=1}^{M} p_{n,m}^2}}. \tag{2} $$
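The two optional normalisation steps of Equations (1) and (2) can be illustrated with a minimal sketch. It is not the processing chain used in the study: the function names are hypothetical, the resampler is a plain linear interpolation without an anti-aliasing filter, and the target level in `amplitude_gains` follows one possible reading of Equation (2), namely the root of the mean-square pressure averaged over all passings.

```python
import math


def normalised_sample_rates(fs, speeds):
    """Equation (1): per-passing rate f'_{s,n} = fs * v_n / mean(v)."""
    v_mean = sum(speeds) / len(speeds)
    return [fs * v / v_mean for v in speeds]


def resample_linear(signal, fs_old, fs_new):
    """Resample by linear interpolation (illustrative, no anti-aliasing)."""
    n_out = int(round(len(signal) * fs_new / fs_old))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * fs_old / fs_new      # fractional position in the input
        k = min(int(pos), last)
        k1 = min(k + 1, last)
        frac = pos - k
        out.append((1.0 - frac) * signal[k] + frac * signal[k1])
    return out


def rms(x):
    """Effective (root-mean-square) value of a sample sequence."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def amplitude_gains(passings):
    """Assumed reading of Equation (2): gain_n = target RMS / RMS of passing n."""
    mean_squares = [sum(s * s for s in p) / len(p) for p in passings]
    target = math.sqrt(sum(mean_squares) / len(mean_squares))
    return [target / math.sqrt(ms) for ms in mean_squares]


# Hypothetical passing speeds in km/h; only their ratio to the mean matters.
rates = normalised_sample_rates(8192, [90.0, 110.0, 130.0])
```

A passing that is faster than the mean receives a higher resampling rate. Played back at the original rate, it is stretched in time, so its wheel hits occur at the rate they would have at the mean speed.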

Each passing is then segmented into overlapping frames of 0.5 s, 1 s or 2 s. The overlap is always 50% of the frame length. The signal frames are finally multiplied with a Hann window function in order to reduce spectral leakage. If the middle 50% of a frame overlaps at least 50% with an annotated flat spot region, the frame gets the label “flat spot”; otherwise it is labelled “no flat spot”. The remaining 50% of the frame are disregarded for labelling as they are covered by the adjacent frames. After these pre-processing steps, the acoustic features are calculated for each time frame. The calculated quantities and the corresponding equations and algorithms are summarized in Table 2. It has to be noted that, contrary to the behaviour of the human auditory system, which is most sensitive to fluctuations around 4 Hz, the fluctuation algorithm was tuned to be most sensitive to modulation frequencies of:

$$ f_{\mathrm{mod}} = \frac{\sum_{n=1}^{N} v_n}{N \cdot \pi \cdot d_{\mathrm{wheel}}} \approx 11\,\mathrm{Hz}, \tag{3} $$

which corresponds to a frequency of one hit per wheel rotation at the mean passing speed.
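The framing and labelling rule described in this section can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, times are in seconds, annotated regions are assumed not to overlap each other, and "overlaps at least 50%" is read as "the annotated time inside the central half of the frame is at least half of that central half".

```python
def frame_bounds(duration, frame_len):
    """(start, end) times of frames with 50 % overlap that fit into the passing."""
    hop = frame_len / 2.0
    n_frames = int((duration - frame_len) / hop + 1e-9) + 1
    return [(i * hop, i * hop + frame_len) for i in range(max(n_frames, 0))]


def label_frame(frame, regions, min_fraction=0.5):
    """Label a frame from the annotated time covering its central 50 %."""
    start, end = frame
    length = end - start
    core_start = start + 0.25 * length   # the outer quarters are ignored,
    core_end = end - 0.25 * length       # they belong to the adjacent frames
    covered = 0.0
    for r_start, r_end in regions:
        covered += max(0.0, min(core_end, r_end) - max(core_start, r_start))
    if covered >= min_fraction * (core_end - core_start):
        return "flat spot"
    return "no flat spot"


# Hypothetical 3 s passing with one annotated region from 0.9 s to 1.9 s.
frames = frame_bounds(3.0, 1.0)
labels = [label_frame(f, [(0.9, 1.9)]) for f in frames]
```

Because only the central half of each frame is evaluated, every instant of the passing (apart from the first and last quarter frame) is assigned to exactly one frame for labelling.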

Table 2: Features calculated for each time frame

| Feature | Explanation | Algorithm / Equation |
|---|---|---|
| $L_{Af,\max}$ | Maximum sound pressure level | A-weighting, exponential time weighting filter, level calculation |
| Crest factor | Ratio of maximum sound pressure level to equivalent sound pressure level | $L_{Af,\max} / L_{Aeq}$ |
| $L_{Af,2\,\mathrm{kHz},\max}$ | Maximum sound pressure level @ 2 kHz octave band | A-weighting, exponential time weighting filter, octave band filter, level calculation |
| $N_{\max}$ | Maximum loudness | Implementation in MATLAB: ISO 532-1:2017(E) |
| $S_{\max}$ | Maximum sharpness | Implementation in MATLAB: DIN 45692:2009 |
| $R_{\max}$ | Maximum roughness | Implementation in MATLAB: Zwicker, Fastl, Psychoacoustics: Facts and Models |
| $F_{\max}$ | Maximum fluctuation, dominant modulation frequency tuned to one hit per wheel rotation | Implementation in MATLAB: Zwicker, Fastl, Psychoacoustics: Facts and Models |
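The level-based entries of Table 2 can be illustrated with a simplified sketch. Several points are assumptions and not taken from the study: the A-weighting filter is omitted entirely, the exponential time weighting is modelled as a one-pole smoother of the squared pressure with the "fast" time constant of 125 ms starting from zero, and the crest factor is computed literally as the ratio of the two levels, as the table states. All function names are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa


def equivalent_level(p):
    """Equivalent continuous level in dB (no A-weighting in this sketch)."""
    mean_square = sum(s * s for s in p) / len(p)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


def max_time_weighted_level(p, fs, tau=0.125):
    """Maximum of the exponentially time-weighted level in dB.

    Assumes a non-silent signal; the smoother state starts at zero.
    """
    alpha = 1.0 - math.exp(-1.0 / (fs * tau))
    state = 0.0
    peak = 0.0
    for s in p:
        state += alpha * (s * s - state)   # one-pole low-pass on p^2
        peak = max(peak, state)
    return 10.0 * math.log10(peak / P_REF ** 2)


def crest_factor(p, fs):
    """Ratio of maximum level to equivalent level, as written in Table 2."""
    return max_time_weighted_level(p, fs) / equivalent_level(p)


# Hypothetical frame: quiet background with a short, loud burst in the middle.
fs = 1000
quiet = [0.01 * (-1) ** i for i in range(450)]
burst = [1.0 * (-1) ** i for i in range(100)]
frame = quiet + burst + quiet
```

For a frame like the one above, the burst raises the maximum level far more than the equivalent level, which is why a crest factor responds to short impulsive events.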

5. FEATURE SELECTION

To find the features that are best suited to distinguish signal regions labelled “flat spot” from regions labelled “no flat spot”, statistical analyses were performed. The Shapiro-Wilk test was used to test the features for normal distribution. For all features, the null hypothesis $H_0$ that the features are normally distributed could be rejected at a significance level < 0.01. Levene’s test was used to test the features for inhomogeneity of variances. For all features, the $H_0$ that the feature classes have equal variances could be rejected at a significance level < 0.01. Based on these results, two performance metrics were chosen: Welch’s t-test (unequal variances t-test), which tests the hypothesis that the two feature classes have equal means, complemented by Cohen’s effect size $d$; and the area under the ROC curve (AUC), which is a popular quality metric for binary classification problems. The effect size was calculated by:

$$ d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}}. \tag{4} $$
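As a minimal sketch of the first metric, the Welch t-statistic and the effect size of Equation (4) can be computed with the Python standard library. The function names and the sample values are hypothetical, and the use of the sample variance (divisor n − 1) is an assumption, since the text does not state which variance estimator was used.

```python
import math
from statistics import mean, variance


def welch_t(x1, x2):
    """Welch's t-statistic for two samples with unequal variances."""
    standard_error = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2)) / standard_error


def cohens_d(x1, x2):
    """Effect size as in Equation (4): mean difference over the pooled spread."""
    return (mean(x1) - mean(x2)) / math.sqrt((variance(x1) + variance(x2)) / 2.0)


# Hypothetical feature values for frames labelled "flat spot" / "no flat spot".
flat = [2.0, 4.0, 6.0]
no_flat = [1.0, 2.0, 3.0]
t_value = welch_t(flat, no_flat)
d_value = cohens_d(flat, no_flat)
```

The t-statistic grows with the number of frames, whereas $d$ does not, which is why the effect size is the more informative of the two when comparing features on a large dataset.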

AUC′, a rescaled version of the standard AUC, was calculated by integrating the area under the ROC curve, which plots the true-positive rate (TPR) against the false-positive rate (FPR) for different decision threshold values:

$$ \mathrm{AUC}' = 2 \left( \int_0^1 \mathrm{TPR}(\mathrm{FPR}) \, d\mathrm{FPR} - 0.5 \right) \tag{5} $$
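A minimal sketch of Equation (5) is given below. It assumes the rescaling is read as 2·(AUC − 0.5), so that a feature without discriminative power yields 0 and a perfectly separating feature yields 1. The threshold sweep over all distinct feature values and the trapezoidal integration are illustrative choices, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) for thresholds swept over all distinct scores."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= thr)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def rescaled_auc(scores, labels):
    """Trapezoidal area under the ROC curve, rescaled as 2 * (AUC - 0.5)."""
    pts = roc_points(scores, labels)
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 2.0 * (auc - 0.5)


# Hypothetical feature values; label 1 marks "flat spot" frames.
separable = rescaled_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = rescaled_auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

Both classes must be present in `labels`, otherwise one of the rates is undefined.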

In Table 3, the results of the statistical analysis are shown. In Figure 1, the results of the AUC′ calculation are visualised. The sharpness and roughness features both show a very poor discriminative performance. It has to be noted, however, that the analysis was performed with speed normalisation turned on, which was found to negatively impact the AUC′ performance of sharpness (see section 6). The level-based features all exhibit a medium discriminative performance, while fluctuation strength is best suited to discriminate the labelled and unlabelled signal regions. This result can be interpreted as follows: flat spots usually cause a notable peak in the pass-by sound pressure level, but there are other events and system parameters that might cause a similar behaviour of the sound pressure level. Fluctuation strength is a psychoacoustic quantity that is intended to capture signal modulations of 20 Hz or less and is up to this task, especially when the speed normalisation is active and the algorithm is tuned to be most sensitive to the expected modulation frequency of one hit per wheel rotation, as described in section 4.
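A short worked example may help to see why speed normalisation supports a fluctuation algorithm tuned to a single modulation frequency. It assumes one hit per wheel rotation and the same wheel diameter for all wagons, so that the modulation frequency scales linearly with speed. The numbers are the speed statistics from Table 1 and the value of about 11 Hz from Equation (3).

```latex
\begin{align*}
f_{\mathrm{mod}}(v) &= \frac{v}{\pi \, d_{\mathrm{wheel}}}
  \quad\Rightarrow\quad
  f_{\mathrm{mod}}(v) \approx 11\,\mathrm{Hz} \cdot \frac{v}{v_{\mathrm{mean}}} \\
f_{\mathrm{mod}}(v_{\min}) &\approx 11\,\mathrm{Hz} \cdot \frac{59}{110} \approx 5.9\,\mathrm{Hz} \\
f_{\mathrm{mod}}(v_{\max}) &\approx 11\,\mathrm{Hz} \cdot \frac{150}{110} = 15\,\mathrm{Hz}
\end{align*}
```

Without normalisation, the hit rate in the dataset would therefore spread over roughly 6 Hz to 15 Hz. After normalisation, all passings are mapped to the single tuned value.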

Table 3: Feature selection via statistical analyses. Parameter configuration: sample rate: 40960 Hz, frame size: 1 s, speed normalisation: true, amplitude normalisation: false.

| Feature | t-statistic | p-value | d | AUC′ |
|---|---|---|---|---|
| $L_{Af,\max}$ | 26.933 | < 0.01 | 0.964 | 0.508 |
| Crest factor | 22.431 | < 0.01 | 0.839 | 0.418 |
| $L_{Af,2\,\mathrm{kHz},\max}$ | 25.241 | < 0.01 | 0.887 | 0.469 |
| $N_{\max}$ | 23.971 | < 0.01 | 1.048 | 0.556 |
| $S_{\max}$ | 2.132 | 0.033 | 0.084 | 0.057 |
| $R_{\max}$ | 1.613 | < 0.01 | 0.079 | 0.128 |
| $F_{\max}$ | 36.617 | < 0.01 | 2.020 | 0.895 |

Figure 1: AUC comparison. Parameter configuration: sample rate: 40960 Hz, frame size: 1 s, speed normalisation: true, amplitude normalisation: false.

6. DESIGN PARAMETERS

We identified two main limiting factors which might have an impact on the detection performance: signal sample rate and window size. The sample rate constrains the upper limit of the signal's frequency content according to the Nyquist theorem: $f_{\max} < f_s / 2$. In detection systems with constrained processing power, the sample rate might be an issue. Therefore, we examine the effect of a reduced sample rate on classification performance. Regarding the frame size, there is a tradeoff between locatability in the time domain and locatability in the frequency domain: $\Delta t \cdot \Delta f = 1$. Larger frame sizes correspond to a better resolution in the frequency domain at the cost of poor localisation in the time domain, while small frame sizes allow for good temporal localisation at the cost of poor resolution in the frequency domain. To find the sweet spot, we examine the effect of different frame sizes on the classification performance. Finally, we test whether speed normalisation and amplitude normalisation have positive effects on the classification performance. All tests were performed using the fluctuation strength, which was found to generally have the best discrimination performance. AUC′ was used as the performance metric.
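The two constraints can be made concrete for the parameter values used in this study. The block below is plain arithmetic based on the relations stated above. The number of hits per frame assumes the modulation frequency of about 11 Hz from Equation (3). The upper edge of the 2 kHz octave band is taken as $2000\,\mathrm{Hz} \cdot \sqrt{2}$.

```latex
\begin{align*}
f_s = 40960\,\mathrm{Hz}: &\quad f_{\max} < 20480\,\mathrm{Hz} \\
f_s = 8192\,\mathrm{Hz}:  &\quad f_{\max} < 4096\,\mathrm{Hz}
  \quad (\text{2 kHz octave band upper edge} \approx 2828\,\mathrm{Hz}) \\[4pt]
\Delta t = 0.5\,\mathrm{s}: &\quad \Delta f = 2\,\mathrm{Hz},
  \quad f_{\mathrm{mod}} \cdot \Delta t \approx 5.5 \text{ hits per frame} \\
\Delta t = 1\,\mathrm{s}:   &\quad \Delta f = 1\,\mathrm{Hz},
  \quad f_{\mathrm{mod}} \cdot \Delta t \approx 11 \text{ hits per frame} \\
\Delta t = 2\,\mathrm{s}:   &\quad \Delta f = 0.5\,\mathrm{Hz},
  \quad f_{\mathrm{mod}} \cdot \Delta t \approx 22 \text{ hits per frame}
\end{align*}
```

Even at the reduced sample rate, the 2 kHz octave band remains below the Nyquist limit, and even the shortest frame still contains several wheel rotations at the mean passing speed.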

The results are shown in Figure 2. It can be seen that amplitude normalisation has almost no effect on the AUC′. The same holds for the reduction of the sample rate. Shortening the window size from 1 s to 0.5 s has a slight negative impact on the AUC′ for all features. Changing the window size from 1 s to 2 s has no significant impact, except for a considerable performance drop of $F_{\max}$ (see Figure 2g). Speed normalisation boosts the discriminative performance of the $F_{\max}$ feature, while it drastically reduces the performance of the $S_{\max}$ feature (see Figure 2e). On the remaining features, speed normalisation has no significant effect. Taking these observations into account, a detector configuration with a sample rate of 8192 Hz, a frame size of 1 s, speed normalisation and no amplitude normalisation seems to be a good starting point. Speed normalisation is computationally expensive and might be dropped for economic reasons.
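The suggested starting configuration can be written down as a small, immutable settings object. This is only a sketch: the class and field names are hypothetical, and the derived hop length assumes the 50% frame overlap described in section 4.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectorConfig:
    """Hypothetical container for the design parameters discussed above."""
    sample_rate_hz: int = 8192
    frame_size_s: float = 1.0
    speed_normalisation: bool = True
    amplitude_normalisation: bool = False

    @property
    def frame_length_samples(self) -> int:
        return int(round(self.sample_rate_hz * self.frame_size_s))

    @property
    def hop_length_samples(self) -> int:
        # 50 % overlap between consecutive frames
        return self.frame_length_samples // 2


baseline = DetectorConfig()
# Lower-cost variant that drops the computationally expensive speed normalisation.
low_cost = replace(baseline, speed_normalisation=False)
```

Keeping the configuration frozen makes it easy to compare variants such as `baseline` and `low_cost` without accidentally modifying either of them.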

7. LISTENING TEST

The signals for the listening test were created by picking a subjectively balanced subset of 15 flat spot events with different intensities. The events were all cut to 5 s length, with the flat spot event centred. The listening test was performed with 20 participants in the listening booth of Möhler + Partner Ingenieure AG in Augsburg. The participants were asked to rank the 15 flat spot events regarding their annoyance. The listening test resulted in 20 rankings of the sounds. For further analysis, the linear correlation between the median value of the rank and different psychoacoustic quantities was calculated. To check whether the combination of multiple quantities improves the explanation of variance, a multilinear regression was performed. The results are shown in Table 4.

Table 4: Results of listening test [8]

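| Feature | $R^2$ (linear regression) | $R^2$ (multilinear regression) |
|---|---|---|
| $N_{\max}$ | 0.76 | - |
| $S_{\max}$ | 0.67 | - |
| $F_{\max}$ | 0.46 | - |
| $N_{\max}$, $S_{\max}$ | - | 0.80 |
| $N_{\max}$, $F_{\max}$ | - | 0.89 |
| $N_{\max}$, $S_{\max}$, $F_{\max}$ | - | 0.91 |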

In congruence with psychoacoustic models [9], loudness shows the strongest correlation with perceived annoyance. When loudness, sharpness and fluctuation strength are considered together, the explanation of variance can be improved to more than 90 %.
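The single-feature part of this evaluation can be sketched with the standard library. The sketch assumes that each participant's ranking is stored as a list of rank positions per sound. The rankings and feature values shown are hypothetical and are not the listening test data, and the function names are illustrative. The multilinear case is not covered here, as it needs a least-squares solver.

```python
from statistics import mean, median


def median_ranks(rankings):
    """Median rank per sound; rankings[j][i] is the rank participant j gave sound i."""
    n_sounds = len(rankings[0])
    return [median(r[i] for r in rankings) for i in range(n_sounds)]


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical example: three participants ranking three sounds.
rankings = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]
ranks = median_ranks(rankings)
loudness = [10.0, 20.0, 25.0]   # hypothetical feature values per sound
r2 = r_squared(loudness, ranks)
```

For a single predictor, this $R^2$ equals the squared linear correlation coefficient between the feature and the median rank.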

8. CONCLUSION

In this study, various acoustic and psychoacoustic quantities have been examined regarding their capability to detect regions with flat spots in audio recordings of train passings. It was found that the discriminative performance of the maximum sound pressure level alone is rather poor. A well-performing detection system has to be based on features that capture the fluctuating nature of the flat spot sound. We propose the psychoacoustic fluctuation strength as one suitable feature that captures this characteristic. Once a flat spot is detected, its acoustic annoyance can be modelled very well by a linear combination of the psychoacoustic quantities loudness, fluctuation strength and sharpness. Future work will focus on more refined features that capture the amplitude modulations caused by flat spots (e.g. the envelope spectrum), suitable machine learning models, uncertainties in labelling, the use of soft labels, and axle-specific windowing to allow for precise locatability.

ACKNOWLEDGEMENTS

Möhler + Partner Ingenieure AG acknowledges support by the German Federal Environmental Agency (UBA) in the project „Messung von Flachstellen und Ermittlung eines akustischen Instandhaltungskriteriums“.

REFERENCES

[1] G. Kouroussis, C. Caucheteur, G. Alexandrou, O. Verlinden, and V. Moeyaert. Review of trackside monitoring solutions: From strain gages to optical fibre sensors. Sensors, 2015(15):20115–20139, 2015.

[2] C. Zhou, L. Gao, H. Xiao, and B. Hou. Railway wheel flat recognition and precise positioning method based on multisensor arrays. Applied Sciences, 10(4):1297, 2020.

[3] G. Krummenacher, C. Soon Ong, S. Koller, S. Kobayashi, and J. Buhmann. Wheel defect detection with machine learning. IEEE Transactions on Intelligent Transportation Systems, 19(4):1176–1729, 2020.

[4] A. Mosleh, P. Montenegro, P. Costa, and R. Calçada. Railway vehicle wheel flat detection with multiple records using spectral kurtosis analysis. Applied Sciences, 11(9):4002, 2021.

[5] A. Mosleh, P. Montenegro, P. Costa, and R. Calçada. An approach for wheel flat detection of railway train wheels using envelope spectrum analysis. Structure and Infrastructure Engineering, 17(12):1710–1729, 2020.

[6] A. Bracciali and G. Cascini. Detection of corrugation and wheelflats of railway wheels using energy and cepstrum analysis of rail acceleration. Sage, 211(2):109–116, 1997.

[7] G. Dernbach, A. Lykartis, L. Sievers, and S. Weinzierl. Acoustic identification of flat spots on wheels using different machine learning techniques. In Fortschritte der Akustik - DAGA 2020: 46. Deutsche Jahrestagung für Akustik, pages 367–370, Berlin, 2020. Deutsche Gesellschaft für Akustik e.V.

[8] C. Huth, M. Forstreuter, M. Liepert, and R. Artl. Abschlussbericht des Forschungsvorhabens „Messung von Flachstellen und Ermittlung eines akustischen Instandhaltungskriteriums“. Technical report, Umweltbundesamt, in preparation.

[9] U. Widmann. Ein Modell der psychoakustischen Lästigkeit von Schallen und seine Anwendung in der Praxis der Lärmbeurteilung. PhD thesis, TU München, 1992.

Figure 2: Main effects plots for all variables and different parameter configurations: sample rate | frame size | speed normalisation | amplitude normalisation. Panels: (a) $L_{Af,\max}$, (b) crest factor, (c) $L_{Af,2\,\mathrm{kHz},\max}$, (d) $N_{\max}$, (e) $S_{\max}$, (f) $R_{\max}$, (g) $F_{\max}$.