Development of Convolutional Neural Network for Classification of Amplitude-Frequency Characteristics of Audio Signals

UDC 534.87

  • V.N. Popov Altai State University (Barnaul, Russia) Email: oskage.work@gmail.com
  • Павел Сергеевич Ладыгин Altai State University (Barnaul, Russia) Email: pavelladygin@yandex.ru
  • V.V. Karev Altai State University (Barnaul, Russia) Email: krv.valentin@gmail.com
  • Ya.I. Bortsova Altai State University (Barnaul, Russia) Email: server2791@mail.ru
Keywords: convolutional neural network, detection error tradeoff curve, classification, cosine similarity, predictive model

Abstract

This paper studies the application of deep convolutional neural networks to audio processing, specifically to classifying the amplitude-frequency characteristics of audio signals. Matching audio fragments to one another is reduced to verifying objects by their representations. A large representative sample of audio signals was collected and supplemented with suitable recordings from the Free Music Archive dataset to form the training set for a deep convolutional neural network. The CQT-Net architecture serves as the predictive model, and cosine similarity is used to compare feature vectors. Four types of augmentation (Gaussian noise, reverberation, pitch shifting, and tempo change) are applied to prevent overfitting of the predictive model. Verification quality is tested on two separate datasets consisting of 1500 audio recordings excluded from the training dataset. Detection error tradeoff curves are plotted for all datasets, including test sets with altered tempo and altered pitch. The Equal Error Rate is used as the model quality metric. The probability of correctly identifying audio signals subjected to common distortions in the amplitude-frequency domain is estimated to be higher than 92%, which indicates the reliability of the developed model.
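The verification scheme described in the abstract (cosine similarity between feature vectors, scored with detection error tradeoff points and the Equal Error Rate) can be illustrated with a minimal sketch. It assumes NumPy and toy score lists; the function names and the simple threshold sweep are illustrative assumptions and are not taken from the paper or from CQT-Net.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def det_points(genuine, impostor):
    """Sweep every observed score as a threshold.

    genuine  : similarity scores of pairs that really match
    impostor : similarity scores of pairs that do not match
    Returns (thresholds, false_accept_rate, false_reject_rate).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # A pair is accepted when its score is at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    return thresholds, far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: the point where the two error rates are closest."""
    _, far, frr = det_points(genuine, impostor)
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2)
```

Plotting the false-reject rate against the false-accept rate returned by `det_points` gives a detection error tradeoff curve; a lower Equal Error Rate indicates better separation between matching and non-matching pairs.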


Author Biographies

V.N. Popov, Altai State University (Barnaul, Russia)

Student, Institute of Digital Technologies, Electronics and Physics

Павел Сергеевич Ладыгин, Altai State University (Barnaul, Russia)

Senior Lecturer, Department of Information Security

V.V. Karev, Altai State University (Barnaul, Russia)

Student, Institute of Digital Technologies, Electronics and Physics

Ya.I. Bortsova, Altai State University (Barnaul, Russia)

Senior Lecturer, Department of Information Security

Published
2022-03-18
How to Cite
Popov V., Ладыгин П. С., Karev V., Bortsova Y. Development of Convolutional Neural Network for Classification of Amplitude-Frequency Characteristics of Audio Signals // Izvestiya of Altai State University, 2022, No. 1(123). P. 116-120. DOI: 10.14258/izvasu(2022)1-19. URL: http://izvestiya.asu.ru/article/view/%282022%291-19.