Development of a Convolutional Neural Network for the Classification of Amplitude-Frequency Characteristics of Audio Signals
UDC 534.87
Abstract
This paper studies the application of deep convolutional neural networks to audio processing, in particular to the classification of amplitude-frequency characteristics of audio signals. The task of matching audio fragments against each other is reduced to verifying objects by their feature representations. A large representative sample of audio signals was collected and supplemented with the Free Music Archive dataset to form a training set for a deep convolutional neural network. The CQT-Net architecture is taken as the predictive model, and cosine similarity is used to compare feature vectors. Four types of augmentation, namely Gaussian noise, reverberation, pitch shifting, and tempo change, are applied to prevent overfitting of the predictive model. The verification quality of the model is tested on two separate datasets consisting of 1500 audio recordings excluded from the training set. Detection error tradeoff curves are plotted for all datasets, including the test sets with changed tempo and changed pitch. Equal Error Rate is used as the model quality metric. The probability of identifying audio signals subjected to common amplitude-frequency distortions is estimated to exceed 92%, which indicates the reliability of the developed model.
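The pipeline outlined in the abstract (CQT-based feature extraction, signal augmentation, cosine-similarity verification, and EER evaluation) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes librosa (cited in the reference list) and NumPy; the sampling rate and augmentation parameters are arbitrary; the simple convolution reverb and the mean-pooled log-CQT vector merely stand in for the CQT-Net embedding; and the file name example.wav is hypothetical.

```python
import numpy as np
import librosa

SR = 22050  # assumed sampling rate for all examples

def add_gaussian_noise(y, snr_db=30.0):
    """Add white Gaussian noise at a chosen signal-to-noise ratio (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return y + np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)

def add_reverb(y, decay=0.4, ir_len=2048):
    """Crude reverberation: convolve with an exponentially decaying impulse response."""
    ir = decay ** np.arange(ir_len)
    wet = np.convolve(y, ir)[: len(y)]
    return wet / (np.max(np.abs(wet)) + 1e-9)

def shift_pitch(y, n_steps=2):
    """Shift pitch by n_steps semitones without changing duration."""
    return librosa.effects.pitch_shift(y, sr=SR, n_steps=n_steps)

def change_tempo(y, rate=1.1):
    """Change tempo by the given rate without changing pitch."""
    return librosa.effects.time_stretch(y, rate=rate)

def cqt_embedding(y):
    """Stand-in for the CQT-Net embedding: a mean-pooled log-magnitude CQT vector."""
    cqt = np.abs(librosa.cqt(y, sr=SR))
    return np.log1p(cqt).mean(axis=1)

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def equal_error_rate(genuine, impostor):
    """EER from NumPy arrays of genuine and impostor similarity scores:
    the point where false-accept and false-reject rates coincide."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Example: an original fragment should score higher against its distorted
# versions than against unrelated audio.
y, _ = librosa.load("example.wav", sr=SR)  # hypothetical file
ref = cqt_embedding(y)
print(cosine_similarity(ref, cqt_embedding(shift_pitch(y, 2))))
print(cosine_similarity(ref, cqt_embedding(change_tempo(y, 1.1))))
```

Cosine similarity keeps the comparison independent of the overall magnitude of the embeddings, and the EER corresponds to the point on the detection error tradeoff curve where the false-accept and false-reject rates are equal.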
References
Brink H., Richards J., Fetherolf M. Machine Learning. St. Petersburg, 2017. (In Russian).
Furui S., Rosenberg A.E. Speaker Verification. In: Digital Signal Processing Handbook. CRC Press LLC, 1999.
Bimbot F. et al. A Tutorial on Text-Independent Speaker Verification. EURASIP Journal on Advances in Signal Processing. 2004. No. 4.
Kim J.W., Salamon J., Li P. CREPE: A Convolutional Representation for Pitch Estimation. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2018. URL: https://arxiv.org/pdf/1802.06182.pdf (accessed: 13.12.2021).
Böck S., Krebs F., Widmer G. Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters. ISMIR. 2015. URL: http://www.cp.jku.at/research/papers/Boeck_etal_ISMIR (accessed: 13.12.2021).
Li Z., Yang W., Peng Sh., Liu F. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. Hohai University, Nanjing, China. 2020. URL: https://arxiv.org/ftp/arxiv/papers/2004/2004.02806.pdf (accessed: 13.12.2021).
Popov V.N., Ladygin P.S., Bortsova Ya.I., Karev V.V. Preparation of a Dataset for Training a Neural Network Used in Audio File Comparison Tasks. Problems of Legal and Technical Information Protection. Barnaul, 2021. Iss. IX. (In Russian).
Defferrard M., Benzi K., Vandergheynst P., Bresson X. FMA: A Dataset for Music Analysis. 18th International Society for Music Information Retrieval Conference, Suzhou, China. 2017. URL: https://arxiv.org/pdf/1612.01840.pdf (accessed: 13.12.2021).
Yu Zh., Xu X., Chen X., Yang D. Learning a Representation for Cover Song Identification Using Convolutional Neural Network. 2019. URL: https://arxiv.org/abs/1911.00334 (accessed: 13.12.2021).
McFee B., Raffel C., Liang D., Ellis D.P.W., McVicar M., Battenberg E., Nieto O. librosa: Audio and Music Signal Analysis in Python. Proceedings of the 14th Python in Science Conference. 2015. URL: http://conference.scipy.org/proceedings/scipy2015/pdf (accessed: 13.12.2021).
Goodfellow I., Bengio Y., Courville A. Deep Learning. The MIT Press, 2016.
Copyright (c) 2022 Vladislav Nikolaevich Popov, Pavel Sergeevich Ladygin, Valentin Vitalievich Karev, Yana Igorevna Bortsova
This work is licensed under a Creative Commons Attribution 4.0 International License.