Development of a Convolutional Neural Network for the Classification of Amplitude-Frequency Characteristics of Audio Signals
UDC 534.87
Abstract
This paper studies the application of deep convolutional neural networks to audio processing, in particular to the classification of amplitude-frequency characteristics of audio signals. The task of matching audio fragments against each other is reduced to verifying objects by their feature representations. A large representative sample of audio signals was collected and supplemented with the Free Music Archive dataset to form a training set for a deep convolutional neural network. The CQT-Net architecture is taken as the predictive model, and cosine similarity is used to compare feature vectors. Four types of augmentation, namely Gaussian noise, reverberation, pitch shifting, and tempo change, are applied to prevent overfitting of the predictive model. The verification quality of the model is tested on two separate datasets consisting of 1500 audio recordings excluded from the training set. Detection error tradeoff curves are plotted for all datasets, including the test sets with changed tempo and changed pitch. Equal Error Rate is used as the model quality metric. The probability of identifying audio signals subjected to common amplitude-frequency distortions is estimated to exceed 92%, which indicates the reliability of the developed model.
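The pipeline outlined in the abstract (CQT-based feature extraction, signal augmentation, cosine-similarity verification, and EER evaluation) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes librosa (cited in the reference list) and NumPy; the sampling rate and augmentation parameters are arbitrary; the simple convolution reverb and the mean-pooled log-CQT vector merely stand in for the CQT-Net embedding; and the file name example.wav is hypothetical.

```python
import numpy as np
import librosa

SR = 22050  # assumed sampling rate for all examples

def add_gaussian_noise(y, snr_db=30.0):
    """Add white Gaussian noise at a chosen signal-to-noise ratio (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return y + np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)

def add_reverb(y, decay=0.4, ir_len=2048):
    """Crude reverberation: convolve with an exponentially decaying impulse response."""
    ir = decay ** np.arange(ir_len)
    wet = np.convolve(y, ir)[: len(y)]
    return wet / (np.max(np.abs(wet)) + 1e-9)

def shift_pitch(y, n_steps=2):
    """Shift pitch by n_steps semitones without changing duration."""
    return librosa.effects.pitch_shift(y, sr=SR, n_steps=n_steps)

def change_tempo(y, rate=1.1):
    """Change tempo by the given rate without changing pitch."""
    return librosa.effects.time_stretch(y, rate=rate)

def cqt_embedding(y):
    """Stand-in for the CQT-Net embedding: a mean-pooled log-magnitude CQT vector."""
    cqt = np.abs(librosa.cqt(y, sr=SR))
    return np.log1p(cqt).mean(axis=1)

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def equal_error_rate(genuine, impostor):
    """EER from NumPy arrays of genuine and impostor similarity scores:
    the point where false-accept and false-reject rates coincide."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Example: an original fragment should score higher against its distorted
# versions than against unrelated audio.
y, _ = librosa.load("example.wav", sr=SR)  # hypothetical file
ref = cqt_embedding(y)
print(cosine_similarity(ref, cqt_embedding(shift_pitch(y, 2))))
print(cosine_similarity(ref, cqt_embedding(change_tempo(y, 1.1))))
```

Cosine similarity keeps the comparison independent of the overall magnitude of the embeddings, and the EER corresponds to the point on the detection error tradeoff curve where the false-accept and false-reject rates are equal.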
References
Brink H., Richards J., Fetherolf M. Machine Learning. St. Petersburg, 2017. (In Russian).
Furui S., Rosenberg A.E. Speaker Verification. In: Digital Signal Processing Handbook. CRC Press LLC, 1999.
Bimbot F. et al. A Tutorial on Text-Independent Speaker Verification. EURASIP Journal on Advances in Signal Processing. 2004. No. 4.
Kim J.W., Salamon J., Li P. CREPE: A Convolutional Representation for Pitch Estimation. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2018. URL: https://arxiv.org/pdf/1802.06182.pdf (accessed: 13.12.2021).
Böck S., Krebs F., Widmer G. Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters. ISMIR. 2015. URL: http://www.cp.jku.at/research/papers/Boeck_etal_ISMIR (accessed: 13.12.2021).
Li Z., Yang W., Peng Sh., Liu F. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. Hohai University, Nanjing, China. 2020. URL: https://arxiv.org/ftp/arxiv/papers/2004/2004.02806.pdf (accessed: 13.12.2021).
Popov V.N., Ladygin P.S., Bortsova Ya.I., Karev V.V. Preparation of a Dataset for Training a Neural Network Used in Audio File Comparison Tasks. Problems of Legal and Technical Information Protection. Barnaul, 2021. Iss. IX. (In Russian).
Defferrard M., Benzi K., Vandergheynst P., Bresson X. FMA: A Dataset for Music Analysis. 18th International Society for Music Information Retrieval Conference, Suzhou, China. 2017. URL: https://arxiv.org/pdf/1612.01840.pdf (accessed: 13.12.2021).
Yu Zh., Xu X., Chen X., Yang D. Learning a Representation for Cover Song Identification Using Convolutional Neural Network. 2019. URL: https://arxiv.org/abs/1911.00334 (accessed: 13.12.2021).
McFee B., Raffel C., Liang D., Ellis D.P.W., McVicar M., Battenberg E., Nieto O. librosa: Audio and Music Signal Analysis in Python. Proceedings of the 14th Python in Science Conference. 2015. URL: http://conference.scipy.org/proceedings/scipy2015/pdf (accessed: 13.12.2021).
Goodfellow I., Bengio Y., Courville A. Deep Learning. The MIT Press, 2016.
Copyright (c) 2022 Vladislav Nikolaevich Popov, Pavel Sergeevich Ladygin, Valentin Vitalievich Karev, Yana Igorevna Bortsova
This work is licensed under a Creative Commons Attribution 4.0 International License.