Speech Replay Spoofing Attack Detection System Based on Fusion of Classification Algorithms
Abstract
Fast development of modern technologies of digital processing and speech recording leads to the fact that it is necessary to take into account the potential threats from the speech replay attacks. We propose our ensemble fusion replay attack detection system. It uses constant Q cepstral coefficients as speech features and short-time mean normalization for their preprocessing. The set of binary classifiers includes multiple Gaussian mixture models based Bayesian classifier, i-vector based Gaussian Probabilistic Linear Discriminant Analysis and XGBoost tree boosting algorithm. Fusion of scores was made by modified logistic regression algorithm from BOSARIS toolbox. ASV Spoof 2017 corpus is utilized in the experiments as the main database for anti-spoofing systems evaluation. Obtained results demonstrate that the proposed system can provide substantially better performance than the baseline Gaussian mixture model classifier. The pre-processing of cepstral features is crucial for the better performance of the system. High evaluation performance can be obtained using only few algorithms in a set. The attained value of equal error rate EER=12.44% for our fusion classifier is competitive with the best results obtained during last two years.
DOI 10.14258/izvasu(2018)1-19
Downloads
References
Kinnunen T., Sahidullah M., Delgado H., Todisco M., Evans N., Yamagishi J., Lee K.A. The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection // Proc. INTERSPEECH 2017. 2017. D01:10.21437/ Interspeech.2017-1111.
Wu Z., Yamagishi J., Kinnunen T., Hanil^i C., Sahidullah M., Sizov A., Evans N., Todisco M., Delgado H. ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge // IEEE Journal of Selected Topics in Signal Processing. — 2017. — Vol. 11, No. 4. D0I:10.1109/ JSTSP2017.2671435.
K. Lee, A. Larcher, G. Wang, P. Kenny, N. Brummer, D. A. van Leeuwen, H. Aronowitz, et al. The RedDots data collection for speaker recognition // Proc. Interspeech, Annual Conf. of the Int. Speech Comm. Assoc., 2015.
Morrison G.S. Tutorial on logistic-regression calibration and fusion:converting a score to a likelihood ratio // Australian Journal of Forensic Sciences. — 2013. — Vol. 45, No. 2. DOI: 10.1080/00450618.2012.733025.
Reynolds D.A., Quatieri T.F., Dunn R.B. Speaker verification using adapted Gaussian mixture models // Digital Signal Processing. — 2000. — Vol. 10, No. 1. DOI: 10.1006/ dspr.1999.0361.
Senoussaoui M., Kenny P, Dehak N., Dumouchel P An i-vector extractor suitable for speaker recognition with both micro-phone and telephone speech // Proc. Odyssey Speaker and Language Recogntion Workshop, 2010.
Verma P, Das PK. I-vectors in speech processing applications: a survey // International Journal of Speech Technolng. — 2015. — Vol. 18, No. 4. DOI: 10.1007/978-981-10-6626-9_18.
Chen T., Guestrin C. XGBoost: A Scalable Tree Boosting System // KDD’16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
Shikha G., Jaafar J., Fatimah W., Ahmad W., Bansal A. Feature Extraction using MFCC // International Journal of signal and image processing (SIPIJ). — 2013. — Vol. 4, No. 4. DOI: 10.5121/sipij.2013.4408.
Todisco M., Delgado H., Evans N. A new feature for automatic speaker verification anti-spoofing: Constant Q cepstral coefficients // Speaker Odyssey Workshop, Bilbao, Spain. 2016.
Brown J. C. Calculation of a constant Q spectral transform // Journal of Acoustic Society America. — 1991. — Vol. 89, No. 1.
Alam M., Ouellet P, Kenny P., O’Shaughnessy D. Comparative evaluation of feature normalization techniques for speaker verification // Advances in Nonlinear Speech Processing: 5th International Conference on Nonlinear Speech Processing, NOLISP 2011. DOI: 10.1007/978-3-642-25020-0_32.
Dehak N., Kenny P, Dehak R., Dumouchel P., Ouellet P. Front-End Factor Analysis For Speaker Verification // IEEE Transactions on Audio, Speech and Language Processing. — 2010. — Vol. 19, No. 4. DOI: 10.1109/TASL.2010.2064307.
Sadjadi S. O., Slaney M., Heck L. MSR identity toolbox v1.0: A MATLAB toolbox for speaker recognition research // Proc. IEEE Signal Process. Soc. Speech Lang. Tech. Committee Newsl. 2013.
Copyright (c) 2018 А.А. Лепендин, Я.А. Филин, П.В. Малинин

This work is licensed under a Creative Commons Attribution 4.0 International License.
Izvestiya of Altai State University is a golden publisher, as we allow self-archiving, but most importantly we are fully transparent about your rights.
Authors may present and discuss their findings ahead of publication: at biological or scientific conferences, on preprint servers, in public databases, and in blogs, wikis, tweets, and other informal communication channels.
Izvestiya of Altai State University allows authors to deposit manuscripts (currently under review or those for intended submission to Izvestiya of Altai State University) in non-commercial, pre-print servers such as ArXiv.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).



