Generalized Algorithm for Finding Outliers in a Regression Model
UDC 510.5
Abstract
Data analysis is one of the actively developing areas of modern computing. The data under study vary in structure, which complicates smoothing and analysis. This creates a need for new universal data-processing algorithms and for software that can analyze data of various kinds. Regression modeling is a widely used data-processing method today. It is applied in pattern recognition, classification, dimensionality reduction, and many other problems. The literature describes various methods of constructing regression models, all of which are based on optimizing a certain indicator, the quality functional. An important requirement for the quality of such models is the absence of outliers in the data.
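For illustration, the quality functionals behind the two most common estimation methods (least squares and least absolute deviations) can be written for a linear model as follows. The notation is assumed here and is not taken from the article: $y_i$ is the response, $x_i$ the vector of regressors, and $\beta$ the coefficient vector.

```latex
Q_{\mathrm{LS}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2},
\qquad
Q_{\mathrm{LAD}}(\beta) = \sum_{i=1}^{n} \bigl|y_i - x_i^{\top}\beta\bigr|.
```

Each model is obtained by minimizing the corresponding functional over $\beta$. A single large residual enters $Q_{\mathrm{LS}}$ squared, which is why outliers can distort a least squares fit noticeably.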
This article discusses a method for examining a sample for outliers. The resulting algorithm can be applied to regression models estimated by the most common methods (the least squares method and the least absolute deviations method, also known as the least modulus method). The mathematical basis of the procedure is the Legendre transformation, which provides computational accuracy in computer implementation. The adequacy of the algorithm was investigated on a number of test samples. In all tests, the outliers were detected correctly. A set of programs was developed in MATLAB that allows various regression models to be built and the original sample to be checked for sharply deviating observations.
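The abstract does not spell out the Legendre-transformation procedure, so the following Python sketch does not reproduce the authors' algorithm. It only illustrates the general setting under stated assumptions: a linear model is fitted by least squares and by least absolute deviations (approximated here with iteratively reweighted least squares), and observations are flagged with a generic rule based on residuals scaled by the median absolute deviation. The function names, the threshold of 3.5, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit: minimizes the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_lad(X, y, n_iter=100, eps=1e-8):
    """Approximate least absolute deviations fit via iteratively
    reweighted least squares (a stand-in, not the article's method)."""
    beta = fit_ols(X, y)
    for _ in range(n_iter):
        r = np.abs(y - X @ beta)
        w = 1.0 / np.maximum(r, eps)  # weight each point by 1/|residual|
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.allclose(beta_new, beta, atol=1e-10)
        beta = beta_new
        if converged:
            break
    return beta


def flag_outliers(X, y, beta, threshold=3.5):
    """Flag observations whose MAD-scaled residual exceeds the threshold
    (a generic rule chosen for illustration)."""
    r = y - X @ beta
    med = np.median(r)
    mad = np.median(np.abs(r - med))
    scale = 1.4826 * mad  # matches the standard deviation for normal data
    if scale == 0:
        return np.zeros(len(y), dtype=bool)
    return np.abs(r - med) / scale > threshold


# Synthetic data (hypothetical): the line y = 2 + 0.5 x with small bounded noise
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + 0.1 * np.sin(np.arange(30))
y[7] += 5.0   # planted outlier
y[20] -= 4.0  # planted outlier
X = np.column_stack([np.ones_like(x), x])

beta_ols = fit_ols(X, y)
beta_lad = fit_lad(X, y)
flags_ols = flag_outliers(X, y, beta_ols)
flags_lad = flag_outliers(X, y, beta_lad)

print("OLS coefficients:", beta_ols)
print("LAD coefficients:", beta_lad)
print("Flagged under OLS fit:", np.flatnonzero(flags_ols))
print("Flagged under LAD fit:", np.flatnonzero(flags_lad))
```

The least absolute deviations fit is less affected by the two planted points than the least squares fit, so its residuals tend to separate the outliers more cleanly.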
Copyright (c) 2021 Мария Викторовна Куркина, Игорь Викторович Пономарев
This work is licensed under a Creative Commons Attribution 4.0 International License.
Izvestiya of Altai State University is a golden publisher, as we allow self-archiving, but most importantly we are fully transparent about your rights.