Generalized Algorithm for Finding Outliers in a Regression Model

UDC 510.5

  • M.V. Kurkina Ugra State University (Khanty-Mansiysk, Russia) Email: mavi@inbox.ru
  • I.V. Ponomarev Altai State University (Barnaul, Russia) Email: igorpon@mail.ru
Keywords: linear regression, least squares, least modulus, statistical outliers

Abstract

Data analysis is one of the actively developing areas of modern computing. The data under study vary in structure, which complicates smoothing and analysis. This creates a need for new universal data-processing algorithms and for software that can analyze data of different kinds. Regression modeling is a widely used data-processing method today; it is applied in pattern recognition, classification, dimensionality reduction, and many other problems. The literature describes various methods of constructing regression models, each based on optimizing a certain indicator, the quality functional. An important requirement for the quality of such models is the absence of outliers in the data.

This article discusses a method for examining a sample for outliers. The resulting algorithm can be applied to regression models estimated by the most common methods (the least squares method and the least modulus method). The mathematical basis of the procedure is the Legendre transformation, which provides computational accuracy in a computer implementation. The adequacy of the algorithm was investigated on a number of test samples, and all tests gave positive results with respect to outliers. A set of programs developed in MATLAB makes it possible to build various regression models and to check the original sample for sharply deviating observations.
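
To make the setting concrete, the following is a minimal Python sketch of residual-based outlier screening for a simple linear model y = a + b·x fitted by least squares or by least modulus. It is not the authors' Legendre-transformation algorithm, which the abstract does not spell out, and it is not their MATLAB code. The function names, the brute-force least modulus search, the MAD-based scale, and the threshold of 3 are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import median


def fit_least_squares(xs, ys):
    """Ordinary least squares fit of y = a + b*x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_least_modulus(xs, ys):
    """Least absolute deviations fit of y = a + b*x.

    Some optimal line passes through two sample points, so a brute-force
    search over point pairs is enough for small samples (O(n^3) work).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define a line y = a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def flag_outliers(xs, ys, fit, threshold=3.0):
    """Return indices whose residual lies far from the median residual.

    Assumption: 'far' means more than threshold * 1.4826 * MAD, a common
    robust rule of thumb, not the criterion used in the article.
    """
    a, b = fit(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    center = median(residuals)
    scale = 1.4826 * median(abs(r - center) for r in residuals)
    if scale == 0:
        return []  # robust scale collapsed; the threshold is undefined
    return [i for i, r in enumerate(residuals)
            if abs(r - center) > threshold * scale]


# Hypothetical sample: roughly y = 2 + 0.5*x with one planted outlier.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 2.4, 3.0, 3.6, 3.9, 4.6, 5.0, 30.0, 6.1, 6.4]
print(flag_outliers(xs, ys, fit_least_squares))
print(flag_outliers(xs, ys, fit_least_modulus))
```

The planted observation at index 7 is the one the screening is meant to catch. The least modulus fit stays close to the bulk of the data, while the least squares line is visibly pulled toward the outlier, which is why the sketch measures residuals against a robust scale rather than the standard deviation.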

Author Biographies

M.V. Kurkina, Ugra State University (Khanty-Mansiysk, Russia)

Candidate of Physical and Mathematical Sciences, Associate Professor

I.V. Ponomarev, Altai State University (Barnaul, Russia)

Candidate of Physical and Mathematical Sciences, Associate Professor at the Department of Mathematical Analysis

Published
2021-09-10
How to Cite
Kurkina M., Ponomarev I. Generalized Algorithm for Finding Outliers in a Regression Model // Izvestiya of Altai State University, 2021, № 4(120). P. 102-105 DOI: 10.14258/izvasu(2021)4-16. URL: http://izvestiya.asu.ru/article/view/%282021%294-16.