Simplified Silhouette Parameter for Assessing the Quality of Cluster Structures
UDC 519.254
Abstract
The article addresses the problem of assessing the quality of a cluster structure in data. It describes a clustering quality index that takes into account both the compactness and the separability of clusters, in two versions: the classical silhouette index and the simplified silhouette index. Evaluating the classical silhouette index on big data requires a laborious complete enumeration of all pairs of objects. A variation of this index, the simplified silhouette index, is therefore proposed and found to be convenient for assessing cluster structures built on big data arrays. The proposed index has been tested on model data: several variants of cluster structures are built, with the mini-clusters that make up the identified clusters serving as the objects. The centers of the mini-clusters, taken with their “weight” (the number of objects in the mini-cluster), are used as objects when calculating intracluster and intercluster distances, and the corresponding silhouette index is then calculated. Comparing the classical and simplified silhouette indices on each model data set indicates that the simplified index gives an adequate assessment of clustering quality.
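The difference between the two indices can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common definitions: the classical index uses mean pairwise distances, while the simplified index replaces them with distances to cluster centers. The weighting scheme is also an assumption, since the abstract only states that the weight is the number of objects in a mini-cluster: here the weights enter both the cluster centers and the final average. The function names, the Euclidean metric, and the toy data are hypothetical.

```python
import numpy as np


def classical_silhouette(X, labels):
    """Mean classical silhouette; needs all pairwise distances, O(n^2)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    # Full pairwise Euclidean distance matrix (the costly step on big data).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: an object in a singleton cluster scores 0
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        m = max(a, b)
        s[i] = (b - a) / m if m > 0 else 0.0
    return s.mean()


def simplified_silhouette(X, labels, weights=None):
    """Weighted mean simplified silhouette; only O(n * k) distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    # Assumption: cluster centers are weighted means of the mini-cluster centers.
    centers = np.array(
        [np.average(X[labels == c], axis=0, weights=w[labels == c]) for c in clusters]
    )
    # Distance from every object to every cluster center.
    D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    rows = np.arange(len(X))
    own_idx = np.searchsorted(clusters, labels)  # column of each object's own cluster
    a = D[rows, own_idx]                         # distance to own center
    D_other = D.copy()
    D_other[rows, own_idx] = np.inf
    b = D_other.min(axis=1)                      # distance to nearest other center
    denom = np.maximum(a, b)
    s = np.divide(b - a, denom, out=np.zeros_like(denom), where=denom > 0)
    # Assumption: per-object scores are averaged with the same weights.
    return np.average(s, weights=w)


# Toy data: four mini-cluster centers forming two well-separated clusters.
mini_centers = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
mini_weights = [3, 1, 2, 2]  # number of objects in each mini-cluster
cluster_labels = [0, 0, 1, 1]

print(classical_silhouette(mini_centers, cluster_labels))
print(simplified_silhouette(mini_centers, cluster_labels, weights=mini_weights))
```

For n objects and k clusters the classical version computes on the order of n² distances, whereas the simplified version needs only n·k, which is what makes it practical on large data arrays.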
Copyright (c) 2022 Вера Владимировна Журавлева, Анастасия Станиславовна Маничева
This work is licensed under a Creative Commons Attribution 4.0 International License.
Izvestiya of Altai State University is a golden publisher, as we allow self-archiving, but most importantly we are fully transparent about your rights.