Peer-Reviewed

A Comparison of K-Means and Mean Shift Algorithms

Received: 25 August 2021     Accepted: 30 September 2021     Published: 27 November 2021
Abstract

Clustering, also known as cluster analysis, is an unsupervised learning problem: it proceeds without human-provided labels. It is widely used in data analysis to identify interesting, useful, or desirable patterns in data. Clustering divides the data into groups of similar objects based on their properties, and each group is called a cluster. A cluster consists of objects that are similar to one another and dissimilar to the objects in other clusters. Clustering is important in many areas of data analysis because it reveals the intrinsic grouping of objects in a batch of unlabeled raw data based on their attributes. Cluster analysis has no textbook criterion for what makes a clustering good, because the desired result is specific to each user and to the purpose for which it is required. For the same reason, there is no single best clustering algorithm: the choice depends on the user's scenario and needs. The purpose of this paper is to compare two clustering algorithms, k-means and mean shift. The algorithms are compared on the following criteria: time complexity, training, prediction performance, and clustering accuracy.
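The kind of comparison described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the paper's experimental setup: the toy data set, the evenly spaced k-means initialization, the flat-kernel mean shift, the bandwidth of 2.0, and all function names are assumptions chosen only to keep the example small and deterministic.

```python
import math
import time


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(p, centers):
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, max_iter=100):
    # Assumption: deterministic init from evenly spaced samples.
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(mean_point(members) if members else centers[j])
        if updated == centers:  # centers stopped moving
            break
        centers = updated
    return [nearest(p, centers) for p in points], centers


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    # Assumption: flat kernel, every data point is used as a seed.
    modes = []
    for seed in points:
        x = seed
        for _ in range(max_iter):
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            if not window:
                break
            shifted = mean_point(window)
            moved = math.dist(shifted, x)
            x = shifted
            if moved < tol:
                break
        modes.append(x)
    # Merge modes that lie within one bandwidth of an existing center.
    centers, labels = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if math.dist(m, c) < bandwidth:
                labels.append(j)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


# Hypothetical toy data: two well-separated groups of five points each.
data = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3), (0.9, 0.9),
    (5.0, 5.0), (5.2, 4.8), (4.8, 5.1), (5.1, 5.3), (4.9, 4.9),
]

start = time.perf_counter()
km_labels, km_centers = kmeans(data, k=2)
km_seconds = time.perf_counter() - start

start = time.perf_counter()
ms_labels, ms_centers = mean_shift(data, bandwidth=2.0)
ms_seconds = time.perf_counter() - start

print("k-means   :", km_labels, f"{km_seconds:.6f}s")
print("mean shift:", ms_labels, f"{ms_seconds:.6f}s")
```

In this sketch k-means must be told the number of clusters, whereas mean shift is given only a bandwidth and infers the number of clusters from the modes it finds. Each k-means iteration costs on the order of n·k distance computations, while each pass of this naive mean shift costs on the order of n², which is one reason the two algorithms can differ noticeably in running time.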

Published in International Journal of Theoretical and Applied Mathematics (Volume 7, Issue 5)
DOI 10.11648/j.ijtam.20210705.12
Page(s) 76-84
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

K-Means, Mean Shift, Performance, Accuracy

Cite This Article
  • APA Style

    Mehak Nigar Shumaila. (2021). A Comparison of K-Means and Mean Shift Algorithms. International Journal of Theoretical and Applied Mathematics, 7(5), 76-84. https://doi.org/10.11648/j.ijtam.20210705.12


    ACS Style

    Mehak Nigar Shumaila. A Comparison of K-Means and Mean Shift Algorithms. Int. J. Theor. Appl. Math. 2021, 7(5), 76-84. doi: 10.11648/j.ijtam.20210705.12


    AMA Style

    Mehak Nigar Shumaila. A Comparison of K-Means and Mean Shift Algorithms. Int J Theor Appl Math. 2021;7(5):76-84. doi: 10.11648/j.ijtam.20210705.12


  • BibTeX

    @article{10.11648/j.ijtam.20210705.12,
      author = {Mehak Nigar Shumaila},
      title = {A Comparison of K-Means and Mean Shift Algorithms},
      journal = {International Journal of Theoretical and Applied Mathematics},
      volume = {7},
      number = {5},
      pages = {76-84},
      doi = {10.11648/j.ijtam.20210705.12},
      url = {https://doi.org/10.11648/j.ijtam.20210705.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijtam.20210705.12},
      year = {2021}
    }
    


  • RIS

    TY  - JOUR
    T1  - A Comparison of K-Means and Mean Shift Algorithms
    AU  - Mehak Nigar Shumaila
    Y1  - 2021/11/27
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ijtam.20210705.12
    DO  - 10.11648/j.ijtam.20210705.12
    T2  - International Journal of Theoretical and Applied Mathematics
    JF  - International Journal of Theoretical and Applied Mathematics
    JO  - International Journal of Theoretical and Applied Mathematics
    SP  - 76
    EP  - 84
    PB  - Science Publishing Group
    SN  - 2575-5080
    UR  - https://doi.org/10.11648/j.ijtam.20210705.12
    VL  - 7
    IS  - 5
    ER  - 


Author Information
  • Department of Information Technology, Technische Hochschule Ostwestfalen-Lippe, North Rhine-Westphalia, Germany
