Research Article | Peer-Reviewed

Comparative Analysis of Feature Extraction of High Dimensional Data Reduction Using Machine Learning Techniques

Received: 29 October 2023     Accepted: 17 November 2023     Published: 11 December 2023
Abstract

Dimensionality reduction is critical for analyzing and interpreting high-dimensional data across domains such as genomics, imaging, and finance. This paper presents a comparative analysis of dimensionality reduction techniques, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Recursive Feature Elimination (RFE), and Lasso regression. These methods are applied to datasets from genomics, medical imaging, and finance to evaluate their ability to reduce dimensions while preserving relevant information. The results demonstrate that PCA and LDA are highly effective for genomics data, reducing gene expression profiles from over 60,000 dimensions to 10-50 components while maintaining precision above 80%. For medical images, PCA and LDA reduce pixel dimensions by over 90% without compromising precision. However, no single technique jointly optimizes dimensionality reduction and precision for complex finance data. Overall, the analysis provides domain-specific insights, highlighting PCA and LDA as leading techniques for genomics and imaging. The choice of method should be guided by data characteristics. Testing on more diverse, real-world datasets is needed to further establish the validity of these findings. This research aims to inform the selection of appropriate data reduction techniques across critical applications involving high-dimensional data.
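The four techniques compared in the abstract work in different ways: PCA and LDA extract new, lower-dimensional features, while RFE and Lasso select a subset of the original ones. The sketch below is a minimal illustration of that distinction, assuming Python with scikit-learn. The synthetic datasets, component counts, `step`, and `alpha` values are hypothetical choices for demonstration, not the authors' data or settings, and the function name `reduce_all` is likewise illustrative.

```python
# Minimal comparison sketch: PCA, LDA, RFE and Lasso on synthetic data.
# All sizes and hyperparameters are illustrative assumptions, not values
# taken from the study.
from sklearn.datasets import make_classification, make_regression
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler


def reduce_all(random_state=0):
    """Return the output dimensionality produced by each technique."""
    # Hypothetical labelled dataset: 200 samples, 500 features, 3 classes.
    X, y = make_classification(
        n_samples=200, n_features=500, n_informative=20,
        n_classes=3, random_state=random_state,
    )
    X = StandardScaler().fit_transform(X)

    results = {}

    # PCA: unsupervised extraction onto the directions of maximum variance.
    pca = PCA(n_components=20)
    results["pca"] = pca.fit_transform(X).shape[1]
    results["pca_variance_kept"] = float(pca.explained_variance_ratio_.sum())

    # LDA: supervised extraction; at most (n_classes - 1) components.
    lda = LinearDiscriminantAnalysis(n_components=2)
    results["lda"] = lda.fit_transform(X, y).shape[1]

    # RFE: refits the estimator and drops up to `step` of the lowest-weighted
    # features each round until 20 of the original features remain.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20, step=50)
    rfe.fit(X, y)
    results["rfe"] = int(rfe.support_.sum())

    # Lasso is a regression method, so it is shown on a separate synthetic
    # regression target (an assumption of this sketch). The L1 penalty drives
    # many coefficients exactly to zero; the survivors are the selected features.
    Xr, yr = make_regression(
        n_samples=200, n_features=500, n_informative=20,
        noise=1.0, random_state=random_state,
    )
    Xr = StandardScaler().fit_transform(Xr)
    lasso = Lasso(alpha=1.0, max_iter=10000).fit(Xr, yr)
    results["lasso"] = int((lasso.coef_ != 0).sum())

    return results


if __name__ == "__main__":
    for name, value in reduce_all().items():
        print(name, value)
```

One design point the sketch makes visible: scikit-learn's `LinearDiscriminantAnalysis` returns at most `n_classes - 1` components, so the dimensionality LDA can reach is set by the number of labels, whereas the PCA and RFE output sizes are free parameters and the size of the Lasso subset is controlled only indirectly through `alpha`.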

Published in American Journal of Electrical and Computer Engineering (Volume 7, Issue 2)
DOI 10.11648/j.ajece.20230702.12
Page(s) 27-39
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2023. Published by Science Publishing Group

Keywords

Machine Learning, Principal Component Analysis, Linear Discriminant Analysis, Recursive Feature Elimination, Lasso Regression, Genomics, Medical Imaging

Cite This Article
    APA Style

    Gyamerah, S., Tour Soori, G., Redeemer Korda, D., Kwame Tawiah, J., Ayintareba Akolgo, E., et al. (2023). Comparative Analysis of Feature Extraction of High Dimensional Data Reduction Using Machine Learning Techniques. American Journal of Electrical and Computer Engineering, 7(2), 27-39. https://doi.org/10.11648/j.ajece.20230702.12

    ACS Style

    Gyamerah, S.; Tour Soori, G.; Redeemer Korda, D.; Kwame Tawiah, J.; Ayintareba Akolgo, E., et al. Comparative Analysis of Feature Extraction of High Dimensional Data Reduction Using Machine Learning Techniques. Am. J. Electr. Comput. Eng. 2023, 7(2), 27-39. doi: 10.11648/j.ajece.20230702.12

    AMA Style

    Gyamerah S, Tour Soori G, Redeemer Korda D, Kwame Tawiah J, Ayintareba Akolgo E, et al. Comparative Analysis of Feature Extraction of High Dimensional Data Reduction Using Machine Learning Techniques. Am J Electr Comput Eng. 2023;7(2):27-39. doi: 10.11648/j.ajece.20230702.12

    @article{10.11648/j.ajece.20230702.12,
      author = {Seth Gyamerah and Godfred Tour Soori and Dennis Redeemer Korda and John Kwame Tawiah and Eric Ayintareba Akolgo and Emmanuel Oteng Dapaah},
      title = {Comparative Analysis of Feature Extraction of High Dimensional Data Reduction Using Machine Learning Techniques},
      journal = {American Journal of Electrical and Computer Engineering},
      volume = {7},
      number = {2},
      pages = {27--39},
      year = {2023},
      doi = {10.11648/j.ajece.20230702.12},
      url = {https://doi.org/10.11648/j.ajece.20230702.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajece.20230702.12}
    }

    TY  - JOUR
    T1  - Comparative Analysis of Feature Extraction of High Dimensional Data Reduction Using Machine Learning Techniques
    AU  - Seth Gyamerah
    AU  - Godfred Tour Soori
    AU  - Dennis Redeemer Korda
    AU  - John Kwame Tawiah
    AU  - Eric Ayintareba Akolgo
    AU  - Emmanuel Oteng Dapaah
    Y1  - 2023/12/11
    PY  - 2023
    N1  - https://doi.org/10.11648/j.ajece.20230702.12
    DO  - 10.11648/j.ajece.20230702.12
    T2  - American Journal of Electrical and Computer Engineering
    JF  - American Journal of Electrical and Computer Engineering
    JO  - American Journal of Electrical and Computer Engineering
    SP  - 27
    EP  - 39
    PB  - Science Publishing Group
    SN  - 2640-0502
    UR  - https://doi.org/10.11648/j.ajece.20230702.12
    VL  - 7
    IS  - 2
    ER  - 

Author Information
  • Department of Computer Science, C. K. Tedam University of Technology and Applied Sciences, Navrongo, Ghana

  • Department of Computer Science, C. K. Tedam University of Technology and Applied Sciences, Navrongo, Ghana

  • Department of Information and Communication Technology, Bolgatanga Technical University, Bolgatanga, Ghana

  • Department of Civil Engineering, Ho Technical University, Ho, Ghana

  • Department of Computer Science, Regentropfen College of Applied Sciences, Bolgatanga, Ghana

  • Department of Information and Communication Technology, E.P. College of Education, Bimbila, Ghana
