Research Article | Peer-Reviewed

Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification

Received: 3 March 2025     Accepted: 14 March 2025     Published: 31 March 2025
Abstract

Migraine is a common neurological disorder that can seriously compromise the quality of life of affected individuals. Migraine diagnosis typically relies on traditional methods of patient self-reporting and clinical judgment, which can be subjective and prone to error. The main objective of this study was to model migraine classification using the Extreme Gradient Boosting (XGBoost), Random Forest, and K-Nearest Neighbors (KNN) algorithms, integrating the Least Absolute Shrinkage and Selection Operator (LASSO) for feature regularization. The classification abilities of these machine learning models were evaluated to determine which is superior at identifying the type of migraine a patient is suffering from. To prevent overfitting and enhance interpretability, LASSO regression was used for feature regularization. The models were trained on a labeled data set, and hyperparameter tuning was performed with Grid Search to systematically explore combinations of hyperparameters and identify the settings that maximize model performance. The models were evaluated on accuracy, precision, recall, ROC-AUC, F1-score, and computation time. The top-performing model was deployed in a web-based application using Spring Boot. XGBoost outperformed the other models, achieving an accuracy of 92.4%, an AUC of 96.0%, an F1-score of 91.65%, and a sensitivity of 92.24%, with a false positive rate of 1.59% and a computation time of 2.08 s. Random Forest followed closely with 91.6% accuracy, a 94.0% AUC, an F1-score of 90.49%, and a sensitivity of 86.45%, but required 4.65 s of computation time. K-Nearest Neighbors showed the lowest performance, with an accuracy of 86.6%, an AUC of 91.0%, an F1-score of 80.53%, a sensitivity of 79.32%, and the highest computation time of 9.51 s. XGBoost was therefore found to be the most appropriate choice for migraine classification.
This study highlights the promise of machine learning in enhancing migraine diagnosis through objective, data-driven means.

Published in American Journal of Mathematical and Computer Modelling (Volume 10, Issue 1)
DOI 10.11648/j.ajmcm.20251001.13
Page(s) 19-28
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Random Forest, K-Nearest Neighbors, Extreme Gradient Boosting, Least Absolute Shrinkage and Selection Operator

Cite This Article
  • APA Style

    Kamau, B. N., Malenje, B., Wamwea, C., Onyango, L. A. (2025). Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification. American Journal of Mathematical and Computer Modelling, 10(1), 19-28. https://doi.org/10.11648/j.ajmcm.20251001.13


    ACS Style

    Kamau, B. N.; Malenje, B.; Wamwea, C.; Onyango, L. A. Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification. Am. J. Math. Comput. Model. 2025, 10(1), 19-28. doi: 10.11648/j.ajmcm.20251001.13


    AMA Style

    Kamau BN, Malenje B, Wamwea C, Onyango LA. Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification. Am J Math Comput Model. 2025;10(1):19-28. doi: 10.11648/j.ajmcm.20251001.13


  • @article{10.11648/j.ajmcm.20251001.13,
      author = {Boniface Ngugi Kamau and Bonface Malenje and Charity Wamwea and Lena Anyango Onyango},
      title = {Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification},
      journal = {American Journal of Mathematical and Computer Modelling},
      volume = {10},
      number = {1},
      pages = {19-28},
      doi = {10.11648/j.ajmcm.20251001.13},
      url = {https://doi.org/10.11648/j.ajmcm.20251001.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmcm.20251001.13},
     year = {2025}
    }
    


  • TY  - JOUR
    T1  - Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification
    AU  - Boniface Ngugi Kamau
    AU  - Bonface Malenje
    AU  - Charity Wamwea
    AU  - Lena Anyango Onyango
    Y1  - 2025/03/31
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ajmcm.20251001.13
    DO  - 10.11648/j.ajmcm.20251001.13
    T2  - American Journal of Mathematical and Computer Modelling
    JF  - American Journal of Mathematical and Computer Modelling
    JO  - American Journal of Mathematical and Computer Modelling
    SP  - 19
    EP  - 28
    PB  - Science Publishing Group
    SN  - 2578-8280
    UR  - https://doi.org/10.11648/j.ajmcm.20251001.13
    VL  - 10
    IS  - 1
    ER  - 


Author Information
  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
