Research Article | Peer-Reviewed

Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification

Received: 3 March 2025     Accepted: 14 March 2025     Published: 31 March 2025
Abstract

Migraine is a common neurological disorder that can seriously compromise the quality of life of affected individuals. Migraine diagnosis typically relies on traditional methods of patient self-reporting and clinical judgment, which can be subjective and prone to error. The main objective of this study was to model migraine classification using the Extreme Gradient Boosting (XGBoost), Random Forest, and K-Nearest Neighbors (KNN) algorithms, integrating the Least Absolute Shrinkage and Selection Operator (LASSO) for feature regularization. The classification abilities of these machine learning models were evaluated to determine which is superior at identifying the type of migraine a patient is suffering from. To prevent overfitting and enhance interpretability, LASSO regression was used for feature regularization. The models were trained on a labeled data set, and hyperparameter tuning was performed with Grid Search to systematically explore combinations of hyperparameters and identify the settings that maximize model performance. The models were evaluated on accuracy, precision, recall, ROC-AUC, F1-score, and computation time. The top-performing model was deployed in a web-based application using Spring Boot. XGBoost outperformed the other models, achieving an accuracy of 92.4%, an AUC of 96.0%, an F1-score of 91.65%, and a sensitivity of 92.24%, with a false positive rate of 1.59% and a computation time of 2.08 s. Random Forest followed closely with 91.6% accuracy, a 94.0% AUC, an F1-score of 90.49%, and a sensitivity of 86.45%, but required 4.65 s of computation time. K-Nearest Neighbors showed the lowest performance, with an accuracy of 86.6%, an AUC of 91.0%, an F1-score of 80.53%, a sensitivity of 79.32%, and the highest computation time of 9.51 s. XGBoost was therefore found to be the most appropriate choice for migraine classification.
This study highlights the promise of machine learning in enhancing migraine diagnosis through objective, data-driven means.

Published in American Journal of Mathematical and Computer Modelling (Volume 10, Issue 1)
DOI 10.11648/j.ajmcm.20251001.13
Page(s) 19-28
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Random Forest, K-Nearest Neighbors, Extreme Gradient Boosting, Least Absolute Shrinkage and Selection Operator

Cite This Article
  • APA Style

    Kamau, B. N., Malenje, B., Wamwea, C., Onyango, L. A. (2025). Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification. American Journal of Mathematical and Computer Modelling, 10(1), 19-28. https://doi.org/10.11648/j.ajmcm.20251001.13


    ACS Style

    Kamau, B. N.; Malenje, B.; Wamwea, C.; Onyango, L. A. Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification. Am. J. Math. Comput. Model. 2025, 10(1), 19-28. doi: 10.11648/j.ajmcm.20251001.13


    AMA Style

    Kamau BN, Malenje B, Wamwea C, Onyango LA. Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification. Am J Math Comput Model. 2025;10(1):19-28. doi: 10.11648/j.ajmcm.20251001.13


  • @article{10.11648/j.ajmcm.20251001.13,
      author = {Boniface Ngugi Kamau and Bonface Malenje and Charity Wamwea and Lena Anyango Onyango},
      title = {Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification},
      journal = {American Journal of Mathematical and Computer Modelling},
      volume = {10},
      number = {1},
      pages = {19-28},
      doi = {10.11648/j.ajmcm.20251001.13},
      url = {https://doi.org/10.11648/j.ajmcm.20251001.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmcm.20251001.13},
     year = {2025}
    }
    


  • TY  - JOUR
    T1  - Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification
    AU  - Boniface Ngugi Kamau
    AU  - Bonface Malenje
    AU  - Charity Wamwea
    AU  - Lena Anyango Onyango
    Y1  - 2025/03/31
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ajmcm.20251001.13
    DO  - 10.11648/j.ajmcm.20251001.13
    T2  - American Journal of Mathematical and Computer Modelling
    JF  - American Journal of Mathematical and Computer Modelling
    JO  - American Journal of Mathematical and Computer Modelling
    SP  - 19
    EP  - 28
    PB  - Science Publishing Group
    SN  - 2578-8280
    UR  - https://doi.org/10.11648/j.ajmcm.20251001.13
    VL  - 10
    IS  - 1
    ER  - 


Author Information
  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
