Migraine is a common neurological disorder that can seriously compromise the quality of life of affected individuals. Diagnosis typically depends on traditional methods that rely on patient self-reporting and clinical judgment, which can be subjective and prone to error. The main objective of this study was to model migraine classification using the Extreme Gradient Boosting (XGBoost), Random Forest, and K-Nearest Neighbors (KNN) algorithms, integrating the Least Absolute Shrinkage and Selection Operator (LASSO) for feature regularization. The classification abilities of these machine learning models were evaluated to determine which best identifies the type of migraine a patient is suffering from. To prevent overfitting and enhance interpretability, LASSO regression was used for feature selection. The models were trained on a labeled data set, and hyperparameters were tuned with Grid Search, which systematically explores combinations of hyperparameter values to identify the settings that maximize model performance. The models were evaluated on accuracy, precision, recall, ROC-AUC, F1-score, and computation time, and the top-performing model was deployed in a web-based application built with Spring Boot.
XGBoost outperformed the other models, achieving 92.4% accuracy, a 96.0% AUC, a 91.65% F1-score, and 92.24% sensitivity, with a 1.59% false positive rate and a computation time of 2.08 s. Random Forest followed closely with 91.6% accuracy, a 94.0% AUC, a 90.49% F1-score, and 86.45% sensitivity, but required 4.65 s of computation time. K-Nearest Neighbors demonstrated the lowest performance, with 86.6% accuracy, a 91.0% AUC, an 80.53% F1-score, 79.32% sensitivity, and the highest computation time, 9.51 s. XGBoost was therefore found to be the most appropriate choice for migraine classification.
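The evaluation metrics reported above all follow directly from a confusion matrix. As a minimal sketch of how they relate (the counts below are hypothetical and not taken from the study's data):

```python
def metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)                         # false positive rate
    return accuracy, precision, recall, f1, fpr

# Illustrative counts only (not the study's results)
acc, prec, rec, f1, fpr = metrics(tp=80, fp=5, fn=10, tn=105)
print(f"accuracy={acc:.3f} recall={rec:.3f} f1={f1:.3f} fpr={fpr:.3f}")
```

For multi-class problems such as migraine-type classification, these per-class values are typically averaged (e.g. macro- or weighted-averaged) across classes.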
This study highlights the promise of machine learning in enhancing migraine diagnosis through objective and data-driven means.
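The LASSO regularization step described above aids interpretability because its L1 penalty drives small coefficients exactly to zero, discarding uninformative features. The mechanism behind this is the soft-thresholding operator used in coordinate-descent solvers; a minimal sketch with made-up coefficients (not the study's fitted model):

```python
def soft_threshold(z, lam):
    """Proximal operator of the L1 penalty: shrink z toward zero by lam,
    and set it exactly to zero when |z| <= lam. This zeroing-out is how
    LASSO performs feature selection."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical coefficients: the two small ones are eliminated outright.
coeffs = [2.5, -0.3, 0.05, -1.8]
print([soft_threshold(c, 0.5) for c in coeffs])  # → [2.0, 0.0, 0.0, -1.3]
```

In practice a library implementation (e.g. a scikit-learn `Lasso` fit followed by keeping the nonzero-coefficient features) would be used rather than hand-rolled updates.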
Published in: American Journal of Mathematical and Computer Modelling, Volume 10, Issue 1
DOI: 10.11648/j.ajmcm.20251001.13
Pages: 19-28
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright © The Author(s), 2025. Published by Science Publishing Group
Keywords: Random Forest, K-Nearest Neighbors, Extreme Gradient Boosting, Least Absolute Shrinkage and Selection Operator
APA Style
Kamau, B. N., Malenje, B., Wamwea, C., Onyango, L. A. (2025). Comparative Study of Extreme Gradient Boosting (XGBOOST), K-Nearest Neighbors (KNN), and Random Forest for Migraine Classification. American Journal of Mathematical and Computer Modelling, 10(1), 19-28. https://doi.org/10.11648/j.ajmcm.20251001.13