Review Article | Peer-Reviewed

Data Science and Machine Learning for Cyber Intrusion Detection: A Systematic Review

Received: 28 February 2026     Accepted: 11 March 2026     Published: 18 March 2026
Abstract

The escalating sophistication and volume of cyberattacks have driven an urgent demand for intelligent Intrusion Detection Systems (IDS) that leverage Data Science (DS) and Machine Learning (ML). Despite rapid advances, existing reviews often focus narrowly on specific aspects without integrating the full data science and machine learning lifecycle. This paper presents a systematic review of DS and ML applications in cyber intrusion detection, covering 153 studies published from 2009 to 2025. The review surveys benchmark datasets, data preprocessing and feature engineering techniques, classical ML and Deep Learning (DL) models, ensemble and hybrid strategies, class imbalance handling, and evaluation methodologies. A unified four-axis taxonomy is proposed that classifies the literature by learning strategy, imbalance handling, explainability level, and deployment context. A quantitative meta-analysis reveals that UNSW-NB15 and CIC-IDS2017 together account for 71% of dataset usage, deep learning represents 40% of algorithmic approaches, and only 34% of studies report per-class recall for minority attack types. Nine technically grounded research gaps are identified, spanning preprocessing standardization, cross-dataset evaluation, minority-class recall optimization, adversarial robustness, online and edge deployment, explainability for Security Operations Center (SOC) operations, federated learning, transformer and Large Language Model (LLM) applications, and zero-shot adaptation. The review further identifies eight emerging trends: attention-based and transformer architectures, LLMs, Graph Neural Networks (GNNs), federated and privacy-preserving learning, adversarial robustness, Explainable AI (XAI), zero-shot and few-shot detection, and Internet of Things (IoT) edge-based IDS.
A seven-stage actionable architecture is proposed that integrates adaptive preprocessing, contrastive feature learning, recall-aware ensemble detection, XAI decision support, continual learning, and federated aggregation. This review provides researchers and practitioners with a structured roadmap for advancing the next generation of intelligent cyber intrusion detection systems.

Published in Machine Learning Research (Volume 11, Issue 1)
DOI 10.11648/j.mlr.20261101.12
Page(s) 8-21
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Intrusion Detection, Machine Learning, Deep Learning, Data Science, Explainable AI

Cite This Article
  • APA Style

    Ren, Y. (2026). Data Science and Machine Learning for Cyber Intrusion Detection: A Systematic Review. Machine Learning Research, 11(1), 8-21. https://doi.org/10.11648/j.mlr.20261101.12

    ACS Style

    Ren, Y. Data Science and Machine Learning for Cyber Intrusion Detection: A Systematic Review. Mach. Learn. Res. 2026, 11(1), 8-21. doi: 10.11648/j.mlr.20261101.12

    AMA Style

    Ren Y. Data Science and Machine Learning for Cyber Intrusion Detection: A Systematic Review. Mach Learn Res. 2026;11(1):8-21. doi: 10.11648/j.mlr.20261101.12

  • @article{10.11648/j.mlr.20261101.12,
      author = {Yali Ren},
      title = {Data Science and Machine Learning for Cyber Intrusion Detection: A Systematic Review},
      journal = {Machine Learning Research},
      volume = {11},
      number = {1},
      pages = {8-21},
      doi = {10.11648/j.mlr.20261101.12},
      url = {https://doi.org/10.11648/j.mlr.20261101.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20261101.12},
     year = {2026}
    }
    

  • TY  - JOUR
    T1  - Data Science and Machine Learning for Cyber Intrusion Detection: A Systematic Review
    AU  - Yali Ren
    Y1  - 2026/03/18
    PY  - 2026
    N1  - https://doi.org/10.11648/j.mlr.20261101.12
    DO  - 10.11648/j.mlr.20261101.12
    T2  - Machine Learning Research
    JF  - Machine Learning Research
    JO  - Machine Learning Research
    SP  - 8
    EP  - 21
    PB  - Science Publishing Group
    SN  - 2637-5680
    UR  - https://doi.org/10.11648/j.mlr.20261101.12
    VL  - 11
    IS  - 1
    ER  - 

Author Information
  • School of Computer Science, Georgia Institute of Technology, Atlanta, United States
