Peer-Reviewed

Using Data Mining Techniques for COVID-19: A Systematic Review

Received: 2 April 2022     Accepted: 1 June 2022     Published: 16 June 2022
Abstract

The primary goal of this survey is to identify the most widely used data mining approaches and the knowledge gaps in the published literature. The novel coronavirus pneumonia, COVID-19, has become a global public health problem. Because the threat of pandemics has raised public health concerns, researchers have used data extraction techniques to uncover hidden knowledge. The Web of Science, Scopus, and PubMed databases were searched systematically. All retrieved publications were then screened in a stepwise procedure using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist to select eligible papers. The data were examined and summarized under several classifications. Of 300 citations, 50 papers were eligible for the systematic review. The results showed that the most frequently used data mining (DM) technique was natural language processing (22%), and the most commonly proposed approach was revealing disease characteristics (22%). The most addressed disease was COVID-19. The studies predominantly applied supervised learning techniques (90%). Regarding healthcare scope, infectious disease (36%) was the most frequent discipline, closely followed by epidemiology. The most common software used in the studies was SPSS (22%) and R (20%). Our results indicate a significant relationship between air pollution and COVID-19 infection, which could partially explain the effect of national lockdown and provide implications for the control and prevention of this novel disease. The results revealed valuable research that employs knowledge discovery methods to understand the unknown dimensions of diseases in pandemics. However, more research is needed on treatment and disease control.

Published in International Journal on Data Science and Technology (Volume 8, Issue 2)
DOI 10.11648/j.ijdst.20220802.11
Page(s) 36-42
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2022. Published by Science Publishing Group

Keywords

Public Health, COVID-19, SARS-CoV-2, Machine Learning, Meta-Analyses, Data Mining

Cite This Article
  • APA Style

    Sanjib Ghosh, Lipon Chandra Das. (2022). Using Data Mining Techniques for COVID-19: A Systematic Review. International Journal on Data Science and Technology, 8(2), 36-42. https://doi.org/10.11648/j.ijdst.20220802.11

    ACS Style

    Sanjib Ghosh; Lipon Chandra Das. Using Data Mining Techniques for COVID-19: A Systematic Review. Int. J. Data Sci. Technol. 2022, 8(2), 36-42. doi: 10.11648/j.ijdst.20220802.11

    AMA Style

    Sanjib Ghosh, Lipon Chandra Das. Using Data Mining Techniques for COVID-19: A Systematic Review. Int J Data Sci Technol. 2022;8(2):36-42. doi: 10.11648/j.ijdst.20220802.11

    @article{10.11648/j.ijdst.20220802.11,
      author = {Sanjib Ghosh and Lipon Chandra Das},
      title = {Using Data Mining Techniques for COVID-19: A Systematic Review},
      journal = {International Journal on Data Science and Technology},
      volume = {8},
      number = {2},
      pages = {36-42},
      doi = {10.11648/j.ijdst.20220802.11},
      url = {https://doi.org/10.11648/j.ijdst.20220802.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20220802.11},
     year = {2022}
    }
    

    TY  - JOUR
    T1  - Using Data Mining Techniques for COVID-19: A Systematic Review
    AU  - Sanjib Ghosh
    AU  - Lipon Chandra Das
    Y1  - 2022/06/16
    PY  - 2022
    N1  - https://doi.org/10.11648/j.ijdst.20220802.11
    DO  - 10.11648/j.ijdst.20220802.11
    T2  - International Journal on Data Science and Technology
    JF  - International Journal on Data Science and Technology
    JO  - International Journal on Data Science and Technology
    SP  - 36
    EP  - 42
    PB  - Science Publishing Group
    SN  - 2472-2235
    UR  - https://doi.org/10.11648/j.ijdst.20220802.11
    VL  - 8
    IS  - 2
    ER  - 

Author Information
  • Department of Statistics, University of Chittagong, Chittagong, Bangladesh

  • Department of Mathematics, University of Chittagong, Chittagong, Bangladesh
