As COVID-19 has spread from the first reported cases into a global pandemic, there have been a number of efforts to understand the mutations and genetic lineage clusters of the SARS-CoV-2 virus. The virus's high mutation rate and rapid spread make this analysis capable of tracking chains of infection as well as putting individual sequences in context. Whole genomes of SARS-CoV-2 are being collected and shared from across the globe. With the advent of affordable and widely available Next-Generation Sequencing, this is the first pandemic in which the genomic evolution of the pathogen can be tracked in near real time. So far, phylogenetic analysis methods have found broad application in this regard. Here we demonstrate that Principal Component Analysis (PCA), a technique used heavily in population genetics, corroborates these existing findings while providing new capabilities for exploring public repositories of complete virus sequences. We demonstrate this novel application of PCA on all SARS-CoV-2 samples publicly available in GenBank and other open-access databases as of mid-April. We show that PCA is a useful, easy-to-use tool for analyzing SARS-CoV-2 genomes that complements phylogenetic analysis, offering a previously untapped way to study the dynamics of the current pandemic.
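To make the idea concrete, the sketch below shows one way PCA can be applied to genome data. It is not the authors' pipeline: it assumes, hypothetically, that each genome has already been aligned to a reference and reduced to a binary vector (1 = variant present at a site, 0 = reference allele), and the toy matrix and helper names (`center_columns`, `covariance`, `top_eigenpair`, `pca`) are illustrative only. To stay dependency-free it finds the leading components by power iteration with deflation; with real data one would more likely call a library routine such as NumPy's `numpy.linalg.svd` or scikit-learn's `PCA`.

```python
import math
import random


def center_columns(matrix):
    """Subtract each column (variant site) mean so PCA describes variation between genomes."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix between variant sites (columns)."""
    n, m = len(centered), len(centered[0])
    cov = [[0.0] * m for _ in range(m)]
    for row in centered:
        for i in range(m):
            for j in range(m):
                cov[i][j] += row[i] * row[j]
    return [[value / (n - 1) for value in r] for r in cov]


def mat_vec(mat, vec):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def top_eigenpair(mat, iterations=1000, seed=0):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(len(mat))]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = mat_vec(mat, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # no variance left to explain
            break
        vec = [x / norm for x in nxt]
    eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(mat, vec)))  # Rayleigh quotient
    return eigenvalue, vec


def pca(matrix, n_components=2):
    """Return per-genome PC scores and the variance captured by each component."""
    centered = center_columns(matrix)
    cov = covariance(centered)
    eigenvalues, components = [], []
    for _ in range(n_components):
        lam, vec = top_eigenpair(cov)
        eigenvalues.append(lam)
        components.append(vec)
        # Deflation: remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(len(vec))]
               for i in range(len(vec))]
    scores = [[sum(a * b for a, b in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, eigenvalues


# Hypothetical toy data: 6 genomes x 6 variant sites (1 = variant present).
genomes = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
scores, eigenvalues = pca(genomes)
for i, (pc1, pc2) in enumerate(scores):
    print(f"genome {i}: PC1={pc1:+.3f}  PC2={pc2:+.3f}")
print("variance per component:", [round(v, 4) for v in eigenvalues])
```

In the toy matrix the first three genomes share one set of variants and the last three another, so PC1 separates the two groups, while PC2 picks out the two genomes that carry one extra variant. Because the sign of an eigenvector is arbitrary, only the relative placement of genomes along each axis is meaningful.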
Published in: European Journal of Clinical and Biomedical Sciences (Volume 6, Issue 4)
DOI: 10.11648/j.ejcbs.20200604.11
Pages: 49-55
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2020. Published by Science Publishing Group
Keywords: SARS-CoV-2, COVID-19, Principal Component Analysis, Next-Generation Sequencing
APA Style
Christiane Scherer, James Grover, Darby Kammeraad, Gabe Rudy, Andreas Scherer. (2020). Investigating the Global Spread of SARS-CoV-2 Leveraging Next-Gen Sequencing and Principal Component Analysis. European Journal of Clinical and Biomedical Sciences, 6(4), 49-55. https://doi.org/10.11648/j.ejcbs.20200604.11
BibTeX
@article{10.11648/j.ejcbs.20200604.11,
  author = {Christiane Scherer and James Grover and Darby Kammeraad and Gabe Rudy and Andreas Scherer},
  title = {Investigating the Global Spread of SARS-CoV-2 Leveraging Next-Gen Sequencing and Principal Component Analysis},
  journal = {European Journal of Clinical and Biomedical Sciences},
  volume = {6},
  number = {4},
  pages = {49-55},
  year = {2020},
  doi = {10.11648/j.ejcbs.20200604.11},
  url = {https://doi.org/10.11648/j.ejcbs.20200604.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ejcbs.20200604.11}
}
RIS
TY  - JOUR
T1  - Investigating the Global Spread of SARS-CoV-2 Leveraging Next-Gen Sequencing and Principal Component Analysis
AU  - Christiane Scherer
AU  - James Grover
AU  - Darby Kammeraad
AU  - Gabe Rudy
AU  - Andreas Scherer
Y1  - 2020/08/13
PY  - 2020
N1  - https://doi.org/10.11648/j.ejcbs.20200604.11
DO  - 10.11648/j.ejcbs.20200604.11
T2  - European Journal of Clinical and Biomedical Sciences
JF  - European Journal of Clinical and Biomedical Sciences
JO  - European Journal of Clinical and Biomedical Sciences
SP  - 49
EP  - 55
PB  - Science Publishing Group
SN  - 2575-5005
UR  - https://doi.org/10.11648/j.ejcbs.20200604.11
VL  - 6
IS  - 4
ER  - 