Volume 42 Issue 1
Jan. 2021
Jacobo Pardo-Seco, Alberto Gómez-Carballa, Xabier Bello, Federico Martinón-Torres, Antonio Salas. Pitfalls of barcodes in the study of worldwide SARS-CoV-2 variation and phylodynamics. Zoological Research, 2021, 42(1): 87-93. doi: 10.24272/j.issn.2095-8137.2020.364
Pitfalls of barcodes in the study of worldwide SARS-CoV-2 variation and phylodynamics

Funds: This study was supported by the GePEM (Instituto de Salud Carlos III (ISCIII)/PI16/01478/Cofinanciado FEDER), DIAVIR (Instituto de Salud Carlos III (ISCIII)/DTS19/00049/Cofinanciado FEDER; Proyecto de Desarrollo Tecnológico en Salud), Resvi-Omics (Instituto de Salud Carlos III (ISCIII)/PI19/01039/Cofinanciado FEDER), BI-BACVIR (PRIS-3; Agencia de Conocimiento en Salud (ACIS)—Servicio Gallego de Salud (SERGAS)—Xunta de Galicia; Spain), Programa Traslacional Covid-19 (ACIS—Servicio Gallego de Salud (SERGAS)—Xunta de Galicia; Spain) and Axencia Galega de Innovación (GAIN; IN607B 2020/08—Xunta de Galicia; Spain) to A.S.; and ReSVinext (Instituto de Salud Carlos III (ISCIII)/PI16/01569/Cofinanciado FEDER) and Enterogen (Instituto de Salud Carlos III (ISCIII)/PI19/01090/Cofinanciado FEDER) to F.M.-T.
  • Corresponding author: E-mail: antonio.salas@usc.es
  • Received Date: 2020-12-04
  • Accepted Date: 2020-12-31
  • Available Online: 2020-12-31
  • Publish Date: 2021-01-18
  • Analysis of SARS-CoV-2 genome variation using a minimal number of selected informative sites that form a genetic barcode has several drawbacks. We show that purely mathematical procedures for site selection should be supervised by the known phylogeny (i) to ensure that solid tree branches are represented, rather than mutational hotspots with poor phylogeographic properties, and (ii) to avoid phylogenetic redundancy. We propose a procedure that prevents information redundancy in site selection by considering the cumulative informativeness of previously selected sites (as a proxy for phylogeny-based criteria). This procedure demonstrates that, for short barcodes (e.g., 11 sites), there are thousands of informative site combinations that improve on previous proposals. We also show that barcodes based on worldwide databases inevitably prioritize variants located at the basal nodes of the phylogeny, while most genomes representative of these ancestral nodes are no longer in circulation. Consequently, coronavirus phylodynamics cannot be properly captured by universal genomic barcodes, because most SARS-CoV-2 variation is generated in geographically restricted areas through the continuous introduction of domestic variants.
  • ZR-2020-364 Supplementary Data and Table S1.zip
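The contrast the abstract draws, between ranking sites one at a time and selecting them according to the cumulative informativeness of sites already chosen, can be illustrated with a toy sketch. The informativeness measure here is an assumption: the joint Shannon entropy of the barcode haplotypes, which is not necessarily the metric used by the authors. The names `joint_entropy`, `ranked_by_site_entropy` and `greedy_barcode` are hypothetical. The sketch also omits the phylogenetic supervision the abstract argues for, so it cannot tell solid branches apart from mutational hotspots.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def joint_entropy(seqs: Sequence[str], sites: Sequence[int]) -> float:
    """Shannon entropy (bits) of the barcodes formed by `sites` across aligned genomes."""
    counts = Counter(tuple(s[i] for i in sites) for s in seqs)
    n = len(seqs)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def ranked_by_site_entropy(seqs: Sequence[str], k: int) -> List[int]:
    """Naive selection: top-k sites by individual entropy, ignoring redundancy."""
    n_sites = len(seqs[0])
    return sorted(range(n_sites), key=lambda i: -joint_entropy(seqs, [i]))[:k]


def greedy_barcode(seqs: Sequence[str], k: int) -> List[int]:
    """Redundancy-aware selection: each new site must add information
    beyond the sites already chosen (hypothetical proxy for the proposed procedure)."""
    chosen: List[int] = []
    candidates = set(range(len(seqs[0])))
    for _ in range(min(k, len(candidates))):
        base = joint_entropy(seqs, chosen)
        # Marginal gain of each remaining site given the current barcode.
        gains = {i: joint_entropy(seqs, chosen + [i]) - base for i in candidates}
        # Iterate in index order so ties resolve to the lowest site index.
        best = max(sorted(gains), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # every remaining site is fully redundant with the barcode
        chosen.append(best)
        candidates.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy alignment: sites 0 and 1 carry identical patterns (redundant); site 2 is independent.
    alignment = ["AAC", "AAT", "GGC", "GGT"]
    naive = ranked_by_site_entropy(alignment, 2)
    greedy = greedy_barcode(alignment, 2)
    print(naive, joint_entropy(alignment, naive))    # [0, 1] 1.0
    print(greedy, joint_entropy(alignment, greedy))  # [0, 2] 2.0
```

In this toy alignment, ranking sites independently picks two redundant sites that separate only two groups of genomes. The cumulative criterion instead skips the redundant site and separates all four.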

