
Pitfalls of barcodes in the study of worldwide SARS-CoV-2 variation and phylodynamics

Jacobo Pardo-Seco, Alberto Gómez-Carballa, Xabier Bello, Federico Martinón-Torres, Antonio Salas

Citation: Jacobo Pardo-Seco, Alberto Gómez-Carballa, Xabier Bello, Federico Martinón-Torres, Antonio Salas. Pitfalls of barcodes in the study of worldwide SARS-CoV-2 variation and phylodynamics. Zoological Research, 2021, 42(1): 87-93. doi: 10.24272/j.issn.2095-8137.2020.364


Funds: This study was supported by the GePEM (Instituto de Salud Carlos III(ISCIII)/PI16/01478/Cofinanciado FEDER), DIAVIR (Instituto de Salud Carlos III(ISCIII)/DTS19/00049/Cofinanciado FEDER; Proyecto de Desarrollo Tecnológico en Salud), Resvi-Omics (Instituto de Salud Carlos III(ISCIII)/PI19/01039/Cofinanciado FEDER), BI-BACVIR (PRIS-3; Agencia de Conocimiento en Salud (ACIS)—Servicio Gallego de Salud (SERGAS)—Xunta de Galicia; Spain), Programa Traslaciona Covid-19 (ACIS—Servicio Gallego de Salud (SERGAS)—Xunta de Galicia; Spain) and Axencia Galega de Innovación (GAIN; IN607B 2020/08—Xunta de Galicia; Spain) to A.S.; and ReSVinext (Instituto de Salud Carlos III(ISCIII)/PI16/01569/Cofinanciado FEDER), and Enterogen (Instituto de Salud Carlos III(ISCIII)/PI19/01090/Cofinanciado FEDER) to F.M.-T.
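  • Abstract: Genetic barcodes built from a minimal number of selected informative sites have several drawbacks when used to analyze SARS-CoV-2 genome variation. We show that purely mathematical procedures for site selection should be guided by the known phylogeny, (1) to ensure that solid tree branches are represented rather than mutational hotspots with poor phylogeographic properties, and (2) to avoid phylogenetic redundancy. We propose a procedure that avoids information redundancy in site selection by considering the cumulative informativeness of previously selected sites (as a proxy for phylogeny-based criteria). This procedure shows that, even for short barcodes (e.g., 11 sites), there are thousands of informative site combinations that improve on previous proposals. We also show that barcodes based on worldwide databases inevitably prioritize variants located at the basal nodes of the phylogeny, such that most of the representative genomes at these ancestral nodes are no longer circulating. Consequently, coronavirus phylodynamics cannot be captured by universal genomic barcodes, because most SARS-CoV-2 variation is generated within geographically restricted areas through the introduction of local variants.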
    #Authors contributed equally to this work
  • Figure 1. Skeleton of the SARS-CoV-2 phylogeny based on informative subtype marker (ISM) signatures, interpolated frequency maps of haplogroup sub-lineages with differential geographic distributions, and comparative entropy values for ISM signatures obtained using different strategies

    A: Skeleton of the most parsimonious phylogenetic tree of SARS-CoV-2 variation based on ISM signatures. Above: Zhao et al. (2020) proposed an initial signature composed of 20 ISMs; those retained in their reduced 11-ISM signature are highlighted in blue. Signatures defined by Zhao et al. (2020) are indicated below the label of each clade (clade names according to Gómez-Carballa et al. (2020a)); clades with a purple background are those captured by the 11-ISM set. Bottom: Tree built on the 11-ISM set prioritized by the HE algorithm; gray indicates mutations that occurred in the same branches (according to Gómez-Carballa et al. (2020a)). Green stars indicate parallel mutations. Percentages below nodes indicate frequencies in the 90 K database.

    B: Interpolated maps of haplogroup frequencies for haplogroup A2a4 (represented by signature CCCGCCAGGGA in Zhao et al. (2020)) and its two sub-lineages A2a4a3a and A2a4c1a, as well as haplogroup A2a5 (CCCGCCGGGGG) and its sub-lineage A2a5c.

    C: Above: Entropy obtained using the HE algorithm for the 11 and 20 ISMs selected by Zhao et al. (2020) (red and purple, respectively; note that the curves do not match because the HE algorithm prioritizes the 20 ISMs differently; see also Table 1) and for the 11-ISM barcode proposed by Guan et al. (2020) (blue); dotted vertical lines indicate HE values for the 11-ISM and 20-ISM sets. The inset shows HE entropy values for signatures composed of 1 to 400 ISMs (green), calculated in the present study using the 90 K database. Bottom: Boxplot of HE values for 2×10^6 combinations of 11 ISMs drawn from the 50 ISMs with the highest individual entropy values; light green dots (n=12 751) in the dot cloud indicate combinations with HE values above that of the signature proposed by Zhao et al. (2020) (red dot); note that all random combinations fall below the signature obtained by the HE algorithm implemented in the present study (top green dot). The blue dot shows the HE value of the 11-site barcode of Guan et al. (2020) (95% of random site combinations fall above the HE value provided by this site combination).

    Table 1. ISMs selected using the HE procedure described in the present study and the 20-ISM signature captured by Zhao et al. (2020)

         | 90 K database, HE algorithm                                     | 90 K database, Zhao et al. (2020) ISMs signature
    Rank | All database        | Before 18 June 2020 | After 17 June 2020  | All database        | Before 18 June 2020 | After 17 June 2020
         | Site     HE         | Site     HE         | Site     HE         | Site     HE         | Site     HE         | Site     HE
    #1   | 28881    0.93       | 241      0.86       | 28881    0.99       | 28881*   0.93       | 241      0.86       | 28881*   0.99
    #2   | 25563    1.58       | 25563    1.58       | 25563    1.58       | 25563*   1.58       | 25563*   1.58       | 25563*   1.58
    #3   | 241      2.06       | 28881    2.07       | 241      1.97       | 241      2.06       | 28881*   2.07       | 241      1.97
    #4   | 11083    2.37       | 11083    2.41       | 1163     2.35       | 11083*   2.37       | 11083*   2.41       | 11083*   2.24
    #5   | 1163     2.61       | 1059     2.64       | 11083    2.61       | 1059*    2.59       | 1059*    2.64       | 1059*    2.45
    #6   | 1059     2.83       | 8782     2.84       | 28854    2.83       | 20268*   2.78       | 8782*    2.84       | 20268*   2.64
    #7   | 20268    3.02       | 20268    3.03       | 1059     3.03       | 14805*   2.93       | 20268*   3.03       | 14805*   2.75
    #8   | 14805    3.17       | 14805    3.21       | 19839    3.21       | 8782*    3.06       | 14805*   3.21       | 8782*    2.82
    #9   | 23731    3.31       | 15324    3.33       | 23731    3.37       | 18060*   3.12       | 17747    3.30       | 14408    2.87
    #10  | 28854    3.45       | 27964    3.44       | 20268    3.52       | 14408    3.18       | 2558*    3.36       | 18060*   2.92
    #11  | 19839    3.58       | 10097    3.54       | 27964    3.65       | 2558*    3.22       | 3037     3.42       | 23403*   2.95
    #12  | 8782     3.70       | 28854    3.64       | 313      3.77       | 23403*   3.25       | 26144*   3.45       | 2558*    2.99
    #13  | 27964    3.83       | 27046    3.73       | 14805    3.88       | 3037     3.28       | 14408    3.48       | 3037     3.01
    #14  | 15324    3.93       | 17747    3.81       | 11916    3.98       | 26144*   3.31       | 28144    3.50       | 17747    3.03
    #15  | 313      4.03       | 25429    3.89       | 15324    4.07       | 17747    3.32       | 18060*   3.52       | 26144*   3.04
    #16  | 11916    4.12       | 11916    3.97       | 22480    4.15       | 28144    3.33       | 23403*   3.54       | 28882    3.05
    #17  | 18877    4.19       | 313      4.04       | 8782     4.22       | 28882    3.34       | 2480     3.54       | 2480     3.05
    #18  | 25429    4.26       | 29553    4.11       | 21575    4.29       | 2480     3.35       | 28882    3.55       | 28144    3.05
    #19  | 18060    4.32       | 19839    4.18       | 18877    4.35       | 17858    3.35       | 17858    3.55       | 17858    3.06
    #20  | 21575    4.38       | 18877    4.24       | 13862    4.41       | 28883    3.35       | 28883    3.56       | 28883    3.06
    Sites common in all columns are in bold. The database used by Zhao et al. (2020) was downloaded on 17 June 2020; the table shows values obtained according to this timepoint. Asterisks indicate the ISMs retained in the 11-ISM set of Zhao et al. (2020) out of the 20 initially selected by their algorithm. The HE algorithm prioritizes ISMs that Zhao et al. (2020) did not include among their 20 top candidates; conversely, their set includes several ISMs that are not among the top 20 prioritized by the HE algorithm.
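The HE values in Table 1 grow as sites are added, which is what one expects if each new site is scored by the entropy it adds to the sites already chosen rather than by its entropy alone. The sketch below illustrates that idea under explicit assumptions: it treats HE as the Shannon entropy (in bits) of the haplotypes spelled by the selected sites, and it uses a simple greedy search. The function names `haplotype_entropy` and `greedy_sites` and the toy alignment are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter
from math import log2


def haplotype_entropy(genomes, sites):
    """Shannon entropy, in bits, of the haplotypes spelled by `sites`."""
    counts = Counter(tuple(g[s] for s in sites) for g in genomes)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def greedy_sites(genomes, candidates, k):
    """Add one site per step: the one giving the highest cumulative entropy."""
    selected, trace = [], []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda s: haplotype_entropy(genomes, selected + [s]))
        remaining.remove(best)
        selected.append(best)
        trace.append((best, haplotype_entropy(genomes, selected)))
    return trace


# Toy alignment (hypothetical data): columns 0 and 1 always change together,
# as two mutations on the same branch would; column 2 varies independently.
genomes = ["AAC", "AAC", "AAT", "AAT", "GGC", "GGC", "GGT", "GGC"]

trace = greedy_sites(genomes, candidates=[0, 1, 2], k=2)
for site, cumulative in trace:
    print(site, round(cumulative, 2))
```

In this toy case, ranking sites by individual entropy alone would pick columns 0 and 1, which are fully redundant and together carry only 1 bit. The greedy step instead picks column 0 and then column 2, reaching about 1.91 bits, which mirrors the redundancy problem described in the abstract.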
  • [1] Boni MF, Lemey P, Jiang XW, Lam TTY, Perry BW, Castoe TA, et al. 2020. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nature Microbiology, 5(11): 1408−1417. doi: 10.1038/s41564-020-0771-4
    [2] Forster P, Forster L, Renfrew C, Forster M. 2020. Phylogenetic network analysis of SARS-CoV-2 genomes. Proceedings of the National Academy of Sciences of the United States of America, 117(17): 9241−9243. doi: 10.1073/pnas.2004999117
    [3] Galanter JM, Fernández-López JC, Gignoux CR, Barnholtz-Sloan J, Fernández-Rozadilla C, Via M, et al. 2012. Development of a panel of genome-wide ancestry informative markers to study admixture throughout the Americas. PLoS Genetics, 8(3): e1002554. doi: 10.1371/journal.pgen.1002554
    [4] Gómez-Carballa A, Bello X, Pardo-Seco J, Martinón-Torres F, Salas A. 2020a. Mapping genome variation of SARS-CoV-2 worldwide highlights the impact of COVID-19 super-spreaders. Genome Research, 30(10): 1434−1448. doi: 10.1101/gr.266221.120
    [5] Gómez-Carballa A, Bello X, Pardo-Seco J, Pérez Del Molino ML, Martinón-Torres F, Salas A. 2020b. Phylogeography of SARS-CoV-2 pandemic in Spain: a story of multiple introductions, micro-geographic stratification, founder effects, and super-spreaders. Zoological Research, 41(6): 605−620. doi: 10.24272/j.issn.2095-8137.2020.217
    [6] Guan QT, Sadykov M, Mfarrej S, Hala S, Naeem R, Nugmanova R, et al. 2020. A genetic barcode of SARS-CoV-2 for monitoring global distribution of different clades during the COVID-19 pandemic. International Journal of Infectious Diseases, 100: 216−223. doi: 10.1016/j.ijid.2020.08.052
    [7] Gudbjartsson DF, Helgason A, Jonsson H, Magnusson OT, Melsted P, Norddahl GL, et al. 2020. Spread of SARS-CoV-2 in the Icelandic population. The New England Journal of Medicine, 382(24): 2302−2315. doi: 10.1056/NEJMoa2006100
    [8] Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. 2018. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics, 34(23): 4121−4123. doi: 10.1093/bioinformatics/bty407
    [9] Pardo-Seco J, Martinón-Torres F, Salas A. 2014. Evaluating the accuracy of AIM panels at quantifying genome ancestry. BMC Genomics, 15(1): 543. doi: 10.1186/1471-2164-15-543
    [10] Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, et al. 2020. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology, 5(11): 1403−1407. doi: 10.1038/s41564-020-0770-5
    [11] Rockett RJ, Arnott A, Lam C, Sadsad R, Timms V, Gray KA, et al. 2020. Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Nature Medicine, 26(9): 1398−1404. doi: 10.1038/s41591-020-1000-7
    [12] Salas A, Amigo J. 2010. A reduced number of mtSNPs saturates mitochondrial DNA haplotype diversity of worldwide population groups. PLoS One, 5(5): e10218. doi: 10.1371/journal.pone.0010218
    [13] Van Dorp L, Acman M, Richard D, Shaw LP, Ford CE, Ormond L, et al. 2020. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infection, Genetics and Evolution, 83: 104351. doi: 10.1016/j.meegid.2020.104351
    [14] Yu WB, Tang GD, Zhang L, Corlett RT. 2020. Decoding the evolution and transmissions of the novel pneumonia coronavirus (SARS-CoV-2 / HCoV-19) using whole genomic data. Zoological Research, 41(3): 247−257. doi: 10.24272/j.issn.2095-8137.2020.022
    [15] Zhao ZQ, Sokhansanj BA, Malhotra C, Zheng K, Rosen GL. 2020. Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization. PLoS Computational Biology, 16(9): e1008269. doi: 10.1371/journal.pcbi.1008269
  • ZR-2020-364 Supplementary Data and Table S1.zip
Publication history
  • Received: 2020-12-04
  • Accepted: 2020-12-31
  • Published online: 2020-12-31
  • Issue date: 2021-01-18
