Citation: Yu-Fang Mao, Xi-Guo Yuan, Yu-Peng Cun. 2021: A novel machine learning approach (svmSomatic) to distinguish somatic and germline mutations using next-generation sequencing data. Zoological Research, 42(2): 246-249. DOI: 10.24272/j.issn.2095-8137.2021.014

A novel machine learning approach (svmSomatic) to distinguish somatic and germline mutations using next-generation sequencing data

  • Abstract: Somatic mutations are a major class of genetic variation in cancer genomes and play an essential role in tumorigenesis and tumor progression. Detecting somatic single nucleotide variants (SNVs) facilitates downstream analysis in tumor research. Many computational methods have been developed to detect SNVs, but most require a matched normal sample to separate somatic SNVs from germline variants, and matched normal samples are often difficult to obtain. Developing approaches that detect somatic SNVs from a single tumor sample is therefore crucial. A further consideration is that several types of variation can be present at once: copy number variations (CNVs) commonly co-occur with SNVs in tumor cells. In this work, we propose svmSomatic, a novel machine learning approach that distinguishes somatic from germline mutations using next-generation sequencing (NGS) data from an individual tumor sample. svmSomatic (1) considers the influence of CNV co-occurrence on the detection of somatic mutations; and (2) uses a trained support vector machine (SVM) as the classifier to distinguish somatic from germline mutations, without requiring a matched normal sample. We tested svmSomatic on simulated and real genomic data and compared it with other methods of the same kind. As measured by F1-score, svmSomatic performed significantly better than the other methods on both simulated and real NGS data.
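The abstract evaluates callers by F1-score, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). As a minimal sketch of how that metric could be computed for a somatic-versus-germline classification, the snippet below treats "somatic" as the positive class. The function name, the label strings, and the toy calls are assumptions made for illustration; they are not taken from svmSomatic or its evaluation data.

```python
def f1_score(true_labels, predicted_labels, positive="somatic"):
    """Return the F1-score for the `positive` class (0.0 when there are no true positives)."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count the three outcomes that enter precision and recall.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero or undefined; report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: one somatic site is missed, one germline site is miscalled.
truth = ["somatic", "somatic", "somatic", "germline", "germline", "germline"]
calls = ["somatic", "somatic", "germline", "somatic", "germline", "germline"]

score = f1_score(truth, calls)
print(f"F1 = {score:.3f}")
```

Because F1 ignores true negatives, correctly rejected germline sites do not inflate the score, which suits a task where the positive (somatic) class is the one of interest.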

     
