Yanfeng ZHANG, Bing SU. Peak identification for ChIP-seq data with no controls. Zoological Research, 2012, 33(E5-6): 121-128. doi: 10.3724/SP.J.1141.2012.E05-06E121

Peak identification for ChIP-seq data with no controls

doi: 10.3724/SP.J.1141.2012.E05-06E121
Funds: This study was supported by the National 973 project of China (2011CBA01101) and the National Natural Science Foundation of China (30871343 and 31130051).
  • Received Date: 2012-09-06
  • Revision Received Date: 2012-10-31
  • Publish Date: 2012-12-08
  • Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is increasingly used for genome-wide profiling of transcriptional regulation because it enables dissection of gene regulatory networks. When input DNA is available as a control, a variety of statistical methods have been proposed for identifying enriched regions in the genome, i.e., transcription factor binding sites and chromatin modifications. However, whether peak calling remains reliable when no control is available has not been systematically evaluated. To address this question, we used a Bayesian framework to assess the effectiveness of peak calling without controls (PCWC). Using several different types of ChIP-seq data, we demonstrated that PCWC achieves relatively high accuracy, with a false discovery rate (FDR) below 5%. Compared with previously published methods, e.g., model-based analysis of ChIP-seq (MACS), PCWC is reliable and has a lower FDR. Furthermore, to interpret the biological significance of the called peaks, we combined them with microarray gene expression data, gene ontology annotation and subsequent motif discovery; the results indicate that PCWC is highly efficient. Additionally, only a small number of peaks were identified in in silico data, suggesting a very low FDR for PCWC.
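As a rough illustration of how an FDR figure can be derived from in silico data, the sketch below divides the number of peaks called on simulated (null) reads by the number called on the real sample. This is one common empirical estimate and only an assumption here; the abstract does not state how the authors computed their FDR. The function name `empirical_fdr` and the counts are hypothetical.

```python
def empirical_fdr(n_null_peaks: int, n_observed_peaks: int) -> float:
    """Estimate FDR as (peaks called on null data) / (peaks called on real data).

    Assumption: the null (in silico) data set has the same read depth as the
    real sample, so the two peak counts are directly comparable.
    """
    if n_null_peaks < 0 or n_observed_peaks < 0:
        raise ValueError("peak counts must be non-negative")
    if n_observed_peaks == 0:
        # No peaks were called, so there are no discoveries to be false.
        return 0.0
    # Cap at 1.0 because a rate cannot exceed 100%.
    return min(1.0, n_null_peaks / n_observed_peaks)


# Hypothetical counts for illustration only; they are not taken from the paper.
fdr = empirical_fdr(n_null_peaks=40, n_observed_peaks=1000)
print(f"estimated FDR = {fdr:.3f}")
```

Under these made-up counts the estimate falls below the 5% threshold mentioned in the abstract; real data would need matched sequencing depth for the comparison to be meaningful.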