MOTIVATION: ChIP-chip and ChIP-seq technologies provide genome-wide measurements of various types of chromatin marks at an unprecedented resolution. With ChIP samples collected from different tissue types and/or individuals, we can now begin to characterize stochastic or systematic changes in epigenetic patterns during development (intra-individual) or at the population level (inter-individual). This requires statistical methods that permit a simultaneous comparison of multiple ChIP samples on a global as well as locus-specific scale. Current analytical approaches are mainly geared toward single-sample investigations and therefore have limited applicability in this comparative setting. This shortcoming presents a bottleneck in the biological interpretation of multiple-sample data.

RESULTS: To address this limitation, we introduce a parametric classification approach for the simultaneous analysis of two (or more) ChIP samples. We consider several competing models that reflect alternative biological assumptions about the global distribution of the data. Inferences about locus-specific and genome-wide chromatin differences are reached through the estimation of multivariate mixtures. Parameter estimates are obtained using an incremental version of the Expectation-Maximization algorithm (IEM). We demonstrate efficient scalability and application to three very diverse ChIP-chip and ChIP-seq experiments. The proposed approach is evaluated against several published ChIP-chip and ChIP-seq software packages. We recommend its use as a first-pass algorithm to identify candidate regions in the epigenome, possibly followed by a second-pass algorithm that fine-tunes detected peaks in accordance with biological or technological criteria.

AVAILABILITY: R source code is available at http://gbic.biol.rug.nl/supplementary/2009/ChromatinProfiles/. Access to ChIP-seq data: GEO repository GSE17937.
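To make the idea of fitting a multivariate mixture by incremental EM concrete, the following is a minimal, generic sketch in Python. It is not the authors' R implementation and does not reproduce their competing models. Several things here are assumptions made purely for illustration: the components are Gaussian with diagonal covariance, each locus is a tuple holding one signal value per ChIP sample, the initialisation ranks loci by total signal, and the function names `normal_pdf` and `incremental_em` are hypothetical. The incremental part is that each locus's posterior is refreshed on its own, and the running sufficient statistics are adjusted immediately, rather than after a full pass over the data.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def incremental_em(data, k=2, sweeps=10):
    """Fit a k-component (k >= 2) mixture of axis-aligned Gaussians.

    data: list of equal-length tuples, one value per ChIP sample per locus.
    Returns (weights, means, variances, responsibilities).
    """
    n, d = len(data), len(data[0])

    # Assumed initialisation: rank loci by total signal, cut the ranking into
    # k equal chunks, and give each locus a soft (0.9) membership in its chunk.
    order = sorted(range(n), key=lambda i: sum(data[i]))
    resp = [None] * n
    for rank, i in enumerate(order):
        own = rank * k // n
        resp[i] = [0.9 if j == own else 0.1 / (k - 1) for j in range(k)]

    # Sufficient statistics per component: sum r, sum r*x, sum r*x^2.
    s0 = [0.0] * k
    s1 = [[0.0] * d for _ in range(k)]
    s2 = [[0.0] * d for _ in range(k)]

    def add(i, r, sign):
        """Add (sign=+1) or remove (sign=-1) the contribution of locus i."""
        for j in range(k):
            s0[j] += sign * r[j]
            for c in range(d):
                s1[j][c] += sign * r[j] * data[i][c]
                s2[j][c] += sign * r[j] * data[i][c] ** 2

    def m_step():
        """Recompute parameters from the current sufficient statistics."""
        weights, means, variances = [], [], []
        for j in range(k):
            tot = max(s0[j], 1e-12)  # guard against an emptied component
            mu = [s1[j][c] / tot for c in range(d)]
            var = [max(s2[j][c] / tot - mu[c] ** 2, 1e-6) for c in range(d)]
            weights.append(s0[j] / n)
            means.append(mu)
            variances.append(var)
        return weights, means, variances

    for i in range(n):
        add(i, resp[i], +1)

    for _ in range(sweeps):
        for i in range(n):
            weights, means, variances = m_step()
            # Partial E-step: update the posterior of locus i only.
            dens = []
            for j in range(k):
                p = weights[j]
                for c in range(d):
                    p *= normal_pdf(data[i][c], means[j][c], variances[j][c])
                dens.append(p)
            total = sum(dens)
            if total == 0.0:
                continue  # numerical underflow: keep the previous posterior
            new = [p / total for p in dens]
            # Swap the old contribution of locus i for the new one.
            add(i, resp[i], -1)
            add(i, new, +1)
            resp[i] = new

    weights, means, variances = m_step()
    return weights, means, variances, resp
```

Calling `incremental_em(loci, k=2)` on a list of `(sample_1, sample_2)` pairs returns the mixing weights, per-component means and variances, and the posterior membership of every locus, which can then be thresholded to classify loci. Because only running sums are stored, memory use grows with the number of components rather than with repeated full passes, which is one reason incremental variants are attractive for genome-scale data.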
|Journal issue number||8|
|Status||Published - 2010|