Abstract
Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding “big data” that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them.
Keywords: circadian rhythms, diurnal rhythms, computational biology, functional genomics, systems biology, guidelines, biostatistics, RNA-seq, ChIP-seq, proteomics, metabolomics.
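To make the idea of synthetic benchmarking data concrete, here is a minimal sketch of what such a simulator might do. It is not CircaInSilico's code or interface: the function `simulate_profiles`, its parameters (sampling interval, duration, replicates, noise level), and the cosine-plus-Gaussian-noise model are hypothetical choices made for illustration. Each simulated feature carries a ground-truth `rhythmic` label, which is what makes the data useful for benchmarking.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Profile:
    """One synthetic feature (e.g. a transcript) sampled over time."""
    name: str
    rhythmic: bool        # ground-truth label, used to score detection methods
    times: list[float]    # sampling times in hours (repeated once per replicate)
    values: list[float]   # simulated abundance at each sampling time


def simulate_profiles(n_features=100, frac_rhythmic=0.2, interval_h=4.0,
                      duration_h=48.0, replicates=1, period_h=24.0,
                      amplitude=1.0, baseline=10.0, noise_sd=0.5, seed=0):
    """Hypothetical simulator (assumption, not CircaInSilico's implementation).

    The first round(n_features * frac_rhythmic) features follow
    baseline + amplitude * cos(2*pi*(t - peak_h)/period_h) plus Gaussian noise,
    with a random peak time; the remaining features are flat baseline plus noise.
    """
    rng = random.Random(seed)  # seeded so every benchmark run is reproducible
    n_points = int(duration_h // interval_h)
    times = [i * interval_h for i in range(n_points) for _ in range(replicates)]
    n_rhythmic = round(n_features * frac_rhythmic)
    profiles = []
    for k in range(n_features):
        rhythmic = k < n_rhythmic
        peak_h = rng.uniform(0.0, period_h)  # drawn for every feature, used only if rhythmic
        values = []
        for t in times:
            signal = (amplitude * math.cos(2 * math.pi * (t - peak_h) / period_h)
                      if rhythmic else 0.0)
            values.append(baseline + signal + rng.gauss(0.0, noise_sd))
        profiles.append(Profile(f"feature_{k}", rhythmic, list(times), values))
    return profiles


# Example: 4-h sampling for 48 h in duplicate, 30% of 10 features rhythmic.
profiles = simulate_profiles(n_features=10, frac_rhythmic=0.3, replicates=2)
print(sum(p.rhythmic for p in profiles), len(profiles[0].values))  # prints "3 24"
```

Because each profile keeps its true label, its output can be passed to any rhythm-detection method and the calls compared against `rhythmic` to estimate sensitivity and false-positive rate under a given sampling design (interval, duration, replication).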
It has become a cliché to comment on the rapid growth of “–omics” technologies in biomedical sciences over the past 20 years. Nevertheless, it is difficult to overstate the transformative impact that genome-scale technologies, notably transcriptional, proteomic, and metabolomic profiling (Fig. 1A), are having on the practice of modern biology. These approaches have had a substantial impact on the study of circadian rhythms (Fig. 1B), particularly because biological rhythms are ubiquitous at every level of organismal physiology and seem custom-made for large-scale analysis. Systems biology approaches offer enormous opportunities to gain insight into the nature of biological rhythms, but they also create unique challenges in properly collecting and interpreting large data sets. Here, we set out to codify unifying principles for genome-scale analyses of biological rhythms. We confine our discussion to the analysis of rhythmic abundance of RNAs, proteins, and metabolites, as well as rhythmic occupancy of DNA by proteins. These guidelines also apply to the study of related processes such as promoter activity (Liu et al., 1995). We do not discuss the analysis of other large data sets, including genome-wide association studies, mutagenesis and cell-based screens, or data from “wearables” that track physiological parameters. All three unquestionably produce large data sets and are important for the field, but they present technical challenges beyond our scope here. We further restrict ourselves to discussing general principles and, where appropriate, refer the reader to more detailed discussions of critical topics such as sample collection and statistical benchmarking. We emphasize that these guidelines reflect current practice at the time of writing and should not be used as hard rules to replace informed peer review. Instead, we hope that this article will formalize a consensus regarding best practices for generating and analyzing large-scale biological rhythm data sets and thereby increase the rigor and reproducibility of research in our field.
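For the analysis of rhythmic abundance that this article focuses on, one common baseline is cosinor (single-harmonic) regression. The sketch below illustrates that standard technique in general; it is not a method these guidelines prescribe, and the function name `cosinor_fit`, the 24 h default period, and the use of R² as a rhythmicity score are assumptions. A formal test would usually add an F-test p-value, omitted here to stay within the Python standard library.

```python
import math


def cosinor_fit(times, values, period_h=24.0):
    """Least-squares fit of y = mesor + beta*cos(w*t) + gamma*sin(w*t).

    Assumes equally spaced sampling with at least three samples per period,
    spanning a whole number of periods (replicates allowed). Under that design
    the cosine and sine regressors are orthogonal, so the estimates reduce to
    simple sums; unbalanced designs need a full linear least-squares solve.
    Returns (mesor, amplitude, acrophase_h, r_squared).
    """
    n = len(values)
    w = 2 * math.pi / period_h
    mesor = sum(values) / n
    beta = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    gamma = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    amplitude = math.hypot(beta, gamma)
    # beta*cos(wt) + gamma*sin(wt) = amplitude*cos(w*(t - acrophase_h)),
    # so the peak time is atan2(gamma, beta) / w, wrapped into one period.
    acrophase_h = (math.atan2(gamma, beta) / w) % period_h
    fitted = [mesor + beta * math.cos(w * t) + gamma * math.sin(w * t) for t in times]
    ss_tot = sum((y - mesor) ** 2 for y in values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return mesor, amplitude, acrophase_h, r_squared


# Example: a noise-free profile sampled every 4 h for 48 h, peaking at 6 h.
times = [4.0 * i for i in range(12)]
values = [5 + 2 * math.cos(2 * math.pi * (t - 6.0) / 24.0) for t in times]
print([round(x, 3) for x in cosinor_fit(times, values)])  # [5.0, 2.0, 6.0, 1.0]
```

Applied to synthetic features whose rhythmic status is known in advance, thresholding `r_squared` gives a crude way to estimate sensitivity and false-positive rate, the kind of statistical benchmarking mentioned above.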
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Supervisors/Advisors | |
Award date | 04 Oct 2019 |
Publication status | Published - 2019 |