User Interface

There are two option to visualize gene expression on ClockProfile for individual genes (search by genes) and for gene sets (search by gene set).

search by gene (main gene view)

In the search by gene, the user has several options to view the data. Genes can be added and deleted in the search pane. All selected genes can be visualized in different ways:

search by geneset

In this query mode. All expressed genes in mouse liver are grouped in functional gene sets. In the search by gene set, the user has several options to view the data.

Experimental setup

Male mice were kept under diurnal lighting conditions (12-hr light 12-hr dark). Mouse KO models for the core clock genes Bmal1 and Cry1/2 had restricted access to a chow diet during their activity period 4 day before sacrifice. All mice used in the experiments were between 9 and 14 weeks old. Genetical background and the generation of Bmal1 KO and Cry1/2 dKO have been previously described in Jouffe et al. [1]. The utilized PAR bZIP KO mouse models were described in detail in Gachon et al. [2]. These animals were fed ad libitum as feeding rhythms were not different to wild-type siblings.

RNA-Seq

Raw files and technical details about the RNA-Seq data have been deposited in NCBI's Gene Expression Omnibus [3] and are accessible through GEO Series accession number GSE135898 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135898) and GSE135875 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135875).

Modeling temporal gene expression profiles across different conditions

To assess rhythmicity and mean differences of gene expression in RNA-Seq count data, we developed a model selection framework based on generalized linear models. As proposed [4], we modeled counts Image002 that have been uniquely mapped to a gene Image004 in a sample s as a negative binomial (NB) with a fitted mean Image006 and a gene-specific but sample-independent dispersion parameter Image008.

Image010

The fitted mean is proportional to the quantity Image012 of fragments that correspond to a gene g in a sample Image014 scaled by a sample-specific scaling factor Image016 [4]. This scaling factor depends on the sampling depth of each library and was estimated using the median-of-ratios method of DESeq2 [5].

Image018

A gene-specific distribution Image008 was estimated using empirical Bayes shrinkage [4].

The fit was performed using a generalized linear model (GLM) with a logarithmic link function implemented in DESeq2. Sample specific size factor (Image020) was defined as an offset. The full GLM was defined as follows:

Image022

where Image024 is the mean of the NB distribution for gene Image004 and condition Image026 in sample Image014 at Zeitgeber/circadian time . The index Image030 refers to a batch of samples. The parameters Image032 and Image034 are coefficients of the cosine and sine functions, respectively. Image036 is a coefficient to describe a condition specific mean expression level. When necessary, we included a batch specific Image038 to account for technical batch effects. To select an optimal gene-specific model, we proceeded in two steps. First, we assessed rhythmicity across the different conditions, where the parameters Image036 and Image038 were free but the parameters and had constraints depending on the particular model considered. To this end, we defined different models for four (e.g., Bmal1 WT, Bmal1 KO, Cry1/2 WT, Cry1/2 KO) or two conditions (i.e., PARbZip deficient (Hlf/Dbp/Tef KO) and proficient (Hlf/Dbp/Tef WT). Models were defined to have either zero (non-rhythmic pattern) or non-zero (rhythmic pattern) Image040 and Image042 coefficients for each analyzed condition. Moreover, for some models the values of Image040 and Image042 can be also shared within any combination of the four conditions. For example, for four conditions, there are 52 such models (figure below). The coefficients Image040 and Image042 were used to calculate the phase (Image044) and amplitude (log2-fold change peak-to-trough; Image046) of a gene. Bayesian information criterion (BIC) based model selection was employed to account for model complexity [6] using the following formula:

Image048

Image050 is defined as the log-likelihood of the model j from the regression, n is the number of data points and k is the number of parameters. To assess the confidence of the selected model j we calculated the Schwarz weight (BICW):

Image052

, where Image054 is defined as the difference in BIC between model j and the minimum BIC value in the entire model set Image058

Image060

Image062 is interpreted as the probability for model j. The model with the highest BIC was selected as the optimal model within the set of all defined models. In the second step, we analyzed the mean levels and thus set the coefficient Image040 and Image042 to the values of the selected model in the first regression. A new regression was performed where the parameter Image036 was either free or subject to constraints between conditions based on several competing models. We considered all possible combinations for the mean coefficient Image036 with differing or shared means between conditions. For example, for four conditions, there are 15 such models (figure below). Each model was solved using generalized linear regression and each gene was assigned to a preferred model based on the BICW as described above for the first iteration. Cook's distance is an estimate of how much a data point influences the fitted coefficients for a gene. A large value of Cook's distance is considered to be an outlier. Thus, genes that did not reach at least one BICW above 0.4 for the preferred model or genes that had at least a Cook's distance of more than 1 were categorized as "ambiguous" (model 0). In the case of two conditions (i.e., PARbZip KO dataset) the threshold on BICW was set to 0.95. The difference in the two BICW thresholds reflected that the number of possible models is much larger for four compared to two conditions. Also, we considered only genes with a log2 amplitude of at least 0.25 to be rhythmic in the corresponding condition. To asses temporal variation of normally distributed measurements (e.g., relative liver size, nuclear abundance of PPARa), we modified model selection approach described above to deal with gaussian noise and therefore used linear models. A function handling normally distributed data is implemented in the dryR package.

Rhythm for fabrice Pastedgraphic 6
Functional and Gene Set Enrichment Analysis

We tested enrichment for annotated gene sets of several sources including Gene Ontology [7], Molecular Signatures Database (MSigDB) [8], Kyoto Encyclopedia of Genes and Genomes (KEGG) [9] and Reactome Pathway database [10]. Target genes for circadian clock regulators and PPARa were defined as outlined below. We used GOseq 3.1 [11] to assess overrepresentation of all gene sets for genes assigned to a statistical model. Temporal resolved enrichments were calculated using a Fisher's exact test. We defined foreground genes as genes with a peak phase within in a sliding 4 h window. The gene universe was defined as all expressed genes. For each gene set the window was moved by 0.1 hr to calculate the p-value.

References