Identification of cell cycle-relevant gene modules in single-cell RNA-seq data

Project presentation: File:GeneModules.pptx

Previous related project: Investigating gene expression across the cell cycle using single-cell RNA-seq data


Biological & technical background

The recent development of single-cell RNA sequencing (scRNAseq) affords unprecedented resolution for transcriptome studies, allowing molecular processes to be examined within individual cells. Gene expression patterns measured by scRNAseq are typically strongly affected by the cell cycle. Previous work in the lab explored the contributions of long intergenic non-coding RNAs (lincRNAs), a putative class of gene expression regulators, to cell cycle progression. Several lincRNA candidates were identified and are now being tested in the wet lab.

To support the functional characterisation of the candidate lincRNAs, we aim to build "modules" of co-expressed genes containing them. This approach is based on the observation that genes involved in similar biological processes often have co-regulated expression levels. Once candidate-containing modules are identified, the functional annotations of the non-candidate genes can be mined for insight into the specific biological processes in which the candidates are involved.
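
To make the idea of co-expression concrete, the sketch below ranks a few genes by the Pearson correlation of their expression profile with that of a candidate. It is written in Python purely for illustration (the project itself is run in R). The gene names and expression values are invented, and pairwise correlation is only a simple stand-in for co-expression; it is not how the module-finding software works.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical expression values across six cells (not real data).
profiles = {
    "candidate_lincRNA": [1.0, 2.0, 8.0, 9.0, 2.0, 1.0],
    "gene_A": [2.0, 3.0, 9.0, 10.0, 3.0, 2.0],  # rises and falls with the candidate
    "gene_B": [9.0, 8.0, 2.0, 1.0, 8.0, 9.0],   # mirror image of the candidate
    "gene_C": [5.0, 5.0, 5.0, 5.0, 6.0, 4.0],   # essentially unrelated
}

candidate = profiles["candidate_lincRNA"]

# Rank the other genes from most to least correlated with the candidate.
ranked = sorted(
    ((name, pearson(candidate, values))
     for name, values in profiles.items()
     if name != "candidate_lincRNA"),
    key=lambda item: item[1],
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

In this toy example gene_A would be the natural module partner for the candidate, and its functional annotations would be the ones worth inspecting.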


Project Goal

Students will run the module-identification software (see Methods). The software has several tunable parameters, and the students will explore different combinations of them. Each combination yields a new set of modules, which the students will evaluate in terms of biological relevance and practical usefulness.
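
The effect of parameter choice can be illustrated with a heavily simplified sketch of the kind of iteration used by the Iterative Signature Algorithm named under Methods. It is written in Python for illustration only and is not the software used in the project. The function name, the toy matrix and the threshold values are assumptions, and the normalisation is reduced to plain z-scores. Starting from a seed gene, it alternately keeps the cells and then the genes whose scores exceed a threshold, until the sets stop changing.

```python
from itertools import product
from statistics import mean, pstdev

def zscores(values):
    """Standardise a list of numbers; a constant list maps to all zeros."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def isa_from_seed(expr, seed_genes, thr_gene, thr_cell, max_iter=50):
    """Simplified ISA-style refinement (illustrative, not the real package).

    expr is a list of gene rows, each a list of per-cell expression values.
    Returns (gene indices, cell indices), or two empty sets if nothing
    passes the thresholds.
    """
    n_genes, n_cells = len(expr), len(expr[0])
    genes, cells = set(seed_genes), set()
    for _ in range(max_iter):
        # Score each cell by the summed expression of the current genes.
        cell_z = zscores([sum(expr[g][c] for g in genes) for c in range(n_cells)])
        new_cells = {c for c, z in enumerate(cell_z) if z > thr_cell}
        # Score each gene by its summed expression over the selected cells.
        gene_z = zscores([sum(expr[g][c] for c in new_cells) for g in range(n_genes)])
        new_genes = {g for g, z in enumerate(gene_z) if z > thr_gene}
        if not new_genes or not new_cells:
            return set(), set()
        if new_genes == genes and new_cells == cells:
            break  # fixed point reached: this is the module
        genes, cells = new_genes, new_cells
    return genes, cells

# Toy matrix: 8 genes x 8 cells, with genes 0-2 elevated in cells 0-2.
toy = [[5.0 if (g < 3 and c < 3) else 1.0 for c in range(8)] for g in range(8)]

# Try every combination of three gene thresholds and three cell thresholds.
results = {
    (thr_gene, thr_cell): isa_from_seed(toy, {0}, thr_gene, thr_cell)
    for thr_gene, thr_cell in product([0.5, 1.0, 1.5], repeat=2)
}

for (thr_gene, thr_cell), (genes, cells) in sorted(results.items()):
    print(thr_gene, thr_cell, sorted(genes), sorted(cells))
```

On this toy matrix the planted block is recovered at the lower thresholds and lost once either threshold reaches 1.5, which is the kind of trade-off the parameter exploration is meant to expose.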


Pre-requisites

A basic working knowledge of R is recommended (if you know about data frames, lists & basic plotting you’re good to go!). Familiarity with functional enrichment analyses would be advantageous, but is not required.


Methods

Be prepared to carry out all analyses in R. RStudio usage is strongly recommended. Gene module identification will be performed with the Iterative Signature Algorithm. The evaluation of the modules' biological relevance will (at least initially) be done using ToppFun. Module usefulness will be evaluated using criteria to be decided upon.
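
For readers new to functional enrichment, the sketch below shows a generic over-representation test based on the hypergeometric distribution. It is written in Python for illustration only. It is not necessarily the exact procedure ToppFun applies, and the gene counts in the example are invented. It asks how likely it is to see at least the observed number of annotated genes in a module if the module were drawn at random from the background.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_module, n_overlap):
    """Upper-tail hypergeometric p-value.

    Probability of finding at least n_overlap annotated genes when n_module
    genes are drawn without replacement from n_background genes, of which
    n_annotated carry the annotation of interest.
    """
    total = comb(n_background, n_module)
    upper = min(n_annotated, n_module)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Invented numbers: 10,000 background genes, 200 of them annotated with some
# term, and a 50-gene module containing 10 annotated genes (1 expected by chance).
p_value = hypergeom_enrichment_p(10_000, 200, 50, 10)
print(f"p = {p_value:.3g}")
```

When many annotation terms are tested for each module, the resulting p-values need a multiple-testing correction before they are interpreted.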


Dataset

scRNAseq gene expression estimates for ~10k genes across 3 × 96 = 288 single mouse embryonic stem cells, staged as being in the G1, S, or G2/M phases of the cell cycle, will be provided.


Things to be learned from this

R coding, bioinformatics and statistical analyses relevant to the project will be taught by the supervisor as needed. Overall, the project offers additional experience with R, a better understanding of clustering and of RNA-seq data analysis (single-cell data in particular), and perhaps some basics of cell cycle biology.


Supervisor

Adam Alexander Thil SMITH, bioinformatics/biostatistics post-doc from the Ana Claudia Marques Lab (Departement de Physiologie, UNIL).


References

Original single-cell RNA-seq paper:

Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, Teichmann SA, Marioni JC, Stegle O
Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.
Nat Biotechnol: 2015 Feb, 33(2);155-60
[PubMed:25599176]


Cell cycle regulation by lncRNAs review:

Kitagawa M, Kitagawa K, Kotake Y, Niida H, Ohhata T
Cell cycle regulation by long non-coding RNAs.
Cell Mol Life Sci: 2013 Dec, 70(24);4785-94
[PubMed:23880895]


Main ISA paper:

Bergmann S, Ihmels J, Barkai N
Iterative signature algorithm for the analysis of large-scale gene expression data.
Phys Rev E Stat Nonlin Soft Matter Phys: 2003 Mar, 67(3 Pt 1);031902
[PubMed:12689096]


Latest ToppFun paper:

Chen J, Bardes EE, Aronow BJ, Jegga AG
ToppGene Suite for gene list enrichment analysis and candidate gene prioritization.
Nucleic Acids Res: 2009 Jul, 37(Web Server issue);W305-11
[PubMed:19465376]