BioMathProject2016 Investigating gene expression across the cell cycle using single-cell RNA-seq data


Extended Project Presentation: File:BioMathProject2016 lncRNAs cell-cycle scRNAseq.pdf


Biological & technical background

Traditional bulk RNA-sequencing technology relies on extracting and quantifying RNA from samples composed of hundreds of thousands of cells, yielding data in which only general tissue-level trends can be seen. Excitingly, the recent development of single-cell RNA sequencing (scRNAseq) now affords unprecedented resolution and has opened up multiple new avenues of transcriptomics research. Gene expression patterns measured by scRNAseq are strongly affected by cell cycle stage. We are exploring this signal to investigate the contribution of long non-coding RNAs (lncRNAs), a recently discovered class of gene expression regulators, to cell cycle progression. To tackle the computational challenge of quantifying gene expression for hundreds of single-cell samples, we are currently experimenting with a new generation of gene expression quantification tools, which are orders of magnitude faster than previous methods. This latest generation offers new data analysis features that have not yet been extensively explored, especially on scRNAseq data.
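To make "gene expression quantification" a little more concrete: tools such as Kallisto report, for each transcript, an estimated read count and an abundance in transcripts per million (TPM), where TPM_i = 10^6 · (c_i / l_i) / Σ_j (c_j / l_j), with c the estimated counts and l the effective transcript lengths. The Python sketch below (the project itself works in R) shows only this final normalisation step, assuming estimated counts and effective lengths are already available; the transcript IDs and numbers are invented, and it does not reproduce Kallisto's pseudoalignment or estimation steps.

```python
def counts_to_tpm(est_counts, eff_lengths):
    """Convert estimated counts to TPM.

    est_counts, eff_lengths: dicts keyed by (hypothetical) transcript ID.
    """
    # Read rate per base of effective length for each transcript
    rates = {tx: est_counts[tx] / eff_lengths[tx] for tx in est_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    # Rescale so that TPM values sum to one million
    return {tx: 1e6 * rate / total for tx, rate in rates.items()}


# Invented example values
counts = {"tx_A": 100.0, "tx_B": 300.0, "tx_C": 0.0}
eff_lengths = {"tx_A": 500.0, "tx_B": 1500.0, "tx_C": 800.0}
tpm = counts_to_tpm(counts, eff_lengths)
print(tpm)
```

Here tx_B receives three times as many reads as tx_A but is also three times longer, so both end up with the same TPM; this length correction is why raw counts and TPMs can rank transcripts differently.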


Goal

In this project, we aim to explore the behaviour of one of the latest-generation gene expression quantification workflows (Kallisto for quantification + Sleuth for differential analysis) to aid interpretation of single-cell gene & lncRNA expression measured in cells from three stages of the cell cycle.


Pre-requisites

A basic working knowledge of R is recommended (if you know about data frames, lists & basic plotting, you’re good to go!). Familiarity with Principal Components Analysis and ANOVA would be advantageous but is not required. Also, please check out the “Things to be learned from this” section.


Methods

Be prepared to carry out all analyses in R (cf. the recommended introductory tutorial if necessary)! Using RStudio is recommended. Expression & differential expression analysis, Principal Components Analysis, and ANOVA will most likely be performed, although alternative & further analyses are possible.
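For a single gene, asking whether expression differs between G1, S and G2/M cells is a one-way ANOVA with cell cycle phase as the factor. The Python sketch below computes the F statistic from first principles to show what it measures (spread between phase means relative to spread among cells within a phase); the values are invented, and in R the same test would normally be run with built-in functions such as `aov()` followed by `summary()`. It deliberately stops short of a p-value, which would require the F distribution with the returned degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: dict mapping a group label (e.g. a cell cycle phase) to a list
    of expression values for one gene.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of phase means around the grand mean
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: spread of cells around their own phase mean
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented log-expression values for one gene in a few cells per phase
gene_expr = {
    "G1": [1.0, 2.0, 3.0],
    "S": [4.0, 5.0, 6.0],
    "G2M": [7.0, 8.0, 9.0],
}
f_stat, df_b, df_w = one_way_anova_f(gene_expr)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # prints F(2, 6) = 27.0
```

A large F means most of the gene's variation lies between phases rather than among cells of the same phase, which is the pattern expected for cell-cycle-associated genes.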


Dataset

RNA-sequencing gene expression estimates for 3 × 96 (288) single mouse embryonic stem cells, staged as being in the G1, S, or G2/M phase of the cell cycle.
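With a matrix of this shape, a typical first exploratory step is a PCA of the cells × genes expression matrix, to check whether cell cycle phase drives the main axis of variation. The stdlib-only Python sketch below assumes a log2(x + 1) transform of TPM-like values (a common but not mandatory choice) and recovers only the first principal component by power iteration on the gene-by-gene covariance matrix; the 4 × 3 matrix and phase labels are invented. In R, `prcomp()` would compute all components at once.

```python
import math


def log_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every entry (assumed transform)."""
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]


def center_columns(matrix):
    """Subtract each gene's (column's) mean across cells."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_principal_component(matrix, n_iter=500):
    """Return (loadings, scores) of PC1 for a cells x genes matrix."""
    x = center_columns(matrix)
    n, p = len(x), len(x[0])
    # Gene-by-gene sample covariance matrix
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeatedly apply cov and renormalise; this converges to
    # the leading eigenvector unless the start vector is orthogonal to it
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Project each centred cell onto the PC1 loading vector
    scores = [sum(row[b] * v[b] for b in range(p)) for row in x]
    return v, scores


# Invented TPM-like values: rows are cells, columns are three genes
expr = [
    [100.0, 1.0, 10.0],
    [90.0, 3.0, 12.0],
    [2.0, 80.0, 11.0],
    [4.0, 120.0, 9.0],
]
phases = ["G1", "G1", "G2M", "G2M"]
loadings, scores = first_principal_component(log_transform(expr))
for phase, score in zip(phases, scores):
    print(phase, round(score, 2))
```

The sign of a principal component is arbitrary, so only the separation of the G1 cells from the G2M cells along PC1 is meaningful here, not which group lands on the positive side.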


Things to be learned from this

R coding, bioinformatics and statistical analyses relevant to the project will be taught by the supervisor as needed. Overall, the project will offer additional experience using R, a better understanding of PCA, experience analysing RNA-seq data (in particular single-cell data, and especially bootstrapped estimates produced using Kallisto), and some basics of cell cycle biology.
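Kallisto can draw bootstrap replicates of its abundance estimates (the `-b`/`--bootstrap-samples` option of `kallisto quant`), and the spread across replicates reflects uncertainty in the quantification itself, which Sleuth treats as inferential variance, separately from biological variance between samples. The Python sketch below only summarises such replicates per transcript (mean, sample standard deviation and coefficient of variation); the transcript IDs and values are invented, and the CV summary is an illustrative choice rather than Sleuth's actual model.

```python
from statistics import mean, stdev


def summarise_bootstraps(boot):
    """Return {transcript: (mean, sd, cv)} across bootstrap replicates.

    boot: dict mapping a (hypothetical) transcript ID to its estimated
    counts in each bootstrap replicate for one cell.
    """
    summary = {}
    for tx, values in boot.items():
        m = mean(values)
        sd = stdev(values)  # sample standard deviation (needs >= 2 replicates)
        cv = sd / m if m > 0 else float("nan")
        summary[tx] = (m, sd, cv)
    return summary


# Invented bootstrap estimates for one cell
bootstraps = {
    "tx_A": [50.0, 52.0, 48.0, 50.0],  # stable estimate
    "tx_B": [10.0, 0.0, 20.0, 10.0],   # uncertain estimate
}
for tx, (m, sd, cv) in summarise_bootstraps(bootstraps).items():
    print(f"{tx}: mean={m:.1f} sd={sd:.2f} cv={cv:.2f}")
```

Transcripts with a high CV across bootstraps, which one would expect to be more frequent at the low read depths of single cells, are those whose point estimates deserve the least trust.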


Supervisor

Adam Alexander Thil SMITH, bioinformatics/biostatistics post-doc from the Ana Claudia Marques Lab (Département de Physiologie, UNIL).


Wiki of the Students

Wiki of the presentation: http://www2.unil.ch/cbg/images/5/53/Wikipresentationfinale.pdf


References

Original single-cell RNA-seq paper:

Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, Teichmann SA, Marioni JC, Stegle O
Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.
Nat Biotechnol: 2015 Feb, 33(2);155-60
PubMed: 25599176


Cell cycle regulation by lncRNAs review:

Kitagawa M, Kitagawa K, Kotake Y, Niida H, Ohhata T
Cell cycle regulation by long non-coding RNAs.
Cell Mol Life Sci: 2013 Dec, 70(24);4785-94
PubMed: 23880895


Kallisto post

Kallisto website

Sleuth post

Sleuth website