BioMathProject2016 Investigating gene expression across the cell cycle using single-cell RNA-seq data

Revision as of 14:30, 4 February 2016 by AdamSmith (talk | contribs)



Biological & technical background

Traditional bulk RNA-sequencing technology relied on the extraction & quantification of RNA from samples composed of hundreds of thousands of cells, resulting in data where only general tissue-level trends could be seen. Excitingly, the recent development of single-cell RNA sequencing (scRNAseq) now affords unprecedented resolution and has opened up multiple new avenues of transcriptomics research. scRNAseq-based gene expression patterns are strongly affected by cell cycle stage. We are exploring this signal to investigate the contributions of long non-coding RNAs (lncRNAs), a recently discovered class of gene expression regulators, to cell cycle progression. To tackle the computing challenges of quantifying gene expression for hundreds of single-cell samples, we are currently experimenting with new-generation gene expression quantification tools, which are thousands of times faster than previous methods. This latest generation offers new data analysis features that have not yet been extensively explored, especially on scRNAseq data.


Goal

In this project, we aim to explore the behaviour of one of the latest-generation gene expression quantification methods (Kallisto + Sleuth) to aid the interpretation of single-cell gene & lncRNA expression data collected from cells in 3 stages of the cell cycle.


Pre-requisites

A basic working knowledge of R is recommended (if you know about data frames, lists & basic plotting you’re good to go!). Familiarity with Principal Components Analysis and ANOVA would be advantageous, but is not required. Also, please check out the “Things to be learned from this” section.


Methods

Be prepared to carry out all analyses in R (see the recommended introductory tutorial if necessary)! Use of RStudio is recommended. Expression & differential expression analysis, Principal Components Analysis, and ANOVA will most likely be performed, although alternative & further analyses can be imagined.
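
As a rough illustration of what a per-gene ANOVA across the three cell cycle stages computes, here is a minimal sketch of the one-way ANOVA F statistic. It is written in plain Python purely for illustration; in the project itself the equivalent would be done in R (for example with `aov()` or `oneway.test()`). The function name `one_way_anova_f` and the expression values are made up for this example and are not taken from the project dataset.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)                         # number of groups (cell cycle stages)
    n_total = sum(len(g) for g in groups)   # total number of cells
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual cells around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical expression values for one gene, three cells per stage
expression = {
    "G1": [1.0, 2.0, 3.0],
    "S": [2.0, 3.0, 4.0],
    "G2/M": [6.0, 7.0, 8.0],
}

f_stat = one_way_anova_f(list(expression.values()))
print(f"F = {f_stat:.2f}")
```

A large F value indicates that expression differs between stages by more than would be expected from the cell-to-cell variation within each stage. Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which is omitted here.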


Dataset

RNA-sequencing gene expression estimates for 3 × 96 single mouse embryonic stem cells, staged as being in the G1, S, or G2/M phase of the cell cycle.


Things to be learned from this

R coding, bioinformatics and statistical analyses relevant to the project will be taught by the supervisor as needed. Overall, the project will offer additional experience using R, a better understanding of PCA, experience analysing RNA-seq data (in particular single-cell data, and especially bootstrapped estimates produced using Kallisto), and some basics of cell cycle biology.
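
To give a feel for what "bootstrapped" estimates are, the sketch below summarises a set of bootstrap replicates for a single transcript in a single cell as a mean, a variance and a coefficient of variation. It is a plain Python illustration only: it does not read Kallisto's actual output files, the function name `summarise_bootstraps` is hypothetical, and the replicate values are invented.

```python
from statistics import mean, variance


def summarise_bootstraps(replicates):
    """Summarise bootstrap abundance estimates for one transcript in one cell."""
    m = mean(replicates)
    v = variance(replicates)  # sample variance across bootstrap replicates
    # Coefficient of variation: spread relative to the mean (undefined if mean is 0)
    cv = (v ** 0.5) / m if m > 0 else float("nan")
    return {"mean": m, "variance": v, "cv": cv}


# Hypothetical bootstrap estimates (e.g. estimated counts) for one transcript
boot = [10.0, 12.0, 8.0, 10.0]

summary = summarise_bootstraps(boot)
print(summary)
```

The spread across replicates reflects how uncertain the quantification of that transcript is; Sleuth is designed to take this variability into account when testing for differential expression.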


Supervisor

Adam Alexander Thil SMITH, bioinformatics/biostatistics post-doc from the Ana Claudia Marques Lab (Departement de Physiologie, UNIL).