BioMathProject2016: Identification of disease-associated pseudogenes


Pseudogenes are genomic loci that derive from protein-coding genes (their parental genes) but lack coding potential because they have accumulated loss-of-function mutations, such as frameshifts and premature stop codons. There are two general classes of pseudogenes: unprocessed pseudogenes, which arise from tandem duplication of genes, and processed pseudogenes, which are created through retrotransposition of mRNAs and therefore lack introns (reviewed in Mighell et al. 2000). Processed pseudogenes are the most abundant type in humans, owing to a burst of retrotransposon activity in ancestral primates (reviewed in Mighell et al. 2000).

Although pseudogenes are generally considered nonfunctional, evidence suggests that some processed pseudogenes evolved under selective constraint and can have regulatory roles (Pei et al. 2012, Svensson et al. 2006). In particular, some can post-transcriptionally regulate the expression levels of their parental gene through a microRNA-dependent mechanism (Poliseno et al. 2010).
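One simple way to look for a possible microRNA-dependent link between a pseudogene and its parental gene is to ask which microRNAs have seed-match sites in both transcripts. The sketch below is a minimal illustration in Python, assuming a plain 7-mer seed match (the reverse complement of microRNA nucleotides 2 to 8); the function names, the microRNA names, and all sequences are hypothetical toy values, and dedicated target-prediction tools use considerably richer criteria.

```python
# Minimal sketch: find microRNAs with seed-match sites in both a pseudogene
# and its parental transcript. All sequences below are invented toy data.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna):
    """Return the 7-mer target motif complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # positions 2-8, counting from 1 at the 5' end
    return reverse_complement(seed)


def count_sites(transcript, site):
    """Count (possibly overlapping) occurrences of a site in a transcript."""
    count = 0
    start = transcript.find(site)
    while start != -1:
        count += 1
        start = transcript.find(site, start + 1)
    return count


def shared_mirnas(pseudogene, parent, mirnas):
    """Map each miRNA with sites in BOTH transcripts to its two site counts."""
    shared = {}
    for name, sequence in mirnas.items():
        site = seed_site(sequence)
        n_pseudogene = count_sites(pseudogene, site)
        n_parent = count_sites(parent, site)
        if n_pseudogene > 0 and n_parent > 0:
            shared[name] = (n_pseudogene, n_parent)
    return shared


# Hypothetical toy inputs (not real miRNAs or transcripts).
toy_mirnas = {
    "miR-toy": "UACGGAUCCAGUUCGAAUGCA",
    "miR-other": "UUUUUUUUUUUUUUUUUUUUU",
}
toy_pseudogene = "AAAGAUCCGUAAA"
toy_parent = "CCGAUCCGUGGGAUCCGUCC"

print(shared_mirnas(toy_pseudogene, toy_parent, toy_mirnas))
```

A shared site only suggests that the two transcripts could compete for the same microRNA; it does not demonstrate regulation.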


The aim of the project is to investigate and characterize processed pseudogenes that are likely associated with disease-linked genetic variants, as identified through expression quantitative trait locus (eQTL) analysis. Thirty-six processed pseudogenes have been found by this method to be associated with disease-associated genetic variants. We will first test the strength of the correlation between each pseudogene and its disease-associated variant using other nearby SNPs. After establishing the association with the disease, we will study the regulatory interactions, if any, between the pseudogenes and their parental mRNAs, for example through microRNA-dependent mechanisms. Next, we will ask whether the parental genes are also associated with the same or similar traits. This will allow us to study whether the pseudogenes contribute to complex diseases through their interactions with the parental genes.

[Figure: Disease SNP eQTL.png]
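The first step above, testing how strongly a pseudogene's expression tracks a disease-associated variant and its neighbouring SNPs, can be sketched as a per-SNP regression of expression on genotype dosage. The example below is a minimal Python illustration and not the project's actual pipeline: it assumes genotypes coded as 0, 1 or 2 copies of the alternate allele, uses a permutation test in place of a parametric test, and every name and number in it (`snp_lead`, `snp_nearby`, the expression values) is invented toy data.

```python
import random


def eqtl_slope_and_r(genotypes, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage."""
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    slope = sxy / sxx               # expression change per extra allele copy
    r = sxy / (sxx * syy) ** 0.5    # strength of the linear association
    return slope, r


def permutation_p_value(genotypes, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the genotype-expression correlation."""
    rng = random.Random(seed)
    _, observed_r = eqtl_slope_and_r(genotypes, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        _, r = eqtl_slope_and_r(genotypes, shuffled)
        if abs(r) >= abs(observed_r):
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 12 individuals, one lead SNP and one nearby SNP.
toy_snps = {
    "snp_lead": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "snp_nearby": [0, 0, 1, 0, 1, 1, 2, 1, 2, 2, 1, 2],
}
toy_expression = [5.1, 4.8, 5.3, 5.0, 6.2, 5.9, 6.4, 6.0, 7.1, 6.8, 7.3, 7.0]

for snp_id, dosages in toy_snps.items():
    slope, r = eqtl_slope_and_r(dosages, toy_expression)
    p = permutation_p_value(dosages, toy_expression)
    print(f"{snp_id}: slope={slope:.2f} r={r:.2f} permutation p={p:.4f}")
```

Comparing the lead variant with nearby SNPs in this way gives a rough sense of whether the signal is specific to the disease-associated variant or shared across the region; a real analysis would also need to account for covariates and multiple testing.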


Students will program mostly in R (RStudio is recommended). Various genome annotation databases and statistical tests will also be introduced.


Jennifer TAN, molecular biology/bioinformatics postdoc in the Ana Claudia Marques Lab (Département de Physiologie, UNIL).


Mighell AJ, Smith NR, Robinson PA, Markham AF
Vertebrate pseudogenes.
FEBS Lett: 2000 Feb 25, 468(2-3);109-14
[PubMed:10692568]

Svensson O, Arvestad L, Lagergren J
Genome-wide survey for biologically functional pseudogenes.
PLoS Comput Biol: 2006 May, 2(5);e46
[PubMed:16680195]

Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, Harte R, Balasubramanian S, Tanzer A, Diekhans M, Reymond A, Hubbard TJ, Harrow J, Gerstein MB
The GENCODE pseudogene resource.
Genome Biol: 2012 Sep 26, 13(9);R51
[PubMed:22951037]

Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP
A coding-independent function of gene and pseudogene mRNAs regulates tumour biology.
Nature: 2010 Jun 24, 465(7301);1033-8
[PubMed:20577206]

Media:Wiki summary final.pdf