Ensemble pathway discovery

Revision as of 14:59, 25 February 2016 by Sarvenaz

Background: Genes in the same pathway tend to be activated together and thus exhibit similar expression profiles. In addition, when genes coordinate to achieve a particular task, their protein products often interact. Integrating both sources of information can improve the quality of discovered pathways.

Goal: The aim of the project is to use gene expression data and protein interaction networks to detect groups of genes that are co-expressed and whose products interact. Students will also explore the topology of these biological networks and examine their modularity.
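
As an illustration of what examining modularity can mean in practice, the sketch below computes the Newman modularity Q of a given partition of an undirected, unweighted network. The choice of this particular measure, the gene labels, the edge list and the function name are assumptions made for illustration only; they are not part of the project data or a prescribed method.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    module_of -- dict mapping every node to its module label
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same module
    degree_sum = defaultdict(int)  # summed node degree per module
    for u, v in edges:
        degree_sum[module_of[u]] += 1
        degree_sum[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            internal[module_of[u]] += 1
    # Q = sum over modules of (fraction of internal edges)
    #     minus (expected fraction under random rewiring)
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy network: two triangles joined by a single bridge edge.
toy_edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]
two_modules = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_module = {node: 0 for node in two_modules}

q_two = modularity(toy_edges, two_modules)
q_one = modularity(toy_edges, one_module)
```

In this toy case the two-triangle partition scores higher than placing all nodes in one module, which is the kind of comparison that can be used to judge candidate modules found in the interaction network.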


Mathematical tools: The project consists of several steps, each using standard tools:

  • Explore the data and produce summary statistics for the network and the gene expression data
  • Find modules in the protein interaction network using hierarchical clustering
  • Find modules in the gene expression profiles using WGCNA
  • Merge the two sets of predicted modules
  • Find modules from both datasets jointly using an ensemble method
  • Evaluate and compare the results by running a GWAS-based evaluation with PASCAL
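
The merging and ensemble steps above can be sketched as a co-association consensus: for each gene pair, count in how many partitions the two genes share a module, then link pairs whose score reaches a threshold and take connected components as consensus modules. This is one common ensemble strategy and is assumed here for illustration; it is not necessarily the method used in the project. The gene labels, module assignments and function names are hypothetical.

```python
from itertools import combinations


def co_association(partitions, genes):
    """Fraction of partitions in which each gene pair shares a module."""
    scores = {}
    for a, b in combinations(sorted(genes), 2):
        together = sum(
            1 for p in partitions
            if a in p and b in p and p[a] == p[b]
        )
        scores[(a, b)] = together / len(partitions)
    return scores


def consensus_modules(partitions, genes, threshold=1.0):
    """Connected components of the graph that links gene pairs whose
    co-association score is at least `threshold`."""
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in co_association(partitions, genes).items():
        if score >= threshold:
            parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), set()).add(g)
    return sorted((sorted(m) for m in modules.values()), key=lambda m: m[0])


# Hypothetical module assignments from the two data sources.
genes = ["G1", "G2", "G3", "G4", "G5"]
network_modules = {"G1": 0, "G2": 0, "G3": 0, "G4": 1, "G5": 1}
expression_modules = {"G1": "a", "G2": "a", "G3": "b", "G4": "b", "G5": "b"}

partitions = [network_modules, expression_modules]
strict = consensus_modules(partitions, genes, threshold=1.0)
loose = consensus_modules(partitions, genes, threshold=0.5)
```

With a threshold of 1.0 only genes grouped together by both sources stay together, so G3, on which the two sources disagree, ends up alone. Lowering the threshold to 0.5 accepts agreement from either source and merges everything into one module here, which shows how the threshold trades specificity for coverage.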

If time permits, we will also try probabilistic modeling, in which both datasets are integrated using relational Markov networks to derive gene sets. The project can be done in R or Python, although Python is recommended.


Biological or Medical aspects: Students will learn module discovery methods, including hierarchical clustering and WGCNA. They will also become familiar with gene set enrichment tests and GWAS pathway scoring.


Supervisor: Sarvenaz Choobdar