Ensemble pathway discovery

Background: Genes in the same pathway tend to be activated together and therefore exhibit similar expression profiles. In addition, when genes coordinate to achieve a particular task, their protein products often interact. Integrating both sources of information can improve the quality of discovered pathways.

Goal: The aim of the project is to detect groups of genes that are co-expressed and whose products interact in the protein interaction data, using gene expression data and gene interaction networks. Students will also explore the topology of the biological networks, examining their modularity.


Mathematical tools: The project proceeds in several steps, each using standard tools:

  • Explore the data and compute summary statistics for the network and the gene expression data
  • Find modules in the protein interaction network using hierarchical clustering
  • Find modules from the gene expression profile using WGCNA
  • Merge the two sets of predicted modules
  • Find modules from both datasets jointly using an ensemble method
  • Evaluate and compare the results by running gwas-evaluation using PASCAL
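
The merging step in the list above can be illustrated with a minimal Python sketch. It assumes that each method returns its modules as sets of gene identifiers, and it pairs modules whose Jaccard overlap exceeds a threshold, keeping the shared genes. The function names, the threshold values, and the gene labels are hypothetical and chosen only for illustration; the project may well use a different merging rule.

```python
def jaccard(a, b):
    """Jaccard index of two gene sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_modules(network_modules, expression_modules,
                  min_jaccard=0.3, min_size=2):
    """Pair up overlapping modules and keep the genes found in both.

    Assumption (not prescribed by the project): a network module and an
    expression module are merged when their Jaccard index is at least
    min_jaccard, and the merged module is their intersection.
    """
    merged = []
    for net_mod in network_modules:
        for expr_mod in expression_modules:
            if jaccard(net_mod, expr_mod) >= min_jaccard:
                shared = set(net_mod) & set(expr_mod)
                if len(shared) >= min_size:
                    merged.append(shared)
    return merged


# Toy input with made-up gene labels.
network_modules = [{"A", "B", "C", "D"}, {"E", "F"}]
expression_modules = [{"A", "B", "C"}, {"D", "E", "F", "G"}]

merged = merge_modules(network_modules, expression_modules)
print(merged)
```

Taking the intersection favors genes supported by both data sources, which matches the stated goal of finding genes that are co-expressed and whose products interact; taking the union instead would trade precision for coverage.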

The students will learn module discovery methods, including hierarchical clustering and WGCNA. If time permits, we will also try probabilistic modeling, in which both datasets are integrated using relational Markov networks to derive gene sets.
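
To give a feel for how the two module discovery methods fit together, here is a simplified pure-Python sketch. It borrows one idea from WGCNA, the soft-threshold adjacency |cor(x_i, x_j)|^beta, and then groups genes by average-linkage hierarchical clustering on the dissimilarity 1 - adjacency. This is only an approximation under stated assumptions: the actual WGCNA package (in R) additionally uses the topological overlap measure and a dynamic tree cut, both omitted here. The expression values, gene names, `beta=6`, and the fixed number of modules are illustrative choices, not project settings.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency |r|**beta for every gene pair."""
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g, h in combinations(sorted(expr), 2)
    }


def average_linkage_modules(genes, adj, n_modules):
    """Agglomerative clustering on dissimilarity 1 - adjacency."""
    clusters = [{g} for g in sorted(genes)]

    def dist(a, b):
        total = sum(1.0 - adj[tuple(sorted((g, h)))] for g in a for h in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_modules:
        # Merge the closest pair of clusters (i < j by construction).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Made-up expression profiles: five samples per gene.
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 3, 4, 1, 2],
    "g4": [10, 6, 9, 2, 3],
}

adj = soft_threshold_adjacency(expr, beta=6)
modules = average_linkage_modules(expr, adj, n_modules=2)
print(modules)
```

The same `average_linkage_modules` function can be applied to a protein interaction network if the adjacency dictionary is filled from the network instead of from correlations, for example with 1 for interacting pairs and 0 otherwise. In practice, a library implementation of hierarchical clustering would be preferable to this quadratic loop for real datasets.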

The project can be done in R or Python, although Python is recommended.


Biological or Medical aspects: Pathway analysis, gene set enrichment.


Supervisor: Sarvenaz Choobdar


File:Wiki Pathway analysis for gene expression profiles.pdf