Prioritizing genes via network enrichment
Background Genome-wide association studies (GWAS) have been successful at finding loci (relatively small regions of the genome) that contain a gene influencing a specific human trait, such as height or blood pressure. However, a given locus may contain many genes, and it is difficult to decide which one has the effect. One way to address this challenge is to use additional information, such as protein-protein interaction maps. Some proteins are known to bind to each other to form protein complexes. We can assume that when one member of a complex affects the trait under study, it becomes much more likely that the other members of that complex also affect it. Therefore, if several trait-associated loci each contain a gene whose protein product belongs to the same complex, it is probable that these genes are the ones affecting the trait.
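The reasoning above can be sketched in a few lines of code. The following is only a minimal illustration, written in Python for brevity rather than in the language planned for the project. The locus names, gene names, complex names, the function `prioritize`, and the `min_loci` threshold are all hypothetical, and the score used here (the number of distinct loci in which a complex has a member) is an assumed stand-in for a proper statistical enrichment test.

```python
# Hypothetical toy data: each trait-associated locus and its candidate genes.
loci = {
    "locus1": {"A1", "B1", "C1"},
    "locus2": {"A2", "D1"},
    "locus3": {"A3", "E1", "F1"},
    "locus4": {"G1", "B2"},
}

# Hypothetical protein complexes and the genes encoding their members.
complexes = {
    "complexA": {"A1", "A2", "A3", "A4"},
    "complexB": {"B1", "B2"},
    "complexX": {"C1", "Z9"},
}


def prioritize(loci, complexes, min_loci=2):
    """Rank candidate genes by how many distinct loci their complex touches.

    A complex only counts as evidence if it has members in at least
    `min_loci` different loci (an assumed, arbitrary threshold).
    """
    scores = {}
    for members in complexes.values():
        # Loci that contain at least one gene of this complex.
        hit_loci = {name for name, genes in loci.items() if genes & members}
        if len(hit_loci) < min_loci:
            continue
        for name in hit_loci:
            for gene in loci[name] & members:
                # Keep the best score if a gene belongs to several complexes.
                scores[gene] = max(scores.get(gene, 0), len(hit_loci))
    # Highest score first, ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = prioritize(loci, complexes)
for gene, score in ranking:
    print(gene, score)
```

In this toy input, the genes of `complexA` are ranked first because the complex has a member in three different loci, while `C1` is not prioritized at all because its complex appears in only one locus. Real data would also need a correction for complex size and locus size, since large complexes hit many loci by chance.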
Goal In this project, we will investigate the usefulness of this strategy and think of ways to evaluate the quality of the results we obtain for different human traits (height, blood pressure, smoking, etc.) and for various external data types (protein-protein interactions, canonical pathways, etc.).
Tools We will mainly use the R language, along with a little UNIX command line. (It's good if you know some of it to start with, but if not, that's OK too.)