<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://www2.unil.ch/cbg/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://www2.unil.ch/cbg/api.php?action=feedcontributions&amp;user=Sven&amp;feedformat=atom</id>
		<title>Computational Biology Group - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://www2.unil.ch/cbg/api.php?action=feedcontributions&amp;user=Sven&amp;feedformat=atom"/>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Special:Contributions/Sven"/>
		<updated>2013-05-26T02:28:29Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.19.1</generator>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Sven_cat_pic.jpg</id>
		<title>File:Sven cat pic.jpg</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Sven_cat_pic.jpg"/>
				<updated>2013-04-10T10:38:16Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2013-04-10T10:35:29Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure-track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_cat_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2006.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel) where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on Sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute] hosted by [http://drupal.lenoverelab.org/ Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details!&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T15:40:49Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information about our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|300px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly&lt;br /&gt;
associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple hypotheses testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for a review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role in the phenotype of interest. In the clinical context there is hope that this will eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS performed for complex traits in recent years, it has become apparent that for most complex traits the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT [2,3], integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma [4] has caused some disappointment among those who expected that GWAS could rapidly become of practical use for assessing the risk of predisposition to any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed [4-6]. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs) [5,7,8] and rare variants [9] that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypotheses testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet [10,11]. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE [12] project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
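&lt;br /&gt;
To make the multiple-testing burden concrete, the following minimal Python sketch computes a Bonferroni-corrected per-test threshold. The function names and the numbers used (one million independent tests, a family-wise error rate of 0.05, 100 traits) are illustrative assumptions and are not taken from a specific study.&lt;br /&gt;
&lt;br /&gt;
```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that bounds the family-wise error rate by alpha."""
    if not n_tests >= 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def surviving_associations(p_values, alpha=0.05):
    """Return the indices of tests whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if threshold >= p]


# Assumed setting: one million independent single-variant tests for one trait.
single_trait = bonferroni_threshold(0.05, 1_000_000)       # about 5e-8

# Testing the same variants against 100 traits tightens the threshold 100-fold.
many_traits = bonferroni_threshold(0.05, 1_000_000 * 100)  # about 5e-10
```
&lt;br /&gt;
Under these assumed numbers only p-values below roughly 5e-8 survive for a single trait, and the bar rises further with every additional phenotype or molecular feature that is tested.&lt;br /&gt;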
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
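&lt;br /&gt;
The quadratic growth can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the figure of one million variants is the order of magnitude quoted above, and the helper name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from math import comb


def n_pairwise_interactions(n_variants):
    """Number of unordered variant pairs, n * (n - 1) / 2."""
    return comb(n_variants, 2)


n_single = 1_000_000                          # single-variant tests
n_pairs = n_pairwise_interactions(n_single)   # 499_999_500_000 pair-wise tests
fold_increase = n_pairs / n_single            # about 5e5 times more tests
per_test_alpha = 0.05 / n_pairs               # about 1e-13 after Bonferroni correction
```
&lt;br /&gt;
An exhaustive pair-wise scan would therefore have to clear a per-test threshold of about 1e-13, several orders of magnitude below the single-variant threshold.&lt;br /&gt;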
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|400px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved large-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensional reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
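&lt;br /&gt;
As an illustration of hallmark (3), the sketch below reduces a phenotype matrix P to a small number of meta-features p using principal component analysis, one of the standard tools for dimensional reduction. It is a minimal example on synthetic data, assuming NumPy is available; the function name, matrix sizes and noise level are arbitrary assumptions, and PCA merely stands in for whichever reduction method is eventually used.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np


def meta_features(data, n_components):
    """Project samples onto the leading principal components.

    data has shape (n_samples, n_features). Returns the component scores
    and the fraction of variance explained by each retained component.
    """
    centered = data - data.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic cohort: 500 individuals, 30 redundant traits driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 30))
P = latent @ loadings + 0.1 * rng.normal(size=(500, 30))

p, explained = meta_features(P, n_components=2)
```
&lt;br /&gt;
Because the 30 synthetic traits are driven by two latent factors, two meta-features capture almost all of the variance, so any subsequent association scan would involve 2 instead of 30 phenotypes.&lt;br /&gt;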
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus) [13], a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm [14-16] (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than for any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus [17]: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA analysis has been successful in capturing the population structure [18], but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture [19]. This will allow for reducing the burden of multiple hypotheses testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
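&lt;br /&gt;
To illustrate the kind of modular analysis referred to in (1a), the following Python sketch implements a strongly simplified signature-refinement loop in the spirit of the ISA [14]: a seed set of genes is used to score conditions, the selected conditions are used to re-score genes, and the two steps alternate until the sets no longer change. It is not the published implementation. The one-sided thresholding, the unweighted averages, the threshold values and all function names are simplifying assumptions, and the data are synthetic.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np


def zscore(x, axis=0):
    """Standardize to zero mean and unit variance along the given axis."""
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / np.where(std == 0, 1.0, std)


def signature_module(expr, seed_genes, t_gene=2.0, t_cond=1.0, max_iter=50):
    """Refine a seed gene set into a (genes, conditions) module.

    expr has shape (n_genes, n_conditions). Returns two sorted index arrays.
    """
    e_g = zscore(expr, axis=1)  # every gene standardized across conditions
    e_c = zscore(expr, axis=0)  # every condition standardized across genes
    genes = np.zeros(expr.shape[0], dtype=bool)
    genes[list(seed_genes)] = True
    conds = np.zeros(expr.shape[1], dtype=bool)
    empty = np.array([], dtype=int)
    for _ in range(max_iter):
        if not genes.any():
            return empty, empty
        # Keep conditions in which the current gene set is jointly up-regulated.
        new_conds = zscore(e_g[genes].mean(axis=0)) > t_cond
        if not new_conds.any():
            return empty, empty
        # Keep genes that are jointly up-regulated in the selected conditions.
        new_genes = zscore(e_c[:, new_conds].mean(axis=1)) > t_gene
        converged = (np.array_equal(new_genes, genes)
                     and np.array_equal(new_conds, conds))
        genes, conds = new_genes, new_conds
        if converged:
            break
    return np.flatnonzero(genes), np.flatnonzero(conds)


# Synthetic data: 200 genes x 40 conditions with one planted module
# (genes 0-19 are up-regulated in conditions 0-9).
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 40))
expr[:20, :10] += 5.0

# Start from only half of the planted genes.
genes, conds = signature_module(expr, seed_genes=range(10))
```
&lt;br /&gt;
In this toy setting the loop is intended to recover the remaining planted genes and the associated conditions from a partial seed; the resulting gene set (or its average profile) is what would serve as a meta-feature in a subsequent association test.&lt;br /&gt;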
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm [20] (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistic framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT [2,3]) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines [21] (SVM) and Random Forests [22], to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|400px|Figure 3: The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless it is important to realize that the impasses of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics” [23,24]). This approach has already made remarkable progress for model organisms [25-27], but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data [15,16,20,28-33], (ii) our experience with GWAS [2,3,34-38], and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture [19]). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM [21] and “random forests” [22] to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support for the NOFIA project from my colleagues Prof. [http://www.unil.ch/edcvm/page78675.html Peter Vollenweider] (PI of [http://www.chuv.ch/chuv_home/recherche/chuv-recherche-presentation/chuv-recherche-histoires/recherche-histoires-colaus.htm  CoLaus], see [[media:letter_PV.pdf]]) and Prof. [http://www.unil.ch/actu/page62193.html Martin Preisig] (PI of [http://www.colaus.ch/en/cls_home/cls_pro_home/cls_pro_studies/cls_pro_studies-psycolaus.htm PsyCoLaus], see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T15:38:43Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Context and state of the art */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding of NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|250px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|250px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|250px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly&lt;br /&gt;
associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple hypotheses testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role in the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially lead to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS performed for complex traits in recent years, it has become apparent that for most complex traits the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT [2,3], integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma [4] has disappointed those who expected that GWAS would rapidly become of practical use for assessing the risk of predisposition to any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed [4-6]. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs) [5,7,8] and rare variants [9] that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypotheses testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are as yet too small to be detected [10,11]. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE [12] project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
&lt;br /&gt;
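The size of this multiple-testing burden can be made concrete with a small sketch. It assumes a family-wise error rate of 0.05 and the roughly one million non-redundant SNPs mentioned above; the function names are illustrative and not part of any existing tool.&lt;br /&gt;

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if not n_tests > 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def surviving_hits(p_values, alpha=0.05):
    """Indices of p-values at or below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if threshold >= p]


# Assumed values: alpha = 0.05 and about one million independent tests.
genome_wide_threshold = bonferroni_threshold(0.05, 1_000_000)
print(genome_wide_threshold)
```

With these assumed values the per-SNP threshold comes out near 5e-8, so only the strongest associations pass.&lt;br /&gt;
&lt;br /&gt;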
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
&lt;br /&gt;
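The quadratic growth of the interaction space is easy to quantify. The sketch below counts unordered variant pairs; the figure of one million variants is taken from the order of magnitude quoted above, and the function name is illustrative.&lt;br /&gt;

```python
from math import comb


def n_pairwise_interactions(n_variants):
    """Number of unordered variant pairs, n * (n - 1) / 2."""
    return comb(n_variants, 2)


# Assumed order of magnitude: one million non-redundant variants.
print(n_pairwise_interactions(1_000_000))
```

For one million variants this gives roughly 5e11 candidate pairs, which is why an exhaustive pair-wise scan is underpowered even in very large cohorts.&lt;br /&gt;
&lt;br /&gt;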
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|400px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved large-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensional reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), that will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus) [13], a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm [14-16] (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus [17]: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure [18], but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture [19]. This will allow for reducing the burden of multiple hypotheses testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm [20] (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT [2,3]) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines [21] (SVM) and Random Forests [22], to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
&lt;br /&gt;
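The cross-validation comparison in (3b) can be outlined as follows. This is a minimal sketch under stated assumptions, not the benchmarking procedure itself: the classifier is passed in as two hypothetical callables, fit and predict, which could wrap an SVM or a random forest, and all names are illustrative.&lt;br /&gt;

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, fit, predict, k=5, seed=0):
    """Mean held-out accuracy of a classifier given as fit/predict callables."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        # Train on all samples outside the held-out fold.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        # Score only on samples the model has not seen.
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)
```

Calling cross_validated_accuracy once with the raw features and once with the meta-features, using the same folds (same seed) and the same classifier, gives two directly comparable estimates of predictive power.&lt;br /&gt;
&lt;br /&gt;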
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|Figure 3: The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless it is important to realize that the impasses of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics” [23,24]). This approach has already made remarkable progress for model organisms [25-27], but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data [15,16,20,28-33], (ii) our experience with GWAS [2,3,34-38], and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective (1c) is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture [19]). Objective (2a) has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass and NMR spectra for structure elucidation. This support structure will also be invaluable for objective (2b-i), which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM [21] and “random forests” [22] to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-95 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support for the NOFIA project from my colleagues Prof. [http://www.unil.ch/edcvm/page78675.html Peter Vollenweider] (PI of [http://www.chuv.ch/chuv_home/recherche/chuv-recherche-presentation/chuv-recherche-histoires/recherche-histoires-colaus.htm  CoLaus], see [[media:letter_PV.pdf]]) and Prof. [http://www.unil.ch/actu/page62193.html Martin Preisig] (PI of [http://www.colaus.ch/en/cls_home/cls_pro_home/cls_pro_studies/cls_pro_studies-psycolaus.htm PsyCoLaus], see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T15:37:34Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding of NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|200px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly&lt;br /&gt;
associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple hypotheses testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS performed for complex traits in recent years, it has become apparent that for most of these traits the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in meta-studies with extremely high power (like GIANT [2,3], integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma [4] has disappointed those who expected that GWAS would rapidly become of practical use for assessing predisposition to the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed [4-6]. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs) [5,7,8] and rare variants [9] that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypotheses testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet [10,11]. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE [12] project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
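&lt;br /&gt;
To make the scale of this burden concrete, the following minimal sketch computes Bonferroni-corrected significance thresholds. It assumes a conventional family-wise error rate of 0.05, which is not stated in the text above; the function name and the trait count of 100 are purely illustrative.&lt;br /&gt;
&lt;br /&gt;
```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


ALPHA = 0.05          # assumed family-wise error rate (not given in the text)
N_SNPS = 1_000_000    # "around a million non-redundant SNPs"
N_TRAITS = 100        # hypothetical number of phenotypes tested jointly

# One phenotype: every SNP is tested once.
single_trait = bonferroni_threshold(ALPHA, N_SNPS)

# Many phenotypes: every SNP is tested against every trait.
many_traits = bonferroni_threshold(ALPHA, N_SNPS * N_TRAITS)

print(f"one trait:  p must not exceed {single_trait:.1e}")
print(f"{N_TRAITS} traits: p must not exceed {many_traits:.1e}")
```
&lt;br /&gt;
Under these assumptions the cutoff is about 5e-8 for a single trait and drops by a further factor of 100 when a hundred traits are tested, which is why only the strongest associations survive.&lt;br /&gt;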
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
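&lt;br /&gt;
The quadratic growth can be quantified directly. The sketch below counts unordered SNP pairs for the roughly one million SNPs mentioned earlier and derives the per-test threshold that a Bonferroni correction would impose on an exhaustive pairwise scan. The error rate of 0.05 and all names are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def n_pairwise_interactions(n_variants):
    """Number of unordered pairs among n_variants, i.e. n * (n - 1) / 2."""
    return n_variants * (n_variants - 1) // 2


N_SNPS = 1_000_000   # order of magnitude taken from the text
ALPHA = 0.05         # assumed family-wise error rate

n_pairs = n_pairwise_interactions(N_SNPS)
pair_threshold = ALPHA / n_pairs   # Bonferroni cutoff for an exhaustive pair scan

print(f"{n_pairs:,} candidate SNP pairs")
print(f"per-pair p-value cutoff: {pair_threshold:.1e}")
```
&lt;br /&gt;
With about 5e11 pairs, the per-pair cutoff falls to roughly 1e-13, several orders of magnitude below the single-SNP threshold.&lt;br /&gt;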
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|400px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved large-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensional reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus) [13], a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm [14-16] (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus [17]: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure [18], but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture [19]. This will reduce the burden of multiple hypotheses testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
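&lt;br /&gt;
The screening idea in (1c) can be sketched as follows. This is only an illustration under stated assumptions: the grouping of SNPs into blocks, the p-values of the representative meta-genotypes and the screening cutoff are all hypothetical inputs, and the text above does not prescribe how any of them would be obtained.&lt;br /&gt;
&lt;br /&gt;
```python
def screen_by_meta_genotype(groups, meta_p_values, screen_alpha):
    """Return the SNP ids retained after testing one meta-genotype per group.

    groups: dict mapping a group id to the list of SNP ids it bundles.
    meta_p_values: dict mapping a group id to the association p-value of its
        representative meta-genotype.
    screen_alpha: p-value cutoff for keeping a group (an assumed tuning choice).
    """
    kept = []
    for group_id, snps in groups.items():
        # A whole group is discarded at once if its representative shows no signal.
        if screen_alpha >= meta_p_values[group_id]:
            kept.extend(snps)
    return kept


# Hypothetical toy input: three groups bundling six SNPs.
groups = {
    "block_1": ["rs_a", "rs_b", "rs_c"],
    "block_2": ["rs_d", "rs_e"],
    "block_3": ["rs_f"],
}
meta_p = {"block_1": 0.40, "block_2": 0.0002, "block_3": 0.03}

kept = screen_by_meta_genotype(groups, meta_p, screen_alpha=0.01)
n_snps = sum(len(snps) for snps in groups.values())

print(kept)
print(len(groups), "first-stage tests instead of", n_snps)
```
&lt;br /&gt;
In this toy case only the two SNPs of the second group are passed on, and the first stage needs three tests rather than six; the gain grows with the number of genotypes bundled per group.&lt;br /&gt;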
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm [20] (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (c.f. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT [2,3]) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus recently has become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular incidences, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines [21] (SVM) and Random Forests [22], to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
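&lt;br /&gt;
The cross-validation comparison in (3b) could be organized along the following lines. This is a minimal sketch, not the planned implementation: a simple nearest-centroid classifier stands in for the SVM and Random Forest models named above, the data are a hypothetical toy example, and all function names are illustrative.&lt;br /&gt;
&lt;br /&gt;
```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(fit, predict, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds (assumes k does not exceed len(X)).

    fit(X_train, y_train) returns a model; predict(model, x) returns a label.
    """
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


def fit_centroids(X_train, y_train):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda lab: dist(model[lab]))


# Hypothetical toy data: one feature, two well-separated outcome classes.
X_toy = [[0.0], [0.1], [0.2], [0.3], [0.4], [10.0], [10.1], [10.2], [10.3], [10.4]]
y_toy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

acc = cross_validated_accuracy(fit_centroids, predict_centroid, X_toy, y_toy, k=5)
print(f"cross-validated accuracy: {acc:.2f}")
```
&lt;br /&gt;
To compare representations, the same function would be called twice with identical folds (same seed), once with the raw features and once with the meta-features as X, so that any difference in accuracy reflects the features rather than the split.&lt;br /&gt;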
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|Figure 3: The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to note that the limitations of this linear, brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics” [23,24]). This approach has already made remarkable progress for model organisms [25-27], but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data [15,16,20,28-33], (ii) our experience with GWAS [2,3,34-38], and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture [19]). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has extensive experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools.
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM [21] and “random forests” [22] to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support for the NOFIA project from my colleagues Prof. [http://www.unil.ch/edcvm/page78675.html Peter Vollenweider] (PI of [http://www.chuv.ch/chuv_home/recherche/chuv-recherche-presentation/chuv-recherche-histoires/recherche-histoires-colaus.htm  CoLaus], see [[media:letter_PV.pdf]]) and Prof. [http://www.unil.ch/actu/page62193.html Martin Preisig] (PI of [http://www.colaus.ch/en/cls_home/cls_pro_home/cls_pro_studies/cls_pro_studies-psycolaus.htm PsyCoLaus], see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T15:31:59Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Ground-breaking nature of this project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information about our long-term vision about &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding NOFIA within an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|200px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly&lt;br /&gt;
associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple hypotheses testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, short SNPs) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for causal variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially lead to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS that were performed for complex traits in recent years, it became apparent that for most complex traits the elucidated loci explain a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has triggered some disappointment among those who expected that GWAS could rapidly become of practical use for assessing the risk of predisposition to any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypotheses testing and only extremely significant associations survive stringent Bonferroni correction (i.e. the “low-hanging fruit” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
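&lt;br /&gt;
To make the size of this burden concrete, the following back-of-the-envelope sketch computes Bonferroni thresholds. The family-wise error rate of 0.05 and the count of 100 traits are illustrative assumptions; only the figure of about one million SNPs comes from the text above.&lt;br /&gt;
&lt;br /&gt;
```python
def bonferroni_threshold(alpha, n_variants, n_traits=1):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / (n_variants * n_traits)


ALPHA = 0.05          # assumed family-wise error rate (not stated in the proposal)
N_SNPS = 1_000_000    # about a million non-redundant SNPs, as quoted in the text

# One phenotype tested against every SNP (the setting of Fig. 1a).
single_trait = bonferroni_threshold(ALPHA, N_SNPS)

# A hypothetical collection of 100 traits (the setting of Fig. 1b or 1c).
hundred_traits = bonferroni_threshold(ALPHA, N_SNPS, n_traits=100)

print(f"one trait:  {single_trait:.1e}")
print(f"100 traits: {hundred_traits:.1e}")
```
&lt;br /&gt;
Every additional trait or molecular feature divides the per-test threshold further, which is why reducing the number of tested features matters as much as increasing the sample size.&lt;br /&gt;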
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
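&lt;br /&gt;
The quadratic growth can be checked with a short calculation. The variant counts below are illustrative assumptions, with one million taken from the SNP figure quoted earlier, and the 0.05 error rate is again assumed.&lt;br /&gt;
&lt;br /&gt;
```python
from math import comb


def n_interaction_tests(n_variants, order=2):
    """Number of distinct variant combinations of the given order."""
    return comb(n_variants, order)


# Pair-wise interactions among a million variants.
pairs = n_interaction_tests(1_000_000)

# Bonferroni threshold if every pair were tested once (assumed alpha = 0.05).
pair_threshold = 0.05 / pairs

print(f"pair-wise tests: {pairs:,}")
print(f"per-test threshold: {pair_threshold:.1e}")
```
&lt;br /&gt;
Going from single variants to pairs multiplies the number of tests by roughly half a million, so a brute-force scan would need correspondingly stronger effects or far larger cohorts.&lt;br /&gt;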
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|400px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved large-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensional reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based study of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of its constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will allow for reducing the burden of multiple hypotheses testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
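&lt;br /&gt;
The kind of modular analysis referred to in objective (1a) can be illustrated with a strongly simplified, unweighted sketch in the spirit of the ISA. It is not the published algorithm or its R implementation: the function names, the thresholds and the toy matrix are assumptions made for illustration, and the published method (Refs. 14-16) uses weighted scores and scans many seeds and thresholds.&lt;br /&gt;
&lt;br /&gt;
```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers; a constant list maps to all zeros."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def isa_sketch(matrix, seed_rows, t_row=0.5, t_col=0.5, max_iter=50):
    """Alternate between row and column sets until the row set is stable.

    matrix: list of rows (e.g. metabolite features) over columns (samples).
    Returns (rows, cols) of the converged module, or empty sets if it dissolves.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Copy 1: every row standardized across columns.
    by_row = [zscore(row) for row in matrix]
    # Copy 2: every column standardized across rows (stored column-wise).
    by_col = [zscore([matrix[r][c] for r in range(n_rows)]) for c in range(n_cols)]

    rows, cols = set(seed_rows), set()
    for _ in range(max_iter):
        # Score each column by its average over the current rows.
        col_scores = zscore([mean([by_col[c][r] for r in rows]) for c in range(n_cols)])
        cols = {c for c, s in enumerate(col_scores) if s > t_col}
        if not cols:
            return set(), set()
        # Score each row by its average over the selected columns.
        row_scores = zscore([mean([by_row[r][c] for c in cols]) for r in range(n_rows)])
        new_rows = {r for r, s in enumerate(row_scores) if s > t_row}
        if not new_rows:
            return set(), set()
        if new_rows == rows:
            break
        rows = new_rows
    return rows, cols


# Toy data: features 0-2 are elevated in samples 0-2 (a planted module).
toy = [[5, 5, 5, 0, 0, 0]] * 3 + [[0, 0, 0, 0, 0, 0]] * 3
# The seed deliberately includes one feature (4) that is not part of the module.
module_rows, module_cols = isa_sketch(toy, seed_rows={0, 1, 4})
print(sorted(module_rows), sorted(module_cols))
```
&lt;br /&gt;
Each converged pair of row and column sets is one candidate meta-feature. Its per-sample score could then replace the individual features in an association test, which is where the reduction in the number of hypotheses would come from.&lt;br /&gt;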
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus recently has become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|Figure 3: The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to realize that the impasses of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support for the NOFIA project from my colleagues Prof. [http://www.unil.ch/edcvm/page78675.html Peter Vollenweider] (PI of [http://www.chuv.ch/chuv_home/recherche/chuv-recherche-presentation/chuv-recherche-histoires/recherche-histoires-colaus.htm  CoLaus], see [[media:letter_PV.pdf]]) and Prof. [http://www.unil.ch/actu/page62193.html Martin Preisig] (PI of [http://www.colaus.ch/en/cls_home/cls_pro_home/cls_pro_studies/cls_pro_studies-psycolaus.htm PsyCoLaus], see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T15:31:07Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Support */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information about our long-term vision about &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding NOFIA within an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|200px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly&lt;br /&gt;
associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple hypotheses testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, short SNPs) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for causal variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially lead to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
The hundreds of GWAS performed for complex traits in recent years have made it apparent that, for most such traits, the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in meta-studies with extremely high power (like GIANT [2,3], integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma [4] has disappointed those who expected that GWAS would rapidly become of practical use for assessing predisposition to the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed [4-6]. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs) [5,7,8] and rare variants [9] that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypothesis testing, and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet [10,11]. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE [12] project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
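&lt;br /&gt;
As a rough illustration of the two scaling arguments above, the following Python sketch computes the Bonferroni-corrected per-test threshold for single-variant tests and for an exhaustive scan of all unordered variant pairs. The figure of one million SNPs is taken from the text; the family-wise error rate of 0.05 and the function name bonferroni_threshold are assumptions made only for this example.&lt;br /&gt;

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 1_000_000  # "around a million non-redundant SNPs" (from the text)
alpha = 0.05        # assumed family-wise error rate, not stated in the proposal

single_tests = n_snps          # one association test per variant
pair_tests = comb(n_snps, 2)   # unordered variant pairs: n * (n - 1) / 2

for label, n_tests in (("single-variant", single_tests), ("pairwise", pair_tests)):
    threshold = bonferroni_threshold(alpha, n_tests)
    print(f"{label}: {n_tests:.1e} tests, per-test threshold {threshold:.1e}")
```

Under these assumptions the pairwise scan involves roughly 5 x 10^11 tests, so its per-test threshold is about five to six orders of magnitude stricter than the single-variant one, which is the sense in which even very large cohorts remain underpowered for brute-force interaction models.&lt;br /&gt;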
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|300px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all high-dimensional data involved. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones that are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help identify relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), much less is known about how to effectively integrate the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus) [13], a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm (ISA) [14-16] and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether their association is stronger than that of any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus [17]: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure [18], but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture [19]. This will reduce the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm (PPA) [20] and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT [2,3]) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardiovascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines (SVM) [21] and Random Forests [22], to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
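&lt;br /&gt;
The comparison described in (3b) can be sketched in a few lines of Python. This is only a dependency-free illustration under strong assumptions: the data are synthetic, the single "meta-feature" is simply the mean of the raw features, and a nearest-centroid classifier stands in for the SVM and Random Forest tools named above. All function names are hypothetical.&lt;br /&gt;

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cv_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy of a nearest-centroid classifier over k folds."""
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        # Class centroids are estimated from the training samples only.
        cents = {c: centroid([X[i] for i in train if y[i] == c]) for c in set(y)}
        hits = sum(min(cents, key=lambda c: sq_dist(X[i], cents[c])) == y[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)


# Synthetic stand-in data: 200 samples, 20 noisy and redundant raw features.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
raw = [[label + rng.gauss(0, 2) for _ in range(20)] for label in y]
meta = [[mean(row)] for row in raw]  # one assumed meta-feature per sample

print("raw features:", cv_accuracy(raw, y))
print("meta-feature:", cv_accuracy(meta, y))
```

Because the meta-feature is built without looking at the labels, it can be computed once before the folds are split; any label-dependent feature construction would instead have to be repeated inside each training fold to keep the held-out estimate unbiased.&lt;br /&gt;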
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|Figure 3: The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to note that the impasses of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics” [23,24]). This approach has already made remarkable progress for model organisms [25-27], but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data [15,16,20,28-33], (ii) our experience with GWAS [2,3,34-38], and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off between innovation and feasibility is determined by this difficulty, which increases in a balanced manner across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture [19]). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM [21] and “random forests” [22] to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support for the NOFIA project from my colleagues Prof. [http://www.unil.ch/edcvm/page78675.html Peter Vollenweider] (PI of [http://www.chuv.ch/chuv_home/recherche/chuv-recherche-presentation/chuv-recherche-histoires/recherche-histoires-colaus.htm  CoLaus], see [[media:letter_PV.pdf]]) and Prof. [http://www.unil.ch/actu/page62193.html Martin Preisig] (PI of [http://www.colaus.ch/en/cls_home/cls_pro_home/cls_pro_studies/cls_pro_studies-psycolaus.htm PsyCoLaus], see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T15:29:59Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Support */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision of &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for NOFIA funding through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|200px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly&lt;br /&gt;
associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple hypothesis testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role in the phenotype of interest. In the clinical context there is hope that this will eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
The hundreds of GWAS performed for complex traits in recent years have made it apparent that, for most such traits, the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in meta-studies with extremely high power (like GIANT [2,3], integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma [4] has disappointed those who expected that GWAS would rapidly become of practical use for assessing predisposition to the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypotheses testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
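&lt;br /&gt;
To make the multiple-testing burden concrete, the following minimal Python sketch computes the Bonferroni per-test threshold for the three scenarios of Fig. 1. The family-wise error rate of 0.05 and the two trait counts (100 clinical traits, 20,000 molecular features) are illustrative assumptions and not values taken from this proposal; the variant count follows the “around a million” figure quoted above.&lt;br /&gt;
&lt;br /&gt;
```python
def bonferroni_threshold(alpha, n_variants, n_traits=1):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / (n_variants * n_traits)


ALPHA = 0.05            # assumed family-wise error rate
N_VARIANTS = 1_000_000  # "around a million" non-redundant SNPs

# Illustrative trait counts (assumptions for this sketch only).
scenarios = {
    "one phenotype (Fig. 1a)": 1,
    "100 clinical traits (Fig. 1b)": 100,
    "20000 molecular features (Fig. 1c)": 20_000,
}

thresholds = {
    name: bonferroni_threshold(ALPHA, N_VARIANTS, n_traits)
    for name, n_traits in scenarios.items()
}

for name, threshold in thresholds.items():
    print(f"{name}: p must be below {threshold:.1e}")
```
&lt;br /&gt;
Each additional trait tested against all variants divides the per-test threshold further, which is why association signals of moderate strength are lost first.&lt;br /&gt;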
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
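&lt;br /&gt;
The quadratic growth can be quantified with a few lines of Python. This is only a back-of-the-envelope sketch: the variant count is the “around a million” figure used above, and the family-wise error rate of 0.05 is an assumption.&lt;br /&gt;
&lt;br /&gt;
```python
from math import comb

N_VARIANTS = 1_000_000   # "around a million" non-redundant SNPs
ALPHA = 0.05             # assumed family-wise error rate

n_pairs = comb(N_VARIANTS, 2)       # N * (N - 1) / 2 unordered pairs
pair_threshold = ALPHA / n_pairs    # Bonferroni threshold for an exhaustive pair scan

print(f"{n_pairs:,} pairs, per-test threshold {pair_threshold:.1e}")
```
&lt;br /&gt;
An exhaustive scan of pairs alone therefore involves roughly half a trillion tests, before any higher-order interactions are considered.&lt;br /&gt;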
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|300px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved large-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensional reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), that will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in the unique position of having direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than for any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will allow for reducing the burden of multiple hypotheses testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
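&lt;br /&gt;
As an illustration of the simplest case mentioned in (1c), the Python sketch below bundles adjacent variants into LD-like blocks and derives one “meta-genotype” per block. It is a hypothetical toy example, not the method to be developed: the greedy grouping rule, the r-squared cut-off of 0.8, the use of the mean dosage as representative, and the data are all assumptions made for this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def group_adjacent(variants, min_r2=0.8):
    """Greedily add each variant to the current block if it correlates with the block's first variant."""
    blocks = []
    for index, dosages in enumerate(variants):
        if blocks and r_squared(variants[blocks[-1][0]], dosages) >= min_r2:
            blocks[-1].append(index)
        else:
            blocks.append([index])
    return blocks


def meta_genotype(variants, block):
    """Hypothetical meta-genotype: mean dosage across the block, per individual."""
    return [mean(column) for column in zip(*(variants[i] for i in block))]


# Toy data: 4 variants (rows) x 6 individuals (columns), dosages 0/1/2.
variants = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 2, 2],
    [2, 0, 1, 1, 0, 2],
]

blocks = group_adjacent(variants)
meta = [meta_genotype(variants, block) for block in blocks]
print(blocks)
```
&lt;br /&gt;
In this toy example four association tests are reduced to two; if a block's meta-genotype shows no signal, its constituent variants need not be tested individually.&lt;br /&gt;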
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistic framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (c.f. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
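&lt;br /&gt;
The stopping rule in (2b-iii), namely to repeat the refinement as long as predictive power increases measurably, can be sketched as a generic loop. The function names, the minimum gain of 0.01 and the toy refinement and scoring functions below are hypothetical placeholders; in practice the score would come from the benchmarking described in point (3).&lt;br /&gt;
&lt;br /&gt;
```python
def refine_until_converged(features, refine, score, min_gain=0.01, max_rounds=20):
    """Repeat refine() while the score improves by more than min_gain.

    refine and score are caller-supplied, hypothetical callables:
    refine(features) returns updated features, score(features) returns
    a predictive-power estimate where larger is better.
    """
    best = score(features)
    history = [best]
    for _ in range(max_rounds):
        candidate = refine(features)
        candidate_score = score(candidate)
        if candidate_score - best > min_gain:
            features, best = candidate, candidate_score
            history.append(best)
        else:
            break  # no measurable gain: keep the previous meta-features
    return features, history


# Toy stand-ins: each round halves the distance to an optimum at 1.0.
final, history = refine_until_converged(
    features=0.0,
    refine=lambda f: f + (1.0 - f) / 2,
    score=lambda f: 1.0 - abs(1.0 - f),
)
print(final, len(history))
```
&lt;br /&gt;
The cap on the number of rounds is a safeguard added for this sketch; to avoid over-fitting, the score should be estimated on data not used for the refinement itself.&lt;br /&gt;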
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus recently has become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
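&lt;br /&gt;
The comparison in (3b) can be sketched as follows. To keep the example self-contained, a simple nearest-centroid rule on a single feature stands in for the SVM and Random Forest classifiers named above; this substitution, the five interleaved folds and the toy data are assumptions made purely for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x (one-dimensional feature)."""
    centroids = {
        label: mean(v for v, y in zip(train_x, train_y) if y == label)
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def cv_accuracy(feature, labels, k=5):
    """Cross-validated accuracy of the nearest-centroid rule on one feature."""
    hits = 0
    for train, test in k_fold_indices(len(labels), k):
        train_x = [feature[i] for i in train]
        train_y = [labels[i] for i in train]
        hits += sum(
            nearest_centroid_predict(train_x, train_y, feature[i]) == labels[i]
            for i in test
        )
    return hits / len(labels)


# Toy data: 10 individuals, binary endpoint at follow-up (invented values).
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
raw_feature = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.2, 0.9, 0.4, 1.0]
meta_feature = [0.1, 0.2, 0.0, 0.3, 0.2, 0.9, 1.0, 0.8, 1.1, 0.7]

raw_accuracy = cv_accuracy(raw_feature, labels)
meta_accuracy = cv_accuracy(meta_feature, labels)
print(f"raw: {raw_accuracy:.2f}, meta: {meta_accuracy:.2f}")
```
&lt;br /&gt;
The essential design point is that both feature sets are scored on exactly the same held-out folds, so that any difference in accuracy reflects the features rather than the data split.&lt;br /&gt;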
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|Figure 3: The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, the limitations of this linear, brute-force approach are increasingly recognized, and a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of existing metabolomics data, which will be matched with transcriptomics data), it is well aligned with, and likely to cross-pollinate, other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off between innovation and feasibility is determined by this difficulty, which increases in a balanced manner across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has extensive experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support for the NOFIA project from my colleagues Prof. [http://www.unil.ch/edcvm/page78675.html Peter Vollenweider] (PI of [http://www.chuv.ch/chuv_home/recherche/chuv-recherche-presentation/chuv-recherche-histoires/recherche-histoires-colaus.htm  CoLaus], see [[media:letter_PV.pdf]]) and Prof. [http://www.unil.ch/actu/page62193.html Martin Preisig] (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T15:22:32Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Support */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|200px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly&lt;br /&gt;
associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple hypotheses testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could point to new candidate loci harboring variants in genes (or their regulatory elements) that play a causal role in the phenotype of interest. In the clinical context there is hope that this will eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS performed for complex traits in recent years, it has become apparent that for most such traits the identified loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has disappointed those who expected that GWAS would rapidly become of practical use for assessing the risk of, or predisposition to, the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypotheses testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
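&lt;br /&gt;
To make these orders of magnitude concrete, the following minimal sketch computes the Bonferroni threshold for single-variant tests and the number of pair-wise interaction tests. The SNP count is the "around a million" quoted above; the family-wise error rate of 0.05 and the function names are illustrative assumptions, not part of the proposal.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def n_pairwise_interactions(n_variants):
    """Number of unordered variant pairs, n * (n - 1) / 2."""
    return math.comb(n_variants, 2)


n_snps = 1_000_000  # "around a million non-redundant SNPs", as stated in the text
alpha = 0.05        # assumed family-wise error rate (not specified in the proposal)

single_variant_threshold = bonferroni_threshold(alpha, n_snps)
n_pairs = n_pairwise_interactions(n_snps)
pairwise_threshold = bonferroni_threshold(alpha, n_pairs)

print(f"single-variant threshold: {single_variant_threshold:.1e}")
print(f"number of SNP pairs:      {n_pairs:.2e}")
print(f"pairwise threshold:       {pairwise_threshold:.1e}")
```
&lt;br /&gt;
With these assumed values the single-variant threshold is 5e-8, whereas an exhaustive pair-wise scan would involve about 5e11 tests and a correspondingly stricter threshold.&lt;br /&gt;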
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|300px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved high-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones that are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of its constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple-hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework, using only CoLaus data, is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3). In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power obtained using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
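&lt;br /&gt;
The cross-validation methodology mentioned in (3b) is not specified further in this proposal. As one possible reading, the sketch below builds k-fold train/test splits; the function names, the choice of k-fold splitting and the toy sample size are assumptions made purely for illustration, and the fitting of the SVM or Random Forest models is deliberately left out.&lt;br /&gt;
&lt;br /&gt;
```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical toy example: 10 samples, 5 folds.
splits = list(train_test_splits(n_samples=10, k=5))
for train, test in splits:
    print(sorted(test), len(train))
```
&lt;br /&gt;
Reusing the same splits for the model based on meta-features and for the model based on raw data would make their held-out scores directly comparable.&lt;br /&gt;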
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|Figure 3: The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to realize that the limitations of this linear, brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support for the NOFIA project from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T14:06:35Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Context and state of the art */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information about our long-term vision about &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding NOFIA within an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|200px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|200px|Figure 1: Standard GWAS aim to identify genotypic variants (G) that are significantly associated with a phenotypic trait (P) in order to improve annotation (A). The large number of variants imposes a huge burden of multiple-hypothesis testing, which is even more severe when associating multiple phenotypes (b), or high-dimensional molecular (M) traits (c).]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS that were performed for complex traits in recent years, it became apparent that for most complex traits the elucidated loci explain a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has disappointed those who expected that GWAS could rapidly become of practical use for assessing the risk of predisposition to the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple-hypothesis testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are still too small to be detected10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|300px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all high-dimensional data involved. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones that are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help identify relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics for combining elementary traits (like the BMI from weight and height), much less is known about how to effectively integrate the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in the unique position of having direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple-hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
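&lt;br /&gt;
The bundling idea in (1c) can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: genotypes are coded as minor-allele counts (0, 1, 2), the function names and the r^2 cut-off of 0.8 are hypothetical, and only consecutive SNPs are grouped, so the long-range interactions mentioned above are not covered.&lt;br /&gt;

```python
# Hypothetical sketch: greedy bundling of consecutive SNPs into "meta-genotypes".
# Genotypes are coded as minor-allele counts (0, 1, 2); R2_MIN is an assumed cut-off.

R2_MIN = 0.8


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: correlation is undefined
        return 0.0
    return sxy * sxy / (sxx * syy)


def bundle_snps(genotypes, r2_min=R2_MIN):
    """Group consecutive SNPs whose r^2 with the first SNP of the block is at least r2_min."""
    blocks = []
    for idx, snp in enumerate(genotypes):
        if blocks and r_squared(genotypes[blocks[-1][0]], snp) >= r2_min:
            blocks[-1].append(idx)
        else:
            blocks.append([idx])
    return blocks


# Toy data: 4 SNPs (rows) typed in 6 individuals (columns).
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0
    [2, 0, 1, 1, 0, 2],
    [2, 0, 1, 1, 0, 2],  # identical to SNP 2
]
print(bundle_snps(toy))
```

In this toy case four SNPs collapse into two blocks, so only two association tests would be needed instead of four.&lt;br /&gt;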
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
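&lt;br /&gt;
The comparison in (3b) can be sketched in a few lines of Python. This is a toy illustration under loudly stated assumptions, not our benchmarking pipeline: the data are synthetic, a simple nearest-centroid classifier stands in for the SVM and Random Forest named above, and the single meta-feature is built with knowledge of which raw features carry signal. All function and variable names are hypothetical.&lt;br /&gt;

```python
# Hypothetical sketch: k-fold cross-validation comparing two feature sets.
# A nearest-centroid classifier stands in for the SVM / random forest named in the text.
import random


def nearest_centroid_predict(train_x, train_y, test_x):
    """Predict each test sample's label as that of the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(centroids[lab], x)) for x in test_x]


def cv_accuracy(features, labels, k=5):
    """Mean held-out accuracy over k folds (sample i is assigned to fold i % k)."""
    n = len(labels)
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        pred = nearest_centroid_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        hits = sum(p == labels[i] for p, i in zip(pred, test_idx))
        scores.append(hits / len(test_idx))
    return sum(scores) / k


# Synthetic toy data (assumed, not CoLaus): 5 informative and 50 noise features.
rng = random.Random(0)                      # fixed seed for reproducibility
labels = [i % 2 for i in range(200)]        # toy binary endpoint
signal = [[y + rng.gauss(0, 1) for _ in range(5)] for y in labels]
noise = [[rng.gauss(0, 1) for _ in range(50)] for _ in labels]
raw = [s + n for s, n in zip(signal, noise)]        # 55 unprocessed features
meta = [[sum(s) / len(s)] for s in signal]          # one aggregated meta-feature

print("raw :", cv_accuracy(raw, labels))
print("meta:", cv_accuracy(meta, labels))
```

Note that any meta-feature derivation that uses the endpoint itself would have to be repeated inside each training fold, otherwise the held-out accuracy would be optimistically biased.&lt;br /&gt;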
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to note that the impasses of this linear, brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. This difficulty determines the trade-off between innovation and feasibility, which is balanced across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective (1c) is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective (2a) has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective (2b-i), which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T14:05:08Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Feasibility */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision of &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for NOFIA funding through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested in the clinical and genomic profiling of large cohorts, creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|300px]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly single nucleotide polymorphisms, or SNPs) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS that have been performed for complex traits in recent years, it has become apparent that for most of these traits the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has disappointed those who expected that GWAS would rapidly become of practical use for assessing the predisposition to any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. First, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge multiple-testing burden, and only extremely significant associations survive stringent Bonferroni correction (i.e. the “low-hanging fruit” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are still too small to be detected10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
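&lt;br /&gt;
To make this burden concrete, the following minimal Python sketch computes the Bonferroni-corrected per-test threshold for a growing number of traits. The family-wise error rate of 0.05 and the trait counts (1, 100 and 10,000) are illustrative assumptions and not values from this proposal; only the figure of about a million non-redundant SNPs is taken from the text above.&lt;br /&gt;

```python
# Illustrative sketch: Bonferroni-corrected per-test significance threshold.
# ALPHA and the trait counts are assumed values chosen for illustration only.

ALPHA = 0.05          # assumed family-wise error rate
N_SNPS = 1_000_000    # "around a million non-redundant SNPs" (from the text)


def bonferroni_threshold(n_variants, n_phenotypes=1, alpha=ALPHA):
    """Per-test p-value threshold when n_variants * n_phenotypes tests are performed."""
    return alpha / (n_variants * n_phenotypes)


# One phenotype, then hypothetical collections of 100 and 10,000 traits.
for n_traits in (1, 100, 10_000):
    print(n_traits, bonferroni_threshold(N_SNPS, n_traits))
```

With these assumed numbers a single trait already requires p-values below 5e-8, and every additional trait tightens the threshold proportionally.&lt;br /&gt;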
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, the number of possible pair-wise interactions alone already grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
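&lt;br /&gt;
The combinatorial growth can be checked with a short Python sketch that counts the distinct variant sets a brute-force scan would have to test. The variant count reuses the order of magnitude quoted above; the function name is hypothetical.&lt;br /&gt;

```python
# Illustrative sketch: number of interaction terms in a brute-force scan.
from math import comb

N_VARIANTS = 1_000_000  # order of magnitude quoted in the text


def n_interactions(n_variants, order=2):
    """Number of distinct variant sets of the given interaction order."""
    return comb(n_variants, order)


print(n_interactions(N_VARIANTS))      # pair-wise interactions
print(n_interactions(N_VARIANTS, 3))   # three-way interactions
```

For a million variants the pair-wise count is already about 5 x 10^11, far more tests than there are individuals in any existing cohort.&lt;br /&gt;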
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|300px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all high-dimensional data involved. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones that are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help identify relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics for combining elementary traits (like the BMI from weight and height), much less is known about how to effectively integrate the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
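&lt;br /&gt;
As a minimal sketch of hallmark (3), the snippet below derives a single meta-feature from a small table of correlated traits by projecting onto the first principal component, computed with plain power iteration. The function name, the toy data and the choice of PCA (rather than a modular tool) are assumptions made purely for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) of the first principal component of rows.

    rows is a list of samples, each a list of trait values. Power iteration
    is used for brevity; it assumes the starting vector is not orthogonal to
    the leading eigenvector of the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the traits (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
    # Project each centered sample onto the leading direction.
    scores = [sum(c * v for c, v in zip(row, vec)) for row in centered]
    return vec, scores


# Toy data (hypothetical): two perfectly correlated traits, three individuals.
traits = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
loadings, meta_feature = first_principal_component(traits)
print(loadings)      # direction proportional to (1, 2)
print(meta_feature)  # one value per individual instead of two
```
&lt;br /&gt;
In this toy case the two redundant traits collapse into one meta-feature without loss of information; with real data the projection would discard some variance, and how much is acceptable is a judgment call.&lt;br /&gt;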
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in the unique position of having direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of its constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
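&lt;br /&gt;
The reduction in testing burden described in (1c) can be made concrete with a small counting sketch. It assumes a hypothetical two-stage scheme in which one representative meta-genotype is tested per block and constituent genotypes are tested only for blocks that show a signal; the block sizes and the number of retained blocks are invented for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def two_stage_test_count(block_sizes, signal_blocks):
    """Count association tests when blocks are screened via one representative each.

    block_sizes   -- number of constituent genotypes in each block
    signal_blocks -- indices of blocks whose meta-genotype shows a signal
    """
    stage_one = len(block_sizes)                            # one test per meta-genotype
    stage_two = sum(block_sizes[i] for i in signal_blocks)  # constituents of retained blocks
    return stage_one + stage_two


# Hypothetical numbers: 1,000,000 variants bundled into 50,000 blocks of 20,
# of which 100 blocks pass the first stage.
block_sizes = [20] * 50_000
signal_blocks = range(100)

flat = sum(block_sizes)                                    # every variant tested once
staged = two_stage_test_count(block_sizes, signal_blocks)
print(flat, staged)  # prints: 1000000 52000
```
&lt;br /&gt;
The count alone does not establish valid error control: selecting blocks and testing their constituents on the same data introduces a selection effect that any real procedure would have to account for.&lt;br /&gt;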
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistic framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (c.f. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
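&lt;br /&gt;
One standard way to let marker scores influence association testing is a weighted Bonferroni rule, sketched below. This is a swapped-in illustration, not necessarily the scoring scheme intended in (2b): the function name and the example numbers are hypothetical, and the scores are assumed to be non-negative and derived independently of the p-values being tested (e.g. from existing annotation).&lt;br /&gt;
&lt;br /&gt;
```python
def weighted_bonferroni(p_values, scores, alpha=0.05):
    """Return indices of markers rejected under a weighted Bonferroni rule.

    Scores are rescaled to weights with mean 1, so the overall error budget
    alpha is preserved; marker i is rejected if p_i * m is at most alpha * w_i.
    """
    m = len(p_values)
    total = sum(scores)
    weights = [s * m / total for s in scores]
    return [i for i, (p, w) in enumerate(zip(p_values, weights))
            if alpha * w >= p * m]


# Hypothetical p-values for four markers.
p_values = [0.001, 0.02, 0.2, 0.6]

unweighted = weighted_bonferroni(p_values, [1, 1, 1, 1])
# The second marker lies near an annotated gene and receives a higher score.
weighted = weighted_bonferroni(p_values, [0.5, 2.5, 0.5, 0.5])
print(unweighted, weighted)  # prints: [0] [0, 1]
```
&lt;br /&gt;
The up-weighted marker passes a threshold it would miss under uniform weights, at the price of stricter thresholds for the down-weighted markers.&lt;br /&gt;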
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework, using only CoLaus data, is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3). In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power obtained using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
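&lt;br /&gt;
The comparison in (3b) can be sketched as a k-fold cross-validation harness applied to two feature representations of the same samples. To stay self-contained the sketch uses a toy nearest-centroid classifier as a stand-in for the SVM and Random Forest models named above; all function names and the synthetic data are assumptions for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Toy classifier: assign each test sample to the class with the closest mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_x]


def cross_validated_accuracy(features, labels, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predictions = nearest_centroid_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in held_out],
        )
        hits = sum(p == labels[i] for p, i in zip(predictions, held_out))
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k


# Synthetic toy data (not CoLaus data): 40 samples, two balanced classes.
rng = random.Random(1)
labels = [i % 2 for i in range(40)]
raw = [[y + rng.gauss(0, 2) for _ in range(10)] for y in labels]  # 10 noisy raw features
meta = [[sum(row) / len(row)] for row in raw]                     # one averaged meta-feature
print(cross_validated_accuracy(raw, labels), cross_validated_accuracy(meta, labels))
```
&lt;br /&gt;
Using the same folds (same seed) for both representations keeps the comparison paired; whether meta-features actually outperform raw data is exactly the empirical question the benchmarking is meant to answer.&lt;br /&gt;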
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px|The trade-off relation between innovation and feasibility for our three main analysis goals.]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless it is important to realize that the impasses of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass and NMR spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T14:03:58Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Ground-breaking nature of this project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision of &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|300px]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS that were performed for complex traits in recent years, it became apparent that for most complex traits the elucidated loci explain a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has triggered some disappointment among those who expected that GWAS could rapidly become of practical use for assessing the risk of, or predisposition to, any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypothesis testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
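&lt;br /&gt;
To give a sense of scale, the sketch below computes the per-test Bonferroni threshold when every variant is tested against every trait. The function name and the trait counts are hypothetical and chosen only to mirror the three situations of Fig. 1; the family-wise level of 0.05 is an assumed conventional choice.&lt;br /&gt;
&lt;br /&gt;
```python
def bonferroni_threshold(alpha, n_variants, n_traits=1):
    """Per-test significance threshold when each variant is tested against each trait."""
    n_tests = n_variants * n_traits
    return alpha / n_tests


# Hypothetical sizes, for illustration only.
single_trait = bonferroni_threshold(0.05, 1_000_000)       # one phenotype
many_traits = bonferroni_threshold(0.05, 1_000_000, 100)   # 100 phenotypic traits
molecular = bonferroni_threshold(0.05, 1_000_000, 20_000)  # 20,000 molecular features

print(single_trait, many_traits, molecular)
```
&lt;br /&gt;
With a million variants the threshold is about 5e-8 for a single trait and drops by a further factor equal to the number of traits, which is why reducing the number of tested features matters.&lt;br /&gt;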
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
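&lt;br /&gt;
The quadratic growth mentioned above is simply the number of unordered variant pairs, n(n-1)/2. A minimal sketch, with a hypothetical function name and illustrative variant counts:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def n_interaction_tests(n_variants, order=2):
    """Number of distinct variant combinations of the given order."""
    return math.comb(n_variants, order)


for n_variants in (1_000, 100_000, 1_000_000):
    pairs = n_interaction_tests(n_variants)
    print(n_variants, pairs)
# prints:
# 1000 499500
# 100000 4999950000
# 1000000 499999500000
```
&lt;br /&gt;
For a million variants this gives roughly 5e11 pair-wise tests, so a Bonferroni-style correction at a family-wise level of 0.05 would require p-values near 1e-13.&lt;br /&gt;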
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|300px|left|Figure 2: Novel analysis framework for medical data integration]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all high-dimensional data involved. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones that are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help identify relevant meta-features. The available annotation should be used to prioritize the various meta-features by their potential relevance. While for organismal traits there are sometimes well-established heuristics for combining elementary traits (like the BMI from weight and height), much less is known about how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than for any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach, genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach, genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardiovascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless it is important to realize that the limitations of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off between innovation and feasibility is determined by this difficulty, which increases in a balanced manner across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is somewhat more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Feasability_vs_innovation.png</id>
		<title>File:Feasability vs innovation.png</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Feasability_vs_innovation.png"/>
				<updated>2013-02-21T14:02:20Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Integrative_analysis.png</id>
		<title>File:Integrative analysis.png</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Integrative_analysis.png"/>
				<updated>2013-02-21T14:01:24Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T14:01:04Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Extended Synopsis of the project proposal */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for an ERC Consolidator Grant to fund NOFIA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested in clinical and genomic profiling of large cohorts, creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|300px]]&lt;br /&gt;
&lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci harboring variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS performed for complex traits in recent years, it has become apparent that for most complex traits the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has triggered some disappointment among those who expected that GWAS would rapidly become of practical use for assessing predisposition to the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypothesis testing, and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Integrative_analysis.png|thumb|300px]]&lt;br /&gt;
&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved high-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help identify relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), much less is known about how to effectively integrate the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on analysis tools previously developed by my group (see Early Achievements on page 7), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland. &lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than for any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3) using only CoLaus data. In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus recently has become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular incidences, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
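&lt;br /&gt;
The cross-validation methodology mentioned in (3b) can be illustrated with a minimal sketch of a k-fold split. This is only an illustration under stated assumptions: the function name, the number of samples and the number of folds are hypothetical, and the actual classifier (e.g. an SVM or a Random Forest) that would be trained on each split is not shown.&lt;br /&gt;
&lt;br /&gt;
```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: every sample lands in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumed toy setting: 10 individuals, 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print(sorted(test), len(train))
```
&lt;br /&gt;
In such a scheme the same splits would be used once with the meta-features and once with the unprocessed raw data as predictors, so that any difference in held-out predictive power is not driven by the partitioning.&lt;br /&gt;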
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
&lt;br /&gt;
[[Image:Feasability_vs_innovation.png|thumb|300px]]&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to realize that the impasse of this linear and brute-force approach is increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:58:10Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Context and state of the art */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information about our long-term vision about &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis &lt;br /&gt;
of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding NOFIA within an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci harboring variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS that were performed for complex traits in recent years, it became apparent that for most complex traits the elucidated loci explain a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has triggered some disappointment among those who expected that GWAS could rapidly become of practical use for assessing the predisposition to any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|300px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|300px]]&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypothesis testing and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
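&lt;br /&gt;
To make the scale of this burden concrete, the following minimal sketch computes the Bonferroni-corrected per-test significance level for single-variant tests and for all pair-wise interaction tests. The number of variants (one million) follows the order of magnitude quoted above; the family-wise error rate of 0.05 and the function name are assumptions made for illustration only.&lt;br /&gt;
&lt;br /&gt;
```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 1_000_000   # assumed: about a million non-redundant SNPs
alpha = 0.05         # assumed family-wise error rate

n_pairs = comb(n_snps, 2)   # distinct SNP pairs, grows quadratically with n_snps

single_threshold = bonferroni_threshold(alpha, n_snps)
pair_threshold = bonferroni_threshold(alpha, n_pairs)

print("single-variant tests:", n_snps, "threshold:", single_threshold)
print("pair-wise tests:", n_pairs, "threshold:", pair_threshold)
```
&lt;br /&gt;
Under these assumptions the single-variant threshold is 5e-8, whereas testing all of the roughly 5e11 pairs would require p-values of the order of 1e-13, which illustrates why brute-force interaction scans are underpowered even in very large cohorts.&lt;br /&gt;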
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved high-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in the unique position of having direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland. &lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
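&lt;br /&gt;
The potential saving in the number of tests can be illustrated with a small counting sketch. It assumes a hypothetical two-stage scheme in which each group of genotypes is first represented by one meta-genotype, and constituent genotypes are tested only for groups whose representative shows a signal. The group sizes and the number of groups passing the first stage are invented for illustration, and the sketch ignores the statistical care needed when testing in stages.&lt;br /&gt;
&lt;br /&gt;
```python
def two_stage_test_count(block_sizes, passing_blocks):
    """Count association tests in a hypothetical two-stage scheme.

    Stage one tests one meta-genotype per block; stage two tests the
    constituent genotypes of the blocks listed in passing_blocks only.
    """
    stage_one = len(block_sizes)
    stage_two = sum(block_sizes[i] for i in passing_blocks)
    return stage_one + stage_two


# Assumed toy numbers: 1000 blocks of 10 genotypes each, 20 blocks with a signal.
block_sizes = [10] * 1000
passing_blocks = range(20)

flat_tests = sum(block_sizes)                                   # test every genotype
staged_tests = two_stage_test_count(block_sizes, passing_blocks)

print("flat:", flat_tests, "two-stage:", staged_tests)
```
&lt;br /&gt;
With these toy numbers the two-stage scheme requires 1200 tests instead of 10000, so a Bonferroni-type correction would be correspondingly less stringent.&lt;br /&gt;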
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistic framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (c.f. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework, using only CoLaus data, is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3). In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus recently has become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular incidences, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
&lt;br /&gt;
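The comparison described in (3b) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: a simple nearest-centroid classifier stands in for the SVM and Random Forest learners named above, the data are synthetic toy values (not CoLaus data), and the single meta-feature is assumed, for illustration, to be the mean of the raw features. All function names are hypothetical.&lt;br /&gt;

```python
import random

# Shuffle the sample indices and deal them into k nearly equal folds.
def k_fold_indices(n_samples, k, seed=0):
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

# Nearest-centroid stand-in for the SVM / Random Forest learners:
# the model is simply the mean feature vector of each class.
def fit_centroids(rows, labels):
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

# Predict the class whose centroid is closest in squared Euclidean distance.
def predict_centroid(centroids, row):
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=sq_dist)

# Mean held-out accuracy over k folds.
def cv_accuracy(rows, labels, k=5, seed=0):
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit_centroids([rows[i] for i in train],
                              [labels[i] for i in train])
        hits = sum(predict_centroid(model, rows[i]) == labels[i]
                   for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)

# Synthetic toy data: 100 samples, 20 noisy raw features per sample.
rng = random.Random(1)
labels = [i % 2 for i in range(100)]
raw = [[label + rng.gauss(0.0, 2.0) for _ in range(20)] for label in labels]
# Assumed meta-feature: the mean of the raw features of each sample.
meta = [[sum(row) / len(row)] for row in raw]

acc_raw = cv_accuracy(raw, labels)
acc_meta = cv_accuracy(meta, labels)
print(acc_raw, acc_meta)
```

The same folds are used for both feature sets, so the two accuracies are directly comparable; in the actual benchmark the learner, the meta-features and the clinical endpoints would of course differ from this toy setting.&lt;br /&gt;
&lt;br /&gt;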
=== Feasibility ===&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to realize that the impasses of this linear, brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of existing metabolomics data, which will be matched with transcriptomics data), it is well aligned with, and likely to cross-pollinate, other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off relation between innovation and feasibility is determined by this difficulty and increases in a balanced manner for our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass and NMR spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Standard_GWAS_for_molecular_phenotypes.png</id>
		<title>File:Standard GWAS for molecular phenotypes.png</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Standard_GWAS_for_molecular_phenotypes.png"/>
				<updated>2013-02-21T13:57:13Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Standard_GWAS_for_multiple_phenotypes.png</id>
		<title>File:Standard GWAS for multiple phenotypes.png</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Standard_GWAS_for_multiple_phenotypes.png"/>
				<updated>2013-02-21T13:56:35Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:55:59Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Context and state of the art */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information about our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could point to candidate loci harboring variants in genes (or their regulatory elements) that play a causal role in the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS that have been performed for complex traits in recent years, it has become apparent that for most complex traits the elucidated loci explain a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has disappointed those who expected that GWAS would rapidly become practically useful for assessing predisposition to the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypothesis testing, and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
&lt;br /&gt;
[[Image:Standard_GWAS_for_one_phenotype.png|thumb|500px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_multiple_phenotypes.png|thumb|500px]]&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|500px]]&lt;br /&gt;
&lt;br /&gt;
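As a rough illustration of the multiple-testing burden described above, the following Python sketch computes Bonferroni-corrected per-test thresholds. The family-wise error rate of 0.05 and the count of 100 traits are illustrative assumptions that do not come from our data; only the figure of about a million SNPs is taken from the text.&lt;br /&gt;

```python
# Bonferroni correction: divide the family-wise error rate by the number of tests.
def bonferroni_threshold(alpha, n_tests):
    if n_tests == 0:
        raise ValueError('at least one test is required')
    return alpha / n_tests

alpha = 0.05          # assumed family-wise error rate (not stated in the text)
n_snps = 1_000_000    # about a million non-redundant SNPs (from the text)
n_traits = 100        # hypothetical number of traits tested against every SNP

# One phenotype tested against all SNPs (the setting of Fig. 1a).
single_trait_threshold = bonferroni_threshold(alpha, n_snps)

# Many phenotypes tested against all SNPs (the setting of Fig. 1b).
multi_trait_threshold = bonferroni_threshold(alpha, n_snps * n_traits)

print(single_trait_threshold, multi_trait_threshold)
```

With these assumed numbers the single-trait threshold is on the order of 5e-8, and every additional trait tightens it further in proportion to the number of traits.&lt;br /&gt;
&lt;br /&gt;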
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
&lt;br /&gt;
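The quadratic growth mentioned above can be made concrete with a short Python sketch. It assumes about a million variants (the figure quoted for non-redundant SNPs) and an illustrative family-wise error rate of 0.05, which is an assumption rather than a value used in our analyses.&lt;br /&gt;

```python
import math

# Number of unordered variant pairs: n * (n - 1) / 2.
def n_pairwise_interactions(n_variants):
    return math.comb(n_variants, 2)

n_snps = 1_000_000   # about a million non-redundant SNPs (from the text)
alpha = 0.05         # assumed family-wise error rate (illustrative)

n_pairs = n_pairwise_interactions(n_snps)

# Per-test threshold if every pair were tested with Bonferroni correction.
pair_threshold = alpha / n_pairs

print(n_pairs, pair_threshold)
```

An exhaustive pairwise scan would therefore involve roughly half a trillion tests, which is why brute-force interaction models remain underpowered even in very large cohorts.&lt;br /&gt;
&lt;br /&gt;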
=== Ground-breaking nature of this project ===&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all high-dimensional data involved. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones that are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), that will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as with any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in the unique position of having direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based cohort of 6182 participants from Lausanne, Switzerland. &lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether these associations are stronger than those of any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected by chance. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.&lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistic framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework, using only CoLaus data, is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3). In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies.&lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power of our meta-features with that of the unprocessed raw data (using a cross-validation methodology).&lt;br /&gt;
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, the limitations of this linear and brute-force approach are increasingly recognized, and a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets and connecting them with new methodologies. The trade-off between innovation and feasibility is determined by this difficulty, which increases in a balanced manner across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is somewhat more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an ongoing collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Standard_GWAS_for_one_phenotype.png</id>
		<title>File:Standard GWAS for one phenotype.png</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Standard_GWAS_for_one_phenotype.png"/>
				<updated>2013-02-21T13:54:30Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:53:37Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision of &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for an ERC Consolidator Grant to fund NOFIA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested in clinical and genomic profiling of large cohorts, creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to account for a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies.&lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers.&lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci harboring variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this will eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS performed for complex traits in recent years, it became apparent that for most of these traits the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has disappointed those who expected that GWAS would rapidly become of practical use for assessing predisposition to the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple-hypothesis testing, and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c).&lt;br /&gt;
&lt;br /&gt;
[[Image:Standard_GWAS_for_molecular_phenotypes.png|thumb|500px]]&lt;br /&gt;
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, the number of possible pair-wise interactions alone grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome this combinatorial complexity with any brute-force modeling approach.&lt;br /&gt;
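&lt;br /&gt;
The scale of the testing burden described in the two preceding paragraphs can be made concrete with a short calculation. The sketch below is only illustrative: it assumes a conventional family-wise significance level of 0.05 (not stated in the text), takes the figure of about one million non-redundant SNPs from above, and uses a hypothetical helper name, bonferroni_threshold.&lt;br /&gt;
&lt;br /&gt;
```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumptions for illustration: family-wise alpha of 0.05 and
# about one million non-redundant SNPs, as quoted in the text.
alpha = 0.05
n_snps = 1_000_000

# Single-variant scan: one test per SNP (threshold of about 5e-8).
single = bonferroni_threshold(alpha, n_snps)

# Pair-wise interaction scan: one test per unordered SNP pair,
# i.e. n * (n - 1) / 2 tests, which grows quadratically with n.
n_pairs = comb(n_snps, 2)
pairwise = bonferroni_threshold(alpha, n_pairs)

print(f"single-variant tests: {n_snps}, threshold {single:.1e}")
print(f"pair-wise tests: {n_pairs}, threshold {pairwise:.1e}")
```
&lt;br /&gt;
With these assumed numbers the pair-wise scan involves 499999500000 tests, so its per-test threshold is several orders of magnitude stricter than that of the single-variant scan.&lt;br /&gt;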
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved high-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones that are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to aid the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), much less is known about how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect.&lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), which will be adapted and extended.&lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based study of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than that of any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will reduce the burden of multiple-hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.&lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistic framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, c.f. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework, using only CoLaus data, is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT [2,3]). In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies.&lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardiovascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines [21] (SVM) and Random Forests [22], to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology).&lt;br /&gt;
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to realize that the impasses of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics” [23,24]). This approach has already made remarkable progress for model organisms [25-27], but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data [15,16,20,28-33], (ii) our experience with GWAS [2,3,34-38], and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets and connecting them with new methodologies. The trade-off between innovation and feasibility is determined by this difficulty, which increases in a balanced manner across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is somewhat more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture [19]). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also profit from the close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an on-going collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass and NMR spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM [21] and “random forests” [22] to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:49:40Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested in the clinical and genomic profiling of large cohorts, creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies.&lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers.&lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci harboring variants in genes (or their regulatory elements) that play a causal role for the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS that have been performed for complex traits in recent years, it has become apparent that for most complex traits the elucidated loci explain a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT [2,3], integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma [4] has triggered some disappointment among those who expected that GWAS could rapidly become of practical use for assessing the risk of, or predisposition to, any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed [4-6]. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs) [5,7,8] and rare variants [9] that are not captured by SNP arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypothesis testing, and only extremely significant associations survive stringent Bonferroni correction (i.e. those “low hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet [10,11]. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE [12] project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c).&lt;br /&gt;
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, already the number of possible pair-wise interactions grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
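&lt;br /&gt;
The scale of the multiple-testing burden and of the quadratic growth in pair-wise interactions can be made concrete with a small calculation. The sketch below is only an illustration: it assumes the figure of about a million non-redundant SNPs mentioned above and a conventional family-wise error rate of 0.05, and the function names are hypothetical helpers rather than part of any existing tool.&lt;br /&gt;
&lt;br /&gt;
```python
from math import comb

ALPHA = 0.05  # assumed family-wise error rate (a conventional choice)


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def n_pairwise_interactions(n_variants):
    """Number of unordered variant pairs, n * (n - 1) / 2."""
    return comb(n_variants, 2)


n_snps = 1_000_000  # "around a million non-redundant SNPs" (order of magnitude only)

single = bonferroni_threshold(n_snps)    # threshold when testing each SNP once
pairs = n_pairwise_interactions(n_snps)  # number of pair-wise interaction tests
pairwise = bonferroni_threshold(pairs)   # threshold when testing every pair

print(f"single-variant tests: {n_snps:.1e}, per-test threshold {single:.1e}")
print(f"pair-wise tests:      {pairs:.1e}, per-test threshold {pairwise:.1e}")
```
&lt;br /&gt;
Under these assumptions the number of tests rises from about a million to several hundred billion when all pairs are considered, and the per-test threshold shrinks by the same factor, which is why brute-force interaction scans are underpowered even in very large cohorts.&lt;br /&gt;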
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved high-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known on how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to help the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), there is much less known on how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), that will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in the unique position of having direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus) [13], a population-based cohort of 6182 participants from Lausanne, Switzerland.&lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm [14-16] (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNA-seq we will also generate transcriptomics profiles for lymphoblastoid cell lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than for any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus [17]: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardiovascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure [18], but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture [19]. This will reduce the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.&lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm [20] (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach, genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature);&lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach, genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a marker, or set of markers, explaining the meta-phenotype) against some null model, using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework, using only CoLaus data, is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT [2,3]). In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies.&lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardiovascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines [21] (SVM) and Random Forests [22], to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology).&lt;br /&gt;
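&lt;br /&gt;
The comparison described in (b) can be sketched as a k-fold cross-validation that scores the same classifier on two feature sets. The example below is a minimal, self-contained illustration under strong assumptions: the data are synthetic, the single "meta-feature" is simply the mean of four correlated raw features, and a nearest-centroid classifier stands in for the SVM and Random Forest models named above. All function and variable names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def predict(x, centroids):
    """Return the label whose centroid is nearest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cv_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy of the nearest-centroid stand-in over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Fit on the training folds only, so the held-out fold stays unseen.
        centroids = {
            label: centroid([X[i] for i in train_idx if y[i] == label])
            for label in set(y[i] for i in train_idx)
        }
        correct = sum(predict(X[i], centroids) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


# Synthetic stand-in data: 4 noisy features carry the outcome, 16 are pure noise.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X_raw = [
    [label + rng.gauss(0, 2) for _ in range(4)] + [rng.gauss(0, 2) for _ in range(16)]
    for label in y
]
# Hypothetical meta-feature: the average of the four informative raw features.
X_meta = [[mean(row[:4])] for row in X_raw]

acc_raw = cv_accuracy(X_raw, y)
acc_meta = cv_accuracy(X_meta, y)
print(f"raw features:  {acc_raw:.2f}")
print(f"meta-feature:  {acc_meta:.2f}")
```
&lt;br /&gt;
In a real analysis the meta-features themselves would have to be derived within each training fold, since computing them on the full dataset before splitting would leak information from the held-out samples into the model.&lt;br /&gt;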
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely at the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless it is important to realize that the impasses of this linear and brute-force approach are becoming more and more realized, and that a growing community is moving towards a more integrated approach (sometimes termed as “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics and that will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off between innovation and feasibility is determined by this difficulty, which increases in a balanced manner across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also benefit from our close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an ongoing collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-1095 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines : and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:48:49Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Extended Synopsis of the project proposal */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding of NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Context and state of the art ===&lt;br /&gt;
 &lt;br /&gt;
Genome-wide association studies (GWAS) search for significant correlations between genetic markers (most commonly Single Nucleotide Polymorphisms, or SNPs for short) and any measurable trait in a population of individuals (see Ref. [1] for review). The motivation is that such associations could provide new candidate loci for variants in genes (or their regulatory elements) that play a causal role in the phenotype of interest. In the clinical context there is hope that this would eventually lead to a better understanding of the genetic components of diseases and their risk factors, and potentially to more accurate diagnostics and novel therapeutic avenues.&lt;br /&gt;
&lt;br /&gt;
From the hundreds of GWAS performed for complex traits in recent years, it has become apparent that for most of these traits the elucidated loci explain only a very small fraction of the phenotypic variance, even for highly heritable traits that are known to have a significant genetic component to their variability. This applies not only to individual SNPs, where the most significantly associated ones rarely account for more than one percent of the variability, but also to additive combinations thereof, which even in the case of meta-studies with extremely high power (like GIANT2,3 integrating data from &amp;gt;100,000 individuals) usually explain less than 20%. This so-called “missing variance” enigma4 has disappointed those who expected that GWAS could rapidly become of practical use for assessing the risk of developing any of the complex diseases that have been studied.&lt;br /&gt;
&lt;br /&gt;
Several explanations for the lack of predictive power have been proposed4-6. Firstly, many traits may be influenced by genetic variants that are not yet routinely measured, including copy number variants (CNVs)5,7,8 and rare variants9 that are not captured by SNP-arrays. New genotyping approaches (including whole genome sequencing) will eventually overcome this technical limitation, but this will only increase the number of explanatory variables. Indeed, the more fundamental challenge of current GWAS is rooted in the enormous size of this feature space (i.e. around a million non-redundant SNPs and potentially many more rare variants and CNVs). Within the standard GWAS approach each variant within the genotypic data (G) is independently tested for association with the phenotype (P) of interest (Fig. 1a). This imposes a huge burden of multiple hypothesis testing, and only extremely significant associations survive stringent Bonferroni correction (i.e. the “low-hanging fruits” above the line in the Manhattan plots in Fig. 1), while there may be many more relevant genetic variants whose contributions are too small to be detected yet10,11. In some cases existing annotation (A) from previous GWAS, or data about the implicated gene’s function or expression, like those provided by the ENCODE12 project, may help to prioritize marginally significant associations. Yet, the burden of multiple testing is even more severe when considering sizable collections of phenotypic traits (Fig. 1b), let alone the high-dimensional features of molecular data (M), like those generated by metabolomics or transcriptomics assays (Fig. 1c). &lt;br /&gt;
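&lt;br /&gt;
To make the multiple-testing burden concrete, the following minimal Python sketch computes the per-test significance level under Bonferroni correction. The family-wise error rate of 0.05, the figure of 100 traits and the function name are assumptions chosen for illustration; only the order of magnitude of one million SNPs is taken from the text above.&lt;br /&gt;

```python
def bonferroni_threshold(family_wise_alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return family_wise_alpha / n_tests


# Assumed for illustration: family-wise alpha of 0.05 and one million SNP tests.
n_snps = 1_000_000
single_trait_threshold = bonferroni_threshold(0.05, n_snps)

# Testing every SNP against many traits multiplies the number of tests
# (100 traits is a hypothetical figure, not taken from the proposal).
n_traits = 100
multi_trait_threshold = bonferroni_threshold(0.05, n_snps * n_traits)

print(single_trait_threshold, multi_trait_threshold)
```

Under these assumptions the single-trait threshold is of the order of 5e-8, and each additional factor of 100 in the number of tested traits lowers it by a further two orders of magnitude.&lt;br /&gt;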
&lt;br /&gt;
A complementary limitation relates to the fact that most models used in GWAS allow only for linear effects of single variants. Moreover, models including multiple variants usually combine their effects in an additive manner, ignoring possible interactions. Indeed, the number of possible pair-wise interactions alone grows quadratically with the number of variants, so even gigantic cohorts are underpowered to overcome the combinatorial complexity within any brute-force modeling approach.&lt;br /&gt;
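&lt;br /&gt;
The quadratic growth mentioned above can be checked with a short Python sketch that counts the unordered pairs among n variants, n(n-1)/2. The function name is hypothetical, and the variant counts are illustrative round numbers.&lt;br /&gt;

```python
import math


def n_pairwise_interactions(n_variants):
    """Number of unordered variant pairs, n * (n - 1) / 2."""
    return math.comb(n_variants, 2)


print(n_pairwise_interactions(1_000))      # 499500
print(n_pairwise_interactions(1_000_000))  # 499999500000
```

For a million variants this amounts to roughly 5 x 10^11 candidate pairs, before any higher-order interactions are considered.&lt;br /&gt;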
&lt;br /&gt;
=== Ground-breaking nature of this project ===&lt;br /&gt;
I surmise that the linear analysis pathway of current GWAS is central to their failure to achieve predictive power. What is needed to overcome the current impasse is an integrated approach with the following hallmarks (illustrated in Fig. 2):&lt;br /&gt;
&lt;br /&gt;
1)	Use all potentially relevant phenotypic information available for a cohort. This means that rather than considering one phenotype at a time, our framework will integrate many relevant traits in a single analysis.&lt;br /&gt;
&lt;br /&gt;
2)	Integrate intermediate molecular features whenever feasible. Molecular data provide valuable information on how genetic variability is transmitted to organismal traits and how this process is modulated by the environment. Thus establishing links between molecular features and both the available genotypic and phenotypic information is crucial for elucidating the causal pathways bridging from one to the other. &lt;br /&gt;
&lt;br /&gt;
3)	Reduce the complexity of all involved high-dimensional data. The idea is to identify meta-features p, m and g, which have significantly lower dimensionality than the corresponding full datasets (P, M and G). This applies in particular to the organismal phenotypes and the molecular data, which often contain redundant information (e.g. from closely related traits or molecular features) and for which various tools for dimensionality reduction already exist. Yet, it is also potentially relevant for the enormous genotypic space, where little is known about how to reduce the effective number of variants beyond combining proximal ones which are in very high linkage disequilibrium (LD).&lt;br /&gt;
&lt;br /&gt;
4)	Use existing annotation to aid the identification of relevant meta-features. The available annotation should be used to prioritize the potential relevance of the various meta-features. While for organismal traits there are sometimes well-established heuristics on how to combine elementary traits (like the BMI from weight and height), much less is known about how to integrate effectively the large amount of information on genes that can help to prioritize the genetic variants impacting their function, or the molecular traits they affect. &lt;br /&gt;
&lt;br /&gt;
5)	Generate new annotation by combining these features. Any pair of meta-features can be used to create new knowledge. For example, testing models that explain molecular meta-phenotypes in terms of meta-genotypes can identify sets of genetic variants that have a molecular phenotypic effect. Prioritizing these variants can in turn improve power for modeling the response of down-stream organismal traits. Finally, connecting molecular and organismal meta-features is likely to provide interesting links between these different levels that can be used to further refine these features.  &lt;br /&gt;
&lt;br /&gt;
6)	Perform an iterative analysis that progressively identifies the most relevant meta-features needed for a particular biomedical question. This implies that the analysis should not stop once interesting links between the different data have been identified. Rather, these links should inform the integrative model to further refine and prioritize the meta-features within a specific analysis. For example, starting from a particular set of organismal phenotypes, one may identify the most relevant molecular traits and/or genotypes, which in turn may implicate additional phenotypes, and so on.&lt;br /&gt;
&lt;br /&gt;
This integrated analysis framework is conceptually very different from the conventional GWAS pipeline, and has the potential to overcome some of its limitations. It builds on existing analysis tools developed previously by my group (see Early Achievements on page 7), which will be adapted and extended. &lt;br /&gt;
&lt;br /&gt;
Importantly, as for any innovative approach, it will have to be evaluated rigorously within a concrete setting to demonstrate its potential benefits. We are in a unique position to have direct access to genotypic, phenotypic and molecular data from the Cohorte Lausannoise (CoLaus)13, a population-based study of 6182 participants from Lausanne, Switzerland. &lt;br /&gt;
&lt;br /&gt;
=== Project Objectives ===&lt;br /&gt;
&lt;br /&gt;
==== 1)	Uncoupled generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform modular analyses generating meta-features from all molecular profiles: Using our Iterative Signature Algorithm14-16 (ISA) and other standard tools (like PCA or clustering) we will first analyze existing metabolomics data from ~1000 CoLaus samples to assess whether metabolomics meta-features reflect any annotated compounds or pathways. Using RNAseq we will also generate transcriptomics profiles for lymphoblastoid cell-lines derived from the same samples to enable the analogous analysis for expression data. We will then perform standard GWAS to assess which meta-features have a significant genetically determined component and whether the association is stronger than for any of their constituent metabolomics or transcriptomics features.&lt;br /&gt;
&lt;br /&gt;
b)	Perform modular analyses for phenotypic traits, including both the clinical phenotypes gathered for CoLaus and the mental health parameters obtained within its sub-study PsyCoLaus17: We will analyze which traits co-aggregate in the same module and perform standard GWAS to test for a stronger genetic component of any of the phenotypic meta-features (as in 1a). We will also check systematically whether linear models for major cardio-vascular risk factors explain more of the data when including certain meta-features related to environmental conditions as co-variables (similar to correcting for population stratification using genotypic PCs).&lt;br /&gt;
&lt;br /&gt;
c)	Develop new methods for aggregating genotypes: We will explore new ways to reduce the complexity of the genotypic data. PCA has been successful in capturing the population structure18, but these very global features usually reflect shared environmental factors (like diet) and are therefore considered as co-variables that can mask the causal effects of individual genotypes. What is needed are new approaches to bundle relatively small groups of genotypes that co-segregate more often than expected. This may include LD blocks, but more interestingly long-range interactions, on which there is an increasing body of complementary information from new genomics tools unravelling chromosome architecture19. This will allow for reducing the burden of multiple hypothesis testing, because all constituent genotypes can be discarded at once if their representative “meta-genotype” exhibits no association signal with a phenotype of interest.  &lt;br /&gt;
&lt;br /&gt;
==== 2)	Coupled and iterative generation of meta-features ====&lt;br /&gt;
&lt;br /&gt;
a)	Perform coupled analysis of distinct sets of molecular and clinical phenotypes using our Ping-Pong Algorithm20 (PPA) and other tools (like Partial-Least-Squares) in order to generate modular links between the various types of data: We will use this approach to co-analyze pairs of phenotypic datasets, including:&lt;br /&gt;
&lt;br /&gt;
i)	NMR vs mass-spec metabolomics data to characterize the overlap and complementarity of these two technologies, and derive robust metabolomics signatures using coherent features from both types of data;&lt;br /&gt;
&lt;br /&gt;
ii)	Metabolomics vs transcriptomics data to reveal relationships between gene expression and metabolite concentrations;&lt;br /&gt;
&lt;br /&gt;
iii)	Blood chemistry data vs metabolomics and transcriptomics data to better understand the relation between the relatively inexpensive measurements routinely used in the clinics and the features of high-resolution molecular profiles;&lt;br /&gt;
&lt;br /&gt;
iv)	Organismal traits vs blood chemistry data, metabolomics and transcriptomics data to identify potential molecular signatures for disease-related abnormal organismal profiles.&lt;br /&gt;
&lt;br /&gt;
b)	Score genotypic markers for their relevance to any of the meta-features derived in the previous analyses. This will be done using three strategies:&lt;br /&gt;
&lt;br /&gt;
i)	Within the annotation-based approach genotypic markers will receive scores (or “priors” within a Bayesian statistical framework) if they are in LD with a gene (or its regulatory region) that can be linked to the meta-feature based on existing annotation (e.g. a known enzyme involved in the metabolism of a particular compound tagged by a metabolomics meta-feature); &lt;br /&gt;
&lt;br /&gt;
ii)	Within the model-based approach genotypic markers will receive priors based on the likelihood ratio of a specific model (e.g. a (set of) marker(s) explaining the meta-phenotype against some null model) using a regression or machine learning framework, cf. point (3);&lt;br /&gt;
&lt;br /&gt;
iii)	Iteratively refine all meta-features: The sets of most relevant genotypic meta-features (i.e. sets of markers with the highest scores) will be used as new cues to update and refine the organismal and molecular meta-features (cf. Fig. 2). This process will be repeated as long as there is a measurable increase in predictive power, see point (3).&lt;br /&gt;
&lt;br /&gt;
==== 3)	Benchmarking ====&lt;br /&gt;
&lt;br /&gt;
It is important to combine this framework with a rigorous benchmarking procedure, since the identification and refinement procedure for meta-features in (1) and (2) will unavoidably include heuristic elements. Here we take a practical point of view with regard to this general challenge: Ultimately the goal of any framework for medical data integration should be the generation of new knowledge and the ability to predict clinically relevant endpoints, based on the available data. &lt;br /&gt;
&lt;br /&gt;
a)	As for the first goal, we will investigate systematically whether our novel analysis framework, using only CoLaus data, is able to elucidate genetic variants whose relevance for certain phenotypes has been demonstrated by extremely large meta-studies (like GIANT2,3). In other words, we will ask whether data from a moderately sized cohort, if analyzed in a more sophisticated manner (e.g. using the scores in 2b), would be able to recapitulate (at least some of) the results of extremely well-powered studies. &lt;br /&gt;
&lt;br /&gt;
b)	As for the second goal, we will take advantage of the fact that CoLaus has recently become a longitudinal study, allowing for prospective analyses. Specifically, one can try to predict various clinically relevant parameters measured at follow-up (including cardio-vascular events, development of diabetes and even death) based on the data that were available at the baseline investigation (i.e. about five years earlier). We will apply well-developed machine learning tools, like Support-Vector Machines21 (SVM) and Random Forests22, to compare the predictive power using our meta-features with that based on the unprocessed raw data (using a cross-validation methodology). &lt;br /&gt;
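&lt;br /&gt;
The cross-validation comparison described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: a toy nearest-centroid classifier stands in for the SVM and Random Forest models, all function names are hypothetical, and the data are synthetic (not CoLaus data). The same scoring function would be called once with meta-features and once with raw features, and the two accuracies compared.&lt;br /&gt;

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit_predict(train_x, train_y, test_x):
    """Toy stand-in for an SVM or Random Forest: pick the class with the closest mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in test_x]


def cross_validated_accuracy(features, labels, fit_predict, k=5, seed=0):
    """Mean held-out accuracy over k folds for one feature representation."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        preds = fit_predict([features[i] for i in train],
                            [labels[i] for i in train],
                            [features[i] for i in held_out])
        hits = sum(p == labels[i] for p, i in zip(preds, held_out))
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


# Synthetic toy data: 20 samples, two classes, one perfectly separating feature.
toy_labels = [i % 2 for i in range(20)]
toy_meta_features = [[10.0 * y + 0.01 * i] for i, y in enumerate(toy_labels)]
meta_accuracy = cross_validated_accuracy(
    toy_meta_features, toy_labels, nearest_centroid_fit_predict)
print(meta_accuracy)  # 1.0
```

Keeping the fold assignment fixed (same seed) for both feature representations ensures that any difference in accuracy reflects the features rather than the split.&lt;br /&gt;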
&lt;br /&gt;
=== Feasibility ===&lt;br /&gt;
 &lt;br /&gt;
Devising new strategies for medical data analysis is very timely given the current data deluge. Central to our proposal is our vision of the integrative framework illustrated in Fig. 2, which departs radically from the canonical analysis pipelines used by most GWAS. Nevertheless, it is important to realize that the impasses of this linear and brute-force approach are increasingly recognized, and that a growing community is moving towards a more integrated approach (sometimes termed “Systems Genetics”23,24). This approach has already made remarkable progress for model organisms25-27, but is less established for human data. Thus, while my proposal derives its strengths and uniqueness from the available resources outlined above (including the first massive collection of already existing metabolomics data, which will be matched with transcriptomics data), it is well aligned and likely to cross-pollinate with other research in this field.&lt;br /&gt;
&lt;br /&gt;
The feasibility of our proposal rests primarily on the well-established nature of the three components we aim to synthesize: (i) our expertise with (modular) analysis of large-scale phenotypic data15,16,20,28-33, (ii) our experience with GWAS2,3,34-38, and (iii) our direct access to existing data from the CoLaus study. The challenge lies in combining these assets, and connecting them with new methodologies. The trade-off between innovation and feasibility is determined by this difficulty, which increases in a balanced manner across our three main analysis objectives (see Fig. 3 for illustration): For objectives (1a) and (1b) we can rely largely on our existing resources in terms of data and analysis tools. Objective 1c is a bit more challenging, because it calls for new ideas to reduce the genotypic complexity (like the use of information on chromosomal architecture19). Objective 2a has great potential to yield new insights of high methodological (2a-i/ii) or clinical (2a-iii/iv) relevance, but requires the integration of external annotation. We have ample experience in using gene annotation (like GO term enrichment analysis). We also benefit from our close proximity to our colleagues at the Lausanne University Hospital, with whom we can consult on clinical matters. Since the analysis of metabolomics data is not within our direct expertise, we are fortunate to have an ongoing collaboration with the Steinbeck Chemoinformatics group at the European Bioinformatics Institute (EBI), which has great experience in the analysis of mass- and NMR-spectra for structure elucidation. This support structure will also be invaluable for objective 2b-i, which also relies on the integration of external information. The most significant challenge in the remaining objectives is the integration of machine-learning approaches with our modular analysis tools. 
We have a solid background in non-linear classification theory, so we are confident that we can apply the well-established SVM21 and “random forests”22 to the problem at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
1.	McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008).&lt;br /&gt;
&lt;br /&gt;
2.	Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-8 (2010).&lt;br /&gt;
&lt;br /&gt;
3.	Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010).&lt;br /&gt;
&lt;br /&gt;
4.	Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18-21 (2008).&lt;br /&gt;
&lt;br /&gt;
5.	Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446-50 (2010).&lt;br /&gt;
&lt;br /&gt;
6.	Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-53 (2009).&lt;br /&gt;
&lt;br /&gt;
7.	McCarroll, S.A. Extending genome-wide association studies to copy-number variation. Hum Mol Genet 17, R135-42 (2008).&lt;br /&gt;
&lt;br /&gt;
8.	Beckmann, J.S., Sharp, A.J. &amp;amp; Antonarakis, S.E. CNVs and genetic medicine (excitement and consequences of a rediscovery). Cytogenet Genome Res 123, 7-16 (2008).&lt;br /&gt;
&lt;br /&gt;
9.	Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009).&lt;br /&gt;
&lt;br /&gt;
10.	Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-9 (2010).&lt;br /&gt;
&lt;br /&gt;
11.	Visscher, P.M., Brown, M.A., McCarthy, M.I. &amp;amp; Yang, J. Five years of GWAS discovery. Am J Hum Genet 90, 7-24 (2012).&lt;br /&gt;
&lt;br /&gt;
12.	Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).&lt;br /&gt;
&lt;br /&gt;
13.	Firmann, M. et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8, 6 (2008).&lt;br /&gt;
&lt;br /&gt;
14.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67, 031902 (2003).&lt;br /&gt;
&lt;br /&gt;
15.	Ihmels, J., Bergmann, S. &amp;amp; Barkai, N. Defining transcription modules using large-scale gene expression data. Bioinformatics 20, 1993-2003 (2004).&lt;br /&gt;
&lt;br /&gt;
16.	Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7 (2002).&lt;br /&gt;
&lt;br /&gt;
17.	Preisig, M. et al. The PsyCoLaus study: methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).&lt;br /&gt;
&lt;br /&gt;
18.	Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98-101 (2008).&lt;br /&gt;
&lt;br /&gt;
19.	van Steensel, B. &amp;amp; Dekker, J. Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28, 1089-95 (2010).&lt;br /&gt;
&lt;br /&gt;
20.	Kutalik, Z., Beckmann, J.S. &amp;amp; Bergmann, S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26, 531-9 (2008).&lt;br /&gt;
&lt;br /&gt;
21.	Cristianini, N. &amp;amp; Shawe-Taylor, J. An introduction to Support Vector Machines and other kernel-based learning methods, xi, 189 p. (Cambridge University Press, Cambridge, 2000).&lt;br /&gt;
&lt;br /&gt;
22.	Breiman, L. Random forests. Machine Learning 45, 5-32 (2001).&lt;br /&gt;
&lt;br /&gt;
23.	Li, H. Systems genetics in &amp;quot;-omics&amp;quot; era: current and future development. Theory Biosci 132, 1-16 (2013).&lt;br /&gt;
&lt;br /&gt;
24.	Nadeau, J.H. &amp;amp; Dudley, A.M. Genetics. Systems genetics. Science 331, 1015-6 (2011).&lt;br /&gt;
&lt;br /&gt;
25.	Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-31 (2010).&lt;br /&gt;
&lt;br /&gt;
26.	Mackay, T.F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173-8 (2012).&lt;br /&gt;
&lt;br /&gt;
27.	Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. &amp;amp; Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234-7 (2013).&lt;br /&gt;
&lt;br /&gt;
28.	Bergmann, S., Ihmels, J. &amp;amp; Barkai, N. Similarities and Differences in Genome-Wide Expression Data of Six Organisms. PLoS Biol 2, E9 (2004).&lt;br /&gt;
&lt;br /&gt;
29.	Henrichsen, C.N. et al. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol 7, e1001054 (2011).&lt;br /&gt;
&lt;br /&gt;
30.	Ihmels, J., Bergmann, S., Berman, J. &amp;amp; Barkai, N. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 1, e39 (2005).&lt;br /&gt;
&lt;br /&gt;
31.	Ihmels, J., Levy, R. &amp;amp; Barkai, N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22, 86-92 (2004).&lt;br /&gt;
&lt;br /&gt;
32.	Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-8 (2011).&lt;br /&gt;
&lt;br /&gt;
33.	Piasecka, B., Kutalik, Z., Roux, J., Bergmann, S. &amp;amp; Robinson-Rechavi, M. Comparative modular analysis of gene expression in vertebrate organs. BMC Genomics 13, 124 (2012).&lt;br /&gt;
&lt;br /&gt;
34.	Genick, U.K. et al. Sensitivity of genome-wide-association signals to phenotyping strategy: the PROP-TAS2R38 taste association as a benchmark. PLoS One 6, e27745 (2011).&lt;br /&gt;
&lt;br /&gt;
35.	Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42, 786-9 (2010).&lt;br /&gt;
&lt;br /&gt;
36.	Kapur, K., Schupbach, T., Xenarios, I., Kutalik, Z. &amp;amp; Bergmann, S. Comparison of strategies to detect epistasis from eQTL data. PLoS One 6, e28415 (2011).&lt;br /&gt;
&lt;br /&gt;
37.	Kutalik, Z. et al. Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12, 1-17 (2011).&lt;br /&gt;
&lt;br /&gt;
38.	Kutalik, Z., Whittaker, J., Waterworth, D., Beckmann, J.S. &amp;amp; Bergmann, S. Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability. Genet Epidemiol 35, 341-9 (2011).&lt;br /&gt;
&lt;br /&gt;
39.	Prelic, A. et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 1122-9 (2006).&lt;br /&gt;
&lt;br /&gt;
40.	Csardi, G., Kutalik, Z. &amp;amp; Bergmann, S. Modular analysis of gene expression data with R. Bioinformatics 26, 1376-7 (2010).&lt;br /&gt;
&lt;br /&gt;
41.	Luscher, A. et al. ExpressionView--an interactive viewer for modules identified in gene expression data. Bioinformatics 26, 2062-3 (2010).&lt;br /&gt;
&lt;br /&gt;
42.	Chasman, D.I. et al. Integration of Genome-Wide Association Studies with Biological Knowledge Identifies Six Novel Genes Related to Kidney Function. Hum Mol Genet (2012).&lt;br /&gt;
&lt;br /&gt;
43.	Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet 8, e1002607 (2012).&lt;br /&gt;
&lt;br /&gt;
44.	Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet 8, e1002584 (2012).&lt;br /&gt;
&lt;br /&gt;
45.	Kapur, K. et al. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene. PLoS Genet 6, e1001035 (2010).&lt;br /&gt;
&lt;br /&gt;
46.	Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338-45, 1345 e1-7 (2010).&lt;br /&gt;
&lt;br /&gt;
47.	Valsesia, A. et al. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 13, 241 (2012).&lt;br /&gt;
&lt;br /&gt;
48.	Schupbach, T., Xenarios, I., Bergmann, S. &amp;amp; Kapur, K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 26, 1468-9 (2010).&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Main_Page</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Main_Page"/>
				<updated>2013-02-21T13:41:22Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Welcome to the Computational Biology Group! */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Homepage]]&lt;br /&gt;
&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
&amp;lt;newsbulletins&amp;gt;header=NEWS|limit=3&amp;lt;/newsbulletins&amp;gt;&lt;br /&gt;
A history of all news can be found [[History | here]].&lt;br /&gt;
&lt;br /&gt;
== Welcome to the ''Computational Biology Group''! ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG_picture_2011.jpg|thumb|500px]]&lt;br /&gt;
&lt;br /&gt;
The ''Computational Biology Group'' (CBG) is part of the ''[http://www.unil.ch/dgm/page13525_en.html Department of Medical Genetics]'' at the [http://www.unil.ch University of Lausanne]. We are interested in various fields related to Computational Biology, which are detailed in the [[Science]] section of this wiki. Briefly, there are two main directions. First, we develop and apply methods for the integrative analysis of large-scale biological and clinical data. This includes ''molecular'' phenotypes like gene-expression data, as well as ''organismal'' phenotypes (ranging from patient data to growth arrays). We focus particularly on relating these phenotypes to genotypes such as &amp;quot;Single Nucleotide Polymorphisms&amp;quot; (SNPs) and &amp;quot;Copy Number Variants&amp;quot; (CNVs) measured by microarrays or next-generation sequencing. Our goal is to move towards predictive models in order to improve the diagnosis, prevention and treatment of disease. A complementary direction of research pertains to relatively small genetic networks whose components are well known. We collaborate closely with experts in the field to identify biological systems that can be modeled quantitatively. Our goal in developing such models is not only to give an approximate description of the system, but also to obtain a better understanding of its properties. For example, regulatory networks evolved to function reliably under ever-changing environmental conditions. This notion of robustness can guide computational analysis and provide constraints on models that complement those from direct measurements of the system's output.&lt;br /&gt;
&lt;br /&gt;
In general, our group seeks an interdisciplinary approach, bridging the traditional gaps between physics, mathematics and biology. Our lab collaborates with experimental groups within and outside our department. In particular, due to our proximity to the University Hospital ([http://www.chuv.ch CHUV]), we have close contacts with medical research groups and assist with the analysis of clinical data.&lt;br /&gt;
&lt;br /&gt;
== General info on this wiki ==&lt;br /&gt;
This wiki is the main instrument to centralize and archive information on and generated by the CBG. [mailto:wwwcbg@unil.ch Drop an email to the admin] if you have any questions or need an account.&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Main_Page</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Main_Page"/>
				<updated>2013-02-21T13:40:57Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Welcome to the Computational Biology Group! */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Homepage]]&lt;br /&gt;
&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
&amp;lt;newsbulletins&amp;gt;header=NEWS|limit=3&amp;lt;/newsbulletins&amp;gt;&lt;br /&gt;
A history of all news can be found [[History | here]].&lt;br /&gt;
&lt;br /&gt;
== Welcome to the ''Computational Biology Group''! ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG_picture_2011.jpg|thumb|500px]]&lt;br /&gt;
&lt;br /&gt;
The ''Computational Biology Group'' (CBG) is part of the ''[http://www.unil.ch/dgm/page13525_en.html Department of Medical Genetics]'' at the [http://www.unil.ch University of Lausanne]. We are interested in various fields related to Computational Biology, which are detailed in the [[Science]] section of this wiki. Briefly, there are two main directions. First, we develop and apply methods for the integrative analysis of large-scale biological and clinical data. This includes ''molecular'' phenotypes like gene-expression data, as well as ''organismal'' phenotypes (ranging from patient data to growth arrays). We focus particularly on relating these phenotypes to genotypes such as &amp;quot;Single Nucleotide Polymorphisms&amp;quot; (SNPs) and &amp;quot;Copy Number Variants&amp;quot; (CNVs) measured by microarrays or next-generation sequencing. Our goal is to move towards predictive models in order to improve the diagnosis, prevention and treatment of disease. A complementary direction of research pertains to relatively small genetic networks whose components are well known. We collaborate closely with experts in the field to identify biological systems that can be modeled quantitatively. Our goal in developing such models is not only to give an approximate description of the system, but also to obtain a better understanding of its properties. For example, regulatory networks evolved to function reliably under ever-changing environmental conditions. This notion of robustness can guide computational analysis and provide constraints on models that complement those from direct measurements of the system's output.&lt;br /&gt;
&lt;br /&gt;
In general, our group seeks an interdisciplinary approach, bridging the traditional gaps between physics, mathematics and biology. Our lab collaborates with experimental groups within and outside our department. In particular, due to our proximity to the University Hospital ([http://www.chuv.ch CHUV]), we have close contacts with medical research groups and assist with the analysis of clinical data.&lt;br /&gt;
&lt;br /&gt;
== General info on this wiki ==&lt;br /&gt;
This wiki is the main instrument to centralize and archive information on and generated by the CBG. [mailto:wwwcbg@unil.ch Drop an email to the admin] if you have any questions or need an account.&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Main_Page</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Main_Page"/>
				<updated>2013-02-21T13:40:36Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Welcome to the Computational Biology Group (CBG) at the Department of Medical Genetics of the University of Lausanne! */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Homepage]]&lt;br /&gt;
&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
&amp;lt;newsbulletins&amp;gt;header=NEWS|limit=3&amp;lt;/newsbulletins&amp;gt;&lt;br /&gt;
A history of all news can be found [[History | here]].&lt;br /&gt;
&lt;br /&gt;
== Welcome to the ''Computational Biology Group''! ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG_picture_2011.jpg|thumb|500px]]&lt;br /&gt;
&lt;br /&gt;
The ''Computational Biology Group'' (CBG) is part of the ''[http://www.unil.ch/dgm/page13525_en.html Department of Medical Genetics]'' at the [http://www.unil.ch University of Lausanne]. We are interested in various fields related to Computational Biology, which are detailed in the [[Science]] section of this wiki. Briefly, there are two main directions. First, we develop and apply methods for the integrative analysis of large-scale biological and clinical data. This includes ''molecular'' phenotypes like gene-expression data, as well as ''organismal'' phenotypes (ranging from patient data to growth arrays). We focus particularly on relating these phenotypes to genotypes such as &amp;quot;Single Nucleotide Polymorphisms&amp;quot; (SNPs) and &amp;quot;Copy Number Variants&amp;quot; (CNVs) measured by microarrays or next-generation sequencing. Our goal is to move towards predictive models in order to improve the diagnosis, prevention and treatment of disease. A complementary direction of research pertains to relatively small genetic networks whose components are well known. We collaborate closely with experts in the field to identify biological systems that can be modeled quantitatively. Our goal in developing such models is not only to give an approximate description of the system, but also to obtain a better understanding of its properties. For example, regulatory networks evolved to function reliably under ever-changing environmental conditions. This notion of robustness can guide computational analysis and provide constraints on models that complement those from direct measurements of the system's output.&lt;br /&gt;
&lt;br /&gt;
In general, our group seeks an interdisciplinary approach, bridging the traditional gaps between physics, mathematics and biology. Our lab collaborates with experimental groups within and outside our department. In particular, due to our proximity to the University Hospital ([http://www.chuv.ch CHUV]), we have close contacts with medical research groups and assist with the analysis of clinical data.&lt;br /&gt;
&lt;br /&gt;
== General info on this wiki ==&lt;br /&gt;
This wiki is the main instrument to centralize and archive information on and generated by the CBG. [mailto:wwwcbg@unil.ch Drop an email to the admin] if you have any questions or need an account.&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Main_Page</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Main_Page"/>
				<updated>2013-02-21T13:39:03Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: /* Welcome to the Computational Biology Group (CBG) at the Department of Medical Genetics of the University of Lausanne! */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Homepage]]&lt;br /&gt;
&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
&amp;lt;newsbulletins&amp;gt;header=NEWS|limit=3&amp;lt;/newsbulletins&amp;gt;&lt;br /&gt;
A history of all news can be found [[History | here]].&lt;br /&gt;
&lt;br /&gt;
== Welcome to the ''Computational Biology Group'' (CBG) at the ''[http://www.unil.ch/dgm/page13525_en.html Department of Medical Genetics]'' of the [http://www.unil.ch University of Lausanne]! ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG_picture_2011.jpg|thumb|500px]]&lt;br /&gt;
&lt;br /&gt;
We are interested in various fields related to Computational Biology, which are detailed in the [[Science]] section of this wiki. Briefly, there are two main directions. First, we develop and apply methods for the integrative analysis of large-scale biological and clinical data. This includes ''molecular'' phenotypes like gene-expression data, as well as ''organismal'' phenotypes (ranging from patient data to growth arrays). We focus particularly on relating these phenotypes to genotypes such as &amp;quot;Single Nucleotide Polymorphisms&amp;quot; (SNPs) and &amp;quot;Copy Number Variants&amp;quot; (CNVs) measured by microarrays or next-generation sequencing. Our goal is to move towards predictive models in order to improve the diagnosis, prevention and treatment of disease. A complementary direction of research pertains to relatively small genetic networks whose components are well known. We collaborate closely with experts in the field to identify biological systems that can be modeled quantitatively. Our goal in developing such models is not only to give an approximate description of the system, but also to obtain a better understanding of its properties. For example, regulatory networks evolved to function reliably under ever-changing environmental conditions. This notion of robustness can guide computational analysis and provide constraints on models that complement those from direct measurements of the system's output.&lt;br /&gt;
&lt;br /&gt;
In general, our group seeks an interdisciplinary approach, bridging the traditional gaps between physics, mathematics and biology. Our lab collaborates with experimental groups within and outside our department. In particular, due to our proximity to the University Hospital ([http://www.chuv.ch CHUV]), we have close contacts with medical research groups and assist with the analysis of clinical data.&lt;br /&gt;
&lt;br /&gt;
== General info on this wiki ==&lt;br /&gt;
This wiki is the main instrument to centralize and archive information on and generated by the CBG. [mailto:wwwcbg@unil.ch Drop an email to the admin] if you have any questions or need an account.&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:29:18Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision of &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested in the clinical and genomic profiling of large cohorts, creating enormous amounts of data. While genome-wide association studies (GWAS) have already revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, such as expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[media:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[media:letter_MP.pdf]]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:25:58Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision of &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested in the clinical and genomic profiling of large cohorts, creating enormous amounts of data. While genome-wide association studies (GWAS) have already revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, such as expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[file:letter_PV.pdf]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[file:letter_MP.pdf]]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Letter_MP.pdf</id>
		<title>File:Letter MP.pdf</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Letter_MP.pdf"/>
				<updated>2013-02-21T13:24:24Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: Sven uploaded a new version of &amp;amp;quot;File:Letter MP.pdf&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Letter_MP.pdf</id>
		<title>File:Letter MP.pdf</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Letter_MP.pdf"/>
				<updated>2013-02-21T13:22:17Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T13:17:41Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision of &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested in the clinical and genomic profiling of large cohorts, creating enormous amounts of data. While genome-wide association studies (GWAS) have already revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, such as expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[Media:letter_PV.pdf|letter]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[Media:letter_MP]]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:Letter_PV.pdf</id>
		<title>File:Letter PV.pdf</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:Letter_PV.pdf"/>
				<updated>2013-02-21T13:15:15Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T11:54:22Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[Media:letter_PV]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[Media:letter_MP]]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T11:53:44Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Prof. Peter Vollenweider (PI of CoLaus, see [[Media:letter_PV]]) and Prof. Martin Preisig (PI of PsyCoLaus, see [[Media:letter_MP]]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=NOFIA</id>
		<title>NOFIA</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=NOFIA"/>
				<updated>2013-02-21T11:52:35Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: Created page with &amp;quot;This is an internal page providing additional information about our long-term vision about &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis  of large-sca...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is an internal page providing additional information on our long-term vision for &amp;quot;A '''No'''vel '''F'''ramework for the '''I'''ntegrated '''A'''nalysis of large-scale biomedical data&amp;quot; (NOFIA). We have applied for funding for NOFIA through an ERC Consolidator Grant.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
 &lt;br /&gt;
Vast amounts of financial and human resources have been invested into clinical and genomic profiling of large cohorts creating enormous amounts of data. While genome-wide association studies (GWAS) have already successfully revealed new candidate loci that potentially affect human disease or related phenotypes, they still fail to predict a significant portion of the heritable component of phenotypic variability. We believe that part of this failure may be overcome by developing novel analysis concepts and methodologies. &lt;br /&gt;
&lt;br /&gt;
The main goal of this proposal is to develop and apply a new analysis framework for the integrated analysis of large-scale medical data. Such data include molecular phenotypes as well as large collections of organismal or clinical observables. Molecular phenotypes, like expression or metabolomics profiles, are now becoming available for many cohorts, but efficient methods to integrate these data into association studies are still missing. We propose to adapt and extend the modular technologies we have developed in recent years in order to address this challenge. Specifically, we plan to (1) perform modular analyses generating meta-phenotypes of metabolomics, transcriptomics and large-scale clinical data from genotyped individuals in order to facilitate the identification of genetic variants associated with these traits, (2) perform coupled co-module decompositions for the unsupervised integrated analysis of distinct large sets of molecular and clinical phenotypes in order to generate modular links between the various types of data, and (3) develop predictive models using (co-)modules as features and explore practical applications aimed at predicting disease risks or response to treatment with better accuracy than classical approaches based on individual biomarkers. &lt;br /&gt;
&lt;br /&gt;
Our work will synthesize our expertise with modular analysis (including our well-established state-of-the-art tools) and our ample experience with GWAS. While our methodological developments will be set within concrete bio-medical questions and applied to real data from the Cohorte Lausannoise and other large data collections, they will be relevant for a large field of data-driven bio-medical research.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Support ==&lt;br /&gt;
I have strong support from my colleagues Peter Vollenweider (PI of CoLaus, see [[Media:letter]]) and Martin Preisig (PI of PsyCoLaus, see [[Media:letter]]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Extended Synopsis of the project proposal ==&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Dates_%26_Deadlines_2013</id>
		<title>Dates &amp; Deadlines 2013</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Dates_%26_Deadlines_2013"/>
				<updated>2013-02-09T00:56:58Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: Created page with &amp;quot;* '''First ''kick-off'' meeting''': Friday, 22. February 2012 at 9:00 (Bugnon 27, room TBA) ** The concept of the course will be explained (~15 min, [[User:Micha|Micha Hersch]...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* '''First ''kick-off'' meeting''': Friday, 22. February 2013 at 9:00 (Bugnon 27, room TBA)&lt;br /&gt;
** The concept of the course will be explained (~15 min, [[User:Micha|Micha Hersch]])&lt;br /&gt;
** The projects will be briefly presented by the respective supervisors (~15 min for each supervisor)&lt;br /&gt;
** The course wiki will be explained (~15 min, [[User:Tim|Tim Hohm]])&lt;br /&gt;
&lt;br /&gt;
* Students need &lt;br /&gt;
** to '''confirm their participation''' in this course by: Tuesday, 26. February 2013 (midnight),&lt;br /&gt;
** to '''form study groups''' of 2-3 students and &lt;br /&gt;
** to announce by e-mail the '''three projects they are most interested in working on'''&lt;br /&gt;
&lt;br /&gt;
* Projects will be assigned no later than: Thursday, 28. February 2013&lt;br /&gt;
&lt;br /&gt;
* The first session with the supervisors will be arranged by e-mail&lt;br /&gt;
&lt;br /&gt;
* ''free working &amp;amp; weekly meetings with supervisors'': until the intermediate meeting&lt;br /&gt;
&lt;br /&gt;
* '''[[Intermediate report meeting 2013]]''': TBA&lt;br /&gt;
* Students will present their projects highlighting&lt;br /&gt;
** the goals they want to achieve&lt;br /&gt;
** the analysis tools they will use&lt;br /&gt;
** potential challenges &lt;br /&gt;
** preliminary results (if any) &lt;br /&gt;
&lt;br /&gt;
* ''free working &amp;amp; weekly meetings with supervisors'': until end of May 2013 &lt;br /&gt;
&lt;br /&gt;
* '''[[Final report meeting 2013]]''':  TBA&lt;br /&gt;
** Each study group presents their results (Presentations should take no longer than 20 minutes and are the basis for the student evaluation)&lt;br /&gt;
** Feedback on the course in general&lt;br /&gt;
&lt;br /&gt;
(Meetings will be arranged freely between students and supervisors.)&lt;br /&gt;
&lt;br /&gt;
* back to [[UNIL BSc course: &amp;quot;Solving Biological Problems that require Math 2013&amp;quot;]]&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2013%22</id>
		<title>UNIL BSc course: &quot;Solving Biological Problems that require Math 2013&quot;</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2013%22"/>
				<updated>2013-02-09T00:51:12Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: Created page with &amp;quot;* Coordinators: Sven Bergmann and Micha Hersch * Concept * Useful tools * Dates &amp;amp; Deadlines 2013 * Projects ** to be announced * Supervisors ** ...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Coordinators: [[Sven Bergmann]] and [[User:Micha|Micha Hersch]]&lt;br /&gt;
* [[Concept]]&lt;br /&gt;
* [[Useful tools]]&lt;br /&gt;
* [[Dates &amp;amp; Deadlines 2013]]&lt;br /&gt;
* Projects&lt;br /&gt;
** to be announced&lt;br /&gt;
* Supervisors&lt;br /&gt;
** [[Murielle Bochud]]&lt;br /&gt;
** [[User:Sascha|Sascha Dalessi]]&lt;br /&gt;
** [[User:Zoltan|Zoltan Kutalik]]&lt;br /&gt;
** [[User:Diana|Diana Marek]]&lt;br /&gt;
** [[User:PedroMarquesVidal|Pedro Marques-Vidal]]&lt;br /&gt;
** [[User:Micha|Micha Hersch]]&lt;br /&gt;
** [[User:NicolasSalamin|Nicolas Salamin]]&lt;br /&gt;
** [[User:Anna|Anna Kostikova]]&lt;br /&gt;
** [[User:Tanguy|Tanguy Corre]]&lt;br /&gt;
** [[User:David|David Lamparter]]&lt;br /&gt;
&lt;br /&gt;
* see also [[UNIL BSc course: &amp;quot;Solving Biological Problems that require Math 2012&amp;quot;]]&lt;br /&gt;
* [[How to upload files and edit your project page]]&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2013-01-16T17:01:41Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2006.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel) where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on Sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute], hosted by [http://drupal.lenoverelab.org/ Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details!&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2012-10-12T16:27:01Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2006.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel) where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on Sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute], hosted by [http://drupal.lenoverelab.org/ Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details!&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2012-10-12T16:26:15Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2006.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel) where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on Sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute], hosted by [http://drupal.lenoverelab.org/ Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details!&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2012-10-12T16:25:19Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2006.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel) where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on Sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute], hosted by [http://drupal.lenoverelab.org/ Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details!&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2012-10-12T16:20:37Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2006.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel) where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on Sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute] in the group of [http://www.babraham.ac.uk/inositide/lenovere.html Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details!&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2012-10-12T16:18:16Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2005.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel), where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute] in the group of [http://www.babraham.ac.uk/inositide/lenovere.html Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details!&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann</id>
		<title>Sven Bergmann</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Sven_Bergmann"/>
				<updated>2012-10-12T15:40:13Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Bulletins]]&lt;br /&gt;
&amp;lt;newstitle&amp;gt;Sven Bergmann is Associate Professor&amp;lt;/newstitle&amp;gt;&lt;br /&gt;
&amp;lt;teaser&amp;gt;&lt;br /&gt;
Sven Bergmann has successfully completed his&lt;br /&gt;
tenure track as Assistant Professor and has been Associate Professor since August&lt;br /&gt;
2010.&lt;br /&gt;
&amp;lt;date&amp;gt;1 Aug 2010 — 9:12&amp;lt;/date&amp;gt;&lt;br /&gt;
&amp;lt;/teaser&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Sven_pic.jpg|200px|thumb|left|Sven Bergmann, PI]] Sven Bergmann heads the [http://www2.unil.ch/cbg ''Computational Biology Group''] in the [http://www.unil.ch/dgm Department of Medical Genetics] at the [http://www.unil.ch University of Lausanne]. He joined the [http://www.unil.ch/fbm Faculty of Biology and Medicine] in 2005 as Assistant Professor and became Associate Professor in 2010 after successfully completing his tenure track. He has also been affiliated with the [http://www.isb-sib.ch/ Swiss Institute of Bioinformatics] since 2005.&lt;br /&gt;
&lt;br /&gt;
Sven studied theoretical particle physics with [http://www.weizmann.ac.il/home/ftnir Prof. Yosef Nir] at the [http://www.weizmann.ac.il Weizmann Institute of Science] (Israel), where he received his PhD in 2001 for [http://www-spires.slac.stanford.edu/spires/find/hep/www?rawcmd=find+author+bergmann%2C+s+and+not+author+storchi&amp;amp;FORMAT=WWW&amp;amp;SEQUENCE= studies of neutrino oscillations and CP violation]. He then joined the laboratory of [http://barkai-serv.weizmann.ac.il/GroupPage/ Prof. Naama Barkai] in the Department of Molecular Genetics at the same institute, where he first worked as a [http://www.weizmann.ac.il/RGP_open/postdoc/Weizmann-Postdoc.html Koshland postdoctoral fellow] and later as staff scientist. &lt;br /&gt;
&lt;br /&gt;
His work in the field of computational biology includes designing and applying novel algorithms for the analysis of large-scale biological and medical data, as well as modeling of genetic networks pertaining to the development of the Drosophila embryo and the response of plants to environmental changes. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Sven is currently on sabbatical leave at the [http://www.babraham.ac.uk Babraham Institute] in the group of [http://www.babraham.ac.uk/inositide/lenovere.html Dr. Nicolas Le Novère].&lt;br /&gt;
&lt;br /&gt;
* Permanent Address: Rue du Bugnon 27 - DGM 023 - CH-1005 Lausanne - Switzerland&lt;br /&gt;
* Phone at work: +41-21-692-5452&lt;br /&gt;
* Cell phone: +41-78-663-4980&lt;br /&gt;
* e-mail: Sven.Bergmann_AT_unil.ch&lt;br /&gt;
&lt;br /&gt;
* Address during Sabbatical: Babraham Institute, Cambridge, CB22 3AT, United Kingdom&lt;br /&gt;
* Phone at work: +44-1223-49-6308&lt;br /&gt;
* Cell phone: +44-7901-27-8292&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
PS: Do you know how to get ''smoothly'' from A to B? Well, you just need to minimize a functional expression; see this [http://arxiv.org/PS_cache/physics/pdf/0105/0105039v1.pdf paper] for details.&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=File:CBG_picture_2011.jpg</id>
		<title>File:CBG picture 2011.jpg</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=File:CBG_picture_2011.jpg"/>
				<updated>2012-06-22T14:25:11Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Pictures</id>
		<title>Pictures</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Pictures"/>
				<updated>2012-06-22T14:24:42Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Homepage]]&lt;br /&gt;
&lt;br /&gt;
== 2011 ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG picture 2011.jpg|900px]]&lt;br /&gt;
&lt;br /&gt;
Le Mont-sur-Lausanne, 1. May 2011, left to right: Zoltan, Micha, Ndya, Armand, Diana, Sven, Aitana, Andrea, Rico, Sascha&lt;br /&gt;
&lt;br /&gt;
== 2010 ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG picture 2010.jpg|900px]]&lt;br /&gt;
&lt;br /&gt;
Lausanne, 11. Jan. 2010, left to right: Armand, Micha, Sascha, Tim, Bastian (jumping top row); Karen, Aitana, Diana (middle), Barbara, Gabor, Zoltán (bottom); Sven (lying :-)&lt;br /&gt;
&lt;br /&gt;
== 2008 ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG picture 2008.jpg|900px]]&lt;br /&gt;
&lt;br /&gt;
Lausanne, 16. Oct. 2008, left to right: Zoltán, Micha, Aitana, Sven, Diana (front), Barbara (back), Bastian (front), Karen (back), Alain, Toby (back), Gabor&lt;br /&gt;
&lt;br /&gt;
== 2007 ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG picture 2007.jpg|900px]]&lt;br /&gt;
&lt;br /&gt;
Lausanne, 5. Nov. 2007, left to right: Aitana, Diana, Bastian, Zoltán, Sven, Alain (front), Toby (back)&lt;br /&gt;
&lt;br /&gt;
== 2006 ==&lt;br /&gt;
&lt;br /&gt;
[[Image:CBG picture 2006.jpg|900px]]&lt;br /&gt;
&lt;br /&gt;
Gryon, 7. Sep. 2006, left to right: Sven, Alain, Diana, Zoltán, Bastian&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=Dates_%26_Deadlines_2012</id>
		<title>Dates &amp; Deadlines 2012</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=Dates_%26_Deadlines_2012"/>
				<updated>2012-02-23T10:26:34Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* '''First ''kick-off'' meeting''': Friday, 24. February 2012 at 9:00 (Spengler Auditorium, Pathology building, Bugnon 25)&lt;br /&gt;
** The concept of the course will be explained (~15 min, [[User:Micha|Micha Hersch]])&lt;br /&gt;
** The projects will be briefly presented by the respective supervisors (~15 min for each supervisor)&lt;br /&gt;
** The course wiki will be explained (~15 min, [[User:Tim|Tim Hohm]])&lt;br /&gt;
&lt;br /&gt;
* Students need &lt;br /&gt;
** to '''confirm their participation''' in this course by: Tuesday, 28. February 2012 (midnight),&lt;br /&gt;
** to '''form study groups''' of 2-3 students and &lt;br /&gt;
** to announce by e-mail the '''three projects they are most interested in working on'''&lt;br /&gt;
&lt;br /&gt;
* Projects will be assigned no later than: Thursday, 1. March 2012&lt;br /&gt;
&lt;br /&gt;
* The first session with the supervisors will be arranged by e-mail&lt;br /&gt;
&lt;br /&gt;
* ''free working &amp;amp; weekly meetings with supervisors'': until the intermediate meeting&lt;br /&gt;
&lt;br /&gt;
* '''[[Intermediate report meeting 2012]]''': Friday, 30. March 2012 at 9:00 (room 105 in Bugnon 27)&lt;br /&gt;
* Students will present their projects highlighting&lt;br /&gt;
** the goals they want to achieve&lt;br /&gt;
** the analysis tools they will use&lt;br /&gt;
** potential challenges &lt;br /&gt;
** preliminary results (if any) &lt;br /&gt;
&lt;br /&gt;
* ''free working &amp;amp; weekly meetings with supervisors'': until end of May 2012 &lt;br /&gt;
&lt;br /&gt;
* '''[[Final report meeting 2012]]''': Friday, 1. June 2012 at 9:00 (room 105 in Bugnon 27)&lt;br /&gt;
** Each study group presents their results (Presentations should take no longer than 20 minutes and are the basis for the student evaluation)&lt;br /&gt;
** Feedback on the course in general&lt;br /&gt;
&lt;br /&gt;
(Meetings will be arranged freely between students and supervisors.)&lt;br /&gt;
&lt;br /&gt;
* back to [[UNIL BSc course: &amp;quot;Solving Biological Problems that require Math 2012&amp;quot;]]&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2012%22</id>
		<title>UNIL BSc course: &quot;Solving Biological Problems that require Math 2012&quot;</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2012%22"/>
				<updated>2012-02-21T16:47:33Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Coordinators: [[Sven Bergmann]] and [[User:Micha|Micha Hersch]]&lt;br /&gt;
* [[Concept]]&lt;br /&gt;
* [[Useful tools]]&lt;br /&gt;
* [[Dates &amp;amp; Deadlines 2012]]&lt;br /&gt;
* Projects&lt;br /&gt;
** [[Micha: TBA]]&lt;br /&gt;
** [[Sascha: TBA]]&lt;br /&gt;
** [[Nicolas/Anna: TBA]]&lt;br /&gt;
** [[Can environmental factors modify the effect of genes influencing blood pressure?]]&lt;br /&gt;
** [[Genetic determinants of eating patterns]]&lt;br /&gt;
* Supervisors (to be confirmed)&lt;br /&gt;
** [[Murielle Bochud]]&lt;br /&gt;
** [[User:Sascha|Sascha Dalessi]]&lt;br /&gt;
** [[User:Zoltan|Zoltan Kutalik]]&lt;br /&gt;
** [[User:Diana|Diana Marek]]&lt;br /&gt;
** [[User:PedroMarquesVidal|Pedro Marques-Vidal]]&lt;br /&gt;
** [[User:Micha|Micha Hersch]]&lt;br /&gt;
** [[User:NicolasSalamin|Nicolas Salamin]]&lt;br /&gt;
** [[User:Anna|Anna Kostikova]]&lt;br /&gt;
&lt;br /&gt;
* see also [[UNIL BSc course: &amp;quot;Solving Biological Problems that require Math 2011&amp;quot;]]&lt;br /&gt;
* [[How to upload files and edit your project page]]&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2012%22</id>
		<title>UNIL BSc course: &quot;Solving Biological Problems that require Math 2012&quot;</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2012%22"/>
				<updated>2012-02-21T16:46:30Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Coordinators: [[Sven Bergmann]] and [[User:Micha|Micha Hersch]]&lt;br /&gt;
* [[Concept]]&lt;br /&gt;
* [[Useful tools]]&lt;br /&gt;
* [[Dates &amp;amp; Deadlines 2012]]&lt;br /&gt;
* Projects&lt;br /&gt;
** [[Micha: TBA]]&lt;br /&gt;
** [[Sascha: TBA]]&lt;br /&gt;
** [[Nicolas/Anna: TBA]]&lt;br /&gt;
** [[Can environmental factors modify the effect of genes influencing blood pressure?]]&lt;br /&gt;
** [[Genetic determinants of eating patterns]]&lt;br /&gt;
* Supervisors (to be confirmed)&lt;br /&gt;
** [[Murielle Bochud]]&lt;br /&gt;
** [[User:Sascha|Sascha Dalessi]]&lt;br /&gt;
** [[User:Zoltan|Zoltan Kutalik]]&lt;br /&gt;
** [[User:Diana|Diana Marek]]&lt;br /&gt;
** [[User:PedroMarquesVidal|Pedro Marques-Vidal]]&lt;br /&gt;
** [[User:Micha|Micha Hersch]]&lt;br /&gt;
** [[User:NicolasSalamin|Nicolas Salamin]]&lt;br /&gt;
** [[User:Anna|Anna Kostikova]]&lt;br /&gt;
&lt;br /&gt;
* see also [[UNIL BSc course: &amp;quot;Solving Biological Problems that require Math 2011&amp;quot;]]&lt;br /&gt;
* [[How to upload files and edit your project page]]&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	<entry>
		<id>http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2012%22</id>
		<title>UNIL BSc course: &quot;Solving Biological Problems that require Math 2012&quot;</title>
		<link rel="alternate" type="text/html" href="http://www2.unil.ch/cbg/index.php?title=UNIL_BSc_course:_%22Solving_Biological_Problems_that_require_Math_2012%22"/>
				<updated>2012-02-21T13:03:39Z</updated>
		
		<summary type="html">&lt;p&gt;Sven: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Coordinators: [[Sven Bergmann]] and [[User:Micha|Micha Hersch]]&lt;br /&gt;
* [[Concept]]&lt;br /&gt;
* [[Useful tools]]&lt;br /&gt;
* [[Dates &amp;amp; Deadlines 2012]]&lt;br /&gt;
* Projects&lt;br /&gt;
** [[Can environmental factors modify the effect of genes influencing blood pressure?]]&lt;br /&gt;
** [[Genetic determinants of eating patterns]]&lt;br /&gt;
* Supervisors (to be confirmed)&lt;br /&gt;
** [[Murielle Bochud]]&lt;br /&gt;
** [[User:Sascha|Sascha Dalessi]]&lt;br /&gt;
** [[User:Zoltan|Zoltan Kutalik]]&lt;br /&gt;
** [[User:Diana|Diana Marek]]&lt;br /&gt;
** [[User:PedroMarquesVidal|Pedro Marques-Vidal]]&lt;br /&gt;
** [[User:Micha|Micha Hersch]]&lt;br /&gt;
** [[User:NicolasSalamin|Nicolas Salamin]]&lt;br /&gt;
** [[User:Anna|Anna Kostikova]]&lt;br /&gt;
&lt;br /&gt;
* see also [[UNIL BSc course: &amp;quot;Solving Biological Problems that require Math 2011&amp;quot;]]&lt;br /&gt;
* [[How to upload files and edit your project page]]&lt;/div&gt;</summary>
		<author><name>Sven</name></author>	</entry>

	</feed>