What can our genomes actually tell us about disease risk?


Despite recent advances in whole-genome sequencing, two recent studies suggest that we are still far from uncovering the genetic basis of common disease risk. In fact, information relevant to complex diseases might hide within rare or even private genome variations, which are often too scarce to be studied statistically. We might thus have to radically change the way we think about gene-disease associations to take a step forward and make the DNA talk.
Whereas a few, usually rare and severe, “genetic disorders” can be traced to variations at one or two locations, or “loci”, in the DNA sequence, most common diseases result from complex interactions between protein-coding genes, non-coding DNA and environmental effects. These aptly named “complex diseases” include cardiovascular, metabolic, neurological and psychiatric conditions of great concern to health policy, such as early-onset stroke, myocardial infarction, diabetes, dyslipidemia, Alzheimer’s disease, bipolar disorder and schizophrenia.
Some of these complex diseases have a high heritability, which means that a large part of the individual differences in the probability of developing the disease can be explained by differences between genomes. For example, the heritability of early-onset myocardial infarction is about 60% [1]: genomes are more important than environment in explaining the differences in early-onset infarction between individuals. A lot of work has therefore gone into identifying the changes in DNA sequence involved in complex disease heritability. In particular, the development of new sequencing technologies has made it possible to compare hundreds of individual sequences and map them to various symptoms, a method known as “genome-wide association studies”. Hundreds of disease-related genetic variations have been identified this way. However, they explain only a very small fraction of the heritability: in the case of early-onset myocardial infarction, only 2.8% of the heritability has so far been linked to particular genes [2].
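To make the size of this gap concrete, here is a minimal arithmetic sketch in Python. It takes the two figures quoted above at face value (a heritability of about 60%, of which 2.8% is linked to known genes) and reads "2.8% of the heritability" literally, as a share of the heritable part; the function name and its parameters are illustrative only.

```python
def split_heritability(heritability, explained_share):
    """Split the heritable part of disease-risk variation into the portion
    linked to known variants and the portion still unexplained.

    heritability: fraction of individual differences attributed to genomes
    explained_share: fraction of that heritability linked to known variants
    (read literally from the text; this interpretation is an assumption)
    """
    explained = heritability * explained_share
    missing = heritability - explained
    return explained, missing


# Figures quoted in the text for early-onset myocardial infarction.
explained, missing = split_heritability(0.60, 0.028)
print(f"linked to known genes: {explained:.2%} of individual differences")
print(f"still unexplained:     {missing:.2%} of individual differences")
```

Under this reading, known variants account for less than 2% of the differences between individuals, while more than 58% is heritable but not yet traced to any variant.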
To explain the low power of association studies to identify genetic variants contributing to complex diseases, it was hypothesized that most variation in disease predisposition was due to “high risk” variants, which have a strong negative impact on health but remain rare in a population because they are selected against [3]. Consequently, we would only need to increase sample size, and therefore our power to detect rare variants, to better explain the genetic basis of common diseases. With this in mind, two studies published in the July issue of Science used large datasets (2,440 and 14,002 genomes, respectively) to investigate the potential role of rare variants, defined as variants present in less than 0.5% of the individuals sampled. The large sample sizes allowed the detection of many previously unknown variants, thus highlighting the limits of previous smaller-scale studies: 90% of rare variants, but only 5% of common variants, found in 202 drug-target genes were novel, and estimates of discovery rates showed that many new variants remain to be discovered (Fig. 1).
Nelson et al., An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People, Science 337, 2012. 

Fig. 1 Number of variants discovered per kilobase of sequence with sample sizes increasing to 5000 people for multiple populations.
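Why sample size matters so much for rare variants can be illustrated with a short calculation. The sketch below is not the method used in either study: it assumes that each of the 2n chromosomes in a sample of n people carries a given variant independently with probability equal to its population frequency, so the chance of seeing the variant at least once is 1 - (1 - f)^(2n). The function name and the example frequencies are illustrative.

```python
def detection_probability(freq, n_individuals):
    """Probability of observing a variant at least once in a sample.

    Assumes (as a simplification) that each of the 2 * n_individuals
    sampled chromosomes independently carries the variant with
    probability `freq`.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)


# Compare small and large samples for variants of decreasing frequency.
for freq in (0.05, 0.005, 0.0001):
    for n in (100, 2440, 14002):
        p = detection_probability(freq, n)
        print(f"frequency {freq:<7} sample of {n:>6} people: {p:.3f}")
```

Under this simplified model, common variants are almost certain to appear even in a small sample, whereas very rare ones are likely to be missed unless thousands of people are sequenced.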

The studies also confirmed that variants with a potential impact on health tend to remain rare: the proportion of non-synonymous variants, which alter the protein synthesized, was higher among rare variants than among common ones (Fig. 2).

Nelson et al., An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People, Science 337, 2012. 

Fig. 2 Expected ratios of non-synonymous to synonymous variants in the absence of selection and observed ratios for rare to common alleles, from left to right. MAF (Minor Allele Frequency) is the frequency of the rarest version of a variant.
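As a small illustration of the definition above, the following Python sketch computes a minor allele frequency from allele counts at one locus and applies the 0.5% cut-off that the article uses for "rare". The function names and the example counts are hypothetical and do not come from the studies.

```python
RARE_THRESHOLD = 0.005  # the 0.5% cut-off quoted in the article


def minor_allele_frequency(count_a, count_b):
    """Frequency of the less common of two alleles at one locus.

    count_a, count_b: number of sampled chromosomes carrying each allele.
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles were counted at this locus")
    return min(count_a, count_b) / total


def is_rare(maf, threshold=RARE_THRESHOLD):
    """True when the minor allele frequency falls below the threshold."""
    return maf < threshold


# Hypothetical locus: 3 copies of one allele among 1,000 chromosomes.
maf = minor_allele_frequency(997, 3)
print(f"MAF = {maf:.3%}, rare: {is_rare(maf)}")
```

The order of the two counts does not matter, since the minor allele is by definition whichever one is less frequent in the sample.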
However, rare variants were found to be more numerous than previously thought: around 90% of variants were rare. Interestingly, individuals of African ancestry exhibited fewer rare variants, but more variants of intermediate frequency, than those of European ancestry. Moreover, most rare variants were population-specific (Figs. 3 and 4) and about 60% of all variants were present in only one individual.

Casals and Bertranpetit, Human Genetic Variation, Shared and Private, Science 337, 2012. Data from Tennessen et al., Science 2012.

Fig. 3 Proportion of shared and unshared (private) variants between the African-American and the European-American populations.

Nelson et al., An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People, Science 2012. 
Fig. 4 Allele sharing and variant abundance. (A-C) The average allele sharing between pairs of populations for rare (A), intermediate (B) and common (C) variants computed as the frequency in the pooled population pair. (D) The number of variants per kilobase found in population samples of 2,500 individuals.
Such figures contradict the estimates derived from current models of human population growth. Human demography is currently described by the “Out-of-Africa” model, which posits that European and Asian populations emerged from a small population in Africa about 60,000 years ago [4]. As the individuals who migrated represented only a small fraction of the ancestral African population, some genetic variants, especially the rarest ones, were lost. African and European populations were then supposed to have grown steadily, all the while acquiring new population-specific variants by mutation; these variants would increase in frequency only if they were not deleterious, and would otherwise eventually disappear. Such a “bottleneck effect” can be observed in the Finns, who have fewer variants, but more population-specific variants, than other Europeans (Fig. 4). But the overall excess of rare variants in Europeans does not fit the model: most of these variants should have either disappeared or increased in frequency over such a time scale. This pattern can, however, be explained by accelerated population growth in the last few thousand years, during which many new mutations could occur in a short time (Fig. 5). Rare variants therefore provide a precious insight into recent demography.
Tennessen et al., Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes, Science 2012.
Fig. 5 Schematic representation (not to scale) of the inferred demographic model. kya, thousand years ago.
These findings are not very good news for complex disease research. Of course, the rare variants discovered in protein-coding genes are numerous and often deleterious, and could therefore play an important role in disease risk. However, their rarity makes it difficult to actually test that role: across thousands of genomes, fewer than 5% of protein-coding genes afforded sufficient power to detect the effect of rare variations on disease risk, even when that effect is relatively strong. Consistent with this, no significant association was found for the 202 drug-target genes. Moreover, most variants are population-specific or even individual-specific. Association studies should therefore at least be replicated across populations, with a careful determination of ancestry, if their results are to be universal and to avoid false associations between population-specific traits and variants.
The high number of individual-specific variants and the low power of association for other rare variants highlight the importance of genome-wide functional studies to accurately estimate disease risk where association studies fail. Functional predictions might be the next prevailing tool in the study of gene-disease associations. Ideally, such studies would directly estimate the functional impact of a given variant, but the methods currently implemented are rather inconsistent and have a high false-positive rate; that is, they often detect a functional impact where there is none. Such caveats still make them unsuitable for applied uses in medical diagnosis. A better knowledge of molecular biology and its link to physiology still seems necessary to assess the actual impact of rare variants on complex diseases.

1       Nora J.J. et al. (1980). Genetic-epidemiologic study of early onset ischemic heart disease. Circulation 61:503-508.
2       Myocardial Infarction Genetics Consortium (2009). Genome-wide association of early-onset myocardial infarction with single-nucleotide polymorphisms and copy number variants. Nature Genetics 41:334-341.
3       Manolio T.A. et al. (2009). Finding the missing heritability of complex diseases. Nature 461:747-753.
4       Laval G. et al. (2010). Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS ONE 5(4): e10284.

Nelson et al. (2012). An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People. Science, 337, 100-104. DOI: 10.1126/science.1217876

Tennessen et al. (2012). Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes. Science, 337, 64-69. DOI: 10.1126/science.1219240

Casals and Bertranpetit (2012). Human Genetic Variation, Shared and Private. Science, 337, 39-40. DOI: 10.1126/science.1224528