Human – Tutorial Genomics, Ecology, Evolution, etc
https://wp.unil.ch/genomeeee
Blog of a tutorial of Ecole doctorale de biologie UNIL

Reconstructing prehistoric African population structure
https://wp.unil.ch/genomeeee/2017/11/21/reconstructing-prehistoric-african-population-structure/
Tue, 21 Nov 2017 16:39:15 +0000

INTRODUCTION

The highest genetic diversity in humans is found in Africa, in line with Africa being the cradle of humanity. While the three articles we discussed previously during this tutorial (1,2,3) mainly focused on determining the most parsimonious “out-of-Africa” scenarios based on genetic diversity data, this article (Skoglund et al. 2017; 4) investigates the population structure of Africa prior to the expansion of food producers (i.e. herders and farmers). To reconstruct this prehistoric population structure, the authors analyzed the genomes of 16 ancient African individuals who lived up to 8100 years ago (including 15 newly sequenced genomes), as well as SNP genotypes from 584 present-day Africans and 300 high-coverage genomes from 142 worldwide populations. This is the first study to gather and analyze such a high number of ancient African genomes, thereby providing unprecedented insight into prehistoric human population structure.

RESULTS

An ancient cline of southern and eastern African hunter-gatherers

The authors used principal component analysis (PCA) and automated clustering to relate the 16 ancient individuals to present-day sub-Saharan Africans. This reveals that while the two ancient South African individuals share ancestry with present-day South Africans (Khoe-San), 11 of the 12 ancient individuals who lived in eastern and south-central Africa between ~8100 and ~400 years BP form a gradient of relatedness, with the eastern African Hadza at one extremity and the Khoe-San at the other. This genetic cline is also correlated with geography along a north-south axis. Another pattern that emerged from this analysis is the lack of heterogeneity among the seven ancient individuals from Malawi, indicating a long-standing and distinctive population in ancient Malawi which persisted for at least 5000 years but which is extinct today.

Subsequently, the authors built a model in which ancient and present-day African populations trace their ancestry to a putative set of nine ancestral populations. They then used data from both ancient and present-day populations showing substantial ancestry from the major lineages present in Africa today as proxies for these ancestral populations. These proxy populations consisted of three ancient Near Eastern populations representative of Anatolia, the Levant and Iran, respectively, and six African populations representative of different components of ancestry (western African, southern African before agriculture, northeastern African before agriculture, central African rainforest hunter-gatherer, eastern African early pastoralist context and distinctive ancestry found in Nilotic speakers today). Using qpAdm (a generalization of f4 symmetry statistics), they tested 1-, 2- or 3-source models and estimated admixture proportions for all other ancient and present-day African populations, with a set of 10 non-African populations as outgroups. We note that the f4 statistics are poorly explained in this article, making it hard for an uninitiated reader to grasp their meaning and the relevance of the results. The main finding from this analysis is that ancestry closely related to the ancient southern Africans was present much farther north and east in the past than is apparent today.
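Since the f4 statistic is central to this analysis, a minimal sketch may help. In its basic form, f4(A, B; C, D) is the average over SNPs of the product of two allele frequency differences, (pA - pB)(pC - pD). If A and B form a clade relative to C and D, the two differences are uncorrelated and the statistic is expected to be zero; a significant deviation suggests gene flow. The function name and the toy frequencies below are our own illustration, not the authors' code, and a real analysis would also need standard errors (typically from a block jackknife).

```python
def f4(p_a, p_b, p_c, p_d):
    """Basic f4 statistic: mean over SNPs of (pA - pB) * (pC - pD).

    Each argument is a list of allele frequencies (one per SNP) for one
    population; all four lists must cover the same SNPs in the same order.
    """
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all populations must have the same number of SNPs")
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)


# Toy allele frequencies at three SNPs (hypothetical values).
pop_a = [0.1, 0.5, 0.9]
pop_b = [0.2, 0.4, 0.8]
pop_c = [0.1, 0.6, 0.9]
pop_d = [0.3, 0.3, 0.7]

print(f4(pop_a, pop_b, pop_c, pop_d))
```

qpAdm combines many such statistics, computed against a set of outgroups, to test whether a target population can be modeled as a mixture of the proposed sources.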

Displacement of forager populations in eastern Africa

Unsupervised clustering and formal ancestry estimation both indicate that the present-day Hadza in Tanzania can be modeled as deriving all their ancestry from a lineage related to ancient eastern Africans such as Ethiopia_4500BP. However, this lineage contributed little to present-day Bantu speakers in eastern Africa, who instead trace their ancestry to a lineage related to present-day western Africans and to additional ancestry components. In present-day Malawians, population replacement by incoming food producers seems to have been almost complete, as witnessed by a near absence of ancestry from the ancient individuals sampled and by most of their ancestry coming from the Bantu expansion of western African origin.

Importantly, of all ancient individuals analyzed, only a 600 BP individual from Zanzibar has a genetic profile similar to present-day Bantu speakers, with even more western African ancestry. Using linkage disequilibrium, the authors estimate that the admixture between western- and eastern-African-related lineages occurred 800-400 years ago. This indicates that there was genetic isolation between early farmers and previously established foragers during the Bantu expansion into eastern Africa, and that this barrier disappeared over time as mixture occurred. However this delayed admixture did not occur in all African populations, as shown in present-day Malawians who display no signs of admixture from previously established hunter-gatherers.

Early Levantine farmer-related admixture in a ~3100-year-old pastoralist from Tanzania

The authors estimated the ancestry components of a 3100 BP individual from Tanzania and found that 38% of her ancestry was related to the pre-pottery farmers of the Levant (10000 BP), indicating a critical contribution of Levant-Neolithic-related populations to present-day eastern Africans. The best-fitting ancestry model for the Somali indicates that they have ancestry from the 3100 BP Tanzanian individual, but also Dinka-related ancestry and 16% Iranian-Neolithic-related ancestry. This suggests that ancestry related to the Iranian Neolithic appeared in eastern Africa after an earlier gene flow related to Levant Neolithic populations.

Direct evidence of migration bringing pastoralism to eastern and southern Africa

All three ancient southern Africans show affinities to the ancestry predominant in present-day Tuu speakers in the southern Kalahari. Among them, the 1200 BP sample from the western Cape, found in a pastoralist context, has an ancestry composition similar to that of present-day pastoralists like the Nama, with affinity to three groups: Khoe-San, western Eurasians and eastern Africans. This is in line with the hypothesis of a non-Bantu-related population carrying eastern African and Levantine ancestry to southern Africa by at least 1200 BP. Using their model to determine the proportions of the different ancestries present in the western Cape 1200 BP individual, they find mainly a mixture of non-southern African populations. This is consistent with the hypothesis that the Savanna Pastoral Neolithic archaeological tradition in eastern Africa is a possible source for the spread of herding to southern Africa.

The earliest divergences among modern human populations

Previous studies indicate that the primary ancestry in the San population (southern Africa) comes from a lineage that separated from all other lineages present in modern humans before those lineages separated from one another. While Skoglund et al. obtain a similar model in the absence of admixture, the tree-like representation is a poor fit, since the ancient southern Africans (2000 BP) were not strictly an outgroup to all other African populations, and several other examples are also inconsistent with this model. To find models that fit the data, the authors performed admixture graph modeling of the allele frequency correlations and found two parsimonious models. In the first, present-day western Africans have ancestry from a basal African lineage that contributed more to the Mende than it did to the Yoruba, with the other source of western African ancestry being related to eastern Africans and non-Africans. In the second, gene flow over long periods of time and over long distances has connected southern and eastern Africa to other groups in western Africa.

A selective sweep targeting a taste receptor locus in southern Africa

The authors then searched for the genomic signature of natural selection in ancient genomes by looking for regions where allele frequency differentiation between ancient and present-day populations is greater than predicted by the genome-wide background. To do this, the researchers compared the two ancient southern African genomes (2000 BP) to six present-day San genomes with minimal recent mixture. Since the small number of ancient genomes does not allow changes in allele frequency to be inferred at single loci, the scan for high allele frequency differentiation was conducted in 500 kb windows moved in 10 kb steps. The most differentiated locus identified in this way overlaps a cluster of eight taste-receptor genes. Although taste receptors have reportedly already been identified as targets of natural selection, as they affect the ability to detect poisonous compounds in plants, we must be wary that any analysis of such huge datasets is bound to find something, and that the biological interpretation of such a finding may not be straightforward.
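The windowed scan described above can be sketched as follows. This is only an illustration of the sliding-window idea (500 kb windows, 10 kb steps), assuming that each SNP already carries some per-SNP differentiation score; the function name, the scoring and the toy data are hypothetical, and the statistic actually used by the authors is not reproduced here.

```python
def window_scan(positions, scores, window=500_000, step=10_000):
    """Average a per-SNP score in overlapping windows along one chromosome.

    positions: SNP coordinates in base pairs.
    scores:    one differentiation score per SNP (same order as positions).
    Returns a list of (window_start, mean_score, n_snps) for non-empty windows.
    """
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    results = []
    if not positions:
        return results
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        # Naive selection of SNPs in [start, end); fine for a small sketch.
        in_window = [s for p, s in zip(positions, scores) if start <= p < end]
        if in_window:
            results.append((start, sum(in_window) / len(in_window), len(in_window)))
        start += step
    return results


# Toy data: three SNPs with hypothetical differentiation scores.
toy_positions = [5_000, 250_000, 510_000]
toy_scores = [0.1, 0.9, 0.8]
windows = window_scan(toy_positions, toy_scores)
top_window = max(windows, key=lambda w: w[1])
print(top_window)
```

Because neighbouring windows overlap almost entirely, their scores are strongly correlated, which is one reason why the significance of the top window should be interpreted with care.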

Polygenic adaptation

Skoglund et al. tested for evidence of selection on specific functional gene categories by estimating allele frequency differentiation between present-day San and the two ancient genomes from southern Africa. The functional category with the most extreme allele frequency differentiation between present-day San and the ancient southern Africans was “response to radiation”. To check that this did not simply reflect generally inflated allele frequency differentiation, the same statistics were computed for the Mbuti central African rainforest hunter-gatherers, for which no enrichment for “response to radiation” was found. Instead, the top category for the Mbuti was “response to growth”. Based on this, the authors speculate that the small stature of hunter-gatherer populations may be an acquired adaptation.

 

CONCLUSION

This study brings a first and unique view of the genetic makeup of prehistoric Africans. It is indeed a feat achieved by 44 authors from institutions in 11 countries, who took advantage of 15 newly sequenced ancient genomes in addition to the only one that was previously available. The results indicate that an ancient lineage related to the San had a wider distribution in the past, depict two plausible scenarios of gene flow that led to the earliest divergences among modern populations, and give new insights into the spread of herding and farming within Africa. As a side note, we noticed that all ancient individuals come from eastern or southern Africa, probably because this is where conditions were most favorable for the preservation of ancient remains. Although this could introduce some biases, it seems to be the only possible way to go.

REFERENCES

  1. Pagani et al. 2016 Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238–242 (corresponding blog post)
  2. Mallick et al. 2016 The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538: 201–206 (corresponding blog post)
  3. Malaspinas et al. 2016 A genomic history of Aboriginal Australia. Nature 538: 207–214 (corresponding blog post)
  4. Skoglund et al. 2017 (and references therein) Reconstructing Prehistoric African Population Structure. Cell 171: 59–71.e21

A journey through The Simons Genome Diversity Project: more genomes sequenced, more diverse populations
https://wp.unil.ch/genomeeee/2017/11/13/a-journey-through-the-the-simons-genome-diversity-project-more-genomes-sequenced-more-diverse-populations/
Mon, 13 Nov 2017 11:46:14 +0000

Introduction

Between 1976, when the first genome, that of bacteriophage MS2 (1), was completely sequenced, and 2001, when the first draft of the human genome (2) was released, a lot of work was done to improve methods for exploring the genetics of various organisms and to make them accessible. For the human genome, this was a very important step, and the Human Genome Project was declared complete in 2003 (3). In recent years, more and more projects have set out to decipher human wanderlust. To all these previous studies we can add the Simons Genome Diversity Project, which brought us more information by sequencing 300 new genomes from 142 diverse populations. One of its aims was to choose populations that differ in genetics, language and culture. The study shows that some of the populations separated 100000 years ago and reveals more information about the ancestors of Australian, New Guinean and Andamanese people.

Results

One of the most important requirements for uncovering how humans really peopled the Earth is to sequence as many genomes as possible, from individuals belonging to diverse populations that differ in many respects. In this study, the 300 samples were prepared using a PCR-free library following the Illumina Ltd. method, and the median coverage was 42-fold (Figure S1.1; Supplementary Data Table 1). This high genome coverage makes it possible to identify the greatest number of variants, some of which had been reported previously. Single-sample genotypes were called with a reference-bias-free modification of GATK, after some preprocessing to remove adapter sequences. To increase data accuracy, a filtering system highly specific to the SGDP dataset was used. Filter levels range from 0 to 9 and are encoded as a single character for each sample: level 1 offers a good balance between sensitivity and low error rate, whereas level 9 is appropriate when the error rate needs to be as low as possible (Figure S2.1).

The first part of the study provides more information about when the worldwide populations under study separated from one another. The pairwise sequentially Markovian coalescent (PSMC) and the multiple sequentially Markovian coalescent (MSMC) were used to infer changes in population size and split times; the phased haplotypes needed for split-time estimation were produced with SHAPEIT and IMPUTE2, and filter level 1 was used. Figure 2a provides evidence that the ancestors of some present-day populations were already isolated by at least 100 kya, which could have prevented certain mutations from spreading across the ancestors of all populations. Gene flow nevertheless continued until around 50 kya among the great majority of ancestral populations. The graphs show when substructure between populations began: in Figure 2a, substructure between French and Africans starts around 200 kya. The following panels compare only Africans (the Yoruba separated from the KhoeSan 87 kya, from the Mbuti 56 kya and from the Dinka 19 kya) or only non-Africans (the oldest substructure dates to 50 kya, during or shortly after the deepest part of the shared non-African bottleneck, 40-60 kya). Figure 2d-f uses PSMC and PS1 to show the inferred effective population sizes and cross-coalescence rates.

Using a neighbour-joining tree (based on pairwise divergence per nucleotide) and FST, Mallick et al. reconfirmed previous findings that the deepest splits occurred among Africans. Previous studies showed that all non-Africans today possess Neanderthal ancestry, and Figure 1c shows that the highest proportion of Neanderthal ancestry is found in East Asians. Among Eurasians, South Asians have the highest Denisovan ancestry (heatmap in Figure 1d). Another result is that there is more Denisovan ancestry in eastern than in western Eurasians. For Australia, New Guinea and Oceania, the results of other studies are confirmed: these populations carry more Denisovan ancestry than mainland Eurasians. In Figure 3, the deeper the split, the more divergent the early-dispersal ancestry. Based on cross-population coalescence patterns and allele frequency correlations, the best model is one in which Australian, New Guinean and Andamanese history does not involve ancestry from an early-diverging source. No archaeological data from southern Asia or Australia were taken into consideration in this study. Thus, using only the data from this study, it emerges that Australians, New Guineans and Andamanese lack an analogous deep ancestry component. All the data on Australians seem consistent with descent from a common homogeneous population since separation from New Guineans. In addition, New Guineans, Australians and Andamanese appear as part of an eastern clade together with mainland East Asians.

The 3P-CLR method was used to scan the genome for positive selection. In the end, 38 of the largest peaks emerged as candidates for selection in the common ancestors of all modern humans. These peaks are sweeps dating to the period in which the archaeological record shows accelerating evidence of behavioral modernity. The scan did not search for sweeps on chromosome X or in repetitive or difficult-to-analyze sections of the genome.

The rate of mutation accumulation in non-Africans (grouped into America, CentralAsiaSiberia, EastAsia, WestEurasia and Oceania) and in sub-Saharan Africans (grouped into Pygmy, Khoesan and Africa) was expected to be roughly equal, but this study revealed a significant average difference of 0.5%. For this analysis, the authors restricted the samples strictly, keeping only samples processed in the same way, applying the highest level of filtering and pooling samples from the same region. One strength of this experiment is that it avoids the bias due to different heterozygosity levels in different populations (heterozygosity is higher in Africans) by using only chromosome X in males. They also map everything to chimpanzee, which is equally distant from all present-day populations. The observations differ from those of other studies regarding the rate of the CCT>CTT mutation, which in Europeans is close to that in Africans, but not in East Asians. This could be explained by a decrease in generation interval in non-Africans since separation. Previous studies (5) showed a higher X-to-autosome heterozygosity ratio in sub-Saharan Africans than in non-Africans. Mallick et al. confirmed this result by analyzing additional populations: Khoesan for sub-Saharan Africans, and New Guineans, Australians, Native Americans, Near Easterners and indigenous Siberians for non-Africans. The only exception is the Pygmies (eastern Mbuti and western Biaka), a sub-Saharan African group that shows a lower X-to-autosome heterozygosity ratio than non-Africans. The scatterplot in Figure 1b shows two primary clusters, sub-Saharan Africans and all other populations, without large differences within the groups, except for the Pygmies, who have high autosomal heterozygosity.

Comparing the two Pygmy populations with a lower X-to-autosome ratio, we can see that the Mbuti are closer to non-Africans than to Africans, even though in the neighbour-joining tree based on pairwise divergence they group with the Africans. The reduction of the X-to-autosome ratio in non-African compared to African populations could be explained by repeated waves of male mixture into an already mixed population, whereas in the Pygmy populations the strongest argument is sex-biased gene flow, which is supported by anthropological data.
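To see why the X-to-autosome ratio is sensitive to sex-specific demography, it helps to recall the textbook expectation under neutrality. With Nm breeding males and Nf breeding females, the effective sizes are Ne_A = 4·Nm·Nf / (Nm + Nf) for autosomes and Ne_X = 9·Nm·Nf / (4·Nm + 2·Nf) for chromosome X, so the expected ratio is 3/4 when the sexes are balanced. The sketch below uses these standard formulas only; it is not the model fitted by Mallick et al. and does not simulate male-biased gene flow itself, and the population sizes are arbitrary.

```python
def x_to_autosome_ratio(n_males, n_females):
    """Expected Ne_X / Ne_A under neutrality for given breeding sex numbers.

    Ne_A = 4 * Nm * Nf / (Nm + Nf)
    Ne_X = 9 * Nm * Nf / (4 * Nm + 2 * Nf)
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must have a positive number of breeders")
    ne_autosome = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return ne_x / ne_autosome


# Arbitrary breeding population sizes, for illustration only.
balanced = x_to_autosome_ratio(1000, 1000)      # equal numbers of each sex
fewer_males = x_to_autosome_ratio(500, 1000)    # female-biased breeders
fewer_females = x_to_autosome_ratio(1000, 500)  # male-biased breeders
print(balanced, fewer_males, fewer_females)
```

A ratio below 3/4, as in the male-biased case, is the direction expected when males contribute disproportionately to the gene pool, which is why sex-biased processes are a natural candidate explanation for a reduced ratio.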

In the last part, Mallick et al. show that non-Africans have accumulated more mutations. This can be explained in two ways: either the mutation rate accelerated in non-Africans, or it decelerated in Africans. Extended Data Table 1 indicates that the strong signals observed in non-African populations are not in fact due to a deceleration in Africans. The acceleration in non-Africans could have several causes: life history traits (e.g. generation interval) may have changed after modern humans dispersed out of Africa, as they reached higher latitudes or colder climates; gene conversion (GC to A or T alleles) may have been more effective in Africans; or Neanderthal admixture into the ancestors of non-Africans may have contributed lineages that had accumulated more mutations than modern humans since their separation (although there is no clear evidence for this).

Conclusion

The Simons Genome Diversity Project brings more information by studying 300 new genomes from 142 diverse populations, and shows an accelerated accumulation of mutations in non-Africans compared to Africans. In addition, the Pygmies seem to be the only African group with a low X-to-autosome diversity ratio. Regarding archaic ancestry, the highest proportion of Neanderthal ancestry was found in East Asians, and an excess of Denisovan ancestry in some South Asians compared to other Eurasians.

 

  1. Min Jou, W., Haegeman, G., Ysebaert, M., Fiers, W., Nucleotide Sequence of the Gene Coding for Bacteriophage MS2 Coat Protein, Nat., 237, 5350, pp. 82-88, 1972
  2. http://web.ornl.gov/sci/techresources/Human_Genome/project/clinton1.shtml
  3. International HapMap Consortium. The International HapMap Project. Nat., 426,789–796, 2003.
  4. Keinan, A., Mullikin, J. C., Patterson, N. & Reich, D., Accelerated genetic drift on chromosome X during the human dispersal out of Africa, Nat Genet 41, 66–70, doi:10.1038/ng.303, 2009.
  5. Mallick, S., Li, H., Lipson, M., Mathieson, I., Gymrek, M., Racimo, F., Zhao, M., Chennagiri, N., Nordenfelt, S., Tandon, A. et al., The Simons Genome Diversity Project: 300 genomes from 142 diverse populations, Nat., 538: 201–206, 2016.

Genomic analyses inform on migration events during the peopling of Eurasia
https://wp.unil.ch/genomeeee/2017/10/25/genomic-analyses-inform-on-migration-events-during-the-peopling-of-eurasia/
Wed, 25 Oct 2017 14:05:28 +0000

Introduction

In the past two decades, considerable research effort has been made to sequence the human genome and subsequently to unveil the demographic history underlying the patterns of genetic diversity we observe today across the globe. Here we discuss a recent research article by Pagani et al. (1) that addresses genomic diversity and historic migration patterns of human populations in Eurasia. The first human genome was sequenced in 2003 by the Human Genome Project (2), and larger projects rapidly followed, such as HAPMAP (3) and the 1000 Genomes Project (4), largely thanks to considerable improvements in sequencing technologies. Despite being extremely useful tools for a number of studies, these genome databases have some important sampling caveats that limit their use for addressing certain topics. Indeed, HAPMAP sampled a reduced number of populations, whereas the 1000 Genomes Project sampled a large number of populations but did not attempt to sample individuals of “pure” ancestry. For instance, the sampling in North America focused considerably on city-based individuals who were found to have a very diverse recent ancestry, thus blurring the signal of ancient colonisation history. Importantly, in the paper studied here, a considerable effort was made to sample a broad panel of 447 unrelated individuals of pure ancestry from 148 distinct populations, including in particular previously unstudied regions like Siberia and western Asia.

One of the main topics in the demographic history of humans that has long been of interest to researchers is the Out of Africa (OoA) dispersal of Anatomically Modern Humans (AMH), a turning point at which humans dispersed from Africa and colonised Eurasia and ultimately Oceania and the Americas. Among other aspects, the number of OoA events has been the focus of a discussion from which two major hypotheses have emerged. The first, arguably the most widely accepted, advocates a single OoA event, estimated at around 40 to 80 kya, which gave rise to all extant non-African populations. The second hypothesis, dubbed the multiple-dispersal model (5), considers multiple migration waves, more or less successful in settling new continents, and possible admixture events between them at various points in time, which appears to be supported by previously described fossil evidence (6,7). Interestingly, Tucci & Akey (8) argue that these theories are not necessarily mutually exclusive but rather complementary, as there could have been several failed or low-success OoA events followed by a major one that effectively colonised and persisted on most continents.

In this study, Pagani et al. argue in favour of a multiple-dispersal scenario, based on small remaining genetic contributions in the genomes of extant Papuans from an extinct AMH lineage that left Africa earlier than the main OoA 75 kya.

Genetic structure and barriers across space

To obtain a first insight into the genetic structure among the sampled genomes, Pagani et al. employed two different approaches: first, treating SNPs as independent markers (with ADMIXTURE (9)) and second, taking linkage blocks into account (with fineSTRUCTURE (10)). Both strategies identified the major biogeographic groups of populations despite differences in resolution, defining 14 main genetic clusters across the globe (Extended Data Figure 1C). The detailed output from fineSTRUCTURE was then used for a range of analyses, including spatial patterns of genetic differentiation (Figure 1), co-ancestry (Extended Data Figure 3) and demographic history reconstruction (Extended Data Figure 7).

Taking advantage of their detailed sampling from Eurasia to Sahul, the authors employed a spatially explicit framework to study genetic differences and gene flow between populations, as well as their association with environmental and geographic features at a large scale. Figure 1 illustrates this by representing the magnitude of the gradient of SNP allele frequencies across space, making it possible to pinpoint the regions with major genetic gradients, i.e. potential barriers to gene flow, specifically mountain ranges, deserts and large water masses. These were consistent in broad strokes across the different analyses, including the fineSTRUCTURE output (Figure S2.2.2-I) and the complementary migration-based EEMS (Estimating Effective Migration Surfaces; Extended Data Figure 5H). Importantly, the authors tested whether the geographic gaps in their sampling could bias the interpolation of barriers and showed that their model remained robust in the face of new gaps (Extended Data Figure 5E-G).

In a second step, Pagani et al. measured the association of the gradients of allele frequencies (termed SNPs in Figure 1) and of the fineSTRUCTURE output with three environmental variables (elevation, temperature and precipitation) to determine the relative importance of the role each played in shaping the genetic patterns observed today. As can be seen in the inset of Figure 1, SNPs indicated that elevation and precipitation had a strong spatial correlation with genetic differences, whereas fineSTRUCTURE gave higher support to precipitation and temperature. This dissimilarity is likely due to the fact that the latter, as explained above, depends on linkage patterns. Linkage blocks are physical associations of loci that recombination renders temporary, unless they are specifically maintained by selection. Thus, current neutral linkage patterns reflect relatively recent demographic history, whereas the bulk of raw allele frequencies reveal older patterns that influenced the majority of the genome. In the same vein, when only rare (i.e. more recent) variants were taken into account, the association of SNPs with elevation was reduced (Figure S2.2.2-II).

The authors conclude these observations by suggesting that elevation contributed to shaping old migration routes (as confirmed by patterns of isolation by distance; Extended Data Figure 5A-C) but has not recently impeded the persistence of human populations. On the other hand, precipitation seems to be of paramount importance as populations continue to this day to avoid inhabiting low-precipitation regions such as deserts.

Despite the credibility of the conclusions, we raised some important questions about aspects of the analysis that could bias the interpretation. First, the authors did not address the inherent correlation between the environmental variables (e.g. elevation and temperature), nor how or whether it was taken into account. Additionally, it is unclear which time period was used for temperature and precipitation, as the study spans 120 thousand years of demographic history. Both these points could change the relative importance of a given variable, and should therefore have been specified clearly in the main text.

Selection screening

The authors scanned the genomes for evidence of purifying and positive selection through a series of different approaches and identified multiple candidate loci, some of which had been identified as targets of positive selection in previous studies. Additionally, the authors highlighted different levels of inter-population purifying selection, such as on olfactory receptor genes in Asians. Interestingly, they identified significantly stronger purifying selection in pigmentation and immune response genes in Africans than in the remaining populations, with the single exception of Papuans for the pigmentation genes (Extended Data Figure 6B). However, the authors did not discuss the possible factors behind such selective forces nor how this section on selection contributed to the main storyline and conclusions of the study.

Demographic history of Papuans

The results of fineSTRUCTURE were summarised with ChromoPainter and revealed very interesting patterns of haplotype co-ancestry and length, as well as of the proportion of genome shared between populations. Foremost is the observation that African populations display the highest co-ancestry (Extended Data Figure 3) and the shortest haplotypes (Figure S2.2.1-III), confirming their status as the oldest and most diverse populations. Short haplotypes reflect multiple recombination events through time, indicating older ancestry. Thus, the most surprising observation was that Papuans have the shortest average haplotype length of all non-African populations (Figure S2.2.1-III), as well as the shortest African-inherited haplotypes (Extended Data Figure 7), which suggests an older shared ancestry with Africans than that of the remaining populations.

To investigate this further, the authors used the multiple sequentially Markovian coalescent (MSMC) to determine mean split times between the genomes of Papuans and other populations, as represented in Figure 2A. This figure depicts the proportion of the genome coalescing between populations over time (on a logarithmic axis). However, it is important to bear in mind that these calculations used a generation time of 30 years, whereas the selection scans were done with a generation time of 25 years. The latter is the most commonly used in the literature, and no justification is given for this change. This analysis revealed an old split between Papuans (represented by the Koinanbe in Figure 2A, red line) and Africans at about 90 kya, predating the split between Eurasians and Africans estimated at 75 kya (black line) and that between Papuans and Eurasians at 40 kya (blue line). Despite the possible shift in the absolute split times due to the chosen generation time, the relative differences between them are in line with Papuans harboring high amounts of short haplotypes, all suggesting an older population split than previously thought.
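The effect of the generation time can be illustrated with simple arithmetic. Assuming, as a simplification on our part, that the split times are first obtained in generations (with a fixed per-generation mutation rate) and then converted to years, a date scales linearly with the generation time, so switching from 30 to 25 years multiplies every date by 25/30. The function and dictionary names below are hypothetical, and the input values are the approximate split times quoted above.

```python
def rescale_split_time(split_kya, gen_time_used=30.0, gen_time_new=25.0):
    """Convert a split time computed with one generation time to another.

    Assumes the estimate is fixed in generations, so that time in years is
    simply (number of generations) * (years per generation).
    """
    generations = split_kya * 1000.0 / gen_time_used
    return generations * gen_time_new / 1000.0


# Approximate split times (in kya) quoted in the text, for a 30-year generation.
splits_30y = {
    "Papuan-African": 90.0,
    "Eurasian-African": 75.0,
    "Papuan-Eurasian": 40.0,
}

splits_25y = {pair: rescale_split_time(t) for pair, t in splits_30y.items()}
for pair, t in splits_25y.items():
    print(f"{pair}: {t:.1f} kya with a 25-year generation time")
```

Under this assumption all dates shrink by the same factor, so the ordering of the splits, and hence the qualitative conclusion, is unchanged; only the absolute dates move.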

To explain the demographic history behind the observed patterns, the authors propose that a previously unknown admixture event took place in Sahul, either with an archaic non-AMH population (different from Denisovans and Neanderthals) or with an AMH population resulting from an extinct OoA (xOoA). The latter dispersal, which fits into the multiple-dispersal model explained earlier, would have taken place after the split of AMH from Neanderthals but before the main OoA.

Using coalescent simulations, the authors tried to replicate the split times by adding varying amounts of admixture with a non-AMH or with an AMH from an xOoA. No simulated scenario of archaic admixture with a non-AMH could plausibly mirror the observed data. On the other hand, including in Papuans a genomic component that diverged from the main human lineage prior to the main OoA reproduced somewhat similar population split times. It is noteworthy that the main text indicates the “observed shift in the African-Papuan MSMC split curve can be qualitatively reproduced” under these conditions. In detail, the simulations yielded a 3 ky difference between the Papuan-African and Papuan-Eurasian splits (Figure S2.2.8-III), whereas the observed time gap between the two is actually 15 ky (Figure 2A). The authors suggest that they may not be able to reach a comparable gap because of complexities of the demographic model that were not simulated in this study, such as population expansions and bottlenecks. Although this explanation appears reasonable, we believe it ought to have been made clear in the main text of the article.

To discern the weight of admixture with non-AMH, the authors masked putatively introgressed Denisovan haplotypes in Papuan genomes, which did not change the split times estimated between Papuans and the other populations (dashed lines in Figure 2A). Furthermore, the authors confirmed that MSMC behaved linearly through multiple events of admixture by studying populations with known admixture proportions in time (African Americans and Central and East Asians; Extended Data Figure 8), which allowed the calculation that the hypothesized xOoA would have split from most Africans around 120 kya (Supplementary Information 2.2.4).

As a supplementary line of investigation, Pagani et al. looked at the age of African haplotypes found in Papuans but not in other Eurasian populations by assessing the density of non-African alleles (nAAs) within them. The rationale rests on the assumption that the rate at which nAAs, i.e. alleles not found among African genomes, accumulate within a haplotype of African origin in a non-African genome is proportional to the split date of that population from Africans. First, this analysis revealed that Papuans had an overall higher number of nAAs within African haplotypes along the genome than Eurasians (Figure 2B), indicating an older coalescence time with Africans. Further, the proportions of nAAs within African haplotypes in Papuans were modeled under demographic scenarios of single and multiple dispersal. The results showed that an xOoA of AMH that split from Africans around 120 kya was necessary to explain the consistently elevated proportions of nAAs in Papuans (Figure 2D).

Combining the results of the different approaches, the authors support an xOoA that split from Africans around 120 kya, and conclude by estimating that it contributes approximately 2% of contemporary Papuan genomes.

Conclusion

In this wide-ranging study, Pagani et al. used their extensive sampling to address three main topics of human evolutionary biology in Eurasia: i) detecting the main geographic barriers to gene flow, ii) identifying loci and ultimately pathways under selective pressure, and iii) proposing an extinct Out of Africa event earlier than 75 kya.

The latter was arguably the most important finding of this study with, as described above, the description of a 2% contribution in the genome of Papuans from an early xOoA. The authors provided multiple lines of compelling evidence pointing to an extinct Out of Africa expansion around 120 kya from Africans that admixed with the main OoA later in Sahul. The complete scenario is described in Extended Data Figure 10.

Nevertheless, the results presented in this paper and their associated methods are consistently poorly detailed and/or not self-explanatory. Such a paper, covering a trendy topic in a high-impact journal, should be more digestible for neophytes and even for fellow evolutionary biologists. Furthermore, the connection between the three main sets of analyses of the study (geographic barriers to gene flow, selection screening and the possibility of an xOoA) seems to be lacking, as there is no global discussion bringing all points together.

Written by Ana Paula Machado and Clément Train.

Studied papers
Tucci & Akey 2016 Population genetics: A map of human wanderlust. Nature 538: 179–180
Pagani et al 2016 Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238–242

Reference

  1. Pagani, L. et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538, 238–242 (2016).
  2. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
  3. International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).
  4. 1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  5. Lahr, M. M. & Foley, R. Multiple dispersals and modern human origins. Evolutionary Anthropology: Issues, News, and Reviews 3, 48–60 (2005).
  6. Groucutt, H. S. et al. Rethinking the dispersal of Homo sapiens out of Africa. Evol. Anthropol. 24, 149–164 (2015).
  7. Liu, W. et al. The earliest unequivocally modern humans in southern China. Nature 526, 696–699 (2015).
  8. Tucci, S. & Akey, J. M. Population genetics: A map of human wanderlust. Nature 538, 179–180 (2016).
  9. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
  10. Lawson, D. J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012).
]]>
https://wp.unil.ch/genomeeee/2017/10/25/genomic-analyses-inform-on-migration-events-during-the-peopling-of-eurasia/feed/ 2
ExAC presents a catalogue of human protein-coding genetic variation https://wp.unil.ch/genomeeee/2016/12/08/exac-presents-a-catalogue-of-human-protein-coding-genetic-variation/ https://wp.unil.ch/genomeeee/2016/12/08/exac-presents-a-catalogue-of-human-protein-coding-genetic-variation/#comments Thu, 08 Dec 2016 20:14:13 +0000 http://wp.unil.ch/genomeeee/?p=720 ResearchBlogging.org

Exploring the variability of human genomes is a key step towards the holy grail of human genetics, linking genotypes with phenotypes; it also provides insights into human evolution and history. The Exome Aggregation Consortium (ExAC) was founded for this purpose: to capture the variability of human exomes using next-generation sequencing. The first ExAC dataset of 63,358 individuals was released on the 20th of October 2014. Recently, a paper describing an updated version of the dataset was published: Analysis of protein-coding genetic variation in 60,706 humans.

The authors did a great job on the reproducibility of their downstream analyses and, more generally, on the availability of the data. All the code is well documented in a blog post and available in a GitHub repository. I plotted all the figures in this blog post myself!

Dataset

ExAC comprises almost ten times more individuals than previous datasets of a similar kind (Fig 1a). 91,000 individuals were sequenced, of which 60,706 were kept after quality filtering. The Finnish population was excluded from the European group because of the bottleneck it has gone through.

ExAC targeted individuals with various genetic backgrounds. Principal component analysis showed a very strong geographical pattern in the dataset (Fig 1b). I expected a continuum of haplotypes wherever there is no strong geographic obstacle (like the European-Latino continuum). The gap between the South Asian samples and the European samples on the PCA plot is most likely caused by the absence of samples from the Middle East. Middle Eastern samples have a colour in the legend, but no data points. Central Asians do not even have a colour.

Figure 1: Size and diversity of the ExAC dataset. a, The ExAC dataset is almost ten times bigger than datasets of a similar kind, the 1000 Genomes Project and the Exome Sequencing Project (ESP), but more importantly, it captures a far greater diversity of human populations than ESP and 1000 Genomes. b, The geographic signal of populations visualized using principal component analysis (PCA). The first principal component captures the variability of the African samples and does not tell much about the rest of the dataset (Extended Data Figure 5 in the paper), therefore the second and third principal components are shown.

A total of 45 million nucleotide positions with sufficient coverage (>10x in at least 80% of individuals) are present in ExAC. These positions correspond to 18 million possible synonymous variants (in theory), of which ExAC captures 1.4 million (7.5%).

The size of ExAC makes it possible to observe…

…mutational recurrence: 43% of synonymous de novo variants identified in previous studies were also identified in ExAC, which is the first direct evidence of mutational recurrence.

…multiple alleles: 7.9% of high-quality polymorphic sites are multiallelic, which is fairly close to the Poisson expectation (whatever it means…)

…a LOT of variants: after all the filtering, 7,404,909 high-quality variants were identified, of which 317,381 are indels. The density of variants is on average one every eight bases. 99% of the variants have a frequency below 1% and 54% of the variants are singletons (i.e. only one individual carries the variant).

…selection effects: the proportion of singletons among polymorphisms can serve as a measure of purifying selection acting on polymorphisms of a given size. Figure 2 shows that indels that do not affect the open reading frame (ORF) have significantly fewer singleton variants than indels that do affect the ORF. There is also a significant difference between ORF-affecting indels of different sizes, but we (our topic group) have not found any possible explanation for this pattern.

…saturation of alleles in CpG sites: CpG sites have very high rate of transitions, therefore capturing all possible variants is substantially easier than for other sites. A subset of 20,000 individuals of ExAC dataset shows saturation of alleles – all non-lethal possible synonymous CpG transition variants are present. ExAC is the first dataset showing a saturation of human variation.
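One way to read the "Poisson expectation" mentioned for multiallelic sites in the list above: if mutations hit sites independently and at a constant rate, the number of mutational events per site follows a Poisson distribution, so some variable sites are expected to have been hit more than once. The rough sketch below assumes the rate is set by the variant density quoted above (one variant every eight bases) and ignores the fact that two hits can produce the same allele. This is our simplification, not necessarily the calculation made in the paper.

```python
import math

def expected_multihit_fraction(variant_density):
    """Fraction of variable sites hit by two or more mutations under a Poisson model."""
    # Choose the Poisson rate so that P(at least one hit) equals the observed density.
    lam = -math.log(1.0 - variant_density)
    p_exactly_one = lam * math.exp(-lam)
    p_at_least_one = 1.0 - math.exp(-lam)
    return (p_at_least_one - p_exactly_one) / p_at_least_one

frac = expected_multihit_fraction(1 / 8)
print(f"Expected fraction of variable sites with multiple hits: {frac:.1%}")
```

Under these assumptions the expected fraction is a few percent, the same order of magnitude as the 7.9% of multiallelic sites observed.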

Figure 2: Indel frequencies with respect to size. a, Deletions are more frequent than insertions, and smaller indels are more probable than larger ones. Taking into account the greater probability of smaller indels, the frequency of indels that do not shift the open reading frame is a bit higher than the frequency of indels that do. b, The proportion of singletons among all indels (as a proxy for the strength of selection) is significantly and consistently lower for all indels that do not shift the open reading frame (-6, -3, +3, +6).

Deleterious alleles

The authors introduce the mutability-adjusted proportion of singletons (MAPS) metric as a measure of selection. This metric corrects for biases caused by differing mutation rates, allowing comparisons between categories with different mutability. A comparison across functional classes is shown in Figure 3. MAPS shows higher values for categories predicted to be deleterious by conservation-based methods.
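As we understand it, the idea behind MAPS is to calibrate the relationship between mutability and the proportion of singletons on synonymous variants, and then to report how far a functional class deviates from that calibration. The sketch below illustrates this logic with invented numbers; the data, the mutability units and the function names are hypothetical, and the details of the paper's implementation may differ.

```python
def fit_line(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return y_mean - slope * x_mean, slope

def maps(category, intercept, slope):
    """Observed minus expected proportion of singletons for one functional class."""
    n_total = sum(n for _, n, _ in category)
    observed = sum(s for _, _, s in category)
    expected = sum(n * (intercept + slope * mu) for mu, n, _ in category)
    return (observed - expected) / n_total

# Hypothetical calibration set: (mutability in arbitrary units, variants, singletons)
# for synonymous variants grouped by mutational context.
synonymous = [(1, 1000, 570), (2, 1000, 540), (5, 1000, 450), (10, 1000, 300)]
intercept, slope = fit_line(
    [mu for mu, _, _ in synonymous],
    [s / n for _, n, s in synonymous],
    [n for _, n, _ in synonymous],
)

# Hypothetical functional class with an excess of singletons in two contexts.
nonsense_like = [(1, 100, 70), (5, 100, 60)]
maps_value = maps(nonsense_like, intercept, slope)
print(f"MAPS = {maps_value:.3f}")
```

A positive value means the class carries more singletons than synonymous variants of the same mutability, which is what is expected under stronger purifying selection.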

Figure 3: MAPS values of different functional classes. MAPS is highest for nonsense substitutions and is also consistent with the PolyPhen and Combined Annotation Dependent Depletion (CADD) classifications.

Rare diseases

The average ExAC individual carries ~54 variants reported as causing Mendelian disease. Approximately 41 of these alleles were found at a frequency greater than 1%, which is therefore not expected to be caused by problems in variant calling, but by misclassification of variants in the database. The evidence for 192 previously reported variants was manually curated; of those, only 9 had sufficient evidence of disease association. High allele frequencies were found mainly in the previously underrepresented Latino and South Asian categories.

ExAC has shown the importance of a matching reference population in identifying disease-causing variants. An example is the recessive disease North American Indian childhood cirrhosis, previously reported to be caused by CIRH1A p.R565W. This variant was found in the homozygous state in four individuals of the Latino population, none of whom had a record of liver problems during childhood.

Conclusion

ExAC shows the importance of the diversity of the sampled population in capturing the real link between genotype and phenotype. Even though ExAC provides a lot of new insights, there are still populations that are underrepresented or not represented at all.

Given the richness of ExAC and the effort of authors in data sharing and availability, I guess that it will be a great resource for various analyses in the future for a lot of researchers around the globe.

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, & Exome Aggregation Consortium. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536 (7616), 285-91 PMID: 27535533

]]> https://wp.unil.ch/genomeeee/2016/12/08/exac-presents-a-catalogue-of-human-protein-coding-genetic-variation/feed/ 1 Identification of a large set of rare complete human knockouts https://wp.unil.ch/genomeeee/2016/05/16/identification-of-a-large-set-of-rare-complete-human-knockouts/ Mon, 16 May 2016 19:22:40 +0000 http://wp.unil.ch/genomeeee/?p=695 ResearchBlogging.org

High-throughput genotyping and sequencing have led to the discovery of numerous sequence variants associated with human traits and diseases. An important type of variant involved is the Loss of Function (LoF) mutation (frameshift indels, stop-gain and essential splice site variants), which is predicted to completely disrupt the function of a protein-coding gene. In the case of Mendelian recessive diseases, for the condition to occur, the LoF variants must be biallelic, i.e. affecting both copies of a gene. The affected gene is then defined as a “knockout”.

By studying the Icelandic population, authors aim to identify rare LoF mutations (Minor Allele Frequency, MAF < 2%) present in individuals participating in various disease projects. They then investigate at which frequency in the population these LoF mutations are homozygous (i.e. knockout) in the germline genome.

The Icelandic population

Iceland is well-suited for genetic studies for three main reasons. The island was colonized around the 9th century by 8-20 thousand settlers. Since then the population has grown to around 320’000 inhabitants today. The initial founder effect and rare genetic admixture make the Icelandic population a genetic isolate. In addition to this unusual genetic isolation, Iceland’s population benefits from a genealogical database containing family histories reaching centuries back in time, as well as broad access to nationwide healthcare information.

These characteristics led to the development of large-scale genomic studies of Icelanders by deCODE Genetics. This biopharmaceutical company has published various studies, including this paper, related to genetic variants and diseases in Icelanders.

Loss of function mutations and rare complete knockouts

The authors sequenced the whole genomes of 2’626 Icelanders participating in various disease projects and identified variants in protein-coding genes. These variants were annotated with their predicted impact on the gene: LoF, moderate or low impact. A total of 6’795 LoF mutations in 4’924 genes were identified, with most of these variants (6’285) being rare (MAF < 2%).

The identified LoF variants were imputed into an additional 101’584 chip-genotyped and phased Icelanders, allowing the number of knockout genes in the population to be determined. The authors found that 1’485 of the previously identified LoF mutations (MAF < 2%) contribute to the knockout of 1’171 genes and that 8’041 individuals possess at least 1 of these knockout genes. Out of these 1’171 genes, 88 had already been linked by previous studies to conditions with a recessive mode of inheritance.

Double transmission deficit of LoF variants

Because knocked-out genes should be deleterious for an organism, we expect a deficit of homozygotes for these LoF variants in the population due to embryonic/fetal, perinatal or juvenile lethality. To investigate whether such a deficit was present, the authors calculated the transmission probability of LoF variants from parents to their offspring.

Under Mendelian inheritance, the expected probability that two heterozygous parents both transmit the LoF allele to their offspring (i.e. double transmission) is 25%. However, the results show a statistically significant deficit in double transmission, with an observed double transmission probability of 23.6%.
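Whether 23.6% is significantly below 25% depends on how many offspring of two heterozygous parents were examined, a number not given in this summary. The sketch below therefore uses hypothetical sample sizes to show how a one-sided test based on the normal approximation to the binomial behaves; it is not the test used in the paper.

```python
import math

def deficit_z_test(observed_rate, expected_rate, n):
    """One-sided normal-approximation test for a rate lower than expected."""
    se = math.sqrt(expected_rate * (1.0 - expected_rate) / n)
    z = (observed_rate - expected_rate) / se
    p_value = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z) for a standard normal
    return z, p_value

# Hypothetical numbers of informative offspring; the real count is not given here.
for n in (1_000, 10_000, 100_000):
    z, p = deficit_z_test(0.236, 0.25, n)
    print(f"n = {n:>7}: z = {z:6.2f}, one-sided p = {p:.2e}")
```

With around a thousand offspring the deficit would not be distinguishable from chance, whereas with tens of thousands it would be, which is why a sample on the scale of the Icelandic dataset is needed to detect a shift this small.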

The rare LoF mutations were ranked according to the Residual Variation Intolerance Score (RVIS) percentiles and essentiality score percentiles. Both measures attempt to classify genes according to their tolerance to functional variation, with the lowest rank corresponding to genes being more sensitive to mutations. As expected, the lowest double transmission rate was found for the most sensitive genes (first percentile), suggesting that a homozygous state of LoF mutation in these genes is deleterious.

Tissue-specific expression of knockout genes

The authors investigated whether genes were more likely to be knocked out when expressed in specific tissues. Using information from previous studies on the genes that are highly expressed in 1 or more (but not all) of 27 tissues, they calculated the fraction of these genes that were knocked out in each tissue. They found that the brain and placenta were the tissues with the lowest fraction of knockout genes (3.1% and 3.9%, respectively), and that the highest fractions of biallelic LoF mutations were observed in testis, small intestine and duodenum (5.8%, 6.4%, and 6.9%, respectively).

Conclusion and Comments

The characteristics of the Icelandic population and the incredibly large sample size (~ 1/3 of the total population) allowed the authors to identify a large number of new and rare LoF mutations. Part of these mutations was shown to contribute to the knockout of an unexpectedly large number of genes in an unexpectedly large number of people. This study is the first to shed light on the astonishing number of knockouts present in human populations. In addition, by investigating the transmission probability, a deficit in homozygous loss-of-function offspring was identified, especially when LoF mutations affected essential genes. This result was expected because of the predicted deleterious effect of biallelic LoF mutations.

Besides the aforementioned interesting results of the paper, some aspects were slightly disappointing. First, I was expecting authors to focus more on the genotype-phenotype aspects. Even if they pinpoint a deficit in double transmission, suggesting deleterious consequences for the organism, authors did not discuss the function of the identified knockout genes and their effect on the phenotype. Second, the paper was not an easy read. Many results were only mentioned without additional information on the methods or data used, and it was sometimes difficult to link them with the main aim of the study. Additionally, figures were sometimes misleading because of different axis scales or incomplete legends.

Finally, the authors suggested that important tissues, such as the brain, have fewer knockouts than other tissues, writing that “genes that are highly expressed in the brain are less often completely knocked out than other genes”. However, this result is questionable, as we do not have any measure of the number of knockout genes that we would expect to be expressed in each tissue by chance alone. In other words, the brain could have a lower number of expressed knockout genes than other tissues only because the total number of genes expressed in the brain is lower. Therefore we do not know if the lower number of knockout genes in the brain is due to chance or to biological reasons.

Nevertheless, this study opens the door to understanding how many knockout genes occur without phenotypic consequences in humans, what the function and essentiality of these genes are, and what role the environment plays in the buildup of the phenotype. The classical search for genetic variants associated with a phenotype, as in GWAS, could be reversed by first identifying individuals with the same genetic variants and then precisely phenotyping them.

Sulem, P., Helgason, H., Oddson, A., Stefansson, H., Gudjonsson, S., Zink, F., Hjartarson, E., Sigurdsson, G., Jonasdottir, A., Jonasdottir, A., Sigurdsson, A., Magnusson, O., Kong, A., Helgason, A., Holm, H., Thorsteinsdottir, U., Masson, G., Gudbjartsson, D., & Stefansson, K. (2015). Identification of a large set of rare complete human knockouts Nature Genetics, 47 (5), 448-452 DOI: 10.1038/ng.3243

]]>
Reconstructing human population history : ancestry and admixture https://wp.unil.ch/genomeeee/2016/03/31/reconstructing-human-population-history-ancestry-and-admixture/ Thu, 31 Mar 2016 16:43:41 +0000 http://wp.unil.ch/genomeeee/?p=656 ResearchBlogging.org

Understanding the evolutionary history of our own species, how migration and mixture of ancestral populations have shaped modern human populations is a key question in evolutionary biology. Here we present three articles related to this topic, the first two dealing with India and the third one focusing on a single Ethiopian group :

1) Moorjani et al 2013 Genetic Evidence for Recent Population Mixture in India AJHG 93: 422–438

2) Basu et al 2016 Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure PNAS online before print

3) Van Dorp et al 2015 Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference PLOS Genetics 11(8): e1005397

All of them use genome-wide microarray data. After a brief summary of each paper, showing their similarities and differences, we discuss their methodological approaches.

Ancestral populations of India

The aim of the first two articles is to understand the history of the populations of the Indian subcontinent. The first one (Moorjani et al 2013) reports data on more than 570 individuals sampled from 73 groups living in India. The authors filtered the data by removing all individuals with evidence of recent admixture or recent ancestry from outside India. The populations included in the analysis can be classified into two linguistic categories: those speaking Indo-European languages and those speaking Dravidian languages.

Figure 1 : map of sampled populations (A) and PCA of 70 Indian groups and some non-Indian groups, highlighting the “Indian cline” (B)

Previous genetic evidence indicates that most of the groups of India descend from a mixture of two distinct ancestral populations: Ancestral North Indians (ANI) and Ancestral South Indians (ASI). Three different hypotheses exist for the date of mixture of these two populations:

1) arrival of ANI is due to migration prior to agriculture about 30,000-40,000 years ago

2) ANI arrived with the spread of agriculture, which probably began between 8,000 and 9,000 years ago

3) ANI arrived very recently (3,000-4,000 years ago) when the Indo-European languages presumably began to be spoken in India.

To test the admixed origin of Indian groups and estimate the proportion of each ancestry in each population, the authors use PCA and a statistic called the f4 ratio, which infers the mixture proportion from correlations in allele frequency differences between pairs of groups. They demonstrate that all populations are admixed and lie along an “Indian cline”, that is, a gradient going from 17% to 71% of ANI ancestry. These results correlate well with geography and language, with the northern Indo-European populations having more ANI ancestry than the southern Dravidian ones. They then use linkage disequilibrium (LD) to estimate the dates of admixture: LD blocks are longer if the admixture is younger. By fitting an exponential function to the decay of LD (as expected after a sudden cessation of admixture), they estimate that admixture occurred between 1,856 and 4,176 years ago, supporting the third hypothesis. These results correspond with demographic and cultural changes observed in India, with the establishment of the caste system leading to strong endogamy that rapidly stopped the admixture. Moreover, they found that Indo-European groups have more recent admixture dates, which could be explained by multiple waves of mixture in these populations. Another finding of this paper is that aboriginal Andaman Islanders (Onge) belong to a sister group of ASI.
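The dating logic described above can be sketched as follows: after a pulse of admixture, admixture LD between two sites separated by a genetic distance d (in Morgans) is expected to decay roughly as A·exp(−n·d), where n is the number of generations since admixture. The example below recovers n from a synthetic, noise-free curve by linear regression on the logarithm of LD. All values are hypothetical, and real analyses deal with noise, background LD and standard errors that are ignored here.

```python
import math

def fit_admixture_generations(distances_morgans, ld_values):
    """Estimate n in LD(d) = A * exp(-n * d) by least squares on log(LD)."""
    logs = [math.log(v) for v in ld_values]
    d_mean = sum(distances_morgans) / len(distances_morgans)
    l_mean = sum(logs) / len(logs)
    cov = sum((d - d_mean) * (l - l_mean) for d, l in zip(distances_morgans, logs))
    var = sum((d - d_mean) ** 2 for d in distances_morgans)
    return -cov / var  # the slope of log(LD) against distance is -n

# Synthetic curve for an admixture event 100 generations ago (hypothetical values).
true_generations = 100
distances = [0.001 * k for k in range(1, 51)]  # 0.1 to 5 centimorgans, in Morgans
ld = [0.2 * math.exp(-true_generations * d) for d in distances]

n_hat = fit_admixture_generations(distances, ld)
years_per_generation = 29  # an assumption; use the generation time of your study
print(f"Estimated {n_hat:.1f} generations, i.e. about {n_hat * years_per_generation:.0f} years")
```

A steeper decay gives a larger n, i.e. older admixture, which is the quantitative counterpart of the statement that LD blocks are longer when admixture is younger.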

The second article (Basu et al 2016) focuses on the same region and uses the same basic dataset, except that the authors kept all populations in the analyses, including the Austro-Asiatic (AA) and Tibeto-Burman (TB) speakers. They first ran ADMIXTURE on all populations and showed that islanders and mainland populations have distinct ancestral components (islanders share ancestry with Oceanic peoples like Papuans). They then ran the same analysis on mainland populations only (thus excluding populations from the Andaman and Nicobar islands). The best model was composed of four ancestral components, the ANI, the ASI as well as the ancestral AA and TB, and they found that several present-day populations are almost pure representatives of these ancestral components (figure 2).

Fig. 2 : PCA of the 18 mainland Indian populations, the four clusters identified by the authors are surrounded (A). Admixture plot of mainland Indian populations with four ancestral components (K = 4, the most parsimonious) (B).

They further estimated the time and extent of admixture using the degree of fragmentation (due to recombination) of haplotype blocks originating from a donor population in the recipient population. In each population, the distribution was again fitted with an exponential curve. They showed that admixture abruptly came to an end about 1575 years ago in upper-caste populations, most likely due to the establishment of endogamy, while tribal populations seem to have admixed until 1500-1000 years ago.

In short, although they share a common topic, these two papers propose divergent versions of the history of the Indian population: while the first considers a priori that Austro-Asiatic and Tibeto-Burman speakers are not components of the ancestral populations of India and only focuses on the mixture between the ANI and ASI components, the second paper claims that the genetic structure of the Indian population is the result of admixture events between four ancestral components. However, the two views converge on the idea that admixture was a common phenomenon in India that ceased rapidly with the establishment of the caste system, which enforced endogamy.

Common origin of two subgroups of Ari people

The third paper investigates the history of human populations at a smaller scale, focusing on a single ethnic group, the Ari people of Ethiopia. The Ari are composed of two socially and genetically distinct subgroups: the cultivators (Aric) and the blacksmiths (Arib). Anthropologists have proposed two alternative hypotheses to explain the division of the Ari: under the remnant hypothesis (RN), the blacksmiths are the remnants of an indigenous group that was assimilated by the more recently arrived cultivators, whereas the marginalization (MA) hypothesis proposes that the two groups share a common ancestry but the blacksmiths were recently marginalized because of their activity. While anthropologists traditionally favour the MA hypothesis, recent genetic studies have provided support for the RN hypothesis. In this article the authors use a new methodology on the same genetic dataset to bring evidence for the MA hypothesis. They show that when ADMIXTURE, fineSTRUCTURE or CHROMOPAINTER analyses are run on a complete dataset of 237 samples from 12 Ethiopian and neighbouring populations, the Arib are grouped into a single homogeneous cluster. But when the patterns of haplotype sharing are inferred by modelling the Ari as a genetic mixture of all other groups, excluding themselves, the genetic differences between Arib and Aric disappear. In fact, their analyses reveal that the two Ari groups show the same mixture events with non-Ari populations (figure 3).

Fig. 3 : Top :  Inferred ancestry composition of recipient groups when forming each group as mixtures of (a) all sampled groups, (b) all sampled groups except the Ari. Bottom : TVD XY values comparing the painting profiles for all pairwise comparisons of groups X, Y under each analysis, with scale at far right. Ari groups (ARIb/ARIc) are highlighted with black outlines in each plot.
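The TVD values in the bottom panels are total variation distances between painting profiles, i.e. half the sum of the absolute differences between the ancestry proportions that two groups receive from each donor. A minimal sketch with invented profiles (the donor names and proportions are hypothetical):

```python
def tvd(profile_x, profile_y):
    """Total variation distance between two painting profiles (dicts of proportions)."""
    donors = set(profile_x) | set(profile_y)
    return 0.5 * sum(abs(profile_x.get(d, 0.0) - profile_y.get(d, 0.0)) for d in donors)

# Hypothetical painting profiles: share of each group's genome copied from each donor.
ari_b = {"donor_A": 0.50, "donor_B": 0.30, "donor_C": 0.20}
ari_c = {"donor_A": 0.45, "donor_B": 0.35, "donor_C": 0.20}
other = {"donor_A": 0.10, "donor_B": 0.20, "donor_C": 0.70}

print(f"TVD(ARIb, ARIc)  = {tvd(ari_b, ari_c):.2f}")
print(f"TVD(ARIb, other) = {tvd(ari_b, other):.2f}")
```

TVD ranges from 0 (identical profiles) to 1 (no donor in common), so a low value between the two Ari groups once self-copying is excluded is what the shared-ancestry scenario predicts.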

To explain this pattern they propose that the genetic differentiation of the blacksmith is due to a bottleneck effect. Their hypothesis is supported by the fact that identity-by-descent (IBD) is stronger in blacksmiths than cultivators which is consistent with reduced genetic diversity in the blacksmiths. Using the D-statistic, they also show that the Arib and Aric are more closely related to each other than they are to any other Ethiopian group. Therefore they conclude that the observed genetic differentiation between the Arib and Aric does not represent separate ancestry but is rather the result of strong genetic drift due to a bottleneck effect induced by the social marginalization of the blacksmiths.

Methodological discussion

What stands out from reading these three articles is that the selection of a proper methodology is crucial within a hypothesis-testing framework. While the two articles on Indian populations use the same initial dataset, the way they filter and analyse it results in very different conclusions. The inclusion or exclusion of some populations from an admixture analysis, or the choice of outgroup for an f4 ratio estimation, directly impacts the output of these analyses and can lead the authors to tell very different stories. Before rejecting or putting forward a hypothesis, it is important to be aware of the limitations of the method used to produce the results. For example, the authors of the second paper on India’s ancestral populations claim to demonstrate a more complex history than shown in the first paper, but their result is based solely on a clustering analysis (as implemented in software such as STRUCTURE or ADMIXTURE).

The basic principle of STRUCTURE/ADMIXTURE-like programs is to take the K most different groups of the dataset, consider them as the pure ancestral groups and force the others to be a combination of those. This means that the results depend on the populations and the number of clusters K that are given to the program. There are different methods to determine which K provides the best fit to the data (cross-validation error, delta K …) but in numerous cases the inferred mixture proportions are wrong. Only in very simple cases, like the genetic history of African Americans (well explained in Daniel Falush’s blog), which involves three clearly defined and very differentiated ancestral populations (West Africans, Europeans and Native Americans), can we be confident in the results of the clustering analysis.

Fig. 4 : Admixture plot of the African American population (ASW) with its three ancestral populations, West Africans (YRI), Europeans (CEU) and Native Americans (MEX). Source : Daniel Falush’s blog

But in many cases the history is more complex and no current population actually corresponds to a pure ancestral population, because of multiple waves of admixture. In this case the most differentiated groups are simply the most extreme groups, which does not mean that they are pure or ancestral. This is well explained in Razib Khan’s blog using the simple example of Uygurs and Europeans: it is known that the Uygurs are a recently mixed group (between Europeans and Asians), but if K is fixed to 2 with only Uygurs and Europeans in the dataset, STRUCTURE will form two different clusters at 100% levels, one with the Uygurs and one with the Europeans. This is why, in the second paper, the apparently pure AAA, ATB, ASI and ANI populations and all the implications drawn from the clustering are probably meaningless. In fact, when using the f4 ratio (as in the first paper), all groups are found to be admixed to a certain extent (with the smallest rate of admixture being 17%).
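
The Uygur example can be mimicked with a deliberately simplified stand-in for what clustering programs do. The sketch below is not the STRUCTURE or ADMIXTURE algorithm: it only fits a target population as a least-squares mixture of two reference groups that the user declares to be "pure". All allele frequencies are made up, and the function name is an assumption.

```python
def mixture_fraction(target, source_a, source_b):
    """Least-squares share of source_a in target, clipped to [0, 1].

    Models target = f * source_a + (1 - f) * source_b across SNPs.
    This is a toy stand-in, not the STRUCTURE/ADMIXTURE likelihood model.
    """
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("the two sources have identical frequencies")
    return min(1.0, max(0.0, num / den))


# Hypothetical allele frequencies at three SNPs (made up for illustration).
european = [0.25, 0.75, 0.5]
asian = [0.75, 0.25, 0.0]
uygur = [0.5 * e + 0.5 * a for e, a in zip(european, asian)]

# With both true sources in the dataset, the 50/50 mixture is recovered.
with_true_sources = mixture_fraction(uygur, asian, european)

# If Uygurs are themselves used as one of the two "pure" groups,
# they come out as 100% of their own cluster and the mixture is invisible.
uygur_as_source = mixture_fraction(uygur, uygur, european)
```

The point of the sketch is only that the estimated proportions are relative to whichever groups serve as references; they say nothing about whether those references are themselves unadmixed.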

This critique of clustering analysis is a key element of the study on the Ari people, where the authors point out that results from such methods should not be taken for granted but interpreted with caution. Indeed, this kind of method cannot discriminate between the alternative scenarios of recent mixture of separate populations and shared ancestry followed by population divergence. Therefore support for one of these hypotheses should rely on additional tests. Instead of directly accepting the story suggested by a clustering analysis, a more reasonable workflow would be to use other methods to address the specific implications of each hypothesis. This is exactly what is done in the third article where, as we previously explained, the authors constrain the mixture analysis by forbidding self-ancestry in the two groups of interest, which removes the confounding effect of a recent bottleneck. In such complex cases, combining PCA and STRUCTURE-like analyses with F-statistics and simulations allows a more robust conclusion to be drawn. Indeed, statistics such as Fst or Dxy, which estimate the genetic differentiation between two populations, can be simulated under alternative scenarios representing competing hypotheses (figure 5). These simulated statistics can subsequently be compared with the ones estimated from real data to favour one hypothesis over the other. Simulations can also give an idea of how difficult it is to discriminate between the different hypotheses, which helps to avoid over-interpretation of the results. In the second paper, where the authors put forward a new hypothesis, radically different from the classical hypothesis of anthropology and other genetic studies, additional tests like these seem necessary to strengthen their conclusions.
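
The idea of simulating a differentiation statistic under competing scenarios can be sketched with a toy Wright-Fisher model. Everything below is an assumption made for illustration (population sizes, number of generations, number of loci, the two scenario names); it is far simpler than the simulations used in the Ari study and only shows the general workflow.

```python
import random


def drift(p, n_diploid, generations, rng):
    """Wright-Fisher drift: resample 2N gene copies each generation."""
    copies = 2 * n_diploid
    for _ in range(generations):
        p = sum(rng.random() < p for _ in range(copies)) / copies
    return p


def fst(freqs_a, freqs_b):
    """Hudson-style Fst as a ratio of sums over loci.

    Uses population frequencies directly, so no sample-size correction.
    """
    num = sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(freqs_a, freqs_b))
    return num / den if den > 0 else 0.0


def simulate_fst(n_a, n_b, generations=20, loci=100, seed=1):
    """Split one ancestral population in two and let each drift independently."""
    rng = random.Random(seed)
    freqs_a, freqs_b = [], []
    for _ in range(loci):
        p0 = rng.uniform(0.2, 0.8)  # ancestral frequency, arbitrary range
        freqs_a.append(drift(p0, n_a, generations, rng))
        freqs_b.append(drift(p0, n_b, generations, rng))
    return fst(freqs_a, freqs_b)


# Scenario 1 (hypothetical): both daughter populations stay large.
fst_no_bottleneck = simulate_fst(n_a=200, n_b=200)
# Scenario 2 (hypothetical): one daughter population goes through a bottleneck.
fst_bottleneck = simulate_fst(n_a=200, n_b=20)
```

In a real analysis one would repeat each scenario many times, obtain a distribution of the statistic under each hypothesis, and ask which distribution contains the value observed in the data. Overlapping distributions would indicate that the statistic cannot discriminate between the hypotheses.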

Fig. 5 : Differences in inferred ancestry under analyses A and B using F XY from real data on the top and from simulated data on the bottom (under the MA and RN hypotheses). Here the MA hypothesis is clearly the closest to the real data.

Although it was not mentioned in any of the articles, the quality of the data and the way they were obtained, i.e. the genotyping methodology, should also be treated with caution. Indeed, all three studies use microarrays designed from European populations. These microarrays consist of thousands of DNA spots, each containing a predefined sequence known to be polymorphic in Europeans, and only the complementary sequence can hybridize to a spot and be detected. Using these microarrays to study the history of non-European populations may therefore be problematic, as only SNPs that are variable in Europeans will be targeted, probably leading to the exclusion of meaningful information for non-European populations. Today, with next-generation sequencing (NGS), there are many alternatives, such as RAD sequencing or whole-genome sequencing, that allow tens of thousands of SNPs that are not predefined to be genotyped.

Conclusion

To conclude, the take home messages from these three articles are :

– Social systems leading to endogamy can rapidly and dramatically modify the genetic structure of human populations.

– It is difficult to reconstruct the ancestry of human populations, especially when they involve a complex process with multiple waves of admixture.

– Clustering methods are designed to find structure in a genetic dataset, but that structure does not necessarily reflect real shared ancestry. Further tests using other methods are required to robustly support one hypothesis.

]]>
The African Genome Variation Project shapes medical genetics in Africa. https://wp.unil.ch/genomeeee/2015/03/22/the-african-genome-variation-project-shapes-medical-genetics-in-africa/ https://wp.unil.ch/genomeeee/2015/03/22/the-african-genome-variation-project-shapes-medical-genetics-in-africa/#comments Sun, 22 Mar 2015 18:30:58 +0000 http://wp.unil.ch/genomeeee/?p=504 ResearchBlogging.org 

Although Africa is the world’s most genetically diverse continent, only a handful of studies have attempted to understand the genetic risk of disease in African populations. This study sheds light not only on genetic diversity, helping us learn more about the variants associated with malaria and hypertension, but also on population history across sub-Saharan Africa. Besides a comprehensive map of African variants obtained from the genotypes of 1,481 individuals and the whole-genome sequences of 320 individuals, the authors propose an array design suited to capturing variants in African populations.

Summary and comments of the paper

Population structure in SSA. Comparing ~2.2 million variants across 18 ethno-linguistic groups from sub-Saharan Africa (SSA), the authors found modest differentiation among SSA populations (mean pairwise Fst = 0.019) and among Niger-Congo language groups (mean pairwise Fst = 0.009). In the article, the authors suggest that the modest differentiation among Niger-Congo language groups is evidence for the ‘Bantu expansion’. However, Fig. 1a shows that the samples mostly come from areas near the western, eastern and southern African coasts rather than from the interior of the continent, where the Bantu expansion occurred, which suggests a sampling bias.

Fig 1. a, 18 African populations studied in the AGVP including 2 populations from the 1000 Genomes Project. (The term ‘Ethiopia’ encompasses the Oromo, Amhara and Somali ethno-linguistic groups.) b,c, ADMIXTURE analysis of these 18 populations alone (n = 1,481) (b) and in a global context (n = 3,904) (c)

Furthermore, the authors found a high proportion of unshared and novel variants in the Ethiopian populations, which underlines the importance of sequencing individuals from across Africa.

Extending the analysis of population history in Africa, the authors performed a PCA of African populations. The results suggested Eurasian gene flow and possible hunter-gatherer (HG) ancestry. In support of the PCA, the unsupervised ADMIXTURE analysis (Fig. 1c) showed similar results, with Eurasian admixture in the Ethiopian populations (Oromo, Amhara, Somali) and HG admixture in the Biaka and Mbuti rainforest HG. It is also noticeable that Western Europeans and Central/Eastern Asians are well separated, indicating two branches of migration. The ADMIXTURE analysis also highlighted the heterogeneity of the American populations. The authors found that the most probable number of clusters of world populations in the ADMIXTURE analysis is k = 18. Unfortunately, the supplementary data do not clearly show that the CV error was lowest for k = 18.

The authors examined the effect of gene flow among African populations in more detail by masking Eurasian admixture. The results showed reduced population differentiation, suggesting that Eurasian admixture has a significant impact on those populations. Nevertheless, the authors did not discuss other possible explanations, such as allele surfing or allele fixation.

Population admixture in SSA. Using three-population tests (f3 statistics), the authors identified the greatest proportion of Eurasian admixture in East Africa, and HG admixture among the Zulu and Sotho populations in South Africa. In Fig. 2, the authors show that ancient Eurasian admixture appears in the Yoruba population (~7,500-10,500 years ago), which lends support to earlier reports of Neanderthal ancestry in this African population.
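
For readers unfamiliar with the three-population test, the following minimal sketch shows the core quantity: f3(C; A, B) is the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies. The function name and the frequencies are made up for illustration, and the sketch leaves out the finite-sample correction and the block-jackknife significance test used in practice.

```python
def f3(target, source_a, source_b):
    """Mean of (c - a) * (c - b) across SNPs, with no finite-sample correction."""
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at three SNPs.
pop_a = [0.1, 0.9, 0.4]
pop_b = [0.7, 0.3, 0.8]
# A target built as an equal mixture of the two sources.
mixed = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]

f3_mixed_target = f3(mixed, pop_a, pop_b)   # negative: signal of admixture
f3_source_target = f3(pop_a, mixed, pop_b)  # positive: no signal for pop_a
```

A significantly negative f3 is evidence that the target is admixed between populations related to A and B. A positive value is not informative on its own, since strong drift in the target can mask admixture.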

Fig 2. Dating and proportion of Eurasian and HG admixture among African populations.

Besides the HG admixture observed in South African samples, HG admixture was also detected in the Igbo population and, more recently, in East Africa. The HG admixture in West and South Africa is related to Khoe-San populations, while that in East Africa is related to Mbuti rainforest HG populations and dates to ~3,000 years ago.

Moreover, Fig. 2 shows an overlap of Eurasian and HG admixture in East African populations (Barundi, Banyarwanda and Baganda), both dating to ~2,400-3,900 years ago. However, the article does not comment on whether these populations carry both admixtures, or on how this would be possible.

Positive selection in SSA. The authors looked for highly differentiated SNPs using two approaches, in order to detect positive selection driven by local adaptation.

The first approach was to look for highly differentiated SNPs between Eurasian and African populations. Besides some other locus-specific differentiation, they found evidence of differentiation in the CR1 gene (complement receptor 1), previously reported to be implicated in malaria susceptibility. The authors also identified locus-specific differentiation within genes active in osmoregulation, and more specifically in hypertension. Given these results, the authors speculate that changes in these gene regions may underlie differences in salt sensitivity and hypertension in sub-Saharan African populations.

The second approach looked for highly differentiated SNPs among African populations when Eurasian admixture was masked. It should be noted that most of the Eurasian admixture is found in Ethiopian populations (as seen in Fig. 2 and Fig. 1c). For that reason, masking Eurasian admixture might mainly affect the Ethiopian populations, and the result certainly cannot be generalized to other African populations that might actually have undergone local adaptation. Consequently, the quote from the paper “This suggests that a large proportion of differentiation observed among African populations could be due to Euroasian admixture, rather than adaptation to selective forces.” should be taken cum grano salis. A speculative explanation for the Eurasian admixture observed in the Ethiopian populations is that nomadic groups survived the migration from the north, crossing the Sahara to settle in present-day East Africa.

However, the analysis of African populations with masked Eurasian admixture revealed 56 loci, together with a highly differentiated variant in the CSK gene region involved in hypertension. The variant in the CSK gene region showed complete linkage disequilibrium (LD) with another risk allele that correlates with latitude, suggesting local adaptation to temperature as a mechanism underlying hypertension.

Next, the authors compared populations from endemic and non-endemic regions to identify loci related to infectious diseases. They identified a set of signals in gene regions associated with malaria, Lassa fever, trypanosomiasis and trachoma.

Fig 3. Improvement in imputation accuracy with the AGVP WGS panel.

Designing medical genetics studies in Africa. Given the high genetic diversity of the African continent, the importance of building a reference panel across African populations cannot be stressed enough, since it would shed light on most of the world’s variation. Current reference panels, such as HapMap and the 1000 Genomes Project, were mostly built on European, American and Asian populations and miss many African polymorphisms. This makes it more difficult to recognize polymorphic biomarkers associated with a spectrum of diseases in African populations.

Therefore, the authors investigated the imputation accuracy for two African populations using two different reference panels: the 1000 Genomes Project panel, and a ‘merged’ panel combining the 1000 Genomes Project with 320 whole-genome-sequenced African individuals. They observed a slight improvement in imputation accuracy for the Sotho and Igbo populations when using the ‘merged’ reference panel (Fig. 3).
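
As an illustration of how imputation accuracy can be quantified, the sketch below computes the squared Pearson correlation between true genotypes (coded 0, 1 or 2 copies of an allele) and imputed dosages at one site. This is one common measure and is assumed here for illustration; the exact metric and pipeline used in the paper may differ, and the genotype values are made up.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return 0.0  # correlation is undefined for a monomorphic site
    return cov * cov / (var_t * var_d)


# Hypothetical data for four individuals at one SNP.
truth = [0, 1, 2, 1]
good_panel = [0.0, 1.0, 2.0, 1.0]  # dosages from a well-matched reference panel
poor_panel = [0.5, 1.0, 1.0, 1.5]  # dosages from a poorly matched panel

r2_good = imputation_r2(truth, good_panel)
r2_poor = imputation_r2(truth, poor_panel)
```

Comparing such a score between two reference panels, site by site or binned by allele frequency, is the kind of comparison summarized in Fig. 3.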

Moreover, the authors compared the usefulness of current array chips in order to define the most favorable array design for capturing African variants. Their results showed that the HumanOmni2.5M array efficiently captures >80% of common variation. Surprisingly, the authors mentioned neither the future possibilities of whole-genome sequencing in Africa, which plays a crucial role in modern research, nor the drawback of noisy microarray data. The falling cost of sequencing technology and its ongoing development would certainly bring more precise results.

Conclusion

In spite of the nicely presented results and plentiful supplementary data, the article leaves room for a lot of speculation and discussion about the migration of African populations. Furthermore, the PCA plots in the extended and supplementary data are hard to read because of the many different symbols and colors. A simpler representation of the PCA would help to distinguish the patterns among African populations. However, the study provides an invaluable resource of variant association information for several diseases, which should increasingly improve medical diagnostics in African populations.

 

Gurdasani, D., Carstensen, T., Tekola-Ayele, F., Pagani, L., Tachmazidou, I., Hatzikotoulas, K., Karthikeyan, S., Iles, L., Pollard, M., Choudhury, A., Ritchie, G., Xue, Y., Asimit, J., Nsubuga, R., Young, E., Pomilla, C., Kivinen, K., Rockett, K., Kamali, A., Doumatey, A., Asiki, G., Seeley, J., Sisay-Joof, F., Jallow, M., Tollman, S., Mekonnen, E., Ekong, R., Oljira, T., Bradman, N., Bojang, K., Ramsay, M., Adeyemo, A., Bekele, E., Motala, A., Norris, S., Pirie, F., Kaleebu, P., Kwiatkowski, D., Tyler-Smith, C., Rotimi, C., Zeggini, E., & Sandhu, M. (2014). The African Genome Variation Project shapes medical genetics in Africa Nature, 517 (7534), 327-332 DOI: 10.1038/nature13997

]]>
Gibbon genome and the fast karyotype evolution of small apes https://wp.unil.ch/genomeeee/2015/01/18/gibbon-genome-and-the-fast-karyotype-evolution-of-small-apes-3/ Sun, 18 Jan 2015 22:11:13 +0000 http://wp.unil.ch/genomeeee/?p=447 ResearchBlogging.org

All contents refer to the original paper (Carbone et al. Nature. 2014 Sep 11;513(7517):195-201)

Summary and personal comments

This paper studies the gibbon karyotype from the perspective of its divergent evolution from ancestral primates. Gibbons, small apes living in South-East Asia, differ from other primates, such as great apes and Old World monkeys, by a surprising number of chromosomal rearrangements. The authors aimed to study the mechanisms underlying such remarkable plasticity in the gibbon genome.

1) The authors sequenced and assembled the genome of a female white-cheeked gibbon (Nomascus leucogenys), ordered in 26 chromosomes (against the human reference), and analyzed gibbon-human synteny breakpoints (i.e. disruptions of synteny, the physical co-localization of genetic loci on the same chromosome in gibbon and human).

Fig 2a shows Oxford plots for human (y axis) versus other primate chromosomes (x axis), expressed in terms of collinear blocks of > 10 Mb. It is evident from the plots that, compared to other primates, gibbons present the highest rate of chromosome rearrangements, visible as a scattered rather than a linear plot (Fig 2a), in particular large-scale reshuffling (as shown in Fig 2b, right part of the graphic). Examples of synteny breakpoints, such as chromosomal inversions, are shown in Fig 2c.

2) The authors analyzed various transposable elements in different primates and found that one retrotransposon, the LAVA element, is exclusive to the gibbon genome. Intragenic LAVA insertions are observed particularly in genes that are important for cell division and chromosome segregation, as shown in Table 1 of the Extended Data. The authors hypothesized that antisense insertions of LAVA elements into introns could cause early transcription termination by polyadenylation. They provided evidence supporting their hypothesis with a luciferase reporter construct: LAVA insertions into the luciferase gene caused early termination of luciferase transcription, as suggested by lower enzymatic activity (Fig 3 b right).

3) Moreover, the authors explored LAVA families across the 4 gibbon genera in order to study the evolution of the gibbon lineage. They identified 22 LAVA subfamilies and used a maximum likelihood method to estimate the age of LAVA and to place the divergence of gibbons from great apes at 16.8 Myr ago. Furthermore, they performed WGS of the 4 gibbon genera (2 individuals per genus) and constructed the most probable gene trees using UPGMA (unweighted pair group method with arithmetic mean) and a coalescent-based analysis (ABC), as shown in Fig 4a. Fig 5 from the Extended Data shows the 15 top UPGMA trees for 100 kb non-overlapping sliding windows of the gibbon genome. Interestingly, the most probable bifurcating species topology suggests a strikingly rapid speciation process for all 4 gibbon genera, with speciation beginning around 5 Myr ago (Fig 4b).

4) Finally, in order to investigate the features of such adaptive evolution, the authors analyzed genomic regions that could have undergone lineage-specific modifications. They identified 240 regions with gibbon-specific accelerated substitution rates (gibARs) that were not only intragenic but also co-localized with LAVA elements. They also identified genes (TBX5, COL1A1, CHRNA1, SNX19) that might have undergone positive selection related to gibbon-specific traits, such as longer arms or stronger shoulder/elbow muscles compared to humans.

This paper highlights important characteristics of gibbon genomes and provides novel insights into the mechanisms of genome plasticity in these small apes. Nevertheless, it remains largely unclear under which circumstances gibbons underwent such accelerated evolution, and how speciation and the fixation of specific traits could have occurred so rapidly.

Carbone, L., Alan Harris, R., Gnerre, S., Veeramah, K., Lorente-Galdos, B., Huddleston, J., Meyer, T., Herrero, J., Roos, C., Aken, B., Anaclerio, F., Archidiacono, N., Baker, C., Barrell, D., Batzer, M., Beal, K., Blancher, A., Bohrson, C., Brameier, M., Campbell, M., Capozzi, O., Casola, C., Chiatante, G., Cree, A., Damert, A., de Jong, P., Dumas, L., Fernandez-Callejo, M., Flicek, P., Fuchs, N., Gut, I., Gut, M., Hahn, M., Hernandez-Rodriguez, J., Hillier, L., Hubley, R., Ianc, B., Izsvák, Z., Jablonski, N., Johnstone, L., Karimpour-Fard, A., Konkel, M., Kostka, D., Lazar, N., Lee, S., Lewis, L., Liu, Y., Locke, D., Mallick, S., Mendez, F., Muffato, M., Nazareth, L., Nevonen, K., O’Bleness, M., Ochis, C., Odom, D., Pollard, K., Quilez, J., Reich, D., Rocchi, M., Schumann, G., Searle, S., Sikela, J., Skollar, G., Smit, A., Sonmez, K., Hallers, B., Terhune, E., Thomas, G., Ullmer, B., Ventura, M., Walker, J., Wall, J., Walter, L., Ward, M., Wheelan, S., Whelan, C., White, S., Wilhelm, L., Woerner, A., Yandell, M., Zhu, B., Hammer, M., Marques-Bonet, T., Eichler, E., Fulton, L., Fronick, C., Muzny, D., Warren, W., Worley, K., Rogers, J., Wilson, R., & Gibbs, R. (2014). Gibbon genome and the fast karyotype evolution of small apes Nature, 513 (7517), 195-201 DOI: 10.1038/nature13679

]]>
The genetics of Mexico recapitulates Native American substructure and affects biomedical traits https://wp.unil.ch/genomeeee/2015/01/04/the-genetics-of-mexico-recapitulates-native-american-substructure-and-affects-biomedical-traits/ Sun, 04 Jan 2015 00:16:01 +0000 http://wp.unil.ch/genomeeee/?p=431 ResearchBlogging.org
Mexico, home to many cultures such as the Olmec, the Toltec, the Maya and the Aztec, was conquered and colonized by the Spanish Empire in 1521. The country harbors a large reservoir of pre-Columbian diversity, which has contributed genetically to today’s population.

In a recent paper, Moreno-Estrada et al. 2014 performed a detailed study of Mexican genetic diversity. The results showed genetic stratification among indigenous populations and an association between subcontinental ancestry and lung function.

In the first part of the study, to estimate genetic diversity, the researchers examined autosomal single-nucleotide polymorphisms in more than 500 Native Mexican individuals from all around Mexico. Statistical analysis of the genomic data showed that some populations within Mexico are more differentiated from one another than European populations are from East Asian populations. This extreme differentiation is thought to be the result of isolation followed by a bottleneck and small effective population sizes.

The data were analyzed in various ways (ROH and IBD analysis, PCA etc.) and revealed the population substructure of Mexico. In all of the analyses, the results confirmed that the Seri (northernmost) and the Lacandon (southernmost) have the highest level of differentiation. Also, the differentiation between Seri and Lacandon was greater than the average differentiation between human populations. The relationships between the other populations were in accordance with geography, migration and language history. When African and European genetic data were included in the analysis of Mexicans, it was shown that most individuals have a genetic composition of Native and European ancestry. Further analysis indicated that the ancient Native American substructure was recapitulated even after postcolonial admixture.

In the second part of the study, Moreno-Estrada et al. 2014 investigated potential biomedical applications of the information on genetic substructure. Previous studies indicated that forced expiratory volume in 1 second (FEV1) can be indicative of pulmonary disease, and another study suggested that the proportion of European ancestry was associated with FEV1 in Mexicans. The researchers measured lung function in Mexican and Mexican-American children with asthma and correlated these findings with native ancestry. The results showed a 7.3% change in FEV1 moving from Sonora to Yucatan, and the researchers proposed that native ancestry alone could have effects on lung function in admixed individuals within Mexico.

Personal Comments

This paper provides novel insights into Mexican genetic diversity and proposes biomedical applications of genetic data. The sampling locations cover most of the country, and the analysis of the data with a variety of methods gives the reader confidence in the results. The paper is easy to follow and the figures are quite helpful.

However, I think there is a critical point that needs to be discussed from a medical point of view. As far as I know, asthma is a complex disease thought to be caused by both genetic and environmental factors. In this study, I could not find any information about the developmental and medical history of the patients. I think this is critical because of the heterogeneous geography of the country. Where were they raised – in volcano towns, on Pacific shores, in the Sumidero Canyon, at Laguna Salada (-10m) or on the piedmont plains of Pico de Orizaba (5636m)? Did their mothers smoke during pregnancy? Were they born in Mexico City – the city named “the most polluted city on the planet” by the United Nations in 1992? I hope the researchers have already checked this type of information and found it unnecessary to include.

Nevertheless, this is an interesting paper that shows the genetic history of Mexico before and after the 16th century. I recommend reading this paper and discussing it with a medical doctor 😉

Moreno-Estrada, A., Gignoux, C., Fernandez-Lopez, J., Zakharia, F., Sikora, M., Contreras, A., Acuna-Alonzo, V., Sandoval, K., Eng, C., Romero-Hidalgo, S., Ortiz-Tello, P., Robles, V., Kenny, E., Nuno-Arana, I., Barquera-Lozano, R., Macin-Perez, G., Granados-Arriola, J., Huntsman, S., Galanter, J., Via, M., Ford, J., Chapela, R., Rodriguez-Cintron, W., Rodriguez-Santana, J., Romieu, I., Sienra-Monge, J., Navarro, B., London, S., Ruiz-Linares, A., Garcia-Herrera, R., Estrada, K., Hidalgo-Miranda, A., Jimenez-Sanchez, G., Carnevale, A., Soberon, X., Canizales-Quinteros, S., Rangel-Villalobos, H., Silva-Zolezzi, I., Burchard, E., & Bustamante, C. (2014). The genetics of Mexico recapitulates Native American substructure and affects biomedical traits Science, 344 (6189), 1280-1285 DOI: 10.1126/science.1251688

]]>
Gibbon genome and the fast karyotype evolution of small apes https://wp.unil.ch/genomeeee/2014/12/18/gibbon-genome-and-the-fast-karyotype-evolution-of-small-apes-2/ Thu, 18 Dec 2014 08:33:04 +0000 http://wp.unil.ch/genomeeee/?p=422 ResearchBlogging.org

Gibbons are small apes living in southeast Asia whose lineage branched off between Old World monkeys and great apes, and whose most distinctive feature is their high rate of evolutionary chromosomal rearrangement.

The aim of this study was threefold: First, the authors looked into the mechanisms that could explain the extraordinary rate of chromosomal rearrangement of gibbons. Second, they explored their evolutionary history to shed light into the timing and order of splitting of the gibbon genera. Third, they looked into the functional evolution of genes that might be associated with gibbon-specific adaptations.

To do so, they sequenced and assembled the genome of the white-cheeked gibbon (Nomascus leucogenys), showing that the quality and statistics of the assembled genome were comparable to those of other primates (Table 1 and Fig.S1).

 

Chromosomal rearrangement and LAVA insertions

Chromosomal rearrangement was confirmed by comparing the karyotype of the assembled Gibbon genome (Nleu1.0) to that of human. Figure 2A shows the extraordinarily high number of rearrangements compared to other primates. Furthermore these reshuffling events affect long stretches of chromosomes (displayed in Fig.2A are collinear blocks larger than 10Mb), whereas short-scale rearrangement events occur at levels comparable to other primates (Fig.2B).

Since the four gibbon genera in this study differ among themselves in chromosome number (ranging from 38 to 52), it would be interesting to also have a global view of the large-scale chromosomal rearrangements of the other genera compared to human, as well as of the differences in karyotype among the four genera.

Next the authors classified the 94 identified gibbon-human synteny breakpoints in two classes, depending on whether the breakpoint could be defined at base-pair level or at interval level (exemplified in Fig.2C). In the latter case, authors observed that repetitive sequences tend to accumulate at the synteny intervals.

In order to investigate the possible mechanism underlying the increased rate of chromosomal reshuffling, Carbone et al. searched for LAVA insertions in the gibbon genome. LAVA elements are retrotransposons unique to gibbons, with a structure that combines parts of other repeats (Fig. 3A). More than 1200 functional LAVA insertions were found in the assembled genome, of which a significant proportion overlap with genes related to chromosome segregation. Moreover LAVA elements were found to lie within introns and mostly in the antisense orientation.

In a series of reporter assays using a luciferase construct in which the transcription termination site has been replaced with the 3’ end of LAVA elements (LAVA_E or LAVA_F, Fig.3B) in antisense orientation, the authors showed that the termination site provided by the LAVA element can cause the premature termination of the transcript (Fig.3B). However this was the case for only one of the two constructs (LAVA_F but not LAVA_E). Given the presence of several subfamilies of LAVA elements (as illustrated in Fig.3C) it would then be interesting to see if their hypothesis of intronic antisense LAVA insertions causing early transcription termination in genes related to chromosome segregation holds for more of these elements or whether LAVA_F and not LAVA_E elements are specifically enriched in the genes of interest.

 

Evolutionary history of gibbons

In order to study gibbon phylogeny and demography, Carbone et al. sequenced the genomes of two individuals from each genus (Nomascus, Hylobates, Hoolock, Symphalangus, see figure 1 for geographic distribution) to medium coverage and constructed phylogenetic trees by UPGMA and ABC analyses. At least three UPGMA trees are observed with similar frequency (Fig.4A), which leaves the debate on the splitting order of the genera open.

On the other hand, the short length of the internal branches in the best phylogenetic tree is suggestive of a fast speciation process, or even of a nearly instantaneous appearance of all four genera around 5 million years ago (Fig.4B), which would explain the difficulty of discerning the order and timing of speciation.

 

Functional genome evolution

Carbone et al. found 240 short regions with an increased substitution rate, a proxy for adaptive and functional evolution. Moreover, these regions co-localized with genes containing LAVA elements and were therefore enriched for chromosome segregation-related pathways. The authors hypothesized that, as in humans, these hotspots of accelerated substitution can have a functional role, here by modulating the transcriptional termination caused by LAVA insertions. However, it is important to note that the functional relevance of such accelerated regions in human is still a matter of debate.

Finally, the study revealed positive selection, solely in gibbons, on a series of genes that may underlie specific features of these animals, such as their long arms and powerful arm muscles.

In summary, the fundamental finding of this study is the presence of gibbon-specific LAVA insertions in genes responsible for chromosome organization. Although this does not prove causality, it provides an interesting and plausible molecular mechanism that would explain the strikingly high rate of large chromosomal rearrangements observed in these species.

Carbone, L., Alan Harris, R., Gnerre, S., Veeramah, K., Lorente-Galdos, B., Huddleston, J., Meyer, T., Herrero, J., Roos, C., Aken, B., Anaclerio, F., Archidiacono, N., Baker, C., Barrell, D., Batzer, M., Beal, K., Blancher, A., Bohrson, C., Brameier, M., Campbell, M., Capozzi, O., Casola, C., Chiatante, G., Cree, A., Damert, A., de Jong, P., Dumas, L., Fernandez-Callejo, M., Flicek, P., Fuchs, N., Gut, I., Gut, M., Hahn, M., Hernandez-Rodriguez, J., Hillier, L., Hubley, R., Ianc, B., Izsvák, Z., Jablonski, N., Johnstone, L., Karimpour-Fard, A., Konkel, M., Kostka, D., Lazar, N., Lee, S., Lewis, L., Liu, Y., Locke, D., Mallick, S., Mendez, F., Muffato, M., Nazareth, L., Nevonen, K., O’Bleness, M., Ochis, C., Odom, D., Pollard, K., Quilez, J., Reich, D., Rocchi, M., Schumann, G., Searle, S., Sikela, J., Skollar, G., Smit, A., Sonmez, K., Hallers, B., Terhune, E., Thomas, G., Ullmer, B., Ventura, M., Walker, J., Wall, J., Walter, L., Ward, M., Wheelan, S., Whelan, C., White, S., Wilhelm, L., Woerner, A., Yandell, M., Zhu, B., Hammer, M., Marques-Bonet, T., Eichler, E., Fulton, L., Fronick, C., Muzny, D., Warren, W., Worley, K., Rogers, J., Wilson, R., & Gibbs, R. (2014). Gibbon genome and the fast karyotype evolution of small apes Nature, 513 (7517), 195-201 DOI: 10.1038/nature13679

]]>