Introduction
In the past two decades, considerable research effort has gone into sequencing the human genome and subsequently into unveiling the demographic history underlying the patterns of genetic diversity we observe across the globe today. Here we discuss a recent research article by Pagani et al. 1 that addresses genomic diversity and historic migration patterns of human populations in Eurasia. The first human genome sequence was completed in 2003 by the Human Genome Project2, and larger projects such as HAPMAP3 and the 1000 Genomes Project4 rapidly followed, largely thanks to considerable improvements in sequencing technologies. Although these genome databases have been extremely useful for many studies, they have important sampling caveats that limit their use for some particular questions. Indeed, HAPMAP sampled a limited number of populations, whereas the 1000 Genomes Project sampled a large number of populations but did not attempt to sample individuals of “pure” ancestry. For instance, sampling in North America focused largely on city-based individuals, who were found to have very diverse recent ancestry, thus blurring the signal of the ancient colonisation history. Importantly, in the paper discussed here, considerable effort was made to sample a broad panel of 447 unrelated individuals of pure ancestry from 148 distinct populations, notably including previously unstudied regions such as Siberia and western Asia.
One of the aspects of human demographic history that has long interested researchers is the Out of Africa (OoA) of Anatomically Modern Humans (AMH), a turning point in which humans dispersed from Africa and colonised Eurasia and ultimately Oceania and the Americas. In particular, the number of OoA events has been debated, and two major hypotheses have emerged. The first, arguably the most widely accepted, advocates a single OoA event, estimated at around 40 to 80 kya, which gave rise to all extant non-African populations. The second, dubbed the multiple-dispersal model5, considers multiple migration waves with varying success in settling new continents, and possible admixture between them at various points in time; this model appears to be supported by previously described fossil evidence6,7. Interestingly, Tucci & Akey8 argue that these hypotheses are not necessarily mutually exclusive but rather complementary, as there could have been several failed or low-success OoA events followed by a major one that effectively colonised and persisted in most continents.
In this study, Pagani et al. argue in favour of a multiple-dispersal scenario, based on a small genetic contribution, still detectable in the genomes of extant Papuans, from an extinct lineage of AMH that left Africa earlier than the main OoA around 75 kya.
Genetic structure and barriers across space
To obtain a first insight into the genetic structure among the sampled genomes, Pagani et al. employed two different approaches: first, treating SNPs as independent markers (with ADMIXTURE9), and second, taking linkage blocks into account (with fineSTRUCTURE10). Despite differences in resolution, both strategies identified the major biogeographic groups of populations, defining 14 main genetic clusters across the globe (Extended Data Figure 1C). The detailed fineSTRUCTURE output was then used for a range of analyses, from spatial patterns of genetic differentiation (Figure 1) to co-ancestry (Extended Data Figure 3) and demographic history reconstruction (Extended Data Figure 7).
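To make concrete what "treating SNPs as independent markers" means, the sketch below writes out the kind of likelihood that ADMIXTURE maximises, as we understand its model: each individual draws a fraction of its ancestry from each of K clusters, and genotypes at different SNPs are multiplied together (summed on the log scale) as if unlinked. The function name and argument layout are our own assumptions; the sketch only evaluates the likelihood and does not reproduce ADMIXTURE's optimiser.

```python
import math


def admixture_loglik(genotypes, q, f):
    """Log-likelihood (up to an additive constant) of an ADMIXTURE-style model.

    genotypes[i][j]: copies (0, 1 or 2) of the reference allele in individual i at SNP j
    q[i][k]: fraction of individual i's ancestry from cluster k (each row sums to 1)
    f[k][j]: reference-allele frequency of SNP j in cluster k (strictly between 0 and 1)
    """
    ll = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Frequency expected for this individual: ancestry-weighted mix of clusters.
            p = sum(q[i][k] * f[k][j] for k in range(len(q[i])))
            # Binomial(2, p) genotype probability, dropping the constant coefficient.
            # SNPs are assumed independent, so their log-probabilities simply add up.
            ll += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return ll


# Toy example (invented values): one individual, half from each of two clusters, two SNPs.
genotypes = [[2, 0]]
q = [[0.5, 0.5]]
f = [[0.2, 0.9], [0.8, 0.1]]
print(admixture_loglik(genotypes, q, f))
```

ADMIXTURE searches for the q and f that maximise this quantity. fineSTRUCTURE instead works from haplotype-sharing (ChromoPainter) output, so it exploits the linkage information that this independent-SNP likelihood discards.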
Taking advantage of their detailed sampling from Eurasia to Sahul, the authors employed a spatially explicit framework to study genetic differences and gene flow between populations, as well as their association with environmental and geographic features at a large scale. Figure 1 illustrates this by mapping the magnitude of the gradient of SNP allele frequencies across space, which makes it possible to pinpoint regions of steep genetic change, i.e. potential barriers to gene flow, specifically mountain ranges, deserts and large bodies of water. These barriers were broadly consistent with those inferred from the fineSTRUCTURE output (Figure S2.2.2-I) and from the complementary migration-based EEMS (Estimating Effective Migration Surfaces; Extended Data Figure 5H). Importantly, the authors tested whether geographic gaps in their sampling could bias the interpolation of barriers, and showed that their model remained robust when new gaps were introduced (Extended Data Figure 5E-G).
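The quantity mapped in Figure 1 can be illustrated, in spirit, with a finite-difference sketch. Assuming allele frequencies have already been interpolated onto a regular grid (a step we do not reproduce), the steepness of allele-frequency change at each cell is the magnitude of its spatial gradient, averaged here over SNPs; cells with high values mark candidate barriers. The grids, function names and averaging choice are hypothetical and are not taken from the paper.

```python
import math


def gradient_magnitude(grid, dx=1.0, dy=1.0):
    """|grad f| at each interior cell of a 2-D grid (list of equal-length rows).

    Uses central differences; edge cells are left at 0.0 for simplicity.
    dx is the spacing between columns, dy the spacing between rows.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dfdx = (grid[i][j + 1] - grid[i][j - 1]) / (2 * dx)
            dfdy = (grid[i + 1][j] - grid[i - 1][j]) / (2 * dy)
            out[i][j] = math.hypot(dfdx, dfdy)
    return out


def mean_gradient_across_snps(freq_grids):
    """Average the gradient-magnitude surfaces of several SNPs (grids of equal shape)."""
    surfaces = [gradient_magnitude(g) for g in freq_grids]
    rows, cols = len(surfaces[0]), len(surfaces[0][0])
    return [[sum(s[i][j] for s in surfaces) / len(surfaces) for j in range(cols)]
            for i in range(rows)]


# Toy example (invented frequencies): both SNPs jump between columns 2 and 3,
# as if a north-south barrier separated the western and eastern cells.
snp_a = [[0.2, 0.2, 0.2, 0.8, 0.8, 0.8] for _ in range(5)]
snp_b = [[0.1, 0.1, 0.1, 0.5, 0.5, 0.5] for _ in range(5)]
surface = mean_gradient_across_snps([snp_a, snp_b])
print([round(v, 3) for v in surface[2]])  # [0.0, 0.0, 0.25, 0.25, 0.0, 0.0]
```

The high values sit only in the two columns flanking the jump, which is how a barrier appears on such a map; uniform regions on either side contribute nothing.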
In a second step, Pagani et al. measured the association of the allele-frequency gradients (labelled SNPs in Figure 1) and of the fineSTRUCTURE-based differentiation with three environmental variables (elevation, temperature and precipitation) to determine the relative importance of each in shaping the genetic patterns observed today. As shown in the inset of Figure 1, SNPs indicated that elevation and precipitation had a strong spatial correlation with genetic differences, whereas fineSTRUCTURE gave higher support to precipitation and temperature. This difference is likely due to the fact that fineSTRUCTURE, as explained above, relies on linkage patterns. Linkage blocks are physical associations of loci that recombination progressively breaks down unless they are specifically maintained by selection. Thus, current neutral linkage patterns reflect relatively recent demographic history, whereas raw allele frequencies, which summarise the bulk of the genome, reveal older patterns. Consistent with this, when only rare (i.e. more recent) variants were considered, the association of SNPs with elevation was reduced (Figure S2.2.2-II).
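The argument that linkage reflects more recent history than raw allele frequencies rests on the classical result that, under random mating, linkage disequilibrium between two loci with recombination fraction r decays geometrically, D_t = D_0 (1 - r)^t. The sketch below is our own illustration with invented values; it compares loosely and tightly linked loci over 100 generations and over 3,000 generations (about 90 ky at 30 years per generation), ignoring drift, selection and mutation.

```python
def ld_after(d0, r, generations):
    """Linkage disequilibrium left after `generations` of random mating.

    Classical deterministic decay D_t = D_0 * (1 - r) ** t; drift, selection
    and mutation are ignored.
    """
    return d0 * (1.0 - r) ** generations


# Invented starting LD: 0.25 is the maximum possible D for two loci at 50/50 frequency.
d0 = 0.25
for r in (0.01, 0.0001, 0.00001):  # roughly 1 cM, 0.01 cM and 0.001 cM apart
    print(f"r={r}: D after 100 gen = {ld_after(d0, r, 100):.4f}, "
          f"after 3000 gen = {ld_after(d0, r, 3000):.2e}")
```

Loci about 1 cM apart lose essentially all association over the longer timescale, while very tightly linked loci keep most of it, which is why haplotype-based signals are dominated by comparatively recent events.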
From these observations, the authors suggest that elevation contributed to shaping ancient migration routes (as confirmed by patterns of isolation by distance; Extended Data Figure 5A-C) but has not recently impeded the persistence of human populations. Precipitation, on the other hand, seems to be of paramount importance, as populations continue to this day to avoid inhabiting low-precipitation regions such as deserts.
Although these conclusions are credible, we have some important questions about the analysis that could bias their interpretation. First, the authors did not address the inherent correlation between the environmental variables (e.g. elevation and temperature), nor whether or how it was taken into account. Additionally, it is unclear which time period was used for temperature and precipitation, as the study spans 120 thousand years of demographic history. Both points could change the relative importance of a given variable and should therefore have been stated clearly in the main text.
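To illustrate why the correlation between environmental variables matters, the sketch below uses a partial correlation, one common way of controlling for a confounder (not necessarily what the authors did). In this invented dataset the "genetic gradient" depends on elevation only, yet temperature appears strongly associated with it because temperature tracks elevation. Removing the linear effect of elevation from both variables exposes this. All variable names and values are synthetic.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of the ordinary least-squares fit y ~ a + b * z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / sum((zi - mz) ** 2 for zi in z)
    a = my - b * mz
    return [yi - (a + b * zi) for yi, zi in zip(y, z)]


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z from both."""
    return pearson(residuals(x, z), residuals(y, z))


# Synthetic data (invented): the gradient depends on elevation only,
# but temperature is strongly (negatively) correlated with elevation.
rng = random.Random(42)
elevation = [rng.gauss(0, 1) for _ in range(2000)]
temperature = [-0.8 * e + 0.6 * rng.gauss(0, 1) for e in elevation]
genetic_gradient = [e + 0.3 * rng.gauss(0, 1) for e in elevation]

raw = pearson(genetic_gradient, temperature)
partial = partial_corr(genetic_gradient, temperature, elevation)
print(f"raw r(gradient, temperature) = {raw:.2f}")
print(f"partial r(gradient, temperature | elevation) = {partial:.2f}")
```

The raw correlation is strong even though temperature has no direct effect here, while the partial correlation is close to zero. Real map data also carry spatial autocorrelation between neighbouring cells, which this sketch ignores and which a real analysis would need to account for.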
Selection screening
The authors scanned the genomes for evidence of purifying and positive selection using a series of different approaches and identified multiple candidate loci, some of which had been identified as targets of positive selection in previous studies. The authors also highlighted differences in the strength of purifying selection between populations, such as on olfactory receptor genes in Asians. Interestingly, they identified significantly stronger purifying selection on pigmentation and immune response genes in Africans than in the remaining populations, with the single exception of Papuans for the pigmentation genes (Extended Data Figure 6B). However, the authors did not discuss the possible factors behind these selective forces, nor how this section on selection contributes to the main storyline and conclusions of the study.
Demographic history of Papuans
The haplotype-based ChromoPainter analysis on which fineSTRUCTURE relies revealed very interesting patterns of haplotype co-ancestry and length, as well as of the proportion of genome shared between populations. Foremost is the observation that African populations display the highest co-ancestry (Extended Data Figure 3) and the shortest haplotypes (Figure S2.2.1-III), confirming their status as the oldest and most diverse populations. Short haplotypes reflect many recombination events accumulated over time and therefore indicate older ancestry. The most surprising observation was thus that Papuans have the shortest average haplotype length of all non-African populations (Figure S2.2.1-III), as well as the shortest African-inherited haplotypes (Extended Data Figure 7), which suggests an older shared ancestry with Africans than that of the remaining populations.
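The link between haplotype length and age of shared ancestry can be made explicit with a standard idealisation: two haplotypes descending from a common ancestor g generations ago are separated by 2g meioses, crossovers occur at about one per Morgan per meiosis, so shared segments are roughly exponential in length with mean 1/(2g) Morgans, i.e. 100/(2g) cM. This is a textbook approximation, not the ChromoPainter model, and the ages below are illustrative only.

```python
def expected_segment_cm(generations):
    """Mean length (cM) of segments two haplotypes share from an ancestor `generations` ago.

    Idealised model: 2 * generations meioses separate the haplotypes and
    crossovers fall as a Poisson process (1 per Morgan per meiosis), so
    segment lengths are exponential with mean 1 / (2 * generations) Morgans.
    """
    return 100.0 / (2 * generations)


# Illustrative ages only, converted with 30 years per generation.
for years in (40_000, 75_000, 90_000):
    g = years / 30
    print(f"{years} years ~ {g:.0f} generations -> mean segment {expected_segment_cm(g):.4f} cM")
```

Because the mean length falls as 1/g, a population whose African-derived haplotypes are systematically shorter is expected to share older ancestry with Africans, which is the inference drawn above for Papuans.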
To investigate this further, the authors used the multiple sequentially Markovian coalescent (MSMC) to estimate mean split times between the genomes of Papuans and other populations (Figure 2A). This figure depicts the proportion of the genome coalescing between populations over time (on a logarithmic axis). It is important to note, however, that these calculations used a generation time of 30 years, whereas the selection scans used a generation time of 25 years. The latter is the value most commonly used in the literature, and no justification is given for the change. This analysis revealed an old split between Papuans and Africans at about 90 kya (represented as Koinanbe in Figure 2A, red line), predating the split between Eurasians and Africans, estimated at 75 kya (black line), and that between Papuans and Eurasians at 40 kya (blue line). Despite possible shifts in the absolute split times due to the chosen generation time, the relative differences between them are in line with Papuans harbouring large amounts of short haplotypes, all suggesting an older population split than previously thought.
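The effect of the generation-time choice can be checked with simple arithmetic. Assuming the estimate in generations stays fixed (same mutation rate), converting to years is a single multiplication, so switching from 30 to 25 years per generation rescales every date by 25/30. The function name is our own; the input dates are those quoted above.

```python
def rescale_years(years, old_gen_time, new_gen_time):
    """Re-express a date under a different generation time.

    Assumes the estimate in generations is unchanged (same mutation rate),
    so only the years-per-generation conversion differs.
    """
    generations = years / old_gen_time
    return generations * new_gen_time


# Split times quoted in the text, obtained with 30 years per generation.
splits_30 = {"Papuan-African": 90_000, "Eurasian-African": 75_000, "Papuan-Eurasian": 40_000}
for pair, years in splits_30.items():
    print(f"{pair}: {years / 1000:.0f} kya at 30 y/gen -> "
          f"{rescale_years(years, 30, 25) / 1000:.1f} kya at 25 y/gen")
```

Under this assumption the Papuan-African split would move from about 90 to 75 kya, but because all dates shrink by the same factor, their ordering and ratios are unchanged, which is why the relative argument survives the choice of generation time.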
To explain the demographic history behind the observed patterns, the authors propose that a previously unknown admixture event took place in Sahul, either with an archaic non-AMH group (different from Denisovans and Neanderthals) or with AMH descended from an extinct OoA (xOoA). The latter hypothesis, which fits the multiple-dispersal model explained earlier, implies that the xOoA took place after the split between AMH and Neanderthals but before the main OoA.
Using coalescent simulations, the authors tried to replicate the observed split times by adding varying amounts of admixture with either a non-AMH group or AMH from an xOoA. None of the simulated scenarios of archaic admixture with non-AMH could mirror the observed data. In contrast, including in Papuans a genomic component that diverged from the main human lineage before the main OoA produced roughly similar population split times. It is noteworthy, however, that the main text states that the “observed shift in the African-Papuan MSMC split curve can be qualitatively reproduced” under these conditions. In detail, the simulations produced a 3 ky difference between the Papuan-African and Eurasian-African splits (Figure S2.2.8-III), whereas the observed gap between the two is actually 15 ky (Figure 2A). The authors suggest that they may not be able to reach a comparable gap because of complexities of the demographic model that were not simulated in this study, such as population expansions and bottlenecks. Although this explanation appears reasonable, we believe it ought to have been made clear in the main text of the article.
To assess the weight of admixture with non-AMH, the authors masked putatively introgressed Denisovan haplotypes in Papuan genomes, which did not change the split times estimated between Papuans and the other populations (dashed lines in Figure 2A). Furthermore, by studying populations with known admixture proportions and timings (African Americans and Central and East Asians; Extended Data Figure 8), the authors confirmed that MSMC behaves linearly across multiple admixture events, which allowed them to calculate that the hypothesised xOoA would have split from most Africans around 120 kya (Supplementary Information 2.2.4).
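Our reading of this linearity check is that the cross-coalescence curve of an admixed population behaves like an ancestry-weighted average of the curves of its sources. The sketch below illustrates that idea with idealised step-shaped curves (real MSMC curves are smooth and noisy); the function names, split times and 2% fraction are hypothetical inputs chosen to echo the values in the text, not a reanalysis.

```python
def step_rccr(t, split_time):
    """Idealised relative cross-coalescence rate at t years before present:
    0 while the two lineages are still in separate populations, 1 once merged."""
    return 1.0 if t >= split_time else 0.0


def mixed_rccr(t, f, split_main, split_old):
    """Linear mix: a fraction f of the genome follows the older split, 1 - f the main one."""
    return (1.0 - f) * step_rccr(t, split_main) + f * step_rccr(t, split_old)


# Hypothetical values: a 2% component with an older split at 120 kya,
# the rest of the genome splitting at 75 kya.
for t in (50_000, 100_000, 130_000):
    print(t, mixed_rccr(t, 0.02, 75_000, 120_000))
```

In this idealisation the 2% component only changes the curve between the two split times, from 0.98 to 1, so the point where the curve crosses 0.5 stays at the main split. This gives some intuition for why a small contribution from an older lineage is hard to detect.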
As an additional line of evidence, Pagani et al. examined the age of African haplotypes in Papuans that are not present in other Eurasian populations by assessing the density of non-African alleles (nAAs) within them. The rationale rests on the assumption that the number of nAAs, i.e. alleles not found among African genomes, accumulated within a haplotype of African origin in a non-African genome is proportional to the time since that population split from Africans. First, this analysis revealed that Papuans had an overall higher density of nAAs within African haplotypes along the genome than Eurasians (Figure 2B), indicating an older coalescence time with Africans. Next, the proportions of nAAs within African haplotypes in Papuans were modelled under single- and multiple-dispersal demographic scenarios. The results showed that an xOoA of AMH that split from Africans around 120 kya was necessary to explain the consistently elevated proportions of nAAs in Papuans (Figure 2D).
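The nAA rationale can be sketched in two steps: count, within a haplotype of African origin, the alleles never observed in an African panel, and then, assuming these accumulate at a constant rate after the split, convert a density ratio into a relative split time. The function names, the per-kb scaling, the dictionary representation and all numbers below are hypothetical choices of ours; the published analysis models these proportions under explicit demographic scenarios rather than by simple proportional scaling.

```python
def naa_density(haplotype, african_alleles, length_bp):
    """Non-African alleles (nAAs) per kb in one haplotype of African origin.

    haplotype: {position: allele carried by the non-African haplotype}
    african_alleles: {position: set of alleles seen in the African reference panel}
    A site absent from the panel counts as an nAA (simplifying assumption).
    """
    n_naa = sum(1 for pos, allele in haplotype.items()
                if allele not in african_alleles.get(pos, set()))
    return n_naa / (length_bp / 1000)


def scaled_split_time(density, ref_density, ref_split_years):
    """Split time implied by proportional scaling against a reference population,
    assuming nAA density grows linearly with time since the split from Africans."""
    return ref_split_years * density / ref_density


# Toy haplotype (invented): nAAs at positions 250 and 900 over 2 kb.
hap = {100: "A", 250: "G", 400: "T", 900: "C"}
panel = {100: {"A", "G"}, 250: {"A"}, 400: {"T"}, 900: {"T"}}
print(naa_density(hap, panel, 2000))  # 1.0 (2 nAAs over 2 kb)
print(scaled_split_time(1.2, 1.0, 75_000))
```

With an invented density 20% higher than the reference, proportional scaling turns a 75 kya reference split into 90 kya. In practice the counts also depend on how thoroughly the African panel samples African diversity, since any allele missing from the panel is labelled non-African.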
Combining the results from these different approaches, the authors support an xOoA that split from Africans around 120 kya and estimate that it contributed approximately 2% of contemporary Papuan genomes.
Conclusion
In this wide-ranging study, Pagani et al. used their extensive sampling to address three main topics of human evolutionary biology in Eurasia: i) detecting the main geographic barriers to gene flow, ii) identifying loci, and ultimately pathways, under selective pressure, and iii) proposing an extinct Out of Africa event earlier than 75 kya.
The latter was arguably the most important finding of this study, with a 2% contribution to the genome of Papuans from an early xOoA. The authors provided multiple compelling lines of evidence pointing to an extinct Out of Africa expansion that split from Africans around 120 kya and later admixed with the main OoA population in Sahul. The complete scenario is described in Extended Data Figure 10.
Nevertheless, the results presented in this paper and their associated methods are consistently under-detailed and not self-explanatory. A paper covering such a popular topic in a high-impact journal should be more digestible for newcomers, and even for fellow evolutionary biologists. Furthermore, the connection between the three main sets of analyses (geographic barriers to gene flow, selection screening and the possibility of an xOoA) seems to be lacking, as there is no overall discussion bringing all the points together.
Studied papers
Tucci & Akey 2016. Population genetics: A map of human wanderlust. Nature 538: 179–180
Pagani et al. 2016. Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238–242
Reference