An Aboriginal Australian genome reveals separate human dispersals into Asia – Tutorial Genomics, Ecology, Evolution, etc

This post concerns a lively debate in science, human population history, which also reaches into daily life because it is a topic of general public curiosity. So let’s see what the paper described here contributes.

Background

Modern human populations seem to derive from a single African ancestral population, under the well-supported “out of Africa” hypothesis (1). For the colonization of eastern Asia in particular, a “single-dispersal” model has been hypothesized (2), which suggests that Aboriginal Australians are a lineage that diversified recently within the Asian cluster. This hypothesis can be summarized as a topology, as drawn in figure 1A of the article: (Africans,(Europeans,(Asians,Australians))). Recent studies dated the split between Europeans and Asians to around 17K-43K years before the present (ybp). In addition, archaeological evidence places modern humans in Australia as far back as ~50K ybp. These inferences are incompatible with the single-dispersal hypothesis, at least in terms of timing: Australians cannot have branched off within the Asian cluster if they were already in Australia before Europeans and Asians split. A second scenario can be hypothesized, with an early branching and occupation of Australia, and probable later genetic exchange between Asians and Australians, described as (Africans,(Australians,(Asians,Europeans))). This possibility had not been tested so far. Using an ancient Aboriginal Australian genome free of recent admixture, SNP data from different human populations, and a background in molecular evolution and population genetic theory, this paper aims to distinguish between the competing hypotheses about the relatedness and migration history of ancient Australian populations.
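To see what the two topologies imply, here is a minimal sketch in Python. It encodes each topology as nested tuples and asks which group is sister to the Australians. The function names and the tuple encoding are my own illustration, not something taken from the paper.

```python
def leaves(tree):
    """Return the set of population names found in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def sister_of(tree, taxon):
    """Return the leaves of the clade(s) sister to `taxon`, or None if absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == taxon:
            # Everything else attached at this node is the sister group.
            others = [c for j, c in enumerate(tree) if j != i]
            return set().union(*(leaves(c) for c in others))
        found = sister_of(child, taxon)
        if found is not None:
            return found
    return None


# The two competing hypotheses, written as in the text above.
single_dispersal = ("Africans", ("Europeans", ("Asians", "Australians")))
multiple_dispersal = ("Africans", ("Australians", ("Asians", "Europeans")))

print(sister_of(single_dispersal, "Australians"))
print(sister_of(multiple_dispersal, "Australians"))
```

Under the single-dispersal tree the Australians are sister to Asians only, whereas under the multiple-dispersal tree they are sister to the whole Asian plus European clade, which is what allows them to be older than the European-Asian split.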

The facts in brief

A 100-year-old lock of hair from an Aboriginal Australian male (from the Museum of Archaeology and Ethnology, UK)
31 institutions involved, on a worldwide scale
58 authors, with the same geographical spread
An ancient genome sequenced with Illumina technology, plus SNP-chip data from other human populations
Computational analyses (PCA, clustering methods, ABBA/BABA expectations)
A Science podcast interview (http://www.sciencemag.org/content/334/6052/94/suppl/DC2)

Discussion

We found the paper quite convincing in testing the two possible scenarios for the human colonization of the Australian region. The next paragraphs describe and discuss the evidence and tests the authors used.

1. Testing the genetic clustering of the Aboriginal Australian genome

The principal component analysis in figure 1B shows the clustering pattern of SNP-chip data (449k SNPs) from 1220 individuals, covering 79 human populations. The figure reveals a close relationship between the Australian genome and the Highland Papua New Guinea (PNG), Bougainville and Aeta samples, all of them from the Australo-Melanesian region. This pattern argues against European contamination of the sample, which was a real risk given its long handling by Europeans. We noted a geographical tendency toward a “continuous” colonization by human populations outside Africa. I put “continuous” in quotes to make clear that we are not referring to a single wave of colonization, but to a geographical ordering of the populations. One confusing point is the PCA inset, which looks like a 3D box but is actually just a zoom-in on the same PCA graph. A further look at the subsequent PCA axes in the supplementary material showed a very clear differentiation of the Australo-Melanesian samples along axis 4.

We speculated about the amount of variance explained by the first two PCA axes, which is not reported. Contrary to what we would expect from experience with other types of characters (such as morphology and climatic variables), the proportion of variance explained in this plot seems to be very low, as is usual in genomic studies. We then briefly discussed the idea of a checklist of requirements for preparing a publication: if you plan to present an analysis, have items i, ii and iii at hand, and please do not forget to include them.
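As an illustration of why the first axes of a genomic PCA tend to explain little variance, here is a minimal sketch in Python with NumPy. The genotype matrix is simulated toy data (two hypothetical groups with slightly different allele frequencies), not the data from the paper, and the function name is my own.

```python
import numpy as np


def pca_variance_explained(genotypes):
    """Return the fraction of total variance carried by each principal axis.

    genotypes: 2D array, rows = individuals, columns = SNPs coded 0/1/2.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP on its mean
    # Squared singular values of the centered matrix are proportional
    # to the variance along each principal axis.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Toy data (hypothetical): 60 individuals, 2000 SNPs, two groups whose
# allele frequencies differ only slightly.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=2000)
freq_b = np.clip(freq_a + rng.normal(0, 0.05, size=2000), 0.01, 0.99)
group_a = rng.binomial(2, freq_a, size=(30, 2000))
group_b = rng.binomial(2, freq_b, size=(30, 2000))

ratios = pca_variance_explained(np.vstack([group_a, group_b]))
print("PC1: %.3f, PC2: %.3f" % (ratios[0], ratios[1]))
```

With thousands of SNPs and mostly independent sampling noise at each one, the variance is spread over many axes, so even an axis that separates the groups cleanly can carry only a small fraction of the total. Reporting these fractions next to the plot would be an easy item for the checklist mentioned above.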

2. Testing admixture between Aboriginal Australian genome and other populations

Figure 1C shows the ancestry proportions of all individuals in the SNP set, obtained by maximum likelihood estimation in the ADMIXTURE software. This clustering analysis resembles the k-categories approach of STRUCTURE: each line in the plot corresponds to an individual, and the colors represent the ancestral populations. The number of categories, k, is set a priori, and changing it can modify the ancestry proportions of certain individuals, revealing admixture between populations. At k=5, the Aboriginal Australian sample appears to belong to the same ancestral population as PNG and a large proportion of the Bougainville individuals. Interestingly, South Asian populations seem to share a small proportion of their SNPs with the ancestral Aboriginal Australian category. At higher k-values, up to k=20, the Aboriginal Australian genome appears more mixed, sharing components with PNG, Bougainville, Aeta and South Asian populations.

We debated how accurately a single genome can represent admixture in the ancestral Aboriginal Australian population, given that the variability of that population at the time is unknown and is not considered here. How would the admixture patterns be affected if this Aboriginal Australian genome came from the most, or the least, admixed individual in the ancestral population? We also wondered why no other, more recent Australian samples were included, even though Aboriginal people live in Australia today. At this point the discussion moved to more socio-political issues about the use of samples and information; as I said at the beginning, this topic can be of general concern and discussion for several reasons.

The evidence presented so far, together with the additional test below, can help to distinguish between single and multiple dispersals “out of Africa”, and possibly to estimate the proportion of admixture between the first established populations and the second wave of migration. By contrast, questions about how or why the second migration almost completely replaced the first one are, from my point of view, largely “historical” and therefore difficult to frame and test with the available evidence. I think it is very difficult to go beyond the patterns and processes we are able to model and test.

3. D-test and ABBA/BABA hypothesis

We tried to identify the goal and configuration of this test for discriminating between the competing hypotheses. Full details of the test can be found in references 3 and 4; I will try to summarize it in a nutshell. The D-test uses a four-taxon configuration (see figure) in which only biallelic sites are considered (variants A and B). Two of the four taxa have fixed states, commonly including the outgroup sequence (here the Africans, but also the Europeans), and the other two taxa differ from each other (here Aboriginals and Asians). This configuration produces either BABA or ABBA patterns. The next step is to count the number of sites supporting each pattern. The D statistic is D = Σ (ABBA sites - BABA sites) / Σ (ABBA sites + BABA sites), that is, the difference between the two counts divided by the total number of ABBA and BABA sites. The test was originally defined to detect admixture between populations; in the absence of admixture, the two types of sites are expected to occur in equal numbers, so D should be close to zero. The D-test can be considered more robust to sequencing errors because it relies on variants shared by more than one sequence, and the same error is unlikely to have occurred twice. The authors explicitly state that the test does not allow them to distinguish between the two models of origin, nor to identify gene flow between Asian and Australian populations. However, I consider that the D-test performed here can support the multiple-dispersal model, because of a statistically significant excess of sites grouping the African and Aboriginal Australian genomes (sites with pattern 2 in the figure).
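To make the counting step concrete, here is a minimal sketch in Python. It assumes the usual normalization from reference 3, in which the denominator is the number of ABBA plus BABA sites. The function name, the four-letter pattern encoding and the toy counts are hypothetical, chosen only for illustration, and this is not the authors' implementation.

```python
from collections import Counter


def d_statistic(patterns):
    """Compute D from per-site allele patterns.

    patterns: iterable of 4-character strings, one per biallelic site,
    giving the allele (A or B) carried by each of the four taxa in a
    fixed order. Only ABBA and BABA sites are informative.
    Returns (D, number of ABBA sites, number of BABA sites).
    """
    counts = Counter(patterns)
    abba = counts["ABBA"]
    baba = counts["BABA"]
    informative = abba + baba
    if informative == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / informative, abba, baba


# Toy example (made-up counts): 60 ABBA sites, 40 BABA sites, and
# 200 BBAA sites that do not enter the statistic.
toy_patterns = ["ABBA"] * 60 + ["BABA"] * 40 + ["BBAA"] * 200
d, n_abba, n_baba = d_statistic(toy_patterns)
print(d, n_abba, n_baba)
```

A value of D near zero means the two patterns are equally frequent. Deciding whether a nonzero D is statistically significant needs a standard error as well, which reference 3 obtains with a block jackknife over the genome; that step is left out of this sketch.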

Comparing expected and observed values of the D-test can help to discriminate between the hypotheses (as the authors attempt in Table 2). However, the expected values reported for the single- and multiple-dispersal models are so close to each other (~50%), and given without credible intervals, that it is difficult to support one hypothesis over the other from the observed patterns. Finally, when applying the D-test it is worth keeping in mind that the patterns seen in current populations, given the hypothesized past events, may have been altered by many other evolutionary processes, such as secondary gene flow, structure in the ancient population and incomplete lineage sorting, among others.

Figure 1. Grouping site patterns 1 and 2 used in the D-test. Note that the African and European populations have fixed states, whereas the Aboriginal Australian and Asian populations vary. This figure is a modification of figure 3 in reference 5. Although the ABBA/BABA patterns are not entirely clear, the grouping patterns shown are based on the article text describing the two early-dispersal models used to perform the test.

Rasmussen, M., Guo, X., Wang, Y., Lohmueller, K., Rasmussen, S., Albrechtsen, A., Skotte, L., Lindgreen, S., Metspalu, M., Jombart, T., Kivisild, T., Zhai, W., Eriksson, A., Manica, A., Orlando, L., De La Vega, F., Tridico, S., Metspalu, E., Nielsen, K., Avila-Arcos, M., Moreno-Mayar, J., Muller, C., Dortch, J., Gilbert, M., Lund, O., Wesolowska, A., Karmin, M., Weinert, L., Wang, B., Li, J., Tai, S., Xiao, F., Hanihara, T., van Driem, G., Jha, A., Ricaut, F., de Knijff, P., Migliano, A., Gallego Romero, I., Kristiansen, K., Lambert, D., Brunak, S., Forster, P., Brinkmann, B., Nehlich, O., Bunce, M., Richards, M., Gupta, R., Bustamante, C., Krogh, A., Foley, R., Lahr, M., Balloux, F., Sicheritz-Ponten, T., Villems, R., Nielsen, R., Wang, J., & Willerslev, E. (2011). An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia. Science, 334 (6052), 94-98. DOI: 10.1126/science.1211177

Additional references

1. H. Liu, F. Prugnolle, A. Manica, F. Balloux, A geographically explicit genetic model of worldwide human-settlement history. Am. J. Hum. Genet. 79, 230 (2006)
2. HUGO Pan-Asian SNP Consortium, Mapping human genetic diversity in Asia. Science 326, 1541 (2009).
3. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, Hansen NF, Durand EY, Malaspinas AS, Jensen JD, Marques-Bonet T, Alkan C, Prüfer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Höber B, Höffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Z, Gusic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, de la Rasilla M, Fortea J, Rosas A, Schmitz RW, Johnson PL, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, Pääbo S. A draft sequence of the Neandertal genome. Science 328, 5979, 2010.
4. Durand, E., Patterson, N., Reich, D., Slatkin, M. Testing for ancient admixture between closely related populations. Mol Biol Evol, 2011.
5. The Heliconius Genome Consortium. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012

posted by MRR for Martha Serrano
