Reconstructing prehistoric African population structure


The highest genetic diversity in humans is found in Africa, in line with Africa being the cradle of humanity. While the three articles we discussed previously during this tutorial (1,2,3) mainly focused on determining the most parsimonious “out-of-Africa” scenarios based on genetic diversity data, this article (Skoglund et al. 2017, 4) investigates the population structure of Africa prior to the expansion of food producers (i.e. herders and farmers). To reconstruct this prehistoric population structure, the authors analyzed the genomes of 16 ancient African individuals who lived up to 8100 years ago (including 15 newly sequenced genomes), as well as SNP genotypes from 584 present-day Africans and 300 high-coverage genomes from 142 worldwide populations. This is the first study to gather and analyze such a large number of ancient African genomes, thereby providing unprecedented insight into prehistoric human population structure.


An ancient cline of southern and eastern African hunter-gatherers

The authors used principal component analysis (PCA) and automated clustering to relate the 16 ancient individuals to present-day sub-Saharan Africans. This analysis reveals that while the two ancient South African individuals share ancestry with present-day South Africans (Khoe-San), 11 of the 12 ancient individuals who lived in eastern and south-central Africa between ~8100 and ~400 years BP form a gradient of relatedness, with the eastern African Hadza at one extreme and the Khoe-San at the other. This genetic cline is also correlated with geography along a north-south axis. Another pattern that emerged from this analysis is the lack of heterogeneity among the seven ancient individuals from Malawi, indicating a long-standing and distinctive population in ancient Malawi that persisted for at least 5000 years but is extinct today.

Subsequently, the authors built a model in which ancient and present-day African populations trace their ancestry to a putative set of nine ancestral populations. As proxies for these ancestral populations, they used data from ancient and present-day populations showing substantial ancestry from the major lineages present in Africa today. These proxy populations consisted of three ancient Near Eastern populations representative of Anatolia, the Levant and Iran, respectively, and six African populations representative of different components of ancestry (western African, southern African before agriculture, northeastern African before agriculture, central African rainforest hunter-gatherer, eastern African early pastoralist context, and the distinctive ancestry found in Nilotic speakers today). Using qpAdm (a generalization of f4 symmetry statistics), they tested 1-, 2- and 3-source models and estimated admixture proportions for all other ancient and present-day African populations, with a set of 10 non-African populations as outgroups. We note that the f4 statistics are poorly explained in this article, making it hard for an uninitiated reader to grasp their meaning and the relevance of the results. The main finding from this analysis is that ancestry closely related to the ancient southern Africans was present much farther north and east in the past than is apparent today.
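Since the f4 statistic is central to this analysis, a minimal sketch of what it measures may help. For four populations A, B, C and D, f4(A, B; C, D) is the average over SNPs of (a - b) * (c - d), where a, b, c and d are the allele frequencies in each population. If A and B form a clade relative to C and D, with no gene flow between the pairs, the expected value is zero; a clear departure from zero suggests that the simple tree is violated, for example by admixture. The allele frequencies and population names below are made up for illustration and are not taken from the paper. Real analyses also use many more SNPs and a block jackknife to obtain standard errors, which this sketch omits.

```python
def f4(p_a, p_b, p_c, p_d):
    """Average of (a - b) * (c - d) across SNPs for four populations."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(terms) / len(terms)


# Toy allele frequencies at five SNPs (hypothetical numbers).
pop_a = [0.10, 0.50, 0.80, 0.30, 0.60]
pop_c = [0.40, 0.20, 0.90, 0.70, 0.10]
pop_d = [0.90, 0.60, 0.20, 0.10, 0.50]

# A hypothetical population B that is a 50/50 mixture of A and C.
pop_b = [0.5 * a + 0.5 * c for a, c in zip(pop_a, pop_c)]

# A compared with itself: no difference in the first pair, so f4 is zero.
f4_same = f4(pop_a, pop_a, pop_c, pop_d)

# B carries ancestry from C, so (A, B) is not symmetric with respect to (C, D).
f4_admixed = f4(pop_a, pop_b, pop_c, pop_d)

print("f4(A, A; C, D) =", f4_same)
print("f4(A, B; C, D) =", f4_admixed)
```

qpAdm builds on the same idea: it considers many f4 statistics jointly and asks whether a target population can be modeled as a mixture of one, two or three sources relative to a set of outgroups.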

Displacement of forager populations in eastern Africa

Unsupervised clustering and formal ancestry estimation both indicate that present-day Hadza in Tanzania can be modeled as deriving all their ancestry from a lineage related to ancient eastern Africans such as Ethiopia_4500BP. However, the contribution of this lineage to present-day Bantu speakers in eastern Africa is small; they instead trace most of their ancestry to a lineage related to present-day western Africans, along with additional ancestry components. In present-day Malawians, population replacement by incoming food producers seems to have been almost complete, as shown by a near absence of ancestry from the ancient individuals sampled and by most of their ancestry coming from the Bantu expansion of western African origin.

Importantly, of all ancient individuals analyzed, only a 600 BP individual from Zanzibar has a genetic profile similar to that of present-day Bantu speakers, with even more western African ancestry. Using linkage disequilibrium, the authors estimate that the admixture between western- and eastern-African-related lineages occurred 800-400 years ago. This indicates that early farmers and previously established foragers remained genetically isolated during the Bantu expansion into eastern Africa, and that this barrier disappeared over time as mixture occurred. However, this delayed admixture did not occur in all African populations, as shown by present-day Malawians, who display no signs of admixture from previously established hunter-gatherers.
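To illustrate how linkage disequilibrium can date an admixture event, here is a minimal sketch under stated assumptions. After two populations mix, admixture-induced LD between two markers separated by a genetic distance d (in Morgans) is expected to decay roughly as exp(-n * d), where n is the number of generations since admixture. Fitting this decay gives n, which is then converted to years with an assumed generation time. The summary above does not say which software or generation time the authors used, so the function name, the toy curve and the 29-year generation time are all assumptions made for illustration.

```python
import math


def fit_generations(distances_morgans, ld_values):
    """Estimate generations since admixture from an LD decay curve.

    Fits log(LD) = intercept - n * d by ordinary least squares and
    returns n (the negative of the fitted slope).
    """
    logs = [math.log(v) for v in ld_values]
    count = len(distances_morgans)
    mean_x = sum(distances_morgans) / count
    mean_y = sum(logs) / count
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(distances_morgans, logs))
    sxx = sum((x - mean_x) ** 2 for x in distances_morgans)
    return -sxy / sxx


# Toy, noise-free decay curve generated with 20 generations (hypothetical).
true_generations = 20
distances = [0.005 * k for k in range(1, 11)]  # 0.5 cM to 5 cM, in Morgans
ld_curve = [0.02 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_generations(distances, ld_curve)

GENERATION_YEARS = 29  # assumed generation time, not taken from the paper
estimated_years = estimated_generations * GENERATION_YEARS

print("generations:", round(estimated_generations, 3))
print("years:", round(estimated_years, 1))
```

Real data are noisy, so dedicated tools fit the curve with weights and report standard errors; the sketch only shows why a faster decay of LD with distance implies an older admixture date.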

Early Levantine farmer-related admixture in a ~3100-year-old pastoralist from Tanzania

The authors estimated the ancestry components of a 3100 BP individual from Tanzania and found that 38% of her ancestry was related to the pre-pottery farmers of the Levant (10000 BP), indicating a critical contribution of Levant-Neolithic-related populations to present-day eastern Africans. The best-fitting ancestry model for the Somali indicates that they have ancestry related to the 3100 BP Tanzanian individual, Dinka-related ancestry, and 16% Iranian-Neolithic-related ancestry. This suggests that ancestry related to the Iranian Neolithic appeared in eastern Africa after an earlier gene flow related to Levant Neolithic populations.

Direct evidence of migration bringing pastoralism to eastern and southern Africa

All three ancient southern Africans show affinities to the ancestry predominant in present-day Tuu speakers in the southern Kalahari. Among them, the 1200 BP sample from the western Cape, found in a pastoralist context, has an ancestry composition similar to that of present-day pastoralists like the Nama, with affinity to three groups: Khoe-San, western Eurasians and eastern Africans. This is in line with the hypothesis of a non-Bantu-related population carrying eastern African and Levantine ancestry to southern Africa by at least 1200 BP. Using their model to determine the proportions of the different ancestries present in the western Cape 1200 BP individual, they find mainly a mixture of non-southern African populations. This is consistent with the hypothesis that the Savanna Pastoral Neolithic archaeological tradition in eastern Africa is a possible source for the spread of herding to southern Africa.

The earliest divergences among modern human populations

Previous studies indicate that the primary ancestry in the San population (southern Africa) comes from a lineage that separated from all other modern human lineages before those lineages diverged from one another. While Skoglund et al. obtain a similar model in the absence of admixture, the tree-like representation is a poor fit, since ancient southern Africans (2000 BP) were not strictly an outgroup to all other African populations, and several other examples are also inconsistent with this model. To find models that fit the data, the authors performed admixture graph modeling of the allele frequency correlations and found two parsimonious models. In the first, present-day western Africans have ancestry from a basal African lineage that contributed more to the Mende than it did to the Yoruba, with the other source of western African ancestry being related to eastern Africans and non-Africans. In the second, gene flow over long periods of time and over long distances has connected southern and eastern Africa to some groups in western Africa.

A selective sweep targeting a taste receptor locus in southern Africa

The authors then searched for genomic signatures of natural selection by looking for regions where allele frequency differentiation between ancient and present-day populations is greater than predicted by the genome-wide background. To do this, they compared the two ancient southern African genomes (2000 BP) to six present-day San genomes with minimal recent mixture. Since the small number of ancient genomes does not allow changes in allele frequency to be inferred at single loci, the scan for high allele frequency differentiation was conducted in 500 kb windows using 10 kb steps. The most differentiated locus identified in this way overlapped a cluster of eight taste-receptor genes. Although taste receptors have reportedly already been identified as targets of natural selection, since they affect the ability to detect poisonous compounds in plants, we must be wary: any analysis of such huge datasets is bound to find something, and the biological interpretation of such a finding may not be straightforward.
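The windowed scan described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the per-SNP differentiation score is a hypothetical placeholder for whatever statistic is actually used, the positions and values are made up, and the function name is our own. Only the window size (500 kb) and step (10 kb) come from the article.

```python
def sliding_window_scan(positions, scores, window=500_000, step=10_000):
    """Average a per-SNP score in overlapping windows along a chromosome.

    Returns a list of (window_start, mean_score, n_snps) tuples,
    skipping windows that contain no SNPs.
    """
    if not positions:
        return []
    results = []
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        in_window = [s for p, s in zip(positions, scores) if start <= p < end]
        if in_window:
            mean_score = sum(in_window) / len(in_window)
            results.append((start, mean_score, len(in_window)))
        start += step
    return results


# Toy data: base-pair positions and hypothetical differentiation scores.
snp_positions = [5_000, 120_000, 480_000, 700_000, 705_000, 1_300_000]
snp_scores = [0.01, 0.02, 0.03, 0.40, 0.50, 0.02]

windows = sliding_window_scan(snp_positions, snp_scores)

# The most differentiated window is the one with the highest mean score.
top_window = max(windows, key=lambda w: w[1])
print("top window (start, mean score, SNP count):", top_window)
```

Because the windows overlap heavily, neighboring windows are not independent, which is one reason why the top-ranked window in such a scan should be interpreted with caution.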

Polygenic adaptation

Skoglund et al. tested for evidence of selection on specific functional gene categories by estimating allele frequency differentiation between present-day San and the two ancient genomes from southern Africa. The functional category with the most extreme differentiation between these two groups was “response to radiation”. To check that this signal did not simply reflect generally inflated allele frequency differentiation, the same statistic was computed for the Mbuti central African rainforest hunter-gatherers, for whom no enrichment for “response to radiation” was found. Instead, the top category for the Mbuti was “response to growth”. Based on this, the authors speculate that the small stature of hunter-gatherer populations may be an acquired adaptation.



This study offers a first and unique view of the genetic makeup of prehistoric Africans. It is indeed a feat, accomplished by 44 authors from institutions in 11 countries, who took advantage of 15 newly sequenced ancient genomes in addition to the only one previously available. The results indicate that an ancient lineage related to the San had a wider distribution in the past, depict two plausible scenarios of gene flow that led to the earliest divergences among modern populations, and give new insights into the spread of herding and farming within Africa. As a side note, all ancient individuals come from eastern or southern Africa, probably because this is where conditions were most favorable for the preservation of ancient remains. Although this could introduce some biases, it seems to be the only feasible approach.


  1. Pagani et al. 2016 Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238–242 (corresponding blog post)
  2. Mallick et al. 2016 The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538: 201–206 (corresponding blog post)
  3. Malaspinas et al. 2016 A genomic history of Aboriginal Australia. Nature 538: 207–214 (corresponding blog post)
  4. Skoglund et al. 2017 (and references therein) Reconstructing Prehistoric African Population Structure. Cell 171: 59–71.e21