Reconstructing prehistoric African population structure

INTRODUCTION The highest genetic diversity in humans is found in Africa, consistent with Africa being the cradle of humanity. While the three articles we discussed previously in this tutorial (1,2,3) focused mainly on determining the most parsimonious “out-of-Africa” scenarios from genetic diversity data, this article (Skoglund et al. 2017, 4) investigates the population structure of Africa prior to the expansion of food producers (i.e. herders and farmers). To reconstruct the prehistoric population structure, the authors analyzed the genomes of 16 ancient African individuals who lived up to 8,100 years ago (including 15 newly sequenced genomes), as well as SNP genotypes from 584 present-day Africans and 300 high-coverage genomes from 142 worldwide populations. This is the first study to gather and analyze such a large number of ancient genomes, thereby providing unprecedented insight into prehistoric human population structure. RESULTS An ancient cline of southern and eastern African hunter-gatherers The authors used principal component analysis (PCA) and automated clustering to relate the 16 ancient individuals to present-day sub-Saharan Africans. This reveals that while the two ancient South African individuals share ancestry with present-day South Africans (Khoe-San), 11 of the 12 ancient individuals living …

A journey through the Simons Genome Diversity Project: more genomes sequenced, more diverse populations

Introduction Between 1976, when the first genome, that of Bacteriophage MS2 (1), was completely sequenced, and 2001, when the first draft of the human genome (2) was released, a great deal of work went into improving sequencing methods and making them accessible, in order to gain insight into the genetics of various organisms. For the human genome, this draft was a very important step, and the Human Genome Project was declared complete in 2003 (3). In recent years, more and more projects have set out to decipher human wanderlust. To these previous studies we can add the Simons Genome Diversity Project, which brought us more information by sequencing 300 new genomes from 142 diverse populations. One of the aims was to choose populations that differ in genetics, language and culture. The study shows that some of the populations separated 100,000 years ago and reveals more information about the ancestors of Australian, New Guinean and Andamanese people. Results One of the most important things in uncovering how humans really peopled the Earth is to sequence as many genomes as possible, from individuals belonging to diverse populations that differ in many respects. In this study, the 300 samples were prepared by using PCR-free library, through …

Genomic analyses inform on migration events during the peopling of Eurasia

Introduction In the past two decades, considerable research effort has gone into sequencing the human genome and subsequently unveiling the demographic history that underlies the patterns of genetic diversity we observe across the globe today. Here we discuss a recent research article by Pagani et al. (1) that addresses genomic diversity and historic migration patterns of human populations in Eurasia. The first human genome was sequenced in 2003 by the Human Genome Project (2), and larger projects rapidly followed, such as HAPMAP (3) and the 1000 Genomes Project (4), largely thanks to considerable improvements in sequencing technologies. Despite being extremely useful tools for a number of studies, these genome databases have important sampling caveats that limit their use for some particular topics. Indeed, HAPMAP sampled a small number of populations, whereas the 1000 Genomes Project sampled a large number of populations but did not attempt to sample individuals of “pure” ancestry. For instance, the sampling in North America focused considerably on city-based individuals who were found to have very diverse recent ancestry, thus blurring the signal of ancient colonisation history. Importantly, in the studied paper, a considerable effort was made on sampling a broad panel of 447 …

ExAC presents a catalogue of human protein-coding genetic variation

Exploring the variability of human genomes is a key step towards the holy grail of human genetics: linking genotypes with phenotypes. It also provides insights into human evolution and history. The Exome Aggregation Consortium (ExAC) was founded for this purpose: to capture the variability of human exomes using next-generation sequencing. The first ExAC dataset of 63,358 individuals was released on 20 October 2014. Recently, a paper describing an updated version of the dataset was published: Analysis of protein-coding genetic variation in 60,706 humans. The authors did great work on the reproducibility of their downstream analyses and, more generally, on the availability of the data. All the code is well documented in a blog post and available in a GitHub repository. I plotted all figures in this blog post myself! Dataset ExAC comprises almost tenfold more individuals than previous datasets of a similar kind (Fig 1a). 91,000 individuals were sequenced, of which 60,706 were kept after quality filtering. The Finnish population was separated from the other Europeans because of the bottleneck it has gone through. ExAC targeted individuals with various genetic backgrounds. Principal component analysis has shown a very strong geographical pattern in the dataset (Fig 1b). I expected a continuum …

Identification of a large set of rare complete human knockouts

High-throughput genotyping and sequencing have led to the discovery of numerous sequence variants associated with human traits and diseases. An important type of variant involved is the Loss of Function (LoF) mutation (frameshift indels, stop-gain and essential splice-site variants), which is predicted to completely disrupt the function of a protein-coding gene. In Mendelian recessive diseases, the condition occurs only if the LoF variants are biallelic, i.e. affect both copies of a gene. The affected gene is then defined as a “knockout”. By studying the Icelandic population, the authors aim to identify rare LoF mutations (Minor Allele Frequency, MAF < 2%) present in individuals participating in various disease projects. They then investigate how frequently these LoF mutations are homozygous (i.e. knockout) in the germline genome of the population. The Icelandic population Iceland is well-suited for genetic studies for three main reasons. The island was colonized around the 9th century by 8,000 to 20,000 settlers. Since then the population has grown to around 320,000 inhabitants today. The initial founder effect and rare genetic admixture make the Icelandic population a genetic isolate. In addition to this unusual genetic isolation, Iceland’s population benefits from a genealogical database containing family histories reaching centuries …
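To give a feel for why a MAF threshold of 2% makes complete knockouts rare, here is a minimal sketch of the expected homozygote frequency under Hardy-Weinberg proportions. This is an illustration only, not the authors' method: random mating is an assumption that a founder population may violate, the function names are hypothetical, and the population size is just the rounded figure quoted above.

```python
def expected_homozygote_frequency(allele_frequency: float) -> float:
    """Expected fraction of individuals homozygous for an allele,
    assuming Hardy-Weinberg proportions (random mating)."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must lie in [0, 1]")
    # Both copies must carry the allele, so the probability is q * q.
    return allele_frequency ** 2


def expected_homozygotes(allele_frequency: float, population_size: int) -> float:
    """Expected number of homozygous carriers in a population of the given size."""
    return expected_homozygote_frequency(allele_frequency) * population_size


if __name__ == "__main__":
    maf = 0.02            # upper bound used for "rare" LoF mutations in the text
    population = 320_000  # approximate present-day population quoted in the text
    print(expected_homozygote_frequency(maf))
    print(expected_homozygotes(maf, population))
```

Under these assumptions, a variant sitting exactly at the 2% threshold would be homozygous in roughly 4 out of every 10,000 individuals, and rarer variants fall off quadratically, which is why large cohorts are needed to observe knockouts.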

Reconstructing human population history: ancestry and admixture

Understanding the evolutionary history of our own species, and how migration and mixture of ancestral populations have shaped modern human populations, is a key question in evolutionary biology. Here we present three articles related to this topic, the first two dealing with India and the third focusing on a single Ethiopian group: 1) Moorjani et al. 2013, Genetic Evidence for Recent Population Mixture in India, AJHG 93: 422–438; 2) Basu et al. 2016, Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure, PNAS, online before print; 3) Van Dorp et al. 2015, Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference, PLOS Genetics 11(8): e1005397. All of them use genome-wide microarray data. After a brief summary of each paper, showing their similarities and differences, we discuss their methodological approaches. Ancestral populations of India The aim of the first two articles is to understand the history of the populations of the Indian subcontinent. The first one (Moorjani et al. 2013) reports data for more than 570 individuals sampled from 73 groups living in India. …

The African Genome Variation Project shapes medical genetics in Africa.

Although Africa is the world’s most genetically diverse continent, only a handful of studies have attempted to understand the genetic risks for disease in African populations. This study sheds light not only on genetic diversity, helping us learn more about the variants associated with malaria and hypertension, but also on population history across sub-Saharan African populations. Besides the comprehensive map of African variants obtained from genotypes of 1,481 individuals and whole-genome sequences of 320 individuals, the authors propose an array design suitable for capturing variants in African populations. Summary and comments of the paper Population structure in SSA. Comparing ~2.2 million variants across 18 ethno-linguistic groups from sub-Saharan Africa (SSA), the authors found modest differentiation among SSA populations (mean pairwise Fst = 0.019) and among Niger-Congo language groups (mean pairwise Fst = 0.009). In the article, the authors suggest that the modest differentiation among Niger-Congo language groups is evidence for the ‘Bantu expansion’. However, Fig. 1a shows that samples were mostly collected near the West, East and South African coasts rather than in the interior of the continent where the Bantu expansion occurred, which indicates a sampling bias. Furthermore, the authors found a high proportion of unshared and novel variants in …
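To make the "mean pairwise Fst" figures concrete, the following is a minimal sketch of how such a number can be computed from per-population allele frequencies. It uses the textbook heterozygosity-based definition, Fst = (H_T - H_S) / H_T, combined across loci as a ratio of sums and weighting both populations equally. This is an assumption chosen for illustration: it applies no sample-size correction and is not claimed to be the estimator used in the paper. All function names and the toy frequencies are hypothetical.

```python
from itertools import combinations


def pairwise_fst(freqs_a, freqs_b):
    """Heterozygosity-based Fst between two populations.

    freqs_a and freqs_b hold the frequency of the same allele at the
    same biallelic loci, in the same order. Loci are combined as a
    ratio of sums, with both populations weighted equally.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same loci")
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)                        # pooled heterozygosity
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2  # mean within-population
        numerator += h_total - h_within
        denominator += h_total
    # If no locus is variable in the pooled sample, report no differentiation.
    return numerator / denominator if denominator > 0 else 0.0


def mean_pairwise_fst(populations):
    """Average pairwise_fst over every pair in a {name: frequencies} mapping."""
    pairs = list(combinations(populations, 2))
    if not pairs:
        raise ValueError("need at least two populations")
    total = sum(pairwise_fst(populations[a], populations[b]) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy allele frequencies, invented purely for illustration.
    toy = {
        "pop1": [0.40, 0.10, 0.75],
        "pop2": [0.45, 0.12, 0.70],
        "pop3": [0.60, 0.20, 0.50],
    }
    print(mean_pairwise_fst(toy))
```

Values near 0 mean that the populations share nearly identical allele frequencies, which is the sense in which figures such as 0.019 and 0.009 indicate modest differentiation.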

Gibbon genome and the fast karyotype evolution of small apes

All contents refer to the original paper (Carbone et al. Nature. 2014 Sep 11;513(7517):195-201) Summary and personal comments This paper studies the gibbon karyotype from the perspective of the gibbons' divergent evolution from ancestral primates. Gibbons, small apes living in South-East Asia, differ from other primates, such as great apes and Old World monkeys, by a surprising number of chromosomal rearrangements. The authors aimed to study the mechanisms underlying such remarkable plasticity in the gibbon genome. 1) The authors sequenced and assembled the genome of a female white-cheeked gibbon (Nomascus leucogenys), ordered in 26 chromosomes (against the human reference), and analyzed gibbon-human synteny breakpoints (i.e. ruptures of synteny, the physical co-localization of genetic loci on the same chromosome in both gibbon and human). Fig 2a shows Oxford plots for human chromosomes (y axis) versus those of other primates (x axis), expressed in terms of collinear blocks of > 10 Mb. It is evident from the plots that, compared to other primates, gibbons present the highest rate of chromosome rearrangements, visible as a scattered rather than a linear plot (Fig 2a), in particular large-scale reshuffling (as shown in Fig 2b, right part of the graphic). Examples of synteny breakpoints, such as chromosomal inversion, are shown in …

The genetics of Mexico recapitulates Native American substructure and affects biomedical traits

Mexico hosted many cultures, such as the Olmec, the Toltec, the Maya and the Aztec, before it was conquered and colonized by the Spanish Empire in 1521. The country harbors a large reservoir of pre-Columbian diversity, which has contributed genetically to today’s population. In a recent paper, Moreno-Estrada et al. 2014 performed a detailed study of Mexican genetic diversity. The results showed genetic stratification among indigenous populations and an association between subcontinental ancestry and lung function. In the first part of the study, to estimate genetic diversity, the researchers examined autosomal single-nucleotide polymorphisms in more than 500 Native Mexican individuals from all around Mexico. Statistical analysis of the genomic data showed that some populations within Mexico are more differentiated from each other than European populations are from East Asian populations. This extreme differentiation is thought to result from isolation followed by a bottleneck and small effective population sizes. The data were analyzed in various ways (ROH and IBD analysis, PCA etc.), revealing the population substructure of Mexico. In all of the analyses, the results confirmed that the Seri (northernmost) and the Lacandon (southernmost) have the highest level of differentiation. Moreover, the differentiation between Seri and Lacandon was greater than the average differentiation between human populations. The relationships between …

Gibbon genome and the fast karyotype evolution of small apes

Gibbons are small apes living in southeast Asia that diverged between Old World monkeys and great apes and whose most distinctive feature is a high rate of evolutionary chromosomal rearrangement. The aim of this study was threefold. First, the authors looked into the mechanisms that could explain the extraordinary rate of chromosomal rearrangement in gibbons. Second, they explored gibbon evolutionary history to shed light on the timing and order of splitting of the gibbon genera. Third, they looked into the functional evolution of genes that might be associated with gibbon-specific adaptations. To do so, they sequenced and assembled the genome of the white-cheeked gibbon (Nomascus leucogenys), showing that the quality and statistics of the assembled genome were comparable to those of other primates (Table 1 and Fig. S1). Chromosomal rearrangement and LAVA insertions Chromosomal rearrangement was confirmed by comparing the karyotype of the assembled gibbon genome (Nleu1.0) to that of human. Figure 2A shows the extraordinarily high number of rearrangements compared to other primates. Furthermore, these reshuffling events affect long stretches of chromosomes (Fig. 2A displays collinear blocks larger than 10 Mb), whereas short-scale rearrangement events occur at levels comparable to other primates (Fig. 2B). Since the four Gibbon genera of this …
