Andra-Octavia Roman – Tutorial Genomics, Ecology, Evolution, etc

The parallel evolution in amniotes seen through the eye of functional nodal mutations

Fri, 01 Dec 2017 17:25:53 +0000

Introduction

In this article, the authors describe evolutionary convergence in mammals, birds, and reptiles, based on genomic data from NCBI. Species and lineages diverge because mutations appear and accumulate in organisms over time. To contribute to the formation of new species, those mutations need a high functional potential and have to be conserved over time. Mutations can be conserved through selection pressure, mutational compensation, and/or the separation of members of the same species by geological and environmental events.

In this comprehensive study, the authors describe a genomic landscape of parallel evolution by analysing functional nodal mutations (fNMs). They use different types of DNA (mitochondrial and nuclear), the thermostability of mtDNA-encoded RNA genes, and structural proximity within proteins, based on the 3D structures available in the PDB database. fNMs can be separated into single nodal mutations (fSNMs), recurrent nodal mutations (fRNMs), which occurred independently in unrelated lineages, and recurrent combinations of nodal mutations (fRCNMs), which recurred independently together with other nodal mutations in more than one lineage. The recurrent ones are the most relevant when we talk about convergent adaptive responses, that is, the parallel evolution of different species. One of the aims of the study is to find the best candidates for the adaptive mutations that were present in the evolution of the amniotes. Whether a mutation is compensated is then used to identify the adaptive ones. The main explanation for convergent evolution is the presence of recurrent nodal mutations. Many fNMs occur in combination with potential compensatory mutations in RNA and protein-coding genes. Compensation of a functional mutation means that it co-occurs with additional mutations that counteract its effect on the original function.
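The three classes of fNMs described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the input layout (a dictionary from independent nodes to the mutations found there), the function name and the rule "a partner mutation shared on at least two nodes" are assumptions made only for illustration.

```python
from collections import defaultdict

def classify_nodal_mutations(node_mutations):
    """Label each mutation as 'fSNM', 'fRNM' or 'fRCNM'.

    node_mutations: dict {node_id: set of mutation labels}. The nodes are
    assumed (hypothetically) to be independent, unrelated lineages.
    """
    # Invert the mapping: on which nodes does each mutation occur?
    nodes_of = defaultdict(set)
    for node, mutations in node_mutations.items():
        for mutation in mutations:
            nodes_of[mutation].add(node)

    labels = {}
    for mutation, nodes in nodes_of.items():
        if len(nodes) == 1:
            labels[mutation] = "fSNM"  # seen on a single node only
            continue
        # Recurrent: does some other mutation accompany it on >= 2 nodes?
        in_combination = any(
            other != mutation and len(nodes & other_nodes) >= 2
            for other, other_nodes in nodes_of.items()
        )
        labels[mutation] = "fRCNM" if in_combination else "fRNM"
    return labels

# Toy data: m1 and m2 recur together, m4 recurs alone, m3 and m5 are single.
toy = {
    "node_A": {"m1", "m2", "m3"},
    "node_B": {"m1", "m2"},
    "node_C": {"m4"},
    "node_D": {"m4", "m5"},
}
labels = classify_nodal_mutations(toy)
```

In the toy data, m1 and m2 recur together on two nodes and are labelled fRCNM, m4 recurs without a constant partner and is labelled fRNM, and m3 and m5 are fSNMs.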

Results

The article claims that the evidence for parallel evolution comes mainly from the high number of uncompensated recurrent fNMs. The best candidate for illustrating parallel evolution is the emergence of body thermoregulation in mammals and birds, which seems to have arisen independently in the two groups.

The mtDNA, the maternally inherited genetic information, was used to identify the fNMs in the amniotes. The study is based on mtDNA from 1003 species and nDNA from 91 species. The mtDNA was used for the structure-based alignment of 24 mtDNA-encoded RNA genes (tRNA and rRNA) and 13 protein-coding genes. In addition, the authors used the 3D structures of four mtDNA-encoded proteins, CO1-3 and Cytb, as the cytochromes are highly conserved across species. The mtDNA gene content is usually the same, but the order of the genes differs because of evolutionary rearrangements. For this reason, the genes were first aligned individually and then concatenated, all 37 of them, following the human mtDNA gene order.
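The "align each gene individually, then concatenate in a reference order" step can be sketched as follows. This is only an illustration under assumptions: the function name, the dictionary layout and the three-gene toy order are hypothetical, and species missing a gene are simply padded with gaps.

```python
def concatenate_alignments(gene_alignments, gene_order):
    """Concatenate per-gene alignments following a reference gene order.

    gene_alignments: {gene: {species: aligned sequence}} (hypothetical layout).
    gene_order: list of gene names, e.g. the order in the reference genome.
    Returns {species: concatenated aligned sequence}.
    """
    species = set()
    for alignment in gene_alignments.values():
        species.update(alignment)

    parts = {sp: [] for sp in species}
    for gene in gene_order:
        alignment = gene_alignments[gene]
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences of {gene} are not aligned to one length")
        width = widths.pop()
        for sp in species:
            # A species lacking this gene gets a block of gaps instead.
            parts[sp].append(alignment.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}

# Toy example with three genes (an illustrative subset, not the full set of 37).
toy_alignments = {
    "CO1": {"human": "ATG-", "chicken": "ATGC"},
    "ND1": {"human": "AC", "chicken": "A-"},
    "Cytb": {"human": "GGT"},
}
concatenated = concatenate_alignments(toy_alignments, ["ND1", "CO1", "Cytb"])
```

Aligning gene by gene first means that a rearranged gene order in one species does not disturb the columns of the final alignment.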

The sequence alignment revealed 25234 nodal non-synonymous and RNA gene mutations. To assess the functional potential of these mutations, a score was calculated that includes evolutionary conservation, physico-chemical properties (for non-synonymous changes) and molecular thermostability (the estimated free energy, ΔG, of the RNA sequence was calculated before and after the mutational event). The score, from 1 to 9, depends on the level of conservation and the physico-chemical properties of the tested amino acid. After the functional potential score had been calculated for all nodal mutations, 3262 fNMs remained (non-synonymous changes and, mainly, RNA gene mutations), with properties related to those of disease-causing mutations.

The next step was to identify the best candidates for adaptive fNMs by studying compensated and non-compensated mutations, although the approach chosen by the authors cannot reveal the exact order of the compensation process. Moreover, some compensatory mutations may have lower functionality scores than the fNMs they co-occur with. Figure 1 demonstrates potential compensation and possible adaptation in a protein-coding gene (COX2) across different species. Panel b shows the location of the fNM (S155T) and of the co-occurring compensatory mutations. The S155T mutation recurs independently, and so do its co-occurring compensatory mutations. This approach is purely theoretical, because it cannot show all the compensations, only the best ones, which became fixed during evolution.

Figure 2 shows the prevalence of the different types of mutations, compensated or not. The predictions reveal a high probability that fRCNMs are compensated, in both RNA and protein-coding genes. Here the nDNA data are also introduced and compared with the mtDNA data in terms of the prevalence of compensated and non-compensated mutations. Because the number of species differed greatly between the two data sets, the authors analysed the same 91 species for mtDNA and nDNA to reduce the bias, at the cost of a lower evolutionary resolution. To address this loss of resolution, they repeated the mtDNA analysis using only the most ancient mutations, those that occur at deeper nodes, but this gave almost the same percentage as the analysis of the 91 species (37% for the ancient mutations and 34% when the younger ones are included) (Figure 2e & Supplementary 5b,c). The older mutations therefore appear to be less compensated, which leaves more uncompensated mutations as the best candidates for ancient adaptive mutations.

In the supplementary figures, the authors use the OXPHOS complexes to compare the fNMs in mtDNA and nDNA across the 91 species. For the intra-mtDNA comparison, the effect is less prominent (31%).

The nDNA data cover the whole genome of each species, so the information is much more comprehensive because many more genes are included. Compared with mtDNA, the prevalence of compensation is lower by 10%, but in both cases the percentage of possible compensation is higher than can be explained by the mutation rate or by chance.

Finally, to determine the best adaptive mutations over the course of evolution, the authors used the fRNMs from mtDNA. Perhaps because of the low number of samples, the result did not show that non-compensated fRNMs are the main reason for convergent evolution. The nDNA, in contrast, revealed a significant pattern, with the highest number of potential non-compensated fRNMs shared between birds and mammals (N=51). The best candidates turned out to be mutations in genes related to thermoregulation in birds and mammals.

Conclusion

In this comprehensive study, the authors combined several kinds of information, including different types of DNA from many species and various physico-chemical parameters. The results reveal that ancient functional mutations are the most informative to study, because they were able to overcome negative selection. The best candidates for adaptive nodal mutations are, in the end, the non-compensated fNMs, which are more prevalent among old fNMs. These seem to be the main contributors to the evolution of thermoregulation in birds and mammals. The protein analysis reinforces the main conclusion: to enrich for adaptive mutations, the non-compensated mutations are the best candidates.

Taken together, this study provides new insights into how different lineages and species might have developed over time. It also shows a new way to combine data from different sources. However, the authors fail to give an adequate explanation of the fNMs, and they cite no references that describe the term. This makes the article difficult to understand, especially for readers from outside the field, which is the opposite of how scientific writing should be done.

Levin & Mishmar, 2017, The genomic landscape of evolutionary convergence in mammals, birds and reptiles. Nature Ecology & Evolution 1: 0041


A journey through The Simons Genome Diversity Project: more genomes sequenced, more diverse populations

Mon, 13 Nov 2017 11:46:14 +0000

Introduction

Between 1976, when the first genome, that of bacteriophage MS2 [1], was completely sequenced, and 2001, when the first draft of the human genome [2] was released, a lot of work was done to improve the methods for studying the genetics of various organisms and to make them accessible. For the human genome this was a very important step, and the Human Genome Project was declared complete in 2003 [3]. In recent years, more and more projects have been involved in deciphering the human wanderlust. To these previous studies we can add the Simons Genome Diversity Project, which brought us more information by sequencing 300 new genomes from 142 diverse populations. One of the aims was to choose populations that differ in genetics, language and culture. The study shows that some of the populations separated 100,000 years ago and reveals more about the ancestors of Australian, New Guinean and Andamanese people.

Results

One of the most important steps in reconstructing the real peopling of the Earth is to sequence as many genomes as possible, from individuals belonging to diverse populations that differ in many respects. In this study, the 300 samples were prepared with the PCR-free library method of Illumina Ltd., and the median coverage was 42-fold (Figure S1.1; Supplementary Data Table 1). The high genome coverage makes it possible to identify the greatest number of variants, some of them previously reported. Single-sample genotypes were called with a reference-bias-free modification of GATK, after some preprocessing to remove adapter sequences. To increase the accuracy of the data, a filtering system highly specific to the SGDP dataset was used. The filter levels run from 0 to 9 and are encoded for each sample as a single character. Level 1 gives the best balance between sensitivity and a low error rate, while level 9 is suited to cases where the error rate has to be kept as low as possible (Figure S2.1).
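To make the idea of a single-character filter level concrete, here is a minimal sketch. It assumes a hypothetical layout in which each genotype character has a matching filter character from 0 to 9 and higher levels are stricter; the function name, the use of "N" for masked positions and this layout are illustrative assumptions, not the actual SGDP file format.

```python
DIGITS = "0123456789"

def mask_by_filter_level(genotypes, filter_levels, min_level=1, missing="N"):
    """Mask genotype characters whose filter level is below min_level.

    genotypes and filter_levels are equal-length strings (hypothetical
    layout): one genotype character and one filter character per position.
    Positions with a non-digit filter character are masked as well.
    """
    if len(genotypes) != len(filter_levels):
        raise ValueError("genotypes and filter_levels must have equal length")
    masked = []
    for genotype, level in zip(genotypes, filter_levels):
        keep = level in DIGITS and int(level) >= min_level
        masked.append(genotype if keep else missing)
    return "".join(masked)

# Level 1 keeps most positions, level 9 keeps only the most reliable ones.
balanced = mask_by_filter_level("ACGTAC", "019N35", min_level=1)
strict = mask_by_filter_level("ACGTAC", "019N35", min_level=9)
```

Raising `min_level` trades sensitivity for a lower error rate, which is the choice between level 1 and level 9 described above.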

The first part of the study tells us more about how long ago the worldwide populations in the study separated. The pairwise sequentially Markovian coalescent (PSMC) and the multiple sequentially Markovian coalescent (MSMC) were used to infer changes in population size and split times; the phased haplotypes needed for the split-time estimates were produced with SHAPEIT and IMPUTE2. Filter level 1 was used. Figure 2a provides evidence that the ancestors of some present-day populations were isolated at least 100 kya, which could have been an obstacle to the spread of certain mutations across the ancestors of all populations. Gene flow continued until around 50 kya among the great majority of ancestral populations. The graphs show when substructure between different populations begins: in Figure 2a, substructure between the French and Africans starts around 200 kya. The next panels compare only Africans (the Yoruba separated from the KhoeSan 87 kya, from the Mbuti 56 kya and from the Dinka 19 kya) or only non-Africans (the oldest substructure dates from 50 kya, during or shortly after the deepest part of the shared non-African bottleneck, 40-60 kya). For Figure 2d-f, PSMC and PS1 were used; the panels show the inferred effective population sizes and the inferred cross-coalescence rates.

Using a neighbour-joining tree (based on pairwise divergence per nucleotide) and FST, Mallick et al. reconfirmed previous findings that the deepest splits occurred among Africans. Previous studies showed that all non-Africans today carry Neanderthal ancestry, and Figure 1c shows that the highest proportion of Neanderthal ancestry is found in East Asians. Among Eurasians, South Asians have the highest Denisovan ancestry (heatmap in Figure 1d). Another result is that there is more Denisovan ancestry in eastern than in western Eurasians. For Australia, New Guinea and Oceania, the results of other studies are confirmed: these populations have more Denisovan ancestry than mainland Eurasians. In Figure 3, the deeper the split, the more divergent the early dispersal ancestry. Based on the cross-population coalescence pattern and allele frequency correlations, the best model is that Australian, New Guinean and Andamanese history does not involve ancestry from an early-diverging source. No archeological data regarding southern Asia or Australia are taken into consideration in this study. So, using only the data from this study, it is revealed that Australians, New Guineans and Andamanese lack an analogous deep ancestry component. All the data on Australians seem consistent with descent from a common homogeneous population since the separation from New Guineans. New Guineans, Australians and Andamanese also appear as part of an eastern clade together with mainland East Asians.
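The distance behind such a neighbour-joining tree, pairwise divergence per nucleotide, can be sketched in a few lines. This is a simplified illustration on haploid toy sequences: the function names and the handling of missing characters are assumptions, and real diploid genomes need a more careful treatment than shown here.

```python
from itertools import combinations

def pairwise_divergence(seq_a, seq_b, missing="N-"):
    """Fraction of differing sites among the sites called in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in missing or b in missing:
            continue  # skip sites that are missing in either sequence
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no sites could be compared")
    return differences / compared

def divergence_matrix(samples):
    """Pairwise divergence for every pair in {name: sequence}."""
    return {
        (x, y): pairwise_divergence(samples[x], samples[y])
        for x, y in combinations(sorted(samples), 2)
    }

toy_samples = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGNACTA"}
distances = divergence_matrix(toy_samples)
```

A tree-building method such as neighbour joining would then take this matrix of distances as its input.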

3P-CLR was used to scan the genome for positive selection. In the end, 38 of the largest peaks emerged as candidates for selection in the common ancestors of all modern humans. These peaks would correspond to sweeps at the time when the archeological record shows accelerating evidence of behavioral modernity. The scan does not cover sweeps on chromosome X or in repetitive or difficult-to-analyze sections of the genome.

The rate of mutation accumulation in non-Africans (grouped as America, CentralAsiaSiberia, EastAsia, WestEurasia and Oceania) and in sub-Saharan Africans (grouped as Pygmy, Khoesan and Africa) was expected to be roughly equal, but this study revealed a significant average difference of 0.5%. For this part, the authors restricted the samples strictly, choosing only samples processed in the same way, applying the highest level of filtering, and pooling samples from the same region. One strength of this experiment is that it avoids the bias caused by different heterozygosity levels in different populations (heterozygosity is higher in Africans) by using only chromosome X from males. In addition, everything is mapped to chimpanzee, which is equally distant from all present-day populations. There are differences from other studies, such as a different rate of the CCT>CTT mutation, which is close to that of Africans in Europeans but not in East Asians. This could be explained by the decrease in generation interval in non-Africans since separation. Previous studies [5] showed a higher X-to-autosome heterozygosity ratio in sub-Saharan Africans than in non-Africans. Mallick et al. confirmed these results while adding more populations to the analysis: Khoesan for sub-Saharan Africans, and New Guineans, Australians, Native Americans, Near Easterners and indigenous Siberians for non-Africans. The only exception, showing a lower X-to-autosome heterozygosity ratio in sub-Saharan Africans than in non-Africans, is the Pygmies (eastern Mbuti and western Biaka). The scatterplot in Figure 1b shows two primary clusters, sub-Saharan Africans and all other populations, without large differences within the groups, except for the Pygmies, who have a high autosomal heterozygosity.

If we compare the two Pygmy populations with a lower X-to-autosome ratio, we can see that the Mbuti are closer to non-Africans than to Africans, even though in the neighbour-joining tree based on pairwise divergence they group with the Africans. The reduction of the X-to-autosome ratio in non-African compared with African populations could be explained by repeated waves of male admixture into already mixed populations, but for the Pygmy populations the strongest argument is sex-biased gene flow, which is supported by anthropological data.
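The X-to-autosome heterozygosity ratio discussed above is simple arithmetic, sketched below. The function names and the toy counts are illustrative assumptions, not values from the study.

```python
def heterozygosity(het_sites, callable_sites):
    """Heterozygous sites per callable site."""
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return het_sites / callable_sites

def x_to_autosome_ratio(x_het, x_callable, auto_het, auto_callable):
    """Ratio of chromosome X heterozygosity to autosomal heterozygosity."""
    autosomal = heterozygosity(auto_het, auto_callable)
    if autosomal == 0:
        raise ValueError("autosomal heterozygosity is zero")
    return heterozygosity(x_het, x_callable) / autosomal

# Toy counts only: 60 vs 100 heterozygous sites per 100000 callable sites.
ratio = x_to_autosome_ratio(60, 100_000, 100, 100_000)
```

With equal numbers of breeding males and females the expected ratio is 3/4, because a population carries three X chromosomes for every four copies of an autosome; values clearly below that are what sex-biased processes, such as the male-biased gene flow mentioned above, can produce.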

In the last part, Mallick et al. show that non-Africans have accumulated more mutations. This can be explained in two ways: by an acceleration of the mutation rate in non-Africans or by a deceleration in Africans. Extended Data Table 1 shows that for none of the populations can the strong signal in non-Africans be attributed to a deceleration in Africans. The acceleration in non-Africans could have several causes: life history traits (e.g. the generation interval) could have changed after the dispersal of modern humans out of Africa, with humans reaching higher latitudes and colder climates; gene conversion (GC to A or T alleles) could have been more effective in Africans; or Neanderthal admixture into the ancestors of non-Africans could have contributed, since Neanderthals may have accumulated more mutations than modern humans after separation (although there is no clear evidence for this).

Conclusion

The Simons Genome Diversity Project brings new information by studying 300 new genomes from 142 diverse populations, and it shows an accelerated accumulation of mutations in non-Africans compared with Africans. The Pygmies also seem to be the only African group with a low X-to-autosome diversity ratio. Regarding archaic ancestry, the highest proportion of Neanderthal ancestry is found in East Asians, and some South Asians show an excess of Denisovan ancestry compared with other Eurasians.
