Evolution – Tutorial Genomics, Ecology, Evolution, etc
https://wp.unil.ch/genomeeee
Blog of a tutorial of Ecole doctorale de biologie UNIL
Mon, 08 Nov 2021 16:12:32 +0000

The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish.
https://wp.unil.ch/genomeeee/2017/12/21/the-genomic-landscape-of-rapid-repeated-evolutionary-adaptation-to-toxic-pollution-in-wild-fish/
Thu, 21 Dec 2017 19:50:20 +0000

Introduction

Environmental pollution is a widespread problem that living organisms have to contend with on a global scale. In contaminated sites especially, wild populations undergo intense selective pressure that may result in phenotypic adaptations to pollutants (Hendry et al., 2008). The scientific article (Reid et al., 2016) discussed in this blogpost explores the genetic mechanisms that have allowed the rapid adaptation to industrial pollutants in wild Atlantic killifish populations.

Results

The genomic landscape of the killifish populations

Atlantic killifish (Fundulus heteroclitus) are non-migratory fish that are abundant along the US east coast (Fig. 1A). Some killifish populations show inherited resistance to lethal levels of industrial pollutants at sites that have been contaminated for decades. For instance, the authors show that the percentage of larvae that survive increasing concentrations of a highly toxic pollutant, PCB 126, is higher in tolerant populations than in sensitive populations (Fig. 1B). To understand the genetic basis of this rapid adaptation to polluted sites, the authors sequenced complete genomes from eight populations. Four tolerant populations that reside in highly polluted sites were sampled, and each was paired with a sensitive population from a nearby site (Fig. 1A). The authors combined these genomic data with corresponding RNA sequencing (RNA-seq) to identify unique and shared pathways among tolerant populations and to uncover evidence of adaptation.

The genomes of 43-50 individuals from each population were sequenced. One pair of tolerant and sensitive populations (T1 and S1) was sequenced to 7-fold coverage, and the remaining populations to 0.6-fold coverage. These data indicate that the populations’ genetic variation is strongly structured by geographical location. Each tolerant population is genetically most similar to its nearby sensitive population, with low Fst values between the members of each pair (0.01 – 0.08). Additionally, tolerant populations show lower genome-wide nucleotide diversity (π) along with a positively shifted Tajima’s D. Thus, the authors conclude that tolerant populations have recently and independently diverged from local ancestral populations.
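To make the two summary statistics above concrete, here is a minimal sketch of how nucleotide diversity (as the mean number of pairwise differences) and Tajima's D can be computed with the standard formula. It assumes a small set of fully phased, aligned haplotype strings, which is a simplification: the low-coverage data described here would need estimators that account for genotype uncertainty, and the function names are illustrative only.

```python
from itertools import combinations
from math import sqrt


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences between aligned haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)


def segregating_sites(haplotypes):
    """Number of alignment columns that carry more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length haplotype strings.

    Returns NaN when the statistic is undefined (fewer than four
    sequences, or no segregating sites).
    """
    n = len(haplotypes)
    s = segregating_sites(haplotypes)
    if n < 4 or s == 0:
        return float("nan")

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two estimators of theta, scaled by its
    # expected standard deviation under neutrality.
    pi = nucleotide_diversity(haplotypes)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

A sample dominated by rare variants gives a negative D, while an excess of intermediate-frequency variants gives a positive D.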

Figure 1. Atlantic killifish populations. (A) Locations of pollution-tolerant and sensitive populations studied (“T”, filled circles; “S”, open circles respectively). (B) Larval survival (linear regression of logit survival to 7 days post hatch) when challenged with increasing concentrations of the pollutant PCB 126.

Signatures of convergent evolution in tolerant killifish populations

To identify genomic regions responsible for conferring pollution tolerance in killifish, the authors scanned the populations’ genomes for signals of selective sweeps in 5 kb sliding windows. Candidate regions were defined as those showing low genetic diversity and a skewed allele frequency spectrum (0.1% tails of π and Tajima’s D, respectively) as well as high allele frequency differentiation (99.9% tails for Fst). Each tolerant population showed more prevalent signatures of selection than its sensitive counterpart (as seen by π and Fst). Most of these outlier regions are small (52 – 69 kb, up to ~1.8 Mb) and specific to each tolerant population. Nevertheless, the highest-ranked outlier regions are shared between tolerant populations (Fig. 2A). The shared outlier regions harbour genes involved in the aryl hydrocarbon receptor (AHR) signaling pathway (AHR2a, AHR1a, AIP, and CYP1A) (Fig. 2B). These results suggest repeated convergent evolution of pollutant tolerance in the sampled killifish populations.
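The outlier scan described above can be sketched as follows. This is only an illustration under stated assumptions: per-window values of π, Tajima's D and Fst are taken as already computed, a window is flagged only when it falls in all three tails at once (the post does not say exactly how the three criteria were combined), and the function names are hypothetical.

```python
def tail_cutoffs(values, tail):
    """Return (lower, upper) cutoffs that bound the extreme `tail`
    fraction of values on each side; at least one value per tail."""
    ordered = sorted(values)
    k = max(1, int(tail * len(ordered)))
    return ordered[k - 1], ordered[-k]


def flag_outlier_windows(pi, tajima_d, fst, tail=0.001):
    """Indices of windows with low pi, low Tajima's D and high Fst.

    Assumption: all three criteria must hold for the same window.
    """
    if not (len(pi) == len(tajima_d) == len(fst)):
        raise ValueError("all statistics must cover the same windows")
    pi_low, _ = tail_cutoffs(pi, tail)
    d_low, _ = tail_cutoffs(tajima_d, tail)
    _, fst_high = tail_cutoffs(fst, tail)
    return [
        i
        for i in range(len(pi))
        if pi[i] <= pi_low and tajima_d[i] <= d_low and fst[i] >= fst_high
    ]
```

With genome-wide 5 kb windows, `tail=0.001` corresponds to the 0.1% and 99.9% tails mentioned in the text; adjacent flagged windows would then be merged into outlier regions.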

The authors then tested whether the genes located in outlier regions showed distinct expression profiles in tolerant killifish. Individuals from sensitive and tolerant populations were reared in a common, clean environment for two generations. Following this, embryos were challenged with the toxic pollutant PCB 126 and RNA was collected ~10 days post fertilization. Indeed, AHR-regulated genes were less induced in individuals from tolerant populations (Fig. 2C). Concomitantly, AHR-regulated genes were enriched (P < 0.0001) in the set of genes that were up-regulated in response to PCB 126 treatment in sensitive populations exclusively. Notably, some of the dominant pollutants at the sampled “T” sites bind AHR. Also, aberrant AHR signalling leads to embryo and larval lethality (Pohjanvirta, 2011). The authors thus conclude that the AHR signalling pathway is a key and repeated target of natural selection in polluted sites given the multiple, independent “desensitizing” events in tolerant killifish populations.
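An enrichment statement such as the one above is commonly assessed with a one-sided hypergeometric test. The post does not say which test the authors used, so the sketch below is an assumption, and any gene counts passed to it would be hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, annotated_genes, selected_genes, overlap):
    """One-sided P(X >= overlap) under the hypergeometric distribution.

    total_genes:     size of the gene universe
    annotated_genes: genes in the category of interest (e.g. AHR-regulated)
    selected_genes:  genes in the tested set (e.g. up-regulated genes)
    overlap:         genes that are both annotated and selected
    """
    upper = min(annotated_genes, selected_genes)
    tail = sum(
        comb(annotated_genes, k)
        * comb(total_genes - annotated_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, selected_genes)
```

A small P value indicates that the overlap is larger than expected if the selected genes were drawn at random from the universe.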

Fig. 2. Structural and functional genomic signals of adaptation to pollutants. Adapted from (Reid et al., 2016). (A) Allele frequency differentiation (Fst, top) and nucleotide diversity (pi, bottom) difference (tolerant pi – sensitive pi) for each population pair studied for top-ranking outlier regions (including the top two per pair). Colored panels span the outlier region of each respective population comparison where number indicates outlier rank for each tolerant-sensitive pair. Red dashed lines indicate outlier thresholds. Each tick on x axis is at the 500-kb position on the scaffold, and each candidate gene name is indicated (top) for each outlier region. (B) Model of key molecules in the AHR signaling pathway, including regulatory genes and transcriptional targets (AHR gene battery). Boxes next to genes are color-coded by population pair; filled boxes indicate the gene is within a top-ranking outlier region for that pair, and number indicates ranking of the outlier region as in (A). (C) Gene-expression (of developing embryos) heat map shows up-regulated genes in response to PCB 126 exposure (“PCB”; 200 ng/liter) compared with control exposure (“Con”) for sensitive populations, most of which are unresponsive in tolerant populations. The bottom panel highlights genes characterized as transcriptionally activated by ligand-bound AHR.

It is important to note that genome sequencing coverage seems to affect the ranking of outlier regions. For instance, the regions that contain key AHR-signalling genes (AIP, CYP1A, AHR1a/2a, and ARNT) rank very high in the low-coverage populations but rank low in the high-coverage population pair (T1-S1). Given that outlier regions are ranked based on Fst and nucleotide diversity, these measures are probably affected by low sequencing coverage. It would be interesting to determine the ranking of the outlier regions if the other populations were sequenced to higher coverage. However, despite their low rank in T1-S1, these regions are classified as outliers in all four population pairs, which strengthens the argument that impaired AHR signalling is key to pollution tolerance.

In-depth analysis of genetic variants in tolerant populations

There is evidence for selection on AHR pathway genes in tolerant killifish populations. Tolerant populations harbour distinct deletions spanning AHR2a and AHR1a. In contrast, individuals from their sensitive counterparts are almost completely devoid of such mutations. Furthermore, RNA-seq data revealed the expression of a chimeric transcript, part AHR2a and part AHR1a, in T4 individuals. Meanwhile, AIP (a regulator of AHR stability and cellular localization) lies within the region that shows the strongest signals of selection and is shared between all tolerant populations. CYP1A (a transcriptional target of AHR) is also located in top-ranking outlier regions in all tolerant populations (except for T1, where the region is ranked #401). Interestingly, CYP1A duplications are found at high frequencies in tolerant populations, without a concomitant increase in expression. The authors hypothesize that CYP1A duplications may function as a dosage-compensation mechanism in tolerant populations with impaired AHR signalling, because AHR knockout has been reported to decrease CYP1A expression in rodents (Schmidt et al., 1996). Finally, other AHR-related genes lie within population-specific outlier regions, such as the tandem paralogs AHR1b and AHR2b in T3 and T4 and five other AHR pathway genes in T4. Together, these observations led the authors to conclude that AHR pathway genes are indeed common and repeated targets of selection, a clear example of convergent evolution.

Genes outside the AHR signalling pathway are also targets of selection. For example, two genes that are implicated in AHR-independent cardiotoxicity (KCNB2 and KCNC3) are within outlier regions in T4, where such cardiotoxic pollutants are abundant. Additionally, the authors found adaptations that may compensate for the potential costs of pollutant tolerance. AHR signalling is interconnected with multiple other pathways, such as estrogen and hypoxia signalling, as well as cell cycle and immune system regulation (Beischlag et al., 2008). Consistent with this, estrogen receptor 2b lies within an outlier region in T2, while estrogen receptor-regulated genes are enriched in the gene set of the outlier regions in all tolerant populations (P < 0.001). Furthermore, the estrogen receptor is inferred to be an upstream regulator of genes differentially expressed between tolerant and sensitive killifish (Fig. 2C). Similarly, hypoxia-inducible factor 2α is in an outlier window in T3, and interleukin and cytokine receptors are in outlier regions in T4. Thus, the authors highlight the possibility that compensatory adaptation may be common following rapid adaptive evolution.

Conclusions

This article shows that genetic adaptations to pollution in wild killifish populations are complex. The authors attribute this to two main factors. First, sites are contaminated with a complex mix of pollutants, which may affect how the AHR pathway and other pathways are impacted. Therefore, adaptations in multiple pathways, at different genetic levels, may be necessary for tolerance to diverse pollutant mixes to arise. Second, AHR pathway genes are interconnected with other gene-regulatory pathways, so the functions of genes in those pathways may be impaired by aberrant AHR signalling. It follows that adaptations that compensate for these genes’ functions may also be selected for in tolerant populations.

The authors argue that their data clearly reveal signals of convergent evolution. The AHR pathway genes are shown to be repeated targets of selection in distinct pollutant-tolerant killifish populations, which also suggests molecular constraints on adaptation to pollution. Despite this, multiple variants were favoured in different tolerant populations. The authors say that their data show evidence of selection on preexisting common variants in multiple tolerant populations. In other words, it seems that soft sweeps have been important for the emergence of pollutant tolerance in killifish. This conclusion is supported by several lines of evidence: 1) the sensitive and tolerant populations of each pair are genetically close, which suggests that the selected variants were part of the standing variation, 2) sensitive populations carry some of the variants found in tolerant populations, and 3) these fish have low dispersal. Interestingly, the authors point out that Atlantic killifish have large population sizes and a wide range of standing genetic variation. These little critters were not only the first fish in space, they are also among the most genetically diverse vertebrates, which positioned them well to evolve pollutant tolerance.

However, it is important to realize that not all species are as well poised to adapt to ever-changing, increasing pollution in their habitats. Research like this gives us key information on how natural populations are dealing with our pollution. The best chance we can give all life forms on Earth is to curb our rates of worldwide pollution. Luckily, it is in our power to do so!

References

  1. Beischlag, T. V., Morales, J. L., Hollingshead, B. D., & Perdew, G. H. (2008). The aryl hydrocarbon receptor complex and the control of gene expression. Critical Reviews in Eukaryotic Gene Expression, 18(3), 207–250. http://doi.org/10.1615/CritRevEukarGeneExpr.v18.i3.20
  2. Hendry, A. P., Farrugia, T. J., & Kinnison, M. T. (2008). Human influences on rates of phenotypic change in wild animal populations. Molecular Ecology, 17(1), 20–29. http://doi.org/10.1111/j.1365-294X.2007.03428.x
  3. Pohjanvirta, R. (2011). The AH Receptor in Biology and Toxicology. Wiley, Hoboken, NJ.
  4. Reid, N. M., Proestou, D. A., Clark, B. W., Warren, W. C., Colbourne, J. K., Shaw, J. R., et al. (2016). The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science (New York, N.Y.), 354(6317), 1305–1308. http://doi.org/10.1126/science.aah4993
  5. Schmidt, J. V., Su, G., Reddy, J. K., Simon, M. C., & Bradfield, C. A. (1996). Characterization of a murine Ahr null allele: Involvement of the Ah receptor in hepatic growth and development. Proceedings of the National Academy of Sciences of the United States of America, 93(13), 6731–6736.
The parallel evolution in amniotes seen through the eye of functional nodal mutations
https://wp.unil.ch/genomeeee/2017/12/01/the-parallel-evolution-in-amniotes-seen-through-the-eye-of-functional-nodal-mutations/
Fri, 01 Dec 2017 17:25:53 +0000

Introduction

In this article the authors describe an evolutionary convergence in mammals, birds, and reptiles, based on genomic data from NCBI. The evolution of different species and lineages is due to mutations that can appear and accumulate in organisms over time. Those mutations need a high functional potential and have to be conserved in time in order to form new species. The conservation of mutations can occur via selection pressure, mutational compensation, and/or by the separation of members from the same species by geological and environmental events.

In this comprehensive study, the authors describe a genomic landscape of parallel evolution by analysing functional nodal mutations (fNMs). They use different types of DNA (mitochondrial and nuclear), the thermostability of mtDNA-encoded RNA genes, and structural proximity within proteins, based on the 3D structures available in the PDB database. Functional nodal mutations can be divided into single nodal mutations (fSNMs), recurrent nodal mutations (fRNMs), which occurred independently in unrelated lineages, and recurrent combinations of nodal mutations (fRCNMs), which recurred independently together with other nodal mutations in more than one lineage. The recurrent mutations are the most relevant for convergent adaptive responses, that is, for the parallel evolution of different species, and their presence is the main explanation offered for convergent evolution. One aim of the study is to find the best candidates for adaptive mutations that arose during the evolution of amniotes. Many fNMs occur in combination with potential compensatory mutations in RNA and protein-coding genes. Compensation of a functional mutation means that it co-occurs with additional mutations that offset its effect on the original function. The authors use the compensation status of fNMs to identify candidate adaptive mutations.
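The three categories defined above can be illustrated with a small sketch. It assumes that mutations have already been mapped to tree nodes and that the listed nodes belong to unrelated lineages (independence is not checked here); for simplicity it only considers pairs of mutations as combinations. The data structure and names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def classify_nodal_mutations(node_mutations):
    """Split nodal mutations into single, recurrent and recurrent pairs.

    node_mutations: dict mapping a node label to the set of mutations
    inferred at that node (nodes assumed to be in unrelated lineages).
    """
    nodes_by_mutation = defaultdict(set)
    for node, mutations in node_mutations.items():
        for mutation in mutations:
            nodes_by_mutation[mutation].add(node)

    # Single nodal mutations arose at exactly one node.
    single = {m for m, nodes in nodes_by_mutation.items() if len(nodes) == 1}
    # Recurrent nodal mutations arose at more than one node.
    recurrent = {m for m, nodes in nodes_by_mutation.items() if len(nodes) > 1}

    # Recurrent combinations: pairs that co-occur at more than one node.
    recurrent_pairs = set()
    for a, b in combinations(sorted(recurrent), 2):
        if len(nodes_by_mutation[a] & nodes_by_mutation[b]) > 1:
            recurrent_pairs.add((a, b))
    return single, recurrent, recurrent_pairs
```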

Results

In the article it is claimed that the evidence for parallel evolution is mainly due to the presence of a high number of uncompensated recurrent fNMs. The best candidate example of parallel evolution is the emergence of body thermoregulation in mammals and birds, which seems to have arisen independently.

Mitochondrial DNA (mtDNA), which is maternally inherited, was used to identify fNMs in amniotes. The study is based on mtDNA from 1003 species and nDNA from 91 species. The mtDNA was used for structure-based alignment of 24 mtDNA-encoded RNA genes (tRNA and rRNA) and 13 protein-coding genes. In addition, the authors used the available 3D structures of four mtDNA-encoded proteins, CO1-3 and Cytb, since cytochromes are highly conserved across species. The gene content of mtDNA is usually the same across species, but the gene order differs because of evolutionary rearrangements. For this reason, the authors first aligned the genes individually and then concatenated the 37 genes in the human mtDNA gene order.

The sequence alignment revealed 25234 nodal non-synonymous and RNA-gene mutations. To assess the functional potential of these mutations, the authors calculated a score that includes evolutionary conservation, physico-chemical properties (for non-synonymous changes) and molecular thermostability (the estimated free energy, ΔG, of the RNA sequence was calculated before and after the mutational event). The score ranges from 1 to 9 and depends on the level of conservation and the physico-chemical properties of the tested amino acid. After the functional potential score had been calculated for all nodal mutations, there were 3262 non-synonymous and RNA-gene fNMs, mainly in RNA genes, with some related to disease-causing mutations.

The next step was to identify the best candidates for adaptive fNMs by studying compensated and non-compensated mutations, although the approach chosen by the authors cannot reveal the exact order of the compensation process. Moreover, some compensatory mutations may receive lower functionality scores than the fNMs they co-occur with. Figure 1 shows an example of potential compensation and possible adaptation in a protein-coding gene (COX2) across different species. Panel b shows the location of the fNM (S155T) and of several co-occurring compensatory mutations. The S155T mutation recurs independently, as do its co-occurring compensatory mutations. This approach is purely theoretical, because it cannot show all compensations, only those that became fixed during evolution. Figure 2 shows the prevalence of the different types of mutations that could be compensated or not. The predictions reveal a high probability that fRCNMs are compensated, for both RNA and protein-coding genes. The nDNA data are also introduced here and compared with mtDNA in terms of the prevalence of compensated and non-compensated mutations. Because the number of species differed greatly between the two data sets, the authors analysed the same 91 species for mtDNA and nDNA to reduce bias, at the cost of reduced evolutionary resolution. Because of this reduction in resolution, they repeated the mtDNA analysis using only the most ancient mutations, which occur at deeper nodes, but this gave almost the same percentage as the analysis of the 91 species (37% for the ancient mutations and 34% when the younger ones were included) (Figure 2e & Supplementary 5b,c). Thus, the older mutations appear to be less compensated, which yields more uncompensated mutations that are the best candidates for ancient adaptive mutations.
In the supplementary figures, the authors use the OXPHOS complexes to compare fNMs in mtDNA and nDNA across the 91 species. For intra-mtDNA comparisons the effect is less prominent (31%).

For the nDNA data, the whole genomes of the species are used, so the information is much more comprehensive because many more genes are included. Compared with mtDNA, the prevalence of compensation is lower, with a difference of 10%, but in both cases the percentage of possible compensation is higher than can be explained by the mutation rate or by chance.

Finally, to determine the best candidate adaptive mutations, the authors used the fRNMs from mtDNA, but, perhaps because of the low number of samples, the results gave no evidence that non-compensated fRNMs are the main driver of convergent evolution. In contrast, the nDNA revealed a significant pattern, with the highest number of potentially non-compensated fRNMs shared between birds and mammals (N=51). The best candidates turned out to be mutations in genes related to thermoregulation in birds and mammals.

Conclusion

In this comprehensive study, the authors merged several kinds of information, including different types of DNA from many species and various physico-chemical parameters. The results suggest that ancient functional mutations are the most informative to study, because they were able to overcome negative selection. The best candidates for adaptive nodal mutations are ultimately the non-compensated fNMs, which are more prevalent among old fNMs. These seem to be the main contributors to the evolution of thermoregulation in birds and mammals. The protein analysis reinforces the main conclusion: non-compensated mutations are the best candidates when enriching for adaptive mutations.

Taken together, this study provides new insights into how different lineages and species might have developed over time. It also shows a new way to combine data from different sources. However, the authors do not give an adequate explanation of fNMs and provide no references that describe the term. This makes the article difficult to understand, especially for readers from outside the field, which is the opposite of how scientific writing should be done.

 

Levin & Mishmar, 2017, The genomic landscape of evolutionary convergence in mammals, birds and reptiles. Nature Ecology & Evolution 1: 0041

 

 

Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals
https://wp.unil.ch/genomeeee/2017/01/18/coregulation-of-tandem-duplicate-genes-slows-evolution-of-subfunctionalization-in-mammals/
Wed, 18 Jan 2017 14:46:50 +0000

Gene duplications are major contributors to genome evolution, but most duplicates are redundant and undergo pseudogenization. Several mechanisms have been proposed to explain how young duplicates survive in the long term and escape degradation. Among these, the dosage-balance model emphasizes the importance of shared expression levels between young duplicate genes. Alternative models propose sub-functionalization (the copies divide the ancestral functions between them) or neo-functionalization (one copy gains a new function) as the main mechanisms by which new duplicates survive. However, how duplicate genes survive in mammals is largely unknown. In this study, using RNA-seq profiles from different human and mouse tissues, the authors show that sub-functionalization is a slowly evolving and rare event. Most young duplicates show decreased expression levels, which provides initial survival and long-term preservation in the genome.

Figure 1 Expression profiles of duplicate genes. Examples of sub- or neo-functionalized (A) and asymmetrically expressed (B) gene pairs are shown. In the sub-functionalized example, SLC4A2 is expressed in lung, kidney, liver and testis, whereas SLC4A3 is expressed in cortex, heart and testis. In the asymmetrically expressed example, CRB1 is expressed at a higher level in all tissues examined.

To understand the process of long-term survival after gene duplication, the authors analyzed RNA-seq data from 46 human tissues (from the Genotype-Tissue Expression project, GTEx) and 26 mouse tissues. A computational pipeline (more than 80% coding sequence similarity and more than 50% average sequence similarity) identified 1444 duplicate gene pairs. Within each pair, the gene with the higher expression level is called the major gene and the one with the lower expression level the minor gene. If each gene of a pair is expressed at least two-fold higher than its partner in at least one tissue, the pair is classified as sub- or neo-functionalized (Figure 1A). If one gene is expressed more highly than its partner in at least one third of the tissues examined, the pair is considered an asymmetrically expressed duplicate (AED), as shown in Figure 1B. Synonymous divergence (ds) was used to estimate divergence time: the human-mouse split corresponds to ds = 0.45 and the origin of placental mammals to ds = 0.7.
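The classification rules above can be written down as a short sketch. It takes one expression value per tissue for each gene and applies the fold-change rules only; reusing the two-fold threshold for the AED rule and checking for sub-/neo-functionalization first are assumptions made here for illustration, and the function name is hypothetical.

```python
def classify_duplicate_pair(expr_a, expr_b, fold=2.0, aed_fraction=1 / 3):
    """Classify a duplicate pair from per-tissue expression dictionaries.

    expr_a, expr_b: dicts mapping tissue name to expression level.
    Returns "sub/neo-functionalized", "AED" or "unclassified".
    """
    tissues = expr_a.keys() & expr_b.keys()
    if not tissues:
        return "unclassified"

    # Count tissues in which one copy exceeds the other by the fold cutoff.
    a_higher = sum(
        1 for t in tissues if expr_a[t] > 0 and expr_a[t] >= fold * expr_b[t]
    )
    b_higher = sum(
        1 for t in tissues if expr_b[t] > 0 and expr_b[t] >= fold * expr_a[t]
    )

    # Each copy dominates somewhere: divergent expression patterns.
    if a_higher >= 1 and b_higher >= 1:
        return "sub/neo-functionalized"
    # One copy dominates in a large share of tissues, the other in none.
    if max(a_higher, b_higher) >= aed_fraction * len(tissues):
        return "AED"
    return "unclassified"
```

In this sketch the "major" gene of an AED is simply whichever copy has the higher count.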

Figure 2 Sub-functionalized or neo-functionalized genes dating back before the emergence of placental mammals.

Some gene pairs (mostly with ds > 0.7) are neo- or sub-functionalized, yet there are very few examples of neo- or sub-functionalization among recent duplication events (Figure 2A-C). In addition, sub-functionalized genes are expected to be under stronger selective constraint than non-divergent genes, and a Kolmogorov-Smirnov test showed that sub-functionalized genes have a high fraction of rare variants (Figure 2D). Since functionalization gives distinct functions to the genes of a pair, the authors examined whether the genes are associated with disease. There is indeed a correlation: both minor-gene-specific disease and minor-gene-associated disease increase when a pair is sub-functionalized (Figure 2E).

Among duplicates that arose within placental mammals, most pairs are AEDs rather than sub-functionalized, and within AEDs very few minor genes are associated with disease, in contrast to what is shown in Figure 2E. All these results indicate that sub-functionalization evolves slowly. However, duplicates on different chromosomes have higher rates of neo- or sub-functionalization than duplicates in tandem arrays. This raises the question of whether separation of the duplicates facilitates sub-functionalization.

Figure 3 Genomic location of the duplicates and expression correlation. Most young duplicates are located on the same chromosome and close to each other, whereas older duplicates tend to be located on different chromosomes. The closer the duplicates are on the chromosome (in both human and mouse), the higher their expression correlation.

Supporting this idea, the authors report that 87% of young gene pairs with ds < 0.1 are found in tandem arrays on the same chromosome (Figure 3A). The remaining duplicates, found on different chromosomes, were most likely separated by chromosomal rearrangements, and they have diverged expression patterns as a result of the genomic separation (Figure 3B). The greater the genomic distance between duplicates, the lower their expression correlation. Notably, duplicates in mouse show a similar relationship, indicating that the negative relation between genomic distance and expression correlation is not human specific (Figure 3C). These data support previous findings on the coregulation of closely located genes in the genome, and this is shown once again in Figure 3D, where neighboring duplicates have a higher expression correlation than duplicates on different chromosomes and than singletons. In addition, whole-genome chromosome conformation capture (Hi-C) shows that neighboring duplicates have higher connectivity and more promoter-promoter links than neighboring singletons (Figure 3D).

So far, the results show that expression sub-functionalization evolves slowly and that duplicates in tandem arrays are mostly coregulated. As an alternative explanation, if dosage sharing is crucial for the preservation of newborn duplicates, the duplicates should show shared, lower expression. To test this hypothesis, the authors investigated duplicates that arose in the human lineage since the human-macaque split, using RNA-seq data from six tissues. The summed expression of the human major and minor duplicates corresponds to the expression level of the macaque singleton ortholog (Figure 4A). These data indicate that dosage sharing evolves quickly and contributes to the preservation of duplicates in the genome.
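The dosage-sharing prediction can be expressed as a simple per-tissue ratio. The sketch below assumes normalized expression values that are comparable between species, which is a strong simplification of a real cross-species comparison, and the function name is hypothetical.

```python
def dosage_sharing_ratios(major, minor, singleton):
    """Per-tissue ratio of summed duplicate expression to the ortholog.

    major, minor: expression of the two duplicates, keyed by tissue.
    singleton:    expression of the single-copy ortholog in the outgroup.
    Tissues with no ortholog expression or missing values are skipped.
    """
    ratios = {}
    for tissue, reference in singleton.items():
        if reference > 0 and tissue in major and tissue in minor:
            ratios[tissue] = (major[tissue] + minor[tissue]) / reference
    return ratios
```

Ratios close to 1 are what dosage sharing predicts, whereas ratios close to 2 would suggest that each copy has kept the ancestral expression level.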

Figure 4 Dosage sharing and multi-step model of how duplicate genes are preserved. The summed expression of young human duplicates is similar to the expression of the macaque ortholog.

Overall, this study explains how duplicated genes are preserved with a multi-step model (Figure 4C). According to the model, after a duplication event the expression dosage is shared between the two duplicates, as has also been suggested for whole-genome duplications. During this step there is tight competition between dosage sharing and mutational degradation of one of the duplicates. Afterwards, the minor gene of an asymmetrically expressed pair can be lost slowly under reduced constraint. In an alternative long-term scenario, chromosomal rearrangements break the coregulation of the tandem duplicates and allow divergent expression patterns and/or protein adaptation, which lead to long-term survival of the duplicated genes. To sum up, this study shows that rapid dosage sharing is a fundamental first step after the duplication of a gene, and that it can be followed by slowly evolving sub-functionalization of the duplicates.

References

Lan, X., & Pritchard, J. K. (2016, May 20). Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals. Science, 352(6288), 1009–1013. http://doi.org/10.1126/science.aad8411

Peppered moth melanism mutation is a transposable element
https://wp.unil.ch/genomeeee/2016/12/16/peppered-moth-melanism-mutation-is-a-transposable-element/
Fri, 16 Dec 2016 10:47:58 +0000

One of the best-known examples of natural selection in action is the evolution of the peppered moth (Biston betularia): the rapid replacement of the light-colored form of the moth (typica) by a dark-colored form (carbonaria) (Fig. 1) during the 1800s in Britain. The first live specimen of the carbonaria form was found in 1848, and its frequency increased drastically until the late 1800s. By 1895, 98% of the moth population in Manchester was of the carbonaria form (reviewed in Clarke et al., 1985). Occurring 36 years after the publication of Darwin’s On the Origin of Species, this phenomenon attracted biologists’ attention. J.W. Tutt first proposed the “differential bird predation hypothesis” in 1896, which was confirmed by a series of experiments by Kettlewell during the mid-1950s (reviewed in Cook and Saccheri, 2013). The hypothesis states that the industrial revolution in Britain blackened trees with soot, so that birds could easily spot light-colored moths on soot-darkened trees while dark-colored moths were camouflaged. However, the genetic events giving rise to the carbonaria phenotype remained elusive until recently. Researchers from the University of Liverpool and the Wellcome Trust Sanger Institute have now reported in Nature that the mutation causing industrial melanism in the peppered moth is the insertion of a large, tandemly repeated transposable element into the first intron of the gene cortex.

Figure 1. The dark-colored form, carbonaria (top) and the light-colored form, typica (bottom) of Biston betularia

The term industrial melanism refers to the darkening of species in response to pollutants. It is widespread in many Lepidoptera species (moths and butterflies). Initial experiments showed that melanism in Biston betularia is determined by a dominant allele at a single locus (reviewed in Cook and Saccheri, 2013). However, the molecular identity of the gene determining melanism in peppered moths was completely unknown. To identify the gene, van’t Hof and Saccheri used a candidate gene approach, looking for associations between the carbonaria morph and genetic polymorphisms within sixteen genes previously implicated in melanisation pattern differences in other insects (van’t Hof and Saccheri, 2010). However, this earlier study showed that the carbonaria gene is not a structural variant of a canonical melanisation pathway gene. One year after the failure of the candidate gene approach, the Saccheri group constructed a linkage map to identify the chromosomal region containing the carbonaria-typica polymorphism. In 2011, they coarsely localized the carbonaria locus to a <400-kilobase region orthologous to a segment of silkworm (Bombyx mori) chromosome 17 (van’t Hof et al., 2011). However, what the gene is and what it does was still a mystery.

The same group has now reported that they have found the gene and the mutation event causing industrial melanism in Biston betularia (van’t Hof et al., 2016). By using a larger population sample and more closely spaced genetic markers, they narrowed the carbonaria candidate region down to a ~100-kb region of the Biston betularia genome. The candidate region corresponds to the orthologue of the Drosophila cortex (cort) gene. A distant member of the Cdc20 protein family, Drosophila cort encodes a cell-cycle regulator that is important in regulating oocyte meiosis (Chu et al., 2001), but it is not known to be involved in wing patterning or development. Unlike Drosophila cort, Biston betularia cortex has multiple alternative first exons, two of which (1A and 1B) are strongly expressed in developing wing discs. In addition, the cortex gene has a very large first intron and eight non-first exons.

After identifying the gene, the authors compared one carbonaria haplotype to three typica haplotypes to identify a first set of carbonaria-specific polymorphisms. This initial alignment revealed 87 melanisation candidate polymorphisms concentrated within the large first intron of the gene. However, natural selection increases not only the frequency of the favored allele in carbonaria but also the frequency of neutral alleles linked to the causal allele. In an earlier study, they had also shown that Biston betularia melanism originated from a single recent mutation (van’t Hof et al., 2011). By screening more typica individuals, they eliminated rare variants and were eventually able to find one polymorphism unique to carbonaria: a very large insert in the first intron of the gene.

The causative insert is 21,925 nucleotides long and is composed of a roughly 9-kb, essentially non-repetitive sequence that is tandemly repeated two and one-third times. The nature of the insert indicated that it is a class II transposable element (TE), that is, a DNA transposon. The transposition of class II TEs is catalyzed by transposases that cut the DNA at the target site in a staggered fashion, producing short single-stranded overhangs that are filled in after insertion and leave a duplicated target site on either side of the element. Another hallmark of class II TEs is short inverted repeats at the two ends of the TE. Sequence analysis of the insert and comparison with the typica haplotypes revealed that both short inverted repeats (6 bp) and duplication of the target site (4 bp) are present in the carbonaria insert (Fig. 2).

Figure 2. The structure of the insert, shown in the carbonaria sequence, corresponds to a class II DNA transposon, with direct repeats resulting from target site duplication (black nucleotides) next to inverted repeats (red nucleotides). Typica haplotypes (lower sequence) lack the 4-base target site duplication, the inverted repeats and the core insert sequence. The transposon consists of ~9 kb tandemly repeated two and one-third times (repeat unit (RU)1–RU3), with three short tandem subrepeat units (green dots, SRU1–SRU9) within each repeat unit.

To estimate the age of the mutation event, the authors looked at 200 kb on either side of the carb-TE insert. The idea is to track the recombination events that have eroded the ancestral carbonaria haplotype. Given the ancestral state of the carbonaria haplotype and the recombination rate, how many years are needed to explain the observed haplotypes, which are shuffled versions of the ancestral one? Simulations based on this assumption predicted the most likely date of the mutation as 1819, shortly before it was first seen in the wild (1848) (Fig. 3).
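
The logic of dating by haplotype erosion can be illustrated with a much simpler calculation than the simulations used in the paper. The sketch below assumes that a haplotype spanning a region with per-generation recombination probability r stays intact for g generations with probability (1 - r)^g, and simply inverts that relation. The function name and the example numbers are hypothetical and are not taken from van’t Hof et al. (2016).

```python
import math


def generations_since_origin(intact_fraction, recomb_prob):
    """Rough number of generations since a mutation arose.

    intact_fraction: share of sampled haplotypes that still carry the
        full ancestral haplotype around the mutation (hypothetical input).
    recomb_prob: probability per generation that a recombination event
        breaks the region (hypothetical input).

    Assumes P(intact after g generations) = (1 - recomb_prob) ** g.
    """
    if not 0.0 < intact_fraction <= 1.0:
        raise ValueError("intact_fraction must be in (0, 1]")
    if not 0.0 < recomb_prob < 1.0:
        raise ValueError("recomb_prob must be in (0, 1)")
    return math.log(intact_fraction) / math.log(1.0 - recomb_prob)


# Hypothetical example: 5% of haplotypes intact, 2% break probability.
print(round(generations_since_origin(0.05, 0.02)))
```

If one assumes one moth generation per year, the number of generations translates directly into years before the sampling date. The real analysis models the full recombination pattern rather than a single intact/broken summary.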

Figure 3. Probability density for the age of the carb-TE mutation inferred from the recombination pattern in the carbonaria haplotypes (maximum density at 1819 shown by dotted line; first record of carbonaria in 1848 shown by dashed line).

The next question is how the carb-TE leads to the melanisation of Biston betularia. TEs located in introns can affect the expression of a gene through several mechanisms. To test this possibility, the authors first checked the tissue-specific expression of cortex splice isoforms and alternative first exons. They identified two first exons, 1A and 1B, which are highly expressed in developing wing discs. Comparison of the abundance of 1A- and 1B-initiated full transcripts between different genotypes (homozygous carbonaria, c/c; homozygous typica, t/t; and heterozygous individuals, c/t) revealed that 1B expression is significantly higher in the carbonaria background (c/c > c/t > t/t) (Fig. 4), whereas the 1A-initiated full transcript does not differ significantly between genotypes. In addition, the cumulative expression of all splice isoforms increases from the sixth larval instar (La6) until day 6 prepupa (Cr6), with the highest value on day 4 prepupa (Cr4). Notably, a phase of rapid wing disc morphogenesis occurs in the same time interval, possibly indicating a function of cortex in wing pattern melanisation.

Figure 4. Tukey plot for relative expression of cortex 1B full transcript in developing wings of the three carbonaria-locus genotypes (c/c, c/t and t/t) produced within the progeny of a c/t x c/t cross. Genotypes differ significantly for the transcript (P < 0.001)

As mentioned earlier, Drosophila cort encodes a distant member of the Cdc20 protein family (Chu et al., 2001). Members of the Cdc20 protein family activate an “E3” ubiquitin ligase, the anaphase-promoting complex (APC), and present its substrates. The APC then ubiquitinates the presented cell-cycle proteins, causing their degradation. This proteolysis destroys a panel of proteins including cyclins, allowing the cell cycle to progress. Degrons, short linear motifs located anywhere in the protein, are important for substrate recognition in proteolysis. A single site shared by lepidopteran and non-lepidopteran cortex, binding the same degron sequence, is predicted for both the 1A and 1B full isoforms, indicating a shared function of cortex between D. melanogaster and B. betularia. However, further evidence is needed to understand the exact connection between cell-cycle protein degradation and melanisation.

In conclusion, we now know that the industrial melanism mutation in the British peppered moth is the insertion of a large, tandemly repeated transposable element into the first intron of the gene cortex. Although we still do not know the molecular mechanisms connecting the cortex gene and melanism in peppered moths, the discovery that the causative mutation is a transposable element is a breakthrough in the peppered moth story. In addition, it provides spectacular evidence for the importance of transposable elements in adaptive evolution.

References:

Chu, T., Henrion, G., Haegeli, V., and Strickland, S. (2001). Cortex, a Drosophila gene required to complete oocyte meiosis, is a member of the Cdc20/fizzy protein family. Genesis 29, 141–152.

Clarke, C.A., Mani, G.S., and Wynne, G. (1985). Evolution in reverse: clean air and the peppered moth. Biol. J. Linn. Soc. 26, 189–199.

Cook, L.M., and Saccheri, I.J. (2013). The peppered moth and industrial melanism: evolution of a natural selection case study. Heredity (Edinb). 110, 207–212.

van’t Hof, A.E., Edmonds, N., Dalíková, M., Marec, F., and Saccheri, I.J. (2011). Industrial melanism in British peppered moths has a singular and recent mutational origin. Science 332, 958–960.

van’t Hof, A.E., Campagne, P., Rigden, D.J., Yung, C.J., Lingley, J., Quail, M.A., Hall, N., Darby, A.C., and Saccheri, I.J. (2016). The industrial melanism mutation in British peppered moths is a transposable element. Nature 534, 102–105.

van’t Hof, A.E., and Saccheri, I.J. (2010). Industrial melanism in the peppered moth is not associated with genetic variation in canonical melanisation gene candidates. PLoS One 5.

]]>
ExAC presents a catalogue of human protein-coding genetic variation https://wp.unil.ch/genomeeee/2016/12/08/exac-presents-a-catalogue-of-human-protein-coding-genetic-variation/ https://wp.unil.ch/genomeeee/2016/12/08/exac-presents-a-catalogue-of-human-protein-coding-genetic-variation/#comments Thu, 08 Dec 2016 20:14:13 +0000 http://wp.unil.ch/genomeeee/?p=720 ResearchBlogging.org

Exploring the variability of human genomes is a key step towards the holy grail of human genetics: linking genotypes with phenotypes. It also provides insights into human evolution and history. The Exome Aggregation Consortium (ExAC) was founded for this purpose: to capture the variability of human exomes using next-generation sequencing. The first ExAC dataset of 63,358 individuals was released on 20 October 2014. Recently, a paper describing an updated version of the dataset was published: Analysis of protein-coding genetic variation in 60,706 humans.

The authors did great work on the reproducibility of their downstream analyses and, more generally, on the availability of the data. All the code is well documented in a blog post and available in a GitHub repository. I plotted all the figures in this blog post myself!

Dataset

ExAC comprises almost tenfold more individuals than previous datasets of a similar kind (Fig 1a). 91,000 individuals were sequenced, of which 60,706 were kept after quality filtering. The Finnish population was excluded from the European group because of the bottleneck it has gone through.

ExAC targeted individuals with various genetic backgrounds. Principal component analysis showed a very strong geographical pattern in the dataset (Fig 1b). I expected a continuum of haplotypes wherever there is no strong geographic obstacle (like the European-Latino continuum). The gap between the South Asian samples and the European samples on the PCA plot is most likely caused by the absence of samples from the Middle East. Middle Eastern samples have a colour in the legend, but no data points. Central Asians do not even have a colour.

Figure 1: Size and diversity of the ExAC dataset. a, The ExAC dataset is almost tenfold bigger than datasets of a similar kind, the 1000 Genomes Project and the Exome Sequencing Project (ESP), but more importantly, it captures a far greater diversity of human populations than ESP and 1000 Genomes. b, The geographic signal of populations visualized using principal component analysis (PCA). The first principal component captures the variability of the African samples and does not tell much about the rest of the dataset (Extended Data Figure 5 in the paper), therefore the second and third principal components are shown.

ExAC contains 45 million nucleotide positions with sufficient coverage (>10x in at least 80% of individuals). These positions correspond to 18 million theoretically possible synonymous variants, of which ExAC captures 1.4 million (7.5%).

The size of ExAC makes it possible to observe…

…mutational recurrence: 43% of synonymous de novo variants identified in previous studies were also identified in ExAC, which is the first direct evidence of mutational recurrence.

…multiple alleles: 7.9% of high-quality polymorphic sites are multiallelic, which is fairly close to the Poisson expectation (whatever that means…)

…a LOT of variants: after all the filtering, 7,404,909 high-quality variants were identified, of which 317,381 are indels. On average, there is one variant every eight bases. 99% of the variants have a frequency below 1%, and 54% of the variants are singletons (i.e. only one individual carries the variant).

…selection effects: The proportion of singletons among polymorphisms can serve as a measure of the purifying selection acting on polymorphisms of a given size. Figure 2 shows that indels that do not affect the open reading frame (ORF) have significantly fewer singleton variants than indels that do affect the ORF. There is also a significant difference between ORF-affecting indels of different sizes, but we (our topic group) have not found any possible explanation for this pattern.

…saturation of alleles in CpG sites: CpG sites have a very high rate of transitions, so capturing all possible variants is substantially easier than for other sites. A subset of 20,000 individuals from the ExAC dataset already shows saturation of alleles: all possible non-lethal synonymous CpG transition variants are present. ExAC is the first dataset showing a saturation of human variation.
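
The singleton proportion used above as a proxy for purifying selection is simple to compute from allele counts. The following is a minimal sketch; the function name and the toy allele counts are hypothetical, not ExAC data.

```python
def singleton_proportion(allele_counts):
    """Share of polymorphic variants seen exactly once in the sample.

    allele_counts: iterable of alternate-allele counts, one per variant.
    Variants with a count of 0 are not polymorphic in the sample and
    are ignored.
    """
    counts = [c for c in allele_counts if c > 0]
    if not counts:
        raise ValueError("no polymorphic variants in input")
    singletons = sum(1 for c in counts if c == 1)
    return singletons / len(counts)


# Toy comparison of two hypothetical variant classes.
frameshift_counts = [1, 1, 1, 2]
in_frame_counts = [1, 3, 7, 12]
print(singleton_proportion(frameshift_counts))
print(singleton_proportion(in_frame_counts))
```

A class under stronger purifying selection is expected to show a higher value, because deleterious variants rarely rise above a single copy.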

Figure 2: Indel frequencies with respect to size. a, Deletions are more frequent than insertions, and smaller indels are more common than larger ones. After accounting for the higher frequency of smaller indels, indels that do not shift the open reading frame are slightly more frequent than indels that do. b, The proportion of singletons among all indels (a proxy for the strength of selection) is significantly and consistently lower for indels that do not shift the open reading frame (-6, -3, +3, +6).

Deleterious alleles

The authors introduce the mutability-adjusted proportion of singletons (MAPS) metric as a measure of selection. This metric corrects for biases caused by different mutation rates, allowing comparisons between categories with different mutability. A comparison across functional classes is shown in Figure 3. MAPS shows higher values for categories predicted to be deleterious by conservation-based methods.
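
To make the idea of a mutability adjustment concrete, here is a rough sketch of how such a metric can be computed. It assumes that the expected singleton proportion is predicted from mutability by a weighted linear fit on synonymous variants, and that the score of a class is the observed minus the expected number of singletons, divided by the number of variants. The function names, the rescaled mutability values and the toy data are hypothetical; the authors' published code is the authoritative implementation.

```python
def fit_weighted_line(xs, ys, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def maps_score(variants, intercept, slope):
    """Observed minus expected singletons, per variant.

    variants: list of (mutability, is_singleton) pairs for one class.
    """
    observed = sum(1 for _, is_singleton in variants if is_singleton)
    expected = sum(intercept + slope * mu for mu, _ in variants)
    return (observed - expected) / len(variants)


# Hypothetical synonymous calibration data: one point per sequence
# context (rescaled mutability, singleton proportion, variant count).
intercept, slope = fit_weighted_line(
    [1.0, 2.0, 3.0], [0.6, 0.5, 0.4], [100, 100, 100]
)

# Hypothetical class of four variants, three of them singletons.
toy_class = [(1.0, True), (1.0, True), (3.0, False), (3.0, True)]
print(maps_score(toy_class, intercept, slope))
```

By construction, synonymous variants score close to zero, and a positive score means more singletons than mutability alone would predict.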

Figure 3: MAPS values of different functional classes. MAPS is highest for nonsense substitutions, and it is also consistent with the PolyPhen and Combined Annotation Dependent Depletion (CADD) classifications.

Rare diseases

The average ExAC individual carries ~54 variants reported as causing Mendelian disease. Approximately 41 of these alleles were found at a frequency greater than 1%, so they are not expected to result from problems in variant calling, but rather from misclassification of variants in the databases. The evidence for 192 previously reported variants was manually curated; of those, only 9 had sufficient evidence of disease association. High allele frequencies were found mainly in previously underrepresented populations, Latino and South Asian.

ExAC has shown the importance of a matching reference population when identifying disease-causing variants. An example is the recessive disease North American Indian childhood cirrhosis, previously reported to be caused by CIRH1A p.R565W. This variant was found in the homozygous state in four individuals from the Latino population, none of whom had a record of liver problems during childhood.

Conclusion

ExAC shows the importance of the diversity of the sampled population in capturing the real link between genotype and phenotype. Even though ExAC provides a lot of new insights, there are still populations that are underrepresented or not represented at all.

Given the richness of ExAC and the effort of authors in data sharing and availability, I guess that it will be a great resource for various analyses in the future for a lot of researchers around the globe.

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, & Exome Aggregation Consortium. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536 (7616), 285-91 PMID: 27535533

]]> https://wp.unil.ch/genomeeee/2016/12/08/exac-presents-a-catalogue-of-human-protein-coding-genetic-variation/feed/ 1 Supergenes and social organization in a bird species https://wp.unil.ch/genomeeee/2016/05/06/supergenes-and-social-organization-in-a-bird-species/ Fri, 06 May 2016 16:52:51 +0000 http://wp.unil.ch/genomeeee/?p=681 ResearchBlogging.org

Cindy Dupuis, Xinji Li, Casper van der Kooi

The development of new molecular methods and next-generation sequencing techniques has advanced our knowledge of the genetic basis underlying phenotypic polymorphism. Over the course of recent years, scientific studies have documented large genomic regions with drastic phenotypic effects, the so-called supergenes. A supergene is a set of genes on the same chromosome that exhibit close genetic linkage and are thus inherited as one unit.

The evolution of a supergene requires that multiple loci with complementary effects become linked (i.e. they are genetically clustered and recombination between the loci is suppressed) and that optimal alleles at the linked loci are combined. Genetic clustering of different loci can occur when, via mutation, an adaptive interaction between two closely placed loci is created. In addition, gene duplications or translocations that generate a series of (novel) complementary genes can give rise to supergenes. The probability of a recombination event occurring between loci depends on various factors. It will be small when the loci are located close together, as the chance of a recombination event between two loci generally increases with the physical distance between them. Given the large size of supergenes, additional mechanisms nonetheless seem important. Suppressed recombination can, for instance, be maintained via structural differences, such as inversions, between the supergene and its homologous chromosomal region.

An interesting example of a supergene in an invertebrate is the case documented by Purcell et al. (2014). They described a large, nonrecombining region that is associated with social organisation in an ant species. The nonrecombining region was found to make up most of one chromosome and was hence aptly called the ‘social chromosome’. They found a structurally similar region with similar effects in another ant species; however, the regions exhibit no homology, suggesting parallel evolution of the social chromosome. Examples of vertebrate social systems determined by supergenes were, to our knowledge, unknown.

Two recent articles (Küpper et al., 2016; Lamichhaney et al., 2016) revealed a single supergene controlling alternative male mating tactics in the ruff (Philomachus pugnax). The studies were carried out independently by two research groups, but reach almost the same conclusions. The ruff is a lekking wader known for its great diversity in male plumage color and its behavioral polymorphism. Three types of males can be distinguished; these types are characterized by differences in territoriality and behavior that are highly correlated with differences in nuptial plumage and body size. Predominantly dark-colored Independent males are the most common (80-95% of males); these males defend small territories on a lek. Smaller, lighter colored Satellite males (5-20%) are non-territorial and less faithful to a particular lek. Satellite males make use of the residences of Independent males, who largely tolerate them. The third type are the Faeder males, which are very rare (<1% of males). Faeder males lack male display, are small and resemble the unornamented females; however, they have disproportionately large testes.

Previous studies using pedigrees of large, captive populations showed that the reproductive polymorphism follows a single-locus autosomal pattern of inheritance (Lank et al., 1995; Lank et al., 2013). The dominant Faeder allele controls development into Faeder males, whereas the Satellite allele (which is dominant to Independent) controls development into Satellite or Independent males. Ekblom et al. (2012) studied nucleotide sequence variation and gene expression in ornamental feathers from 5 Independent and 6 Satellite males using transcriptome sequencing. No significant expression divergence of pre-identified coloration candidate genes was found, but many genetic markers showed nucleotide differentiation between the two morphs. Later, Farrell et al. (2013) used linkage analysis and comparative mapping to locate the Faeder locus, and found linkage to microsatellite markers on avian chromosome 11, in a region that includes the Melanocortin-1 receptor (MC1R) gene. MC1R is a strong candidate for alternative male morph determination because it is considered important in plumage coloration.

Using the captive population that was previously phenotyped, Küpper et al. set out to determine the genomic structure of the existing morph divergence in P. pugnax. The first step in their analysis was to generate and annotate the full genome of one Independent male. Next, the authors identified SNPs in the population using RAD sequencing. More than one million SNPs could be distinguished, and Faeder and Satellite could be placed on a genetic map based on 3,948 SNPs. Interestingly, both morphs mapped to the same region on chromosome 11, but exhibited clear structural differences. This was corroborated by a GWAS analysis of 41 unrelated Satellite, Independent and Faeder males from a natural population.

To characterize the genomic region more precisely, they carried out whole-genome sequencing of a small set of Independent, Satellite and Faeder males. They showed that the region on chromosome 11 was highly differentiated between Satellite and Faeder morphs and that this region contained greater nucleotide variation than the adjacent regions. Using read orientation, they found clear evidence for an inversion of the chromosomal region between the different morphs. Interestingly, they found that one breakpoint occurs within an essential gene, CENPN (encoding centromere protein N, recessive lethal), which implies that individuals homozygous for the inversion are not viable, an observation that is confirmed by breeding experiments. The authors also suggested that a recombination event or gene conversion occurred between the Satellite and Independent alleles.

By comparing gene sequences among morphs, the authors discovered that 78% of the gene sequences differed between morphs, and that those differences had the potential to change the encoded protein. Among the divergent genes, some were found to be involved in hormone production, like HSD17B2, which encodes an enzyme that inactivates testosterone and estradiol. Because it varies between morphs, this enzyme may alter steroid metabolism and partly explain why plumage patterns and behavior differ between morphs. The MC1R gene was also found within the altered genomic region. This gene is considered an important locus controlling color polymorphism, and could be the source of the reduced melanin levels in Satellites. The PLCG2 gene, which has been rearranged in Faeders, was found to be a candidate gene for the rather feminine appearance and non-aggressive behavior of Faeders. Presumably, this gene is part of a cascade leading to the development of the impressive plumage typical of the other male morphs.

In the second article, Lamichhaney et al. (2016) studied a natural ruff population using whole-genome sequencing. They first established a high-quality reference genome assembly from an Independent male and conducted functional annotation based on both evidence data and de novo gene predictions. Then, whole-genome resequencing and SNP calling were performed for 15 Independent, 9 Satellite and 1 Faeder males. Their genome-wide screen for genetic divergence (FST) between male morphs identified a 4.5-Mb region, based on which Independents and Satellites could be phylogenetically clustered as distinct groups. Screening for structural variants identified a 4.5-Mb inversion in Satellites that perfectly overlapped with the differentiated region. In addition, PCR-based sequencing confirmed the positions of the proximal and distal breakpoints and identified a 2,108-bp insertion of a repetitive sequence at the distal breakpoint. Diagnostic tests showed that Satellite males were heterozygous (S/I), while most Independent males were homozygous (I/I). They suggested that the Independent allele represents the ancestral state, which is consistent with the conserved synteny among birds.
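
A genome-wide FST screen of this kind can be sketched as follows. The example assumes biallelic SNPs, two equally weighted groups and the simple heterozygosity-based definition FST = (HT - HS) / HT, combined within a window as a ratio of sums. It is not necessarily the estimator used by Lamichhaney et al., and the function names and allele frequencies are hypothetical.

```python
def snp_heterozygosities(p1, p2):
    """Return (HS, HT) for one biallelic SNP.

    p1, p2: frequency of the same allele in group 1 and group 2.
    HS is the mean within-group heterozygosity, HT the heterozygosity
    of the pooled (equally weighted) groups.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_mean = (p1 + p2) / 2
    ht = 2 * p_mean * (1 - p_mean)
    return hs, ht


def window_fst(freq_pairs):
    """FST for a window of SNPs, computed as a ratio of sums."""
    hs_sum = ht_sum = 0.0
    for p1, p2 in freq_pairs:
        hs, ht = snp_heterozygosities(p1, p2)
        hs_sum += hs
        ht_sum += ht
    if ht_sum == 0:
        raise ValueError("no variable SNPs in window")
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical windows: one inside and one outside a differentiated region.
inside = [(0.5, 0.0), (0.5, 0.0), (0.45, 0.05)]
outside = [(0.30, 0.28), (0.60, 0.62), (0.10, 0.12)]
print(window_fst(inside), window_fst(outside))
```

Sliding such a window along each chromosome and plotting the values is what makes a sharply bounded region, like the 4.5-Mb block described above, stand out from the genomic background.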

The comparison between Faeder and Independent males showed that the genetic differentiation was equally strong across the same region, creating a mirror image of the differentiation pattern between Satellites and Independents. Accordingly, the region could be subdivided into two parts: region A, where the Satellite and Faeder chromosomes were closely related to each other and less closely related to Independent, and region B, where the Satellite and Independent alleles were more closely related and divergent from Faeder. Since an inversion is expected to reduce recombination within the region between the wild-type (I) and mutant alleles (either S or F), the disruption of the differentiation pattern might be the result of one or two recombination events between an Independent and a Faeder-like chromosome. The divergence time between the Independent allele and the Satellite or Faeder alleles was estimated to be approximately 4 million years, using the nucleotide divergence and estimated mutation rates for birds. The last recombination event was estimated to have occurred 520,000 ± 20,000 years ago.
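
The step from nucleotide divergence to a divergence time rests on a simple molecular-clock relation: two lineages each accumulate substitutions at rate μ, so a pairwise divergence d corresponds to T = d / (2μ). The sketch below uses hypothetical input values chosen only for illustration; they are not the figures used in the paper.

```python
def divergence_time_years(divergence_per_site, mutation_rate_per_site_per_year):
    """Molecular-clock estimate T = d / (2 * mu).

    divergence_per_site: average pairwise nucleotide divergence d
        between the two alleles (hypothetical input).
    mutation_rate_per_site_per_year: substitution rate mu along one
        lineage (hypothetical input).
    The factor 2 accounts for changes accumulating on both lineages.
    """
    if mutation_rate_per_site_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return divergence_per_site / (2 * mutation_rate_per_site_per_year)


# Hypothetical inputs: 1% divergence, 1e-9 substitutions/site/year.
print(divergence_time_years(0.01, 1e-9))
```

The estimate scales inversely with the assumed mutation rate, so the uncertainty in bird mutation rates carries over directly to the date.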

To better understand the genetic consequences of the inversion and relate it to the phenotypic variation in male ruffs, the authors searched for candidate mutations among the genes in the inverted region. Mutations in several genes with important functions were found on the Satellite and Faeder chromosomes, including the above-mentioned CENPN, HSD17B2 and MC1R genes as well as SDR42E1 (the latter is important for the metabolism of sex hormones). Missense mutations in derived MC1R were found to be associated with the Satellite and Faeder alleles, hinting at a potential mechanism explaining the male plumage polymorphism during the breeding season.

In conclusion, these two studies demonstrated the presence of a genomic inversion that led to the evolution of a supergene. This supergene determines the complex phenotypic variation in male ruffs. The two papers contribute to our understanding of supergenes, complex phenotypes and social organization.

Küpper C, Stocks M, Risse JE, Dos Remedios N, Farrell LL, McRae SB, Morgan TC, Karlionova N, Pinchuk P, Verkuil YI, Kitaysky AS, Wingfield JC, Piersma T, Zeng K, Slate J, Blaxter M, Lank DB, & Burke T (2016). A supergene determines highly divergent male reproductive morphs in the ruff. Nature genetics, 48 (1), 79-83 PMID: 26569125

]]>
Evolution of Darwin’s finches and their beaks revealed by genome sequencing https://wp.unil.ch/genomeeee/2015/05/28/evolution-of-darwins-finches-and-their-beaks-revealed-by-genome-sequencing/ Thu, 28 May 2015 00:47:46 +0000 http://wp.unil.ch/genomeeee/?p=602 ResearchBlogging.org

Introduction

Darwin’s finches from the Galapagos and Cocos Island are a classic example of a young adaptive radiation, one that is entirely intact because none of the species has become extinct as a result of human activity. They have diversified in beak size and shape, feeding habits and diet while adapting to different food resources. The traditional taxonomy of Darwin’s finches is based on morphology and has been largely supported by observations of breeding birds. In this paper, the authors present the results of whole-genome re-sequencing of 120 individuals representing all of the Darwin’s finch species inhabiting the Galapagos archipelago (Fig. 1a) and two close relatives, in order to analyse patterns of intra- and interspecific genome diversity and the phylogenetic relationships among the species.

Figure 1a. Sample location of Darwin’s finches

Summary and comments of the paper

The authors analyzed the location and phylogeny of Darwin’s finches and found widespread evidence of interspecific gene flow that may have enhanced evolutionary diversification throughout the phylogeny. They also reported the discovery of a locus with a major effect on beak shape. They generated 10x sequence coverage per individual bird using 2×100 base-pair (bp) paired-end reads and found evidence of introgression from three sources: ABBA-BABA tests, discrepancies between phylogenetic trees based on autosomal and sex-linked loci, and mtDNA. Extensive sharing of genetic variation among populations was evident, particularly among ground and tree finches, with almost no fixed differences between species in each group. Their maximum-likelihood phylogenetic tree based on autosomal genome sequences is generally consistent with current taxonomy but shows several interesting deviations (Fig. 1b).
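
For readers unfamiliar with the ABBA-BABA test mentioned above, the statistic behind it (Patterson's D) can be sketched in a few lines. The example assumes four taxa in the order (P1, P2, P3, outgroup), one allele per taxon per site, and the outgroup allele taken as ancestral. The function name and the toy sites are hypothetical; a real analysis works on allele frequencies and assesses significance with a block jackknife, which is omitted here.

```python
def d_statistic(sites):
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA).

    sites: iterable of (p1, p2, p3, outgroup) alleles, e.g. ("A", "G", "G", "A").
    ABBA: P2 and P3 share the derived allele, P1 is ancestral.
    BABA: P1 and P3 share the derived allele, P2 is ancestral.
    Sites with any other pattern are ignored.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # keep biallelic sites only
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


toy_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("A", "G", "G", "A"),  # ABBA
    ("A", "G", "G", "A"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("A", "A", "G", "A"),  # uninformative
]
print(d_statistic(toy_sites))
```

Without gene flow the two patterns are expected to be equally frequent, so D is close to zero; an excess of ABBA sites points to gene flow between P2 and P3, and an excess of BABA sites to gene flow between P1 and P3.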

Figure 1b. Phylogeny of Darwin’s finches

The revised and dated phylogeny of Darwin’s finches shows that the adaptive radiation took place in the past million years, with a rapid accumulation of species recently. Genomic characterization of the entire radiation revealed a striking connection between past and present evolution. Evidence of introgressive hybridization is found throughout the radiation, and hybridization has given rise to species of mixed ancestry, which the paper describes in detail (species and locations). The most obvious morphological difference among Darwin’s finches concerns beak shape. The authors performed a genome-wide scan on populations that are closely related but show different beak morphology. They found a polygenic basis for beak diversity, discovering 15 regions with strong genetic differentiation between groups of finches with blunt or pointed beaks. Their analysis revealed that ALX homeobox 1 (ALX1) is an excellent candidate for variation in beak morphology, because it encodes a paired-type homeodomain protein (a transcription factor) that plays a crucial role in the development of structures derived from craniofacial mesenchyme, the first branchial arch and the limb bud, and influences the migration of cranial neural crest cells, which is highly relevant to beak development. They examined single nucleotide polymorphisms (SNPs) in the ALX1 gene of various finch species and concluded that the blunt haplotype has a long evolutionary history because its origin predates the radiation of vegetarian, tree and ground finches. The haplotype might have evolved by accumulating both coding and regulatory changes affecting ALX1 function. Natural selection and introgression affecting this locus have contributed to the diversification of beak shapes among Darwin’s finches and hence to their expanded utilization of food resources on different Galapagos islands.

Lamichhaney, S., Berglund, J., Almén, M., Maqbool, K., Grabherr, M., Martinez-Barrio, A., Promerová, M., Rubin, C., Wang, C., Zamani, N., Grant, B., Grant, P., Webster, M., & Andersson, L. (2015). Evolution of Darwin’s finches and their beaks revealed by genome sequencing Nature, 518 (7539), 371-375 DOI: 10.1038/nature14181

]]>
The genomic substrate for adaptive radiation in African cichlid fish https://wp.unil.ch/genomeeee/2015/05/25/the-genomic-substrate-for-adaptive-radiation-in-african-cichlid-fish-2/ Mon, 25 May 2015 13:41:32 +0000 http://wp.unil.ch/genomeeee/?p=588 In African lakes, cichlid fishes are famous for large, diverse and replicated adaptive radiations. Nearly 1,500 new species of cichlid fish evolved in a few million years when environmentally determined opportunity for sexual selection and ecological niche expansion was met by an evolutionary lineage with unusual potential to adapt, speciate and diversify. The phenotypic diversity encompasses variation in behaviour, body shape, coloration and ecological specialization. The frequent occurrence of convergent evolution of similar ecotypes suggests a primary role of natural selection in shaping cichlid phenotypic diversity.

To identify the ecological and molecular basis of divergent evolution in the cichlid system, Brawand et al. [1] sequenced the genomes and transcriptomes of five lineages of African cichlids: Pundamilia nyererei (endemic to Lake Victoria); Neolamprologus brichardi (endemic to Lake Tanganyika); Metriaclima zebra (endemic to Lake Malawi); Oreochromis niloticus (from rivers across northern Africa); and Astatotilapia burtoni (from rivers connected to Lake Tanganyika). These five lineages diverged primarily through geographical isolation, and three of them subsequently underwent adaptive radiations in the three largest lakes of Africa. The authors comprehensively investigate the features of this massive genomic dataset. Here are some of the interesting findings:

Accelerated gene evolution was assessed using the ratio of non-synonymous to synonymous substitutions (dN/dS). Compared with stickleback, O. niloticus shows significantly higher ratios. In addition, three genes in the BMP pathway, a ligand (bmp4), a receptor (bmpr1b) and an antagonist (nog2), all known to influence cichlid jaw morphology, show accelerated rates of protein evolution in haplochromine cichlids.
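To make the ratio concrete, here is a minimal Python sketch of a dN/dS calculation. It assumes that the numbers of non-synonymous and synonymous differences and sites have already been counted from a codon alignment (that step is not shown), and it applies a Jukes-Cantor correction for multiple substitutions. The function names and the example counts are hypothetical, and this is not necessarily the estimation method used in the paper.

```python
import math

def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from counts of differences and sites for one gene."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds

# Hypothetical counts for a single gene compared between two species.
ratio = dn_ds(nonsyn_diffs=10, nonsyn_sites=700, syn_diffs=20, syn_sites=300)
print(f"dN/dS = {ratio:.3f}")
```

A ratio well below 1, as in this made-up example, points to purifying selection; a ratio that is elevated relative to other lineages is what would flag accelerated protein evolution.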

East African cichlids, including O. niloticus, possess an unexpectedly large number of gene duplicates. The authors found 280 duplication events in the lineage leading to the common ancestor of the radiations. This represents a 4.5- to 6-fold increase in gene duplications relative to other clades, after normalizing by branch length. As in the dN/dS analysis, however, there is no significant enrichment for any particular gene pathway.

Regarding transposable element (TE) insertions in the different lineages, the authors report three waves of TE insertions. TEs inserted near the 5’ UTR were associated with significantly increased gene expression. Surprisingly, none of the five cichlid genomes showed any deficit of sense-oriented LINE insertions dating to a period of transposable element insertions in the common ancestor of the haplo-tilapiine cichlids. This suggests that ancestral East African cichlids went through an extended period of relaxed purifying selection.

For readers interested in small RNAs, the authors also found a surprising excess of novel microRNAs (miRNAs) that emerged in cichlids. Supported by wet-lab experiments, these novel miRNAs are believed to alter gene expression in multiple organs.

Last but not least, they also carried out extensive population genetic analyses of closely related species of the genera Pundamilia, Mbipia and Neochromis, all of which are endemic to Lake Victoria. Lake Victoria is where the most recent radiation took place: several hundred endemic species emerged within the past 15,000–100,000 years. Their FST comparisons suggest that (1) variation in coding sequence is most likely to be involved in the divergence of physiological and/or terminally differentiated traits like color; (2) regulatory variation is more important in morphological changes involving genes that have pleiotropic effects in developmental networks.

Conclusion:

With so many interesting findings, it is hard to extract a simple answer to the ultimate question: why do some lineages diversify so dramatically while others do not? Here the authors use cichlids to ask what the genomic substrate for adaptive radiation is. Their conclusion is that neutral and adaptive processes both make important contributions to the genetic basis of cichlid radiations.

Reference:

  1. Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, Simakov O, Ng AY, Lim ZW, Bezault E, Turner-Maier J, Johnson J, Alcazar R, Noh HJ, Russell P, Aken B, Alföldi J, Amemiya C, Azzouzi N, Baroiller J-F, Barloy-Hubler F, Berlin A, Bloomquist R, Carleton KL, Conte MA, D’Cotta H, Eshel O, Gaffney L, Galibert F, Gante HF, et al.: The genomic substrate for adaptive radiation in African cichlid fish. Nature 2014.
]]>
The genomic landscape underlying phenotypic integrity in the face of gene flow in crows https://wp.unil.ch/genomeeee/2015/05/08/the-genomic-landscape-underlying-phenotypic-integrity-in-the-face-of-gene-flow-in-crows/ Fri, 08 May 2015 12:12:53 +0000 http://wp.unil.ch/genomeeee/?p=567
The role of interspecific gene flow in species diversification has long been debated and is increasingly appreciated. However, the effect of gene introgression on phenotypic divergence and genome heterogeneity remains unclear in the early stages of speciation. To investigate these questions, Poelstra and colleagues studied the hybrid zone between the all-black carrion crow (Corvus corone) and the grey-coated hooded crow (C. cornix). Indeed, the absence of neutral genetic differentiation between these two species and the successful back-crossing of hybrids contrast strongly with the difference in plumage coloration, which has remained stable in natural populations. Moreover, colour-assortative mating has been observed, suggesting prezygotic isolation and ongoing speciation. To investigate the causes of these stable phenotypes, the authors first analysed the effect of gene flow on genome heterogeneity and then tried to link the observed heterogeneity in gene flow with gene expression and phenotypes.

Genetic differentiation between hooded and carrion crows

First, they assembled and annotated a high-quality reference genome from one male hooded crow and identified 20,794 protein-coding genes. This reference genome was then used to align the genomes of 60 unrelated individuals from two populations of carrion crows and two populations of hooded crows (Fig. 1). They identified 8.44 million SNPs, of which 5.27 million were shared between carrion and hooded crows. Although the major axis of genetic variation is consistent with the hypothesised expansion out of Spain after the last glacial maximum, the German carrion crows clustered closer to the local German hooded crows than to the Spanish carrion crows (in Fig. 1, the points represent the genetic distances between populations along axes 1 and 2 of the principal component analysis, not the geographical locations of the sampled populations). Gene flow between hooded and carrion crows was also supported by complementary analyses.

Fig. 1. The crow system. Species distribution and genetic distances

Gene expression divergence

Furthermore, by estimating gene expression through mRNA sequencing (19 individuals, 5 tissues), they found that only 0.03% to 0.41% of genes were differentially expressed between the two species. Interestingly, many of these genes are implicated in plumage coloration. In particular, the expression bias in growing feather follicles from the torso, where hooded crows are grey and carrion crows black, was dominated by genes of the melanogenesis pigmentation pathway, which were under-expressed in hooded crows (19 of the 20 identified genes). They confirmed that this expression bias was not due to differences in melanocyte density (Fig. 4) but rather to a broad-scale down-regulation of genes implicated in melanogenesis.

Fig. 4. Characterisation of feather melanocytes. There are no striking differences in melanocyte density.

Genomic divergence and gene expression between the two species

The authors investigated the landscape of genomic divergence with a 50-kb window-based approach (Fig. 2A) and a free clustering phylogenetic reconstruction for each window (Fig. 2C). From the phylogenetic trees, they inferred that only 0.28% of the genome differs strongly between the two species. Both methods revealed a 1.95-Mb region exhibiting extreme genetic differentiation and containing 81 of the 82 sites fixed between the two species. Moreover, this region showed reduced nucleotide diversity and linkage disequilibrium, with two local FST peaks connected by a saddle, suggesting a possible inversion.
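As an illustration of a window-based scan, the Python sketch below computes FST in non-overlapping 50-kb windows from per-SNP allele frequencies. It uses Hudson's estimator, combined across SNPs as a ratio of sums. That choice is an assumption made here for illustration: the post does not say which estimator the authors used, and the figure legend mentions sliding rather than fixed windows. The input tuples and function names are hypothetical.

```python
from collections import defaultdict

def hudson_terms(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2: frequency of the same allele in populations 1 and 2.
    n1, n2: number of chromosomes sampled in each population.
    """
    numerator = ((p1 - p2) ** 2
                 - p1 * (1 - p1) / (n1 - 1)
                 - p2 * (1 - p2) / (n2 - 1))
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator

def windowed_fst(snps, window=50_000):
    """Return {window_start: FST} for non-overlapping windows.

    snps: iterable of (position, p1, n1, p2, n2) tuples on one scaffold.
    Per-SNP terms are summed within a window before taking the ratio.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for position, p1, n1, p2, n2 in snps:
        numerator, denominator = hudson_terms(p1, n1, p2, n2)
        start = (position // window) * window
        sums[start][0] += numerator
        sums[start][1] += denominator
    return {start: num / den
            for start, (num, den) in sorted(sums.items()) if den > 0}

# Hypothetical data: one fixed difference in the first window and one SNP
# at equal frequency in both populations in the second window.
example = [(10_000, 1.0, 30, 0.0, 30), (60_000, 0.5, 30, 0.5, 30)]
fst_by_window = windowed_fst(example)
for start, fst in fst_by_window.items():
    print(start, round(fst, 3))
```

Windows dominated by fixed differences approach an FST of 1, while windows of shared polymorphism sit near 0 (this estimator can be slightly negative), which is the contrast that makes a small, highly differentiated region stand out.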

Fig. 2. Genomic landscape of divergence. (A) Pairwise genetic differentiation in 50-kb sliding windows across the genome. (B) The region of largest and most extreme genetic differentiation. (C) Localized phylogenetic patterns within the genome.

At the centre of one of these two FST peaks, one region showed evidence of recent positive selection in hooded crows: it was enriched for fixed hooded-crow-specific variants and had reduced values of Fu and Li’s D statistic (P < 0.05). This region contains several CACNG genes, which code for regulators that influence the transcription factor gene MITF. This transcription factor is a central regulatory element of the melanogenesis pathway (Fig. 3C) and regulates at least 11 of the melanogenesis genes under-expressed in hooded crows. The authors therefore suggest a link between gene expression, colour phenotype and the signature of local divergent selection. Two other differentially expressed melanogenesis genes were located in divergent genomic regions. This multigenic architecture of the colour trait is consistent with the colour polymorphism observed in hybrids.

Fig. 3. The functional genomic basis of plumage colour differences. (A) Feather follicles used for gene expression. (B) Percentage of all expressed genes (white) and melanogenesis genes (striped) inferred to be differentially expressed. (C) Schematic overview of the melanogenesis pathway.

One gene stood out from the others: RGS9, which plays a role in visual perception in vertebrates. This gene was under-expressed in hooded crows (including in the eye). Besides their implication in visual perception, alternative splice forms of RGS9 (present in crows) play a role in dopamine regulation and opioid signalling in the brain and may therefore influence the observed assortative mating.

To conclude, this paper showed that small local peaks of divergence (less than 1% of the genome, also called “speciation islands”) are sufficient to maintain strong phenotypic differences between the two species despite considerable gene flow. Moreover, the authors showed that assortative mating and sexual selection alone can cause phenotypic differentiation and speciation, as there is apparently no ecological selection between the two species.

Personal observations

This paper nicely linked local genetic differences, regulatory pathways, gene expression and, finally, phenotypic differentiation between two closely related species. Clearly the authors had to produce a substantial and thorough body of work, and they were somewhat lucky to find a straightforward explanation. Still, they provided evidence bearing on several debated questions, such as speciation without ecological selection, heterogeneity in gene flow and “speciation islands”, species identity under substantial gene flow, and the probable role of inversions in evolution.

Poelstra, J., Vijay, N., Bossu, C., Lantz, H., Ryll, B., Muller, I., Baglione, V., Unneberg, P., Wikelski, M., Grabherr, M., & Wolf, J. (2014). The genomic landscape underlying phenotypic integrity in the face of gene flow in crows. Science, 344 (6190), 1410-1414. DOI: 10.1126/science.1253226

]]>
Crossovers are associated with mutation and biased gene conversion at recombination hotspots https://wp.unil.ch/genomeeee/2015/05/07/crossovers-are-associated-with-mutation-and-biased-gene-conversion-at-recombination-hotspots/ Thu, 07 May 2015 16:43:11 +0000 http://wp.unil.ch/genomeeee/?p=487

Meiosis is an important biological process by which combinations of gene variants, called alleles, are segregated and packaged into each germ cell, ready to be transmitted to and expressed in descendants. These combinations of alleles are products of chromosomal crossovers (COs) during meiotic recombination, which increase the genetic diversity of gametes. Recombination may also have a local mutagenic effect at crossover sites with recurrent double-strand breaks (DSBs), and thus be a source of sequence variation too.

SUMMARY OF THE PAPER

By sequencing a large number of single sperm DNA molecules, the authors showed that meiosis is an important source of germline mutations and consequently of gene variation. They amplified single CO products from a pool of sperm, using allele-specific PCR, at two previously identified recombination hotspots (HSI and HSII), and found more de novo mutations in molecules with COs than in molecules without a recombination event. The hotspots contain binding sites for PRDM9 (PR Domain Containing 9), a protein of the human recombination machinery that is highly polymorphic in humans. To investigate why sequence diversity correlates positively with recombination activity, the authors sequenced 5,796 COs in total, including both reciprocal recombination products, from 6 Caucasian donors. As a control, they screened single nonrecombinants (NRs) in the same region and in a subset of donors using the same experimental conditions.
To adjust the CO mutation frequencies, the authors used the mutation frequency of NRs as a control, which reflects a combination of rare de novo mutations and PCR artefacts. COs had a mutation frequency nearly 3.6 times higher than the NR control (0.29% de novo mutations per CO), and 50% of these mutations occurred between the DSB and the CO breakpoint, 348 nucleotides away from the hotspot center (Fig. 1A). The frequency was similar for both hotspots (HSI and HSII), but because most of the data came from HSI, the authors focused their analysis on this hotspot.

Figure 1. COs, mutations, and CCOs in HSI.

This also suggests that the observed mutations are associated with CO formation and are independent of other site-specific factors such as base composition. Estimates of the mutation rate at hotspots (μHS) showed that more active hotspots exert a stronger mutagenic effect than weaker hotspots. All the observed de novo mutations changed strong (S) CG base pairs into weak (W) TA base pairs, and they occurred mainly at CpG sites. This strong mutational bias at CpGs is not exclusive to COs, since CpG dinucleotides generally have high mutation rates. What may explain the high mutation level at COs containing CpG sites is that the single-stranded DNA arising from a double-strand break (DSB) is more susceptible to chemical modification, leading to G → T conversion on single-stranded DNA. The resulting mismatches can only be recognized by repair machinery that acts on double-stranded DNA, which is not the case for the single-stranded, resected 3’ ends created after a DSB (Fig. 2A). If the formation of single-stranded DNA at methylated CpG sites is the main driver of de novo mutations, then DSBs resolved alternatively as noncrossovers (NCOs) might also have a higher mutation frequency.

Figure 2. Model of CO-driven evolution.

At hotspot CpG sites, the authors also observed very high methylation levels (83-88%) in both testis and sperm, which reflect the cellular states before and after meiosis, respectively. The authors also suggested that non-Mendelian segregation of alleles at hotspots, arising during DSB repair, could result either from an initiation bias, in which DSB-suppressing alleles are used to repair the broken homolog, or from gene conversion favoring GC alleles, known as GC-biased gene conversion (gBGC).
The observed transmission bias favors GC alleles, which is characteristic of gBGC rather than of an initiation bias. The authors’ data show that all donors are homozygous at the DSB site, which makes initiation bias unlikely. Sites with the strongest evidence for unequal transmission favor strong (GC) over weak (AT) alleles (Fig. 1B). The authors observed that the transmission bias also affects single nucleotide polymorphisms (SNPs), which can be preferentially transmitted regardless of whether they are near to or far from the DSB (Fig. 1C). Initiation bias does not favor strong over weak alleles in humans and yeast, whereas gBGC does. Another potential source of gBGC in the 6,085 mapped COs is the formation of COs with rare, discontinuous conversion tracts containing two CO breakpoints, called complex COs (CCOs). Heteroduplex tracts are formed in polymorphic regions during DSB repair. If gBGC acts on them, COs will tend to include strong alleles and exclude weak alleles (Fig. 2B). How fast crossing over drives the decimation of a hotspot via mutagenesis and gBGC depends on how sequence changes affect CO frequencies, which remains unknown.
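One simple way to ask whether strong (GC) alleles are over-transmitted is an exact binomial test against the 50:50 ratio expected under Mendelian segregation. The Python sketch below only illustrates that idea: the counts are invented, the function names are hypothetical, and the authors' actual statistical treatment may differ.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of every outcome that is no more likely than k.
    Suitable for modest n (up to a few hundred).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

def transmission_bias(strong, weak):
    """Return the fraction of strong alleles transmitted and the p-value
    against Mendelian (50:50) segregation."""
    total = strong + weak
    return strong / total, binomial_two_sided_p(strong, total)

# Hypothetical counts at one heterozygous site: 70 crossover products carry
# the strong (GC) allele and 30 carry the weak (AT) allele.
fraction, p_value = transmission_bias(70, 30)
print(f"strong-allele fraction = {fraction:.2f}, p = {p_value:.2g}")
```

A fraction above 0.5 with a small p-value would indicate over-transmission of the strong allele at that site. On its own it does not separate gBGC from initiation bias, which is why the homozygosity of the donors at the DSB site matters for the argument.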

CONCLUSIONS
This study contributes to the understanding of sequence evolution at recombination hotspots. The authors suggested that GC-biased gene conversion (gBGC) is the dominant force shaping the nucleotide composition at hotspots during crossing over, and potentially in other recombination products, which might explain the high GC content associated with recombination. It is possible that gBGC is an adaptation to reduce the mutational load of recombination, given that mutation favors weak over strong nucleotides. Still, the small sample size gives little power for detecting potential differences between COs and NCOs.

Arbeithuber, B., Betancourt, A., Ebner, T., & Tiemann-Boege, I. (2015). Crossovers are associated with mutation and biased gene conversion at recombination hotspots. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1416622112

]]>