Towards discovering the evolutionary history of dogs and the dog domestication process

At a time when getting a paper published often depends less on the new knowledge it contributes than on how well the researchers sell it, it is refreshing to find a paper that stands out because it deals with a relevant topic, combines several methods, and shows coherence and consistency.

The aim of this study was to shed light on dog domestication and to reconstruct the early evolutionary history of dogs. To accomplish this, the authors built a demographic model. As you might know, a demographic model involves variables such as effective population size, bottlenecks, migration, and gene flow.

Among the main findings, the authors detected bottlenecks in both dogs and wolves. These bottlenecks were stronger in dogs than in wolves, and the results showed that previous studies had underestimated their strength. It was also shown that dogs and wolves share a common ancestral lineage. In the tree, dogs form a clade distinct from the wolves, and none of the wolf lineages from the hypothesized domestication centers is supported as the source lineage for dogs. Moreover, regardless of where exactly dogs came from, they diverged from the wolf population at about the same time that the wolf populations diverged from each other. Dogs and wolves diverged 11,000-16,000 years ago, which would place domestication before the extensive adoption of agriculture, in a process that involved extensive admixture and was followed by a bottleneck in wolves. In short, the demographic model states that the divergence between dogs and wolves was relatively recent and was followed by a strong decrease in population size.

The data were the sequenced genomes of six canid individuals: three wolves (Canis lupus), an Australian Dingo, a Basenji, and a golden jackal (Canis aureus). The three wolves were chosen to represent the regions of Eurasia where domestication allegedly began (Europe, the Middle East, and East/Southeast Asia); specifically, the samples were collected from Croatia, Israel, and China. The Dingo and Basenji represent lineages that are divergent relative to the reference Boxer genome, and were chosen to maximize the chance of capturing the different alleles present in the earliest dogs: owing to their geographic isolation, these dogs are less likely to have shared a geographic area with wolves and admixed with them recently. For some analyses the authors also leveraged data from 12 additional dog breeds from another study.

Although the two criteria used to choose the samples seem well justified, I am not convinced that the sample size was large enough to support strong conclusions about the dogs’ evolutionary history, even though full genome coverage for each individual meant processing a great amount of data. I think the effect of the sample size became evident when the authors had to select the best demographic model: the three models appeared to be fairly compatible with the data, and the differences in absolute error among them were small. To compensate for the small sample size, perhaps they could remove one of the samples and observe how this affects the models? If there is an effect, then adding more samples might tell us a different evolutionary history.
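
The leave-one-out check I am suggesting could be organized roughly as follows. This is only a sketch of the idea, not something taken from the paper: `fit_models` is a hypothetical stand-in for whatever expensive inference would be rerun on the reduced data set, and `toy_fit`, the sample names, and the model names are invented so that the example runs.

```python
def best_model(errors):
    """Return the name of the model with the smallest error."""
    return min(errors, key=errors.get)


def leave_one_out(samples, fit_models):
    """Refit with each sample dropped and report where the preferred model changes.

    fit_models is a hypothetical callable: it takes a list of sample names and
    returns {model_name: error}. In practice it would wrap the full inference.
    """
    baseline = best_model(fit_models(list(samples)))
    changed = {}
    for dropped in samples:
        kept = [s for s in samples if s != dropped]
        winner = best_model(fit_models(kept))
        if winner != baseline:
            changed[dropped] = winner
    return baseline, changed


# Toy stand-in with invented behaviour, only to make the sketch runnable.
def toy_fit(kept):
    if "s3" in kept:
        return {"model_A": 1.0, "model_B": 1.5}
    return {"model_A": 1.2, "model_B": 0.9}


baseline, changed = leave_one_out(["s1", "s2", "s3", "s4"], toy_fit)
print(baseline, changed)
```

An empty `changed` dictionary would suggest that the preferred model does not hinge on any single genome; a non-empty one would point to the samples that drive the conclusion.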

To compute the effective population size (Ne), the authors used two methods. The first used genome-wide patterns of heterozygosity to compute the mean heterozygosity. To understand how population size changed over time, however, they also had to use the pairwise sequentially Markovian coalescent (PSMC). PSMC helps here because it yields mutation-scaled estimates of time and population size, which can be converted into years and numbers of individuals.

Before building the phylogenetic tree, the researchers computed pairwise sequence divergence, which captures mean coalescent times and is robust to incomplete lineage sorting (ILS) and gene flow. This avoids the distortions that these factors, which arise with large ancestral population sizes, would otherwise introduce into the tree. The tree was then built by neighbor-joining (NJ), using a conservative estimator of genome-wide pairwise sequence divergence for all pairs of samples. Confidence in the tree was assessed by bootstrapping, and its robustness was evaluated with an alternative estimator of sequence divergence that counts all possible mismatches between alleles from a pair of individuals.

Afterwards, the researchers used the non-parametric “ABBA-BABA” test to detect admixture among the samples and so account for post-divergence gene flow, which would otherwise interfere with determining the divergence between dogs and wolves. With the previously computed demographic variables, the authors formulated the demographic model using the Generalized Phylogenetic Coalescent Sampler (G-PhoCS), because it accounts for ILS and post-divergence gene flow and has better resolving power than “ABBA-BABA”. G-PhoCS and “ABBA-BABA” detected the same gene flow, which shows consistency, but G-PhoCS also found additional gene flow events that “ABBA-BABA” did not detect. A comparison of G-PhoCS and PSMC made it clear that G-PhoCS gave the better demographic model, because it detected a rapid bottleneck that PSMC could not.

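
To give a feeling for what the “ABBA-BABA” test counts, here is a minimal sketch of the D statistic, D = (nABBA - nBABA) / (nABBA + nBABA), for the tree (((P1, P2), P3), outgroup). It assumes one haploid, aligned sequence per population, which is a simplification of mine and not the authors' pipeline, and the toy sequences are invented. The outgroup defines the ancestral allele "A" (in this data set that role would presumably fall to something like the golden jackal, which is my assumption).

```python
from collections import Counter


def d_statistic(p1, p2, p3, outgroup):
    """D statistic from four aligned sequences of equal length.

    A site is informative only if P3 carries a derived allele (differs from
    the outgroup) and P1 and P2 disagree, one matching P3, one the outgroup.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must have equal length")
    counts = Counter()
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        if a3 == ao:
            continue  # P3 is not derived here, so the site is uninformative
        if a1 == ao and a2 == a3:
            counts["ABBA"] += 1  # P2 shares the derived allele with P3
        elif a1 == a3 and a2 == ao:
            counts["BABA"] += 1  # P1 shares the derived allele with P3
    total = counts["ABBA"] + counts["BABA"]
    if total == 0:
        raise ValueError("no informative sites")
    return (counts["ABBA"] - counts["BABA"]) / total


# Invented toy alignment: two ABBA sites, one BABA site.
d = d_statistic(p1="AAGGAG", p2="GGGAAG", p3="GGGGAG", outgroup="AAAAAA")
print(round(d, 3))
```

Without gene flow, ILS alone should produce ABBA and BABA sites in roughly equal numbers, so D should be close to zero; an excess of one pattern hints at gene flow between P3 and either P2 (positive D) or P1 (negative D). Real analyses also need a significance test, commonly a block jackknife, which this sketch leaves out.
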
The divergence times between dogs and wolves, among wolves, and among dogs were computed and validated. The G-PhoCS model reflects the population phylogeny obtained by NJ, and to test its robustness the authors compared this demographic model with two others. The models differ in how each represents the relationship between dogs and wolves. The first, dog/wolf reciprocal monophyly, is the one best supported by the genome-wide sequence divergence; the second, the regional domestication tree, has each dog population originating from the wolf population corresponding to its geographical origin; and the third, the single wolf lineage origin model, states that dogs diverged most recently from the Israeli wolf lineage (ISW-source). To select the best model, the authors compared the topologies with statistical methods and ran simulations under each model, because G-PhoCS lacks a statistical test for model selection. The best model was reciprocal monophyly, because it showed the lowest discrepancy, measured as absolute error, between the data and the model.
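
The logic of choosing a model by absolute error can be sketched as below. I am assuming here that each model is summarized by the mean of some summary statistics over its simulations and scored by the summed absolute difference from the observed values; the paper's exact error measure may differ. The statistic names, model names, and numbers are placeholders I invented, not values from the study.

```python
def total_absolute_error(observed, simulated_mean):
    """Sum of |observed - simulated| over the observed summary statistics."""
    missing = set(observed) - set(simulated_mean)
    if missing:
        raise KeyError(f"simulation lacks statistics: {sorted(missing)}")
    return sum(abs(observed[k] - simulated_mean[k]) for k in observed)


def rank_models(observed, simulated_by_model):
    """Return (model, error) pairs sorted from best to worst fit."""
    errors = {
        name: total_absolute_error(observed, sim)
        for name, sim in simulated_by_model.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])


# Invented placeholder values, only to exercise the functions.
observed = {"stat_1": 0.20, "stat_2": 0.05, "stat_3": 1.10}
simulated_by_model = {
    "model_A": {"stat_1": 0.21, "stat_2": 0.06, "stat_3": 1.00},
    "model_B": {"stat_1": 0.25, "stat_2": 0.05, "stat_3": 1.05},
    "model_C": {"stat_1": 0.30, "stat_2": 0.10, "stat_3": 1.30},
}
ranking = rank_models(observed, simulated_by_model)
for name, error in ranking:
    print(f"{name}: {error:.2f}")
```

When the top two errors are as close as they are in this toy case, the ranking says little by itself, which is exactly the concern I raised about the small differences among the three models.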

Regarding the domestication process, besides estimating when it happened, the authors performed real-time quantitative PCR (qPCR) to evaluate the variation in amylase (AMY2B) copy number among dogs and wolves. This let them test the hypothesis that amylase copy number increased in dogs because they began living close to humans during the agricultural period.
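
I do not know exactly how the authors turned qPCR readings into copy numbers, so the following is just one common approach, the comparative Ct (2^-ΔΔCt) method, written as a sketch. It assumes a single-copy reference gene, a calibrator individual with a known diploid copy number of the target (two by default), and near-perfect amplification efficiency for both assays. All Ct values below are invented.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_cal, ct_reference_cal,
                         calibrator_copies=2.0, efficiency=2.0):
    """Estimate diploid copy number of a target gene with the comparative Ct method.

    ct_target / ct_reference: Ct of target and reference gene in the sample.
    ct_target_cal / ct_reference_cal: the same two Ct values in the calibrator.
    efficiency=2.0 assumes the product doubles every cycle (an assumption).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_sample - delta_calibrator
    return calibrator_copies * efficiency ** (-delta_delta_ct)


# Invented Ct values: the target amplifies two cycles earlier than in the
# calibrator (relative to the reference gene), i.e. about four times as much.
copies = relative_copy_number(ct_target=22.0, ct_reference=25.0,
                              ct_target_cal=24.0, ct_reference_cal=25.0)
print(copies)
```

A sample with the same ΔCt as the calibrator comes out at two copies, and every cycle by which the target amplifies earlier doubles the estimate.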

This paper shows coherence and consistency because it gives clear details of how the authors retrieved the data, how they processed the data to generate high-quality genome sequences (a low genotype error rate, and a PCA to show that the sequences were assembled properly), what methods they used, why and how they combined them, and what the strengths and limitations of each method are. All of this shows their intention to put knowledge ahead of any desire to confirm their own hypothesis (or previous hypotheses).

I think that the scientific approach is elegant. I use the word elegant because the paper uses many complementary methods (the limitations of some methods motivated the use of others to compensate), taking advantage of their particular properties to maximize the insights of the research while reducing, or at least acknowledging, the factors (controllable or not) that could affect the outcomes. It also compares the results obtained with different methods to validate them. This approach gave the clear impression that all that mattered was the knowledge.

It is remarkable how the authors manage to highlight their findings and develop possible explanations and theories without exaggerating or over-explaining them.

Furthermore, the supplementary material gives so many details that you could replicate this study to tackle similar problems in the field.

As a drawback, I would say that reading this paper can be a bit challenging because of how the authors organized the content: you need to stay very concentrated if you do not want to get lost! I say this because the paper involves several methods that are continuously compared to validate their outcomes. In some cases this means recalling methods that were introduced much earlier, so you might have to go back to remember the purpose of each method before you can understand the current section. Furthermore, when comparing methods the authors sometimes provide so many details that you are not sure whether they are talking about one method or the other.

The authors inferred that dogs had a pre-agricultural origin based on the estimated divergence time. The alternative view, that domestication happened in agricultural times, was based on the AMY2B copy number in dogs: dogs would have started to carry more copies of this amylase gene because of the diet they ate as a result of their closeness to humans. Nevertheless, this theory is shaky, because the dingo has just two copies, which suggests that the AMY2B copy number expansion was not fixed across all dogs early in the domestication process, and other dogs had different numbers of copies. In fact, the qPCR shows that modern dogs have a high copy number whereas wolves and dingoes do not; yet the AMY2B expansion is polymorphic in wolves, so copy number varies among wolves and the expansion is not restricted to dogs. As a consequence, AMY2B alone is insufficient to explain domestication.

To conclude, I highly recommend reading this paper: although the authors could not fully answer how the evolution of dogs and the domestication process happened, we definitely gained new knowledge that can serve as a basis for further studies. Regardless of whether you agree or disagree with the interpretation of the findings, you will surely learn from how they approached this genetic demography topic. Indeed, if you are not a bioinformatician, you will get a feeling for what is done in this field; if you are a bioinformatician, you could gain some insight into how to apply several methods in a complementary way, which is an uncommon approach in bioinformatics.

Freedman, A., Gronau, I., Schweizer, R., Ortega-Del Vecchyo, D., Han, E., Silva, P., Galaverni, M., Fan, Z., Marx, P., Lorente-Galdos, B., Beale, H., Ramirez, O., Hormozdiari, F., Alkan, C., Vilà, C., Squire, K., Geffen, E., Kusak, J., Boyko, A., Parker, H., Lee, C., Tadigotla, V., Siepel, A., Bustamante, C., Harkins, T., Nelson, S., Ostrander, E., Marques-Bonet, T., Wayne, R., & Novembre, J. (2014). Genome Sequencing Highlights the Dynamic Early History of Dogs. PLoS Genetics, 10(1). DOI: 10.1371/journal.pgen.1004016
