Identification of a large set of rare complete human knockouts

High-throughput genotyping and sequencing have led to the discovery of numerous sequence variants associated with human traits and diseases. An important class of such variants is Loss of Function (LoF) mutations (frameshift indels, stop-gain and essential splice-site variants), which are predicted to completely disrupt the function of protein-coding genes. In the case of Mendelian recessive diseases, the condition only occurs if the LoF variants are biallelic, i.e. affect both copies of a gene. The affected gene is then said to be a “knockout”.

By studying the Icelandic population, the authors aim to identify rare LoF mutations (Minor Allele Frequency, MAF < 2%) present in individuals participating in various disease projects. They then investigate how frequently these LoF mutations are homozygous (i.e. knockout) in the germline genome.

The Icelandic population

Iceland is well suited for genetic studies for three main reasons. The island was settled around the 9th century by 8-20 thousand people, and the population has since grown to around 320’000 inhabitants. The initial founder effect and rare subsequent admixture make the Icelandic population a genetic isolate. In addition to this unusual genetic isolation, Iceland benefits from a genealogical database containing family histories reaching centuries back in time, as well as broad access to nationwide healthcare information.

These characteristics led to the development of large-scale genomic studies of Icelanders by deCODE Genetics. This biopharmaceutical company has published various studies, including this paper, related to genetic variants and diseases in Icelanders.

Loss of function mutations and rare complete knockouts

The authors sequenced the whole genomes of 2’636 Icelanders participating in various disease projects and identified variants in protein-coding genes. These variants were annotated with their predicted impact on the gene: LoF, moderate or low impact. A total of 6’795 LoF mutations in 4’924 genes were identified, most of which (6’285) were rare (MAF < 2%).

The identified LoF variants were imputed into an additional 101’584 chip-genotyped and phased Icelanders, which made it possible to count knockout genes across the population. The authors found that 1’485 of the rare LoF mutations (MAF < 2%) contribute to the knockout of 1’171 genes, and that 8’041 individuals carry at least one of these knockout genes. Of these 1’171 genes, 88 had already been linked by previous studies to conditions with a recessive mode of inheritance.
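
To make the notion of a knockout concrete, here is a minimal sketch of how one could flag knocked-out genes once genotypes are phased. It is only an illustration under my own assumptions, not the authors' pipeline: the function name, the variant identifiers and the gene names are all hypothetical. The idea is that a gene counts as knocked out when each of the two haplotypes carries at least one LoF variant in it, whether it is the same variant on both copies or two different ones.

```python
from typing import Dict, Set


def knocked_out_genes(
    hap_1: Set[str],
    hap_2: Set[str],
    variant_to_gene: Dict[str, str],
) -> Set[str]:
    """Return genes hit by at least one LoF variant on BOTH haplotypes.

    hap_1 / hap_2: LoF variant IDs carried on each phased haplotype.
    variant_to_gene: maps each LoF variant ID to the gene it disrupts.
    All identifiers here are hypothetical placeholders.
    """
    genes_1 = {variant_to_gene[v] for v in hap_1 if v in variant_to_gene}
    genes_2 = {variant_to_gene[v] for v in hap_2 if v in variant_to_gene}
    # A gene is biallelically disrupted only if both copies are affected.
    return genes_1 & genes_2


# Hypothetical annotation: two LoF variants in GENE_A, one in GENE_B.
variant_to_gene = {"lof1": "GENE_A", "lof2": "GENE_A", "lof3": "GENE_B"}

# Same variant on both copies: knockout.
same_variant = knocked_out_genes({"lof1"}, {"lof1"}, variant_to_gene)
# Two different variants, one per copy: knockout as well.
two_variants = knocked_out_genes({"lof1"}, {"lof2"}, variant_to_gene)
# Both variants on the same copy: the other copy is intact, no knockout.
same_copy = knocked_out_genes({"lof1", "lof2"}, set(), variant_to_gene)
# Single heterozygous variant: carrier only, no knockout.
carrier = knocked_out_genes({"lof3"}, set(), variant_to_gene)

print(same_variant, two_variants, same_copy, carrier)
```

The third case is the reason phasing matters: without knowing which copy each variant sits on, two LoF variants in the same gene cannot be told apart from a true knockout.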

Double transmission deficit of LoF variants

Because knockout genes should be deleterious for an organism, we expect a deficit of homozygotes for these genes in the population due to embryonic/fetal, perinatal or juvenile lethality. To investigate whether such a deficit was present, the authors calculated the transmission probability of LoF variants from parents to their offspring.

Under Mendelian inheritance, the expected probability that two heterozygous parents both transmit the LoF allele to their offspring (i.e. double transmission) is 25%. However, the results show a statistically significant deficit: the observed double transmission probability was 23.6%.
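
The arithmetic behind these two percentages can be written down in a few lines. This is just a sketch based on the rounded figures quoted above, with variable and function names of my own choosing; it says nothing about how the authors tested significance.

```python
# Each heterozygous parent transmits the LoF allele with probability 1/2,
# and the two transmissions are independent under Mendelian inheritance.
P_TRANSMIT = 0.5
expected_double = P_TRANSMIT * P_TRANSMIT  # 0.25

# Observed double transmission probability, as quoted (rounded) above.
observed_double = 0.236


def deficit_per_10000(expected: float, observed: float) -> float:
    """Missing double transmissions per 10'000 offspring of two carriers."""
    return (expected - observed) * 10_000


deficit = deficit_per_10000(expected_double, observed_double)
relative_deficit = (expected_double - observed_double) / expected_double

print(f"expected: {expected_double:.3f}, observed: {observed_double:.3f}")
print(f"deficit per 10'000 transmissions: {deficit:.0f}")
print(f"relative deficit: {relative_deficit:.1%}")
```

With the rounded values, this corresponds to roughly 140 missing homozygous offspring per 10’000 children of two heterozygous parents, or about 6% fewer than expected.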

The rare LoF mutations were ranked according to the Residual Variation Intolerance Score (RVIS) percentile and the essentiality score percentile of the genes they affect. Both measures attempt to classify genes according to their tolerance to functional variation, with the lowest percentiles corresponding to the genes most sensitive to mutations. As expected, the lowest double transmission rate was found for the most sensitive genes (first percentile), suggesting that a homozygous LoF state in these genes is deleterious.

Tissue specific expression of knockout genes

The authors investigated whether genes were more or less likely to be knocked out depending on the tissues in which they are expressed. Using data from previous studies on genes that are highly expressed in one or more, but not all, of 27 tissues, they calculated the fraction of these genes that were knocked out for each tissue. The brain and placenta were the tissues with the lowest fraction of knockout genes (3.1% and 3.9%, respectively), while the testis, small intestine and duodenum had the highest fraction of biallelic LoF mutations (5.8%, 6.4% and 6.9%, respectively).

Conclusion and Comments

The characteristics of the Icelandic population and the incredibly large sample size (~ 1/3 of the total population) allowed the authors to identify a large number of new and rare LoF mutations. Some of these mutations were shown to contribute to the knockout of an unexpectedly large number of genes in an unexpectedly large number of people. This study is the first to shed light on the astonishing number of knockouts present in human populations. In addition, by investigating the transmission probability, a deficit in homozygous loss-of-function offspring was identified, especially when LoF mutations affected essential genes. This result was expected because of the predicted deleterious effect of biallelic LoF mutations.

Besides these interesting results, some aspects of the paper were slightly disappointing. First, I was expecting the authors to focus more on the genotype-phenotype aspects. Although they pinpoint a deficit in double transmission, suggesting deleterious consequences for the organism, the authors did not discuss the function of the identified knockout genes or their effect on the phenotype. Second, the paper was not an easy read. Many results were only mentioned without additional information on the methods or data used, and it was sometimes difficult to link them with the main aim of the study. Additionally, figures were sometimes misleading because of different axis scales or incomplete legends.

Finally, the authors suggested that important tissues, such as the brain, have fewer knockouts than other tissues, writing that “genes that are highly expressed in the brain are less often completely knocked out than other genes”. However, this result is questionable, as we do not have any measure of the number of knockout genes that we would expect to be expressed in each tissue by chance alone. In other words, the brain could have a lower number of knockout genes expressed compared to other tissues only because the total number of expressed genes in the brain is lower. Therefore we do not know if the lower number of knockout genes in the brain is due to chance or to biological reasons.
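
To illustrate what such a chance expectation could look like, here is a rough sketch of one possible check, a hypergeometric test. It is not something the paper reports, and every count in it is made up for illustration. The question it asks: if knockout genes were spread at random among all genes, how likely would it be to find this few of them among the genes highly expressed in one tissue?

```python
from math import comb


def hypergeom_lower_tail(k: int, n_genes: int, n_knockout: int, n_tissue: int) -> float:
    """P(X <= k) under random sampling without replacement.

    n_genes:    total number of genes considered
    n_knockout: number of those genes that are knocked out in someone
    n_tissue:   number of genes highly expressed in the tissue
    k:          knockout genes observed among the tissue genes
    """
    total = comb(n_genes, n_tissue)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms drop out of the sum on their own.
    favourable = sum(
        comb(n_knockout, i) * comb(n_genes - n_knockout, n_tissue - i)
        for i in range(k + 1)
    )
    return favourable / total


# Hypothetical counts, chosen only to show the calculation.
n_genes, n_knockout, n_tissue, observed = 1000, 50, 100, 2

expected = n_tissue * n_knockout / n_genes  # knockouts expected by chance
p_value = hypergeom_lower_tail(observed, n_genes, n_knockout, n_tissue)

print(f"expected by chance: {expected:.1f}, observed: {observed}")
print(f"P(X <= observed) = {p_value:.3f}")
```

A small probability would suggest that the tissue really is depleted of knockout genes, while a large one would mean the low count is compatible with chance given how many genes the tissue expresses.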

Nevertheless, this study opens the door to understanding how many knockout genes occur without phenotypic consequences in humans, what the function and essentiality of these genes are, and what role the environment plays in shaping the phenotype. The classical search for genetic variants associated with a phenotype, as in GWAS, could be reversed by first identifying individuals with the same genetic variants and then precisely phenotyping them.

Sulem, P., Helgason, H., Oddson, A., Stefansson, H., Gudjonsson, S., Zink, F., Hjartarson, E., Sigurdsson, G., Jonasdottir, A., Jonasdottir, A., Sigurdsson, A., Magnusson, O., Kong, A., Helgason, A., Holm, H., Thorsteinsdottir, U., Masson, G., Gudbjartsson, D., & Stefansson, K. (2015). Identification of a large set of rare complete human knockouts. Nature Genetics, 47 (5), 448-452. DOI: 10.1038/ng.3243