Latest revision as of 19:06, 4 June 2021

  • Project name: Heritability of BMI - in search of a relevant phenotype for normalized weight, and what heritability says about it

  • Tutor: Sofia Ortin Vela (sofia.ortinvela_AT_unil.ch)




Introduction

Most common human traits and diseases have a polygenic pattern of inheritance: variation in DNA sequence at many genetic loci influences the phenotype. Genome-wide association studies (GWAS) have identified more than 600 variants associated with human traits, but these typically explain only small fractions of phenotypic variation.


Background

Previous research

A GWAS analyses many genetic variants in many individuals to study their correlation with phenotypic traits, generally focusing on associations between single-nucleotide polymorphisms (SNPs) and phenotypes. The development of GWAS has been made possible by advances in genotyping technology and has greatly accelerated gene discovery. GWAS have identified many genes strongly associated with phenotypic traits such as diseases, and have already been carried out on height, weight and BMI.

GWAS on height

First, a study showed that about 180 loci influence adult height. The most strongly connected genes belong to pathways such as Hedgehog and TGF-β signaling, which are involved in chondrocyte proliferation and differentiation, growth plate signaling and bone formation. Other genes, such as ECM2, which is involved in the formation of the extracellular matrix, also influence height. We also know that mutations in the STAT2 and FGFR3 genes cause growth failure, skeletal dysplasia and dwarfism.

GWAS on weight

Then, many genes related to weight were found in a GWAS that included covariates such as triglyceride level or BMI. Two main genes were found: FTO and IRS1.

GWAS on BMI

Next, a larger cluster of weight-related SNPs is located in intron 1 of the fat mass and obesity-associated (FTO) gene. FTO encodes an RNA demethylase, and SNPs at this locus are the most strongly associated with obesity throughout life and across generations. They have also been associated with other traits such as type 2 diabetes, osteoarthritis and cardiometabolic characteristics, which is attributed to the effect of FTO on BMI. The study concludes that FTO increases the risk of obesity through changes in food consumption and preference. Twenty-three other SNPs were also studied, in genes such as BDNF, FAIM2 and TFAP2B.

Further studies have shown that other genes, such as NEGR1, are also associated with BMI. This gene is highly expressed in the brain and plays a role in body weight and food intake. Identifying the genetic determinants of BMI could lead, for example, to a better understanding of the biological basis of obesity.

File:BMI Research.pdf

Study on heritability

Estimates of the heritability of BMI differ between experimental designs and between the types of heritability being measured. However, studies show that the SNP-based heritability of BMI is greater than 0.2, approximately half that of height, which is greater than 0.5.


Aim of the project

However, studies also report that BMI is an inaccurate measure of body fat and can bias measurements of the health effects of obesity. The problems arise because BMI does not distinguish between fat and non-fat mass, such as bone and muscle, and does not account for changes in body composition that occur with age. For example, very muscular people may have very low body fat, yet their BMI puts them in the obese category. Very tall people also have a BMI that is too high relative to their body fat. There are therefore limitations to the use of BMI as a measure of body fat.

This raises a question: why not use a formula other than BMI? BMI is a function of height and weight, yet it is associated with different genes than either of them. We can therefore assume that other phenotypes based on height and weight could also yield different biological results. For example, raising height to the power of 3 instead of 2 gives the ponderal index (PI), generally used for babies. Since no GWAS has been carried out on the PI, we can wonder whether it captures something relevant. Based on the BMI formula, many other phenotypes are possible.
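For reference, the two indices discussed here can be written side by side. The symbols w (weight in kg), h (height in m) and the name P_gamma for the generalised phenotype are notation introduced for this illustration, not taken from the project files.

```latex
\mathrm{BMI} = \frac{w}{h^{2}}, \qquad
\mathrm{PI} = \frac{w}{h^{3}}, \qquad
P_{\gamma} = \frac{w}{h^{\gamma}}
```

Setting gamma to 2 recovers BMI and setting it to 3 recovers the ponderal index; other values of gamma give the alternative phenotypes considered in this project.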

The goal of the project is therefore to see whether other phenotypes based on height and weight show better signals than BMI and reveal different biology, by looking at the related genes and pathways. Heritability is also used to judge whether a phenotype seems relevant.


Methodology

Data Exploration

Size of our sample

As there was a lot of incomplete data at the genotype level, the dataset was cleaned to remove all individuals without genotype information.

Our sample size differs from the number of records in the UK Biobank (UKB) database, because genotypes were not collected for every individual. The UKB weight dataset contains 499,806 individuals, whereas our sample contains only 45,829. The sample is smaller because we only kept individuals who came for the third visit: the BMI and height data are taken from that third visit, which limits our numbers.


For height and weight independently

File:Data exploration.pdf

For height and weight together

File:Fig pathwayMirrorBarChart tau3 PCA median tortuosity.pdf


Choosing phenotypes and covariates

Phenotypes

From the basic BMI formula we can create new phenotypes: the weight raised to the power mu, divided by the height raised to the power gamma. This way we get data in 5 dimensions. It is necessary to reduce the complexity, so we decided to remove the mu variable and vary only gamma. The reason for this choice is that height is more robust and closer to a normal distribution than weight.

File:Pheno.pdf

File:3D plot.pdf

Then, by varying gamma, we can observe how our phenotypes change using a 3D graph. The graph holds the weight constant, here at 50 kg; changing the weight would simply shift the curves while keeping the same distribution. We found that when gamma is too high, the phenotype can no longer discriminate across the whole population: increasing gamma too much yields an almost binary classification into groups, which we want to avoid.

Moreover, if we choose a gamma value below zero, we end up multiplying height by weight. Multiplying the two variables loses information, because a short, heavy person can get the same result as a tall, thin person.

Based on these results, we decided to choose values for gamma in the range [1; 3.5]. This gives phenotypes that represent the population well without splitting it into groups. In particular, eight values were selected: [1.4; 1.618; 1.8; exp(1); 2.2; 2.4; 2.5; 3]. Some of these values are intermediate, and others, such as exp(1), have more decimal places, to see whether this changes the result. For each phenotype, we inspected the distribution using a histogram and graph 2 to confirm that it represented the population well.
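As an illustration, the eight phenotypes can be computed for one individual as follows. This is a minimal sketch, assuming weight in kg, height in m and a weight exponent fixed at 1; the function names `phenotype` and `all_phenotypes` are hypothetical and not taken from the project code.

```python
import math

# Height exponents (gamma) listed in the text; exp(1) is Euler's number e.
GAMMAS = [1.4, 1.618, 1.8, math.exp(1), 2.2, 2.4, 2.5, 3.0]


def phenotype(weight_kg, height_m, gamma):
    """Return weight / height**gamma (gamma = 2 is BMI, gamma = 3 the ponderal index)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** gamma


def all_phenotypes(weight_kg, height_m, gammas=GAMMAS):
    """Map each gamma to the corresponding phenotype value for one individual."""
    return {gamma: phenotype(weight_kg, height_m, gamma) for gamma in gammas}


if __name__ == "__main__":
    # Example individual (made-up values): 70 kg, 1.75 m.
    for gamma, value in all_phenotypes(70.0, 1.75).items():
        print(f"gamma={gamma:.3f}: {value:.2f}")
```

Because every phenotype uses the same weight and height and differs only in gamma, the phenotypes are strongly related to each other, which is worth keeping in mind when comparing their GWAS results.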


Covariates

Once the phenotypes were selected, the covariates were not taken from the default GWAS covariates, in order to avoid influencing the result. Instead, every covariate that could have an impact on our phenotypes was selected from the UK Biobank, and the Pearson correlation between each of these covariates and BMI was examined. The most correlated covariates, such as age, processed meat consumption and gender, were selected. In addition, the principal components of the whole genotypic profile were added to the GWAS, even though we could not check their correlation, as they are sensitive data.
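The correlation screening described above could look roughly like the following. This is a sketch under assumptions: covariates are plain lists of numbers aligned with the BMI list, and the helper names `pearson_r` and `rank_covariates` are hypothetical rather than the project's actual code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_covariates(covariates, bmi):
    """Return (name, r) pairs sorted by absolute correlation with BMI, strongest first."""
    scores = {name: pearson_r(values, bmi) for name, values in covariates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Ranking by absolute value keeps strongly negative correlations near the top, since a covariate can be informative whichever direction its effect goes.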

File:Correlation Matrix.pdf


Make a GWAS

The phenotype file contains -999 wherever a value is missing. This format is needed to run the GWAS on Jura using BGENIE, a program for efficient GWAS on multiple continuous traits, built around the BGEN file format used to store the UK Biobank genetic data. The phenotypes are also normalized beforehand to avoid having any residuals.
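The preparation step could be sketched as below. Two points are assumptions rather than facts from the project: normalization is taken to mean z-scoring, and the file layout (a header line, then one space-separated row per individual) should be checked against the BGENIE documentation. Only the -999 missing-value code comes from the text; the function names are hypothetical.

```python
import statistics

MISSING = -999  # missing-value code mentioned in the text


def standardize(values):
    """Z-score the observed values; missing entries (None) stay None."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    return [None if v is None else (v - mean) / sd for v in values]


def format_phenotype_rows(columns):
    """Turn {name: values} into text lines: a header, then one row per individual.

    Missing values (None) are written as -999.
    """
    names = list(columns)
    n_rows = len(columns[names[0]])
    lines = [" ".join(names)]
    for i in range(n_rows):
        cells = []
        for name in names:
            value = columns[name][i]
            cells.append(str(MISSING) if value is None else f"{value:.6f}")
        lines.append(" ".join(cells))
    return lines
```

Standardizing before substituting -999 matters: if the code were inserted first, it would be treated as a real measurement and distort the mean and standard deviation.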

GWAS are based on multiple linear regression; the equation can be seen in the following file:

File:Multiple linear regression for GWAS.pdf

The slope is found by linear regression. Each genotype is encoded as 0, 1 or 2, the number of copies of the less frequent allele (0 and 2 are homozygous, 1 is heterozygous).
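To make the additive coding concrete, here is a simplified sketch of the slope estimate for a single SNP. It deliberately ignores covariates, so it is a simple rather than a multiple regression, and it is not how BGENIE is implemented; the function name `additive_slope` is hypothetical.

```python
def additive_slope(dosages, phenotypes):
    """Ordinary least squares fit of phenotype on allele count (0, 1 or 2).

    Returns (slope, intercept). The slope is the estimated change in the
    phenotype per additional copy of the less frequent allele.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    return slope, intercept
```

For example, with allele counts [0, 1, 2, 1] and phenotype values [1.0, 1.5, 2.0, 1.5], the fit gives a slope of 0.5 and an intercept of 1.0. In a full GWAS this fit is repeated for every SNP, with the covariates included in the model.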


Results of the GWAS

SNPs: Manhattan plot, QQplots

On the Manhattan plot, SNPs are positioned along the x-axis according to their chromosomal position, and the y-axis shows the negative logarithm (-log10) of each SNP's P-value. SNPs with the lowest P-values, i.e. the strongest association with our phenotype, are therefore at the top of the graph. The red line indicates the Bonferroni threshold (2.2 × 10^-10) and the blue line the genome-wide significance threshold (5 × 10^-8). The Bonferroni correction divides the significance level alpha by the number of tests performed.
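The two quantities used on the plot are simple to compute. The sketch below assumes P-values are stored in a dictionary keyed by SNP identifier; the function names and the identifiers used in the usage note are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Significance level after Bonferroni correction: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def neg_log10(p_value):
    """Height of a SNP on the Manhattan plot's y-axis."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


def significant_snps(p_values, threshold):
    """Return the SNP identifiers whose P-value is below the threshold, sorted by name."""
    return sorted(snp for snp, p in p_values.items() if p < threshold)
```

For instance, `bonferroni_threshold(0.05, 1_000_000)` gives 5 × 10^-8, the conventional genome-wide threshold, and `significant_snps({"rs1": 1e-9, "rs2": 1e-3}, 5e-8)` keeps only `"rs1"`.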

For the Manhattan plot of BMI, we see a good signal on chromosome 16, as shown by the blue box. The Manhattan plots of the other phenotypes are relatively similar but show less signal than the BMI plot, and are more scattered and noisy.

This can be explained by the fact that we have not yet pruned our results, i.e. we have not yet corrected for linkage disequilibrium (LD). LD may be due to many different factors, such as selection, the rate of genetic recombination, mutation rate, genetic drift, the system of mating, population structure and genetic linkage. We will prune the results to remove its effect.

File:Manhattan plot.pdf


Heritability

To determine how heritable our phenotypes are, we used SNP-based heritability, which can be estimated from GWAS results using the LD Score software available on GitHub. It estimates the proportion of phenotypic variance in the population explained by a linear predictor composed of common SNPs, by estimating the average conditional variance explained by each SNP in the regression.
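For orientation, the relationship that LD Score regression fits, as published for the method, can be written as follows. The symbols are the conventional ones and are not taken from the project files: chi^2_j is the association statistic of SNP j, N the sample size, M the number of SNPs, l_j the LD score of SNP j, h^2 the SNP-based heritability and a a term capturing confounding.

```latex
\mathbb{E}\left[\chi^{2}_{j}\right] = \frac{N h^{2}}{M}\,\ell_{j} + N a + 1
```

The heritability is read off the slope of the regression of chi^2_j on l_j. Since the slope scales with N, a small sample gives a shallow and noisy slope, and therefore a less precise heritability estimate.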

Comparing our results with other heritability research on height or BMI, we find that our heritability estimates are much lower. This can be explained by the small number of individuals: the sample is not large enough to be representative. However, the results still give an insight into which phenotype is most heritable.

In particular, height has a higher heritability than weight, which is in line with our expectations. Surprisingly, all our other phenotypes have a very low heritability compared with BMI. Perhaps BMI is an ideal value.


Genes

To obtain information on the genes associated with the different phenotypes, we used the FUMA GWAS platform. To verify the results, the SNPs obtained were pruned and looked up in the GWAS Catalog, which gave the same genes as FUMA GWAS.

The Manhattan plots show the significance of the association between all genes and BMI or the other phenotypes. First, we see that the FTO gene is associated with BMI. This is consistent with previous research, in which variants of this gene are implicated in obesity. Second, the results show that no gene is associated with both BMI and the other phenotypes we chose. Different genes are found depending on the index (gamma) used in the formula: phenotypes with an index close to 1 have genes in common, and so do phenotypes with an index close to 2. Genes shared between phenotypes with an index close to 1 include CHST4, which codes for a protein that plays a central role in lymphocyte trafficking during chronic inflammation, and UCP2, which is thought to play a role in thermogenesis, obesity and diabetes. This raises the question of why UCP2 is not associated with BMI, given its role in obesity. Further analyses would be necessary to draw conclusions.

The phenotypes with an index close to 2 also have several genes in common. A new gene that did not appear in the earlier graphs is FGG, shown in green. The phenotype with an index of 3, which corresponds to the ponderal index, is associated with almost the same genes as before and, in addition, with the UGTA gene, shown in orange.

Looking at the genes obtained for each phenotype, we see that the new phenotypes share no genes with previous studies on BMI. More studies would therefore be needed to validate our results and determine whether they are relevant.

File:Genes.pdf


Pathways

To obtain the pathways we used the software PascalX (Pascal stands for Pathway scoring algorithm). More information about this software can be found at https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004714 (Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics). Only pathways associated with height were found to be significant (see file below). The data we used were substantially reduced, and the sample may have been too small to detect a pathway-level signal. This analysis should be repeated with a more complete dataset.

File:PathwayResults.pdf


Conclusion

In conclusion, this work allowed us to perform a GWAS on a large population using selected phenotypes based on height and weight. However, the heritability analysis showed that our statistical power was too low, and that having good power really matters. We therefore cannot conclude whether our phenotypes are relevant. We found different genes for the different phenotypes, which means that the associated genes depend on the index, but we cannot draw conclusions about these genes because of the lack of power. Several elements of our study could be improved, starting with running a larger GWAS. We tried to run the large GWAS a week ago, but the JURA programme was so slow that it stopped every time before we got results for all the chromosomes. The study could also be repeated separately for men and women, with a larger sample. With more time, we could test many more phenotypes without restricting gamma to a few specific values. We could also investigate further the UCP2 gene, which we found for our phenotypes but not for BMI.