Revision as of 11:36, 3 June 2021 by Sbprm2021 1
- Project name: Heritability of BMI - in search of a relevant phenotype for normalized weight, and what heritability says about it
- Tutor: Sofia Ortin Vela (sofia.ortinvela_AT_unil.ch)
- Slides: File:BMI heritability.pdf
Heritability of BMI | In search of a relevant phenotype for normalized weight, and what heritability says about it
Introduction
Most common human traits and diseases have a polygenic pattern of inheritance: variation in the DNA sequence at many genetic loci influences the phenotype. GWAS have identified more than 600 variants associated with human traits, but these typically explain only a small fraction of the phenotypic variation.
Background
Previous research
A GWAS analyses many genetic variants in many individuals to study their correlation with phenotypic traits. The development of GWAS has been made possible by advances in genotyping technology and has greatly accelerated gene discovery. GWAS have identified many genes strongly associated with phenotypic traits such as diseases, and they generally focus on associations between SNPs and phenotypes. GWAS have already been carried out on height, weight and BMI.
GWAS on height
First, one study showed that about 180 loci influence adult height. The most strongly associated genes include members of the Hedgehog and TGF-β signaling pathways, which are involved in chondrocyte proliferation and differentiation, growth plate signaling and bone formation. Other genes, such as ECM2, which is involved in the formation of the extracellular matrix, also influence height. Mutations in the STAT2 and FGFR3 genes are also known to cause growth failure, skeletal dysplasia and dwarfism.
GWAS on weight
Then, many weight-related genes were found in a GWAS that included covariates such as triglyceride level or BMI. Two main genes were identified: FTO and IRS1.
GWAS on BMI
Next, the largest cluster of weight-related SNPs is located in intron 1 of the fat mass and obesity-associated (FTO) gene. FTO encodes an RNA demethylase, and this locus is the one most strongly associated with obesity throughout life and across generations. SNPs at this locus have also been associated with other traits such as type 2 diabetes, osteoarthritis and cardiometabolic characteristics, associations that appear to be mediated by the effect of FTO on BMI. The study concludes that FTO increases the risk of obesity through changes in food consumption and preference. SNPs near 23 other genes, such as BDNF, FAIM2 and TFAP2B, were also studied.
Further studies on BMI have shown that other genes, such as NEGR1, are also associated with BMI. This gene is highly expressed in the brain and plays a role in body weight and food intake. Identifying the genetic determinants of BMI could, for example, lead to a better understanding of the biological basis of obesity.
Study on heritability
Estimates of the heritability of BMI differ between experimental designs and between types of heritability. However, studies show that the SNP-based heritability of BMI is greater than 0.2, approximately half that of height (greater than 0.5).
Aim of the project
However, studies also report that BMI is an inaccurate measure of body fat and can bias estimates of the health effects of obesity. The problems arise because BMI does not distinguish fat mass from non-fat mass, such as bone and muscle, and does not account for the changes in body composition that occur with age. For example, very muscular people may have very little body fat, yet their BMI places them in the obese category. Likewise, very tall people have a BMI that is too high relative to their body fat. BMI therefore has limitations as a measure of body fat.
This raises a question: why not use a formula other than BMI? BMI is a function of height and weight, yet it is associated with different genes from height and weight. We can therefore assume that other phenotypes based on height and weight could also reveal different biology. Raising height to the power of 3 instead of 2 gives the ponderal index (PI), which is generally used for babies. Since no GWAS has been done on the PI, we can ask whether it captures something relevant. More generally, many other phenotypes can be derived from the BMI formula.
The goal of the project is therefore to determine whether other phenotypes based on height and weight show stronger association signals than BMI and reveal different biology, by looking at the associated genes and pathways. Heritability is also used as a criterion to judge whether a phenotype is relevant.
Methodology
Data Exploration
Size of our sample
As there was a lot of incomplete data at the genotype level, the dataset was cleaned to remove all individuals without genotype information.
Our sample is much smaller than the UK Biobank itself, because genotype data are not necessarily available for every individual and because we only used data from the third visit, at which BMI and height were measured. The UK Biobank weight dataset contains 499,806 individuals, whereas our sample contains only 45,829.
For height and weight independently
Central tendency
Mean
Median
Mode
Min
Max
Measure of dispersion
Range
Quantile
IQR
Variance
Standard deviation
Skewness
Kurtosis
Visualisation
Histograms
Boxplot
For height and weight together
Covariance
Correlation
Scatterplot
File:Fig pathwayMirrorBarChart tau3 PCA median tortuosity.pdf
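The statistics listed above can be computed with a few lines of NumPy. This is a minimal sketch, not the script used in the project: the function names are hypothetical, missing values are assumed to be NaN, skewness uses the simple population estimator and kurtosis is reported as excess kurtosis (0 for a normal distribution).

```python
from statistics import multimode

import numpy as np


def describe(values):
    """Central tendency, dispersion and shape of one variable (height or weight)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing measurements
    q1, q3 = np.quantile(x, [0.25, 0.75])
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)  # population second central moment
    return {
        "mean": x.mean(),
        "median": np.median(x),
        "mode": multimode(x.tolist()),  # every most frequent value
        "min": x.min(),
        "max": x.max(),
        "range": x.max() - x.min(),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "variance": x.var(ddof=1),  # sample variance
        "sd": x.std(ddof=1),
        "skewness": np.mean(dev ** 3) / m2 ** 1.5,
        "excess_kurtosis": np.mean(dev ** 4) / m2 ** 2 - 3.0,
    }


def covariance_and_correlation(height, weight):
    """Sample covariance and Pearson correlation on complete pairs."""
    h = np.asarray(height, dtype=float)
    w = np.asarray(weight, dtype=float)
    ok = ~(np.isnan(h) | np.isnan(w))
    return np.cov(h[ok], w[ok])[0, 1], np.corrcoef(h[ok], w[ok])[0, 1]
```

Histograms, boxplots and the scatterplot can then be drawn from the same arrays with any plotting library.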
Choosing phenotypes and covariates
Phenotypes
From the basic BMI formula we can create new phenotypes by raising weight to the power μ and dividing it by height raised to the power γ. This gives data in five dimensions, so the complexity has to be reduced: we decided to remove the μ variable, i.e. keep weight unexponentiated, because height is more robust and closer to a normalisation factor than weight.
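In our own notation (φ is not a symbol used elsewhere in the project; w is weight in kg and h is height in m), the family of phenotypes described above can be written as follows. Fixing μ = 1 leaves γ as the only free parameter, with γ = 2 giving BMI and γ = 3 the ponderal index.

```latex
\phi_{\mu,\gamma} = \frac{w^{\mu}}{h^{\gamma}}
\quad\xrightarrow{\;\mu = 1\;}\quad
\phi_{\gamma} = \frac{w}{h^{\gamma}},
\qquad \phi_{2} = \mathrm{BMI}, \quad \phi_{3} = \mathrm{PI}
```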
Then, by varying γ, we can observe how the phenotype changes in a 3D graph. The graph keeps weight constant at 50 kg, because changing the weight simply shifts the curves while keeping the same distribution. We found that when γ is too high, the phenotype no longer discriminates across the whole population: increasing γ too much yields an almost binary classification into groups, which we want to avoid.
Moreover, if γ is lower than zero, the formula becomes a multiplication of weight by height. Multiplying the two variables loses information, because a short heavy person can get the same value as a tall thin person.
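A worked example makes the information loss concrete. The two individuals below are hypothetical: with γ = -1 the phenotype is the product w · h and they become indistinguishable, whereas BMI (γ = 2) separates them clearly.

```latex
\begin{aligned}
\gamma = -1 &: \quad \phi_{-1} = w \cdot h \\
\text{A (1.60 m, 90 kg)} &: \quad 1.60 \times 90 = 144 \\
\text{B (1.80 m, 80 kg)} &: \quad 1.80 \times 80 = 144 \\
\gamma = 2 &: \quad \frac{90}{1.60^{2}} \approx 35.2 \;\neq\; \frac{80}{1.80^{2}} \approx 24.7
\end{aligned}
```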
Based on these results, we chose values of γ in the range [1; 3.5], which gives a phenotype that represents the population well without group classification. Eight phenotypes were selected: γ ∈ [1.4; 1.618; 1.8; exp(1); 2.2; 2.4; 2.5; 3]. Some of these values are intermediate, while others, such as exp(1), have more decimal places, to see whether this changed the result. For each phenotype, we checked its distribution with a histogram and graph 2 to confirm that it represented the population well.
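The eight phenotypes can be derived from the height and weight columns in one pass. This is a sketch under our own assumptions: weight is in kg, height in m, μ is fixed to 1 as described above, and the `pheno_<γ>` column names are hypothetical.

```python
import math

import numpy as np

# Exponents gamma retained in the text (mu is fixed to 1).
GAMMAS = [1.4, 1.618, 1.8, math.e, 2.2, 2.4, 2.5, 3.0]


def phenotype(weight_kg, height_m, gamma):
    """Height-normalised weight w / h**gamma (gamma=2: BMI, gamma=3: ponderal index)."""
    w = np.asarray(weight_kg, dtype=float)
    h = np.asarray(height_m, dtype=float)
    return w / h ** gamma


def all_phenotypes(weight_kg, height_m, gammas=GAMMAS):
    """One array per exponent, keyed by a hypothetical column name."""
    return {f"pheno_{g:.3f}": phenotype(weight_kg, height_m, g) for g in gammas}
```

For example, `all_phenotypes(weights, heights)["pheno_2.000"]` would not exist, since γ = 2 (BMI) is kept separately as the reference phenotype; adding 2.0 to `gammas` would produce it.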
Covariates
Once the phenotypes were selected, we chose the covariates ourselves rather than using the default GWAS covariates, to avoid biasing the results.
Each covariate that could affect our phenotypes was taken from the UK Biobank, and its Pearson correlation with BMI was examined. The most correlated covariates, such as age, processed meat consumption and gender, were selected. In addition, the principal components of the whole genotypic profile were added to the GWAS, even though we could not check their correlation because they are sensitive data.
Running the GWAS
Missing values are coded as -999 in the input files. This format is needed to run the GWAS on Jura with BGENIE, a program for efficient GWAS of multiple continuous traits that works on the BGEN format files used to store the UK Biobank genetic data. The phenotypes were also normalized beforehand to avoid badly behaved (non-normal) residuals.
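A sketch of this preparation step is shown below. Two assumptions are ours: the normalization is a rank-based inverse normal transform (one common choice for GWAS; the text does not say which method was used), and the file layout is a space-separated table with a header row of phenotype names, which is how we understand BGENIE phenotype files. The function names are hypothetical.

```python
import io
import math
from statistics import NormalDist

import numpy as np

MISSING = -999  # missing-value code expected in the BGENIE input files


def inverse_normal_transform(values):
    """Rank-based inverse normal transform (Blom offset); NaNs stay NaN."""
    x = np.asarray(values, dtype=float)
    out = np.full_like(x, np.nan)
    observed = ~np.isnan(x)
    n = int(observed.sum())
    ranks = np.empty(n)
    # ranks 1..n of the observed values (ties broken by order of appearance)
    ranks[np.argsort(x[observed], kind="stable")] = np.arange(1, n + 1)
    probs = (ranks - 0.375) / (n + 0.25)
    out[observed] = [NormalDist().inv_cdf(p) for p in probs]
    return out


def write_pheno_file(columns, handle):
    """Write a header line, then one space-separated row per individual."""
    names = list(columns)
    handle.write(" ".join(names) + "\n")
    data = np.column_stack([np.asarray(columns[name], dtype=float) for name in names])
    for row in data:
        fields = [str(MISSING) if math.isnan(v) else f"{v:.6f}" for v in row]
        handle.write(" ".join(fields) + "\n")
```

Rows must follow the same individual order as the genotype (BGEN) files; this sketch does not check that.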
GWAS are based on multiple linear regression; the equation is given in the following file:
File:Multiple linear regression for GWAS.pdf
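A common way to write such a per-SNP model is given below, in our own notation (the equation in the linked file may be written differently). Here y_i is the phenotype of individual i, g_i the genotype coded as a minor-allele count, c_ik the K covariates (e.g. age, gender, principal components) and β_g the SNP slope that is tested against zero.

```latex
y_i = \beta_0 + \beta_g\, g_i + \sum_{k=1}^{K} \beta_k\, c_{ik} + \varepsilon_i,
\qquad g_i \in \{0, 1, 2\},
\qquad H_0 : \beta_g = 0
```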
The slope, i.e. the effect of the SNP on the phenotype, is estimated by linear regression.
For the genotype, alleles are encoded as 0, 1 or 2, the number of copies of the less frequent (minor) allele: 0 and 2 are homozygous, 1 is heterozygous.
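For a single SNP, the regression amounts to ordinary least squares with an intercept, the 0/1/2 genotype and the covariates. The sketch below is our own illustration of that test, not BGENIE's implementation (which processes many SNPs and traits efficiently); the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def snp_association(phenotype, genotype, covariates):
    """OLS of the phenotype on one SNP (0/1/2 minor-allele count) plus covariates.

    Returns the SNP slope, its standard error and a two-sided t-test p-value.
    """
    y = np.asarray(phenotype, dtype=float)
    g = np.asarray(genotype, dtype=float)
    C = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    X = np.column_stack([np.ones_like(y), g, C])  # intercept, SNP, covariates
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(cov[1, 1])
    t = beta[1] / se
    p = 2 * stats.t.sf(abs(t), dof)
    return beta[1], se, p
```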
Results of the GWAS
SNPs: Manhattan plots, QQ plots
On the Manhattan plot, SNPs are positioned along the x-axis according to their chromosomal position, and the y-axis shows the negative log10 of each SNP's P-value. SNPs with the lowest P-values, i.e. the strongest association with our phenotype, therefore appear at the top of the graph.
The Bonferroni threshold (2.2 × 10^-10) is indicated by the red line, and the blue line corresponds to the genome-wide significance threshold (5 × 10^-8). The Bonferroni correction divides the significance level α by the number of tests performed.
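The quantities behind the Manhattan plot are simple to compute. In this sketch (function names hypothetical) the Bonferroni threshold is α divided by the number of tests; the conventional 5 × 10^-8 genome-wide threshold is often explained as 0.05 divided by roughly one million independent common variants.

```python
import numpy as np


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def manhattan_y(pvalues):
    """Y-axis of a Manhattan plot: -log10(p)."""
    return -np.log10(np.asarray(pvalues, dtype=float))


def count_hits(pvalues, threshold):
    """Number of SNPs whose p-value falls below the threshold."""
    return int(np.sum(np.asarray(pvalues, dtype=float) < threshold))
```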
On the Manhattan plot of BMI, there is a strong signal on chromosome 16, highlighted by the blue box. The Manhattan plots of the other phenotypes are broadly similar but show weaker SNP signals than BMI, and they are more scattered and noisy.
This may be because we have not yet pruned our results, i.e. corrected for linkage disequilibrium (LD), the non-random association of alleles at different loci. LD can result from many factors, such as selection, the rate of genetic recombination, mutation rate, genetic drift, the mating system, population structure and genetic linkage. We plan to prune the results to account for LD.
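A simplified version of LD pruning keeps SNPs in order of increasing p-value and drops any SNP that is too correlated (r² above a threshold) with a SNP already kept. This is our own sketch: the r² threshold is an illustrative assumption, r² is computed from genotype dosages in the sample, and dedicated tools such as PLINK's clumping additionally restrict comparisons to a genomic window.

```python
import numpy as np


def greedy_ld_prune(genotypes, pvalues, r2_threshold=0.1):
    """Greedy LD pruning: keep the most significant SNP of each correlated group.

    genotypes: (n_individuals, n_snps) array of 0/1/2 minor-allele counts.
    Returns the sorted column indices of the SNPs that are kept.
    """
    G = np.asarray(genotypes, dtype=float)
    kept = []
    for j in np.argsort(pvalues):  # most significant SNP first
        r2 = [np.corrcoef(G[:, j], G[:, k])[0, 1] ** 2 for k in kept]
        if all(value < r2_threshold for value in r2):
            kept.append(int(j))
    return sorted(kept)
```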
Heritability
To determine whether our phenotypes are heritable, we used SNP-based heritability. It can be estimated from GWAS summary statistics with the LD score regression (LDSC) software, available on GitHub. SNP-based heritability is the proportion of phenotypic variance in the population explained by the best linear predictor built from common SNPs. The average conditional variance explained by each SNP in the regression is estimated.
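LD score regression rests on the relation E[χ²_j] ≈ N h² ℓ_j / M + N a + 1, where χ²_j is the association statistic of SNP j, ℓ_j its LD score, N the sample size, M the number of SNPs and a a confounding term. Regressing χ² on ℓ therefore gives h² from the slope. The sketch below is a deliberately simplified, unweighted version of that idea; the real LDSC software uses regression weights and a block jackknife for standard errors, and the function name is hypothetical.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression: returns (h2 estimate, intercept)."""
    ell = np.asarray(ld_scores, dtype=float)
    X = np.column_stack([np.ones_like(ell), ell])
    (intercept, slope), *_ = np.linalg.lstsq(X, np.asarray(chi2, dtype=float), rcond=None)
    return slope * n_snps / n_samples, intercept
```

An intercept close to 1 suggests little inflation from confounding such as population stratification.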
Compared with other heritability studies on height or BMI, our heritability estimates are much lower. This may be explained by the small number of individuals: the sample is not large enough to be representative. However, the results still indicate which phenotype is the most heritable.
In particular, height has a higher heritability than weight, which is in line with our expectations. Surprisingly, all the other phenotypes have a much lower heritability than BMI, which may suggest that BMI is a particularly well-suited phenotype.
Genes
To obtain information on the genes associated with the different phenotypes, we then used the FUMA GWAS platform. To verify the results, the SNPs obtained were pruned and searched in the GWAS Catalog, which returned the same genes as FUMA GWAS.
The gene-based Manhattan plots show the significance of the association between each gene and BMI or the other phenotypes. The plot for BMI is on the left and the plots for the phenotypes from 1.4 to 1.8 are on the right; the plots not shown here were similar.
First, the FTO gene is associated with BMI, which is consistent with previous research: variants of this gene are thought to be involved in obesity. Second, no gene is shared between BMI and the other phenotypes we chose.
Different genes are found depending on the index used in the formula. Phenotypes with an index close to 1 share genes, and likewise phenotypes with an index close to 2 share more genes with each other.
Genes shared by phenotypes with an index close to 1 include CHST4, which encodes a protein that plays a central role in lymphocyte trafficking during chronic inflammation, and UCP2, which is thought to play a role in thermogenesis, obesity and diabetes.
This raises the question of why UCP2 is not associated with BMI, given its role in obesity. Further analyses would be needed before drawing conclusions.
Phenotypes with an index close to 2 also share several genes. A new gene not seen in the previous graphs is FGG (in green). The phenotype with an index of 3, which corresponds to the ponderal index, is associated with almost the same genes as before, plus the UGTA gene (in orange).
None of the genes found for these phenotypes overlap with those reported in previous studies on BMI, so further studies would be needed to determine whether our results are relevant.
Pathways
To obtain the pathways we used the software PascalX (Pascal: Pathway scoring algorithm). More information about this software can be found at https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004714 (Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics). Only pathways associated with height were significant (see file below). The data we used were substantially reduced and may have been too small to detect a pathway-level signal; this analysis should be repeated with more complete data.
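To give an idea of how SNP results are turned into a gene score, the sketch below computes a sum-of-chi-squares statistic over the SNPs of one gene and compares it with a χ² distribution with k degrees of freedom. This relies on an assumption that Pascal/PascalX do not make: the SNPs are treated as independent, whereas Pascal accounts for LD between them. Pathway scores are then obtained by combining gene scores. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def gene_sum_chi2_pvalue(snp_z_scores):
    """Sum of squared SNP z-scores in a gene and its p-value under independence."""
    z = np.asarray(snp_z_scores, dtype=float)
    stat = float(np.sum(z ** 2))
    return stat, float(stats.chi2.sf(stat, df=z.size))
```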