Genetics of different body mass measurements

Revision as of 18:17, 3 June 2022 by Sbprm2022 Ron (Results of the genetic part)

File:Heritability of BMI Sofia.pdf

Introduction

Obesity is a "condition in which excess body fat has accumulated to such an extent that it may have a negative effect on health"[1]. It is correlated with many diseases, particularly cardiovascular diseases, which include heart attack, stroke and heart failure. According to the World Health Organization (WHO), cardiovascular diseases are the leading cause of death in the world: 31% of deaths are attributable to them[2]. Systolic blood pressure is a potential indicator for these diseases. Finding a good definition of obesity therefore seemed important, and this is what we attempted in the first part of the project. The most widely used definition is the body mass index (BMI), defined as the weight divided by the square of the height. Because the BMI has many limitations, we looked for an alternative: we tried different combinations of body measurements that potentially correlate with systolic blood pressure. We call these combinations the phenotypes. In the second part of the project, we performed a genome-wide association study (GWAS), an "observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait"[3]. The study focused on associations between SNPs and our phenotypes, which are potentially correlated with high systolic blood pressure. GWAS have already been performed on height, weight and BMI. The goal of this part was to see whether other phenotypes show stronger signals than BMI and reveal different biology through the genes involved. Heritability also helped us to judge whether a phenotype is relevant.
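As a minimal sketch of the BMI definition given above, the following Python snippet computes the index from a weight in kilograms and a height in metres. The exponent parameter is an assumption made for illustration: it supposes that the BMI variants named later on this page (BMI1.6, BMI2.5, BMI3) replace the square of the height by another power, which the page does not state explicitly. The individual in the example is hypothetical.

```python
def body_mass_index(weight_kg: float, height_m: float, exponent: float = 2.0) -> float:
    """Return weight / height**exponent; exponent=2 gives the standard BMI."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** exponent


# Hypothetical individual: 70 kg, 1.75 m.
standard_bmi = body_mass_index(70.0, 1.75)
cubic_index = body_mass_index(70.0, 1.75, exponent=3.0)  # assumed form of a BMI variant
print(f"BMI = {standard_bmi:.2f} kg/m^2, weight/height^3 = {cubic_index:.2f} kg/m^3")
```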

Methodology

Phenotypic part


Genotypic part

Still using the UK Biobank[4], we had data on individuals comprising their characteristics, such as weight, height and ethnicity, and their SNPs (single nucleotide polymorphisms). We then performed a mini-GWAS on them. A GWAS (genome-wide association study) is a statistical approach for identifying the importance of a SNP for a given trait or phenotype: its output lists all the tested SNPs with the p-value of their association with the phenotype. As mentioned above, we worked on a mini-GWAS, meaning that it did not cover all the SNPs of the whole genome but a representative set of SNPs, which should normally give the same results as an analysis of the whole genome. As in the phenotypic part, we chose covariates that we knew could influence the association between SNPs and phenotypes. In this way, we could see the effect of each SNP on the phenotype more directly. We chose age, sex and the genetic principal components (as a proxy for ethnicity) as covariates.
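The association test described above can be sketched as one linear regression per SNP, of the form phenotype ~ SNP dosage + covariates. The sketch below uses only the Python standard library to stay self-contained. The function names, the toy data and the large-sample normal approximation for the p-value are assumptions for illustration; this is not the pipeline used in the project, which would normally rely on a dedicated GWAS tool.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def snp_association(phenotype, dosages, covariates):
    """Regress the phenotype on intercept + SNP dosage + covariates.

    Returns (beta, standard error, two-sided p-value) for the SNP term.
    The p-value uses a normal approximation, reasonable for large samples.
    """
    rows = [[1.0, g, *cov] for g, cov in zip(dosages, covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(rows, phenotype)]
    sigma2 = sum(e * e for e in residuals) / (len(rows) - k)
    se = math.sqrt(sigma2 * xtx_inv[1][1])  # index 1 is the SNP column
    z = beta[1] / se
    return beta[1], se, math.erfc(abs(z) / math.sqrt(2))


# Toy data (hypothetical): 330 individuals, one SNP, covariates = (age, sex).
n = 330
dosages = [float(i % 3) for i in range(n)]  # 0, 1 or 2 copies of the allele
covariates = [(40.0 + (7 * i) % 30, float(i % 2)) for i in range(n)]
noise = [((37 * i) % 11 - 5) / 5 for i in range(n)]  # deterministic stand-in for random noise
phenotype = [25.0 + 0.8 * g + 0.05 * age + 1.0 * sex + e
             for g, (age, sex), e in zip(dosages, covariates, noise)]

beta, se, p_value = snp_association(phenotype, dosages, covariates)
print(f"beta = {beta:.3f}, se = {se:.3f}, p = {p_value:.2e}")
```

Repeating this test for every SNP of the set gives one p-value per SNP. Genetic principal components would simply be added as extra entries in each covariate tuple.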

Then, we used the results of these GWAS to display the SNPs on a Manhattan plot and a QQ plot. We also computed the heritability of the different phenotypes, to get an idea of how much we could influence them through the environment in order to improve health.

Then, we kept only the significant SNPs for each phenotype and extracted the genes they belong to. From these, we built Venn diagrams to visualize how much the genes overlap between the different phenotypes.
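The filtering step above can be sketched as follows. The record layout (SNP identifier, p-value, gene), the example values and the threshold of 5e-8, a conventional genome-wide significance level, are all assumptions; the page does not state which threshold or annotation was used.

```python
GENOME_WIDE_THRESHOLD = 5e-8  # conventional level, assumed here


def significant_genes(gwas_rows, threshold=GENOME_WIDE_THRESHOLD):
    """Return the set of genes containing at least one SNP with p < threshold.

    gwas_rows: iterable of (snp_id, p_value, gene) tuples; gene may be None
    for SNPs that do not fall within an annotated gene.
    """
    return {gene for _snp, p, gene in gwas_rows if p < threshold and gene is not None}


# Hypothetical summary statistics for two phenotypes.
gwas_by_phenotype = {
    "BMI": [("snp_1", 3e-12, "GENE_A"), ("snp_2", 0.2, "GENE_B"), ("snp_3", 1e-9, "GENE_C")],
    "waist": [("snp_1", 4e-10, "GENE_A"), ("snp_4", 2e-8, None), ("snp_5", 6e-9, "GENE_D")],
}
genes_by_phenotype = {name: significant_genes(rows) for name, rows in gwas_by_phenotype.items()}
print(genes_by_phenotype)
```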

Results of the Phenotypic part

Data exploration

We started by exploring the distribution of our data. Based on the histograms, the variables seem to be either approximately normally distributed (height) or slightly skewed to the right (weight, hip, waist). We also looked at the correlation between the variables visually, using scatterplots. Unsurprisingly, we see a clear correlation between BMI, waist circumference and hip circumference. The correlation is a lot less obvious with height, and seems completely absent between age and all the other phenotypes. This is good, as it suggests that all age classes have a similar distribution of BMI, height, waist circumference and hip circumference.

[Figure 1]

Linear regression

To see which of our phenotypes was the most correlated with systolic blood pressure (SBP), we performed the following regression with each of our phenotypes: SBP ~ phenotype * sex + age + age^2. We take into account the interaction between the phenotype and the sex as well as the effect of the age. As the effect of age is often non-linear, we introduce both the age and the age squared in the model. The phenotypes were normalized (z-scores) so that the results of different phenotypes could be compared. We also looked at the adjusted R squares; as the model is always the same apart from the phenotype, we can use them to determine which phenotype explains a bigger part of the variance. The conditions for the validity of a linear regression are the homogeneity of the variance of the residuals, the normality of the residuals, the independence of the observations and the linearity of the relationship. We checked these conditions using diagnostic plots (residuals vs fitted, normal QQ plot, scale-location and residuals vs leverage). The conditions were met for all the phenotypes.

[Figure 2]
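To make the terms of the model concrete, here is a minimal sketch of the standardisation and of the predictors implied by SBP ~ phenotype * sex + age + age^2. The function names, the coding of sex as 0/1 and the use of the sample standard deviation are assumptions for illustration, and the three individuals are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def design_row(z_phenotype, sex, age):
    """Predictors for SBP ~ phenotype * sex + age + age^2.

    The formula expands to: intercept, phenotype, sex, phenotype:sex, age, age^2.
    """
    return [1.0, z_phenotype, float(sex), z_phenotype * sex, float(age), float(age) ** 2]


# Hypothetical values for three individuals.
bmi = [22.0, 27.0, 31.0]
sex = [0, 1, 1]
age = [45, 52, 60]
rows = [design_row(z, s, a) for z, s, a in zip(z_scores(bmi), sex, age)]
for row in rows:
    print(row)
```

Because every phenotype is put on the same scale, the coefficient of the phenotype term reads as the change in SBP per standard deviation of that phenotype, which is what makes the coefficients comparable across models.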

We also checked that our phenotypes were not too correlated with the other variables; we can trust the model if the correlation is not higher than 0.7. The correlations are small enough for all the phenotypes except RFM (relative fat mass), which already takes sex into account and has a correlation of nearly 1 with it. We tried to remove sex from the model for the RFM, but this makes it hard to compare the results we obtain for RFM with those of the other phenotypes. RFM is undeniably an interesting phenotype to study, but the way we designed our work does not really allow us to say much about it.

[Table 1]
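The correlation check described above can be sketched with a plain Pearson coefficient and the 0.7 limit. The function names and the example values are hypothetical; the second variable is built to be almost determined by sex, as the text reports for RFM.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def too_correlated(x, y, limit=0.7):
    """Flag a pair of predictors whose absolute correlation exceeds the limit."""
    return abs(pearson(x, y)) > limit


sex = [0, 0, 0, 1, 1, 1]
sex_driven = [30.1, 29.8, 30.3, 20.2, 19.9, 20.0]  # hypothetical, almost determined by sex
unrelated = [24.0, 27.0, 30.0, 30.0, 24.0, 27.0]   # hypothetical, same mean in both groups
print(too_correlated(sex, sex_driven), too_correlated(sex, unrelated))
```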

There is an interaction between the phenotype and the sex in all the models. However, we observe that the general relationship is still similar in both sexes, as we generally have parallel curves, or only slightly different slopes. Moreover, we performed the linear regressions both with and without the interaction with sex and obtained similar results overall. For this reason, stratifying by sex may not be necessary.

For each model, we collected the effect of the phenotype as well as the adjusted R square of the model, from which we subtracted the adjusted R square of the background model to estimate the variability explained by the phenotype. Looking at this table, we obtain conflicting results. On the one hand, the factors suggest that the waist-based phenotypes explain systolic blood pressure best (with an increase of 3.44 points of SBP for each increase of 1 sd of waist circumference, 3.32 for waist/height and 3.24 for waist/hip ratio, respectively). BAI, BMI1.6 and BMI3 are associated with factors between 3.1 and 3.2, whereas BMI and BMI2.5 are associated with factors under 1. On the other hand, when looking at the adjusted R squares, the variants of the BMI explain a bigger part of the variability of SBP (2.82% for BMI3, 2.77% for BMI1.6 and BMI2.5, 2.65% for BMI). Waist phenotypes come after all the BMI variants on this measure.

[Figure 3]
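The variability attributed to each phenotype, obtained above as a difference of adjusted R squares, can be sketched as follows. The composition of the background model (sex, age and age squared, that is, 3 predictors against 5 in the full model) is an assumption, since the page does not spell it out, and the R square values and sample size in the example are hypothetical.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def variance_explained_by_phenotype(r2_full, p_full, r2_background, p_background, n_obs):
    """Difference in adjusted R^2 between the full and the background model."""
    return adjusted_r2(r2_full, n_obs, p_full) - adjusted_r2(r2_background, n_obs, p_background)


# Hypothetical values: 1000 individuals, R^2 of 0.15 (full) and 0.12 (background).
delta = variance_explained_by_phenotype(0.15, 5, 0.12, 3, 1000)
print(f"variance explained by the phenotype: {100 * delta:.2f}%")
```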

Table of betas and adjusted R squares: variance explained by each phenotype. Based on the betas, the best predictors of SBP are those based on the waist. The uncorrected waist circumference is the best, so there might be a confounding effect of height; however, the correlation between height and waist is not that clear visually, so this is perhaps not dramatic. BMI1.6, BMI3 and BAI give intermediate results, and BMI and BMI2.5 are far lower. When looking at the adjusted R squares, BMI3 is the best, followed by BMI1.6 and BMI2.5, and all the BMIs now come before the waist phenotypes. We cannot tell which phenotypes are the best; in general, they seem quite similar.

[Table 2]


Results of the genetic part

Manhattan plots

A Manhattan plot displays all the tested SNPs and makes the significant ones stand out. SNPs are shown along the x-axis, sorted by chromosome. The negative logarithm of the association p-value of each SNP is displayed along the y-axis. Each SNP is thus represented by a dot. The strongest associations have the smallest p-values, so their negative logarithms are the greatest and they appear high on the graph. All the SNPs above the red line are significant. All the phenotypes show many significant SNPs and give similar results. We also see that there are more SNPs on the first chromosomes.

Figure 4: Manhattan.jpg
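The coordinates behind such a plot can be sketched in a few lines. The tuple layout, the example SNPs and the significance threshold of 5e-8 (a conventional genome-wide level) are assumptions; the page does not give the value used for the red line.

```python
import math


def manhattan_points(snps, threshold=5e-8):
    """Return the y value and significance of each SNP for a Manhattan plot.

    snps: iterable of (chromosome, position, p_value) tuples.
    Output rows are sorted by chromosome then position, as on the x-axis:
    (chromosome, position, -log10(p), is_significant).
    """
    line_height = -math.log10(threshold)  # height of the significance line
    points = []
    for chrom, pos, p in sorted(snps):
        y = -math.log10(p)
        points.append((chrom, pos, y, y > line_height))
    return points


# Hypothetical SNPs on two chromosomes.
example = [(2, 500, 1e-3), (1, 1200, 1e-10), (1, 300, 0.5)]
for chrom, pos, y, significant in manhattan_points(example):
    print(chrom, pos, round(y, 2), significant)
```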


QQ plots

A QQ plot is a different representation of the same results. It plots the quantile distribution of the observed p-values (on the y-axis) against the quantile distribution of the expected p-values (on the x-axis). The SNPs here are all mixed, not sorted by chromosome. The red line shows the distribution expected if no SNP were associated with the phenotype, that is, by chance alone. SNPs that rise above the red line have smaller p-values than expected by chance. The plots are similar between the different phenotypes, even if the upper right part differs a little. Again, we see that many SNPs are associated with our phenotypes.

Figure 5: Qq.jpg
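The points of a QQ plot can be computed as sketched below. The plotting position (i - 0.5) / n for the expected p-values is one common convention and is an assumption here, as are the function name and the example p-values.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Under the null hypothesis, p-values are uniform on (0, 1), so the i-th
    smallest of n p-values is expected near (i - 0.5) / n.
    """
    observed = sorted(p_values)
    n = len(observed)
    return [(-math.log10((i - 0.5) / n), -math.log10(p))
            for i, p in enumerate(observed, start=1)]


# Hypothetical p-values; the first pair lies far above the diagonal y = x.
for expected, obs in qq_points([0.5, 1e-9, 0.04, 0.7]):
    print(round(expected, 3), round(obs, 3))
```

Points that follow the diagonal behave as expected by chance, while points far above it correspond to SNPs with a stronger association than chance would produce.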

Heritability

From the GWAS we performed, we also calculated the heritability of each phenotype. Heritability can be defined as the "amount of phenotypic (observable) variation in a population that is attributable to individual genetic differences"[5]. Phenotypic variation is explained by genetic variation, environmental variation and random effects. If the heritability is very high, the phenotype is largely attributable to individual genetic differences, and the environment therefore has a small impact. As we want to be able to influence the phenotype, the phenotypes with a low heritability are more interesting for us. For example, height is known to be very heritable, with a heritability of around 80%[6]. In our case, the waist-hip ratio has the lowest heritability, at 13%. The BMI variants have a heritability of around 23%, and the waist phenotypes of around 15%. Overall, all the phenotypes have a low heritability, which makes all of them interesting.

Table 3: Her.jpg
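In symbols, and assuming the usual decomposition of the phenotypic variance into a genetic part and an environmental part that also absorbs the random effects, the definition of heritability quoted above reads as follows. The symbols are introduced here for illustration and do not appear in the original text.

```latex
\sigma_P^2 = \sigma_G^2 + \sigma_E^2,
\qquad
h^2 = \frac{\sigma_G^2}{\sigma_P^2}
```

With a heritability of about 0.23, as reported here for the BMI variants, roughly 77% of the phenotypic variance is left to non-genetic factors or to genetic variation not captured by the SNPs.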


Venn Diagrams

Pooling the genes of all the phenotypes gave a total of 6097 genes, and we wanted to visualize how they were distributed among the phenotypes.


At first sight, we can see that the majority of the genes are shared between the different phenotypes. 686 of them are shared between all the phenotypes, which represents 11.3%.

We also note that BAI and weight have many genes that are not shared with the other phenotypes.

Figure 6: Venn diagram of the genes of the phenotypes Figure 6.jpg
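The counts read from a Venn diagram are plain set operations. The sketch below computes the total number of distinct genes, the genes shared by all the phenotypes and the genes specific to one phenotype. The function name and the gene names are hypothetical; only the final line uses the counts reported on this page.

```python
def venn_summary(genes_by_phenotype):
    """Summarise gene sharing across phenotypes.

    Returns the number of distinct genes, the genes shared by all phenotypes,
    the share of the total they represent (in percent), and the genes found
    for one phenotype only.
    """
    sets = list(genes_by_phenotype.values())
    all_genes = set().union(*sets)
    shared_by_all = set.intersection(*sets)
    unique = {
        name: genes - set().union(*(g for other, g in genes_by_phenotype.items() if other != name))
        for name, genes in genes_by_phenotype.items()
    }
    percent_shared = 100.0 * len(shared_by_all) / len(all_genes)
    return len(all_genes), shared_by_all, percent_shared, unique


# Hypothetical gene sets for three phenotypes.
genes = {"BMI": {"A", "B", "C"}, "waist": {"A", "C", "D"}, "BAI": {"A", "E", "F"}}
total, shared, percent, unique = venn_summary(genes)
print(total, shared, round(percent, 1), unique)

# With the counts reported on this page: 686 shared genes out of 6097.
print(round(100 * 686 / 6097, 1))
```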


The distribution of the genes containing significant SNPs is shown more clearly in Fig. 7 and Fig. 8.

Figure 7: Venn diagram of the genes of the three categories Figure 7.jpg

Figure 8: Venn diagram of the genes of phenotypes in their own categories Figure 8.jpg


In the "GWAS Catalog"[7], we found a list of genes that have already been associated with systolic blood pressure (SBP). We made a Venn diagram (Fig. 9) to compare them with the genes containing the significant SNPs we found above. Of the 1962 SBP genes, 552 (28%) are shared with at least one of the phenotypes, and 125 (6.4%) are shared with all of them. Table 4 gives the number of genes shared between each phenotype and these SBP genes. We see that BAI is the phenotype that shares the highest number of genes with the SBP genes, followed by BMI3 and BMI2.5.

Figure 9: Venn diagrams of the genes of our phenotypes and genes associated with SBP Figure 9.jpg

Table 4: Number of genes in common between the phenotypes and the SBP genes Table 4.jpg

Conclusion

In conclusion, it is really difficult to find a good way to measure obesity. No phenotype is obviously better than the usual BMI, which is probably why the BMI is still commonly used even though it has substantial biases. From our study, the variants of the BMI (BMI3, BMI1.6) are perhaps good phenotypes: they have a large R squared and do not correlate strongly with sex and age. In addition, the GWAS found many significant SNPs. Numerous genes were common to all the phenotypes, indicating that obesity has a genetic component. The heritability of all the phenotypes was around 20%, which is not a lot. It means that the environment plays a big role and thus that we can influence the phenotype.


For additional information, see our presentation slides: Media:ppt.pptx

References

[1] https://en.wikipedia.org/wiki/Obesity

[2] https://www.who.int/fr/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)#:~:text=Les%20maladies%20cardio%2Dvasculaires%20sont,de%20la%20mortalit%C3%A9%20mondiale%20totale.

[3] https://en.wikipedia.org/wiki/Genome-wide_association_study

[4] https://www.ukbiobank.ac.uk/

[5] https://www.britannica.com/science/heritability

[6] https://www.nature.com/articles/d41586-019-01157-y#:~:text=For%20height%2C%20Visscher%20and%20colleagues,environmental%20factors%2C%20such%20as%20nutrition.