### From Computational Biology Group

**Background**: Large studies (including hundreds of thousands of individuals) have identified genetic factors influencing blood pressure. However, it is unknown whether the effects of the discovered genetic variants are modified by lifestyle factors.

**Goal**: The goal of this project is to use the Cohorte Lausannoise (CoLaus) data to find environmental factors (e.g. alcohol consumption, physical activity, smoking) that modify genetic effects on blood pressure.

**Mathematical tools**: Statistics. The students will learn how to use Matlab to read in large data sets, including genetic data, and to conduct linear regression, logistic regression and interaction analysis.

**Biological or Medical aspects**: Hypertension is a cardiovascular disease in which the blood pressure in the arteries is elevated. High blood pressure is a risk factor for stroke and heart attack.
Blood pressure is described by two measurements, the systolic and the diastolic pressure. Hypertension is diagnosed when the systolic blood pressure is above 140 mmHg and/or the diastolic blood pressure is above 90 mmHg.
The main organ responsible for controlling blood pressure is the kidney, so most of the causal genes are expected to be expressed in this organ or to be involved in the renin–angiotensin–aldosterone system.
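The thresholds quoted above can be written as a simple rule. The following Python sketch is illustrative only (the project itself used Matlab). The function name `is_hypertensive` and the example readings are hypothetical, and the sketch assumes that a binary hypertension variable would follow exactly this definition, which the report does not state.

```python
def is_hypertensive(sbp_mmhg: float, dbp_mmhg: float) -> bool:
    """Return True if either reading is above the thresholds quoted in the text
    (systolic > 140 mmHg and/or diastolic > 90 mmHg)."""
    return sbp_mmhg > 140 or dbp_mmhg > 90


# Hypothetical (systolic, diastolic) readings in mmHg.
readings = [(120, 80), (145, 85), (135, 95), (140, 90)]
labels = [is_hypertensive(sbp, dbp) for sbp, dbp in readings]
print(labels)
```

Because the text says "above", a reading of exactly 140/90 mmHg is not classified as hypertensive in this sketch.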

**Method**:
Three analyses were carried out, each using linear and logistic regression:

- Effects of the environmental factors on the phenotype (systolic blood pressure for linear regression and a binary variable for logistic regression)
- Effects of the genetic variants (SNPs) on the phenotype
- Effect of the interaction between the environmental factors and the SNPs on the phenotype
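The three model types listed above can be sketched for the linear case as follows. This is a minimal Python illustration, not the Matlab code used in the project: the helper `ols`, the coding of the genotype as a minor-allele count, the binary environmental factor and the noise-free toy phenotype are all assumptions made for the example, and no covariates are included.

```python
def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination and
    returns the coefficients [intercept, b1, b2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for k in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        pivot = A[k][k]
        A[k] = [v / pivot for v in A[k]]
        for r in range(p):
            if r != k:
                f = A[r][k]
                A[r] = [vr - f * vk for vr, vk in zip(A[r], A[k])]
    return [A[r][p] for r in range(p)]


# Hypothetical toy data: genotype = minor-allele count, environment = 0/1 factor.
genotype = [0, 1, 2, 0, 1, 2]
environment = [0, 0, 0, 1, 1, 1]
interaction = [g * e for g, e in zip(genotype, environment)]

# Noise-free phenotype built from known coefficients so the fit can be checked:
# y = 1 + 2*E + 0.5*G - 1.5*G*E
phenotype = [1 + 2 * e + 0.5 * g - 1.5 * g * e
             for g, e in zip(genotype, environment)]

env_model = ols([environment], phenotype)    # 1) environmental factor only
snp_model = ols([genotype], phenotype)       # 2) SNP only
full_model = ols([environment, genotype, interaction], phenotype)  # 3) interaction

print(env_model, snp_model, full_model)
```

In the third model the last coefficient is the interaction term: it measures how much the per-allele effect of the SNP changes between the two levels of the environmental factor. The logistic analyses use the same predictors with a binary response and are not shown here.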

**Main results**:

In the linear regression of systolic blood pressure (SBP) on the environmental factors, almost all of the variables (Age, Alcohol, BMI, Coffee, Physical activity, Sex, Smoking duration, Smoking status, WHR) had an effect on SBP; the exceptions were Job activity, Height and Sun at birth. The response variable SBP was normalized for the analysis.

In a second step, we analysed the influence of 142 SNPs located on chromosome 15 on SBP, again using linear regression. Figure 1 summarizes these results. In the upper panel, the –log10(p-value) of each SNP is plotted against its chromosomal position. The pink points above the red line (the significance threshold) mark the SNPs with a significant p-value. In the middle panel of Figure 1, the regression coefficients of all the significant SNPs are shown in red with the corresponding confidence intervals. Around 20 SNPs are significant, and almost all of them have a positive effect on SBP, meaning that carrying the minor allele is associated with higher blood pressure. The R-squared values (Rsq) are quite low (Figure 1, bottom panel): the variation in blood pressure explained by these SNPs is no more than 0.2%, so other variables must account for most of it.

We looked up the rs numbers of these SNPs online and found that many of them are located in introns of different genes. One of these genes is SCAMP2 (Secretory Carrier Membrane Protein 2). Its exact function is unknown, but it might be involved in vesicular transport, so the link to hypertension is difficult to make. It is also possible that these SNPs lie in an important regulatory region for genes located elsewhere in the genome.

Figure 1: Linear Regression: Effect of the variants (SNPs) on SBP
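To make the significance line in such a plot concrete, the sketch below computes a Bonferroni threshold for the 142 SNPs and its height on the –log10 scale. The family-wise error rate of 0.05 is an assumption, since the report does not state the value used, and the function name `is_significant` is hypothetical.

```python
import math

n_tests = 142   # number of SNPs tested on chromosome 15 (from the text)
alpha = 0.05    # ASSUMED family-wise error rate; not stated in the report

# Bonferroni correction: divide the error rate by the number of tests.
bonferroni_threshold = alpha / n_tests

# Height of the corresponding horizontal line in a -log10(p-value) plot.
line_height = -math.log10(bonferroni_threshold)


def is_significant(p_value: float) -> bool:
    """True if the p-value passes the Bonferroni-corrected threshold."""
    return p_value <= bonferroni_threshold


print(bonferroni_threshold, line_height)
```

Under these assumptions a SNP is called significant only if its p-value is below roughly 3.5 × 10⁻⁴, which corresponds to a point above about 3.45 on the –log10 axis.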

We also tested whether interactions between the environmental factors and the SNPs have an effect on blood pressure. Figure 2 shows the p-values of these interaction tests, with the environmental factors on the x-axis and the SNPs on the y-axis. The green lines mark the significant p-values. Only job activity shows a significant interaction, with two SNPs. This is unexpected, because this factor was not significant in our first analysis. A closer look at the variable ‘Job activity’ (Figure 3) shows that the minor alleles have a protective effect (Figure 3, middle panel). Surprisingly, the R-squared values are very high (27.64% and 27.65%, respectively). These values are likely to be wrong, but we could not find any error when we checked our scripts and calculations. It should also be noted that the two p-values only narrowly pass the significance threshold. We also performed a stratified analysis on the ‘Job activity’ variable, which did not yield any explanatory results. We looked up the two SNPs online and found that both are located in introns of the gene SCAMP5 (Secretory Carrier Membrane Protein 5), which is predicted to be involved in membrane trafficking. Again, the link to hypertension is difficult to make.

Figure 2: Linear Regression: Effect of the interaction of the variants (SNPs) and environmental factors on SBP

Figure 3: Linear Regression: Effect of the interaction of SNPs and Job activity
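One check that could be applied to unexpectedly high R-squared values is to compare the R-squared of the full interaction model with that of a model lacking the SNP terms, since the R-squared of a full model also includes the contribution of every other predictor in it. Whether this explains the values reported here cannot be determined from the report. The Python sketch below only illustrates the comparison on hypothetical toy data with a binary factor and a binary genotype, for which the least-squares fitted values are simply group means; the helper names are made up for the example.

```python
from statistics import mean


def r_squared(y, fitted):
    """Share of the variance of y explained by the fitted values."""
    y_bar = mean(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_resid / ss_total


def group_mean_fit(y, groups):
    """Fitted values of a least-squares model with one parameter per group:
    each observation is predicted by the mean of its group."""
    means = {g: mean(yi for yi, gi in zip(y, groups) if gi == g)
             for g in set(groups)}
    return [means[g] for g in groups]


# Hypothetical toy data (binary environmental factor, binary genotype).
job = [0, 0, 0, 0, 1, 1, 1, 1]
snp = [0, 0, 1, 1, 0, 0, 1, 1]
y = [1, 3, 2, 4, 5, 7, 4, 6]

# Model without SNP terms: y ~ job.
r2_env = r_squared(y, group_mean_fit(y, job))
# Full model: y ~ job + snp + job*snp (one mean per job/snp cell).
r2_full = r_squared(y, group_mean_fit(y, list(zip(job, snp))))
# Share of variance added by the SNP and interaction terms.
r2_gain = r2_full - r2_env

print(r2_env, r2_full, r2_gain)
```

In this toy example the full model has a high R-squared, but most of it is already reached without any SNP term; only the gain is attributable to the SNP and its interaction.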

In a further step, we repeated the analyses described above using logistic regression with a binary response variable for hypertension. Most of the results were not significant. We believe this is due to the loss of statistical power when moving from linear regression on a continuous trait to logistic regression on a binary one.

To conclude, the regressions testing the interaction of SNPs and environmental factors on systolic blood pressure did not reveal a SNP that could be important in hypertension. This might change, for example, if other chromosomal regions were examined, if a less restrictive p-value correction were used (we used Bonferroni), or if diastolic blood pressure were included in the analyses.
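As an illustration of what a less restrictive correction could look like, the sketch below compares Bonferroni with the Benjamini-Hochberg false discovery rate procedure. The choice of Benjamini-Hochberg is an assumption (the report does not name an alternative), the p-values are hypothetical, and the level of 0.05 is assumed as well.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false discovery rate q.

    Finds the largest rank k such that the k-th smallest p-value is at most
    k / m * q, and rejects the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
p_values = [0.212, 0.001, 0.041, 0.008, 0.074,
            0.039, 0.216, 0.042, 0.060, 0.205]

bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print(sum(bonf), sum(bh))
```

Bonferroni controls the probability of any false positive, whereas Benjamini-Hochberg controls the expected proportion of false positives among the reported hits, which is why it can declare more tests significant on the same data.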

- French-English translations of the environmental factors:

Alcool = Alcohol, IMC = BMI (Body Mass Index), Caféine = Amount of Coffee, Type de profession = Job activity, Activité physique = Physical activity, Durée fumeur = Smoking duration, Statut fumeur = Smoking status, Rapport taille/hanche = Waist-to-hip ratio (WHR)

**Supervisors**: Diana Marek, Tanguy Corre, Murielle Bochud & Zoltán Kutalik

**Students**: Sophie Debonneville, Daniel Schmocker

**Presentations**: Media:CYP1A1-A2_BP.ppt, Media:Genotypic_data_course_March2012.ppt, Media:How_to_solve_biological_problems_with_math_23032012_.pptx

**References**:

- Ehret et al. *Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk.* Nature. 2011 Oct 6; 478: 103–109.
- Firmann M, Mayor V, Vidal PM, Bochud M, Pécoud A, Hayoz D, Paccaud F, Preisig M, Song KS, Yuan X, Danoff TM, Stirnadel HA, Waterworth D, Mooser V, Waeber G, Vollenweider P. *The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome.* BMC Cardiovasc Disord. 2008 Mar 17; 8: 6.

See Genome Wide Association Studies, Media:McCarthy_review_GWAS.pdf, GWAS papers

Back to UNIL BSc course: "Solving Biological Problems that require Math 2012"