Predicting Blood Pressure from the retina using Deep Learning (CBG wiki, revision of 2022-06-06 by Sbprm2022 Hermione)
<hr />
<div>[[File:Final Presentation Retina.pptx|thumb|Project 4]]<br />
<br />
= Introduction =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death worldwide for the last twenty years, so finding ways to prevent it is of great importance. In this project, funduscopy images of the retinas of tens of thousands of participants, collected by the UK Biobank, together with a dataset of biologically relevant variables measured on those images, are used for two purposes. [[File:Funduscopy.png|thumb|500px|An image of a retina being taken by funduscopy.]] First, GWAS analyses of some of the variables in the dataset let us look at their concrete importance in the genome. Second, the dataset was used to refine the selection of retinal images fed to a classification model (DenseNet) that outputs a prediction of hypertension. A key point for both analyses, and especially for the classification part, is that mathematically sound data cleaning should improve the relevant GWAS p-values and the accuracy of hypertension prediction.<br />
<br />
== Data cleaning process ==<br />
<br />
The data were collected from the UK Biobank and consist of:<br />
<br />
1. Retina images of the left eye, the right eye, or both eyes of each participant. In addition, a few hundred participants had replicate images taken of either their left or right eye. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset in which each row corresponds to a left or right retina image. The columns contain biologically relevant variables previously measured on those images.<br />
<br />
The cleaning process involved:<br />
<br />
1. Removing 15 variables on the recommendation of the assistants and dividing the dataset in two: one part (of size 78254x32) containing only participants who had both their left (labelled "L") and right (labelled "R") eyes imaged and nothing else, and the other (of size 464x32) containing each replicate (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participant and every variable, in both datasets: applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta measures the relative distance between L and R, or between 0 and 1.<br />
<br />
3. Computing a t-test and Cohen's d (the effect size) between each corresponding pair of variables in the two datasets, and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because, for the classification model to predict hypertension well, input images of left and right eyes should not differ strikingly; otherwise the model could lose accuracy by attending to these extra differences instead of the overall structure of the images it analyses. We can check whether a variable has a high left-right difference by comparing it to the corresponding original-replica difference: if a variable has a low left-right difference (a low delta(L, R)), its delta(L, R) distribution should resemble the corresponding delta(0, 1) distribution, because a replicate by definition differs from its original only by the technical variability of how the image was captured.<br />
<br />
The classification then used the cleaned and transformed 39127x27 delta(L, R) dataset to select its images.<br />
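As a sketch of the cleaning steps above (with synthetic stand-in data, since the UK Biobank tables are not reproduced here), the delta computation, the per-variable t-test with Cohen's d, and the Bonferroni filtering could look like this in Python; all array shapes and noise levels are illustrative assumptions:<br />

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two tables (rows = participants,
# columns = the 32 retinal variables kept after the first cut).
n_vars = 32
left = rng.normal(1.0, 0.1, size=(500, n_vars))
right = left + rng.normal(0.0, 0.01, size=(500, n_vars))   # mostly symmetric
right[:, :5] += rng.normal(0.0, 0.2, size=(500, 5))        # 5 truly asymmetric variables
orig = rng.normal(1.0, 0.1, size=(100, n_vars))
repl = orig + rng.normal(0.0, 0.01, size=(100, n_vars))    # replicas: technical noise only

def delta(a, b):
    """Relative distance delta = |a - b| / (a + b), per participant and variable."""
    return np.abs(a - b) / (a + b)

d_lr = delta(left, right)   # left-right dataset
d_01 = delta(orig, repl)    # original-replica dataset

# Per-variable Welch t-test between delta(L,R) and delta(0,1),
# Bonferroni-corrected for 32 tests; Cohen's d as effect size.
alpha = 0.05 / n_vars
keep = []
for j in range(n_vars):
    t, p = stats.ttest_ind(d_lr[:, j], d_01[:, j], equal_var=False)
    pooled_sd = np.sqrt((d_lr[:, j].var(ddof=1) + d_01[:, j].var(ddof=1)) / 2)
    cohens_d = (d_lr[:, j].mean() - d_01[:, j].mean()) / pooled_sd
    if p >= alpha:   # keep variables whose L-R asymmetry looks like technical noise
        keep.append(j)

cleaned = d_lr[:, keep]
```

With this toy setup the five deliberately asymmetric variables are the ones removed, mirroring the 32-to-27 reduction described above.<br />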
<br />
= Deep Learning Model =<br />
<br />
This section focuses on using the previously defined delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve its predictions by reducing technical error in the input images. The statistical tests performed in the first part let us select the variable for which delta(L, R) can be used as an approximation of technical error (i.e. of delta(0, 1)): the variable with the smallest difference between delta(L, R) and delta(0, 1).<br />
<br />
[[File:FD_dist.png|Distribution of the Delta(L, R) values for the "FD_all" variable]] <br />
<br />
The delta values of the "FD_all" variable were used to discriminate between participants: those with the highest delta values were excluded. We ran the model with 10 different sets of images, retaining 90%, 80%, 70%, 60% and 50% of the images either by delta value or by random selection, the latter serving as a baseline for comparison.<br />
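A minimal sketch of this image-selection scheme, assuming hypothetical "FD_all" delta values (the real values come from the cleaned dataset):<br />

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical delta(L, R) values of the "FD_all" variable, one per participant.
fd_delta = rng.gamma(2.0, 0.01, size=10_000)

def select_by_delta(delta_vals, frac):
    """Keep the fraction of participants with the LOWEST delta values."""
    cutoff = np.quantile(delta_vals, frac)
    return np.flatnonzero(delta_vals <= cutoff)

def select_random(n, frac, rng):
    """Size-matched random baseline for comparison."""
    return rng.choice(n, size=int(n * frac), replace=False)

for frac in (0.9, 0.8, 0.7, 0.6, 0.5):
    kept = select_by_delta(fd_delta, frac)
    base = select_random(len(fd_delta), frac, rng)
    # `kept` / `base` would index into the image list fed to the classifier.
```

The quantile cutoff guarantees that every retained participant has a lower "FD_all" delta than every excluded one, while the random baseline controls for the effect of dataset size alone.<br />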
<br />
== Results ==<br />
<br />
The ROC and training accuracy curves were extracted after every run. The shape of both curves changed little from run to run, but the AUROC varied notably.<br />
<br />
[[File:Roc.png]] [[File:Acc.png]]<br />
<br />
The AUC values for the different sets of images seem to follow a general trend: as the dataset shrinks, AUC decreases for randomly selected images but increases for delta-based selection.<br />
<br />
[[File:AUC_all.png]]<br />
<br />
However, the inherent run-to-run variation in AUC makes it hard to draw conclusions from so little data. Running the model at least three times with each set of images would give a much clearer idea of what is actually happening and would allow statistical testing.<br />
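Assuming at least three runs per image set were available, such a statistical test could be a simple Welch t-test on the AUROC values; the numbers below are invented placeholders, not actual results:<br />

```python
import numpy as np
from scipy import stats

# Hypothetical AUROC values from three repeated runs per image set
# (in practice each value would come from retraining the CNN from scratch).
auc_delta = np.array([0.71, 0.73, 0.72])    # delta-filtered images
auc_random = np.array([0.69, 0.70, 0.68])   # size-matched random selection

# Welch t-test: does delta-based selection beat the random baseline?
t, p = stats.ttest_ind(auc_delta, auc_random, equal_var=False)
mean_diff = auc_delta.mean() - auc_random.mean()
```

Reporting the mean AUROC difference with its p-value would make the trend in the plot above quantifiable rather than visual.<br />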
<br />
= GWAS =<br />
<br />
The goal of the GWAS was to investigate if the asymmetry of the eyes could have genetic origins. <br />
The variables with the largest left-right difference were selected: fractal dimension and tortuosity.<br />
The phenotype for the GWAS was the delta (delta = abs(L-R)/(L+R)) of fractal dimension or tortuosity, so that we would be able to identify genes responsible for asymmetry in these variables.<br />
Two rounds of GWAS were performed: the first with approximately 40'000 subjects and the second with approximately 50'000 subjects.<br />
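The phenotype construction can be sketched as follows, with simulated fractal-dimension values standing in for the real measurements; the 5e-8 threshold is the conventional genome-wide significance level, not a project-specific choice:<br />

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical per-participant fractal-dimension measurements for each eye.
fd_left = rng.normal(1.4, 0.05, size=40_000)
fd_right = rng.normal(1.4, 0.05, size=40_000)

# GWAS phenotype: relative left-right asymmetry of fractal dimension,
# one value per subject, to be regressed against each SNP.
pheno = np.abs(fd_left - fd_right) / (fd_left + fd_right)

# Conventional genome-wide significance threshold for declaring a hit.
GWS = 5e-8
```

The same construction, with tortuosity in place of fractal dimension, gives the second phenotype.<br />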
<br />
== Results == <br />
<br />
The results were largely non-significant. Only one GWAS was very slightly significant: fractal dimension with the larger set of participants (indicated by the red circle).<br />
<br />
[[File:GWAS.png]]<br />
<br />
In the event of the GWAS showing a significant peak, we could have investigated the associated region of the genome by looking up the reference SNP cluster ID (rsID) in NCBI, and thereby identified genes associated with fractal dimension asymmetry in the eyes.</div>
<hr />
<div>[[File:Final Presentation Retina.pptx|thumb|Project 4]]<br />
<br />
= Introduction =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px|An image of retina being taken by funduscopy.]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning process ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
= Deep Learning Model =<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images. The statistical tests performed in the first part allow us to select the variable for which delta (L, R) can be used as an approximation of technical error (or delta (0, 1)), i.e select the variable with the smallest difference between delta (L, R) and delta (0, 1).<br />
<br />
[[File:FD_dist.png|Distribution of the Delta(L, R) values for the "FD_all" variable]] <br />
<br />
The delta values for the "FD_all" variable were used here to discriminate participants. Participants with the highest delta values were excluded. We ran the model with 10 different sets of images: Retaining 90%, 80%, 70%, 60% and 50% of images using the delta values, and random selection to make comparisons.<br />
<br />
== Results ==<br />
<br />
The ROC and training accuracy curves were extracted after every run. The shape of both curves didn't change much from run to run, but notable changes in AUROC were noted.<br />
<br />
[[File:Roc.png]] [[File:Acc.png]]<br />
<br />
The AUC values for the different sets of images seem to follow a general trend: Decrease in precision as dataset size decreases for the randomly selected images, and increases when using delta.<br />
<br />
[[File:AUC_all.png]]<br />
<br />
However, the inherent variation in AUC results from run to run makes it hard to draw conclusions from such little data. Running the model at least thrice with each set of images would allow us to get a much clearer idea of what is actually happening, and to do statistical tests.<br />
<br />
= GWAS =<br />
<br />
The goal of the GWAS was to investigate if the asymmetry of the eyes could have genetic origins. <br />
We decided to look at the variables with the largest left right difference were selected: Fractal dimension and tortuosity.<br />
The phenotype for the GWAS was the delta (delta = abs(L-R)/(L+R)) of fractal dimension and tortuosity. That way, we would we able to identify genes responsible for asymmetry in these variables.<br />
Two rounds of GWAS were made. The first one had approximately 40'000 subjects and the second one had approximately 50'000 subjects.<br />
<br />
= Results = <br />
<br />
The results were not significant. Only one GWAS was very slightly significant, the fractal dimension with the larger set of participants (indicated by red circle).<br />
<br />
[[File:GWAS.png]]<br />
<br />
In the event of the GWAS showing a significant peak, we could have then investigated the part of the genome associated with it by looking up the reference SNP cluster ID (rSID) in NCBI. We could have then identified genes associated with fractal dimension asymmetry in the eyes.</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=File:GWAS.png&diff=6464File:GWAS.png2022-06-06T21:32:49Z<p>Sbprm2022 Hermione: </p>
<hr />
<div></div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6463Predicting Blood Pressure from the retina using Deep Learning2022-06-06T21:32:28Z<p>Sbprm2022 Hermione: </p>
<hr />
<div>[[File:Final Presentation Retina.pptx|thumb|Project 4]]<br />
<br />
= Introduction =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px|An image of retina being taken by funduscopy.]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning process ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
= Deep Learning Model =<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images. The statistical tests performed in the first part allow us to select the variable for which delta (L, R) can be used as an approximation of technical error (or delta (0, 1)), i.e select the variable with the smallest difference between delta (L, R) and delta (0, 1).<br />
<br />
[[File:FD_dist.png|Distribution of the Delta(L, R) values for the "FD_all" variable]] <br />
<br />
The delta values for the "FD_all" variable were used here to discriminate participants. Participants with the highest delta values were excluded. We ran the model with 10 different sets of images: Retaining 90%, 80%, 70%, 60% and 50% of images using the delta values, and random selection to make comparisons.<br />
<br />
== Results ==<br />
<br />
The ROC and training accuracy curves were extracted after every run. The shape of both curves didn't change much from run to run, but notable changes in AUROC were noted.<br />
<br />
[[File:Roc.png]] [[File:Acc.png]]<br />
<br />
The AUC values for the different sets of images seem to follow a general trend: Decrease in precision as dataset size decreases for the randomly selected images, and increases when using delta.<br />
<br />
[[File:AUC_all.png]]<br />
<br />
However, the inherent variation in AUC results from run to run makes it hard to draw conclusions from such little data. Running the model at least thrice with each set of images would allow us to get a much clearer idea of what is actually happening, and to do statistical tests.<br />
<br />
= GWAS =<br />
<br />
The goal of the GWAS was to investigate if the asymmetry of the eyes could have genetic origins. <br />
We decided to look at the variables with the largest left right difference were selected: Fractal dimension and tortuosity.<br />
The phenotype foe the GWAS was the delta (delta = abs(L-R)/(L+R)) of fractal dimension and tortuosity. That way, we would we able to identify genes responsible for asymmetry in these variables.<br />
Two rounds of GWAS were made. The first one had approximately 40'000 subjects and the second one had approximately 50'000 subjects.<br />
<br />
= Results = <br />
<br />
The results were not significant. Only one GWAS was very slightly significant, the fractal dimension with the larger set of participants (indicated by red circle).<br />
<br />
[[File:GWAS.png]]<br />
<br />
In the event of the GWAS showing a significant peak, we could have then investigated the part of the genome associated with it by looking up the reference SNP cluster ID (rSID) in NCBI. We could have then identified genes associated with fractal dimension asymmetry in the eyes.</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6462Predicting Blood Pressure from the retina using Deep Learning2022-06-06T21:30:42Z<p>Sbprm2022 Hermione: /* GWAS */</p>
<hr />
<div>[[File:Final Presentation Retina.pptx|thumb|Project 4]]<br />
<br />
= Introduction =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px|An image of retina being taken by funduscopy.]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning process ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
= Deep Learning Model =<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images. The statistical tests performed in the first part allow us to select the variable for which delta (L, R) can be used as an approximation of technical error (or delta (0, 1)), i.e select the variable with the smallest difference between delta (L, R) and delta (0, 1).<br />
<br />
[[File:FD_dist.png|Distribution of the Delta(L, R) values for the "FD_all" variable]] <br />
<br />
The delta values for the "FD_all" variable were used here to discriminate participants. Participants with the highest delta values were excluded. We ran the model with 10 different sets of images: Retaining 90%, 80%, 70%, 60% and 50% of images using the delta values, and random selection to make comparisons.<br />
<br />
== Results ==<br />
<br />
The ROC and training accuracy curves were extracted after every run. The shape of both curves didn't change much from run to run, but notable changes in AUROC were noted.<br />
<br />
[[File:Roc.png]] [[File:Acc.png]]<br />
<br />
The AUC values for the different sets of images seem to follow a general trend: Decrease in precision as dataset size decreases for the randomly selected images, and increases when using delta.<br />
<br />
[[File:AUC_all.png]]<br />
<br />
However, the inherent variation in AUC results from run to run makes it hard to draw conclusions from such little data. Running the model at least thrice with each set of images would allow us to get a much clearer idea of what is actually happening, and to do statistical tests.<br />
<br />
= GWAS =<br />
<br />
The goal of the GWAS was to investigate if the asymmetry of the eyes could have genetic origins. <br />
We decided to look at the variables with the largest left right difference were selected: Fractal dimension and tortuosity.<br />
The phenotype foe the GWAS was the delta (delta = abs(L-R)/(L+R)) of fractal dimension and tortuosity. That way, we would we able to identify genes responsible for asymmetry in these variables.<br />
Two rounds of GWAS were made. The first one had approximately 40'000 subjects and the second one had approximately 50'000 subjects.<br />
<br />
= Results = <br />
<br />
The results were not significant. Only one GWAS was very slightly significant, the fractal dimension with the larger set of participants (indicated by red circle).<br />
<br />
<br />
<br />
In the event of the GWAS showing a significant peak, we could have then investigated the part of the genome associated with it by looking up the reference SNP cluster ID (rSID) in NCBI. We could have then identified genes associated with fractal dimension asymmetry in the eyes.</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6461Predicting Blood Pressure from the retina using Deep Learning2022-06-06T21:25:51Z<p>Sbprm2022 Hermione: /* Deep Learning Model */</p>
<hr />
<div>[[File:Final Presentation Retina.pptx|thumb|Project 4]]<br />
<br />
= Introduction =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px|An image of retina being taken by funduscopy.]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning process ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
= Deep Learning Model =<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images. The statistical tests performed in the first part allow us to select the variable for which delta (L, R) can be used as an approximation of technical error (or delta (0, 1)), i.e select the variable with the smallest difference between delta (L, R) and delta (0, 1).<br />
<br />
[[File:FD_dist.png|Distribution of the Delta(L, R) values for the "FD_all" variable]] <br />
<br />
The delta values for the "FD_all" variable were used to discriminate between participants: those with the highest delta values were excluded. We ran the model with 10 different sets of images - retaining 90%, 80%, 70%, 60% and 50% of the images either by delta value or by random selection, the latter serving as a comparison.<br />
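The selection procedure can be sketched as follows (a hypothetical illustration; the function names and the exact ranking rule are assumptions - the report only states that participants with the highest delta values were excluded):<br />

```python
import numpy as np

rng = np.random.default_rng(0)

def select_by_delta(image_ids, delta_values, keep_frac):
    """Keep the keep_frac of images with the lowest delta (least L/R asymmetry)."""
    order = np.argsort(delta_values)        # ascending: lowest delta first
    n_keep = int(len(image_ids) * keep_frac)
    return [image_ids[i] for i in order[:n_keep]]

def select_random(image_ids, keep_frac, rng):
    """Size-matched random selection, used as a control."""
    n_keep = int(len(image_ids) * keep_frac)
    return list(rng.choice(image_ids, size=n_keep, replace=False))

# Ten image sets in total: five delta-filtered, five random, at matching sizes
fractions = [0.9, 0.8, 0.7, 0.6, 0.5]
```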
<br />
== Results ==<br />
<br />
The ROC and training accuracy curves were extracted after every run. The shape of both curves changed little from run to run, but the AUROC varied notably.<br />
<br />
[[File:Roc.png]] [[File:Acc.png]]<br />
<br />
The AUC values for the different sets of images seem to follow a general trend: as the dataset shrinks, AUC decreases for randomly selected images but increases when images are filtered by delta.<br />
<br />
[[File:AUC_all.png]]<br />
<br />
However, the inherent run-to-run variation in AUC makes it hard to draw conclusions from so little data. Running the model at least three times with each set of images would give a much clearer picture of what is actually happening, and would allow statistical tests.<br />
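Such a test could be sketched as follows (the AUC values below are purely illustrative, not results from the project; the choice of Welch's t-test is an assumption):<br />

```python
from scipy import stats

def compare_auc(auc_delta, auc_random, alpha=0.05):
    """Two-sample (Welch) t-test on per-run AUC values for delta-filtered
    versus randomly filtered image sets."""
    t, p = stats.ttest_ind(auc_delta, auc_random, equal_var=False)
    return t, p, p < alpha

# Hypothetical AUCs from three runs per condition (illustration only)
auc_delta = [0.74, 0.75, 0.73]
auc_random = [0.70, 0.71, 0.69]
t, p, significant = compare_auc(auc_delta, auc_random)
```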
<br />
= GWAS =<br />
<br />
The goal of the GWAS was to investigate whether the asymmetry of the eyes could have genetic origins.</div>
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images.<br />
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6447Predicting Blood Pressure from the retina using Deep Learning2022-06-05T20:01:48Z<p>Sbprm2022 Hermione: /* Background and Motivation */</p>
<hr />
<div>[[File:Retina DNN analysis Alex.pdf|thumb|Project 4]]<br />
<br />
= Retina Image Analysis =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px|An image of retina being taken by funduscopy.]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning processes ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking (biological) differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images.<br />
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6446Predicting Blood Pressure from the retina using Deep Learning2022-06-05T19:59:53Z<p>Sbprm2022 Hermione: /* Background and Motivation */</p>
<hr />
<div>[[File:Retina DNN analysis Alex.pdf|thumb|Project 4]]<br />
<br />
= Retina Image Analysis =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px|alt=An image of retina being taken by funduscopy.]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning processes ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking (biological) differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images.<br />
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6445Predicting Blood Pressure from the retina using Deep Learning2022-06-05T19:56:50Z<p>Sbprm2022 Hermione: /* Background and Motivation */</p>
<hr />
<div>[[File:Retina DNN analysis Alex.pdf|thumb|Project 4]]<br />
<br />
= Retina Image Analysis =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning processes ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking (biological) differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images.<br />
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6444Predicting Blood Pressure from the retina using Deep Learning2022-06-05T19:56:07Z<p>Sbprm2022 Hermione: /* Background and Motivation */</p>
<hr />
<div>[[File:Retina DNN analysis Alex.pdf|thumb|Project 4]]<br />
<br />
= Retina Image Analysis =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|A retina image taken by funduscopy|500px]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning processes ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking (biological) differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images.<br />
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6443Predicting Blood Pressure from the retina using Deep Learning2022-06-05T19:50:21Z<p>Sbprm2022 Hermione: /* Background and Motivation */</p>
<hr />
<div>[[File:Retina DNN analysis Alex.pdf|thumb|Project 4]]<br />
<br />
= Retina Image Analysis =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. [[File:Funduscopy.png|thumb|500px]] First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning processes ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking (biological) differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images.<br />
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6442Predicting Blood Pressure from the retina using Deep Learning2022-06-05T19:49:14Z<p>Sbprm2022 Hermione: /* Background and Motivation */</p>
<hr />
<div>[[File:Retina DNN analysis Alex.pdf|thumb|Project 4]]<br />
<br />
= Retina Image Analysis =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank [[File:Funduscopy.png|thumb|500px]] and data of biologically relevant variables collected in a dataset are used for two different purposes. First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning processes ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participants, every variables, and in the two datasets : applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta computes the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing the T-test and the Cohen's D (the effect size) between each corresponding variables of the two datasets and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because for the classification model and to predict hypertension, it is better for input images of left and right eyes to not have striking (biological) differences between them, otherwise the machine could lose in accuracy by accounting for these supplementary data, instead of focusing on the overall structure of the images it analyses. We can check if each variable has a high left-right difference by comparing it to the corresponding variable 0-1 difference ; if a variable has a low left-right difference - a low delta (L, R) - its delta(L, R) distribution should be similarly distributed as its corresponding delta(0, 1) variable, because a replica has by definition no other difference with its original than the technical variability related to the way it was practically captured.<br />
<br />
The classification has then used the 39127x27 delta(L, R) cleaned and transformed dataset for the selection of its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focused on using the previously defined Delta variable to sort the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images. We wished to improve the predictions by reducing technical error in the input images.<br />
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Predicting_Blood_Pressure_from_the_retina_using_Deep_Learning&diff=6441Predicting Blood Pressure from the retina using Deep Learning2022-06-05T19:48:04Z<p>Sbprm2022 Hermione: /* Data cleaning processes */</p>
<hr />
<div>[[File:Retina DNN analysis Alex.pdf|thumb|Project 4]]<br />
<br />
= Retina Image Analysis =<br />
<br />
== Background and Motivation ==<br />
<br />
Heart disease has been the leading cause of death in the world for the last twenty years. It is therefore of great importance to look for ways to prevent it. In this project, funduscopy images of retinas of tens of thousands of participants collected by the UK biobank and data of biologically relevant variables collected in a dataset are used for two different purposes. First, GWAS analysis of some of the variables in the dataset allows us to look at their concrete importance in the genome. Second, the dataset was used as a means of refining the selection of retinal images so that they could be subjected to a classification model called Dense Net with as output a prediction of hypertension. A key point associated with both of these analyses - especially for the classification part - is that mathematically adequate data cleaning should enhance the relevant GWAS p-values, or accuracy of hypertension prediction.<br />
<br />
== Data cleaning processes ==<br />
<br />
The data has been collected from the UK biobank and consists of :<br />
<br />
1. Retina images of left eyes, right eyes, or both left and right eyes of the participants. Also, a few hundreds of participants have had replica images of either their left or right eye taken. [[File:Fundusimages.png|thumb|500px]]<br />
<br />
2. A 92366x47 dataset with rows corresponding to every left or right retina images. Columns refer to biologically relevant data previously measured on those images.<br />
<br />
The cleaning process has involved :<br />
<br />
1. Removing 15 variables by recommendation of the assistants and dividing the dataset into two : one (of size 78254x32) containing only participants which had both their left (labelled "L") and right (labelled "R") eyes taken and nothing else, and the other (of size 464x32) containing each replica (labelled "1") image alongside its original (labelled "0").<br />
<br />
2. For every participant, every variable, and both datasets: applying <math> \delta = \frac{|L-R|}{L+R} </math> to the left-right dataset and <math> \delta = \frac{|0-1|}{0+1} </math> to the original-replica dataset. This delta measures the relative distance between either L and R, or 0 and 1.<br />
<br />
3. Computing a t-test and Cohen's d (the effect size) between each corresponding variable of the two datasets, and removing the 5 variables with significant p-values after Bonferroni correction for 32 tests. This was done because, for the classification model to predict hypertension well, input images of left and right eyes should not show striking (biological) differences; otherwise the model could lose accuracy by attending to these extra differences instead of the overall structure of the images it analyses. We can check whether a variable has a high left-right difference by comparing it to the corresponding original-replica difference: if a variable has a low left-right difference - a low delta(L, R) - its delta(L, R) distribution should be similar to its delta(0, 1) distribution, because a replica by definition differs from its original only by the technical variability of the way the image was captured.<br />
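As a rough sketch of steps 2 and 3 above, using synthetic stand-in data (the test flavour and decision rule are assumptions; the report does not specify whether a Welch or Student t-test was used):

```python
import numpy as np
from scipy import stats

def delta(a, b):
    """Relative distance between paired measurements (L/R or original/replica)."""
    return np.abs(a - b) / (a + b)

def cohens_d(x, y):
    """Cohen's d effect size with pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

# Synthetic stand-ins for one variable's measurements (the real data
# come from the UK Biobank tables described above).
rng = np.random.default_rng(0)
left, right = rng.uniform(1, 2, 1000), rng.uniform(1, 2, 1000)
orig, repl = rng.uniform(1, 2, 200), rng.uniform(1, 2, 200)

d_lr = delta(left, right)   # delta(L, R) for each participant
d_01 = delta(orig, repl)    # delta(0, 1) for each replica pair

# t-test between the two delta distributions; a variable is kept only if
# its p-value is not significant after Bonferroni correction for 32 tests.
t, p = stats.ttest_ind(d_lr, d_01, equal_var=False)
effect = cohens_d(d_lr, d_01)
keep_variable = p >= 0.05 / 32
```

In this sketch each of the 32 variables would be screened this way, and the 5 significant ones dropped.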
<br />
The classification then used the cleaned and transformed 39,127 × 27 delta(L, R) dataset to select its images.<br />
<br />
== Deep Learning Model ==<br />
<br />
This section focuses on using the previously defined delta variable to select the images used as input for the classifier. A CNN model was built by the CBG to predict hypertension from retina fundus images; we aimed to improve its predictions by reducing technical error in the input images.<br />
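A minimal sketch of how such a delta-based image selection could work (the column layout, the mean-delta criterion, and the 50% cutoff are all hypothetical; the report does not specify the exact rule used):

```python
import numpy as np
import pandas as pd

# Hypothetical delta(L, R) table: one row per participant image pair,
# one column per retained variable (27 in the cleaned dataset).
rng = np.random.default_rng(1)
deltas = pd.DataFrame(rng.uniform(0, 0.3, size=(500, 27)),
                      index=[f"participant_{i}" for i in range(500)])

# Rank participants by their mean relative left-right discrepancy and
# keep the most consistent half as input for the classifier.
mean_delta = deltas.mean(axis=1)
threshold = mean_delta.quantile(0.5)
selected_ids = mean_delta[mean_delta <= threshold].index
```

The selected IDs would then index into the retina image files passed to the DenseNet model.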
<br />
== GWAS ==</div>Sbprm2022 Hermionehttp://www2.unil.ch/cbg/index.php?title=Welcome_to_the_Computational_Biology_Group!&diff=6294Welcome to the Computational Biology Group!2022-03-29T11:55:21Z<p>Sbprm2022 Hermione: </p>
<hr />
<div>[[Category:Homepage]]<br />
<br />
__NOTOC__<br />
<br />
<!-- <newsbulletins>header=NEWS|limit=3</newsbulletins> --><br />
<!-- A history of all news can be found [[History | here]]. --><br />
<br />
<br />
[[Image:CBG 2017.png|800px]]<br />
<br />
== Who are we? ==<br />
The Computational Biology Group (CBG) is a research group embedded in the [http://unil.ch/dbc Department of Computational Biology] at the [http://unil.ch University of Lausanne]. The group consists of [http://www2.unil.ch/cbg/index.php?title=People PhD students and postdocs] and is led by [[user:Sven | Prof. Sven Bergmann]].<br />
<br />
== What are our interests? ==<br />
We develop and apply methods for the integrative analysis of large-scale biological and clinical data. Our goals are to improve fundamental understanding of how genetic variability affects phenotypes, to learn about underlying molecular mechanisms, and to make use of our insights to improve the diagnosis, prevention and treatment of disease whenever possible [http://www2.unil.ch/cbg/index.php?title=Science#Integrative_analysis_of_large-scale_biological_and_clinical_data Learn more]. We are also interested in relatively small biological systems that can be modeled quantitatively. Here our goal is to better understand the properties of these systems that contribute their functionality such as robustness and evolvability under changing environmental conditions [http://www2.unil.ch/cbg/index.php?title=Science#Study_of_small_genetic_networks Learn more].<br />
<br />
== How do we work? ==<br />
Most of our work is computational, which means we use computer algorithms to process and analyse data. Our analyses often have a statistical component to evaluate the significance of results. Whenever possible we describe our data using mathematical models. Sometimes these models can be solved analytically, but often we rely on numerical solutions and simulations. Some of our methods have a heuristic component, but we try to evaluate them rigorously and make them as practical as possible. We strongly believe in sharing our analysis tools and Open Science in general. <br />
<br />
Our group seeks an interdisciplinary approach, bridging the traditional gaps between physics, mathematics and biology. Our lab collaborates with experimental and medical research groups.<br />
<br />
== General info on this wiki ==<br />
This wiki is the main instrument to centralize and archive information on and generated by the CBG. Ask [[user:Micha | Micha]] if you have any questions or need an account.<br />
<br />
<br />
<!-- <newsbulletins>header=NEWS|limit=1</newsbulletins> --></div>Sbprm2022 Hermione