Latest revision as of 10:13, 31 May 2018
Cancer Genes: How are they normally expressed?
Cancer is a disease that arises through the accumulation of mutations. Mutations in oncogenes and tumor suppressor genes play an important role in tumorigenesis. Oncogenes normally regulate cell growth; however, mutations in these genes can upregulate or activate them, causing aberrant cellular growth. Tumor suppressor genes normally protect cells from cancer by stopping cellular growth. If they are mutated, they can no longer function properly, leaving the cell unprotected and free to grow more than needed. Over the last decade, awareness of genetic screening has increased markedly in the western world. Followers of the media will remember the well-known case of Angelina Jolie, who had her breasts and ovaries removed after learning that she carried a mutation in BRCA1. Mutations in BRCA1 and BRCA2 are both known to increase the risk of breast and ovarian cancer, to more than 80 percent in some cases. Hence we need a better understanding of the nature of these genes so that we can better control and reverse the impact of their alterations. In one of my projects, I study cancer evolution and tumor-specific differential regulation. I therefore propose to investigate cancer gene expression in healthy individuals for this project. The database we will use (GTEx) has gene expression data for more than 60 tissues in more than 100 individuals. The questions you can answer will include:
- In which tissues are they most highly expressed?
- Which group is more highly expressed?
- Does the expression change with age or gender?
- Which group shows the most variation?
Hence, the aims of the project are:
- get familiar with gene expression analysis
- improve your R skills for data handling and statistical tests
- if possible, formulate your own questions on a given data set
Project description: File:Tugce CancerGenes.pdf
Report:
File:CancerGenes.pdf
Introduction
Tumor suppressor genes (TSG) repress cell division, whereas oncogenes promote it.
AIM
- Compare the expression of tumor suppressor genes and oncogenes in different healthy human tissues.
- Do the environment and lifestyle of individuals/patients influence the expression of their genes?
DATA
- GTEx
- Cancer Census
- CoLaus
TOOLS
- R software (wilcox.test / cor.test)
- Excel

- We calculated the mean, median and standard deviation (sd) of expression in each tissue for the two sets of genes.
- We used the function wilcox.test in R to compare the pooled gene expression of TSG against that of oncogenes.
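The per-tissue summary described above can be sketched as follows. This is only an illustration in Python rather than the R and Excel workflow actually used; the tissue names, the nested dictionary layout and all expression values are made-up assumptions, not GTEx data.

```python
from statistics import mean, median, stdev

# Hypothetical toy data: expression[tissue][gene_set] -> expression values.
# None of these numbers come from GTEx.
expression = {
    "Brain - Cerebellar Hemisphere": {
        "TSG": [12.0, 15.0, 9.0, 14.0],
        "oncogene": [6.0, 8.0, 7.0, 5.0],
    },
    "Bladder": {
        "TSG": [10.0, 11.0, 13.0, 10.0],
        "oncogene": [9.0, 12.0, 8.0, 11.0],
    },
}


def summarize(values):
    """Return the mean, median and sample standard deviation (n - 1), as R's sd() does."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}


# One summary per tissue and per gene set.
summary = {
    tissue: {gene_set: summarize(values) for gene_set, values in gene_sets.items()}
    for tissue, gene_sets in expression.items()
}

for tissue, gene_sets in summary.items():
    for gene_set, stats in gene_sets.items():
        print(
            f"{tissue:32s} {gene_set:9s} "
            f"mean={stats['mean']:.2f} median={stats['median']:.2f} sd={stats['sd']:.2f}"
        )
```

Keeping the two gene sets side by side for each tissue makes it easy to see where the TSG and oncogene summaries diverge before running any test.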
Results
- TSG expression is higher than oncogene expression (p-value: 2.2 × 10^-16). We explain this by the fact that we studied healthy individuals: since TSG control the cell cycle and cell division, it makes sense that they are more highly expressed than oncogenes, which promote cell division.
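For readers unfamiliar with the test behind this comparison: on two independent samples, wilcox.test performs a Wilcoxon rank-sum (Mann-Whitney) test. The sketch below is a simplified Python stand-in, assuming the unpaired form was used. It relies on the normal approximation without tie or continuity correction, so its p-values will not match R exactly, and the expression values are invented for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test via the normal approximation.

    Returns (U statistic of x, approximate p-value). No tie or continuity
    correction is applied, so this is a rough sketch only.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return u, p


# Hypothetical pooled expression values, not GTEx data.
tsg = [12.1, 15.3, 9.8, 14.2, 11.7, 13.5, 16.0, 10.9]
oncogene = [6.2, 8.1, 7.4, 5.9, 9.1, 7.7, 6.8, 8.6]

u, p = rank_sum_test(tsg, oncogene)
print(f"U = {u}, approximate p-value = {p:.5f}")
```

Because the test works on ranks, it is insensitive to the skewed distributions typical of expression data, which is presumably why it was preferred over a t-test here.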
We used wilcox.test to compare gene expression between tumor suppressor genes and oncogenes within each tissue, and applied a Bonferroni correction to the resulting p-values (each p-value multiplied by the number of tests, 52 here). We found that the brain cerebellar hemisphere showed a significant difference: p-value = 0.033. We explain this significant result by the fact that very few new cells are produced in the brain, so the existing ones have to be protected better than in any other tissue.
- We started an across-tissue analysis (brain cerebellar hemisphere).
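The Bonferroni correction mentioned above (p-value multiplied by the number of tests) can be sketched in a few lines. The version below also caps the result at 1, which is what R's p.adjust(p, method = "bonferroni") does. The raw p-values are hypothetical and chosen only to show the effect of 52 tests.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for 52 per-tissue tests: one small, the rest unremarkable.
raw = [0.0006] + [0.2] * 51
adjusted = bonferroni(raw)

print(f"number of tests: {len(raw)}")
print(f"smallest raw p-value {raw[0]} becomes {adjusted[0]:.4f} after correction")
print(f"a raw p-value of {raw[1]} becomes {adjusted[1]:.1f} after correction")
```

With 52 tests, a raw p-value has to be below roughly 0.05 / 52 ≈ 0.00096 to remain significant at the 5% level after correction.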
- Because the results of our across-tissue analysis with wilcox.test were not coherent (no significant result between brain cerebellar hemisphere and bladder), we also ran a correlation test in R (cor.test) and found a significant correlation between all the tissues. The two tests answer different questions: wilcox.test measures the difference in expression levels between tissues, whereas the Spearman correlation test measures the dependence between two variables.
- We pooled the gene expression values from all tissues and drew boxplots of them, with the outliers suppressed.
- We plan to analyze whether age, sex and cause of death have an effect on gene expression (the second part of our aim).
- Brain tissues have a correlation of over 90%. Since the level of correlation is very high across all tissues, this suggests that DNA has a major influence on the expression of cancer genes.
- This concludes our first aim and begins our second: we will see whether other factors also have an effect on this expression.
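The distinction drawn above between a difference in expression levels and a dependence between two variables can be illustrated with a toy example. In the sketch below, every gene is expressed at a much higher level in one hypothetical tissue than in the other, yet the two tissues rank the genes identically, so the Spearman correlation is 1. The values and tissue labels are made up, and the Spearman coefficient is computed by hand as the Pearson correlation of the ranks.

```python
import math


def ranks(values):
    """Return 1-based ranks, with tied values sharing the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-gene expression in two tissues (same genes, same order).
tissue_a = [1.0, 2.0, 4.0, 8.0, 16.0]
tissue_b = [10 * value + 5 for value in tissue_a]  # much higher levels, same ordering

rho = spearman(tissue_a, tissue_b)
mean_shift = sum(b - a for a, b in zip(tissue_a, tissue_b)) / len(tissue_a)

print(f"Spearman rho = {rho:.2f}")
print(f"mean difference in level (B - A) = {mean_shift:.1f}")
```

A high correlation between tissues therefore says that the same genes tend to be high or low everywhere. It does not say that the expression levels themselves are equal, which is the question a rank-sum test addresses.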
Female/Male
We can see that males and females are not equally represented. Next, we compared oncogene and TSG expression between males and females and found no significant result. Cancer genes are essential and expressed in every tissue, so we did not expect a difference in their expression between males and females. Then, we used wilcox.test to compare the expression of each gene between males and females. We found four genes (2 oncogenes and 2 TSG) expressed differently between males and females:
- TSG: PCDH11Y; BCORP1
- Oncogenes: ARSFP1; PARP4P1
We looked at the distribution of PCDH11Y:
We see that this gene is expressed mainly in males and not in females. Its name ("Protocadherin 11 Y-Linked") shows that PCDH11Y is located on the Y chromosome, which is why it is expressed only in males.
Cause of death
First, we plotted the number of people who died from each cause.
We found that between 4 and 9 people died of each of causes 0, 1 and 3, 22 people died of cause 4, and 61 died of cause 2.
In a second step, we created boxplots showing, on the left, the expression of the oncogenes according to the cause of death and, on the right, the expression of the TSG according to the cause of death.
- We observe no obvious difference and find no significant result.
- The expression of cancer genes therefore does not appear to be affected by the cause of death.
Class of age
We created a graph showing the number of people in each age group. There are between 2 and 6 people in the age groups of 20 to 29 and 70 to 79 years, whereas there are between 11 and 45 individuals in the groups between 40 and 69 years old. We therefore expected to find more significant results in the age groups between 40 and 69, because the sample size of these classes is larger than that of the others. A wilcox.test comparing TSG and oncogene expression within each age group showed a significant difference for the age groups from 40 to 69 years, but not for the other classes (because their sample size is too small). Tumor suppressor genes are more highly expressed than oncogenes in these age groups, which is consistent with the previous results. Then, we created boxplots to see whether there is a correlation between age and the expression of cancer genes. At a glance, these graphs suggest a negative correlation.
We used Spearman's correlation test to confirm our hypothesis:
- Correlation coefficient = -0.21
- P-value = 0.03
Spearman's correlation is negative: the expression of cancer genes decreases with age. After running the correlation test for each gene separately, we found that:
- the Spearman correlation test was significant for 22 oncogenes out of 1416
- the Spearman correlation test was significant for 13 TSG out of 889
We found the function of each of these genes through the GTEx database:
Most of these genes are involved in cell growth and in neuronal development. We selected two genes as examples:
- The PSIP1 gene, which is involved in neuroepithelial stem cell differentiation and neurogenesis. We plotted the expression of this gene against age using boxplots and observed that the expression of PSIP1 decreases with age (correlation test: -4.221471; P-value: 5.242974e-05).
- The PARM1 gene, which is involved in the regulation of telomeres and is therefore directly related to aging. We plotted the expression of this gene against age using boxplots and observed that the expression of PARM1 decreases with age (Spearman correlation test: -5.884783; P-value: 5.009928e-08).
So, we found that:
- Many of the genes are involved in cell proliferation.
- Many of them are also either expressed in the brain or involved in neuronal development.
- Age does seem to affect cancer gene expression.
=> Brain tissue loses its ability to proliferate with age.
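The gene-by-gene age analysis summarised above can be sketched as a loop that computes a Spearman correlation between age and expression for each gene and then decides which genes count as significant. Everything in the sketch is an assumption made for illustration. The ages, gene names and expression values are invented, the p-value comes from a simple permutation test rather than from R's cor.test, and a Bonferroni threshold is applied across genes even though the report does not say whether the per-gene tests were corrected.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks, with tied values sharing the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical donors (age in years) and per-gene expression; not GTEx data.
ages = [25, 35, 45, 45, 55, 55, 65, 75]
genes = {
    "gene_decreasing": [9.1, 8.7, 8.0, 7.6, 7.1, 6.5, 6.0, 5.2],
    "gene_flat": [5.0, 5.3, 4.9, 5.1, 5.2, 4.8, 5.1, 5.0],
}

alpha = 0.05
threshold = alpha / len(genes)  # Bonferroni across genes (an assumption, see above)

results = {}
for name, values in genes.items():
    rho = spearman(ages, values)
    p = permutation_p_value(ages, values)
    results[name] = {"rho": rho, "p": p, "significant": p < threshold}
    print(f"{name:16s} rho={rho:+.2f} p={p:.4f} significant={p < threshold}")
```

When more than two thousand genes are tested, the choice of correction matters a great deal: at an uncorrected 5% level, about one test in twenty is expected to come out significant by chance alone.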
After that, we began to work with CoLaus by choosing a phenotype to work on.
CoLaus - A phenotypic database
After working on the GTEx data, we decided it would be interesting to test whether cancer gene expression correlates with certain phenotypes. CoLaus is a population-based study investigating the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. The study comprises approximately 6000 people in Lausanne, with data on around 500 gene expressions. Although the focus of the study is not cancer, it was interesting to see whether we would find any significant correlations with oncogene and TSG expression. In other words, we tried to determine whether there were any new genetic determinants associated with cancer gene expression.
Gene expression in the CoLaus data was measured in a lymphoblastoid cell line. As this is a derived cell line, we may not capture environmental effects but only the effects of mutations, which we then checked against many different phenotypes. A summary of our findings (phenotype tested, type of test appropriate to the type of data, and examples of the significant genes) can be found in the following table.
For most of the phenotypes we tested, we found a few significant results. However, the significant genes were not particularly highly expressed and were almost never involved in the relevant phenotype: there was a correlation, but no causation. These results were partly to be expected because, as mentioned, the experimental design means that gene expression is affected only by DNA and not by the environment, so we do not expect to see any effect of the phenotype; the phenotype is not reflected in blood cells.
Concerning inflammation, we knew beforehand that there is a known correlation between cancer gene expression and inflammation. One of the genes that proved significant in our tests was TARM1 (T Cell-Interacting, Activating Receptor on Myeloid Cells).
When we visualised the gene expression in the form of a graph, we saw that there was only one individual with a significantly higher expression of this gene. This could ultimately be an indication that this specific sample may have something else going on, a confounding factor in the background.
Throughout our tests, this gene also turned out to be significant for other phenotypes, reinforcing our hypothesis that one sample may have introduced a bias in our results. It is interesting to note that the original correlation between this gene and oncogenes was a mere 0.22, and no significant correlation was found with tumour suppressor genes.
We also analysed the phenotype "prostate" and found one gene (ZNF299P) whose expression differs significantly between healthy subjects and prostate cancer patients. We looked at the distribution of this gene:
In fact, this gene is not expressed in healthy subjects but only in 2 cancer patients, and even in these 2 patients the expression is very low. The analysis was thus distorted by these outliers, so we cannot draw any scientific conclusion from it. Knowing that this gene is a zinc finger protein, we looked in the literature and saw that certain zinc finger proteins are known to be associated with prostate cancer.
Conclusion
TSG are more highly expressed than oncogenes, because they control cell division and we studied healthy individuals. This difference is especially significant in the brain cerebellar hemisphere, where there is little cell renewal. Sex and cause of death do not seem to affect expression, but age shows significant results. CoLaus is not a good database for studying the effect of phenotypes.