Difference between revisions of "Pascal"

Line 6: Line 6:
  
  
[[File:PascalFigure1.pdf]]
+
[[File:PascalFigure1.jpg]]
 
'''Overview of the methodology to compute gene and pathway scores.'''  
 
'''Overview of the methodology to compute gene and pathway scores.'''  
 
a) We compute gene scores by aggregating SNP p-values from a GWAS meta-analysis (without the need for individual genotypes), while correcting for linkage disequilibrium (LD) structure. To this end, we use numerical and analytic solutions to compute gene p-values efficiently and accurately given LD information from a reference population (e.g. one provided by the 1000 Genomes Project40). Two options are available: the max and sum of chi-squared statistics, which are based on the most significant SNP and the average association signal across the region, respectively.  
 
a) We compute gene scores by aggregating SNP p-values from a GWAS meta-analysis (without the need for individual genotypes), while correcting for linkage disequilibrium (LD) structure. To this end, we use numerical and analytic solutions to compute gene p-values efficiently and accurately given LD information from a reference population (e.g. one provided by the 1000 Genomes Project40). Two options are available: the max and sum of chi-squared statistics, which are based on the most significant SNP and the average association signal across the region, respectively.  
 
b) We use external databases to define gene sets for each reported pathway. We then compute pathway scores by combining the scores of genes that belong to the same pathways, i.e. gene sets. The fast gene scoring method allows us to dynamically recalculate gene scores by aggregating SNP p-values across pathway genes that are in LD and thus cannot be treated independently. This amounts to fusing the genes and computing a new score that takes the full LD structure of the corresponding locus into account. We evaluate pathway enrichment of high-scoring (potentially fused) genes using one of two parameter-free procedures (chi-square or empirical score), avoiding any p-value cutoffs inherent to standard binary enrichment tests.
 
b) We use external databases to define gene sets for each reported pathway. We then compute pathway scores by combining the scores of genes that belong to the same pathways, i.e. gene sets. The fast gene scoring method allows us to dynamically recalculate gene scores by aggregating SNP p-values across pathway genes that are in LD and thus cannot be treated independently. This amounts to fusing the genes and computing a new score that takes the full LD structure of the corresponding locus into account. We evaluate pathway enrichment of high-scoring (potentially fused) genes using one of two parameter-free procedures (chi-square or empirical score), avoiding any p-value cutoffs inherent to standard binary enrichment tests.

Revision as of 14:39, 15 June 2015

Pascal (Pathway scoring algorithm) is an easy-to-use tool for gene scoring and pathway analysis from GWAS results. Pascal uses external data to estimate linkage disequilibrium. Therefore, the user only needs to supply genome wide SNP p-values. Pascal then derives p-values for genes and predefined pathways. Pascal doesn’t use Monte-Carlo simulation to derive gene p-values. This leads to increased speed and accuracy. This speed in the gene scoring is then leveraged to control the false positive rate in pathway scoring. For pathway scoring we implemented and tested enrichment strategies that compared very favorably compared to hypergeometric enrichment. This comparison was done on a large collection of GWAS results giving us confidence to recommend Pascal for downstream analysis of GWAS results. Pascal is mainly written in Java and has been tested on Unix systems and Mac OsX.

You can download the package here. (Note: This might take a while because it included the 1KG-EUR data.)


PascalFigure1.jpg Overview of the methodology to compute gene and pathway scores. a) We compute gene scores by aggregating SNP p-values from a GWAS meta-analysis (without the need for individual genotypes), while correcting for linkage disequilibrium (LD) structure. To this end, we use numerical and analytic solutions to compute gene p-values efficiently and accurately given LD information from a reference population (e.g. one provided by the 1000 Genomes Project40). Two options are available: the max and sum of chi-squared statistics, which are based on the most significant SNP and the average association signal across the region, respectively. b) We use external databases to define gene sets for each reported pathway. We then compute pathway scores by combining the scores of genes that belong to the same pathways, i.e. gene sets. The fast gene scoring method allows us to dynamically recalculate gene scores by aggregating SNP p-values across pathway genes that are in LD and thus cannot be treated independently. This amounts to fusing the genes and computing a new score that takes the full LD structure of the corresponding locus into account. We evaluate pathway enrichment of high-scoring (potentially fused) genes using one of two parameter-free procedures (chi-square or empirical score), avoiding any p-value cutoffs inherent to standard binary enrichment tests.