Difference between revisions of "Pascal"

 
</teaser>
 
== Rigorous gene and pathway analysis of GWAS ==
 
'''Pascal (Pathway scoring algorithm) is an easy-to-use tool for gene scoring and pathway analysis from GWAS results.''' Pascal uses external reference data to estimate linkage disequilibrium (LD), so the user only needs to supply genome-wide SNP p-values. From these, Pascal derives p-values for genes and for predefined pathways. Gene p-values are computed without Monte Carlo simulation, which makes the computation both faster and more accurate. This speed in gene scoring is then leveraged to control the false-positive rate in pathway scoring. For pathway scoring, we implemented and tested enrichment strategies that compared very favorably with hypergeometric enrichment. This comparison was done on a large collection of GWAS results, which gives us confidence in recommending Pascal for downstream analysis of GWAS results.
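Estimating LD from external data can be illustrated with a minimal Python sketch (Pascal itself is written in Java, and this is not its code). It assumes that LD between two SNPs is measured as the Pearson correlation r of their genotype dosages (0, 1 or 2 copies of one allele) across the individuals of a reference panel such as 1KG-EUR. The function names `ld_r` and `ld_matrix` and the toy in-memory panel are hypothetical; a real panel would be read from genotype files.

```python
import math


def ld_r(dosages_a, dosages_b):
    """Pearson correlation r between two SNPs, from their genotype dosages
    (0, 1 or 2 copies of one allele) in the same reference individuals."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equally long dosage vectors (>= 2 individuals)")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic SNP in the reference panel: r is undefined")
    return cov / math.sqrt(var_a * var_b)


def ld_matrix(panel):
    """Symmetric LD (correlation) matrix for SNPs given as rows of dosages."""
    m = len(panel)
    return [[1.0 if i == j else ld_r(panel[i], panel[j]) for j in range(m)]
            for i in range(m)]


# Hypothetical toy panel: 3 SNPs x 6 reference individuals.
panel = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 1],
    [2, 1, 0, 1, 2, 0],  # mirror image of SNP 1, so r(SNP 1, SNP 3) = -1
]
R = ld_matrix(panel)
```

The resulting matrix R is the kind of LD information that the gene-scoring step needs; only SNP p-values have to come from the user's own study.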
 
Pascal is mainly written in Java and has been tested on Unix systems and Mac OS X.
  
''Download''
* '''[http://www2.unil.ch/cbg/images/3/3d/PASCAL.zip Pascal package]''' (the download may take a while because it includes the 1KG-EUR reference data)
*  '''[[PascalTestData | Test data]]''' (Additional data that was used for evaluation in the paper)
 
 
''Reference''
* '''Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics'''. ('''[http://regulatorycircuits.org/data/papers/LamparterMarbach2016.pdf PDF]''')<br> Lamparter D*, Marbach D*, Rico R, Kutalik Z, and Bergmann S. <br> [http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004714 ''PLoS Computational Biology'' 12, e1004714, 2016.]
  
----
[[File:PascalFigure1.jpg|500px|left]]
'''Figure: Overview of the methodology used to compute gene and pathway scores'''
  
('''a''') We compute gene scores by aggregating SNP p-values from a GWAS meta-analysis (without the need for individual genotypes) while correcting for linkage disequilibrium (LD) structure. To this end, we use numerical and analytic solutions to compute gene p-values efficiently and accurately, given LD information from a reference population. Two options are available: the maximum and the sum of chi-squared statistics, which are based on the most significant SNP and on the average association signal across the region, respectively.
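The sum option in (a) can be sketched as follows, under stated assumptions. Each two-sided SNP p-value is turned into a 1-df chi-squared statistic z², and the statistics are summed over the gene's SNPs. If z ~ N(0, R) under the null, where R is the LD matrix from the reference population, this sum has mean tr(R) and variance 2·tr(R²). Instead of the exact numerical and analytic solutions described above, this sketch swaps in a simpler moment-matching (Satterthwaite-type) approximation by a scaled chi-squared distribution, which can be inaccurate far in the tail. The helper names `chi2_sf` and `sum_chi2_gene_pvalue` are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """P(chi2_df > x) for real df > 0, via the regularized upper incomplete
    gamma function Q(df/2, x/2) (series or continued fraction)."""
    a, y = df / 2.0, x / 2.0
    if y <= 0:
        return 1.0
    log_prefactor = -y + a * math.log(y) - math.lgamma(a)
    if y < a + 1:
        # Series for the lower function P(a, y); then Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(1000):
            denom += 1
            term *= y / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, y), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = y + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def sum_chi2_gene_pvalue(snp_pvalues, ld):
    """Approximate gene p-value for the sum-of-chi-squares statistic
    (hypothetical helper; moment matching, not Pascal's exact method)."""
    if any(not 0 < p <= 1 for p in snp_pvalues):
        raise ValueError("SNP p-values must lie in (0, 1]")
    m = len(snp_pvalues)
    # Two-sided p-value -> 1-df chi-squared statistic z^2.
    stat = sum(NormalDist().inv_cdf(p / 2) ** 2 for p in snp_pvalues)
    # Null moments of z'z with z ~ N(0, R): mean tr(R), variance 2 tr(R^2).
    tr_r = sum(ld[i][i] for i in range(m))
    tr_r2 = sum(ld[i][j] ** 2 for i in range(m) for j in range(m))
    scale = tr_r2 / tr_r    # c in: stat ~ c * chi2_df
    df = tr_r ** 2 / tr_r2  # chosen so the first two moments match
    return chi2_sf(stat / scale, df)


independent = [[1.0, 0.0], [0.0, 1.0]]
perfect_ld = [[1.0, 1.0], [1.0, 1.0]]
print(sum_chi2_gene_pvalue([0.01, 0.01], independent))  # smaller than 0.01
print(sum_chi2_gene_pvalue([0.01, 0.01], perfect_ld))   # close to 0.01
```

With R equal to the identity the approximation is exact (a chi-squared with one degree of freedom per SNP); with all SNPs in perfect LD it collapses to a single 1-df test, so the same association signal is not counted twice.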
  
('''b''') We use external databases to define a gene set for each reported pathway. We then compute pathway scores by combining the scores of genes that belong to the same pathway (i.e., gene set). Because gene scoring is fast, we can dynamically recalculate gene scores by aggregating SNP p-values across pathway genes that are in LD with each other and thus cannot be treated as independent. This amounts to fusing such genes and computing a new score that takes the full LD structure of the corresponding locus into account. We evaluate pathway enrichment of high-scoring (potentially fused) genes using one of two parameter-free procedures (chi-squared or empirical score), which avoids the p-value cutoffs inherent to standard binary enrichment tests.
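The two enrichment procedures in (b) can be illustrated with a minimal sketch. It assumes, without this being stated in the text above, that gene p-values are converted to 1-df chi-squared statistics and summed over a pathway's genes. The chi-squared procedure then compares that sum with a chi-squared distribution with one degree of freedom per gene, and the empirical procedure compares it with sums over randomly drawn gene sets of the same size. The sketch treats genes as independent and skips the gene-fusion step described above; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def chi2_sf_int(x, k):
    """P(chi2_k > x) for an integer number of degrees of freedom k >= 1.
    Closed form; exp(-x/2) underflows for x above roughly 1490, so very
    large statistics return 0.0 in this sketch."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        term = total = math.exp(-half)  # i = 0 term of the Poisson sum
        for i in range(1, k // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))  # the k = 1 part
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (k - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total


def gene_chi2(p):
    """Two-sided gene p-value -> 1-df chi-squared statistic."""
    if not 0 < p <= 1:
        raise ValueError("gene p-values must lie in (0, 1]")
    return NormalDist().inv_cdf(p / 2) ** 2


def chi2_pathway_pvalue(pathway_pvalues):
    """Sum of gene chi-squared statistics vs chi2 with one df per gene."""
    stat = sum(gene_chi2(p) for p in pathway_pvalues)
    return chi2_sf_int(stat, len(pathway_pvalues))


def empirical_pathway_pvalue(pathway_pvalues, all_gene_pvalues,
                             n_samples=10_000, seed=0):
    """Share of random gene sets of the same size whose summed chi-squared
    statistic is at least the pathway's (add-one corrected)."""
    rng = random.Random(seed)
    k = len(pathway_pvalues)
    observed = sum(gene_chi2(p) for p in pathway_pvalues)
    background = [gene_chi2(p) for p in all_gene_pvalues]
    hits = sum(sum(rng.sample(background, k)) >= observed
               for _ in range(n_samples))
    return (hits + 1) / (n_samples + 1)


genome = [0.5] * 95 + [1e-6] * 5  # hypothetical gene p-values
pathway = [1e-6] * 5              # a pathway made of the 5 strongest genes
print(chi2_pathway_pvalue(pathway))
print(empirical_pathway_pvalue(pathway, genome, n_samples=2_000))
```

Both procedures use every gene's score rather than a significance cutoff, in line with the parameter-free design described above.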

Revision as of 13:01, 4 April 2016


