Difference between revisions of "GWAS project"

Latest revision as of 15:08, 14 September 2009

Project description

We offer projects on Genome Wide Association Studies (GWAS). These studies search for correlations between genetic markers (usually single nucleotide polymorphisms, or SNPs) and any measurable trait in a population of individuals. The motivation is that such associations could provide new candidates for causal variants in genes (or their regulatory elements) that play a role in the phenotype of interest. In the clinical context this may eventually lead to a better understanding of the genetic components of diseases and their risk factors.

We concentrate on the data generated for the Cohorte Lausanne (CoLaus). The CoLaus phenotypic dataset includes a large range of measurements, including extensive blood chemistry, anatomic and physiological measures, as well as parameters related to lifestyle and history. Genotypes have been measured for ~500,000 SNPs using Affymetrix 500k SNP arrays. Regressing the various phenotypes onto these SNPs has already revealed a number of highly significant associations (see http://serverdgm.unil.ch/bergmann/CBG_publications.html for our publications).

In this GWAS project students will gain first experience with the “standard” protocols for GWAS:

* genotype calling from the raw chip data and basic quality control
* principal component analysis (PCA) to detect and possibly correct for population stratification
* genotype imputation (using linkage disequilibrium information from HapMap)
* testing for association between a single SNP and continuous or categorical phenotypes
* global significance analysis and correction for multiple testing
* data presentation (e.g. using quantile-quantile and Manhattan plots)
* cross-replication and meta-analysis for integration of association data from multiple studies
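Two of the steps listed above, single-SNP association testing and correction for multiple testing, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data are simulated (not CoLaus data), the function name `snp_association` is hypothetical, the p-value uses a normal approximation to the t statistic (reasonable for large samples), and Bonferroni is used as the simplest multiple-testing correction. It uses only the Python standard library; real analyses would rely on dedicated GWAS software.

```python
import math
import random

def snp_association(genotypes, phenotype):
    """Least-squares regression of a phenotype on allele counts (0, 1 or 2).

    Returns (beta, standard_error, two_sided_p). The p-value uses a normal
    approximation to the t statistic, which is adequate for large samples.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotype variation")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal tail
    return beta, se, p

# Simulated toy data: 1000 individuals, 50 SNPs with allele frequency 0.3.
# SNP 0 is made causal with a per-allele effect of 0.5 (an arbitrary choice).
rng = random.Random(1)
n_ind, n_snp = 1000, 50
snps = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_ind)]
        for _ in range(n_snp)]
pheno = [0.5 * g + rng.gauss(0, 1) for g in snps[0]]

results = [snp_association(s, pheno) for s in snps]
bonferroni = 0.05 / n_snp  # family-wise 5% level over n_snp tests
hits = [j for j, (_, _, p) in enumerate(results) if p < bonferroni]
print("Bonferroni threshold:", bonferroni, "significant SNPs:", hits)
```

With ~500,000 SNPs the same Bonferroni logic gives a per-test threshold of 0.05 / 500,000 = 1e-7, which is why only very strong associations reach genome-wide significance.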

Many GWAS performed in recent years have shown that even well-powered (meta-)studies with many thousands (and even tens of thousands) of samples could at best identify a few (dozen) candidate loci with highly significant associations. While many of these associations have been replicated in independent studies, each locus explains only a tiny (<1%) fraction of the genetic variance of the phenotype (as predicted from twin studies). Remarkably, models that pool all significant loci into a single predictive scheme still fall short by at least one order of magnitude in explained variance. Thus, while GWAS already provide new candidates for disease-associated genes and potential drug targets, very few, if any, of the currently identified (sets of) genotypic markers are of any practical use for assessing the risk of, or predisposition to, any of the complex diseases that have been studied. Different solutions to this apparent enigma have been proposed:

* other variants like Copy Number Variations (CNVs) or epigenetics may play an important role
* interactions between genetic variants (GxG) or with the environment (GxE)
* many causal variants may be rare and/or poorly tagged by the measured SNPs
* many causal variants may have very small effect sizes
* overestimation of heritabilities from twin studies

If time permits, the student may develop his or her own research in any of these new directions.
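The small per-locus share of explained variance discussed above follows directly from the additive model. As a minimal sketch, assuming Hardy-Weinberg equilibrium and a purely additive per-allele effect, the allele count of a SNP with minor allele frequency p has variance 2p(1-p), so the SNP explains 2p(1-p)β² of the phenotypic variance. The function name and all numbers below are hypothetical and chosen only for illustration.

```python
def variance_explained(maf, beta, phenotype_variance=1.0):
    """Fraction of phenotypic variance explained by one additive SNP.

    Assumes Hardy-Weinberg equilibrium, so the allele count has variance
    2 * maf * (1 - maf); beta is the per-allele effect on the phenotype.
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / phenotype_variance

# Hypothetical locus: allele frequency 0.3 and an effect of 0.1 phenotype
# standard deviations per allele (illustrative values, not measured ones).
single = variance_explained(0.3, 0.1)

# Twenty such independent loci pooled into one additive predictor.
pooled = 20 * single

print(f"one locus: {single:.2%}, twenty loci: {pooled:.2%}")
```

Under these assumed values a single locus explains well under 1% of the variance, and even twenty such loci together remain far below the heritabilities typically estimated from twin studies.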

More Advanced Statistical Methodology

An important and widely used approach to dealing with cryptic population structure is described in PricePC; key references on genotype imputation are ServinImputation and MarchiniImputation.
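To illustrate how principal components can expose population structure, the following sketch simulates two populations with different allele frequencies and extracts the leading principal component of the standardized genotype matrix by power iteration. This is not the EIGENSOFT implementation: the data are simulated, the function names are hypothetical, each SNP is simply scaled by its sample standard deviation, and only the first component is computed. It uses only the Python standard library so that it runs without extra dependencies.

```python
import math
import random

def standardize(genotypes):
    """Center each SNP column and scale it by its sample standard deviation."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in col) / n)
        if sd > 0:  # drop monomorphic SNPs, which carry no information
            columns.append([(g - mean) / sd for g in col])
    # Transpose back to an individuals x SNPs layout.
    return [list(row) for row in zip(*columns)]

def first_pc(X, iterations=50, seed=0):
    """Leading eigenvector of X X^T (one score per individual), by power iteration."""
    n, m = len(X), len(X[0])
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iterations):
        # Apply X^T and then X, which multiplies v by X X^T without forming it.
        w = [sum(X[i][j] * v[i] for i in range(n)) for j in range(m)]
        v = [sum(X[i][j] * w[j] for j in range(m)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

# Simulated data: two populations of 30 individuals, 200 SNPs, with allele
# frequencies that differ by 0.3 at every SNP (a deliberately strong contrast).
rng = random.Random(2)
n_per_pop, n_snp = 30, 200
freq_a = [rng.uniform(0.1, 0.5) for _ in range(n_snp)]
freq_b = [f + 0.3 for f in freq_a]

def sample(freqs):
    """Draw one individual's allele counts given per-SNP allele frequencies."""
    return [(rng.random() < f) + (rng.random() < f) for f in freqs]

genos = ([sample(freq_a) for _ in range(n_per_pop)]
         + [sample(freq_b) for _ in range(n_per_pop)])
pc1 = first_pc(standardize(genos))
print("mean PC1 score, population A:", sum(pc1[:n_per_pop]) / n_per_pop)
print("mean PC1 score, population B:", sum(pc1[n_per_pop:]) / n_per_pop)
```

In this simulation the two populations receive PC1 scores of opposite sign. Including such scores as covariates in the association regression is one common way to correct for stratification.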

A powerful approach to dealing with strain structure or relatedness between individuals is described in KangEMMA.
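Mixed-model approaches to relatedness take a pairwise relatedness (kinship) matrix as input. The sketch below computes one simple choice, the average identity-by-state (IBS) allele sharing between each pair of individuals. It does not implement the mixed model itself, and the function name and simulated data are assumptions made for illustration only.

```python
import random

def ibs_kinship(genotypes):
    """Pairwise identity-by-state sharing from allele counts (0, 1 or 2).

    Two individuals share 2 - |a - b| alleles at a SNP; the matrix entry is
    the shared fraction averaged over all SNPs, so it lies between 0 and 1.
    """
    n, m = len(genotypes), len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = shared / (2 * m)
    return K

# Simulated data: 10 individuals, 300 SNPs with allele frequency 0.3.
# Individual 1 is an exact genetic copy of individual 0 (e.g. a duplicate sample).
rng = random.Random(3)
genos = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(300)]
         for _ in range(10)]
genos[1] = list(genos[0])

kin = ibs_kinship(genos)
print("duplicate pair:", kin[0][1], "unrelated pair:", round(kin[0][2], 3))
```

A matrix of this kind is also useful in basic quality control, since duplicated samples and close relatives stand out as unusually high off-diagonal entries.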

Software

PLINK is an excellent data handling tool, and implements many useful statistical methods. It's the Swiss Army Knife for GWAS.

EIGENSOFT is widely used for population structure analysis and correction.

IMPUTE and SNPTEST, or MACH and ProbABEL, or BimBam, can all be used to perform more sophisticated model-based genotype imputation and association testing.

QUICKTEST is our own software for association testing using uncertain genotypes. For quantitative trait analysis, we think it is faster and better than SNPTEST.
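One simple way to test for association with uncertain (for example imputed) genotypes is to regress the phenotype on the expected allele count, often called the dosage. The sketch below illustrates only this idea; it is not how QUICKTEST or SNPTEST work internally, and the function names, the simulated data, and the toy genotype probabilities are all assumptions.

```python
import random

def expected_dosage(probs):
    """Expected allele count from genotype probabilities (P(0), P(1), P(2))."""
    p0, p1, p2 = probs
    return p1 + 2.0 * p2

def regression_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

# Simulated data: 500 individuals, true genotypes with allele frequency 0.3 and
# a per-allele effect of 0.5. The toy genotype probabilities put 0.8 on the
# true genotype and 0.1 on each of the other two.
rng = random.Random(4)
true_genos = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(500)]
pheno = [0.5 * g + rng.gauss(0, 1) for g in true_genos]
probs = [tuple(0.8 if k == g else 0.1 for k in range(3)) for g in true_genos]

dosages = [expected_dosage(p) for p in probs]
beta = regression_slope(dosages, pheno)
print("estimated effect per unit of dosage:", round(beta, 3))
```

Because the toy probabilities compress the dosages towards 1, the estimated slope is on the dosage scale and need not equal the simulated per-allele effect. Methods that model the genotype uncertainty explicitly are designed to handle this more carefully.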

References

BaldingTutorial pmid=16983374
McCarthyReview pmid=18398418
FlintReview pmid=15803197
PricePC pmid=16862161
ServinImputation pmid=17676998
MarchiniImputation pmid=17572673
KangEMMA pmid=18385116


@@ Line 1: / Line 1: @@
-We offer project on [[Genome Wide Association Studies]] (GWAS).
-These studies search for associations between genetic markers (usually Single Nucleotide Polymorphisms, short SNPs) and any measurable trait in a population of individuals. We concentrate on the data generated for the Cohorte Lausanne (CoLaus). The CoLaus phenotypic dataset includes a large range of measurements, including extensive blood chemistry, anatomic and physiological measures, as well as parameters related to life style and history. Genotypes have been measured for ~500`000 SNPs using Affymetrix 500k SNP arrays. Regressing the various phenotypes onto these SNPs has already revealed a number of highly significant associations (see http://serverdgm.unil.ch/bergmann/CBG_publications.html for our publications).
+== Project description ==
+We offer projects on [[Genome Wide Association Studies]] (GWAS). These studies search for correlations between genetic markers (usually Single Nucleotide Polymorphisms, short SNPs) and any measurable trait in a population of individuals. The motivation is that such associations could provide new candidates for causal variants in genes (or their regulatory elements) that play a role for the phenotype of interest. In the clinical context this may eventually lead to a better understanding of the genetic components of diseases and their risk factors.
+We concentrate on the data generated for the Cohorte Lausanne (CoLaus). The CoLaus phenotypic dataset includes a large range of measurements, including extensive blood chemistry, anatomic and physiological measures, as well as parameters related to life style and history. Genotypes have been measured for ~500`000 SNPs using Affymetrix 500k SNP arrays. Regressing the various phenotypes onto these SNPs has already revealed a number of highly significant associations (see http://serverdgm.unil.ch/bergmann/CBG_publications.html for our publications).
 In this GWAS project students will gain first experience with the “standard” protocols for GWAS:
-.	genotype calling from the raw chip-data and basic quality control
+* genotype calling from the raw chip-data and basic quality control
-.	principle component analysis (PCA) to detect and possibly correct for population stratification
+* principle component analysis (PCA) to detect and possibly correct for population stratification
-.	genotype imputation (using linkage disequilibrium information from HapMap)
+* genotype imputation (using linkage disequilibrium information from HapMap)
-.	testing for association between a single SNP and continuous or categorical phenotypes
+* testing for association between a single SNP and continuous or categorical phenotypes
-.	global significance analysis and correction for multiple testing
+* global significance analysis and correction for multiple testing
-.	data presentation (e.g. using quantile-quantile and Manhattan plots)
+* data presentation (e.g. using quantile-quantile and Manhattan plots)
-.	cross-replication and meta-analysis for integration of association data from multiple studies
+* cross-replication and meta-analysis for integration of association data from multiple studies
 From the many GWAS that were performed in the last years it became apparent that even well-powered (meta-)studies with many thousands (and even ten-thousands) of samples could at best identify a few (dozen) candidate loci with highly significant associations. While many of these associations have been replicated in independent studies, each locus explains but a tiny (<1%) fraction of the genetic variance of the phenotype (as predicted from twin-studies). Remarkably, models that pool all significant loci into a single predictive scheme still miss out by at least one order of magnitude in explained variance. Thus, while GWAS already today provide new candidates for disease-associated genes and potential drug targets, very few – if any – of the currently identified (sets of) genotypic markers are of any practical use for accessing risk for predisposition to any of the complex diseases that have been studied. Different solutions to this apparent enigma have been proposed:
-.	other variants like Copy Number Variations (CNVs) or epigenetics may play an important role
+* other variants like Copy Number Variations (CNVs) or epigenetics may play an important role
-.	interactions between genetic variants (GxG) or with the environment (GxE)
+* interactions between genetic variants (GxG) or with the environment (GxE)
-.	many causal variants may be rare and/or poorly tagged by the measured SNPs
+* many causal variants may be rare and/or poorly tagged by the measured SNPs
-.	many causal variants may have very small effect sizes
+* many causal variants may have very small effect sizes
-.	overestimation of heritabilities from twin-studies
+* overestimation of heritabilities from twin-studies
 If time permits the student may develop his or her own research towards any of these new directions.
 == Further reading ==
