Module 4: How does feature selection impact integrative clustering analysis?

Revision as of 14:31, 13 October 2015 by Sven
  • Title: "How does feature selection impact integrative clustering analysis?"
  • Paper to be examined: “The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups”, Nature 486(7403):346-52 (2012)[1]
  • Key claim of the paper: "We have generated a robust, population-based molecular subgrouping of breast cancer based on multiple genomic views. [...] The joint clustering of CNAs and gene expression profiles further resolves the considerable heterogeneity of the expression-only subgroups."
  • Data and Code
  • Schedule:
    • H1: General introduction to the paper/motivation
    • H2: Write code to import the data and practice with the iClusterPlus R package using its vignette example
    • H3: Reproduce the results from Figure 4 on one or more subsamples of the data
    • H4-5: Write code to import the second dataset and reproduce the clustering results
    • H6: Discussion: "Which features discriminate the resulting clusters? Do we see the issue? How can we improve?"
    • H7-8: Based on the discussion, modify the feature selection and redo the analyses on one (or two) datasets
    • H9: Summarize the results (e.g. on this wiki)
  • Key bioinformatics concepts of this module:
    • Feature selection (and its importance for cluster analyses)
    • Integrative analysis
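The two key concepts can be illustrated with a small simulation. The sketch below is written in Python for self-containment, while the module itself works in R. It does not use iClusterPlus, which fits a joint latent-variable model across data types. Instead it uses a deliberately naive stand-in for integrative clustering:

1. Keep the k most variable features within each of two simulated "views" of the same samples (loosely, expression and copy number).
2. Concatenate the selected features across views.
3. Run two-cluster k-means on the result.

The simulated data, the variance filter, and every function name (`simulate_view`, `top_variance_features`, `kmeans2`, `agreement`) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_view(groups, n_informative, n_noise, shift, seed):
    """Hypothetical toy data view: one row per sample. The first n_informative
    features are shifted by `shift` in group 1; the rest are N(0, 1) noise."""
    rng = random.Random(seed)
    rows = []
    for g in groups:
        informative = [rng.gauss(shift * g, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append(informative + noise)
    return rows


def top_variance_features(rows, k):
    """Unsupervised filter: indices of the k columns with the largest sample variance."""
    n_cols = len(rows[0])
    variances = [statistics.variance([r[j] for r in rows]) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: variances[j], reverse=True)[:k]


def subset(rows, cols):
    """Keep only the given columns of every row."""
    return [[r[j] for j in cols] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(rows, n_iter=50):
    """Two-cluster k-means (Lloyd's algorithm). Deterministic initialisation:
    sample 0 and the sample farthest from it."""
    far = max(range(len(rows)), key=lambda i: sq_dist(rows[i], rows[0]))
    centroids = [list(rows[0]), list(rows[far])]
    labels = None
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        new_labels = [min((0, 1), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def agreement(labels, truth):
    """Fraction of samples whose cluster matches the simulated group,
    taking the better of the two possible label permutations."""
    same = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    return max(same, 1.0 - same)


groups = [0] * 20 + [1] * 20
# Two hypothetical "views" of the same 40 samples (think expression and copy number).
expr = simulate_view(groups, n_informative=3, n_noise=1000, shift=5.0, seed=1)
cna = simulate_view(groups, n_informative=3, n_noise=1000, shift=5.0, seed=2)

# Naive integration (not iClusterPlus): select features per view, then concatenate.
expr_sel = top_variance_features(expr, k=3)
cna_sel = top_variance_features(cna, k=3)
joint_selected = [a + b for a, b in zip(subset(expr, expr_sel), subset(cna, cna_sel))]
joint_all = [a + b for a, b in zip(expr, cna)]

acc_selected = agreement(kmeans2(joint_selected), groups)
acc_all = agreement(kmeans2(joint_all), groups)
print("selected features per view:", sorted(expr_sel), sorted(cna_sel))
print(f"agreement with simulated groups: selected={acc_selected:.2f}, all={acc_all:.2f}")
```

Running the script prints the selected feature indices and both agreement scores. In this simulation the informative features (indices 0-2 in each view) are far more variable than the noise. The variance filter therefore keeps exactly those features, and clustering on them recovers the simulated groups. The all-features score depends on how strongly the 2,000 noise features swamp the signal.

Real data rarely separate this cleanly, and the most variable features need not be the ones that drive biologically meaningful clusters. That is why the feature selection step is worth revisiting in H7-8.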