Module 4: How does feature selection impact integrative clustering analysis?
- Title: "How to make valid prognostic models with gene expression signatures?"
- Paper to be examined: "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups", Nature 486(7403):346-52 (2012) [1]
- Key claim of the paper: "We have generated a robust, population-based molecular subgrouping of breast cancer based on multiple genomic views. [...] The joint clustering of CNAs and gene expression profiles further resolves the considerable heterogeneity of the expression-only subgroups."
- Data and Code
- Schedule:
  - H1: General introduction to the paper and its motivation
  - H2: Write code to import the data and practice with the iClusterPlus R package using its vignette example
  - H3: Reproduce the results from Figure 4 on subsample(s) of the data
  - H4-5: Write code to import the second dataset and reproduce the clustering results
  - H6: Discussion: "What features discriminate the resulting clusters? Do we see the issue? How can we improve?"
  - H7-8: Based on the discussion, modify the feature selection and redo the analyses on one (or two) datasets
  - H9: Summarize the results (e.g. on this wiki)
- Key bioinformatics concepts of this module:
  - Feature selection (and its importance for cluster analyses)
  - Integrative analysis
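
As a minimal illustration of the feature selection concept, the sketch below ranks features by their variance across samples and keeps the top k before any clustering is run. This is one common unsupervised filter and is not necessarily the selection used in the paper. The function name `top_variance_features`, the gene names, and all values are made up for illustration. It is written in plain Python for brevity, whereas the module's own analyses use R.

```python
from statistics import pvariance


def top_variance_features(matrix, feature_names, k):
    """Return the names of the k features with the largest variance.

    matrix: list of rows, one row per feature and one column per sample.
    """
    if len(matrix) != len(feature_names):
        raise ValueError("one name per feature row is required")
    variances = [pvariance(row) for row in matrix]
    # Rank feature indices by decreasing variance; ties keep input order.
    ranked = sorted(range(len(matrix)), key=lambda i: -variances[i])
    return [feature_names[i] for i in ranked[:k]]


# Toy expression matrix (hypothetical values): 4 genes x 6 samples.
expr = [
    [5.0, 5.1, 4.9, 5.0, 5.1, 5.0],  # gene_a: nearly constant
    [1.0, 1.2, 0.9, 6.0, 6.1, 5.9],  # gene_b: splits samples 3 vs 3
    [3.0, 3.5, 2.5, 3.2, 2.8, 3.0],  # gene_c: moderate noise
    [0.5, 7.0, 0.4, 0.6, 7.2, 0.5],  # gene_d: driven by two samples
]
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]

selected = top_variance_features(expr, genes, k=2)
print(selected)
```

In this toy example the filter ranks `gene_d` above `gene_b`, although `gene_d` is high in only two samples. Any clustering run on the selected features inherits that choice, which is the kind of effect the H6 discussion is meant to examine.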