Module 3: How to make valid prognostic models with gene expression signatures?

Title: "How to make valid prognostic models when data contain many features like gene expression signatures?"

Paper to be examined / reproduced:

“Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification”, JNCI J Natl Cancer Inst (2003) 95 (1): 14-18; doi: 10.1093/jnci/95.1.14 [1] by R. Simon, M. D. Radmacher, K. Dobbin, L. M. McShane.

Richard Simon's team is at the Biometric Research Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD (http://linus.nci.nih.gov/index.html).

Key claim of the paper: "Many publications report erroneous classification performances due to incorrect application of cross-validation methodology."

Data and Code

The study is based on simulated data with known results and shows the impact of variations in the cross-validation implementation with a well-chosen "toy example".
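
To give a feel for this kind of toy example, here is a minimal sketch (written in Python for compactness, although the course work itself is done in R). It simulates pure-noise expression data with arbitrary class labels. It then compares two leave-one-out cross-validation variants: gene selection done once on all samples, and gene selection repeated inside each fold. The sample size, number of genes, number of selected genes, the mean-difference gene ranking, the nearest-centroid classifier and all function names are assumptions chosen for brevity, not the paper's exact settings.

```python
import random


def simulate_noise_data(n_samples, n_genes, rng):
    """Pure-noise expression matrix with arbitrary, balanced class labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, rows, genes):
    """Per-class mean of each gene in `genes`, computed over `rows` only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    return means


def select_genes(X, y, rows, k):
    """Top-k genes by absolute difference in class means (a simplified ranking)."""
    all_genes = range(len(X[0]))
    means = class_means(X, y, rows, all_genes)
    ranked = sorted(all_genes, key=lambda g: abs(means[0][g] - means[1][g]), reverse=True)
    return ranked[:k]


def predict(X, y, train_rows, genes, test_row):
    """Nearest-centroid prediction for one sample, using the selected genes."""
    means = class_means(X, y, train_rows, genes)

    def sq_dist(label):
        return sum((X[test_row][g] - m) ** 2 for g, m in zip(genes, means[label]))

    return 0 if sq_dist(0) <= sq_dist(1) else 1


def loocv_error(X, y, k, select_inside):
    """Leave-one-out error; gene selection is inside the loop only if requested."""
    all_rows = list(range(len(X)))
    genes_all = select_genes(X, y, all_rows, k)  # sees every sample, test ones included
    errors = 0
    for test_row in all_rows:
        train_rows = [i for i in all_rows if i != test_row]
        genes = select_genes(X, y, train_rows, k) if select_inside else genes_all
        errors += predict(X, y, train_rows, genes, test_row) != y[test_row]
    return errors / len(X)


rng = random.Random(2003)  # fixed seed so the run is repeatable
n_repeats = 5              # assumed small number of simulated data sets
biased, honest = [], []
for _ in range(n_repeats):
    X, y = simulate_noise_data(n_samples=20, n_genes=1000, rng=rng)
    biased.append(loocv_error(X, y, k=10, select_inside=False))
    honest.append(loocv_error(X, y, k=10, select_inside=True))

mean_biased = sum(biased) / n_repeats
mean_honest = sum(honest) / n_repeats
print("Mean LOOCV error, genes selected on all samples:", mean_biased)
print("Mean LOOCV error, genes selected inside each fold:", mean_honest)
```

Because the labels carry no information, an honest error estimate should sit near 50%. Selecting genes before cross-validation typically reports a much lower, overly optimistic error.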

Approximate Schedule:

- H1: General introduction to the field and to useful terms
- H2: Reading sections of the paper, extracting the main messages and the details of what exactly was done; discussion
- H3-6: Programming by students to reproduce the results of the paper, at least partially. Writing of a short report to be mailed to the teacher along with the code used.
- H7: Presentation of the results obtained in the course. Discussion of the take-home messages and of possible extensions to the study.
- H8-9: Programming to adjust the code used previously or to explore extensions of the investigation. Writing of a revised short report to be mailed to the teacher along with the code used.

Key bioinformatics concepts of this module:
- Prediction models (classifiers)
- Cross-validation

Requirement for students of this module:
- Ability to program in R. Students should come to the course with R ready to use.
