Module 3: How to make valid prognostic models with gene expression signatures?

*  Title: "How to make valid prognostic models when data contain many features like gene expression signatures?"
  
* Paper to be examined / reproduced: “Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification”, JNCI J Natl Cancer Inst (2003) 95 (1): 14-18; doi: 10.1093/jnci/95.1.14 [http://jnci.oxfordjournals.org/content/95/1/14.full], by R. Simon, M. D. Radmacher, K. Dobbin, and L. M. McShane.

Richard Simon's team is at the Biometric Research Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD (http://linus.nci.nih.gov/index.html).

* Key claim of the paper: "Many publications report erroneous classification performances due to incorrect application of cross-validation methodology."
  
 
* Data and Code

The study is based on simulated data with known results and uses a well-chosen "toy example" to show the impact of variations in the cross-validation implementation.
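
The module page does not include the simulation code itself. As a rough illustration only (not the authors' implementation; the sample/gene counts and helper names such as select.genes and classify.one are arbitrary choices), the R sketch below builds a pure-noise data set and contrasts the incorrect procedure (genes selected once on all samples before leave-one-out cross-validation) with the fully cross-validated one (genes re-selected inside every fold):

<pre>
## Illustrative only: a "null" data set (no true class difference) analysed with
## an incorrect and with a correct (full) leave-one-out cross-validation.
set.seed(1)

n.samples  <- 20      # few samples, as in typical microarray studies
n.genes    <- 1000    # many candidate features
n.selected <- 10      # genes kept by the feature-selection step

x <- matrix(rnorm(n.samples * n.genes), nrow = n.samples)   # pure noise
y <- factor(rep(c("good", "poor"), each = n.samples / 2))   # arbitrary labels

## Feature selection: keep the genes with the smallest two-sample t-test p-values
select.genes <- function(x, y, k) {
  pvals <- apply(x, 2, function(g) t.test(g ~ y)$p.value)
  order(pvals)[1:k]
}

## Nearest-centroid classifier for a single test sample
classify.one <- function(x.train, y.train, x.test) {
  centroids <- apply(x.train, 2, function(g) tapply(g, y.train, mean))  # 2 x k
  dists <- apply(centroids, 1, function(ctr) sum((x.test - ctr)^2))
  names(which.min(dists))
}

## (1) INCORRECT: genes selected once on ALL samples; only the classifier
##     is re-fitted in each leave-one-out step
sel.all <- select.genes(x, y, n.selected)
pred.wrong <- sapply(seq_len(n.samples), function(i)
  classify.one(x[-i, sel.all, drop = FALSE], y[-i], x[i, sel.all]))

## (2) CORRECT: the gene selection is repeated inside every leave-one-out
##     training set, so the left-out sample never influences it
pred.right <- sapply(seq_len(n.samples), function(i) {
  sel.i <- select.genes(x[-i, , drop = FALSE], y[-i], n.selected)
  classify.one(x[-i, sel.i, drop = FALSE], y[-i], x[i, sel.i])
})

## True error rate is 50% by construction; (1) is typically far too optimistic
c(incorrect = mean(pred.wrong != y), correct = mean(pred.right != y))
</pre>

On such null data the true error rate is 50%; the first estimate usually comes out far lower, which is exactly the pitfall the paper describes.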
 
    
 
    
* Approximate Schedule:
** H1: General introduction to the field and to useful terms
** H2: Reading sections of the papers, extracting the main messages and the details of what exactly was done, followed by discussion
** H3-6: Programming by students to reproduce the results of the paper, at least partially. Writing of a short report to be mailed to the teacher along with the code used.
** H7: Presentation of the results obtained in the course. Discussion of the take-home messages and of possible extensions to the study.
** H8-9: Programming to adjust the code used previously or to explore extensions of the investigation. Writing of a revised short report to be mailed to the teacher along with the code used.
 
* Key bioinformatics concepts of this module:
** Prediction models - classifiers
** Cross-validation (see the sketch below)
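
As a further illustrative sketch (again not taken from the paper; the data, fold count, and classifier are assumptions for the example), the same principle applied to ordinary k-fold cross-validation: every data-dependent step, here the gene selection and the fitting of a logistic-regression classifier, is repeated inside each training fold, so the held-out samples never influence the model they are used to evaluate:

<pre>
## Illustrative only: k-fold cross-validation of a classifier where the gene
## selection and the model fit both happen inside each training fold.
set.seed(2)

n <- 60; p <- 500
x <- matrix(rnorm(n * p), nrow = n)
x[1:30, 1:5] <- x[1:30, 1:5] + 1                     # 5 genes carry a real signal
y <- factor(rep(c("good", "poor"), each = 30))

cv.error <- function(x, y, k = 5, n.keep = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(x)))   # random fold labels
  fold.err <- sapply(seq_len(k), function(f) {
    train <- folds != f
    ## 1) feature selection on the training part only
    pvals <- apply(x[train, ], 2, function(g) t.test(g ~ y[train])$p.value)
    keep  <- order(pvals)[1:n.keep]
    ## 2) fit a logistic-regression classifier on the selected genes
    d.train <- data.frame(y = y[train], x[train, keep, drop = FALSE])
    fit <- glm(y ~ ., data = d.train, family = binomial)
    ## 3) predict the held-out fold and record its error rate
    d.test <- data.frame(x[!train, keep, drop = FALSE])
    prob <- predict(fit, newdata = d.test, type = "response")
    pred <- ifelse(prob > 0.5, levels(y)[2], levels(y)[1])
    mean(pred != y[!train])
  })
  mean(fold.err)
}

cv.error(x, y)    # honest estimate of the misclassification rate
</pre>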

* Requirements for students of this module:
** Ability to program in R. Students should come to the course with R ready to use.
  
 
* back to [[UNIL MSc course: "Case studies in bioinformatics 2015"]]
