Tree of Life: Inferring species phylogenies based on entire proteomes

Background: The availability of fully sequenced genomes and their protein-coding complements, the proteomes, raises one of the central questions of modern biology: how are species related to each other through evolution?

Goals: We will use a data set of precomputed orthologous clusters from seven fully sequenced plant genomes. The first goal is to infer species phylogenies for these plants from entire proteomes rather than from a few proteins or protein families. For this, we implement a (modified) phylogenomics pipeline based on the phyletic patterns (presence/absence profiles) of orthologous clusters across species. The second goal is to compare the results of this heuristic with those of conventional single-gene and supertree methods known from the literature.
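The phyletic-pattern idea can be sketched in three steps: encode each species as a binary vector recording which orthologous clusters it belongs to, compute pairwise distances between these vectors, and cluster the species into a tree. The sketch below is in Python rather than the Perl and R/Octave used in the course, and it rests on stated assumptions: the toy clusters and the species labels A to D are hypothetical stand-ins for the seven plant proteomes, Jaccard distance is only one possible choice of measure, and UPGMA is a simple swapped-in tree-building method (it assumes roughly constant evolutionary rates). The project's actual pipeline may use other distances or neighbor joining.

```python
from itertools import combinations

# Hypothetical toy data: each orthologous cluster is mapped to the set of species
# that contribute at least one protein to it. Labels A-D are placeholders,
# not the seven plant species of the actual data set.
CLUSTERS = {
    "c1": {"A", "B", "C", "D"},
    "c2": {"A", "B"},
    "c3": {"A", "B", "C"},
    "c4": {"C", "D"},
    "c5": {"A", "B", "C", "D"},
    "c6": {"D"},
    "c7": {"A", "B", "C"},
}


def phyletic_patterns(clusters):
    """Return the sorted species list and, per species, a 0/1 presence vector over clusters."""
    species = sorted(set().union(*clusters.values()))
    cluster_ids = sorted(clusters)
    patterns = {s: [1 if s in clusters[c] else 0 for c in cluster_ids] for s in species}
    return species, patterns


def jaccard_distance(u, v):
    """1 - (clusters shared by both species) / (clusters present in either)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either if either else 0.0


def upgma(species, dist):
    """Average-linkage (UPGMA) clustering; returns a Newick topology without branch lengths.

    `dist` maps frozenset({x, y}) to the distance between leaves x and y.
    Each current node is keyed by its own Newick string, which is unique.
    """
    size = {s: 1 for s in species}  # node -> number of leaves below it
    d = dict(dist)
    while len(size) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of current nodes
        merged = f"({a},{b})"
        for k in size:
            if k not in (a, b):
                # Distance to the new node = leaf-weighted average of its children.
                d[frozenset((merged, k))] = (
                    size[a] * d[frozenset((a, k))] + size[b] * d[frozenset((b, k))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        # Drop every pair that still refers to one of the merged children.
        d = {p: x for p, x in d.items() if a not in p and b not in p}
    (root,) = size
    return root + ";"


if __name__ == "__main__":
    species, patterns = phyletic_patterns(CLUSTERS)
    dist = {
        frozenset((x, y)): jaccard_distance(patterns[x], patterns[y])
        for x, y in combinations(species, 2)
    }
    print(upgma(species, dist))
```

On this toy input the script prints `(((A,B),C),D);`: A and B share every cluster and join first, C joins them next, and D, which shares the fewest clusters with the others, joins last. Swapping `jaccard_distance` for another measure, or `upgma` for neighbor joining, changes only one step of the pipeline.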

Mathematical tools: Students will learn to run bioinformatics tools for sequence similarity search (e.g., BLAST) and orthology clustering from the Linux/Mac OS X command line, to write simple scripts in Perl and R/Octave, and to apply elements of metric geometry to calculate similarities and/or distances between data vectors.
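The metric-geometry part comes down to choosing how to measure the distance between two data vectors. A minimal Python sketch, assuming binary presence/absence vectors as input, implements a few standard measures and brute-force checks the triangle inequality, the property that separates a true metric from a mere dissimilarity. The helper names are hypothetical, not part of any tool used in the course.

```python
import math
from itertools import permutations


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def euclidean(u, v):
    """Euclidean distance; on 0/1 vectors this equals sqrt(Hamming distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard(u, v):
    """1 - |intersection| / |union| of the positions set to 1; 0 for two all-zero vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either if either else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity; raises if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def satisfies_triangle_inequality(dist, vectors, tol=1e-12):
    """Brute-force check of d(x, z) <= d(x, y) + d(y, z) over all ordered triples."""
    return all(
        dist(x, z) <= dist(x, y) + dist(y, z) + tol
        for x, y, z in permutations(vectors, 3)
    )


if __name__ == "__main__":
    # Three toy presence/absence vectors over two clusters.
    toy = [(1, 0), (1, 1), (0, 1)]
    for name, d in [("hamming", hamming), ("euclidean", euclidean),
                    ("jaccard", jaccard), ("cosine", cosine_distance)]:
        print(name, satisfies_triangle_inequality(d, toy))
```

With these three vectors, Hamming, Euclidean and Jaccard distance pass the check, whereas cosine distance fails it: the distance between `(1, 0)` and `(0, 1)` is 1, which is larger than the two-step path through `(1, 1)`, 2(1 - 1/√2) ≈ 0.59. This is worth keeping in mind before feeding a distance matrix to a distance-based tree method.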

Supervisors: Arnold Kuzniar and Hannes Schabauer (Lab of Marc Robinson-Rechavi, UNIL-DEE)

Students: Didar Tolou, Marie Gallotlavallee, Rachel Barman

Presentations: TreeOfLife.pdf, TreeOfLife-final.pdf

References:

1. Snel, B.; Huynen, M. A. and Dutilh, B. E. Genome trees and the nature of genome evolution. Annu Rev Microbiol, 2005, 59, 191-209.

2. Finet, C.; Timme, R. E.; Delwiche, C. F. and Marlétaz, F. Multigene phylogeny of the green lineage reveals the origin and diversification of land plants. Curr Biol, 2010, 20, 2217-2222.

3. Kuzniar, A.; van Ham, R. C. H. J.; Pongor, S. and Leunissen, J. A. M. The quest for orthologs: finding the corresponding gene across genomes. Trends Genet, 2008, 24, 539-551.

Course: UNIL BSc course "Solving Biological Problems that require Math 2011"