Classify Rab proteins (GTPases) using ML approach
[https://www2.unil.ch/cbg/images/3/37/Rab_classifier.pdf Presentation slides]
Latest revision as of 13:00, 3 June 2024
Our project aimed to build a classifier for Rab proteins. We tried three machine learning methods: k-nearest neighbours (KNN), decision tree and random forest. To use them, we converted our amino acid sequences (source: Tracy database of Fassauer Lab) into numerical features by feature extraction. We trained our model, then tested its performance. We optimised our model with cross-validation, with over- and undersampling to get an even class distribution, and by adding a non-Rab group. The best-performing model was KNN (with k neighbours = 11) using the CKSAAP feature (with default k space = 3).
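To illustrate the feature-extraction step, here is a minimal sketch of CKSAAP (composition of k-spaced amino acid pairs) in plain Python. It is not the project's actual code: the function name `cksaap`, the parameter `max_gap`, and the choice to skip pairs containing non-standard residues are assumptions made for illustration. For every gap from 0 to `max_gap`, it counts each of the 400 ordered residue pairs separated by that gap and normalises by the number of counted pairs.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids, in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, max_gap=3):
    """Return pair frequencies for gaps 0..max_gap (400 values per gap).

    Hypothetical helper: max_gap=3 mirrors the "k space = 3" setting
    mentioned in the text, not a verified library default.
    """
    sequence = sequence.upper()
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Pair residue i with the residue gap + 1 positions further on.
        for i in range(len(sequence) - gap - 1):
            pair = sequence[i] + sequence[i + gap + 1]
            if pair in counts:  # assumption: ignore non-standard residues
                counts[pair] += 1
                total += 1
        # Normalise counts to frequencies; all zeros if no pair was counted.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vector = cksaap("ACAC", max_gap=3)
print(len(vector))  # 4 gaps x 400 pairs
```

Each sequence becomes a fixed-length vector of 400 × (`max_gap` + 1) values, so sequences of different lengths can be compared by a distance-based classifier. Assuming scikit-learn were used, such vectors could be passed to `KNeighborsClassifier(n_neighbors=11)`.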