http://www2.unil.ch/cbg/api.php?action=feedcontributions&user=Armand&feedformat=atomCBG - User contributions [en]2024-03-29T07:34:41ZUser contributionsMediaWiki 1.31.12http://www2.unil.ch/cbg/index.php?title=File:Colaus_PCAmerge_results.zip&diff=2457File:Colaus PCAmerge results.zip2011-11-04T09:26:33Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:Colaus_PCAmerge_results.zip&diff=2450File:Colaus PCAmerge results.zip2011-10-30T19:43:06Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:PCA_MERGE_SOURCE_SOM.zip&diff=2370File:PCA MERGE SOURCE SOM.zip2011-08-19T12:05:49Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:PCA_MERGE_SOURCE_SOM.zip&diff=2368File:PCA MERGE SOURCE SOM.zip2011-08-19T11:36:46Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:Valsesia_Armand_PhD_Thesis_2011bw.pdf&diff=2225File:Valsesia Armand PhD Thesis 2011bw.pdf2011-04-12T06:37:05Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:PCA_MERGE_SOURCE_SOM.zip&diff=2202File:PCA MERGE SOURCE SOM.zip2011-04-03T19:42:11Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:Valsesia_-_Yearly_reports.zip&diff=1912File:Valsesia - Yearly reports.zip2011-01-06T11:42:54Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:Chapter_5_-_Melanoma_-_Supplementary_Files.zip&diff=1911File:Chapter 5 - Melanoma - Supplementary Files.zip2011-01-06T11:36:30Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:Valsesia_-_PhD_thesis_-_January_2011.zip&diff=1910File:Valsesia - PhD thesis - January 2011.zip2011-01-06T11:35:52Z<p>Armand: </p>
<hr />
<div></div>Armandhttp://www2.unil.ch/cbg/index.php?title=UNIX_recipes&diff=1883UNIX recipes2010-11-09T16:18:59Z<p>Armand: </p>
<hr />
<div>Armand's unix memo<br />
<br />
= Disc checks =<br />
<br />
How to list the partitions of a disk<br />
sudo fdisk -l /dev/hda<br />
<br />
How to check a disk<br />
sudo fsck /dev/sda1<br />
<br />
How to find the bad blocks<br />
sudo badblocks /dev/sda1<br />
<br />
How to reformat a disk, scanning for bad blocks first so that they are left out of the new file system (this erases the partition)<br />
sudo mke2fs -c /dev/sda1<br />
<br />
= R tricks =<br />
<br />
R locale setting for Mac OS X <br />
<br />
To enforce the US English locale regardless of the system setting, run this in R: <br />
system("defaults write org.R-project.R force.LANG en_US.UTF-8")<br />
<br />
Note that you must always use the .UTF-8 version of the locale; otherwise R.app will not work properly.<br />
<br />
= System monitoring =<br />
<br />
List running processes, updated continuously<br />
top<br />
List logged-on users and what they are doing<br />
w<br />
Show a snapshot of the processes in the current terminal session<br />
ps<br />
List open files, by process <br />
lsof<br />
Show information about the processors (Linux)<br />
cat /proc/cpuinfo<br />
Count the processors on the system (Linux)<br />
grep -ic ^processor /proc/cpuinfo<br />
<br />
= Running scripts =<br />
<br />
How to execute a given command once for each parameter stored in a file (one parameter per line, without spaces)<br />
xargs -n 1 mycommand < paramList.txt<br />
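By default xargs packs as many parameters as fit into a single call; the -n 1 option forces one call per parameter. A minimal sketch of the difference, using echo as a stand-in for mycommand and a throwaway parameter file (both are illustrative assumptions):<br />

```shell
# Throwaway parameter file (illustrative content).
params=$(mktemp)
printf '%s\n' alpha beta gamma > "$params"

# Default: all parameters are appended to a single invocation.
single_call=$(xargs echo run < "$params")

# -n 1: one invocation per parameter.
per_param=$(xargs -n 1 echo run < "$params")

echo "$single_call"
echo "$per_param"
rm -f "$params"
```

The first echo prints one line holding all three parameters; the second prints one line per parameter.<br />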
<br />
<br />
= SSH =<br />
<br />
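How to generate a key pair (the public key is written to .ssh/id_dsa.pub; recent OpenSSH releases disable DSA keys by default, so use -t rsa or -t ed25519 there and adapt the file name below)<br />
ssh-keygen -t dsa<br />
<br />
How to start ssh-agent and print the shell commands that set its environment variables<br />
ssh-agent<br />
<br />
How to load the private key into the agent (prompts for the passphrase)<br />
ssh-add<br />
<br />
How to authorize SSH connections to a remote machine using the generated key<br />
cat .ssh/id_dsa.pub | ssh machine_name "cat - >> .ssh/authorized_keys"<br />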
<br />
<br />
= File manipulation =<br />
<br />
How to print a section of a file (here lines 10 to 20)<br />
awk -v mystart=10 -v myend=20 'NR >= mystart && NR <= myend' myfile<br />
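The bounds have to reach awk as awk variables, for instance through -v; an unset awk variable compares as zero or empty, so nothing would be printed. A small sketch on a generated 10-line stream (the values 3 and 5 are arbitrary):<br />

```shell
# Print lines 3 to 5 of a 10-line stream; the bounds are passed in with -v.
section=$(seq 1 10 | awk -v mystart=3 -v myend=5 'NR >= mystart && NR <= myend')
echo "$section"
```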
<br />
How to count the number of columns on each line<br />
awk '{ print NF }' myfile<br />
<br />
Print a given column (e.g. the 2nd one); awk splits on any whitespace, cut on tabs only<br />
awk '{ print $2 }' myfile<br />
cut -f2 myfile<br />
<br />
Pasting 2 files together by their columns<br />
paste file1 file2<br />
<br />
Joining 2 files on a common column (e.g. the 1st column of file1 holds the same identifiers as the 3rd column of file2); both files must be sorted on that column<br />
join -1 1 -2 3 file1 file2<br />
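join expects both inputs to be sorted on their join field, so unsorted files need a sort step first. A minimal sketch with two made-up files (names and contents are illustrative only):<br />

```shell
workdir=$(mktemp -d)
# file1: identifier in column 1; file2: identifier in column 3 (illustrative data).
printf '%s\n' 'id2 x' 'id1 y' > "$workdir/file1"
printf '%s\n' 'p q id1' 'r s id2' > "$workdir/file2"

# Sort each file on its own join column before joining.
sort -k 1,1 "$workdir/file1" > "$workdir/file1.sorted"
sort -k 3,3 "$workdir/file2" > "$workdir/file2.sorted"

joined=$(join -1 1 -2 3 "$workdir/file1.sorted" "$workdir/file2.sorted")
echo "$joined"
rm -rf "$workdir"
```

Each output line starts with the shared identifier, followed by the remaining fields of file1 and then those of file2.<br />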
<br />
Sorting a file numerically by its 3rd column<br />
sort -n -k 3,3 myfile<br />
Sorting a file numerically by its 2nd column, then its 1st, then its 3rd<br />
sort -k 2,2n -k 1,1n -k 3,3n myfile<br />
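With sort, each -k START,STOP option restricts one key to the given column range, and an n suffix makes that key numeric; keys are applied in the order they are given. A short sketch on made-up data:<br />

```shell
# Three whitespace-separated numeric columns (illustrative data).
sorted=$(printf '%s\n' '2 10 1' '1 10 5' '1 9 7' '1 10 3' |
  sort -k 2,2n -k 1,1n -k 3,3n)
echo "$sorted"
```

The row with 9 in the 2nd column comes first; ties on the 2nd column are broken by the 1st column, then by the 3rd.<br />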
<br />
Splitting a file into smaller files with a fixed number of lines (e.g. 100)<br />
split -l 100 myfile<br />
Remove the first 10 lines of a file<br />
sed '1,10d' myfile<br />
<br />
Checking file type :<br />
file myfile<br />
<br />
Converting a file with DOS line endings to Unix line endings<br />
perl -p -e 's/\r$//' < myfile > mynewfile<br />
<br />
Inspecting the content of a file character by character:<br />
od -c myfile | more<br />
<br />
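The comm recipes below compare two files line by line and expect both to be sorted. A minimal sketch on two made-up files, where -12 keeps the lines common to both, -23 the lines only in the first file, and -3 the lines found in only one of the two (file names and contents are illustrative):<br />

```shell
workdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$workdir/a"
printf '%s\n' banana date cherry > "$workdir/b"

# comm assumes sorted input, so sort both files first.
sort "$workdir/a" > "$workdir/a.sorted"
sort "$workdir/b" > "$workdir/b.sorted"

common=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted")   # in both files
only_a=$(comm -23 "$workdir/a.sorted" "$workdir/b.sorted")   # only in a
# -3 drops the common column; tr removes the tab that indents lines from b.
sym_diff=$(comm -3 "$workdir/a.sorted" "$workdir/b.sorted" | tr -d '\t')

printf '%s\n' "$common" "$only_a" "$sym_diff"
rm -rf "$workdir"
```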
Intersection of 2 files (comm expects both files to be sorted)<br />
comm -12 a b<br />
<br />
Subtraction (lines unique to a)<br />
comm -23 a b<br />
<br />
Symmetric difference (lines found in only one of the two files)<br />
comm -3 a b</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UNIX_recipes&diff=1870UNIX recipes2010-10-12T14:15:13Z<p>Armand: </p>
<hr />
<div>Armand's unix memo<br />
<br />
= Disc checks =<br />
<br />
How to find which partition is which ?<br />
sudo fdisk -l /dev/hda<br />
<br />
How to check a disk<br />
sudo fsck /dev/sda1<br />
<br />
How to find the bad blocks<br />
sudo badblocks /dev/sda1<br />
<br />
How to reformat a disc ignoring bad blocks<br />
sudo mke2fs -c /dev/sda1<br />
<br />
= System monitoring =<br />
<br />
list processes<br />
top<br />
list logged on users, and what they are doing<br />
w<br />
shows processes<br />
ps<br />
list open files, by process <br />
lsof<br />
info about processors<br />
cat /proc/cpuinfo<br />
nb of processor on system<br />
grep -ic ^processor /proc/cpuinfo<br />
<br />
= Running scripts =<br />
<br />
How to execute a given command once for each parameter stored in a file (one parameter per line, without spaces)<br />
xargs -n 1 mycommand < paramList.txt<br />
<br />
<br />
= SSH =<br />
<br />
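How to generate a key pair (the public key is written to .ssh/id_dsa.pub; recent OpenSSH releases disable DSA keys by default, so use -t rsa or -t ed25519 there and adapt the file name below)<br />
ssh-keygen -t dsa<br />
<br />
How to start ssh-agent and print the shell commands that set its environment variables<br />
ssh-agent<br />
<br />
How to load the private key into the agent (prompts for the passphrase)<br />
ssh-add<br />
<br />
How to authorize SSH connections to a remote machine using the generated key<br />
cat .ssh/id_dsa.pub | ssh machine_name "cat - >> .ssh/authorized_keys"<br />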
<br />
<br />
= File manipulation =<br />
<br />
How to print a section of a file (here lines 10 to 20)<br />
awk -v mystart=10 -v myend=20 'NR >= mystart && NR <= myend' myfile<br />
<br />
How to count #of columns per line<br />
awk '{ print NF }' myfile<br />
<br />
Print a given column (i.e. 2nd one)<br />
awk '{ print $2 }' myfile<br />
cut -f2 myfile<br />
<br />
Pasting 2 files together by their columns<br />
paste file1 file2<br />
<br />
Joining 2 files on a common column (e.g. the 1st column of file1 holds the same identifiers as the 3rd column of file2); both files must be sorted on that column<br />
join -1 1 -2 3 file1 file2<br />
<br />
Sorting a file numerically by its 3rd column<br />
sort -n -k 3,3 myfile<br />
Sorting a file numerically by its 2nd column, then its 1st, then its 3rd<br />
sort -k 2,2n -k 1,1n -k 3,3n myfile<br />
<br />
Splitting a file into smaller files with a fixed number of lines (i.e. 100)<br />
split -l 100 myfile<br />
Remove 10 first line of a file<br />
sed '1,10d' myfile<br />
<br />
Checking file type :<br />
file myfile<br />
<br />
Converting dos "end-like" file to unix<br />
perl -p -e 's/\r$//' < myfile > mynewfile<br />
<br />
Checking ascii content of a file :<br />
od -c myfile | more<br />
<br />
Intersection of 2 files (comm expects both files to be sorted)<br />
comm -12 a b<br />
<br />
Subtraction (lines unique to a)<br />
comm -23 a b<br />
<br />
Symmetric difference (lines found in only one of the two files)<br />
comm -3 a b</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UNIX_recipes&diff=1683UNIX recipes2010-05-20T15:29:22Z<p>Armand: </p>
<hr />
<div>Armand's unix memo<br />
<br />
= Disc checks =<br />
<br />
How to find which partition is which ?<br />
sudo fdisk -l /dev/hda<br />
<br />
How to check a disk<br />
sudo fsck /dev/sda1<br />
<br />
How to find the bad blocks<br />
sudo badblocks /dev/sda1<br />
<br />
How to reformat a disc ignoring bad blocks<br />
sudo mke2fs -c /dev/sda1<br />
<br />
= System monitoring =<br />
<br />
list processes<br />
top<br />
list logged on users, and what they are doing<br />
w<br />
shows processes<br />
ps<br />
list open files, by process <br />
lsof<br />
info about processors<br />
cat /proc/cpuinfo<br />
nb of processor on system<br />
grep -ic ^processor /proc/cpuinfo<br />
<br />
= Running scripts =<br />
<br />
How to execute a given command once for each parameter stored in a file (one parameter per line, without spaces)<br />
xargs -n 1 mycommand < paramList.txt<br />
<br />
<br />
= SSH =<br />
<br />
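How to generate a key pair (the public key is written to .ssh/id_dsa.pub; recent OpenSSH releases disable DSA keys by default, so use -t rsa or -t ed25519 there and adapt the file name below)<br />
ssh-keygen -t dsa<br />
<br />
How to start ssh-agent and print the shell commands that set its environment variables<br />
ssh-agent<br />
<br />
How to load the private key into the agent (prompts for the passphrase)<br />
ssh-add<br />
<br />
How to authorize SSH connections to a remote machine using the generated key<br />
cat .ssh/id_dsa.pub | ssh machine_name "cat - >> .ssh/authorized_keys"<br />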
<br />
<br />
= File manipulation =<br />
<br />
How to print a section of a file (here lines 10 to 20)<br />
awk -v mystart=10 -v myend=20 'NR >= mystart && NR <= myend' myfile<br />
<br />
How to count #of columns per line<br />
awk '{ print NF }' myfile<br />
<br />
Print a given column (i.e. 2nd one)<br />
awk '{ print $2 }' myfile<br />
cut -f2 myfile<br />
<br />
Pasting 2 files together by their columns<br />
paste file1 file2<br />
<br />
Joining 2 files on a common column (e.g. the 1st column of file1 holds the same identifiers as the 3rd column of file2); both files must be sorted on that column<br />
join -1 1 -2 3 file1 file2<br />
<br />
Sorting a file numerically by its 3rd column<br />
sort -n -k 3,3 myfile<br />
Sorting a file numerically by its 2nd column, then its 1st, then its 3rd<br />
sort -k 2,2n -k 1,1n -k 3,3n myfile<br />
<br />
Splitting a file into smaller files with a fixed number of lines (i.e. 100)<br />
split -l 100 myfile<br />
Remove 10 first line of a file<br />
sed '1,10d' myfile<br />
<br />
Checking file type :<br />
file myfile<br />
<br />
Converting dos "end-like" file to unix<br />
perl -p -e 's/\r$//' < myfile > mynewfile<br />
<br />
Checking ascii content of a file :<br />
od -c myfile | more</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1214GMM2010-01-08T16:07:02Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.<br />
Such genomic aberrations were already identified more than a decade ago using array-based comparative genomic hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing them to the same SNP on other arrays (thus deriving a copy number ratio).<br />
A significant shift from the baseline (unit ratio, or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use a segmentation algorithm to partition the signal and then classify the segments as gain, copy-neutral or loss.<br />
For large datasets, however, one can instead take advantage of the signal distribution at each SNP and assign each individual to the mixture component that reflects its copy number state.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
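<br />
As a sketch of what such a model computes (the notation is ours, and the exact form of the constraint on the means is not given on this page): let x_i be the copy number ratio of individual i at a given SNP, K = 4 the number of components, and c_k the copy number assigned to component k (an assumed labelling). The mixture density, the posterior probability of each state and the resulting dosage can then be written as:<br />

```latex
\begin{align}
p(x_i) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x_i \mid \mu_k, \sigma_k^2\right),
  \qquad \sum_{k=1}^{K} \pi_k = 1, \\
\gamma_{ik} &= \frac{\pi_k \, \mathcal{N}\!\left(x_i \mid \mu_k, \sigma_k^2\right)}
  {\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(x_i \mid \mu_j, \sigma_j^2\right)}, \\
\mathrm{dosage}_i &= \sum_{k=1}^{K} c_k \, \gamma_{ik}.
\end{align}
```

Here the weights, means and variances are the fitted mixture parameters, and the dosage is the posterior-weighted average of the component copy numbers.<br />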
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios. <br />
<br />
The expected format is: chr pos sample1 sample2 ...<br />
<br />
Fields must be tab-delimited, and the data are assumed to be sorted by position within each chromosome.<br />
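<br />
These two requirements can be checked before running the model. The helper below is a hypothetical sketch, not part of the GMM distribution: it assumes the chromosome is in column 1 and the position in column 2, and reports rows with an inconsistent field count or a position that decreases within a chromosome.<br />

```shell
# Hypothetical helper (not shipped with GMM): exits 0 when every row has the
# same number of tab-separated fields (at least 3) and positions never
# decrease within a chromosome.
check_gmm_input() {
  awk -F '\t' '
    NR == 1 { nfields = NF }
    NF != nfields || NF < 3 { printf "line %d: unexpected field count %d\n", NR, NF; bad = 1 }
    NR > 1 && $1 == prev_chr && $2 + 0 < prev_pos { printf "line %d: position out of order\n", NR; bad = 1 }
    { prev_chr = $1; prev_pos = $2 + 0 }
    END { exit bad }
  ' "$1"
}

# Tiny made-up input with two samples.
example=$(mktemp)
printf 'chr1\t100\t0.02\t-0.45\nchr1\t250\t0.01\t-0.51\nchr2\t80\t-0.03\t0.04\n' > "$example"
check_gmm_input "$example" && echo "input looks well-formed"
rm -f "$example"
```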
<br />
An example input file is available within the GMM_CNV.zip (see Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we only provide a compiled Linux x86_64 version for now.)<br />
<br />
You can then use (and edit according to your needs) the shell script called "run_callCNVs.sh": <br />
<br />
sh run_callCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip|MCR2007_x86_64.zip|/download/MCR2007_x86_64.zip</googa><br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip|GMM_CNV.zip|/download/GMM_CNV.zip</googa><br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip|GMM_CNV_SOURCE.zip|/download/GMM_CNV_SOURCE.zip</googa><br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip|GMM_CNV_outputs.zip|/download/GMM_CNV_outputs.zip</googa><br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
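<br />
The md5sum column can be used to check a download for corruption. A minimal sketch, assuming GNU md5sum is installed; verify_md5 is a hypothetical helper name, and the example uses the checksum listed above for GMM_CNV.zip:<br />

```shell
# Hypothetical helper: succeed only if the file's MD5 digest equals the expected value.
verify_md5() {
  file=$1
  expected=$2
  actual=$(md5sum "$file" | awk '{ print $1 }')
  [ "$actual" = "$expected" ]
}

# Example with the checksum listed for GMM_CNV.zip (run in the download directory).
if [ -f GMM_CNV.zip ]; then
  verify_md5 GMM_CNV.zip bd579f39c340a50de2bb80a649643be3 && echo "GMM_CNV.zip: OK"
fi
```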
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* Which components does the model fit by default?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will output this warning:<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set to 0. The model will then analyse the next SNP (if any).<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not separate the components before reaching its iteration limit.<br />
This can be due to noisy data, or to a distribution in which no such separation exists.<br />
Try increasing:<br />
MAX_FUN_CALL=10000; # number of optimization function calls<br />
MAX_FUN_ITER=5000; # number of iterations for each optimization function call<br />
Note, however, that this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes. By default, Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or DO_LOESS=0; in callCNVs.m.)<br />
<br />
Since a Gaussian Mixture Model can be sensitive to batch effects, it is strongly recommended to apply adequate normalization before using the model. <br />
Note: the Loess smoothing will not correct batch effects, but it will improve the signal-to-noise ratio within each individual profile. By default, the Loess window size is 41 SNPs. For higher-density arrays (Affymetrix 6.0 or Illumina 1M), this window could be increased. <br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so:<br />
cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_callCNVs.sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1213GMM2010-01-08T16:06:40Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in the humans and other species.<br />
Such genomic aberrations were identified already more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing to the same SNP from other arrays (thus deriving a copy number ratio).<br />
Significant shift from the baseline (unit ratio or zero log ratio) reflects copy number changes. Such changes can be identified in many ways, for example, one can use segmentation algorithms to partition the signal then classify such segments into gain, copy neutral and loss status.<br />
Yet, for large datasets, one can take advantage of the signal distribution at each SNP, and cluster each individual from the distribution into a component that would reflect a given copy number change.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratio. <br />
<br />
Format is like : chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited and it assumes data (within chromosome) are sorted by position.<br />
<br />
An example input file is available within the GMM_CNV.zip (see Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we only provide a compiled Linux x86_64 version for now.)<br />
<br />
You can then use (and edit according to your needs) the shell script called "run_callCNVs.sh": <br />
<br />
sh run_callCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip|MCR2007_x86_64.zip|/download/MCR2007_x86_64.zip</googa><br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip|GMM_CNV.zip|/download/GMM_CNV.zip</googa><br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip|GMM_CNV_SOURCE.zip|/download/GMM_CNV_SOURCE.zip</googa><br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip|GMM_CNV_outputs.zip|/download/GMM_CNV_outputs.zip</googa><br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* Which components does the model fit by default?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will output this warning :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set as 0. Then the model will analyse the next SNP (if any).<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not find the component separation before reaching its maximal iteration limit.<br />
This can be due to noisy data, or distribution where no such separation exists.<br />
Try increasing :<br />
MAX_FUN_CALL=10000; # nb of optimization function call<br />
MAX_FUN_ITER=5000; # nb of iterations for each optimization function call<br />
But note, this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes, by default a Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or setting DO_LOESS=0; in callCNVs.m).<br />
<br />
Since Gaussian Mixture Model can be sensitive to batch effects, it is strongly recommended that adequate normalization is applied before using the model. <br />
note : The loess smoothing will not correct batch effects, but will improve the signal to noise ratio within individual profile. By default, the Loess windows size is 41 SNPs. For higher density arrays (Affymetrix 6.0 or Illumina 1M) such window could be increased. <br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so:<br />
cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_callCNVs.sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1212GMM2010-01-08T16:06:19Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in the humans and other species.<br />
Such genomic aberrations were identified already more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing to the same SNP from other arrays (thus deriving a copy number ratio).<br />
Significant shift from the baseline (unit ratio or zero log ratio) reflects copy number changes. Such changes can be identified in many ways, for example, one can use segmentation algorithms to partition the signal then classify such segments into gain, copy neutral and loss status.<br />
Yet, for large datasets, one can take advantage of the signal distribution at each SNP, and cluster each individual from the distribution into a component that would reflect a given copy number change.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratio. <br />
<br />
Format is like : chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited and it assumes data (within chromosome) are sorted by position.<br />
<br />
An example input file is available within the GMM_CNV.zip (see Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we only provide a compiled Linux x86_64 version for now.)<br />
<br />
You can then use (and edit according to your needs) the shell script called "run_callCNVs.sh": <br />
<br />
sh run_callCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip|MCR2007_x86_64.zip|/download/MCR2007_x86_64.zip</googa><br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip|GMM_CNV.zip|/download/GMM_CNV.zip</googa><br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip|GMM_CNV_SOURCE.zip|/download/GMM_CNV_SOURCE.zip</googa><br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip|GMM_CNV_outputs.zip|/download/GMM_CNV_outputs.zip</googa><br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* Which components does the model fit by default?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will output this warning :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set as 0. Then the model will analyse the next SNP (if any).<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not find the component separation before reaching its maximal iteration limit.<br />
This can be due to noisy data, or distribution where no such separation exists.<br />
Try increasing :<br />
MAX_FUN_CALL=10000; # nb of optimization function call<br />
MAX_FUN_ITER=5000; # nb of iterations for each optimization function call<br />
But note, this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes, by default a Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or setting DO_LOESS=0; in callCNVs.m).<br />
<br />
Since Gaussian Mixture Model can be sensitive to batch effects, it is strongly recommended that adequate normalization is applied before using the model. <br />
note : The loess smoothing will not correct batch effects, but will improve the signal to noise ratio within individual profile. By default, the Loess windows size is 41 SNPs. For higher density arrays (Affymetrix 6.0 or Illumina 1M) such window could be increased. <br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so:<br />
cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_callCNVs.sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1211GMM2010-01-08T16:05:34Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in the humans and other species.<br />
Such genomic aberrations were identified already more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing to the same SNP from other arrays (thus deriving a copy number ratio).<br />
Significant shift from the baseline (unit ratio or zero log ratio) reflects copy number changes. Such changes can be identified in many ways, for example, one can use segmentation algorithms to partition the signal then classify such segments into gain, copy neutral and loss status.<br />
Yet, for large datasets, one can take advantage of the signal distribution at each SNP, and cluster each individual from the distribution into a component that would reflect a given copy number change.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratio. <br />
<br />
Format is like : chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited and it assumes data (within chromosome) are sorted by position.<br />
<br />
An example input file is available within the GMM_CNV.zip (see Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we only provide a compiled Linux x86_64 version for now.)<br />
<br />
You can then use (and edit according to your needs) the shell script called "run_callCNVs.sh": <br />
<br />
sh run_callCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip|MCR2007_x86_64.zip|/download/MCR2007_x86_64.zip</googa><br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip|GMM_CNV.zip|/download/GMM_CNV.zip</googa><br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip|GMM_CNV_SOURCE.zip|/download/GMM_CNV_SOURCE.zip</googa><br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| <googa>http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip|GMM_CNV_outputs.zip|/download/GMM_CNV_outputs.zip</googa><br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
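
The md5sum column can be used to check that a download is intact. The following is a small sketch using the standard md5sum utility; the helper name verify_md5 is hypothetical, and the commented calls assume the archives were saved under the file names listed in the table.

```shell
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: succeeds only if the md5sum of FILE equals EXPECTED_MD5.
verify_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual, expected $2)"
        return 1
    fi
}

# Checksums copied from the table above; run these in the download directory.
# verify_md5 MCR2007_x86_64.zip   451c54a811b3e01402b6a46a1b814c4d
# verify_md5 GMM_CNV.zip          bd579f39c340a50de2bb80a649643be3
# verify_md5 GMM_CNV_SOURCE.zip   3cb7799bf3e180b33a6742ef382b105e
# verify_md5 GMM_CNV_outputs.zip  6b621a6a8e279697f610db35810777ce
```

A mismatch usually means an incomplete download, in which case fetching the file again is the first thing to try.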
<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* What are the default components the model will try to fit?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will output this warning:<br />
 Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set to 0. The model will then analyse the next SNP (if any).<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not separate the components before reaching its maximum number of iterations.<br />
This can be due to noisy data, or to a distribution in which no such separation exists.<br />
Try increasing:<br />
 MAX_FUN_CALL=10000; # number of optimization function calls<br />
 MAX_FUN_ITER=5000; # number of iterations for each optimization function call<br />
Note, however, that this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes. By default, Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or DO_LOESS=0; in callCNVs.m.)<br />
<br />
Since Gaussian Mixture Models can be sensitive to batch effects, it is strongly recommended that adequate normalization be applied before using the model. <br />
Note: Loess smoothing will not correct batch effects, but it will improve the signal-to-noise ratio within each individual profile. By default, the Loess window size is 41 SNPs. For higher-density arrays (Affymetrix 6.0 or Illumina 1M), this window could be increased. <br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so:<br />
cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_callCNVs.sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1204GMM2009-12-25T10:58:22Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.<br />
Such genomic aberrations were already identified more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing them to those of the same SNP on other arrays (thus deriving a copy number ratio).<br />
A significant shift from the baseline (unit ratio or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use segmentation algorithms to partition the signal and then classify the resulting segments as gain, copy-neutral or loss.<br />
For large datasets, however, one can take advantage of the signal distribution at each SNP and assign each individual to a component of that distribution that reflects a given copy number state.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios. <br />
<br />
The expected format is: chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited, and the data are assumed to be sorted by position within each chromosome.<br />
<br />
An example input file is included in GMM_CNV.zip (see the Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we currently provide only a compiled Linux x86_64 version.)<br />
<br />
You can then use the shell script "run_CallCNVs.sh" (edit it according to your needs): <br />
<br />
sh run_CallCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| [http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip MCR2007_x86_64.zip]<br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip GMM_CNV.zip]<br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip GMM_CNV_SOURCE.zip] <br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip GMM_CNV_outputs.zip]<br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* What are the default components the model will try to fit?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will output this warning:<br />
 Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set to 0. The model will then analyse the next SNP (if any).<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not separate the components before reaching its maximum number of iterations.<br />
This can be due to noisy data, or to a distribution in which no such separation exists.<br />
Try increasing:<br />
 MAX_FUN_CALL=10000; # number of optimization function calls<br />
 MAX_FUN_ITER=5000; # number of iterations for each optimization function call<br />
Note, however, that this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes. By default, Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or DO_LOESS=0; in callCNVs.m.)<br />
<br />
Since Gaussian Mixture Models can be sensitive to batch effects, it is strongly recommended that adequate normalization be applied before using the model. <br />
Note: Loess smoothing will not correct batch effects, but it will improve the signal-to-noise ratio within each individual profile. By default, the Loess window size is 41 SNPs. For higher-density arrays (Affymetrix 6.0 or Illumina 1M), this window could be increased. <br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so:<br />
cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_callCNVs.sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1202GMM2009-12-23T12:02:14Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.<br />
Such genomic aberrations were already identified more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing them to those of the same SNP on other arrays (thus deriving a copy number ratio).<br />
A significant shift from the baseline (unit ratio or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use segmentation algorithms to partition the signal and then try to classify the resulting segments as gain, copy-neutral or loss.<br />
For large datasets, however, one can take advantage of the signal distribution at each SNP and assign each individual to a component of that distribution that reflects a given copy number state.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios. <br />
<br />
The expected format is: chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited, and the data are assumed to be sorted by position within each chromosome.<br />
<br />
An example input file is included in GMM_CNV.zip (see the Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we currently provide only a compiled Linux x86_64 version.)<br />
<br />
You can then use the shell script "run_CallCNVs.sh" (edit it according to your needs): <br />
<br />
sh run_CallCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| [http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip MCR2007_x86_64.zip]<br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip GMM_CNV.zip]<br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip GMM_CNV_SOURCE.zip] <br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip GMM_CNV_outputs.zip]<br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* What are the default components the model will try to fit?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will move on to the next SNP and you will simply get this warning:<br />
 Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set to 0.<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not separate the components before reaching its maximum number of iterations.<br />
This can be due to noisy data, or to a distribution in which no such separation exists.<br />
Try increasing:<br />
 MAX_FUN_CALL=10000; # number of optimization function calls<br />
 MAX_FUN_ITER=5000; # number of iterations for each optimization function call<br />
Note, however, that this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes. By default, Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or DO_LOESS=0; in callCNVs.m.)<br />
<br />
It is also recommended that adequate normalization be applied beforehand and that the normalized ratios be provided in the input matrix file.<br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so: cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_[appName].sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat # i.e. for compiled distrib with Matlab 2008, build v79</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1201GMM2009-12-23T12:01:22Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.<br />
Such genomic aberrations were already identified more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing them to those of the same SNP on other arrays (thus deriving a copy number ratio).<br />
A significant shift from the baseline (unit ratio or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use segmentation algorithms to partition the signal and then try to classify the resulting segments as gain, copy-neutral or loss.<br />
For large datasets, however, one can take advantage of the signal distribution at each SNP and assign each individual to a component of that distribution that reflects a given copy number state.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios. <br />
<br />
The expected format is: chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited, and the data are assumed to be sorted by position within each chromosome.<br />
<br />
An example input file is included in GMM_CNV.zip (see the Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we currently provide only a compiled Linux x86_64 version.)<br />
<br />
You can then use the shell script "run_CallCNVs.sh" (edit it according to your needs): <br />
<br />
sh run_CallCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| [http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip MCR2007_x86_64.zip]<br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip GMM_CNV.zip]<br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip GMM_CNV_SOURCE.zip] <br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip GMM_CNV_outputs.zip]<br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* What are the default components the model will try to fit?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will move on to the next SNP and you will simply get this warning:<br />
 Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set to 0.<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not separate the components before reaching its maximum number of iterations.<br />
This can be due to noisy data, or to a distribution in which no such separation exists.<br />
Try increasing:<br />
 MAX_FUN_CALL=10000; # number of optimization function calls<br />
 MAX_FUN_ITER=5000; # number of iterations for each optimization function call<br />
Note, however, that this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes. By default, Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or DO_LOESS=0; in callCNVs.m.)<br />
<br />
It is also recommended that adequate normalization be applied beforehand and that the normalized ratios be provided in the input matrix file.<br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so: cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_[appName].sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat # i.e. for compiled distrib with Matlab 2008, build v79</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1200GMM2009-12-23T12:00:16Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.<br />
Such genomic aberrations were already identified more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing them to those of the same SNP on other arrays (thus deriving a copy number ratio).<br />
A significant shift from the baseline (unit ratio or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use segmentation algorithms to partition the signal and then try to classify the resulting segments as gain, copy-neutral or loss.<br />
For large datasets, however, one can take advantage of the signal distribution at each SNP and assign each individual to a component of that distribution that reflects a given copy number state.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios. <br />
<br />
The expected format is: chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited, and the data are assumed to be sorted by position within each chromosome.<br />
<br />
An example input file is included in GMM_CNV.zip (see the Download section).<br />
<br />
<br />
For '''[http://www.mathworks.com/ Matlab] users''', download the source code and use the callCNVs.m script.<br />
<br />
'''Users without Matlab''' can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we currently provide only a compiled Linux x86_64 version.)<br />
<br />
You can then use the shell script "run_CallCNVs.sh" (edit it according to your needs): <br />
<br />
sh run_CallCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
=== Download ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| [http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip MCR2007_x86_64.zip]<br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip GMM_CNV.zip]<br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip GMM_CNV_SOURCE.zip] <br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip GMM_CNV_outputs.zip]<br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
<br />
<br />
'''* What are the default components the model will try to fit?'''<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
'''* What happens if the model fails to fit the data?'''<br />
<br />
The model will move on to the next SNP and you will simply get this warning:<br />
 Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set to 0.<br />
<br />
<br />
'''* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
'''<br />
The model could not separate the components before reaching its maximum number of iterations.<br />
This can be due to noisy data, or to a distribution in which no such separation exists.<br />
Try increasing:<br />
 MAX_FUN_CALL=10000; # number of optimization function calls<br />
 MAX_FUN_ITER=5000; # number of iterations for each optimization function call<br />
Note, however, that this can significantly increase the runtime.<br />
<br />
<br />
'''* Can I apply some extra normalization before fitting the Gaussian Mixture Model?'''<br />
<br />
Yes. By default, Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or DO_LOESS=0; in callCNVs.m.)<br />
<br />
It is also recommended that adequate normalization be applied beforehand and that the normalized ratios be provided in the input matrix file.<br />
<br />
<br />
'''* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so: cannot open shared object file: No such file or directory<br />
what does it mean?'''<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_[appName].sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat # i.e. for compiled distrib with Matlab 2008, build v79</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1199GMM2009-12-23T11:57:01Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.<br />
Such genomic aberrations were already identified more than a decade ago using array-based comparative hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing them to those of the same SNP on other arrays (thus deriving a copy number ratio).<br />
A significant shift from the baseline (unit ratio or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use segmentation algorithms to partition the signal and then try to classify the resulting segments as gain, copy-neutral or loss.<br />
For large datasets, however, one can take advantage of the signal distribution at each SNP and assign each individual to a component of that distribution that reflects a given copy number state.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
<br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios. <br />
<br />
The expected format is: chr pos sample1 sample2 ...<br />
<br />
Fields should be tab-delimited, and the data are assumed to be sorted by position within each chromosome.<br />
<br />
An example input file is included in GMM_CNV.zip (see the Download section).<br />
<br />
<br />
For [http://www.mathworks.com/ Matlab] users, download the source code and use the callCNVs.m script.<br />
<br />
Users without Matlab can use the compiled version together with the Matlab Component Runtime (MCR). (Please note that we currently provide only a compiled Linux x86_64 version.)<br />
<br />
You can then use the shell script "run_CallCNVs.sh" (edit it according to your needs): <br />
<br />
sh run_CallCNVs.sh path_to_the_MCR/v79/<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
<br />
=== Download ===<br />
<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| [http://www.unil.ch/cbg/homepage/downloads/MCR2007_x86_64.zip MCR2007_x86_64.zip]<br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables (+ example input file)<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV.zip GMM_CNV.zip]<br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_SOURCE.zip GMM_CNV_SOURCE.zip] <br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
| Example output files<br />
| [http://www.unil.ch/cbg/homepage/downloads/GMM_CNV_outputs.zip GMM_CNV_outputs.zip]<br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}<br />
<br />
<br />
=== Frequently Asked Questions ===<br />
<br />
* What are the default components the model will try to fit?<br />
<br />
The current implementation models deletion, copy neutral, 3 copies and more than 3 copies.<br />
<br />
<br />
* What happens if the model fails to fit the data?<br />
<br />
The model will move on to the next SNP and you will simply get this warning:<br />
 Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
Missing data will be set to 0.<br />
<br />
<br />
* I am getting :<br />
Exiting: Maximum number of iterations has been exceeded - increase MaxIter option.<br />
What does this mean?<br />
<br />
The model could not separate the components before reaching its maximum number of iterations.<br />
This can be due to noisy data, or to a distribution in which no such separation exists.<br />
Try increasing:<br />
 MAX_FUN_CALL=10000; # number of optimization function calls<br />
 MAX_FUN_ITER=5000; # number of iterations for each optimization function call<br />
Note, however, that this can significantly increase the runtime.<br />
<br />
<br />
* Can I apply some extra normalization before fitting the Gaussian Mixture Model?<br />
<br />
Yes. By default, Loess smoothing is applied. (This step can be skipped by setting DO_LOESS_SMOOTH=0 in the shell script or DO_LOESS=0; in callCNVs.m.)<br />
<br />
It is also recommended that adequate normalization be applied beforehand and that the normalized ratios be provided in the input matrix file.<br />
<br />
<br />
* I am getting this error : <br />
error while loading shared libraries: libmwmclmcrrt.so: cannot open shared object file: No such file or directory<br />
what does it mean?<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_[appName].sh script should do it for you.<br />
<br />
sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat # i.e. for compiled distrib with Matlab 2008, build v79</div>Armandhttp://www2.unil.ch/cbg/index.php?title=File:GMM_CNV.zip&diff=1192File:GMM CNV.zip2009-12-21T15:48:49Z<p>Armand: </p>
<hr />
<div>Compiled Gaussian Mixture Model for CNV calling</div>Armandhttp://www2.unil.ch/cbg/index.php?title=GMM&diff=1191GMM2009-12-18T18:13:18Z<p>Armand: </p>
<hr />
<div><br/><br />
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.<br />
Such genomic aberrations were already identified more than a decade ago using array-based comparative genomic hybridization. They can also be detected using <br />
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing them to those of the same SNP on other arrays (thus deriving a copy number ratio).<br />
A significant shift from the baseline (unit ratio or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use a segmentation algorithm to partition the signal and then classify the segments as gain, copy-neutral or loss.<br />
For large datasets, however, one can take advantage of the signal distribution at each SNP and assign each individual to a mixture component that reflects a given copy number state.<br />
<br />
We developed a Gaussian Mixture Model that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage). <br />
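The dosage computation can be written out as follows. This is the standard Gaussian mixture formulation, not a transcription of the distributed code: the symbols are assumptions introduced here, with x_i the copy number ratio of individual i at a given SNP, K = 4 components with weights \pi_k, means \mu_k and variances \sigma_k^2, and c_k the integer copy number attached to component k (the exact mapping of components to copy numbers is also an assumption).

```latex
\[
p_{ik} = \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}
              {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_i \mid \mu_j, \sigma_j^2)},
\qquad
\mathrm{dosage}_i = \sum_{k=1}^{K} c_k \, p_{ik}
\]
```

An individual assigned with near certainty to one component gets a dosage close to that component's copy number; an ambiguous individual gets an intermediate, non-integer value.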
<br />
<br />
=== License ===<br />
<br />
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.<br />
<br />
<br />
=== Usage ===<br />
<br />
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios. <br />
<br />
<br />
=== Requirements ===<br />
<br />
If you have the MATLAB software, you can directly use the source code.<br />
<br />
Otherwise, you will need to download the Matlab Component Runtime to use the executables (see Download section).<br />
<br />
<br />
=== Download ===<br />
<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Description <br />
! File Name<br />
! Size <br />
! md5sum<br />
|-<br />
| MCR for 64-bit Linux <br />
| [http://lausanne.isb-sib.ch/~avalsesi/software/MCR2007_x86_64.zip MCR2007_x86_64.zip]<br />
| 224M<br />
| 451c54a811b3e01402b6a46a1b814c4d <br />
|-<br />
| Linux Executables<br />
| [http://lausanne.isb-sib.ch/~avalsesi/software/GMM_CNV.zip GMM_CNV.zip]<br />
| 556k <br />
| bd579f39c340a50de2bb80a649643be3<br />
|-<br />
| Source code<br />
| [http://lausanne.isb-sib.ch/~avalsesi/software/GMM_CNV_SOURCE.zip GMM_CNV_SOURCE.zip] <br />
| 16k <br />
| 3cb7799bf3e180b33a6742ef382b105e<br />
|-<br />
|-<br />
| Example output files<br />
| [http://lausanne.isb-sib.ch/~avalsesi/software/GMM_CNV_outputs.zip GMM_CNV_outputs.zip]<br />
| 460k <br />
| 6b621a6a8e279697f610db35810777ce<br />
|-<br />
|}</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UNIX_recipes&diff=1081UNIX recipes2009-11-10T09:46:18Z<p>Armand: </p>
<hr />
<div>Armand's unix memo<br />
<br />
= Disc checks =<br />
<br />
How to find which partition is which ?<br />
sudo fdisk -l /dev/hda<br />
<br />
How to check a disk<br />
sudo fsck /dev/sda1<br />
<br />
How to find the bad blocks<br />
sudo badblocks /dev/sda1<br />
<br />
How to reformat a disc ignoring bad blocks<br />
sudo mke2fs -c /dev/sda1<br />
<br />
= System monitoring =<br />
<br />
list processes<br />
top<br />
list logged on users, and what they are doing<br />
w<br />
shows processes<br />
ps<br />
list open files, by process <br />
lsof<br />
<br />
<br />
= Running scripts =<br />
<br />
How to execute a given command for many different parameters stored in a file<br />
cat paramList.txt | xargs mycommand<br />
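Note that plain xargs packs as many parameters as it can into a single invocation. To run the command once per parameter, the -n 1 option can be used. A small sketch, where the content of paramList.txt and the use of echo as a stand-in for mycommand are illustrative assumptions:

```shell
# Illustrative parameter file, one parameter per line.
printf 'alpha\nbeta\ngamma\n' > paramList.txt

# One invocation receiving all parameters at once.
xargs echo run < paramList.txt

# One invocation per parameter (-n 1): three separate calls.
xargs -n 1 echo run < paramList.txt
```

Reading the file with a redirection instead of cat gives the same result and saves one process.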
<br />
<br />
= SSH =<br />
<br />
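How to generate a key pair (private and public key)<br />
ssh-keygen -t dsa<br />
<br />
How to start ssh-agent (it prints the environment variables to set)<br />
ssh-agent<br />
<br />
How to add your key to the agent (you will be asked for the passphrase)<br />
ssh-add<br />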
<br />
How to authorize ssh connection to a remote machine using the generated key<br />
cat .ssh/id_dsa.pub | ssh machine_name "cat - >> .ssh/authorized_keys"<br />
<br />
<br />
= File manipulation =<br />
<br />
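How to print a section of a file (here lines 10 to 20)<br />
awk -v mystart=10 -v myend=20 'NR >= mystart && NR <= myend' myfile<br />
<br />
How to count the number of columns on each line<br />
awk '{ print NF }' myfile<br />
<br />
Print a given column (e.g. the 2nd one; note that cut expects tab-separated columns by default)<br />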
awk '{ print $2 }' myfile<br />
cut -f2 myfile<br />
<br />
Pasting 2 files together by their columns<br />
paste file1 file2<br />
<br />
Joining 2 files by a common column (e.g. the 1st column of file1 contains the same identifiers as the 3rd column of file2; both files must be sorted on the join column)<br />
join -1 1 -2 3 file1 file2<br />
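Since join expects both inputs to be sorted on the join field, a complete run might look like the sketch below. The file contents and the .sorted file names are made up for illustration.

```shell
# Illustrative inputs: identifiers in column 1 of file1 and column 3 of file2.
printf 'id2 b\nid1 a\n' > file1
printf 'x 10 id1\ny 20 id2\n' > file2

# join needs both files sorted on the join field.
sort -k1,1 file1 > file1.sorted
sort -k3,3 file2 > file2.sorted

# Output: the join field first, then the other fields of file1, then those of file2.
join -1 1 -2 3 file1.sorted file2.sorted
```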
<br />
Sorting a file numerically by its 3rd column<br />
sort -n -k3,3 myfile<br />
Sorting a file numerically by its 2nd column, then its 1st and then its 3rd<br />
sort -n -k2,2 -k1,1 -k3,3 myfile<br />
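With POSIX sort, each sort key gets its own -k START,END option, and the numeric modifier can be attached per key. A sketch on made-up data:

```shell
# Illustrative three-column file.
printf '2 1 9\n1 1 3\n1 2 5\n1 1 1\n' > myfile

# Sort numerically by column 2, then column 1, then column 3.
sort -k2,2n -k1,1n -k3,3n myfile
```

Restricting each key to a single column (2,2 rather than 2) keeps later columns from leaking into the comparison.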
<br />
Splitting a file into smaller files with a fixed number of lines (i.e. 100)<br />
split -l 100 myfile<br />
Remove the first 10 lines of a file<br />
sed '1,10d' myfile<br />
<br />
Checking file type :<br />
file myfile<br />
<br />
Converting DOS line endings to Unix<br />
perl -p -e 's/\r$//' < myfile > mynewfile<br />
<br />
Checking ascii content of a file :<br />
od -c myfile | more</div>Armandhttp://www2.unil.ch/cbg/index.php?title=User:Armand&diff=876User:Armand2009-09-17T16:47:51Z<p>Armand: </p>
<hr />
<div>Hi,<br />
<br />
My name is Armand and I am a joint PhD student with Pr. Bergmann and Pr. Jongeneel.<br />
<br />
My main interests are in detecting Copy Number Variation from microarrays (SNP arrays and CGH) and in how such variation relates to the phenotype.<br />
<br />
Other projects I am involved in include:<br />
<br />
* studying the evolution and polymorphisms of some cancer-related genes.<br />
<br />
* how to store structural variants in databases and how to visualize them.<br />
<br />
<br />
<br />
<br />
----<br />
<br />
== Contact ==<br />
Armand Valsesia<br />
Ludwig Institute for Cancer Research<br />
Bâtiment Génopode, UNIL<br />
1015 Lausanne, Switzerland<br />
e-mail: Armand.Valsesia AT licr.org<br />
<br />
----<br />
<br />
== Some useful links ==<br />
<br />
* Running jobs on [[Vital-IT]]<br />
<br />
* A nice listing of software for Ultra High Throughput Sequencing Data ([[UHTS]])<br />
<br />
* some useful unix commands ([[UNIX_recipes]])<br />
<br />
* how to package your matlab code ([[Packaging_matlab_to_standalone]])<br />
<br />
* how to deploy a VNC server ([[Vnc]])</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Vnc&diff=822Vnc2009-08-28T12:01:15Z<p>Armand: </p>
<hr />
<div>Deploying a VNC server & client<br />
<br />
VNC provides remote control software which lets you see and interact with desktop applications across any network.<br />
It can be downloaded from [http://www.realvnc.com/ here].<br />
<br />
Installation is easy,<br />
just extract the Linux server tarball.<br />
You will need to ensure that the vncserver can be reached from the outside world (configure your iptables accordingly).<br />
<br />
<br />
To X applications, a VNC server appears just like the standard X display you sit in front of, but without a physical screen attached. The applications don't know this, they just carry on running whether or not a viewer is connected. You can start a new VNC server on a Unix machine by typing:<br />
<br />
vncserver<br />
<br />
If you haven't run a VNC server before you will be prompted for a password, which you will need to use when connecting to this server. All your servers on the same Unix machine will use the same password, and you can change it at a later date using<br />
<br />
vncpasswd<br />
<br />
With a normal X system, the main X display of a workstation called ’snoopy’ is usually snoopy:0. You can also run as many VNC servers on a Unix machine as you like, and they will appear as snoopy:1, snoopy:2 etc, as if they were just additional displays. Normally vncserver will choose the first available display number and tell you what it is, but you can specify a display number if you always wish to use the same one:<br />
<br />
vncserver :2<br />
<br />
You can cause applications to use a VNC server rather than the normal X display by setting the DISPLAY environment variable to the VNC server you want, or by starting the application with the -display option.<br />
For example:<br />
<br />
xterm -display snoopy:2 &<br />
<br />
You can kill a Unix VNC server using, for example:<br />
<br />
vncserver -kill :2<br />
<br />
You can specify your desktop resolution with:<br />
<br />
vncserver -geometry 1700x1200<br />
<br />
To start a session manager at each login, modify your .vnc/xstartup, for example:<br />
<br />
me@somwhere: more .vnc/xstartup<br />
#!/bin/sh<br />
<br />
# Uncomment the following two lines for normal desktop:<br />
# unset SESSION_MANAGER<br />
# exec /etc/X11/xinit/xinitrc<br />
<br />
xset fp= catalogue:/etc/X11/fontpath.d,built-ins,/home/me/.fonts<br />
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources<br />
xsetroot -solid black<br />
vncconfig -iconic &<br />
exec startkde</div>Armandhttp://www2.unil.ch/cbg/index.php?title=User:Armand&diff=821User:Armand2009-08-28T11:59:50Z<p>Armand: </p>
<hr />
<div>Hi,<br />
<br />
My name is Armand and I am a joint PhD student with Pr. Bergmann and Pr. Jongeneel.<br />
<br />
My main interests are in detecting Copy Number Variation from microarrays (SNP arrays and CGH) and in how such variation relates to the phenotype.<br />
<br />
Other projects I am involved in include:<br />
<br />
* studying the evolution and polymorphisms of some cancer-related genes.<br />
<br />
* how to store structural variants in databases and how to visualize them.<br />
<br />
<br />
<br />
<br />
----<br />
<br />
== Contact ==<br />
Armand Valsesia<br />
Ludwig Institute for Cancer Research<br />
Bâtiment Génopode, UNIL<br />
1015 Lausanne, Switzerland<br />
Phone: + 41 21 692 40 66<br />
Fax: + 41 21 692 40 65<br />
e-mail: Armand.Valsesia AT licr.org<br />
<br />
----<br />
<br />
== Some useful links ==<br />
<br />
* Running jobs on [[Vital-IT]]<br />
<br />
* A nice listing of software for Ultra High Throughput Sequencing Data ([[UHTS]])<br />
<br />
* some useful unix commands ([[UNIX_recipes]])<br />
<br />
* how to package your matlab code ([[Packaging_matlab_to_standalone]])<br />
<br />
* how to deploy a VNC server ([[Vnc]])</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Vnc&diff=820Vnc2009-08-28T11:59:09Z<p>Armand: </p>
<hr />
<div>Deploying a VNC server & client<br />
<br />
VNC provides remote control software which lets you see and interact with desktop applications across any network.<br />
It can be downloaded from [http://www.realvnc.com/ here].<br />
<br />
Installation is easy,<br />
just extract the Linux server tarball.<br />
You will need to ensure that the vncserver can be reached from the outside world (configure your iptables accordingly).<br />
<br />
<br />
To X applications, a VNC server appears just like the standard X display you sit in front of, but without a physical screen attached. The applications don't know this, they just carry on running whether or not a viewer is connected. You can start a new VNC server on a Unix machine by typing:<br />
<br />
vncserver<br />
<br />
If you haven't run a VNC server before you will be prompted for a password, which you will need to use when connecting to this server. All your servers on the same Unix machine will use the same password, and you can change it at a later date using<br />
<br />
vncpasswd<br />
<br />
With a normal X system, the main X display of a workstation called ’snoopy’ is usually snoopy:0. You can also run as many VNC servers on a Unix machine as you like, and they will appear as snoopy:1, snoopy:2 etc, as if they were just additional displays. Normally vncserver will choose the first available display number and tell you what it is, but you can specify a display number if you always wish to use the same one:<br />
<br />
vncserver :2<br />
<br />
You can cause applications to use a VNC server rather than the normal X display by setting the DISPLAY environment variable to the VNC server you want, or by starting the application with the -display option.<br />
For example:<br />
<br />
xterm -display snoopy:2 &<br />
<br />
You can kill a Unix VNC server using, for example:<br />
<br />
vncserver -kill :2<br />
<br />
To start a session manager at each login, modify your .vnc/xstartup, for example:<br />
<br />
me@somwhere: more .vnc/xstartup<br />
#!/bin/sh<br />
<br />
# Uncomment the following two lines for normal desktop:<br />
# unset SESSION_MANAGER<br />
# exec /etc/X11/xinit/xinitrc<br />
<br />
xset fp= catalogue:/etc/X11/fontpath.d,built-ins,/home/me/.fonts<br />
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources<br />
xsetroot -solid black<br />
vncconfig -iconic &<br />
exec startkde</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Group_Meeting&diff=805Group Meeting2009-08-19T15:31:45Z<p>Armand: </p>
<hr />
<div>Group Meeting is every Thursday, from 9.30am-11am, in the small meeting room. We also have a [[Journal Club]].<br />
<br />
== 5th February 2009 ==<br />
<br />
Micha<br />
<br />
== 12th February 2009 ==<br />
<br />
We'll have informal "mini-progress-updates" by people involved in GWAS: Karen, Toby, Diana & Zoltan (say ~10' each).<br />
This meeting will be at 13:00 instead of the journal club.<br />
<br />
== 19th February 2009 ==<br />
<br />
Barbara<br />
<br />
== 26th February 2009 ==<br />
<br />
No meeting. Instead, a meeting on:<br />
<br />
== Monday 2nd March 2009 ==<br />
<br />
Diana, @ 2 p.m<br />
<br />
== Friday 6th March 2009 (instead of Thursday 5th)==<br />
<br />
Sascha<br />
<br />
== 12th March 2009 ==<br />
<br />
Bastian<br />
<br />
== 19th March 2009 ==<br />
<br />
Zoltan<br />
<br />
== 26th March 2009 ==<br />
<br />
Karen<br />
<br />
== 2nd April 2009 ==<br />
<br />
Armand<br />
<br />
== 30th April 2009 ==<br />
<br />
Toby<br />
<br />
== 14th May 2009 ==<br />
<br />
TBA<br />
<br />
== 21st May 2009 ==<br />
<br />
TBA <br />
<br />
== 28th May 2009 ==<br />
<br />
Aitana<br />
<br />
== 20th August 2009 ==<br />
<br />
updates from Karen, Bastian, Armand and Gabor</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Journal_Club_(spring_2012)&diff=804Journal Club (spring 2012)2009-08-19T12:12:50Z<p>Armand: </p>
<hr />
<div>Journal Club is every Thursday, from 1-2pm, in the small meeting room. Feel free to bring your lunch. We also have a [[Group Meeting]]. <br />
<br />
Ideally, someone from the group should volunteer to choose a paper for each meeting, and should update this page and email the paper around on the '''Friday the week before the meeting'''. If a volunteer is not forthcoming, Micha will encourage someone to volunteer. Alternatively, people can also give '''tutorials on any scientific topic''' that may be of interest to other members of the group. The [[tutorial marketplace]] is the place to set and see the supply and demand for tutorials.<br />
<br />
== 5th February 2009 ==<br />
<br />
Toby will present:<br />
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and<br />
Implications for Functional Genomics, by Juliane Sch&auml;fer and Korbinian Strimmer (2005)<br />
''Statistical Applications in Genetics and Molecular Biology''<br />
'''4''':1 Article 32.<br />
[http://dx.doi.org/10.2202/1544-6115.1175 doi:10.2202/1544-6115.1175]<br />
[http://www.bepress.com/sagmb/vol4/iss1/art32 link to paper]<br />
<br />
== 12th February 2009 ==<br />
<br />
G&aacute;bor will present:<br />
<biblio><br />
#wagner pmid=16087882<br />
</biblio><br />
<br />
<br />
[exceptionally at 9:30-10:30]<br />
<br />
== 19th February 2009 ==<br />
<br />
Zolt&aacute;n will present: <br />
The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing, by John D. Storey<br />
(2007) ''J. R. Statist. Soc. B'' '''69''':3 pp.347-368.<br />
[http://dx.doi.org/10.1111/j.1467-9868.2007.005592.x doi:10.1111/j.1467-9868.2007.005592.x]<br />
[http://www3.interscience.wiley.com/journal/118490765/abstract link to paper]<br />
<br />
== 26th February 2009 ==<br />
Bastian will present : A "Silent" Polymorphism in the MDR1 Gene Changes Substrate Specificity,DOI: 10.1126/science.1135308, Science 315, 525 (2007); Chava Kimchi-Sarfaty, et al.<br />
[http://www.sciencemag.org/cgi/content/full/315/5811/525 link to paper]<br />
<br />
This meeting is at 9:30 am instead of the usual time.<br />
<br />
== 5th March 2009 ==<br />
'''Date and time change: Wednesday 3pm-4pm'''<br />
<br />
Karen will present: <br />
Multiple Hypothesis Testing in Microarray Experiments by Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick<br />
Statistical Science, Vol. 18, No. 1 (Feb., 2003), pp. 71-103. <br />
[http://www.jstor.org/stable/3182872 link to paper]<br />
<br />
== 12th March 2009 ==<br />
Aitana will present: <br />
<biblio><br />
#kashtan pmid=17698964<br />
</biblio><br />
<br />
== 19th March 2009 ==<br />
'''Time and room change: 2pm-3pm 1st floor conference room'''<br />
<br />
<br />
Diana is presenting: <br />
<br />
'''Drug—target network'''<br />
Muhammed A Yıldırım, Kwang-Il Goh, Michael E Cusick, Albert-László Barabási & Marc Vidal<br />
<br />
Nature Biotechnology 25, 1119 - 1126 (2007)<br />
Published online: 5 October 2007 | doi:10.1038/nbt1338<br />
http://www.nature.com/nbt/journal/v25/n10/abs/nbt1338.html<br />
<br />
== 26th March 2009 ==<br />
'''Time change: 2pm-3pm'''<br />
<br />
Micha will present the following paper:<br />
<biblio><br />
#millar pmid=16729048<br />
</biblio><br />
http://www.nature.com/msb/journal/v1/n1/synopsis/msb4100018.html<br />
<br />
== 2nd April 2009 ==<br />
'''1pm-2pm'''<br />
<br />
Sascha will present :<br />
<br />
Molecular Systems Biology 4 Article number: 176 <br />
<br />
doi:10.1038/msb.2008.14<br />
<br />
Theoretical and experimental approaches to understand morphogen gradients<br />
<br />
Marta Ibañes1 & Juan Carlos Izpisúa Belmonte<br />
<br />
http://www.nature.com/msb/journal/v4/n1/full/msb200814.html<br />
<br />
== 14th May 2009 ==<br />
<br />
Armand will present<br />
<br />
Accurate whole human genome sequencing using reversible terminator chemistry<br />
Nature 456, 53-59 (6 November 2008) | doi:10.1038/nature07517<br />
<br />
[http://www.nature.com/nature/journal/v456/n7218/full/nature07517.html link to paper]<br />
<br />
== 18th June 2009 ==<br />
No journal club<br />
<br />
== 25th June 2009 ==<br />
<br />
Gabor's turn. Was cancelled because of a workshop.<br />
<br />
== 2nd July 2009 ==<br />
Micha will present<br />
<br />
== 9th July 2009 ==<br />
<br />
Gabor will present:<br />
<br />
Note on the presidential election in Iran, June 2009<br />
<br />
http://www-personal.umich.edu/~wmebane/note18jun2009.pdf<br />
<br />
== 16th July 2009 ==<br />
<br />
Zoltan will present: Introduction to Measure Theory<br />
<br />
== 23rd July 2009 ==<br />
<br />
no journal club<br />
<br />
== 30th July 2009 ==<br />
<br />
no journal club<br />
<br />
== 6th August 2009 ==<br />
<br />
no journal club<br />
<br />
== 13th August 2009 ==<br />
Sven: "Smooth Connection Functions" [http://www.iop.org/EJ/abstract/0305-4470/35/17/305/]<br />
... and the most important theorem in Physics (in Sven's view)!<br />
<br />
== 20th August 2009 ==<br />
<br />
<br />
== 27th August 2009 ==<br />
<br />
<br />
== 3rd September 2009 ==<br />
<br />
Aitana will present: The French Flag under attack (or some other topic)<br />
<br />
== 10th September 2009 ==<br />
<br />
Diana will present<br />
<br />
<br />
== 17th September 2009 ==<br />
<br />
Armand will present Perl5 Best Practice and Perl6 Overview</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Welcome_to_the_Computational_Biology_Group!&diff=798Welcome to the Computational Biology Group!2009-08-11T13:24:27Z<p>Armand: </p>
<hr />
<div>Welcome to the [http://serverdgm.unil.ch/bergmann/ CBG] Wiki!<br />
<br />
Creating new pages and editing existing content in the Wiki are restricted to CBG members, but by default the pages are world-readable. [mailto:wwwcbg@unil.ch Drop an email to the admin] if you want an account.<br />
<br />
If you are a [http://serverdgm.unil.ch/bergmann/ CBG] member (and have an account) you can find more information on this wiki by clicking on the [[Help:Contents|Help]] link that is in the menu on the left.<br />
<br />
<br />
<br />
== What's in this wiki: ==<br />
<br />
* Research<br />
** [[Robustness in Drosophila embryo patterning]]<br />
** [[WingX: Systems Biology of the Drosophila Wing]]<br />
** [[Genome Wide Association Studies]]<br />
<br />
* Teaching<br />
** [[UNIL MSc course: "Genes: from sequence to function 2009"]]<br />
** [[UNIL BSc course: "Solving Biological Problems that require Math"]]<br />
** [[UNIL PhD literature seminar: "Systematic interpretation of genetic interactions using protein networks"]]<br />
** [[UNIL MSc course: "Cartographie, séquençage et structure des génomes 2008"]]<br />
** [[Summer school course: "Biologie und Medizin im digitalen Zeitalter: Jenseits der Disziplinen"]]<br />
** [[SIB course: "Statistical analysis applied to genome and proteome analyses"]]<br />
** [[UNIL PhD literature seminar: "Optimality and evolutionary tuning of the expression level of a protein"]]<br />
** [[UNIL MSc course: "Cartographie, séquençage et structure des génomes 2007"]]<br />
** [[CIG-DGM joint seminar: "Genome-wide Association Studies"]]<br />
** [[UNIL PhD literature seminar: "Diffusion and scaling during early embryonic pattern formation"]]<br />
<br />
* (currently incomplete) list of group [[Publications]]<br />
<br />
* The schedule for our [[Group Meeting]] and our [[Journal Club]]<br />
<br />
* The [[Library]]<br />
<br />
* [[Conference Summaries]]<br />
<br />
* How-to guides <br />
** Running jobs on [[Vital-IT]] <br />
** [[Submitting lots of jobs locally]]<br />
** [[Packaging matlab to standalone]]<br />
<br />
* [[CBGPeople|People at the CBG]]</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Welcome_to_the_Computational_Biology_Group!&diff=797Welcome to the Computational Biology Group!2009-08-11T13:24:05Z<p>Armand: </p>
<hr />
<div>Welcome to the [http://serverdgm.unil.ch/bergmann/ CBG] Wiki!<br />
<br />
Creating new pages and editing existing content in the Wiki are restricted to CBG members, but by default the pages are world-readable. [mailto:wwwcbg@unil.ch Drop an email to the admin] if you want an account.<br />
<br />
If you are a [http://serverdgm.unil.ch/bergmann/ CBG] member (and have an account) you can find more information on this wiki by clicking on the [[Help:Contents|Help]] link that is in the menu on the left.<br />
<br />
<br />
<br />
== What's in this wiki: ==<br />
<br />
* Research<br />
** [[Robustness in Drosophila embryo patterning]]<br />
** [[WingX: Systems Biology of the Drosophila Wing]]<br />
** [[Genome Wide Association Studies]]<br />
<br />
* Teaching<br />
** [[UNIL MSc course: "Genes: from sequence to function 2009"]]<br />
** [[UNIL BSc course: "Solving Biological Problems that require Math"]]<br />
** [[UNIL PhD literature seminar: "Systematic interpretation of genetic interactions using protein networks"]]<br />
** [[UNIL MSc course: "Cartographie, séquençage et structure des génomes 2008"]]<br />
** [[Summer school course: "Biologie und Medizin im digitalen Zeitalter: Jenseits der Disziplinen"]]<br />
** [[SIB course: "Statistical analysis applied to genome and proteome analyses"]]<br />
** [[UNIL PhD literature seminar: "Optimality and evolutionary tuning of the expression level of a protein"]]<br />
** [[UNIL MSc course: "Cartographie, séquençage et structure des génomes 2007"]]<br />
** [[CIG-DGM joint seminar: "Genome-wide Association Studies"]]<br />
** [[UNIL PhD literature seminar: "Diffusion and scaling during early embryonic pattern formation"]]<br />
<br />
* (currently incomplete) list of group [[Publications]]<br />
<br />
* The schedule for our [[Group Meeting]] and our [[Journal Club]]<br />
<br />
* The [[Library]]<br />
<br />
* [[Conference Summaries]]<br />
<br />
* How-to guides <br />
** Running jobs on [[Vital-IT]] <br />
** [[Submitting lots of jobs locally]]<br />
** [[Packaging matlab to standalone]]<br />
<br />
<br />
* [[CBGPeople|People at the CBG]]</div>Armandhttp://www2.unil.ch/cbg/index.php?title=User:Armand&diff=796User:Armand2009-08-11T13:21:46Z<p>Armand: </p>
<hr />
<div>Hi,<br />
<br />
My name is Armand and I am a joint PhD student with Pr. Bergmann and Pr. Jongeneel.<br />
<br />
My main interests are in detecting Copy Number Variation from microarrays (SNP arrays and CGH) and in how such variation relates to the phenotype.<br />
<br />
Other projects I am involved in include:<br />
<br />
* studying the evolution and polymorphisms of some cancer-related genes.<br />
<br />
* how to store structural variants in databases and how to visualize them.<br />
<br />
<br />
<br />
<br />
----<br />
<br />
== Contact ==<br />
Armand Valsesia<br />
Ludwig Institute for Cancer Research<br />
Bâtiment Génopode, UNIL<br />
1015 Lausanne, Switzerland<br />
Phone: + 41 21 692 40 66<br />
Fax: + 41 21 692 40 65<br />
e-mail: Armand.Valsesia AT licr.org<br />
<br />
----<br />
<br />
== Some useful links ==<br />
<br />
* Running jobs on [[Vital-IT]]<br />
<br />
* A nice listing of software for Ultra High Throughput Sequencing Data ([[UHTS]])<br />
<br />
* some useful unix commands ([[UNIX_recipes]])<br />
<br />
* how to package your matlab code ([[Packaging_matlab_to_standalone]])</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Packaging_matlab_to_standalone&diff=795Packaging matlab to standalone2009-08-11T13:20:55Z<p>Armand: </p>
<hr />
<div>== Summary ==<br />
<br />
<br />
This is about compiling matlab code (.m file) and packaging it into a standalone executable.<br />
<br />
<br />
Converting .m files can be done easily with the mcc command. <br />
It compiles the code with gcc (beware of which gcc version you are using) and provides a shell script to run the executable.<br />
<br />
To run the code, you will also need the Matlab Compiler Runtime (MCR) which contains run time components. <br />
<br />
The MCR is needed for running any compiled standalone. <br />
Please note that the MCR is Matlab version dependent, so make sure the target machine runs the same MCR version as the compilation machine! <br />
In some cases, a standalone can be kernel dependent ...<br />
<br />
<br />
== Building/finding the MCR ==<br />
<br />
<br />
<br />
With Matlab version 2008, one can build the MCR using self-extracting files :<br />
<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Platform<br />
! self-extracting file<br />
! location<br />
|-<br />
! Windows 32-bit<br />
! MCRInstaller.exe<br />
! matlabroot\toolbox\compiler\deploy\win32<br />
|-<br />
! Windows 64-bit<br />
! MCRInstaller.exe<br />
! matlabroot\toolbox\compiler\deploy\win64<br />
|-<br />
! Linux (glnx86)<br />
! MCRInstaller.bin<br />
! matlabroot/toolbox/compiler/deploy/glnx86<br />
|-<br />
! Linux (glnxa64)<br />
! MCRInstaller.bin<br />
! matlabroot/toolbox/compiler/deploy/glnxa64<br />
|-<br />
! Mac<br />
! MCRInstaller.dmg<br />
! matlabroot/toolbox/compiler/deploy/mac<br />
|-<br />
! Mac intel<br />
! MCRInstaller.dmg<br />
! matlabroot/toolbox/compiler/deploy/maci<br />
|-<br />
! Solaris (sol64)<br />
! MCRInstaller.bin<br />
! matlabroot/toolbox/compiler/deploy/sol64<br />
|}<br />
<br />
Where matlabroot is where your Matlab was installed (e.g. /usr/bin/matlab2008b_PDE/ )<br />
<br />
In older Matlab versions (< 2008), an MCRInstaller.zip file was included in the distribution. This zip file could be created by running the buildmcr command. This function is now deprecated.<br />
<br />
<br />
== Compiling .m files with mcc ==<br />
<br />
Let's say I have a main function callCNVs.m and that all helper functions are either standard Matlab functions or under a fun/ directory.<br />
<br />
My callCNVs.m function looks like :<br />
<br />
function callCNVs(input_file, varargin)<br />
<br />
if nargin < 1<br />
error('Missing input file');<br />
end<br />
fprintf('Processing file %s\n', input_file);<br />
end<br />
<br />
I can then create my small distribution with <br />
<br />
mcc -m -I /home/armand/MATLAB/COLAUS -I /home/armand/MATLAB/COLAUS/fun -d /home/armand/MATLAB/COLAUS_COMPILED callCNVs.m<br />
<br />
Where -I specifies the directories to include for compilation, -d the destination directory and callCNVs.m the main function.<br />
<br />
This will convert callCNVs.m to C, compile it and link functions that are in the include directories.<br />
<br />
Looking into /home/armand/MATLAB/COLAUS_COMPILED, I now have :<br />
<br />
callCNVs: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, \<br />
dynamically linked (uses shared libs), not stripped<br />
callCNVs_main.c: ASCII C program text<br />
callCNVs_mcc_component_data.c: ASCII C program text<br />
callCNVs.prj: XML document text<br />
mccExcludedFiles.log: ASCII English text<br />
readme.txt: Matlab v5 mat-file<br />
run_callCNVs.sh: Bourne shell script text executable<br />
<br />
To run the standalone, I simply use the run_callCNVs.sh script :<br />
<br />
$ sh run_callCNVs.sh /usr/bin/matlab2008b_PDE/ test.dat<br />
------------------------------------------<br />
Setting up environment variables<br />
---<br />
LD_LIBRARY_PATH is .:/usr/bin/matlab2008b_PDE//runtime/glnxa64:/usr/bin/matlab2008b_PDE//bin/glnxa64:/usr/bin/matlab2008b_PDE//sys/os/glnxa64:\<br />
/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64/server:\<br />
/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64/client:/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64<br />
<br />
Processing file test.dat<br />
<br />
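The environment set-up printed above can be approximated by hand, which helps when debugging library errors. The sketch below is not the generated run_callCNVs.sh: the `mcr_library_path` helper is hypothetical, it only reproduces the first three directories shown in the output above, and it leaves out the Java directories.

```shell
# Hypothetical helper: build a library path for a given MCR/Matlab root.
# The sub-directories mirror the first entries printed by run_callCNVs.sh.
mcr_library_path() {
    root=${1%/}        # drop one trailing slash, if any
    arch=${2:-glnxa64} # 64-bit Linux by default
    echo ".:$root/runtime/$arch:$root/bin/$arch:$root/sys/os/$arch"
}

# Usage (illustrative): print the path the executable would be given.
# mcr_library_path /usr/bin/matlab2008b_PDE/
```

Comparing this value with the LD_LIBRARY_PATH line printed by the script is a quick way to spot a wrong MCR root.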
<br />
== Troubleshooting ==<br />
<br />
=== Error while loading libraries ===<br />
<br />
error while loading shared libraries: libmwmclmcrrt.so: cannot open shared object file: No such file or directory<br />
<br />
Most likely your LD_LIBRARY_PATH is not pointing correctly to the MCR.<br />
The run_[appName].sh script should do it for you.<br />
<br />
$ sh run_callCNVs.sh /path-to-my-MCR/v79 test.dat # i.e. for compiled distrib with Matlab 2008, build v79</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Packaging_matlab_to_standalone&diff=794Packaging matlab to standalone2009-08-11T12:05:40Z<p>Armand: </p>
<hr />
<div>== Summary ==<br />
<br />
<br />
This is about compiling matlab code (.m file) and packaging it into a standalone executable.<br />
<br />
<br />
Converting .m files can be done easily with the mcc command. <br />
It compiles the code with gcc (beware of which gcc version you are using) and provides a shell script to run the executable.<br />
<br />
To run the code, you will also need the Matlab Compiler Runtime (MCR) which contains run time components. <br />
<br />
The MCR is needed for running any compiled standalone. <br />
Please note that the MCR is Matlab version dependent, so make sure the target machine runs the same MCR version as the compilation machine! <br />
In some cases, a standalone can be kernel dependent ...<br />
<br />
<br />
== Building/finding the MCR ==<br />
<br />
<br />
<br />
With Matlab version 2008, the MCR can be installed from a self-extracting file, found at the following locations:<br />
<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Platform<br />
! Self-extracting file<br />
! Location<br />
|-<br />
! Windows 32-bit<br />
! MCRInstaller.exe<br />
! matlabroot\toolbox\compiler\deploy\win32<br />
|-<br />
! Windows 64-bit<br />
! MCRInstaller.exe<br />
! matlabroot\toolbox\compiler\deploy\win64<br />
|-<br />
! Linux (glnx86)<br />
! MCRInstaller.bin<br />
! matlabroot/toolbox/compiler/deploy/glnx86<br />
|-<br />
! Linux (glnxa64)<br />
! MCRInstaller.bin<br />
! matlabroot/toolbox/compiler/deploy/glnxa64<br />
|-<br />
! Mac<br />
! MCRInstaller.dmg<br />
! matlabroot/toolbox/compiler/deploy/mac<br />
|-<br />
! Mac intel<br />
! MCRInstaller.dmg<br />
! matlabroot/toolbox/compiler/deploy/maci<br />
|-<br />
! Solaris (sol64)<br />
! MCRInstaller.bin<br />
! matlabroot/toolbox/compiler/deploy/sol64<br />
|}<br />
<br />
Where matlabroot is the directory in which Matlab is installed (e.g. /usr/bin/matlab2008b_PDE/ ).<br />
<br />
With older Matlab versions (< 2008), one included an MCRInstaller.zip file in the distribution. This zip file could be created with the buildmcr command, which is now deprecated.<br />
<br />
<br />
== Compiling .m files with mcc ==<br />
<br />
Let's say I have a main function callCNVs.m, and that all helper functions are either standard Matlab functions or located under a fun/ directory.<br />
<br />
My callCNVs.m function looks like:<br />
<br />
function callCNVs(input_file, varargin)<br />
<br />
if nargin < 1<br />
error('Missing input file');<br />
end<br />
fprintf('Processing file %s\n', input_file);<br />
end<br />
<br />
I can then create my small distribution with <br />
<br />
mcc -m -I /home/armand/MATLAB/COLAUS -I /home/armand/MATLAB/COLAUS/fun -d /home/armand/MATLAB/COLAUS_COMPILED callCNVs.m<br />
<br />
Here -m requests a standalone executable, each -I adds a directory to search during compilation, -d sets the destination directory, and callCNVs.m is the main function.<br />
<br />
This converts callCNVs.m to C, compiles it, and links in the functions found in the include directories.<br />
<br />
Looking into /home/armand/MATLAB/COLAUS_COMPILED, I now have :<br />
<br />
callCNVs: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, \<br />
dynamically linked (uses shared libs), not stripped<br />
callCNVs_main.c: ASCII C program text<br />
callCNVs_mcc_component_data.c: ASCII C program text<br />
callCNVs.prj: XML document text<br />
mccExcludedFiles.log: ASCII English text<br />
readme.txt: Matlab v5 mat-file<br />
run_callCNVs.sh: Bourne shell script text executable<br />
<br />
To run the standalone, I simply use the run_callCNVs.sh script :<br />
<br />
$ sh run_callCNVs.sh /usr/bin/matlab2008b_PDE/ test.dat<br />
------------------------------------------<br />
Setting up environment variables<br />
---<br />
LD_LIBRARY_PATH is .:/usr/bin/matlab2008b_PDE//runtime/glnxa64:/usr/bin/matlab2008b_PDE//bin/glnxa64:/usr/bin/matlab2008b_PDE//sys/os/glnxa64:\<br />
/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64/server:\<br />
/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64/client:/usr/bin/matlab2008b_PDE//sys/java/jre/glnxa64/jre/lib/amd64<br />
<br />
Processing file test.dat</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Vital-IT&diff=784Vital-IT2009-07-30T09:29:16Z<p>Armand: </p>
<hr />
<div>= How to run jobs on Vital-IT, hints and good practice =<br />
<br />
<br />
<br />
When you have a great many jobs to run, running them on the Vital-IT cluster may be better than running them on shoshana or maya.<br />
<br />
Running 300+ jobs on a 16-processor machine makes your jobs compete with each other (i.e. each job does not get 100% of a processor, but shares the resources with the others).<br />
<br />
If your jobs take only a few minutes to complete, that might not be an issue though.<br />
<br />
Any job that does not require a massive amount of memory (more than 7-8 GB) can easily be run on the Vital-IT machines. <br />
For huge-memory jobs a few machines are available, although only one (rserv) is comparable to shoshana or maya.<br />
<br />
== Prerequisites ==<br />
<br />
Before working on (or crashing) Vital-IT, you will need an account.<br><br />
You can ask for one here [http://www.vital-it.ch/vitalit-intro.htm]<br />
<br />
== Ways to submit jobs ==<br />
<br />
<br />
You can submit jobs through :<br />
<br />
* a web interface [http://www.vital-it.ch/vitalit-tech-wsub.html]<br />
<br />
* or you can use a python script (wsub.py), documentation available at wsub-python[http://www.vital-it.ch/vitalit-tech-wsub-onlinetutorial.html?6]<br />
<br />
* or you can log on to a front-end node (dev.vital-it.ch or prd.vital-it.ch) and submit jobs using the bsub command.[http://www.vital-it.ch/LSF/lsf_using/B_jobops.html#123512]<br />
<br />
== Being nice ==<br />
<br />
'''PLEASE DO NOT RUN ANY COMPUTATION ON THE FRONT-END NODES (dev, prd) !!!''' <br><br />
These front-end nodes are only for submitting jobs and do not have the resources to let you run your jobs interactively.<br><br />
For interactive and/or heavy computation, you can log on to rserv.vital-it.ch or noko01.vital-it.ch. <br><br />
Jobs on these machines share the resources (RAM, CPU, I/O) with all other users' jobs.<br />
<br />
== Installed software ==<br />
<br />
Various bioinformatics programs are installed on Vital-IT; see the list here [http://www.vital-it.ch/vitalit-comp-services.htm].<br />
These include :<br />
* R ( /mnt/common/R-BioC/install/Linux/x86_64/R-2.8.0/bin/R or /mnt/common/R-BioC/install/Linux/ia64/R-2.8.0/bin/R )<br />
* Plink, EigenStrat, Merlin ...<br />
* Raxml, Phylip, phyloBayes, phyml, treefinder ... <br />
* Emboss<br />
* lots of sequence analysis tools ( t-coffee, paralign, hmmer, pftools, clustalw, blast, ssaha, blat, fasta, tagger ...)<br />
<br />
Matlab is not yet installed (mainly due to a licence problem).<br />
One alternative would be to compile the Matlab code on shoshana/maya and to use it on rserv.vital-it.ch.<br />
<br />
= Bsub in a nutshell =<br />
<br />
===Submitting a simple job===<br />
<br />
bsub "sh myscript.sh > mylog"<br />
''Job <903956> is submitted to default queue <normal>.<br />
''<br />
<br />
This submits the job to the cluster and returns its job id.<br />
<br />
Here the output is redirected to mylog.<br />
But you can send the STDOUT and STDERR messages to distinct files with :<br />
bsub -e myerrorfile -o myoutputfile "sh myscript.sh"<br />
<br />
=== Submitting job to a queue ===<br />
<br />
You can assign a job to a specific queue like this :<br />
bsub -q normal "sh myscript.sh" # only for jobs needing less than 24h <br />
bsub -q long "sh myscript.sh" # for long jobs<br />
<br />
By default, each Vital-IT job is submitted to the normal queue, which has a run-time limit of 24 hours. After 24h, the job is killed automatically.<br />
For longer jobs, you can submit to the long queue, which has no time limit but a lower priority.<br />
The priority score (known as LSF shares) determines how soon a submitted job starts running. <br />
Obviously, the more jobs you submit and the more CPU time you have already used, the lower your priority score becomes.<br />
<br />
One can also change the queue of a job<br />
bswitch long 666 # put job with id 666 to the long queue<br />
bswitch -q normal long 0 # put all jobs from normal queue to the long queue<br />
<br />
=== Monitoring jobs === <br />
<br />
You can check the status of your jobs with<br />
bjobs <br />
''JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME<br />
903583 avalses RUN normal devfrt01 cpt176 698099 Feb 26 14:22<br />
903581 avalses RUN normal devfrt01 cpt167 695923 Feb 26 14:22<br />
903580 avalses RUN normal devfrt01 cpt166 695889 Feb 26 14:22<br />
''<br />
<br />
bjobs lists the job info (id, name, submission time, execution host, queue) and, more importantly, the current status of each job.<br />
The statuses you will see most of the time are :<br />
* RUN : The job is currently running.<br />
* PEND : The job is pending, that is, it has not yet been started.<br />
* DONE : The job has terminated with status of 0.<br />
* EXIT : The job has terminated with a non-zero status - it may have been aborted due to an error in its execution, or killed by its owner or the LSF administrator.<br />
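<br />
With many jobs in flight, a per-status count is often easier to read than the full listing. The following is a minimal sketch, assuming the STAT field is the third column as in the bjobs output shown above; the function name count_job_status is only illustrative.<br />

```shell
# Summarise bjobs-style output: one line per status with its job count.
# The listing is read on stdin, so this can be tried without an LSF cluster.
count_job_status() {
    # NR > 1 skips the header line; $3 is assumed to be the STAT column
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on a front-end node (requires LSF):
#   bjobs -a | count_job_status
```

The output is sorted by status name, so successive calls are easy to compare.<br />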
<br />
To check your jobs, you might also use<br />
bjobs -a # list all running and finished jobs (at least the recently finished)<br />
bjobs -r # list all running jobs<br />
bjobs -d # list all finished jobs (either successfully completed or failed ones)<br />
bjobs -u marcel # list all jobs for this user<br />
bjobs -q normal # list my jobs on this queue<br />
<br />
<br />
'''Special status :''' <br />
<br />
These are some special job statuses that should probably worry you a bit :<br />
<br />
* PSUSP : The job has been suspended, either by its owner or the LSF administrator, while pending.<br />
* USUSP : The job has been suspended, either by its owner or the LSF administrator, while running.<br />
* SSUSP : The job has been suspended by LSF. Either because <br />
** The load conditions on the execution host or hosts have exceeded a threshold according to the loadStop vector defined for the host or queue.<br />
** The run window of the job’s queue is closed. <br />
* UNKWN : mbatchd has lost contact with the sbatchd on the host on which the job runs.<br />
* WAIT : For jobs submitted to a chunk job queue, members of a chunk job that are waiting to run.<br />
* ZOMBI : A job becomes ZOMBI if:<br />
** A non-rerunnable job is killed by bkill while the sbatchd on the execution host is unreachable and the job is shown as UNKWN.<br />
** The host on which a rerunnable job is running is unavailable and the job has been requeued by LSF with a new job ID, as if the job were submitted as a new job.<br />
** After the execution host becomes available, LSF tries to kill the ZOMBI job. Upon successful termination of the ZOMBI job, the job’s status is changed to EXIT.<br />
** With MultiCluster, when a job running on a remote execution cluster becomes a ZOMBI job, the execution cluster treats the job the same way as local ZOMBI jobs. In addition, it notifies the submission cluster that the job is in ZOMBI state and the submission cluster requeues the job.<br />
<br />
=== What to do when a job goes nuts ===<br />
<br />
bkill is your best friend when something goes wrong; you can kill your job(s) with :<br />
 bkill 007                 # kill the job with id 007<br />
bkill 0 # kill all my jobs<br />
bkill -q normal 0 # kill all my jobs from the normal queue<br />
bkill -J "toto" # kill job called toto<br />
<br />
== Monitoring Vital-IT ==<br />
<br />
To check the sanity of Vital-IT before or during job submission, you can use the online tools [http://www.vital-it.ch/vitalit-tech-wsub-status.html] :<br />
<br />
* either Qstat [http://www.vital-it.ch/prd/www/cgi-bin/Wserver?qstat=0&html=0], which does nothing more than a bjobs<br />
* or Ganglia [http://www.vital-it.ch/prdpub/ganglia-webfrontend/?c=ProdCluster&r=hour&s=by%2520hostname&hc=4], which will tell you how busy (in terms of load, memory usage, SFS load, etc.) the nodes are<br />
<br />
= Building nicer bsub =<br />
<br />
=== Linking jobs ===<br />
You can submit several jobs and make some of them start only after others have completed.<br />
E.g. if you want to run a, b and c, where b needs the output of a and c should run only if b failed, <br />
you can use the -w option of bsub :<br />
 bsub -J a "sh a.sh"<br />
 bsub -J b -w 'done("a")' "sh b.sh" # start b when a is successfully done<br />
 bsub -J c -w 'exit("b")' "sh c.sh" # start c if b has failed<br />
And here we go, we have a mini-pipeline :-).<br />
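<br />
Before submitting a chain of dependent jobs, it can help to print the commands first. The sketch below wraps bsub in a hypothetical submit function that only echoes the command when DRY_RUN=1; the dependencies use LSF's done("name") and exit("name") conditions, and a.sh, b.sh and c.sh are the placeholder scripts of the pipeline above.<br />

```shell
# submit: run bsub with the given arguments, or only print the command
# when DRY_RUN=1 (useful to check a pipeline before really submitting it).
submit() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bsub $*"
    else
        bsub "$@"
    fi
}

# run_pipeline: the a -> b -> (c if b failed) chain.
run_pipeline() {
    submit -J a "sh a.sh"
    submit -J b -w 'done("a")' "sh b.sh"   # b starts only if a succeeded
    submit -J c -w 'exit("b")' "sh c.sh"   # c starts only if b failed
}
```

Setting DRY_RUN=1 and calling run_pipeline prints the three bsub commands without contacting the cluster.<br />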
<br />
=== Job with special requirements ===<br />
<br />
When a job has special needs, you can ask LSF to start it only if some conditions are satisfied.<br />
This can be a minimal amount of free memory, a particular host architecture (e.g. x86_64 and not ia64), etc.<br />
<br />
You can do this with the bsub -R option.<br />
 bsub -R "select[mem>3500] rusage[mem=3500]"  ....  # will start the job on a machine having at least 3.5 GB of free RAM and reserve 3.5 GB for your job. <br />
 bsub -R "select[model==Xeon5160]" ....             # the job will start on a Xeon machine, which has the x86_64 architecture<br />
<br />
=== Job arrays ===<br />
<br />
<br />
If I want to submit about a thousand jobs that differ only in one parameter or one input file, I could run bsub a thousand times.<br />
But using a job array is better, as submission is much faster.<br />
 bsub -J myjobname"[1-1000]"%50 -e log/%I.err -o log/%I.out 'sh myscript.sh inputFile${LSB_JOBINDEX}.txt'   <br />
This submits a job array of 1000 jobs, each running myscript.sh on an input file named "inputFileNN.txt", where NN is a number from 1 to 1000. The command is single-quoted so that ${LSB_JOBINDEX} is expanded on the execution host rather than by the submitting shell.<br />
<br />
bsub -J myjobname"[1-1000]"%50 tells LSF that this is a job array with indices from 1 to 1000. %50 limits how many jobs are allowed to run at any one time (here only 50).<br><br />
The variables %I and %J are substitution strings that support file redirection for jobs submitted from a job array. <br><br />
At execution time, %I is expanded to the job array index of the current job, and %J (not used in the above example) is expanded to the job ID of the job array.<br><br />
${LSB_JOBINDEX} is an environment variable that LSF sets to the array index of the current job.<br />
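<br />
When the array elements differ by a parameter rather than by a numbered input file, the job script can use the index to pick one line from a parameter list. This is a minimal sketch: pick_param, params.txt and mycommand are hypothetical names, and the fallback index of 1 exists only so the function can be tried outside LSF.<br />

```shell
# pick_param FILE: print the line of FILE whose number equals LSB_JOBINDEX.
# LSF sets LSB_JOBINDEX for each element of a job array; the default of 1
# is only there so the function can be tried outside LSF.
pick_param() {
    sed -n "${LSB_JOBINDEX:-1}p" "$1"
}

# Inside myscript.sh one could then write (mycommand is a placeholder):
#   param=$(pick_param params.txt)
#   mycommand "$param"
```

With this layout, params.txt holds one parameter set per line and the array range matches its number of lines.<br />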
<br />
By default the maximum number of jobs per array is 1000, but the sys admin can increase it up to ~64k.<br />
<br />
<br />
Killing a job array can be done with :<br />
 bkill -J "myjobname"            # kill the complete array called myjobname<br />
 bkill -J "myjobname[10]"        # only kill the 10th job of the array<br />
 bkill -J "myjobname[1-10,77]"   # kill the first 10 jobs and the 77th<br />
<br />
= FAQ =<br />
<br />
==== Can I submit LSF jobs from rserv or noko01?====<br />
<br />
No, use dev or prd instead. <br />
<br />
==== Can I run jobs directly on dev or prd ?====<br />
<br />
Never ! Use rserv or noko01 !<br />
<br />
==== My ls is painfully slow. Why?====<br />
<br />
That is inherent to SFS and to the fact that files are striped across many different discs.<br />
<br />
Apart from avoiding putting thousands of files in a single directory, you can use /bin/ls or ls --color=none, which are much faster than the default ls.<br />
<br />
==== How do I check the space left?====<br />
Please note that Vital-IT will crash if less than 1 TB of space is left, because some web services rely on this minimal free space.<br />
<br />
df -h .<br />
Filesystem Size Used Avail Use% Mounted on<br />
client_o2ib 16T 13T 2.7T 83% /sfs1<br />
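<br />
A script that writes a lot of data can run the same check itself before starting. The sketch below relies on the POSIX output format of df (-P) in 1024-byte blocks (-k); the function names are illustrative, and it assumes a file system name without spaces.<br />

```shell
# avail_kb DIR: space available on the file system holding DIR, in KB.
avail_kb() {
    # -P gives one line per file system, -k reports 1024-byte blocks;
    # line 2, column 4 is the "Available" value
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space DIR MIN_KB: succeed only if at least MIN_KB kilobytes are free.
enough_space() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example: warn when less than 1 TB (1073741824 KB) is left on /sfs1
#   enough_space /sfs1 1073741824 || echo "less than 1 TB left on /sfs1"
```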
<br />
==== How can I make sure I am using my bash config on the node running my job?====<br />
bsub -L /bin/bash ....<br />
<br />
==== Can I run an interactive job on a Vital-it node? ==== <br />
Yes, with bsub -I <br />
<br />
bsub -I echo "hello"<br />
''Job <904773> is submitted to default queue <normal>.<br />
<<Waiting for dispatch ...>><br />
<<Starting on cpt023>><br />
hello''<br />
<br />
== Known limitations ==<br />
<br />
Vital-IT uses the SFS file system [http://www.sun.com/software/products/lustre/]; files are striped across many discs for backup reasons. <br><br />
This means that any file stat operation (e.g. a simple ls) needs to query the various discs holding the data stripes, which can be painfully slow.<br><br />
It also means that a job doing lots of I/O operations will be slower than on an NFS file system. Still, running 200+ jobs in parallel will be much faster than running them one by one or in small batches on maya/shoshana.<br />
<br />
= See also =<br />
<br />
* Complete LSF documentation [http://www.vital-it.ch/LSF/]<br />
* Lustre [http://wiki.lustre.org/index.php?title=Main_Page] [http://www.sun.com/software/products/lustre/] <br />
* HP StorageWorks SFS [http://h20311.www2.hp.com/HPC/cache/276636-0-0-0-121.html]<br />
* SFS/Lustre experience from Roland Laifer [http://www.rz.uni-karlsruhe.de/download/SSCK_Workshop_07_Laifer.pdf]</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UNIX_recipes&diff=783UNIX recipes2009-07-30T09:25:48Z<p>Armand: </p>
<hr />
<div>Armand's unix memo<br />
<br />
= Disc checks =<br />
<br />
How to find which partition is which ?<br />
sudo fdisk -l /dev/hda<br />
<br />
How to check a disk<br />
sudo fsck /dev/sda1<br />
<br />
How to find the bad blocks<br />
sudo badblocks /dev/sda1<br />
<br />
How to reformat a disc ignoring bad blocks<br />
sudo mke2fs -c /dev/sda1<br />
<br />
<br />
= Running scripts =<br />
<br />
How to execute a given command once for each parameter stored in a file (one parameter per line, without spaces)<br />
 xargs -n 1 mycommand < paramList.txt<br />
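<br />
The difference between a single call and one call per parameter is easy to see with echo standing in for the real command; this is only an illustration with made-up parameters a, b and c.<br />

```shell
# Without -n, xargs passes as many parameters as possible to ONE call
printf 'a\nb\nc\n' | xargs echo run

# With -n 1, the command is called once per parameter (three calls here)
printf 'a\nb\nc\n' | xargs -n 1 echo run
```

The first command prints a single line, the second prints one line per parameter.<br />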
<br />
<br />
= SSH =<br />
<br />
How to generate a key pair (private and public key)<br />
 ssh-keygen -t dsa<br />
<br />
How to start ssh-agent and display its environment variables<br />
 ssh-agent<br />
<br />
How to add the key to the agent (asks for the passphrase once)<br />
 ssh-add<br />
<br />
How to authorize ssh connection to a remote machine using the generated key<br />
cat .ssh/id_dsa.pub | ssh machine_name "cat - >> .ssh/authorized_keys"<br />
<br />
<br />
= File manipulation =<br />
<br />
How to print a section of a file (here lines 10 to 20)<br />
 awk -v mystart=10 -v myend=20 'NR >= mystart && NR <= myend' myfile<br />
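<br />
The same awk test can be wrapped in a small function so that the bounds are passed as arguments. This is a sketch; print_lines is an illustrative name, and the function reads stdin when no file is given.<br />

```shell
# print_lines START END [FILE]: print lines START..END of FILE (or stdin).
print_lines() {
    # -v hands the shell values over to awk as variables;
    # "-" makes awk read standard input when no file is given
    awk -v mystart="$1" -v myend="$2" 'NR >= mystart && NR <= myend' "${3:--}"
}

# Example: print_lines 10 20 myfile
```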
<br />
How to count #of columns per line<br />
awk '{ print NF }' myfile<br />
<br />
Print a given column (e.g. the 2nd one)<br />
 awk '{ print $2 }' myfile<br />
 cut -f2 myfile   # cut expects tab-separated columns by default<br />
<br />
Pasting 2 files together by their columns<br />
paste file1 file2<br />
<br />
Joining 2 files on a common column (e.g. the 1st column of file1 and the 3rd column of file2 contain the same identifiers); both files must be sorted on that column<br />
 join -1 1 -2 3 file1 file2<br />
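<br />
Since join expects both inputs to be sorted on the join field, a complete run usually has a sort step first. The following worked example uses two small made-up files in a temporary directory.<br />

```shell
workdir=$(mktemp -d)
printf 'id2 blue\nid1 red\n'  > "$workdir/file1"   # identifier in column 1
printf 'x 10 id1\ny 20 id2\n' > "$workdir/file2"   # identifier in column 3

# join expects both inputs to be sorted on the join field
sort -k 1,1 "$workdir/file1" > "$workdir/file1.sorted"
sort -k 3,3 "$workdir/file2" > "$workdir/file2.sorted"

# Output: join field first, then the other columns of file1, then of file2
joined=$(join -1 1 -2 3 "$workdir/file1.sorted" "$workdir/file2.sorted")
echo "$joined"
rm -rf "$workdir"
```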
<br />
Sorting a file numerically by its 3rd column<br />
 sort -n -k 3,3 myfile<br />
Sorting a file numerically by its 2nd column, then 1st and then 3rd<br />
 sort -k 2,2n -k 1,1n -k 3,3n myfile<br />
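<br />
Each -k option names one sort key as start,end column, and the keys are applied in the order given. The function below is a small sketch of the 2nd, 1st, 3rd column ordering; sort_213 is an illustrative name.<br />

```shell
# sort_213 [FILE...]: numeric sort on column 2, ties broken by column 1,
# then by column 3. Each key is limited to one column (start,end).
sort_213() {
    sort -k 2,2n -k 1,1n -k 3,3n "$@"
}

# Example with four rows read from stdin
printf '3 1 9\n1 2 5\n2 1 4\n2 1 3\n' | sort_213
```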
<br />
Splitting a file into smaller files with a fixed number of lines (i.e. 100)<br />
split -l 100 myfile<br />
Remove the first 10 lines of a file<br />
sed '1,10d' myfile<br />
<br />
Checking file type :<br />
file myfile<br />
<br />
Converting DOS line endings to Unix<br />
perl -p -e 's/\r$//' < myfile > mynewfile<br />
<br />
Checking ascii content of a file :<br />
od -c myfile | more</div>Armandhttp://www2.unil.ch/cbg/index.php?title=User:Armand&diff=756User:Armand2009-06-21T19:10:51Z<p>Armand: </p>
<hr />
<div>Hi,<br />
<br />
My name is Armand and I am a joint PhD student with Pr. Bergmann and Pr. Jongeneel.<br />
<br />
My main interests are in detecting Copy Number Variation from micro-arrays (SNP arrays and CGH) and in how such variation relates to the phenotype.<br />
<br />
Other projects I am involved in include :<br />
<br />
* studying the evolution and polymorphisms of some cancer-related genes.<br />
<br />
* how to store structural variants in databases and how to visualize them.<br />
<br />
<br />
<br />
<br />
----<br />
<br />
== Contact ==<br />
Armand Valsesia<br />
Ludwig Institute for Cancer Research<br />
Bâtiment Génopode, UNIL<br />
1015 Lausanne, Switzerland<br />
Phone: + 41 21 692 40 66<br />
Fax: + 41 21 692 40 65<br />
e-mail: Armand.Valsesia AT licr.org<br />
<br />
----<br />
<br />
== Some useful links ==<br />
<br />
* Running jobs on [[Vital-IT]]<br />
<br />
* A nice listing of software for Ultra High Throughput Sequencing Data ([[UHTS]])<br />
<br />
* some useful unix commands ([[UNIX_recipes]])</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UNIX_recipes&diff=755UNIX recipes2009-06-18T15:38:56Z<p>Armand: </p>
<hr />
<div>Armand's unix memo<br />
<br />
= Disc checks =<br />
<br />
How to find which partition is which ?<br />
sudo fdisk -l /dev/hda<br />
<br />
How to check a disk<br />
sudo fsck /dev/sda1<br />
<br />
How to find the bad blocks<br />
sudo badblocks /dev/sda1<br />
<br />
How to reformat a disc ignoring bad blocks<br />
sudo mke2fs -c /dev/sda1<br />
<br />
<br />
= Running scripts =<br />
<br />
How to execute a given command once for each parameter stored in a file (one parameter per line, without spaces)<br />
 xargs -n 1 mycommand < paramList.txt<br />
<br />
<br />
= SSH =<br />
<br />
How to generate a key pair (private and public key)<br />
 ssh-keygen -t dsa<br />
<br />
How to start ssh-agent and display its environment variables<br />
 ssh-agent<br />
<br />
How to add the key to the agent (asks for the passphrase once)<br />
 ssh-add<br />
<br />
How to authorize ssh connection to a remote machine using the generated key<br />
cat .ssh/id_dsa.pub | ssh machine_name "cat - >> .ssh/authorized_keys"<br />
<br />
<br />
= File manipulation =<br />
<br />
How to print a section of a file (here lines 10 to 20)<br />
 awk -v mystart=10 -v myend=20 'NR >= mystart && NR <= myend' myfile<br />
<br />
How to count #of columns per line<br />
awk '{ print NF }' myfile<br />
<br />
Print a given column (i.e. 2nd one)<br />
awk '{ print $2 }' myfile<br />
cut -f2 myfile<br />
<br />
Pasting 2 files together by their columns<br />
paste file1 file2<br />
<br />
Joining 2 files on a common column (e.g. the 1st column of file1 and the 3rd column of file2 contain the same identifiers); both files must be sorted on that column<br />
 join -1 1 -2 3 file1 file2<br />
<br />
Sorting a file numerically by its 3rd column<br />
 sort -n -k 3,3 myfile<br />
Sorting a file numerically by its 2nd column, then 1st and then 3rd<br />
 sort -k 2,2n -k 1,1n -k 3,3n myfile<br />
<br />
Splitting a file into smaller files with a fixed number of lines (i.e. 100)<br />
split -l 100 myfile<br />
Remove the first 10 lines of a file<br />
sed '1,10d' myfile</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UNIX_recipes&diff=754UNIX recipes2009-06-18T15:38:12Z<p>Armand: </p>
<hr />
<div>= Armand's unix memo =<br />
<br />
== Disc checks ==<br />
<br />
How to find which partition is which ?<br />
sudo fdisk -l /dev/hda<br />
<br />
How to check a disk<br />
sudo fsck /dev/sda1<br />
<br />
How to find the bad blocks<br />
sudo badblocks /dev/sda1<br />
<br />
How to reformat a disc ignoring bad blocks<br />
sudo mke2fs -c /dev/sda1<br />
<br />
<br />
== Running scripts ==<br />
<br />
How to execute a given command once for each parameter stored in a file (one parameter per line, without spaces)<br />
 xargs -n 1 mycommand < paramList.txt<br />
<br />
== SSH ==<br />
<br />
How to generate a key pair (private and public key)<br />
 ssh-keygen -t dsa<br />
<br />
How to start ssh-agent and display its environment variables<br />
 ssh-agent<br />
<br />
How to add the key to the agent (asks for the passphrase once)<br />
 ssh-add<br />
<br />
How to authorize ssh connection to a remote machine using the generated key<br />
cat .ssh/id_dsa.pub | ssh machine_name "cat - >> .ssh/authorized_keys"<br />
<br />
<br />
== File manipulation ==<br />
<br />
How to print a section of a file (here lines 10 to 20)<br />
 awk -v mystart=10 -v myend=20 'NR >= mystart && NR <= myend' myfile<br />
<br />
How to count #of columns per line<br />
awk '{ print NF }' myfile<br />
<br />
Print a given column (i.e. 2nd one)<br />
awk '{ print $2 }' myfile<br />
cut -f2 myfile<br />
<br />
Pasting 2 files together by their columns<br />
paste file1 file2<br />
<br />
Joining 2 files on a common column (e.g. the 1st column of file1 and the 3rd column of file2 contain the same identifiers); both files must be sorted on that column<br />
 join -1 1 -2 3 file1 file2<br />
<br />
Sorting a file numerically by its 3rd column<br />
 sort -n -k 3,3 myfile<br />
Sorting a file numerically by its 2nd column, then 1st and then 3rd<br />
 sort -k 2,2n -k 1,1n -k 3,3n myfile<br />
<br />
Splitting a file into smaller files with a fixed number of lines (i.e. 100)<br />
split -l 100 myfile<br />
Remove the first 10 lines of a file<br />
sed '1,10d' myfile</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Journal_Club_(spring_2012)&diff=629Journal Club (spring 2012)2009-04-09T09:22:08Z<p>Armand: </p>
<hr />
<div>Journal Club is every Thursday, from 1-2pm, in the small meeting room. Feel free to bring your lunch. We also have a [[Group Meeting]].<br />
<br />
Ideally, someone from the group should volunteer to choose a paper for each meeting, and should update this page and email the paper around on the '''Friday the week before the meeting'''. If a volunteer is not forthcoming, [[user:Toby|Toby]] will encourage someone to volunteer. <br />
<br />
== 5th February 2009 ==<br />
<br />
Toby will present:<br />
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and<br />
Implications for Functional Genomics, by Juliane Sch&auml;fer and Korbinian Strimmer (2005)<br />
''Statistical Applications in Genetics and Molecular Biology''<br />
'''4''':1 Article 32.<br />
[http://dx.doi.org/10.2202/1544-6115.1175 doi:10.2202/1544-6115.1175]<br />
[http://www.bepress.com/sagmb/vol4/iss1/art32 link to paper]<br />
<br />
== 12th February 2009 ==<br />
<br />
G&aacute;bor will present:<br />
<biblio><br />
#wagner pmid=16087882<br />
</biblio><br />
<br />
<br />
[exceptionally at 9:30-10:30]<br />
<br />
== 19th February 2009 ==<br />
<br />
Zolt&aacute;n will present: <br />
The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing, by John D. Storey<br />
(2007) ''J. R. Statist. Soc. B'' '''69''':3 pp.347-368.<br />
[http://dx.doi.org/10.1111/j.1467-9868.2007.005592.x doi:10.1111/j.1467-9868.2007.005592.x]<br />
[http://www3.interscience.wiley.com/journal/118490765/abstract link to paper]<br />
<br />
== 26th February 2009 ==<br />
Bastian will present: A "Silent" Polymorphism in the MDR1 Gene Changes Substrate Specificity, by Chava Kimchi-Sarfaty et al. (2007) Science 315, 525. DOI: 10.1126/science.1135308<br />
[http://www.sciencemag.org/cgi/content/full/315/5811/525 link to paper]<br />
<br />
This meeting is at 9:30 am instead of the usual time.<br />
<br />
== 5th March 2009 ==<br />
'''Date and time change: Wednesday 3pm-4pm'''<br />
<br />
Karen will present: <br />
Multiple Hypothesis Testing in Microarray Experiments by Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick<br />
Statistical Science, Vol. 18, No. 1 (Feb., 2003), pp. 71-103. <br />
[http://www.jstor.org/stable/3182872 link to paper]<br />
<br />
== 12th March 2009 ==<br />
Aitana will present: <br />
<biblio><br />
#kashtan pmid=17698964<br />
</biblio><br />
<br />
== 19th March 2009 ==<br />
'''Time and room change: 2pm-3pm 1st floor conference room'''<br />
<br />
<br />
Diana is presenting: <br />
<br />
'''Drug—target network'''<br />
Muhammed A Yıldırım, Kwang-Il Goh, Michael E Cusick, Albert-László Barabási & Marc Vidal<br />
<br />
Nature Biotechnology 25, 1119 - 1126 (2007)<br />
Published online: 5 October 2007 | doi:10.1038/nbt1338<br />
http://www.nature.com/nbt/journal/v25/n10/abs/nbt1338.html<br />
<br />
== 26th March 2009 ==<br />
'''Time change: 2pm-3pm'''<br />
<br />
Micha will present the following paper:<br />
<biblio><br />
#millar pmid=16729048<br />
</biblio><br />
http://www.nature.com/msb/journal/v1/n1/synopsis/msb4100018.html<br />
<br />
== 2nd April 2009 ==<br />
'''1pm-2pm'''<br />
<br />
Sascha will present :<br />
<br />
Molecular Systems Biology 4 Article number: 176 <br />
<br />
doi:10.1038/msb.2008.14<br />
<br />
Theoretical and experimental approaches to understand morphogen gradients<br />
<br />
Marta Ibañes & Juan Carlos Izpisúa Belmonte<br />
<br />
http://www.nature.com/msb/journal/v4/n1/full/msb200814.html<br />
<br />
== 14th May 2009 ==<br />
<br />
Armand will present<br />
<br />
Accurate whole human genome sequencing using reversible terminator chemistry<br />
<br />
Nature 456, 53-59 (6 November 2008) | doi:10.1038/nature07517<br />
<br />
[http://www.nature.com/nature/journal/v456/n7218/full/nature07517.html link to paper]</div>Armandhttp://www2.unil.ch/cbg/index.php?title=User:Armand&diff=612User:Armand2009-04-07T16:08:45Z<p>Armand: </p>
<hr />
<div>Hi,<br />
<br />
My name is Armand and I am a joint PhD student with Pr. Bergmann and Pr. Jongeneel.<br />
<br />
My main interests are in detecting Copy Number Variation from micro-arrays (SNP arrays and CGH) and in how such variation relates to the phenotype.<br />
<br />
Other projects I am involved in include :<br />
<br />
* studying the evolution and polymorphisms of some cancer-related genes.<br />
<br />
* how to store structural variants in databases and how to visualize them.<br />
<br />
<br />
<br />
<br />
----<br />
<br />
== Contact ==<br />
Armand Valsesia<br />
Ludwig Institute for Cancer Research<br />
Bâtiment Génopode, UNIL<br />
1015 Lausanne, Switzerland<br />
Phone: + 41 21 692 40 66<br />
Fax: + 41 21 692 40 65<br />
e-mail: Armand.Valsesia AT licr.org<br />
<br />
----<br />
<br />
== Some useful links ==<br />
<br />
* Running jobs on [[Vital-IT]]<br />
<br />
* A nice listing of software for Ultra High Throughput Sequencing Data ([[UHTS]])</div>Armandhttp://www2.unil.ch/cbg/index.php?title=User:Armand&diff=611User:Armand2009-04-07T16:08:10Z<p>Armand: </p>
<hr />
<div>Hi,<br />
<br />
My name is Armand and I am a joint PhD student with Pr. Bergmann and Pr. Jongeneel.<br />
<br />
My main interests are in detecting Copy Number Variation from micro-arrays (SNP arrays and CGH) and in how such variation relates to the phenotype.<br />
<br />
Other projects I am involved in include :<br />
<br />
* studying the evolution and polymorphisms of some cancer-related genes.<br />
<br />
* how to store structural variants in databases and how to visualize them.<br />
<br />
<br />
<br />
<br />
----<br />
<br />
== Contact ==<br />
Armand Valsesia<br />
Ludwig Institute for Cancer Research<br />
Bâtiment Génopode, UNIL<br />
1015 Lausanne, Switzerland<br />
Phone: + 41 21 692 40 66<br />
Fax: + 41 21 692 40 65<br />
e-mail: Armand.Valsesia AT licr.org<br />
<br />
----<br />
<br />
== Some useful links ==<br />
<br />
* Running jobs on Vital-IT [http://www2.unil.ch/cbg/index.php?title=Vital-IT]<br />
<br />
* A nice listing of software for Ultra High Throughput Sequencing Data ([[UHTS]])</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UHTS&diff=610UHTS2009-04-07T16:06:03Z<p>Armand: </p>
<hr />
<div>__TOC__<br />
<br />
This page was reproduced and slightly edited from this one [http://wiki.nbic.nl/index.php/High_throughput_sequencing], written by people at the Netherlands Bioinformatics Centre [http://www.nbic.nl/]<br />
<br />
==Vendors sequencers==<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Category<br />
! Source<br />
|-<br />
| Hardware vendor<br />
| [http://www.illumina.com/ Illumina]<br />
|-<br />
| Hardware vendor<br />
| [http://www.pacificbiosciences.com Pacific BioSciences]<br />
|-<br />
| Hardware vendor<br />
| [http://www.roche-applied-science.com/ Roche]<br />
|-<br />
| Hardware vendor<br />
| [http://www.helicosbio.com/ Helicos]<br />
|-<br />
| Hardware vendor<br />
| [http://www.appliedbiosystems.com/ Applied Biosystems]<br />
|}<br />
<br />
==Software==<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Category<br />
! Package<br />
! Description<br />
|-<br />
| Viewer<br />
| [http://bioinformatics.bc.edu/marthlab/EagleView EagleView genome viewer]<br />
| EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations.<br />
|-<br />
| Alignment<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18070356?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=4&log$=relatedarticles&logdbfrom=pubmed MUMmerGPU]<br />
| MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.<br />
|-<br />
| Methylation<br />
| [http://www.nature.com/nbt/journal/v26/n7/abs/nbt1414.html Batman]<br />
| Bayesian tool for methylation analysis (Batman)—for analyzing methylated DNA immunoprecipitation (MeDIP) profiles<br />
|-<br />
| Base-calling<br />
| [http://hannonlab.cshl.edu/Alta-Cyclic/main.html Alta-Cyclic]<br />
| Alta-Cyclic is a novel Illumina Genome-Analyzer (Solexa) base caller. Its features include longer reads, more accurate reads (compared to Solexa's default base caller), and reduced systematic bias towards a certain nucleotide in later cycles. On a GAII platform, Alta-Cyclic was able to provide a large number of useful reads after 78 cycles.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18599518?dopt=Abstract FindPeaks 3.1]<br />
| Findpeaks was developed to perform analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18340039?ordinalpos=2&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum ALLPATHS]<br />
| De novo assembly of whole-genome shotgun microreads.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/17908823?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=5&log$=relatedarticles&logdbfrom=pubmed SHARCGS]<br />
| [http://sharcgs.molgen.mpg.de/ SHARCGS] is a suitable tool for fully exploiting novel sequencing technologies by assembling sequence contigs de novo with high confidence and by outperforming existing assembly algorithms in terms of speed and accuracy. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18349386?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=1&log$=relatedarticles&logdbfrom=pubmed Velvet]<br />
| Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. It needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.genomic.ch/edena.php EDENA]<br />
| De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Made by Hernandez D et al.<br />
|-<br />
| Assembly<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/ssake SSAKE]<br />
| The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.<br />
|-<br />
| Alignment<br />
| [http://www.fml.mpg.de/raetsch/projects/qpalma qpalma]<br />
| QPalma is an alignment tool designed to align spliced reads produced by Next Generation sequencing platforms such as Illumina Solexa or 454. QPalma aligns short reads to the genomic sequences in an optimal way according to its underlying algorithm and trained parameters. It creates an alignment using dynamic programming (written in C++), and returns the alignment in a PSL-like format. The algorithm computes optimal local alignments, so if no alignment has been found it is because no alignment got a sufficiently high alignment score.<br />
|-<br />
| Alignment<br />
| [http://soap.genomics.org.cn SOAP]<br />
| SOAP (Short Oligonucleotide Alignment Program) is a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Author is Ruiqiang Li at the Beijing Genomics Institute. C++ for Unix.<br />
|-<br />
| Websites<br />
| [http://g2.trac.bx.psu.edu Galaxy]<br />
| <br />
|-<br />
| Blogs<br />
| [http://seqanswers.com/ SeqAnswers]<br />
| <br />
|-<br />
| Integrated solutions<br />
| [http://www.clcbio.com/index.php?id=1240 CLCbio Genomics Workbench]<br />
| de novo and reference assembly of Sanger, 454, Solexa, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, browser and other features. Runs on Windows, Mac OS X and Linux.<br />
|-<br />
| Integrated solutions<br />
| [http://softgenetics.com/NextGENe.html NextGENe]<br />
| de novo and reference assembly of Illumina and SOLiD data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Requires Win or MacOS.<br />
|-<br />
| Integrated solutions<br />
| [http://www.dnastar.com/products/SMGA.php SeqMan Genome Analyser]<br />
| Software for Next Generation sequence assembly of Illumina, 454 Life Sciences and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Early release commercial software. Compatible with Windows® XP X64 and Mac OS X 10.4.<br />
|-<br />
| Alignment<br />
| [http://bioinfo.cgrb.oregonstate.edu/docs/solexa/ ELAND]<br />
| Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.<br />
|-<br />
| Assembly<br />
| [http://euler-assembler.ucsd.edu/portal/ EULER]<br />
| Short read assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research).<br />
|-<br />
| Alignment<br />
| [http://www.ebi.ac.uk/~guy/exonerate/ Exonerate]<br />
| Various forms of alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.<br />
|-<br />
| Alignment & Mapping<br />
| [http://www.gene.com/share/gmap/ GMAP]<br />
| GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.<br />
|-<br />
| Alignment & Assembly<br />
| [http://bioinformatics.bc.edu/marthlab/Mosaik MOSAIK]<br />
| Reference guided aligner/assembler. Written by Michael Strömberg at Boston College.<br />
|-<br />
| Alignment & Mapping<br />
| [http://sourceforge.net/projects/maq/ MAQ]<br />
| Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre.<br />
|-<br />
| Alignment<br />
| [http://mummer.sourceforge.net/ MUMmer]<br />
| MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.<br />
|-<br />
| Alignment<br />
| [http://www.novocraft.com/index.html Novocraft]<br />
| Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.<br />
|-<br />
| Assembly<br />
| [http://rulai.cshl.edu/rmap/ RMAP]<br />
| Assembles 20 - 64 bp Solexa reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.<br />
|-<br />
| Alignment<br />
| [http://biogibbs.stanford.edu/~jiangh/SeqMap/ SeqMap]<br />
| Works like ELAND, and can handle 3 or more bp mismatches as well as indels. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most operating systems.<br />
|-<br />
| Assembly<br />
| [http://compbio.cs.toronto.edu/shrimp/ SHRiMP]<br />
| Assembles to a reference sequence. Developed with Applied Biosystems' colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. Works with data in letterspace (Roche, Illumina), colourspace (AB) and Helicos space.<br />
|-<br />
| Alignment<br />
| [http://www.sanger.ac.uk/Software/analysis/SSAHA/ SSAHA]<br />
| SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.<br />
|-<br />
| Alignment<br />
| [http://synasite.mgrc.com.my:8080/sxog/NewSXOligoSearch.php SXOligoSearch]<br />
| SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.<br />
|-<br />
| Assembly (de novo)<br />
| [http://chevreux.org/projects_mira.html MIRA2]<br />
| MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.<br />
|-<br />
| Assembly (de novo)<br />
| [https://sourceforge.net/projects/vcake VCAKE]<br />
| De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://www.sanger.ac.uk/Software/analysis/ssahaSNP/ ssahaSNP]<br />
| ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Available for Compaq Alpha, Linux-64, Linux-32, Solaris and Mac.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://bioinformatics.bc.edu/marthlab/PbShort PolyBayesShort]<br />
| A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://bioinformatics.bc.edu/marthlab/PyroBayes PyroBayes]<br />
| PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.<br />
|-<br />
| Integrated solutions<br />
| [http://staden.sourceforge.net/ STADEN]<br />
| Includes GAP4. GAP5, once completed, will handle next-gen sequencing data. A partially implemented test version is available [https://sourceforge.net/project/show...kage_id=256957 here]<br />
|-<br />
| Viewer<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/xmatchview XMatchView]<br />
| A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux.<br />
|-<br />
| Integrated solutions<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/sam SAM]<br />
| Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://woldlab.caltech.edu/chipseq/ CHiPSeq]<br />
| From Johnson et al., Science, 2007.<br />
|-<br />
| RNAseq<br />
| [http://woldlab.caltech.edu/rnaseq/ ERANGE]<br />
| ERANGE is a Python package for doing RNA-seq and ChIP-seq (hence the "dual-use"), and is a descendant of the ChIPSeq mini peak finder (Johnson, 2007). In particular, the RNAseq analysis uses some of the very same code to access [http://cistematic.caltech.edu/index.html Cistematic]. Version 2.0 is the first released in the wild and is "Bed"-centric. In particular, it is not optimized for speed!<br />
|-<br />
| Methylation<br />
| [http://epigenomics.mcdb.ucla.edu/BS-Seq/download.html BS-Seq]<br />
| The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.<br />
|-<br />
| Mapping<br />
| [http://dna.cs.byu.edu/gnumap/ gnumap]<br />
| The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. Currently, gnumap is designed to be used with the _int.txt data received from the Solexa/Illumina machine.<br />
|-<br />
| Mapping<br />
| [http://www.bioinformaticssolutions.com/products/zoom/index.php ZOOM]<br />
| ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, generated by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly with speed being a critical priority.<br />
|-<br />
| Assembly & Chromosome walking<br />
| [http://www.plantgdb.org/tool/tracembler/ Tracembler]<br />
| Tracembler streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/sissrs/ sissrs]<br />
| Produces a list of peak maxima from aligned positions.<br />
|-<br />
| Assembly<br />
| [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17534434 SHRAP]<br />
| The source code will be made available individually upon request. "However, note that we do not have a tool that can be used on real 454 sequence data in a production setting."<br />
|-<br />
| Alignment<br />
| [http://www.phrap.org/phredphrapconsed.html Phred Phrap Consed Cross_match]<br />
| The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. Phrap is a program for assembling shotgun DNA sequence data. Cross_match is a general purpose utility for comparing any two DNA sequence sets using a 'banded' version of swat. Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap.<br />
|-<br />
| Chromatin Profiling<br />
| [http://bioinformatics-renlab.ucsd.edu/rentrac/wiki/ChromaSig ChromaSig]<br />
| An unsupervised learning method, which finds, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data.<br />
|-<br />
| Integrated solutions<br />
| [http://1001genomes.org/downloads/ Shore]<br />
| Analysis suite for Illumina short read data. <br />
|-<br />
| Mapping<br />
| [http://1001genomes.org/downloads/ GenomeMapper]<br />
| Short read mapping tool. <br />
|-<br />
| Base-calling & Analysis<br />
| [http://bbcf.epfl.ch/Software Rolexa]<br />
| Allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.<br />
|-<br />
| Integrated solutions <br />
| [http://sourceforge.net/projects/solexatools SolexaTools]<br />
| SolexaTools is a project to create a tool set to work with a Solexa genome sequencer. It includes multiple components including a LIMS system, pipeline and other tools to support end-users and researchers setting up a Solexa environment.<br />
|-<br />
| ChIPseq<br />
| [http://mendel.stanford.edu/sidowlab/downloads/quest/ QuEST]<br />
| QuEST is a Kernel Density Estimator-based package for analysis of massively parallel sequencing data from chromatin immunoprecipitations (ChIP-Seq or ChIPseq).<br />
|-<br />
| Mapping<br />
| [http://socs.biology.gatech.edu/ SOCS]<br />
| SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and SNP discovery functions. <br />
|-<br />
| Alignment<br />
| [http://bowtie-bio.sourceforge.net/ BOWTIE]<br />
| Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.3 GB for the human genome. It supports alignment policies equivalent to Maq and SOAP but is much faster: about 35x faster than Maq and over 350x faster than SOAP when aligning to the human genome. <br />
|-<br />
| Analysis<br />
| [http://bio.ifom-ieo-campus.it/galaxy CARPET]<br />
| CARPET (Collection of Automated Routine Programs for Easy Tiling) is a set of Perl, Python and R scripts, integrated on the Galaxy2 web-based platform, for the analysis of ChIP-chip and expression tiling data, both for standard and custom chip designs.<br />
|-<br />
| Assembly<br />
| [http://wgs-assembler.sf.net CABOG]<br />
| Celera Assembler is scientific software for DNA research. CA is a 'whole genome shotgun sequence assembler' -- it reconstructs long sequences of genomic DNA given the fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage, and heterogeneous read lengths ([http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=18952627&dopt=Abstract pubmed]).<br />
|-<br />
| ChIPseq / ChIP-chip<br />
| [http://www.stat.psu.edu/~yuzhang/pass.tar PASS]<br />
| "..Motivated by the Poisson clumping heuristic, we propose an accurate and efficient method for evaluating statistical significance in genome-wide ChIP-chip tiling arrays. The method works accurately for any large number of multiple comparisons, and the computational cost for evaluating p-values does not increase with the total number of tests..." [http://www.ncbi.nlm.nih.gov/pubmed/18953047?dopt=Abstract pubmed]<br />
|-<br />
| ChIPseq / ChIP-chip<br />
| [http://www.cmbi.ru.nl/~fnielsen/CATCH CATCH]<br />
| CATCH is a tool for exploring patterns in ChIP profiling data. The CATCH algorithm performs a hierarchical clustering of the profile patterns with an exhaustive alignment at each step. The algorithm has a user-friendly graphical interface that makes it easy for you to browse your results.<br />
|-<br />
| Misc<br />
| [http://www.ics.uci.edu/~xhx/project/DNAzip DNAzip]<br />
| A series of techniques that in combination reduces a single genome to a size small enough to be sent as an email attachment. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=18996942&dopt=Abstract pubmed]<br />
|-<br />
| Integrated solutions<br />
| [http://www.biostat.jhsph.edu/~hji/cisgenome/ CisGenome]<br />
| An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis<br />
|-<br />
| Tiling-array analysis<br />
| [http://sourceforge.net/projects/timat2 TiMat2]<br />
| TiMAT2 contains tools for low and high level genomic tiling microarray analysis using the Affymetrix, NimbleGen, and Agilent platforms. It is designed for processing single and multi chip data sets from ChIP-Chip, RNA difference, and aCGH experiments. <br />
|-<br />
| microRNA<br />
| [http://www.bio.psu.edu/people/faculty/Axtell/AxtellLab/Software.html CleaveLand]<br />
| A pipeline for using degradome data to find cleaved small RNA targets.<br />
|-<br />
| Alignment<br />
| [http://www.ebi.ac.uk/~bjp/pecan/ PECAN]<br />
| "..method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing we develop a set of new technical methods, combined in a framework we term 'sequence progressive alignment', because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is that we massively decrease the memory consumption of the program relative to a naive implementation. The general engineering of the challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We also further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs that has and is being used by a number of whole genome comparative genomic projects." [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=19056777&dopt=Abstract pubmed]<br />
|-<br />
| Assembly<br />
| [http://www.cs.sunysb.edu/~skiena/shorty/ SHORTY]<br />
| "..Our assembler SHORTY is targetted for de novo assembly of microreads with mate pair information and sequencing errors. SHORTY has some novel approach and features in addressing the short read assembly problem.." <br />
|-<br />
| Assembly<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/abyss ABySS]<br />
| "ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes." <br />
|-<br />
| Alignment<br />
| [http://pass.cribi.unipd.it PASS]<br />
| "PASS performs fast gapped and ungapped alignments of short DNA sequences onto a reference DNA, typically a genomic sequence. It is designed to handle a huge amount of reads such as those generated by Solexa, SOLiD or 454 technologies. The algorithm is based on a data structure that holds in RAM the index of the genomic positions of "seed" words (typically 11-12 bases) as well as an index of the precomputed scores of short words (typically 7-8 bases) aligned against each other." [http://www.ncbi.nlm.nih.gov/pubmed/19218350?dopt=Abstract pubmed] <br />
|-<br />
| ChIPseq<br />
| [http://liulab.dfci.harvard.edu/NPS/ NPS]<br />
| "..Our method provides an effective framework for studying nucleosome positioning and epigenetic marks in mammalian genomes..." [http://www.ncbi.nlm.nih.gov/pubmed/19014516?dopt=Abstract pubmed]<br />
|-<br />
| Assembly<br />
| [http://www.seqan.de/projects/consensus.html Consensus]<br />
| SeqCons is an open source consensus computation program for Linux and Windows. The algorithm can be used for de novo and reference-guided sequence assembly. <br />
|-<br />
| ChIPseq<br />
| [http://liulab.dfci.harvard.edu/MACS/ MACS]<br />
| Model-based Analysis of ChIP-Seq data (MACS) analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.<br />
|-<br />
| Assembly<br />
| [http://genome.ku.dk/resources/assembly/methods.html Scheibye-Alsing ''et al'']<br />
| A comprehensive overview of the current publicly available sequence assembly programs. [http://www.ncbi.nlm.nih.gov/pubmed/19152793?dopt=Abstract pubmed]<br />
|- <br />
| Transcript seq<br />
| [http://iant.toulouse.inra.fr/FrameDP FrameDP]<br />
| Sensitive peptide detection on noisy matured sequences. A self-training integrative pipeline for predicting CDS in transcripts which can adapt itself to different levels of sequence qualities.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://www.gersteinlab.org/proj/PeakSeq/ PeakSeq]<br />
| PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. A methodology for identifying punctate binding sites in ChIP-Seq experiments based on their characteristics. [http://www.nature.com/nbt/journal/v27/n1/full/nbt.1518.html publication]<br />
|-<br />
| Mapping<br />
| [http://socs.biology.gatech.edu/ SOCS]<br />
| SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and mismatch identification functions.<br />
|}<br />
<br />
<br />
== Further reading ==<br />
<br />
* Accurate whole human genome sequencing using reversible terminator chemistry, Nature 456, 53-59 doi:10.1038/nature07517 [http://www.nature.com/nature/journal/v456/n7218/full/nature07517.html]</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UHTS&diff=609UHTS2009-04-07T16:03:56Z<p>Armand: </p>
<hr />
<div>__TOC__<br />
<br />
This page was reproduced and slightly edited from this one [http://wiki.nbic.nl/index.php/High_throughput_sequencing], written by people at the Netherlands Bioinformatics Centre [http://www.nbic.nl/]<br />
<br />
==Sequencer vendors==<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Category<br />
! Source<br />
|-<br />
| Hardware vendor<br />
| [http://www.illumina.com/ Illumina]<br />
|-<br />
| Hardware vendor<br />
| [http://www.pacificbiosciences.com Pacific BioSciences]<br />
|-<br />
| Hardware vendor<br />
| [http://www.roche-applied-science.com/ Roche]<br />
|-<br />
| Hardware vendor<br />
| [http://www.helicosbio.com/ Helicos]<br />
|-<br />
| Hardware vendor<br />
| [http://www.appliedbiosystems.com/ Applied Biosystems]<br />
|}<br />
<br />
==Software==<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Category<br />
! Package<br />
! Description<br />
|-<br />
| Viewer<br />
| [http://bioinformatics.bc.edu/marthlab/EagleView EagleView genome viewer]<br />
| EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations.<br />
|-<br />
| Alignment<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18070356?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=4&log$=relatedarticles&logdbfrom=pubmed MUMmerGPU]<br />
| MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.<br />
|-<br />
| Methylation<br />
| [http://www.nature.com/nbt/journal/v26/n7/abs/nbt1414.html Batman]<br />
| Bayesian tool for methylation analysis (Batman)—for analyzing methylated DNA immunoprecipitation (MeDIP) profiles<br />
|-<br />
| Base-calling<br />
| [http://hannonlab.cshl.edu/Alta-Cyclic/main.html Alta-Cyclic]<br />
| Alta-Cyclic is a novel Illumina Genome-Analyzer (Solexa) base caller. Its features include longer reads, more accurate reads (compared to Solexa's default base caller), and reduced systematic bias towards a certain nucleotide in later cycles. On a GAII platform, Alta-Cyclic was able to provide a large number of useful reads after 78 cycles.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18599518?dopt=Abstract FindPeaks 3.1]<br />
| Findpeaks was developed to perform analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18340039?ordinalpos=2&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum ALLPATHS]<br />
| De novo assembly of whole-genome shotgun microreads.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/17908823?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=5&log$=relatedarticles&logdbfrom=pubmed SHARCGS]<br />
| [http://sharcgs.molgen.mpg.de/ SHARCGS] is a suitable tool for fully exploiting novel sequencing technologies by assembling sequence contigs de novo with high confidence and by outperforming existing assembly algorithms in terms of speed and accuracy. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18349386?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=1&log$=relatedarticles&logdbfrom=pubmed Velvet]<br />
| Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. It needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.genomic.ch/edena.php EDENA]<br />
| De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Made by Hernandez D et al.<br />
|-<br />
| Assembly<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/ssake SSAKE]<br />
| The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.<br />
|-<br />
| Alignment<br />
| [http://www.fml.mpg.de/raetsch/projects/qpalma qpalma]<br />
| QPalma is an alignment tool designed to align spliced reads produced by Next Generation sequencing platforms such as Illumina Solexa or 454. QPalma aligns short reads to the genomic sequences in an optimal way according to its underlying algorithm and trained parameters. It creates an alignment using dynamic programming (written in C++), and returns the alignment in a PSL-like format. The algorithm computes optimal local alignments, so if no alignment has been found it is because no alignment got a sufficiently high alignment score.<br />
|-<br />
| Alignment<br />
| [http://soap.genomics.org.cn SOAP]<br />
| SOAP (Short Oligonucleotide Alignment Program) is a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program that supports multi-threaded parallel computing and has a batch module for multiple query sets. Author is Ruiqiang Li at the Beijing Genomics Institute. C++ for Unix.<br />
|-<br />
| Websites<br />
| [http://g2.trac.bx.psu.edu Galaxy]<br />
| <br />
|-<br />
| Blogs<br />
| [http://seqanswers.com/ SeqAnswers]<br />
| <br />
|-<br />
| Integrated solutions<br />
| [http://www.clcbio.com/index.php?id=1240 CLCbio Genomics Workbench]<br />
| de novo and reference assembly of Sanger, 454, Solexa, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, browser and other features. Runs on Windows, Mac OS X and Linux.<br />
|-<br />
| Integrated solutions<br />
| [http://softgenetics.com/NextGENe.html NextGENe]<br />
| de novo and reference assembly of Illumina and SOLiD data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Requires Win or MacOS.<br />
|-<br />
| Integrated solutions<br />
| [http://www.dnastar.com/products/SMGA.php SeqMan Genome Analyser]<br />
| Software for Next Generation sequence assembly of Illumina, 454 Life Sciences and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Early release commercial software. Compatible with Windows® XP X64 and Mac OS X 10.4.<br />
|-<br />
| Alignment<br />
| [http://bioinfo.cgrb.oregonstate.edu/docs/solexa/ ELAND]<br />
| Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.<br />
|-<br />
| Assembly<br />
| [http://euler-assembler.ucsd.edu/portal/ EULER]<br />
| Short read assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research).<br />
|-<br />
| Alignment<br />
| [http://www.ebi.ac.uk/~guy/exonerate/ Exonerate]<br />
| Various forms of alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.<br />
|-<br />
| Alignment & Mapping<br />
| [http://www.gene.com/share/gmap/ GMAP]<br />
| GMAP (Genomic Mapping and Alignment Program) for mRNA and EST sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.<br />
|-<br />
| Alignment & Assembly<br />
| [http://bioinformatics.bc.edu/marthlab/Mosaik MOSAIK]<br />
| Reference guided aligner/assembler. Written by Michael Strömberg at Boston College.<br />
|-<br />
| Alignment & Mapping<br />
| [http://sourceforge.net/projects/maq/ MAQ]<br />
| Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre.<br />
|-<br />
| Alignment<br />
| [http://mummer.sourceforge.net/ MUMmer]<br />
| MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.<br />
|-<br />
| Alignment<br />
| [http://www.novocraft.com/index.html Novocraft]<br />
| Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.<br />
|-<br />
| Assembly<br />
| [http://rulai.cshl.edu/rmap/ RMAP]<br />
| Assembles 20-64 bp Solexa reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL (published in BMC Bioinformatics). POSIX OS required.<br />
|-<br />
| Alignment<br />
| [http://biogibbs.stanford.edu/~jiangh/SeqMap/ SeqMap]<br />
| Works like ELAND but can handle 3 or more bp mismatches as well as indels. Written by Hui Jiang from the Wong lab at Stanford. Builds are available for most operating systems.<br />
|-<br />
| Assembly<br />
| [http://compbio.cs.toronto.edu/shrimp/ SHRiMP]<br />
| Assembles to a reference sequence. Developed with Applied Biosystems' colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. Works with data in letterspace (Roche, Illumina), colourspace (AB) and Helicos space.<br />
|-<br />
| Alignment<br />
| [http://www.sanger.ac.uk/Software/analysis/SSAHA/ SSAHA]<br />
| SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.<br />
|-<br />
| Alignment<br />
| [http://synasite.mgrc.com.my:8080/sxog/NewSXOligoSearch.php SXOligoSearch]<br />
| SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.<br />
|-<br />
| Assembly (de novo)<br />
| [http://chevreux.org/projects_mira.html MIRA2]<br />
| MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.<br />
|-<br />
| Assembly (de novo)<br />
| [https://sourceforge.net/projects/vcake VCAKE]<br />
| De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://www.sanger.ac.uk/Software/analysis/ssahaSNP/ ssahaSNP]<br />
| ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those k-mer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Runs on Compaq Alpha, Linux-64, Linux-32, Solaris and Mac.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://bioinformatics.bc.edu/marthlab/PbShort PolyBayesShort]<br />
| A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://bioinformatics.bc.edu/marthlab/PyroBayes PyroBayes]<br />
| PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.<br />
|-<br />
| Integrated solutions<br />
| [http://staden.sourceforge.net/ STADEN]<br />
| Includes GAP4. GAP5 once completed will handle next-gen sequencing data. A partially implemented test version is available [https://sourceforge.net/project/show...kage_id=256957 here]<br />
|-<br />
| Viewer<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/xmatchview XMatchView]<br />
| A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux.<br />
|-<br />
| Integrated solutions<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/sam SAM]<br />
| Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://woldlab.caltech.edu/chipseq/ CHiPSeq]<br />
| From Johnson et al. (Science, 2007).<br />
|-<br />
| RNAseq<br />
| [http://woldlab.caltech.edu/rnaseq/ ERANGE]<br />
| ERANGE is a Python package for doing RNA-seq and ChIP-seq (hence the "dual-use"), and is a descendant of the ChIPSeq mini peak finder (Johnson, 2007). In particular, the RNAseq analysis uses some of the very same code to access [http://cistematic.caltech.edu/index.html Cistematic]. Version 2.0 is the first released in the wild and is "Bed"-centric. In particular, it is not optimized for speed!<br />
|-<br />
| Methylation<br />
| [http://epigenomics.mcdb.ucla.edu/BS-Seq/download.html BS-Seq]<br />
| The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.<br />
|-<br />
| Mapping<br />
| [http://dna.cs.byu.edu/gnumap/ gnumap]<br />
| The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically Solexa/Illumina) back to a genome of any size. Currently, gnumap is designed to be used with the _int.txt data received from the Solexa/Illumina machine.<br />
|-<br />
| Mapping<br />
| [http://www.bioinformaticssolutions.com/products/zoom/index.php ZOOM]<br />
| ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, generated by next-generation sequencing technology, back to the reference genomes, and to carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly, with speed being a critical priority.<br />
|-<br />
| Assembly & Chromosome walking<br />
| [http://www.plantgdb.org/tool/tracembler/ Tracembler]<br />
| Tracembler streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/sissrs/ sissrs]<br />
| Produces a list of peak maxima from aligned positions.<br />
|-<br />
| Assembly<br />
| [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17534434 SHRAP]<br />
| The source code will be made available individually upon request. "However, note that we do not have a tool that can be used on real 454 sequence data in a production setting."<br />
|-<br />
| Alignment<br />
| [http://www.phrap.org/phredphrapconsed.html Phred Phrap Consed Cross_match]<br />
| The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. Phrap is a program for assembling shotgun DNA sequence data. Cross_match is a general purpose utility for comparing any two DNA sequence sets using a 'banded' version of swat. Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap.<br />
|-<br />
| Chromatin profiling<br />
| [http://bioinformatics-renlab.ucsd.edu/rentrac/wiki/ChromaSig ChromaSig]<br />
| An unsupervised learning method, which finds, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data.<br />
|-<br />
| Integrated solutions<br />
| [http://1001genomes.org/downloads/ Shore]<br />
| Analysis suite for Illumina short read data. <br />
|-<br />
| Mapping<br />
| [http://1001genomes.org/downloads/ GenomeMapper]<br />
| Short read mapping tool. <br />
|-<br />
| Base-calling & Analysis<br />
| [http://bbcf.epfl.ch/Software Rolexa]<br />
| Allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.<br />
|-<br />
| Integrated solutions <br />
| [http://sourceforge.net/projects/solexatools SolexaTools]<br />
| SolexaTools is a project to create a tool set to work with a Solexa genome sequencer. It includes multiple components including a LIMS system, pipeline and other tools to support end-users and researchers setting up a Solexa environment.<br />
|-<br />
| ChIPseq<br />
| [http://mendel.stanford.edu/sidowlab/downloads/quest/ QuEST]<br />
| QuEST is a Kernel Density Estimator-based package for analysis of massively parallel sequencing data from chromatin immunoprecipitations (ChIP-Seq or ChIPseq).<br />
|-<br />
| Mapping<br />
| [http://socs.biology.gatech.edu/ SOCS]<br />
| SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and SNP discovery functions. <br />
|-<br />
| Alignment<br />
| [http://bowtie-bio.sourceforge.net/ BOWTIE]<br />
| Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.3 GB for the human genome. It supports alignment policies equivalent to Maq and SOAP but is much faster: about 35x faster than Maq and over 350x faster than SOAP when aligning to the human genome. <br />
|-<br />
| Analysis<br />
| [http://bio.ifom-ieo-campus.it/galaxy CARPET]<br />
| CARPET (Collection of Automated Routine Programs for Easy Tiling) is a set of Perl, Python and R scripts, integrated on the Galaxy2 web-based platform, for the analysis of ChIP-chip and expression tiling data, both for standard and custom chip designs.<br />
|-<br />
| Assembly<br />
| [http://wgs-assembler.sf.net CABOG]<br />
| Celera Assembler is scientific software for DNA research. CA is a 'whole genome shotgun sequence assembler' -- it reconstructs long sequences of genomic DNA given the fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage, and heterogeneous read lengths ([http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=18952627&dopt=Abstract pubmed]).<br />
|-<br />
| ChIPseq / ChIP-chip<br />
| [http://www.stat.psu.edu/~yuzhang/pass.tar PASS]<br />
| "..Motivated by the Poisson clumping heuristic, we propose an accurate and efficient method for evaluating statistical significance in genome-wide ChIP-chip tiling arrays. The method works accurately for any large number of multiple comparisons, and the computational cost for evaluating p-values does not increase with the total number of tests..." [http://www.ncbi.nlm.nih.gov/pubmed/18953047?dopt=Abstract pubmed]<br />
|-<br />
| ChIPseq / ChIP-chip<br />
| [http://www.cmbi.ru.nl/~fnielsen/CATCH CATCH]<br />
| CATCH is a tool for exploring patterns in ChIP profiling data. The CATCH algorithm performs a hierarchical clustering of the profile patterns with an exhaustive alignment at each step. A user-friendly graphical interface makes it easy to browse the results.<br />
|-<br />
| Misc<br />
| [http://www.ics.uci.edu/~xhx/project/DNAzip DNAzip]<br />
| A series of techniques that in combination reduces a single genome to a size small enough to be sent as an email attachment. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=18996942&dopt=Abstract pubmed]<br />
|-<br />
| Integrated solutions<br />
| [http://www.biostat.jhsph.edu/~hji/cisgenome/ CisGenome]<br />
| An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis<br />
|-<br />
| Tiling-array analysis<br />
| [http://sourceforge.net/projects/timat2 TiMat2]<br />
| TiMAT2 contains tools for low and high level genomic tiling microarray analysis using the Affymetrix, NimbleGen, and Agilent platforms. It is designed for processing single and multi chip data sets from ChIP-Chip, RNA difference, and aCGH experiments. <br />
|-<br />
| microRNA<br />
| [http://www.bio.psu.edu/people/faculty/Axtell/AxtellLab/Software.html CleaveLand]<br />
| A pipeline for using degradome data to find cleaved small RNA targets.<br />
|-<br />
| Alignment<br />
| [http://www.ebi.ac.uk/~bjp/pecan/ PECAN]<br />
| "..method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing we develop a set of new technical methods, combined in a framework we term 'sequence progressive alignment', because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is that we massively decrease the memory consumption of the program relative to a naive implementation. The general engineering of the challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We also further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs that has and is being used by a number of whole genome comparative genomic projects." [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=19056777&dopt=Abstract pubmed]<br />
|-<br />
| Assembly<br />
| [http://www.cs.sunysb.edu/~skiena/shorty/ SHORTY]<br />
| "..Our assembler SHORTY is targetted for de novo assembly of microreads with mate pair information and sequencing errors. SHORTY has some novel approach and features in addressing the short read assembly problem.." <br />
|-<br />
| Assembly<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/abyss ABySS]<br />
| "ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes." <br />
|-<br />
| Alignment<br />
| [http://pass.cribi.unipd.it PASS]<br />
| "PASS performs fast gapped and ungapped alignments of short DNA sequences onto a reference DNA, typically a genomic sequence. It is designed to handle a huge amount of reads such as those generated by Solexa, SOLiD or 454 technologies. The algorithm is based on a data structure that holds in RAM the index of the genomic positions of "seed" words (typically 11-12 bases) as well as an index of the precomputed scores of short words (typically 7-8 bases) aligned against each other." [http://www.ncbi.nlm.nih.gov/pubmed/19218350?dopt=Abstract pubmed] <br />
|-<br />
| ChIPSeq<br />
| [http://liulab.dfci.harvard.edu/NPS/ NPS]<br />
| "..Our method provides an effective framework for studying nucleosome positioning and epigenetic marks in mammalian genomes..." [http://www.ncbi.nlm.nih.gov/pubmed/19014516?dopt=Abstract pubmed]<br />
|-<br />
| Assembly<br />
| [http://www.seqan.de/projects/consensus.html Consensus]<br />
| SeqCons is an open source consensus computation program for Linux and Windows. The algorithm can be used for de novo and reference-guided sequence assembly. <br />
|-<br />
| ChIPseq<br />
| [http://liulab.dfci.harvard.edu/MACS/ MACS]<br />
| MACS (Model-based Analysis of ChIP-Seq) analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.<br />
|-<br />
| Assembly<br />
| [http://genome.ku.dk/resources/assembly/methods.html Scheibye-Alsing ''et al'']<br />
| A comprehensive overview of the current publicly available sequence assembly programs. [http://www.ncbi.nlm.nih.gov/pubmed/19152793?dopt=Abstract pubmed]<br />
|- <br />
| Transcript seq<br />
| [http://iant.toulouse.inra.fr/FrameDP FrameDP]<br />
| Sensitive peptide detection on noisy matured sequences. A self-training integrative pipeline for predicting CDS in transcripts which can adapt itself to different levels of sequence qualities.<br />
|-<br />
| Enrichment/Peakcalling<br />
| [http://www.gersteinlab.org/proj/PeakSeq/ PeakSeq]<br />
| PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. A methodology for identifying punctate binding sites in ChIP-Seq experiments based on their characteristics. [http://www.nature.com/nbt/journal/v27/n1/full/nbt.1518.html publication]<br />
|-<br />
| Mapping<br />
| [http://socs.biology.gatech.edu/ SOCS]<br />
| SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and mismatch identification functions.<br />
|}</div>Armandhttp://www2.unil.ch/cbg/index.php?title=UHTS&diff=608UHTS2009-04-07T16:02:50Z<p>Armand: </p>
<hr />
<div>__TOC__<br />
<br />
This page was reproduced from [http://wiki.nbic.nl/index.php/High_throughput_sequencing this page], written by people at the [http://www.nbic.nl/ Netherlands Bioinformatics Centre].<br />
<br />
==Sequencer vendors==<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Category<br />
! Source<br />
! Description<br />
! Performance experience<br />
|-<br />
| Hardware vendor<br />
| [http://www.illumina.com/ Illumina]<br />
| <br />
| None<br />
|-<br />
| Hardware vendor<br />
| [http://www.pacificbiosciences.com Pacific BioSciences]<br />
|<br />
| None<br />
|-<br />
| Hardware vendor<br />
| [http://www.roche-applied-science.com/ Roche]<br />
|<br />
| None<br />
|-<br />
| Hardware vendor<br />
| [http://www.helicosbio.com/ Helicos]<br />
| <br />
| None<br />
|-<br />
| Hardware vendor<br />
| [http://www.appliedbiosystems.com/ Applied Biosystems]<br />
| <br />
| None<br />
|}<br />
<br />
==Software==<br />
<br />
{| class="wikitable sortable" border="1"<br />
|-<br />
! Category<br />
! Package<br />
! Description<br />
|-<br />
| Viewer<br />
| [http://bioinformatics.bc.edu/marthlab/EagleView EagleView genome viewer]<br />
| EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations.<br />
|-<br />
| Alignment<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18070356?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=4&log$=relatedarticles&logdbfrom=pubmed MUMmerGPU]<br />
| MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.<br />
|-<br />
| Methylation<br />
| [http://www.nature.com/nbt/journal/v26/n7/abs/nbt1414.html Batman]<br />
| Bayesian tool for methylation analysis (Batman)—for analyzing methylated DNA immunoprecipitation (MeDIP) profiles<br />
|-<br />
| Base-calling<br />
| [http://hannonlab.cshl.edu/Alta-Cyclic/main.html Alta-Cyclic]<br />
| Alta-Cyclic is a novel base caller for the Illumina Genome Analyzer (Solexa). Features: longer reads, more accurate reads (compared to Solexa's default base caller), and reduced systematic bias towards a certain nucleotide in later cycles. On a GAII platform, Alta-Cyclic was able to provide a large amount of useful reads after 78 cycles.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18599518?dopt=Abstract FindPeaks 3.1]<br />
| Findpeaks was developed to perform analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18340039?ordinalpos=2&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum ALLPATHS]<br />
| De novo assembly of whole-genome shotgun microreads.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/17908823?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=5&log$=relatedarticles&logdbfrom=pubmed SHARCGS]<br />
| [http://sharcgs.molgen.mpg.de/ SHARCGS] is a suitable tool for fully exploiting novel sequencing technologies by assembling sequence contigs de novo with high confidence and by outperforming existing assembly algorithms in terms of speed and accuracy. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.ncbi.nlm.nih.gov/pubmed/18349386?ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DiscoveryPanel.Pubmed_Discovery_RA&linkpos=1&log$=relatedarticles&logdbfrom=pubmed Velvet]<br />
| Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. It needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).<br />
|-<br />
| Assembly (de novo)<br />
| [http://www.genomic.ch/edena.php EDENA]<br />
| De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Made by Hernandez D et al.<br />
|-<br />
| Assembly<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/ssake SSAKE]<br />
| The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.<br />
|-<br />
| Alignment<br />
| [http://www.fml.mpg.de/raetsch/projects/qpalma qpalma]<br />
| QPalma is an alignment tool designed to align spliced reads produced by next-generation sequencing platforms such as Illumina Solexa or 454. QPalma aligns short reads to the genomic sequences in an optimal way according to its underlying algorithm and trained parameters. It creates an alignment using dynamic programming (written in C++), and returns the alignment in a PSL-like format. The algorithm computes optimal local alignments, so if no alignment is found, it is because none reached a sufficiently high alignment score.<br />
|-<br />
| Alignment<br />
| [http://soap.genomics.org.cn SOAP]<br />
| SOAP (Short Oligonucleotide Alignment Program) is a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program that supports multi-threaded parallel computing and has a batch module for multiple query sets. Author is Ruiqiang Li at the Beijing Genomics Institute. C++ for Unix.<br />
|-<br />
| Websites<br />
| [http://g2.trac.bx.psu.edu Galaxy]<br />
| <br />
|-<br />
| Blogs<br />
| [http://seqanswers.com/ SeqAnswers]<br />
| <br />
|-<br />
| Integrated solutions<br />
| [http://www.clcbio.com/index.php?id=1240 CLCbio Genomics Workbench]<br />
| de novo and reference assembly of Sanger, 454, Solexa, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, browser and other features. Runs on Windows, Mac OS X and Linux.<br />
|-<br />
| Integrated solutions<br />
| [http://softgenetics.com/NextGENe.html NextGENe]<br />
| de novo and reference assembly of Illumina and SOLiD data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Requires Win or MacOS.<br />
|-<br />
| Integrated solutions<br />
| [http://www.dnastar.com/products/SMGA.php SeqMan Genome Analyser]<br />
| Software for Next Generation sequence assembly of Illumina, 454 Life Sciences and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Early release commercial software. Compatible with Windows® XP X64 and Mac OS X 10.4.<br />
|-<br />
| Alignment<br />
| [http://bioinfo.cgrb.oregonstate.edu/docs/solexa/ ELAND]<br />
| Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.<br />
|-<br />
| Assembly<br />
| [http://euler-assembler.ucsd.edu/portal/ EULER]<br />
| Short read assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research).<br />
|-<br />
| Alignment<br />
| [http://www.ebi.ac.uk/~guy/exonerate/ Exonerate]<br />
| Various forms of alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.<br />
|-<br />
| Alignment & Mapping<br />
| [http://www.gene.com/share/gmap/ GMAP]<br />
| GMAP (Genomic Mapping and Alignment Program) for mRNA and EST sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.<br />
|-<br />
| Alignment & Assembly<br />
| [http://bioinformatics.bc.edu/marthlab/Mosaik MOSAIK]<br />
| Reference guided aligner/assembler. Written by Michael Strömberg at Boston College.<br />
|-<br />
| Alignment & Mapping<br />
| [http://sourceforge.net/projects/maq/ MAQ]<br />
| Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre.<br />
|-<br />
| Alignment<br />
| [http://mummer.sourceforge.net/ MUMmer]<br />
| MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.<br />
|-<br />
| Alignment<br />
| [http://www.novocraft.com/index.html Novocraft]<br />
| Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.<br />
|-<br />
| Assembly<br />
| [http://rulai.cshl.edu/rmap/ RMAP]<br />
| Assembles 20 - 64 bp Solexa reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.<br />
|-<br />
| Alignment<br />
| [http://biogibbs.stanford.edu/~jiangh/SeqMap/ SeqMap]<br />
| Works like ELAND, but can handle 3 or more bp mismatches as well as indels. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most operating systems.<br />
|-<br />
| Assembly<br />
| [http://compbio.cs.toronto.edu/shrimp/ SHRiMP]<br />
| Assembles to a reference sequence. Developed with Applied Biosystems' colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. Works with data in letterspace (Roche, Illumina), colourspace (AB) and Helicos space.<br />
|-<br />
| Alignment<br />
| [http://www.sanger.ac.uk/Software/analysis/SSAHA/ SSAHA]<br />
| SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.<br />
|-<br />
| Alignment<br />
| [http://synasite.mgrc.com.my:8080/sxog/NewSXOligoSearch.php SXOligoSearch]<br />
| SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.<br />
|-<br />
| Assembly (de novo)<br />
| [http://chevreux.org/projects_mira.html MIRA2]<br />
| MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.<br />
|-<br />
| Assembly (de novo)<br />
| [https://sourceforge.net/projects/vcake VCAKE]<br />
| De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://www.sanger.ac.uk/Software/analysis/ssahaSNP/ ssahaSNP]<br />
| ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://bioinformatics.bc.edu/marthlab/PbShort PolyBayesShort]<br />
| A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.<br />
|-<br />
| SNP/Indel Discovery<br />
| [http://bioinformatics.bc.edu/marthlab/PyroBayes PyroBayes]<br />
| PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.<br />
|-<br />
| Integrated solutions<br />
| [http://staden.sourceforge.net/ STADEN]<br />
| Includes GAP4. GAP5 once completed will handle next-gen sequencing data. A partially implemented test version is available [https://sourceforge.net/project/show...kage_id=256957 here]<br />
|-<br />
| Viewer<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/xmatchview XMatchView]<br />
| A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux.<br />
|-<br />
| Integrated solutions<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/sam SAM]<br />
| Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://woldlab.caltech.edu/chipseq/ CHiPSeq]<br />
| From Johnson ''et al.'', ''Science'' (2007).<br />
|-<br />
| RNAseq<br />
| [http://woldlab.caltech.edu/rnaseq/ ERANGE]<br />
| ERANGE is a Python package for doing RNA-seq and ChIP-seq (hence the "dual-use"), and is a descendant of the ChIPSeq mini peak finder (Johnson, 2007). In particular, the RNAseq analysis uses some of the very same code to access [http://cistematic.caltech.edu/index.html Cistematic]. Version 2.0 is the first released in the wild and is "Bed"-centric. In particular, it is not optimized for speed!<br />
|-<br />
| Methylation<br />
| [http://epigenomics.mcdb.ucla.edu/BS-Seq/download.html BS-Seq]<br />
| The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.<br />
|-<br />
| Mapping<br />
| [http://dna.cs.byu.edu/gnumap/ gnumap]<br />
| The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. Currently, gnumap is designed to be used with the _int.txt data received from the Solexa/Illumina machine.<br />
|-<br />
| Mapping<br />
| [http://www.bioinformaticssolutions.com/products/zoom/index.php ZOOM]<br />
| ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, generated by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly with speed being a critical priority.<br />
|-<br />
| Assembly & Chromosome walking<br />
| [http://www.plantgdb.org/tool/tracembler/ Tracembler]<br />
| Tracembler streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations.<br />
|-<br />
| Enrichment/peak calling<br />
| [http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/sissrs/ sissrs]<br />
| Produces a list of peak maxima from aligned positions.<br />
|-<br />
| Assembly<br />
| [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17534434 SHRAP]<br />
| The source code will be made available individually upon request. "However, note that we do not have a tool that can be used on real 454 sequence data in a production setting."<br />
|-<br />
| Alignment<br />
| [http://www.phrap.org/phredphrapconsed.html Phred Phrap Consed Cross_match]<br />
| The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. Phrap is a program for assembling shotgun DNA sequence data. Cross_match is a general purpose utility for comparing any two DNA sequence sets using a 'banded' version of swat. Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap.<br />
|-<br />
| Chromatin Profiling<br />
| [http://bioinformatics-renlab.ucsd.edu/rentrac/wiki/ChromaSig ChromaSig]<br />
| An unsupervised learning method, which finds, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data.<br />
|-<br />
| Integrated solutions<br />
| [http://1001genomes.org/downloads/ Shore]<br />
| Analysis suite for Illumina short read data. <br />
|-<br />
| Mapping<br />
| [http://1001genomes.org/downloads/ GenomeMapper]<br />
| Short read mapping tool. <br />
|-<br />
| Base-calling & Analysis<br />
| [http://bbcf.epfl.ch/Software Rolexa]<br />
| Allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.<br />
|-<br />
| Integrated solutions <br />
| [http://sourceforge.net/projects/solexatools SolexaTools]<br />
| SolexaTools is a project to create a tool set to work with a Solexa genome sequencer. It includes multiple components including a LIMS system, pipeline and other tools to support end-users and researchers setting up a Solexa environment.<br />
|-<br />
| ChIPseq<br />
| [http://mendel.stanford.edu/sidowlab/downloads/quest/ QuEST]<br />
| QuEST is a Kernel Density Estimator-based package for analysis of massively parallel sequencing data from chromatin immunoprecipitations (ChIP-Seq or ChIPseq).<br />
|-<br />
| Mapping<br />
| [http://socs.biology.gatech.edu/ SOCS]<br />
| SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and SNP discovery functions. <br />
|-<br />
| Alignment<br />
| [http://bowtie-bio.sourceforge.net/ BOWTIE]<br />
| Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.3 GB for the human genome. It supports alignment policies equivalent to Maq and SOAP but is much faster: about 35x faster than Maq and over 350x faster than SOAP when aligning to the human genome. <br />
|-<br />
| Analysis<br />
| [http://bio.ifom-ieo-campus.it/galaxy CARPET]<br />
| CARPET (Collection of Automated Routine Programs for Easy Tiling) is a set of Perl, Python and R scripts, integrated on the Galaxy2 web-based platform, for the analysis of ChIP-chip and expression tiling data, both for standard and custom chip designs.<br />
|-<br />
| Assembly<br />
| [http://wgs-assembler.sf.net CABOG]<br />
| Celera Assembler is scientific software for DNA research. CA is a 'whole genome shotgun sequence assembler' -- it reconstructs long sequences of genomic DNA given the fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage, and heterogeneous read lengths ([http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=18952627&dopt=Abstract pubmed]).<br />
|-<br />
| ChIPseq / ChIP-chip<br />
| [http://www.stat.psu.edu/~yuzhang/pass.tar PASS]<br />
| "..Motivated by the Poisson clumping heuristic, we propose an accurate and efficient method for evaluating statistical significance in genome-wide ChIP-chip tiling arrays. The method works accurately for any large number of multiple comparisons, and the computational cost for evaluating p-values does not increase with the total number of tests..." [http://www.ncbi.nlm.nih.gov/pubmed/18953047?dopt=Abstract pubmed]<br />
|-<br />
| ChIPseq / ChIP-chip<br />
| [http://www.cmbi.ru.nl/~fnielsen/CATCH CATCH]<br />
| CATCH is a tool for exploring patterns in ChIP profiling data. The CATCH algorithm performs a hierarchical clustering of the profile patterns with an exhaustive alignment at each step. The algorithm has a user-friendly graphical interface that makes it easy for you to browse your results.<br />
|-<br />
| Misc<br />
| [http://www.ics.uci.edu/~xhx/project/DNAzip DNAzip]<br />
| A series of techniques that in combination reduces a single genome to a size small enough to be sent as an email attachment. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=18996942&dopt=Abstract pubmed]<br />
|-<br />
| Integrated solutions<br />
| [http://www.biostat.jhsph.edu/~hji/cisgenome/ CisGenome]<br />
| An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis<br />
|-<br />
| Tiling-array analysis<br />
| [http://sourceforge.net/projects/timat2 TiMat2]<br />
| TiMAT2 contains tools for low and high level genomic tiling microarray analysis using the Affymetrix, NimbleGen, and Agilent platforms. It is designed for processing single and multi chip data sets from ChIP-Chip, RNA difference, and aCGH experiments. <br />
|-<br />
| microRNA<br />
| [http://www.bio.psu.edu/people/faculty/Axtell/AxtellLab/Software.html CleaveLand]<br />
| A pipeline for using degradome data to find cleaved small RNA targets.<br />
|-<br />
| Alignment<br />
| [http://www.ebi.ac.uk/~bjp/pecan/ PECAN]<br />
| "..method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing we develop a set of new technical methods, combined in a framework we term 'sequence progressive alignment', because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is that we massively decrease the memory consumption of the program relative to a naive implementation. The general engineering of the challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We also further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs that has and is being used by a number of whole genome comparative genomic projects." [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=19056777&dopt=Abstract pubmed]<br />
|-<br />
| Assembly<br />
| [http://www.cs.sunysb.edu/~skiena/shorty/ SHORTY]<br />
| "..Our assembler SHORTY is targetted for de novo assembly of microreads with mate pair information and sequencing errors. SHORTY has some novel approach and features in addressing the short read assembly problem.." <br />
|-<br />
| Assembly<br />
| [http://www.bcgsc.ca/platform/bioinfo/software/abyss ABySS]<br />
| "ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes." <br />
|-<br />
| Alignment<br />
| [http://pass.cribi.unipd.it PASS]<br />
| "PASS performs fast gapped and ungapped alignments of short DNA sequences onto a reference DNA, typically a genomic sequence. It is designed to handle a huge amount of reads such as those generated by Solexa, SOLiD or 454 technologies. The algorithm is based on a data structure that holds in RAM the index of the genomic positions of "seed" words (typically 11-12 bases) as well as an index of the precomputed scores of short words (typically 7-8 bases) aligned against each other." [http://www.ncbi.nlm.nih.gov/pubmed/19218350?dopt=Abstract pubmed] <br />
|-<br />
| ChIPSeq<br />
| [http://liulab.dfci.harvard.edu/NPS/ NPS]<br />
| "..Our method provides an effective framework for studying nucleosome positioning and epigenetic marks in mammalian genomes..." [http://www.ncbi.nlm.nih.gov/pubmed/19014516?dopt=Abstract pubmed]<br />
|-<br />
| Assembly<br />
| [http://www.seqan.de/projects/consensus.html Consensus]<br />
| SeqCons is an open source consensus computation program for Linux and Windows. The algorithm can be used for de novo and reference-guided sequence assembly. <br />
|-<br />
| ChIPseq<br />
| [http://liulab.dfci.harvard.edu/MACS/ MACS]<br />
| MACS (Model-based Analysis of ChIP-Seq) analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.<br />
|-<br />
| Assembly<br />
| [http://genome.ku.dk/resources/assembly/methods.html Scheibye-Alsing ''et al'']<br />
| A comprehensive overview of the current publicly available sequence assembly programs. [http://www.ncbi.nlm.nih.gov/pubmed/19152793?dopt=Abstract pubmed]<br />
|- <br />
| Transcript seq<br />
| [http://iant.toulouse.inra.fr/FrameDP FrameDP]<br />
| Sensitive peptide detection on noisy matured sequences. A self-training integrative pipeline for predicting CDS in transcripts which can adapt itself to different levels of sequence qualities.<br />
|-<br />
| Enrichment/Peakcalling<br />
| [http://www.gersteinlab.org/proj/PeakSeq/ PeakSeq]<br />
| PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. A methodology for identifying punctate binding sites in ChIP-Seq experiments based on their characteristics. [http://www.nature.com/nbt/journal/v27/n1/full/nbt.1518.html publication]<br />
|-<br />
| Mapping<br />
| [http://socs.biology.gatech.edu/ SOCS]<br />
| SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and mismatch identification functions.<br />
|}</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Journal_Club_(spring_2012)&diff=607Journal Club (spring 2012)2009-04-07T10:05:56Z<p>Armand: </p>
<hr />
<div>Journal Club is every Thursday, from 1-2pm, in the small meeting room. Feel free to bring your lunch. We also have a [[Group Meeting]].<br />
<br />
Ideally, someone from the group should volunteer to choose a paper for each meeting, and should update this page and email the paper around on the '''Friday the week before the meeting'''. If a volunteer is not forthcoming, [[user:Toby|Toby]] will encourage someone to volunteer. <br />
<br />
== 5th February 2009 ==<br />
<br />
Toby will present:<br />
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and<br />
Implications for Functional Genomics, by Juliane Sch&auml;fer and Korbinian Strimmer (2005)<br />
''Statistical Applications in Genetics and Molecular Biology''<br />
'''4''':1 Article 32.<br />
[http://dx.doi.org/10.2202/1544-6115.1175 doi:10.2202/1544-6115.1175]<br />
[http://www.bepress.com/sagmb/vol4/iss1/art32 link to paper]<br />
<br />
== 12th February 2009 ==<br />
<br />
G&aacute;bor will present:<br />
<biblio><br />
#wagner pmid=16087882<br />
</biblio><br />
<br />
<br />
[exceptionally at 9:30-10:30]<br />
<br />
== 19th February 2009 ==<br />
<br />
Zolt&aacute;n will present: <br />
The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing, by John D. Storey<br />
(2007) ''J. R. Statist. Soc. B'' '''69''':3 pp.347-368.<br />
[http://dx.doi.org/10.1111/j.1467-9868.2007.005592.x doi:10.1111/j.1467-9868.2007.005592.x]<br />
[http://www3.interscience.wiley.com/journal/118490765/abstract link to paper]<br />
<br />
== 26th February 2009 ==<br />
Bastian will present: A "Silent" Polymorphism in the MDR1 Gene Changes Substrate Specificity, by Chava Kimchi-Sarfaty et al., Science 315, 525 (2007). DOI: 10.1126/science.1135308<br />
[http://www.sciencemag.org/cgi/content/full/315/5811/525 link to paper]<br />
<br />
This meeting is at 9:30 am instead of the usual time.<br />
<br />
== 5th March 2009 ==<br />
'''Date and time change: Wednesday 3pm-4pm'''<br />
<br />
Karen will present: <br />
Multiple Hypothesis Testing in Microarray Experiments by Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick<br />
Statistical Science, Vol. 18, No. 1 (Feb., 2003), pp. 71-103. <br />
[http://www.jstor.org/stable/3182872 link to paper]<br />
<br />
== 12th March 2009 ==<br />
Aitana will present: <br />
<biblio><br />
#kashtan pmid=17698964<br />
</biblio><br />
<br />
== 19th March 2009 ==<br />
'''Time and room change: 2pm-3pm 1st floor conference room'''<br />
<br />
<br />
Diana is presenting: <br />
<br />
'''Drug—target network'''<br />
Muhammed A Yıldırım, Kwang-Il Goh, Michael E Cusick, Albert-László Barabási & Marc Vidal<br />
<br />
Nature Biotechnology 25, 1119 - 1126 (2007)<br />
Published online: 5 October 2007 | doi:10.1038/nbt1338<br />
http://www.nature.com/nbt/journal/v25/n10/abs/nbt1338.html<br />
<br />
== 26th March 2009 ==<br />
'''Time change: 2pm-3pm'''<br />
<br />
Micha will present the following paper:<br />
<biblio><br />
#millar pmid=16729048<br />
</biblio><br />
http://www.nature.com/msb/journal/v1/n1/synopsis/msb4100018.html<br />
<br />
== 2nd April 2009 ==<br />
'''1pm-2pm'''<br />
<br />
Sascha will present:<br />
<br />
Molecular Systems Biology 4 Article number: 176 <br />
<br />
doi:10.1038/msb.2008.14<br />
<br />
Theoretical and experimental approaches to understand morphogen gradients<br />
<br />
Marta Ibañes & Juan Carlos Izpisúa Belmonte<br />
<br />
http://www.nature.com/msb/journal/v4/n1/full/msb200814.html<br />
<br />
== 9th April 2009 ==<br />
<br />
Armand will present<br />
<br />
Accurate whole human genome sequencing using reversible terminator chemistry<br />
<br />
Nature 456, 53-59 (6 November 2008) | doi:10.1038/nature07517<br />
<br />
[http://www.nature.com/nature/journal/v456/n7218/full/nature07517.html link to paper]</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Group_Meeting&diff=584Group Meeting2009-03-31T16:11:05Z<p>Armand: </p>
<hr />
<div>Group Meeting is every Thursday, from 9.30am-11am, in the small meeting room. We also have a [[Journal Club]].<br />
<br />
== 5th February 2009 ==<br />
<br />
Micha<br />
<br />
== 12th February 2009 ==<br />
<br />
We'll have informal "mini-progress-updates" by people involved in GWAS: Karen, Toby, Diana & Zoltan (say ~10' each).<br />
This meeting will be at 13:00 instead of the journal club.<br />
<br />
== 19th February 2009 ==<br />
<br />
Barbara<br />
<br />
== 26th February 2009 ==<br />
<br />
No meeting. Instead, a meeting on:<br />
<br />
== Monday 2nd March 2009 ==<br />
<br />
Diana, at 2 p.m.<br />
<br />
== Friday 6th March 2009 (instead of Thursday 5th)==<br />
<br />
Sascha<br />
<br />
== 12th March 2009 ==<br />
<br />
Bastian<br />
<br />
== 19th March 2009 ==<br />
<br />
Zoltan<br />
<br />
== 26th March 2009 ==<br />
<br />
Karen<br />
<br />
== 2nd April 2009 ==<br />
<br />
Armand<br />
<br />
== 30th April 2009 ==<br />
<br />
Toby</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Welcome_to_the_Computational_Biology_Group!&diff=502Welcome to the Computational Biology Group!2009-03-18T11:14:43Z<p>Armand: </p>
<hr />
<div>Welcome to the reincarnation of the [http://serverdgm.unil.ch/bergmann/ CBG] Wiki!<br />
<br />
Creating new pages and editing existing content in the Wiki are restricted to CBG members, but by default the pages are world-readable. [mailto:wwwcbg@unil.ch Drop an email to the admin] if you want an account.<br />
<br />
If you are a [http://serverdgm.unil.ch/bergmann/ CBG] member (and have an account) you can find more information on this wiki by clicking on the [[Help:Contents|Help]] link that is in the menu on the left.<br />
<br />
<br />
<br />
== What's in this wiki: ==<br />
<br />
* Research<br />
** [[Robustness in Drosophila embryo patterning]]<br />
** [[WingX: Systems Biology of the Drosophila Wing]]<br />
** [[Genome Wide Association Studies]]<br />
<br />
* Teaching<br />
** [[Course: "Solving Biological Problems that require Math"]]<br />
<br />
* (currently incomplete) list of group [[Publications]]<br />
<br />
* The schedule for our [[Group Meeting]] and our [[Journal Club]]<br />
<br />
* The [[Library]]<br />
<br />
* Some potentially relevant upcoming [[talks]].<br />
<br />
* How-to guides <br />
** Running jobs on [[Vital-IT]] <br />
** [[Submitting_lots_of_jobs_locally]]<br />
<br />
* [[CBGPeople|People at the CBG]]</div>Armandhttp://www2.unil.ch/cbg/index.php?title=Journal_Club_(spring_2012)&diff=499Journal Club (spring 2012)2009-03-17T15:11:10Z<p>Armand: </p>
<hr />
<div>Journal Club is every Thursday, from 1-2pm, in the small meeting room. Feel free to bring your lunch. We also have a [[Group Meeting]].<br />
<br />
Ideally, someone from the group should volunteer to choose a paper for each meeting, and should update this page and email the paper around on the '''Friday the week before the meeting'''. If a volunteer is not forthcoming, [[user:Toby|Toby]] will encourage someone to volunteer. <br />
<br />
== 5th February 2009 ==<br />
<br />
Toby will present:<br />
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and<br />
Implications for Functional Genomics, by Juliane Sch&auml;fer and Korbinian Strimmer (2005)<br />
''Statistical Applications in Genetics and Molecular Biology''<br />
'''4''':1 Article 32.<br />
[http://dx.doi.org/10.2202/1544-6115.1175 doi:10.2202/1544-6115.1175]<br />
[http://www.bepress.com/sagmb/vol4/iss1/art32 link to paper]<br />
<br />
== 12th February 2009 ==<br />
<br />
G&aacute;bor will present:<br />
<biblio><br />
#wagner pmid=16087882<br />
</biblio><br />
<br />
<br />
[exceptionally at 9:30-10:30]<br />
<br />
== 19th February 2009 ==<br />
<br />
Zolt&aacute;n will present: <br />
The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing, by John D. Storey<br />
(2007) ''J. R. Statist. Soc. B'' '''69''':3 pp.347-368.<br />
[http://dx.doi.org/10.1111/j.1467-9868.2007.005592.x doi:10.1111/j.1467-9868.2007.005592.x]<br />
[http://www3.interscience.wiley.com/journal/118490765/abstract link to paper]<br />
<br />
== 26th February 2009 ==<br />
Bastian will present: A "Silent" Polymorphism in the MDR1 Gene Changes Substrate Specificity, by Chava Kimchi-Sarfaty et al., Science 315, 525 (2007). DOI: 10.1126/science.1135308<br />
[http://www.sciencemag.org/cgi/content/full/315/5811/525 link to paper]<br />
<br />
This meeting is at 9:30 am instead of the usual time.<br />
<br />
== 5th March 2009 ==<br />
'''Date and time change: Wednesday 3pm-4pm'''<br />
<br />
Karen will present: <br />
Multiple Hypothesis Testing in Microarray Experiments by Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick<br />
Statistical Science, Vol. 18, No. 1 (Feb., 2003), pp. 71-103. <br />
[http://www.jstor.org/stable/3182872 link to paper]<br />
<br />
== 12th March 2009 ==<br />
Aitana will present: <br />
<biblio><br />
#kashtan pmid=17698964<br />
</biblio><br />
<br />
== 19th March 2009 ==<br />
'''Time and room change: 2pm-3pm 1st floor conference room'''<br />
<br />
<br />
Diana is presenting: <br />
<br />
'''Drug—target network'''<br />
Muhammed A Yıldırım, Kwang-Il Goh, Michael E Cusick, Albert-László Barabási & Marc Vidal<br />
<br />
Nature Biotechnology 25, 1119 - 1126 (2007)<br />
Published online: 5 October 2007 | doi:10.1038/nbt1338<br />
http://www.nature.com/nbt/journal/v25/n10/abs/nbt1338.html<br />
<br />
== 26th March 2009 ==<br />
'''Time change: 2pm-3pm'''<br />
<br />
Micha will present something (to be announced).<br />
<br />
== 2nd April 2009 ==<br />
'''Time change: 2pm-3pm'''<br />
<br />
Sascha will present something (to be announced).<br />
<br />
== 9th April 2009 ==<br />
<br />
Armand (TBA).</div>Armand