GMM
Deletion, insertion and duplication events giving rise to copy number variations (CNVs) have been found genome-wide in humans and other species.
Such genomic aberrations were already identified more than a decade ago using array-based comparative genomic hybridization. They can also be detected using
data from SNP genotyping arrays, typically by combining the intensities of the two probes for a given SNP and comparing the result to that of the same SNP on other arrays (thus deriving a copy number ratio).
A significant shift from the baseline (unit ratio or zero log ratio) reflects a copy number change. Such changes can be identified in many ways; for example, one can use a segmentation algorithm to partition the signal and then classify the resulting segments as gain, copy-neutral or loss.
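The copy number ratio and its zero-log-ratio baseline can be illustrated with a minimal Python sketch (not taken from the GMM source code). It assumes that the two probe intensities are combined by summing them and that the reference for each SNP is the median across arrays; both choices, as well as the function name `log2_ratios`, are illustrative assumptions rather than the documented method.

```python
import numpy as np

def log2_ratios(intensity_a, intensity_b):
    """Return a (n_snps, n_samples) matrix of log2 copy number ratios.

    intensity_a, intensity_b: arrays of shape (n_snps, n_samples) holding the
    intensities of the two probes of each SNP (assumed positive and already
    normalised across arrays).
    """
    # Assumption: the two probe intensities are combined by summation.
    total = np.asarray(intensity_a, dtype=float) + np.asarray(intensity_b, dtype=float)
    # Assumption: the reference for a SNP is its median total intensity
    # across all arrays (one row per SNP, one column per array).
    reference = np.median(total, axis=1, keepdims=True)
    # A value of 0 means "same as the reference", i.e. the baseline.
    return np.log2(total / reference)
```

With this convention, values near zero sit on the baseline, while clearly negative or positive values suggest a loss or a gain, respectively.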
For large datasets, however, one can take advantage of the signal distribution at each SNP and assign each individual to the component of that distribution that reflects a given copy number state.
We developed a Gaussian Mixture Model (GMM) that detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, one additional copy and two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage).
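The last step, going from state probabilities to a dosage, can be sketched in Python as follows. This is not the MATLAB implementation: the fitting step (including the constraint on the mixture means) is omitted, and the mixture parameters, the copy numbers assigned to the four states (1, 2, 3, 4) and the function names are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed copy numbers for the four states named above:
# deletion, copy-neutral, one additional copy, two additional copies.
COPY_NUMBERS = np.array([1.0, 2.0, 3.0, 4.0])

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def state_probabilities(x, weights, means, sds):
    """Posterior probability of each state, one row per value in x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Weighted component densities, shape (n_values, n_states).
    dens = weights * normal_pdf(x[:, None], means, sds)
    # Bayes' rule: normalise each row so the probabilities sum to one.
    return dens / dens.sum(axis=1, keepdims=True)

def dosage(x, weights, means, sds):
    """Expected copy number: sum over states of probability * copy number."""
    return state_probabilities(x, weights, means, sds) @ COPY_NUMBERS

# Illustrative (not fitted) parameters on the log2 ratio scale.
weights = np.array([0.05, 0.85, 0.07, 0.03])
means = np.array([-1.0, 0.0, np.log2(1.5), 1.0])
sds = np.array([0.15, 0.15, 0.15, 0.15])

if __name__ == "__main__":
    for ratio in (-1.0, 0.0, 1.0):
        print(ratio, dosage(ratio, weights, means, sds)[0])
```

Because the dosage is a probability-weighted average, an individual whose ratio falls between two components receives a fractional copy number instead of a hard call.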
License
The GMM algorithm is licensed under the GNU General Public License, version 2 or later. For details, see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html.
Usage
The GMM can be applied to identify CNVs from any rectangular matrix of copy number ratios.
Requirements
If you have MATLAB, you can use the source code directly.
Otherwise, you will need to download the MATLAB Component Runtime (MCR) to run the executables (see the Download section).
Download
Description | File Name | Size | md5sum |
---|---|---|---|
MCR for 64-bit Linux | MCR2007_x86_64.zip[1] | 224M | 451c54a811b3e01402b6a46a1b814c4d |
Linux Executables | GMM_CNV.zip[2] | 556k | bd579f39c340a50de2bb80a649643be3 |
Source code | GMM_CNV_SOURCE.zip[3] | 16k | 3cb7799bf3e180b33a6742ef382b105e |
Example output files | GMM_CNV_outputs.zip[4] | 460k | 6b621a6a8e279697f610db35810777ce |
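
The md5sum column can be used to check that a download is intact. The Python sketch below is one possible way to do so; the helper names `md5_of_file` and `is_intact` are hypothetical, and the checksums are copied from the table above.

```python
import hashlib
import os

# Checksums copied from the Download table.
EXPECTED_MD5 = {
    "MCR2007_x86_64.zip": "451c54a811b3e01402b6a46a1b814c4d",
    "GMM_CNV.zip": "bd579f39c340a50de2bb80a649643be3",
    "GMM_CNV_SOURCE.zip": "3cb7799bf3e180b33a6742ef382b105e",
    "GMM_CNV_outputs.zip": "6b621a6a8e279697f610db35810777ce",
}

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_intact(path):
    """True if the file's MD5 matches the table entry for its file name."""
    expected = EXPECTED_MD5[os.path.basename(path)]
    return md5_of_file(path) == expected
```

For example, `is_intact("GMM_CNV.zip")` should return `True` for an uncorrupted copy of the Linux executables; a file name that is not in the table raises a `KeyError`.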