March 9, 2015
[FaST-LMM] comparison with PyLMM (theory)
During the last month, the GitHub repository of FaST-LMM has seen some updates, including the much-awaited addition of comments in the code.
I've downloaded the source code of PyLMM, which looks extremely simplistic after so much time spent on FaST-LMM, and below are my notes about the differences between the two.
The next step will be running the relevant pieces of code and comparing their results and runtime.
Kinship matrix calculation
PyLMM (lmm.py, calculateKinship)
Given $ W $, “an n x m matrix encoding SNP minor alleles”, it performs matrix multiplication to get $ \frac1m W W^T $. Nothing fancy.
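To make the formula concrete, here is a minimal NumPy sketch of that computation (my own illustration rather than PyLMM's actual calculateKinship, which also does some preprocessing of $W$ that is left out here):

```python
import numpy as np

def kinship(W):
    """Minimal sketch of the (1/m) * W * W^T kinship estimate.

    W is an (n, m) array: individuals by SNPs, e.g. minor-allele counts.
    Any per-SNP centering/scaling is assumed to have happened already.
    """
    m = W.shape[1]
    return W @ W.T / m

# toy usage: 5 individuals, 100 SNPs
rng = np.random.default_rng(0)
W = rng.integers(0, 3, size=(5, 100)).astype(float)
K = kinship(W)
print(K.shape)  # (5, 5), symmetric and positive semi-definite
```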
FaST-LMM
The distinguishing feature of FaST-LMM here is that it treats the low-rank and the full-rank cases separately.
In the full-rank case, when the number of SNPs is equal to or greater than the number of individuals, it computes $K$ as $G G^T$, as usual.
However, if the number of individuals exceeds the number of SNPs (i.e. the matrix $K$ is not of full rank), the method computes the SVD of $G$ instead of taking the eigendecomposition of $K$. All remaining computations are then performed with $U$ and $S$, where $K = U S U^T$.
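To see why the low-rank branch is useful, note that the thin SVD $G = U \Sigma V^T$ gives $K = G G^T = U \Sigma^2 U^T$, so the eigenvectors and eigenvalues of $K$ are available without ever forming the $n \times n$ matrix. A small sketch of both branches (my own code, not FaST-LMM's):

```python
import numpy as np

def spectral_decomposition(G):
    """Return (U, s) such that G @ G.T = U @ diag(s) @ U.T.

    Illustration of the full-rank / low-rank split described above;
    not FaST-LMM's actual implementation.
    """
    n, m = G.shape
    if m >= n:
        # full-rank case: form K explicitly and eigendecompose it
        K = G @ G.T
        s, U = np.linalg.eigh(K)                 # ascending eigenvalues
        return U[:, ::-1], s[::-1]
    # low-rank case: thin SVD of G; eigenvalues of K are squared singular values
    U, sv, _ = np.linalg.svd(G, full_matrices=False)
    return U, sv ** 2

# sanity check on a low-rank example (more individuals than SNPs)
rng = np.random.default_rng(1)
G = rng.standard_normal((50, 10))
U, s = spectral_decomposition(G)
assert np.allclose((U * s) @ U.T, G @ G.T)       # reconstructs K from (U, s)
```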
Handling XKX matrix
This difference will only show up when the number of covariates is large, which is, I guess, extremely rare.
PyLMM (lmm.py, getMLsoln)
Straightforwardly computes the inverse of $X K X$ and then does two matrix multiplications to obtain the estimate $\hat\beta$.
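Written out as a generic sketch (mine, not PyLMM's getMLsoln; here `Sigma_inv` stands in for whatever inverse covariance weighting the surrounding code supplies), the "invert, then multiply" route looks like this:

```python
import numpy as np

def beta_via_inverse(X, y, Sigma_inv):
    """Sketch of the straightforward route: explicitly invert X' S X,
    then two matrix products give the generalized least squares estimate.
    Generic illustration, not PyLMM's code.
    """
    XKX = X.T @ Sigma_inv @ X            # the "XKX" matrix from the text
    XKX_inv = np.linalg.inv(XKX)         # explicit inverse
    beta = XKX_inv @ (X.T @ Sigma_inv @ y)
    return beta, XKX

# toy usage with identity weighting (reduces to ordinary least squares)
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = rng.standard_normal(n)
beta, _ = beta_via_inverse(X, y, np.eye(n))
print(beta)
```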
FaST-LMM (full-rank case)
Instead of inverting $X K X$, it computes its eigendecomposition. Part of the rationale is that the REML log-likelihood calculation needs the determinant of this matrix: where PyLMM computes the determinant of the explicitly formed matrix, FaST-LMM obtains its logarithm as the sum of the logarithms of the eigenvalues.
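A tiny numerical illustration of the determinant point (generic code, not FaST-LMM's): once the eigenvalues of the matrix are available, its log-determinant is simply the sum of their logarithms, so no separate determinant computation is needed.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))
XKX = A @ A.T + 0.1 * np.eye(4)     # a small symmetric positive definite stand-in

# route 1: determinant of the explicitly formed matrix
logdet_direct = np.log(np.linalg.det(XKX))

# route 2: eigendecomposition, then sum of log-eigenvalues
logdet_eigen = np.log(np.linalg.eigvalsh(XKX)).sum()

assert np.isclose(logdet_direct, logdet_eigen)
```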
Default value is
I couldn’t find such an option. Seemingly FaST-LMM aims to be used for larger datasets, where this approach is unaffordable.
Fitting a model with two variance components
With fixed $ n $ (
Although it’s possible to use two kernels, there’s no way to select which SNPs will go into the foreground kernel.
This much more computation-intensive procedure performs a search on a 2D grid, where the first dimension is the number of SNPs going into the foreground kernel and the second one is the mixing coefficient.
Cross-validation also adds to the run time.
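For illustration, a skeleton of such a 2D grid search could look as follows; the candidate grids, the `cv_score` callback and all of the names here are my own assumptions, not FaST-LMM's API:

```python
import numpy as np

def grid_search(snp_counts, mixing_weights, cv_score):
    """Exhaustive search over a 2D grid.

    snp_counts     : candidate numbers of SNPs for the foreground kernel
    mixing_weights : candidate mixing coefficients between the two kernels
    cv_score       : callable (n_snps, weight) -> cross-validated score,
                     higher is better; supplied by the caller
    """
    best = (None, None, -np.inf)
    for n_snps in snp_counts:            # first grid dimension
        for w in mixing_weights:         # second grid dimension
            score = cv_score(n_snps, w)  # cross-validation happens in here
            if score > best[2]:
                best = (n_snps, w, score)
    return best

# toy usage with a made-up score peaking at 100 SNPs and weight 0.5
demo_score = lambda n, w: -((np.log10(n) - 2) ** 2 + (w - 0.5) ** 2)
print(grid_search([10, 100, 1000], np.linspace(0, 1, 11), demo_score))
```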
What is missing in PyLMM (compared to FaST-LMM)
As mentioned above, feature selection and estimating the mixing parameters are not implemented.
Proximal contamination handling
FaST-LMM applies clever updates to the matrices involved so as to virtually eliminate the tested SNP (and a few nearby ones) from the kinship matrix.
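The arithmetic behind "virtually eliminating" SNPs can be sketched as a downdate of $K = \frac1m W W^T$; this is only an illustration of the idea, since FaST-LMM applies equivalent low-rank updates to already factorized matrices rather than rebuilding $K$ like this:

```python
import numpy as np

def kinship_excluding(K, W, excluded):
    """Kinship matrix with the contribution of `excluded` SNP columns removed.

    K        : (n, n) kinship built as W @ W.T / m
    W        : (n, m) SNP matrix that K was built from
    excluded : indices of the tested SNP and its neighbours
    Sketch of the proximal-contamination idea, not FaST-LMM's implementation.
    """
    m = W.shape[1]
    W_ex = W[:, excluded]
    m_kept = m - W_ex.shape[1]
    return (m * K - W_ex @ W_ex.T) / m_kept

# check against recomputing the kinship from the remaining SNPs
rng = np.random.default_rng(4)
W = rng.standard_normal((20, 200))
K = W @ W.T / W.shape[1]
excluded = [57, 58, 59]
kept = np.setdiff1d(np.arange(W.shape[1]), excluded)
assert np.allclose(kinship_excluding(K, W, excluded),
                   W[:, kept] @ W[:, kept].T / kept.size)
```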
Efficient evaluation of various expressions
The thesis devotes a section (3.3.1) to the issue of efficient evaluation.