March 29, 2015
## [NEWS]

During the last month, the GitHub repository of FaST-LMM has seen some updates, including a much-awaited addition of code comments.

I've downloaded the source code of PyLMM, which looks extremely simplistic after so much time spent on FaST-LMM, and below are my notes about the differences between the two. The next step will be running the relevant pieces of code and comparing their results and runtime.

## Kinship matrix calculation

## PyLMM (lmm.py, calculateKinship)

Given $ W $, "an n x m matrix encoding SNP minor alleles", it performs a matrix multiplication to get $ \frac1m W W^T $. Nothing fancy.

## FaST-LMM

The distinguishing feature of the method is that it makes a distinction between the low-rank case and the full-rank case. In the full-rank case, when the number of SNPs is equal to or greater than the number of individuals, it computes $ K $ as $ G G^T $, as usual. However, if the number of individuals exceeds the number of SNPs (i.e. the matrix $ K $ is not full-rank), the method computes the SVD of $ G $ instead of taking the eigendecomposition of $ K $. All the remaining computations are then performed with $ U $ and $ S $, where $ K = U S U^T $.

## Handling the $ X^T K^{-1} X $ matrix

This difference will only show when the number of covariates is large, which is, I guess, extremely rare.

## PyLMM (lmm.py, getMLsoln)

Straightforwardly computes the inverse of $ X^T K^{-1} X $, and then does two matrix multiplications to obtain $ \hat\beta = (X^T K^{-1} X)^{-1} X^T K^{-1} y $.

## FaST-LMM (full-rank case)

Instead of inverting $ X^T K^{-1} X $, it computes its eigendecomposition. Part of the rationale is that the REML log-likelihood also requires the determinant of this matrix: with the eigendecomposition at hand, the log-determinant is simply the sum of the logarithms of the eigenvalues, whereas PyLMM has to compute it separately.

## Refitting

## PyLMM

If `refit` is set, the variance parameters are re-estimated for every tested SNP rather than fitted once under the null model. The default value is `False`.

## FaST-LMM

I couldn't find such an option. Seemingly FaST-LMM aims to be used on larger datasets, where this approach would be unaffordable.
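The kinship routes above can be compared in a small numpy sketch (made-up data; `G` plays the role of the SNP matrix, and neither function below is code from either package). The point is that when $ K = G G^T $ is rank-deficient, the SVD of $ G $ yields the same nonzero spectrum without ever forming the $ n \times n $ matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 100          # more individuals than SNPs -> K is not full-rank
G = rng.standard_normal((n, m))

# Full-rank route: form K explicitly, then eigendecompose it.
K = G @ G.T
S_full, U_full = np.linalg.eigh(K)          # eigenvalues in ascending order

# Low-rank route: thin SVD of G; never materializes the n x n kinship matrix.
U, s, Vt = np.linalg.svd(G, full_matrices=False)   # U: n x m, s: length m

# The eigenvalues of K = G G^T are the squared singular values of G,
# so the top m eigenvalues from the full route match s**2.
assert np.allclose(np.sort(s**2), np.sort(S_full)[-m:])
```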
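The two treatments of $ X^T K^{-1} X $ described above can also be sketched in numpy (made-up data, not code from either package): inverting the matrix directly versus eigendecomposing it, with the log-determinant falling out of the eigenvalues for free.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
B = rng.standard_normal((n, n))
Kinv = np.linalg.inv(B @ B.T + np.eye(n))   # stand-in for K^{-1} (SPD)

A = X.T @ Kinv @ X                          # the p x p matrix in question
b = X.T @ Kinv @ y

# PyLMM-style: invert A directly, then multiply to get beta.
beta_direct = np.linalg.inv(A) @ b

# FaST-LMM-style: eigendecompose A once; both the solve and the
# log-determinant come from the eigenvalues.
w, V = np.linalg.eigh(A)
beta_eig = V @ ((V.T @ b) / w)
logdet_A = np.sum(np.log(w))

assert np.allclose(beta_direct, beta_eig)
assert np.allclose(logdet_A, np.linalg.slogdet(A)[1])
```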
## Fitting a model with two variance components

## PyLMM: misc.py

With fixed $ n $ (…). Although it's possible to use two kernels, there's no way to select which SNPs will go into the foreground kernel.

## FaST-LMM: feature_selection/feature_selection_two_kernel.py

This much more computation-intensive procedure selects the foreground SNPs and the mixing weight by searching on a 2D grid (the first dimension is the number of SNPs, and the second one is the mixing coefficient).
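The shape of that 2D search can be sketched as follows. This is purely illustrative: `mixed_kinship`, the grid values, and the pre-ranked `snp_order` are my own stand-ins, not FaST-LMM's API; the inner evaluation (REML likelihood or cross-validated error) is left as a placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 100, 300
G = rng.standard_normal((n, m))
snp_order = np.arange(m)        # assume SNPs pre-ranked by some criterion
K_bg = G @ G.T / m              # background kernel over all SNPs

def mixed_kinship(k, a):
    """K = (1 - a) * K_background + a * K_foreground(top-k ranked SNPs)."""
    top = G[:, snp_order[:k]]
    K_fg = top @ top.T / k
    return (1 - a) * K_bg + a * K_fg

for k in (10, 50, 100):                   # first grid dimension: # of SNPs
    for a in (0.0, 0.25, 0.5, 0.75, 1.0): # second: mixing coefficient
        K = mixed_kinship(k, a)
        # ...score the (k, a) pair here, e.g. by REML log-likelihood
        # or cross-validated prediction error...
```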
Cross-validation also adds to the run time.

## What is missing in PyLMM (compared to FaST-LMM)

## Two-kernel case

As mentioned above, feature selection and estimation of the mixing parameter are not implemented.

## Proximal contamination handling

FaST-LMM applies smart updates to many of the involved matrices so as to virtually eliminate the tested SNP (and a few nearby ones) from the kinship matrix.

## Efficient evaluation of various expressions

The thesis devotes a section (3.3.1) to the issue of efficient evaluation.
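The kinship-level identity behind the proximal-contamination updates can be sketched as a rank-one downdate: removing one SNP's column from $ G $ subtracts a rank-one term from $ G G^T $. FaST-LMM's actual implementation propagates such updates through the factored likelihood terms rather than through $ K $ itself; the toy below (made-up data) only shows the underlying algebra.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 100, 500
G = rng.standard_normal((n, m))
K = G @ G.T / m                 # kinship over all m SNPs

j = 42                          # index of the tested SNP
g = G[:, j]

# Excluding SNP j is a rank-one downdate of K; rescaling by m/(m-1)
# keeps the 1/#SNPs normalization consistent.
K_loco = (K - np.outer(g, g) / m) * m / (m - 1)

# The same matrix computed from scratch, for comparison.
G_rest = np.delete(G, j, axis=1)
assert np.allclose(K_loco, G_rest @ G_rest.T / (m - 1))
```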