March 29, 2015
[NEWS] During the last month, the GitHub repository of FaST-LMM has seen some updates, including the much-awaited addition of comments in ….

I've downloaded the source code of PyLMM, which looks extremely simplistic after so much time spent on FaST-LMM, and below are my notes about the differences between the two. The next step will be running the relevant pieces of code and comparing their results and runtimes.

## Kinship matrix calculation

### PyLMM (lmm.py, calculateKinship)

Given $W$, "an n x m matrix encoding SNP minor alleles", it performs a matrix multiplication to get $\frac{1}{m} W W^T$. Nothing fancy.

### FaST-LMM

The distinguishing feature of the method is the distinction it makes between the low-rank and full-rank cases. In the full-rank case, when the number of SNPs is greater than or equal to the number of individuals, it computes $K$ as $G G^T$, as usual. However, if the number of individuals exceeds the number of SNPs (i.e. the matrix $K$ is not of full rank), the method computes the SVD of $G$ instead of taking the eigendecomposition of $K$. All the remaining computations are then performed with $U$ and $S$, where $K = U S U^T$.

## Handling the XKX matrix

This difference will show up only when the number of covariates is large, and that is, I guess, extremely rare.

### PyLMM (lmm.py, getMLsoln)

Straightforwardly computes the inverse of $X^T X$ (with $X$ and $y$ already rotated and rescaled), and then does two matrix multiplications to obtain $\hat{\beta} = (X^T X)^{-1} X^T y$.

### FaST-LMM (full-rank case)

Instead of inverting $XKX$, it computes its eigendecomposition. Part of the rationale is that the REML log-likelihood calculation needs the determinant of this matrix: where PyLMM computes it with `np.linalg.det`, FaST-LMM gets the log-determinant essentially for free as the sum of the logarithms of the eigenvalues.

## Refitting

### PyLMM

If `refit=True` is passed, the variance parameters are re-estimated for every tested SNP instead of only once in the null model. The default value is `False`.

### FaST-LMM

I couldn't find such an option. Seemingly FaST-LMM aims to be used for larger datasets, where this approach is unaffordable.

## Fitting a model with two variance components

### PyLMM: misc.py

With fixed $n$ (…). Although it's possible to use two kernels, there's no way to select which SNPs will go into the foreground kernel.
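A generic way to fit a model with two variance components is brute-force grid search over the kernel mixing weight. Here is a minimal sketch of that idea in NumPy; this is my own code, not PyLMM's `misc.py`, and the function names, the profiled ML likelihood, and the grid values are all assumptions:

```python
# Sketch: fit y ~ N(X*beta, sigma2 * (h2 * K(w) + (1 - h2) * I)) where
# K(w) = w*K1 + (1-w)*K2, by grid search over w and h2.
import numpy as np

def lmm_loglik(y, X, K, h2):
    """Profiled ML log-likelihood of an LMM with covariance h2*K + (1-h2)*I."""
    n = len(y)
    Kva, Kve = np.linalg.eigh(K)       # K = Kve @ diag(Kva) @ Kve.T
    S = h2 * Kva + (1.0 - h2)          # eigenvalues of h2*K + (1-h2)*I
    yt = Kve.T @ y                     # rotate into the eigenbasis of K
    Xt = Kve.T @ X
    XSX = Xt.T @ (Xt / S[:, None])     # X^T V^{-1} X in the rotated basis
    XSy = Xt.T @ (yt / S)
    beta = np.linalg.solve(XSX, XSy)   # GLS estimate of the fixed effects
    r = yt - Xt @ beta
    sigma2 = np.sum(r * r / S) / n     # scale parameter, profiled out
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + np.sum(np.log(S)) + n)

def fit_two_kernels(y, X, K1, K2, grid=np.linspace(0.01, 0.99, 25)):
    """Brute-force 2D grid search over mixing weight w and heritability h2."""
    best = (-np.inf, None, None)
    for w in grid:
        Kw = w * K1 + (1.0 - w) * K2
        for h2 in grid:
            ll = lmm_loglik(y, X, Kw, h2)
            if ll > best[0]:
                best = (ll, w, h2)
    return best  # (log-likelihood, w, h2)
```

Each grid point costs one $O(n^3)$ eigendecomposition here; a smarter implementation would decompose once per $w$ and reuse it across all $h2$ values.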
### FaST-LMM: feature_selection/feature_selection_two_kernel.py

This much more computation-intensive procedure searches over a 2D grid, where the first dimension is the number of selected SNPs and the second is the mixing coefficient between the two kernels.
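To make the 2D grid concrete, here is a simplified sketch of the idea; this is my own illustration, not the actual `feature_selection_two_kernel.py`: SNPs are ranked by a hypothetical marginal score, and each (number of SNPs, mixing weight) pair is scored by cross-validated prediction error of a kernel ridge predictor:

```python
# Sketch of two-kernel feature selection: 2D grid over (k, w), where k is the
# number of foreground SNPs and w the mixing weight, scored by K-fold CV error.
import numpy as np

def cv_error(K, y, folds=5, ridge=1e-2):
    """K-fold CV mean squared error of the BLUP-like predictor
    yhat = K[test, train] @ inv(K[train, train] + ridge*I) @ y[train]."""
    n = len(y)
    idx = np.arange(n)
    err = 0.0
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        A = K[np.ix_(train, train)] + ridge * np.eye(len(train))
        yhat = K[np.ix_(test, train)] @ np.linalg.solve(A, y[train])
        err += np.sum((y[test] - yhat) ** 2)
    return err / n

def select_two_kernel(G, y, ks=(10, 50, 100), weights=(0.25, 0.5, 0.75)):
    """2D grid search: k = number of foreground SNPs, w = mixing weight."""
    n, m = G.shape
    K_bg = G @ G.T / m                       # background kernel: all SNPs
    # rank SNPs by absolute marginal correlation with the phenotype
    order = np.argsort(-np.abs(G.T @ (y - y.mean())))
    best = (np.inf, None, None)
    for k in ks:
        Gk = G[:, order[:min(k, m)]]
        K_fg = Gk @ Gk.T / Gk.shape[1]       # foreground kernel: top-k SNPs
        for w in weights:
            e = cv_error(w * K_fg + (1 - w) * K_bg, y)
            if e < best[0]:
                best = (e, k, w)
    return best  # (cv error, k, w)
```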
Cross-validation also adds to the run time.

## What is missing in PyLMM (compared to FaST-LMM)

### Two-kernel case

As mentioned above, feature selection and estimation of the mixing parameter are not implemented.

### Proximal contamination handling

FaST-LMM applies smart updates to the many matrices involved so as to virtually eliminate the tested SNP (and a few nearby ones) from the kinship matrix.

### Efficient evaluation of various expressions

The thesis devotes a section (3.3.1) to the issue of efficient evaluation.
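FaST-LMM's actual proximal-contamination updates are applied to already-factored quantities, but the underlying observation is simple: excluding a tested SNP from the kinship matrix is a rank-one downdate. A toy numerical check of that identity (my own sketch, not FaST-LMM code):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 100, 300
G = rng.normal(size=(n, m))
K = G @ G.T / m                      # kinship over all m SNPs

j = 42                               # index of the SNP being tested
g = G[:, j]

# Leave-one-SNP-out kinship via a rank-one downdate of K,
# instead of recomputing G_loo @ G_loo.T from scratch
K_loo = (m * K - np.outer(g, g)) / (m - 1)

# direct recomputation for comparison
G_loo = np.delete(G, j, axis=1)
assert np.allclose(K_loo, G_loo @ G_loo.T / (m - 1))
```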
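Going back to the kinship section at the top, the low-rank trick can be verified numerically: the thin SVD of $G$ yields the same spectral factors as the eigendecomposition of $K = G G^T$, at a fraction of the cost when individuals greatly outnumber SNPs. This is my own illustration, not code from either package:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 50              # individuals >> SNPs: K = G @ G.T is rank-deficient
G = rng.normal(size=(n, m))

# Full route: eigendecomposition of the n x n kinship matrix, O(n^3)
K = G @ G.T
Kva, Kve = np.linalg.eigh(K)   # eigenvalues in ascending order

# Low-rank route: thin SVD of G, O(n * m^2); then K = U @ diag(S**2) @ U.T
U, S, Vt = np.linalg.svd(G, full_matrices=False)

# The m nonzero eigenvalues of K are the squared singular values of G
assert np.allclose(np.sort(S**2), np.sort(Kva)[-m:])
```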