March 29, 2015
[LMM] literature overview: performance March 27, 2015 [LMM] literature overview: approximate methods March 15, 2015 [FaST-LMM] Proximal contamination March 13, 2015 [FaST-LMM] REML estimate March 11, 2015 [FaST-LMM] comparison with PyLMM (continued) March 10, 2015 [FaST-LMM] comparison with PyLMM (practice) March 9, 2015 [FaST-LMM] comparison with PyLMM (theory) March 3, 2015 [FaST-LMM] fastlmm/inference/lmm_cov.py, part 2 February 27, 2015 [FaST-LMM] high-level overview, part 2 February 25, 2015 [FaST-LMM] high-level overview of the codebase, part 1 February 18, 2015 [FaST-LMM] fastlmm/inference/lmm.py February 16, 2015 [FaST-LMM] fastlmm/inference/lmm_cov.py, part 1 |
This post starts the series devoted to studying FaST-LMM source code located here. In these series, ‘thesis’ refers to the publication ‘Linear mixed models for genome-wide association studies’ (C. Lippert) First of all, some definitions are to be provided:
The working horse is Nevertheless, let’s start from Linreg classLinreg(X).regress(Y) returns residuals, i.e. $ Y - X\beta = Y - XX^\dagger Y $. Here $ X^\dagger $ is a pseudo-inverse of $ X $ computed by numpy.linalg.pinv. With our definition of the matrix $S$, its effect can be written simply as $SY$. LMM classsetSU_fromK
Here we see that K_ is first set to $S(K+I)$, and then to $S(S(K+I))^T = S(K+I)S\ $ (note that $K$ and $S$ are both symmetric). Thesis 3.1.1. (Maximum likelihood estimation) contains a reference to the lemma C.15, because the expression for the likelihood function includes $ S(\gamma K + I) S $. Lemma C.15 allows to quickly compute eigenpairs of a matrix of this form with any $\gamma$ by considering the matrix $S(K + I)S$:
Note that although documentation of numpy.linalg.eigh suggests that returned eigenvalues are not necessarily ordered, quick check shows that at least in the real-valued case they are sorted in ascending order. That’s why last $ N - D $ eigenpairs are taken (first D are zeros). Thus we conclude that
Update from 04 May 2015:
The rationale and approach are described in the thesis 2.2.3
Here idempotency of $S$ was used and the fact that $S = U_SU_S^T$ where columns of $U_S$ are eigenvectors of $S(K+I)S$. The latter fact is less trivial, its proof is found in the thesis. setSU_fromGSome branches in the function body are trivial, so let’s skip them and look at the most interesting piece of code:
The $SG$ matrix is decomposed as $U\Sigma^{1/2}V^T$. This corresponds to decomposing $SKS = SGG^TS$ as $U\Sigma U^T$. |