February 25, 2015
[FaST-LMM] high-level overview of the codebase, part 1
Following the meet-in-the-middle approach, and after thoroughly delving into the algorithmic core of FaST-LMM, I decided to finally run the code on test data and see what it does from the user's point of view :)
The Python notebook infrastructure and the provided FaST-LMM.ipynb notebook help immensely with understanding how to run the analysis. The notebook will serve as a guide through the available functionality.
Required packages and modules
(I installed all but the last one via Pacman)
The first approach is called ‘traditional’. In order to exclude the tested SNP from the set on which the null model is built, it simply skips the whole chromosome containing that SNP when building the model. It’s less sophisticated than the following methods, and it also has less power (in the statistical sense).
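To make the chromosome-skipping idea concrete, here is a minimal NumPy sketch. It is not FaST-LMM's own code: the function name `loco_similarity`, the toy data, and the normalization by the number of SNPs are all my assumptions, chosen only to illustrate the idea.

```python
import numpy as np

def loco_similarity(snps, chrom, test_chrom):
    """Similarity matrix built from every SNP that is NOT on test_chrom.

    snps  : (n_individuals, n_snps) genotype matrix (assumed standardized)
    chrom : (n_snps,) chromosome label of each SNP
    """
    keep = chrom != test_chrom           # skip the whole chromosome under test
    G0 = snps[:, keep]
    # Assumption: normalize by the number of SNPs that were kept.
    return G0 @ G0.T / G0.shape[1]

rng = np.random.default_rng(0)
snps = rng.standard_normal((5, 12))      # toy data: 5 individuals, 12 SNPs
chrom = np.repeat([1, 2, 3], 4)          # 4 SNPs on each of 3 chromosomes

# One null-model similarity matrix per tested chromosome.
K_by_chrom = {int(c): loco_similarity(snps, chrom, c) for c in np.unique(chrom)}
```

When testing a SNP on chromosome 1, the null model would use `K_by_chrom[1]`, which contains no SNP from that chromosome, so the tested SNP cannot end up in its own background model.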
Now let’s go into depth and discover what’s going on under the hood in the straightforwardly named
The code is located in
Optional ‘covar’ argument
If it’s provided, it’s used as the X matrix, with a column of ones added as the last column (the argument itself is passed without that column). If not, X is just a vector of ones.
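As a sketch of how such an X matrix could be assembled (the helper `build_design_matrix` is hypothetical and written by me for illustration; it is not a FaST-LMM function):

```python
import numpy as np

def build_design_matrix(n_individuals, covar=None):
    """Return X: the covariates followed by a last column of ones."""
    ones = np.ones((n_individuals, 1))
    if covar is None:
        return ones                       # no covariates: X is just the ones
    covar = np.asarray(covar, dtype=float).reshape(n_individuals, -1)
    return np.hstack([covar, ones])       # the ones go into the last column

X_plain = build_design_matrix(3)
X_covar = build_design_matrix(3, covar=[[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]])
```

Here `X_plain` has a single column of ones, while `X_covar` has the two covariate columns plus the column of ones at the end.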
LMM(all + select)
The next algorithm is more complicated. The principal idea is to improve statistical power by removing from the genetic similarity matrix (GSM) those SNPs that are uncorrelated with the phenotype.
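To get a feeling for what such a selection step might look like, here is a toy sketch that ranks SNPs by their absolute correlation with the phenotype and keeps the top k. This is a stand-in of my own: the function `select_top_snps`, the ranking criterion, and the fixed k are assumptions, and the actual FaST-LMM selection procedure may well differ.

```python
import numpy as np

def select_top_snps(snps, pheno, k):
    """Indices of the k SNPs most correlated (in absolute value) with pheno.

    Assumes no SNP column is constant; a constant column would divide by zero.
    """
    Gc = snps - snps.mean(axis=0)         # center each SNP column
    yc = pheno - pheno.mean()             # center the phenotype
    corr = np.abs(Gc.T @ yc) / (np.linalg.norm(Gc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:k]     # best-correlated SNPs first

rng = np.random.default_rng(1)
snps = rng.standard_normal((50, 10))      # toy data: 50 individuals, 10 SNPs
# Toy phenotype driven almost entirely by SNP number 3.
pheno = 2.0 * snps[:, 3] + 0.1 * rng.standard_normal(50)

selected = select_top_snps(snps, pheno, k=3)
G1 = snps[:, selected]                    # only these SNPs enter the reduced GSM
```

With this toy phenotype, SNP 3 comes out first in `selected`; the remaining two slots are filled by whichever SNPs happen to correlate best by chance.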
A slightly more formal, math-inclined summary (taken from one of the papers) is as follows:
‘Mixed model’ here basically means that the matrix $K$ is now a mix of $G_0G_0^T$ and $G_1G_1^T$, where $G_0$ is built from all SNPs and $G_1$ corresponds to the selected SNPs.
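One natural way to read "a mix" is a weighted sum of the two similarity matrices. The sketch below assumes a single weight between 0 and 1 (I call it `mixing` here for illustration) and a normalization by SNP count; both are my assumptions, not something stated above.

```python
import numpy as np

def mixed_kernel(G0, G1, mixing):
    """K = (1 - mixing) * K0 + mixing * K1, with K_i = G_i G_i^T / n_snps_i."""
    if not 0.0 <= mixing <= 1.0:
        raise ValueError("mixing must lie in [0, 1]")
    K0 = G0 @ G0.T / G0.shape[1]          # similarity from all SNPs
    K1 = G1 @ G1.T / G1.shape[1]          # similarity from the selected SNPs
    return (1.0 - mixing) * K0 + mixing * K1

rng = np.random.default_rng(2)
G0 = rng.standard_normal((6, 20))         # toy data: 6 individuals, all 20 SNPs
G1 = G0[:, :4]                            # pretend the first 4 SNPs were selected

K_half = mixed_kernel(G0, G1, mixing=0.5)
```

With `mixing=0` this falls back to the plain all-SNP matrix, and with `mixing=1` only the selected SNPs contribute.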
How these parameters are searched is currently beyond my comprehension. As this is a high-level overview, I’ll look into the details later on.
Implemented in module
The boolean parameter
In the next part: LMM(select) + PCs; epistasis; testing sets of SNPs