March 29, 2015
Following the meet-in-the-middle approach, after delving thoroughly into the algorithmic core of FaST-LMM, I decided to finally run the code on test data and see what it does from the user’s point of view :) The Python notebook infrastructure and the provided notebook FaST-LMM.ipynb help immensely in understanding how to run the analysis. The notebook will serve as a guide through the available functionality.

Required packages and modules
(All but the last one, I installed via Pacman)

LMM(all)

The first approach is called ‘traditional’. To exclude the tested SNP from the set on which the null model is built, it simply skips that SNP’s whole chromosome when building the model. It’s less sophisticated than the following methods and has less power (in the statistical sense).

Now let’s go into depth and discover what’s going on under the hood in the straightforwardly named single_snp_leave_out_one_chrom.

The code is located in

Input formats
Optional ‘covar’ argument

If it’s provided, it forms the X matrix apart from the last column, which is a column of ones. If not, X is just a vector of ones.

Final notes
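As a recap of the LMM(all) idea, here is a minimal sketch of the leave-one-chromosome-out split in plain Python. It is not the FaST-LMM code: the function name `leave_one_chrom_out` and the input layout (a list of chromosome labels, one per SNP) are my own assumptions for illustration.

```python
from collections import defaultdict


def leave_one_chrom_out(snp_chroms):
    """Map each chromosome to (test SNP indices, background SNP indices).

    snp_chroms[i] is the chromosome label of SNP i. This layout is a
    hypothetical one chosen for the sketch, not the FaST-LMM input format.
    """
    by_chrom = defaultdict(list)
    for idx, chrom in enumerate(snp_chroms):
        by_chrom[chrom].append(idx)

    splits = {}
    for chrom, test_idx in by_chrom.items():
        # The null model for SNPs tested on `chrom` is built only from
        # SNPs that lie on other chromosomes.
        background_idx = [i for i, c in enumerate(snp_chroms) if c != chrom]
        splits[chrom] = (test_idx, background_idx)
    return splits


snp_chroms = [1, 1, 2, 2, 2, 3]
splits = leave_one_chrom_out(snp_chroms)
```

For every chromosome the two index lists are disjoint, so a tested SNP never ends up in the set its own null model is built from.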
LMM(all + select)

The next algorithm is more complicated. The principal idea is to improve statistical power by removing from the GSM (genetic similarity matrix) those SNPs that are uncorrelated with phenotypes. A slightly more formal, math-inclined summary (taken from one of the papers) is as follows:
‘Mixed model’ here basically means that the matrix $K$ is now a mix of $G_0G_0^T$ and $G_1G_1^T$, where $G_1$ corresponds to the selected SNPs.
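To make the ‘mix’ concrete, here is a small sketch in plain Python. I am assuming the mix is a convex combination with a single weight, $K = (1 - w)\,G_0G_0^T + w\,G_1G_1^T$; the weight `weight` and the helper names `gram` and `mix_kernels` are mine, and the toy matrices are made up for the example.

```python
def gram(G):
    """Return G G^T for a matrix stored as a list of rows (individuals x SNPs)."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in G]
            for row_i in G]


def mix_kernels(G0, G1, weight):
    """Combine the two similarity matrices as (1 - weight) * K0 + weight * K1.

    The single mixing weight is an assumption made for this sketch.
    """
    K0, K1 = gram(G0), gram(G1)
    n = len(K0)
    return [[(1.0 - weight) * K0[i][j] + weight * K1[i][j] for j in range(n)]
            for i in range(n)]


# Two individuals: three background SNPs in G0, one selected SNP in G1.
G0 = [[1.0, 0.0, 2.0],
      [0.0, 1.0, 1.0]]
G1 = [[1.0],
      [2.0]]
K = mix_kernels(G0, G1, 0.25)
```

With `weight = 0` this falls back to the plain $G_0G_0^T$ matrix, and with `weight = 1` only the selected SNPs contribute.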
How these parameters are searched is currently beyond my comprehension. As this is a high-level overview, I’ll look into the details later on.
Feature selection

Implemented in module
The boolean parameter

Another parameter,

In the next part: LMM(select) + PCs; epistasis; testing sets of SNPs