[FaST-LMM] REML estimate

March 29, 2015

It turns out that I had a wrong understanding of how the REML estimate can be computed. It was confusing to see that both lmm.py and lmm_cov.py give the same results despite using completely different approaches. Now that I've studied the literature, I can summarize what I learned.

Reference paper

It took me a while to find the original derivations, but here they are:

The simplest form of the equation (conceptually)

The advantage of the REML estimate of the variance component is that it is unbiased.

To achieve this, we get rid of $\beta$, the fixed-effect coefficients, by taking a matrix $S$ such that $SX = 0$.

The idea behind it is this: if $y$ has a Gaussian distribution with mean $X\beta$ and variance-covariance matrix $V=\sigma_g^2GG^T + \sigma^2I$ (the random effect $Gu$ contributes only to the covariance), then the transformed phenotype vector $Sy$ still has a Gaussian distribution (because the transformation is linear), but with mean $0$ and variance $SVS^T$, containing no trace of $\beta$. The likelihood of $Sy$ is then maximized to obtain the optimal variance ratio $\gamma=\frac{\sigma_g^2}{\sigma^2}$.

Now, $ SX = 0 $ means that the rows of $S$ lie in the orthogonal complement of $\mathrm{colspace}(X)$, so if $X$ has $D$ linearly independent columns, the rank of $S$ cannot exceed $N - D$. It makes sense to take $S$ with this maximal rank, and because of this the number of individuals is effectively reduced from $N$ to $N-D$.
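To make this concrete, here is a tiny NumPy sketch (random data, all names mine) that builds one such $S$, namely the projector $I - XX^{\dagger}$ onto the orthogonal complement of $\mathrm{colspace}(X)$, and checks that $SX = 0$ and that the rank is indeed $N - D$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 10, 3
X = rng.standard_normal((N, D))     # full-column-rank covariate matrix

# One convenient choice of S (it also shows up later in the post):
# the projector onto the orthogonal complement of colspace(X).
S = np.eye(N) - X @ np.linalg.pinv(X)

print(np.allclose(S @ X, 0))        # True: SX = 0
print(np.linalg.matrix_rank(S))     # 7, i.e. N - D, the maximal possible rank
```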

Thus the REML estimate of $\gamma$ is just the ML estimate with $Sy$ instead of $y$ and $N-D$ instead of $N$ because now we effectively have $N-D$ individuals:
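Written out, the negative log-likelihood of the zero-mean Gaussian $Sy$ that we minimize over $\gamma$ and $\sigma^2$ (both hidden inside $V$) should be

$$\frac{1}{2}\Big((N-D)\log(2\pi) + \log\left|SVS^T\right| + (Sy)^T\left(SVS^T\right)^{-1}(Sy)\Big).$$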

In our case, $V = \sigma^2(\gamma K + I)$ with $K = GG^T$, therefore we can pull $\sigma^2$ out of the determinant and rewrite the expression in the parentheses as
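something of the form

$$(N-D)\log\left(2\pi\sigma^2\right) + \log\left|S(\gamma K + I)S^T\right| + \frac{1}{\sigma^2}\,(Sy)^T\left(S(\gamma K + I)S^T\right)^{-1}(Sy);$$

the factor $N-D$ in front of $\log\sigma^2$ appears because $SVS^T$ is an $(N-D)\times(N-D)$ matrix, so pulling $\sigma^2$ out of its determinant contributes $(N-D)\log\sigma^2$.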

Instead of the unknown $\sigma^2$ we put its ML estimate, which is
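presumably the usual Gaussian ML expression, i.e. the quadratic form divided by the effective number of individuals:

$$\hat\sigma^2 = \frac{(Sy)^T\left(S(\gamma K + I)S^T\right)^{-1}(Sy)}{N-D}.$$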

and get
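presumably the profiled negative log-likelihood

$$\frac{1}{2}\Big(\log\left|S(\gamma K + I)S^T\right| + (N-D)\left(\log\left(2\pi\hat\sigma^2\right) + 1\right)\Big),$$

since the quadratic form divided by $\hat\sigma^2$ contributes exactly $N-D$.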

This expression is used in the fastlmm.inference.lmm_cov module for the REML derivation:

nLL = 0.5 * (logdetK + N * (np.log(2.0 * np.pi * sigma2) + 1))

Removing $S$ from the equation

Remarkably, that happens to be possible.

Details can be found in Searle on pages 89–90, where the following facts are obtained, with the aid of formulas derived on pages 29–30:

  • $ \log\left|SVS^T\right| = \log|V| + \log\left|X^TV^{-1}X\right| - \log\left|X^TX\right| + C,\quad $ where $C$ is a constant that depends on the choice of $S$

  • $S$ can be taken to be $I - XX^{\dagger}$, and in this case $C = 0$.

  • $ (Sy)^T(SVS^T)^{-1}(Sy) = y^TPy \ $ where $P = V^{-1} - V^{-1}X\left(X^TV^{-1}X\right)^{-1}X^TV^{-1}$

  • $y^TPy$ can also be calculated as $ (y-X\hat{\beta})^TV^{-1}(y-X\hat{\beta}) \ $ where $\hat{\beta}$ is estimated as usual, i.e. $\hat{\beta} = \left(X^TV^{-1}X\right)^{-1}X^TV^{-1}y$

Therefore we obtain the following expression for the REML log-likelihood:
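Written out (dropping the constant $C$ and taking the negative, since the code works with negative log-likelihoods), this should be

$$\frac{1}{2}\Big((N-D)\log(2\pi) + \log|V| + \log\left|X^TV^{-1}X\right| - \log\left|X^TX\right| + (y-X\hat{\beta})^TV^{-1}(y-X\hat{\beta})\Big).$$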

Again, after pulling out $\sigma^2$ and ignoring the constant $C$, it gets transformed into
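the following: with $V = \sigma^2(\gamma K + I)$, the two determinant terms contribute $N\log\sigma^2$ and $-D\log\sigma^2$ respectively, so the powers of $\sigma^2$ once again combine into a factor of $N-D$:

$$\frac{1}{2}\Big((N-D)\log\left(2\pi\sigma^2\right) + \log\left|\gamma K + I\right| + \log\left|X^T(\gamma K + I)^{-1}X\right| - \log\left|X^TX\right| + \frac{1}{\sigma^2}(y-X\hat{\beta})^T(\gamma K + I)^{-1}(y-X\hat{\beta})\Big),$$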

which simplifies to
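(after substituting $\hat\sigma^2 = \frac{(y-X\hat{\beta})^T(\gamma K + I)^{-1}(y-X\hat{\beta})}{N-D}$, which by the facts above coincides with the earlier estimate computed from $Sy$) presumably

$$\frac{1}{2}\Big(\log\left|\gamma K + I\right| + \log\left|X^T(\gamma K + I)^{-1}X\right| - \log\left|X^TX\right| + (N-D)\left(\log\left(2\pi\hat\sigma^2\right) + 1\right)\Big),$$

which appears to match the line below term by term (logdetK, logdetXKX, logdetXX, sigma2).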

Compare with this line from fastlmm.inference.lmm:

nLL =  0.5 * ( logdetK + logdetXKX - logdetXX + (N-D) * ( SP.log(2.0*SP.pi*sigma2) + 1 ) )
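A quick way to check numerically that this route agrees with the $Sy$-based one from the first section (and, implicitly, the Searle identities above) is something like the following. This is plain NumPy/SciPy rather than FaST-LMM code, all names are mine, and $S$ is taken as an orthonormal basis of the orthogonal complement of $\mathrm{colspace}(X)$, so that the constant $C$ vanishes:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
N, D = 12, 3
X = rng.standard_normal((N, D))          # covariates (full column rank)
G = rng.standard_normal((N, 5))          # SNP matrix
K = G @ G.T                              # genetic similarity matrix
y = rng.standard_normal(N)               # phenotype
gamma = 0.7                              # some fixed variance ratio sigma_g^2 / sigma^2

H = gamma * K + np.eye(N)                # so that V = sigma^2 * H
Hinv = np.linalg.inv(H)

# Route 1: the simplified REML expression (same shape as the lmm.py line above).
XHX = X.T @ Hinv @ X
beta = np.linalg.solve(XHX, X.T @ Hinv @ y)
r = y - X @ beta
sigma2 = r @ Hinv @ r / (N - D)
nLL1 = 0.5 * (np.linalg.slogdet(H)[1] + np.linalg.slogdet(XHX)[1]
              - np.linalg.slogdet(X.T @ X)[1]
              + (N - D) * (np.log(2.0 * np.pi * sigma2) + 1))

# Route 2: the direct Sy-based expression, with the rows of S forming an
# orthonormal basis of the orthogonal complement of colspace(X).
S = null_space(X.T).T                    # (N - D) x N, with S @ X == 0
SHS = S @ H @ S.T
Sy = S @ y
sigma2_S = Sy @ np.linalg.solve(SHS, Sy) / (N - D)
nLL2 = 0.5 * (np.linalg.slogdet(SHS)[1]
              + (N - D) * (np.log(2.0 * np.pi * sigma2_S) + 1))

print(np.allclose(nLL1, nLL2))           # True: both routes give the same REML objective
```

The agreement holds for any fixed $\gamma$, which is exactly why the two modules can optimize the variance ratio by such different-looking computations and still return the same result.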