"""
Linear mixed effects models are regression models for dependent data.
They can be used to estimate regression relationships involving both
means and variances.
These models are also known as multilevel linear models, and
hierarchical linear models.
The MixedLM class fits linear mixed effects models to data, and
provides support for some common post-estimation tasks. This is a
group-based implementation that is most efficient for models in which
the data can be partitioned into independent groups. Some models with
crossed effects can be handled by specifying a model with a single
group.
The data are partitioned into disjoint groups. The probability model
for group i is:
Y = X*beta + Z*gamma + epsilon
where
* n_i is the number of observations in group i
* Y is a n_i dimensional response vector (called endog in MixedLM)
* X is a n_i x k_fe dimensional design matrix for the fixed effects
(called exog in MixedLM)
* beta is a k_fe-dimensional vector of fixed effects parameters
(called fe_params in MixedLM)
* Z is a design matrix for the random effects with n_i rows (called
exog_re in MixedLM). The number of columns in Z can vary by group
as discussed below.
* gamma is a random vector with mean 0. The covariance matrix for the
first `k_re` elements of `gamma` (called cov_re in MixedLM) is
common to all groups. The remaining elements of `gamma` are
variance components as discussed in more detail below. Each group
receives its own independent realization of gamma.
* epsilon is a n_i dimensional vector of iid normal
errors with mean 0 and variance sigma^2; the epsilon
values are independent both within and between groups
Y, X and Z must be entirely observed. beta, Psi, and sigma^2 are
estimated using ML or REML estimation, and gamma and epsilon are
random so define the probability model.
The marginal mean structure is E[Y | X, Z] = X*beta. If only the mean
structure is of interest, GEE is an alternative to using linear mixed
models.
Two types of random effects are supported. Standard random effects
are correlated with each other in arbitrary ways. Every group has the
same number (`k_re`) of standard random effects, with the same joint
distribution (but with independent realizations across the groups).
Variance components are uncorrelated with each other, and with the
standard random effects. Each variance component has mean zero, and
all realizations of a given variance component have the same variance
parameter. The number of realized variance components per variance
parameter can differ across the groups.
The primary reference for the implementation details is:
MJ Lindstrom, DM Bates (1988). "Newton Raphson and EM algorithms for
linear mixed effects models for repeated measures data". Journal of
the American Statistical Association. Volume 83, Issue 404, pages
1014-1022.
See also this more recent document:
http://econ.ucsb.edu/~doug/245a/Papers/Mixed%20Effects%20Implement.pdf
All the likelihood, gradient, and Hessian calculations closely follow
Lindstrom and Bates 1988, adapted to support variance components.
The following two documents are written more from the perspective of
users:
http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf
http://lme4.r-forge.r-project.org/slides/2009-07-07-Rennes/3Longitudinal-4.pdf
Notation:
* `cov_re` is the random effects covariance matrix (referred to above
as Psi) and `scale` is the (scalar) error variance. For a single
group, the marginal covariance matrix of endog given exog is scale*I
+ Z * cov_re * Z', where Z is the design matrix for the random
effects in one group.
* `vcomp` is a vector of variance parameters. The length of `vcomp`
is determined by the number of keys in either the `exog_vc` argument
to ``MixedLM``, or the `vc_formula` argument when using formulas to
fit a model.
Notes:
1. Three different parameterizations are used in different places.
The regression slopes (usually called `fe_params`) are identical in
all three parameterizations, but the variance parameters differ. The
parameterizations are:
* The "user parameterization" in which cov(endog) = scale*I + Z *
cov_re * Z', as described above. This is the main parameterization
visible to the user.
* The "profile parameterization" in which cov(endog) = I +
Z * cov_re1 * Z'. This is the parameterization of the profile
likelihood that is maximized to produce parameter estimates.
(see Lindstrom and Bates for details). The "user" cov_re is
equal to the "profile" cov_re1 times the scale.
* The "square root parameterization" in which we work with the Cholesky
factor of cov_re1 instead of cov_re directly. This is hidden from the
user.
All three parameterizations can be packed into a vector by
(optionally) concatenating `fe_params` together with the lower
triangle or Cholesky square root of the dependence structure, followed
by the variance parameters for the variance components. The are
stored as square roots if (and only if) the random effects covariance
matrix is stored as its Choleky factor. Note that when unpacking, it
is important to either square or reflect the dependence structure
depending on which parameterization is being used.
Two score methods are implemented. One takes the score with respect
to the elements of the random effects covariance matrix (used for
inference once the MLE is reached), and the other takes the score with
respect to the parameters of the Choleky square root of the random
effects covariance matrix (used for optimization).
The numerical optimization uses GLS to avoid explicitly optimizing
over the fixed effects parameters. The likelihood that is optimized
is profiled over both the scale parameter (a scalar) and the fixed
effects parameters (if any). As a result of this profiling, it is
difficult and unnecessary to calculate the Hessian of the profiled log
likelihood function, so that calculation is not implemented here.
Therefore, optimization methods requiring the Hessian matrix such as
the Newton-Raphson algorithm cannot be used for model fitting.
"""
import numpy as np
import statsmodels.base.model as base
from statsmodels.tools.decorators import cache_readonly
from statsmodels.tools import data as data_tools
from scipy.stats.distributions import norm
from scipy import sparse
import pandas as pd
import patsy
from collections import OrderedDict
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
from statsmodels.base._penalties import Penalty
def _dot(x, y):
"""
Returns the dot product of the arrays, works for sparse and dense.
"""
if isinstance(x, np.ndarray) and isinstance(y, np.ndarray):
return np.dot(x, y)
elif sparse.issparse(x):
return x.dot(y)
elif sparse.issparse(y):
return y.T.dot(x.T).T
# From numpy, adapted to work with sparse and dense arrays.
def _multi_dot_three(A, B, C):
"""
Find best ordering for three arrays and do the multiplication.
Doing in manually instead of using dynamic programing is
approximately 15 times faster.
"""
# cost1 = cost((AB)C)
cost1 = (A.shape[0] * A.shape[1] * B.shape[1] + # (AB)
A.shape[0] * B.shape[1] * C.shape[1]) # (--)C
# cost2 = cost((AB)C)
cost2 = (B.shape[0] * B.shape[1] * C.shape[1] + # (BC)
A.shape[0] * A.shape[1] * C.shape[1]) # A(--)
if cost1 < cost2:
return _dot(_dot(A, B), C)
else:
return _dot(A, _dot(B, C))
def _dotsum(x, y):
"""
Returns sum(x * y), where '*' is the pointwise product, computed
efficiently for dense and sparse matrices.
"""
if sparse.issparse(x):
return x.multiply(y).sum()
else:
# This way usually avoids allocating a temporary.
return np.dot(x.ravel(), y.ravel())
class VCSpec(object):
"""
Define the variance component structure of a multilevel model.
An instance of the class contains three attributes:
- names : names[k] is the name of variance component k.
- mats : mats[k][i] is the design matrix for group index
i in variance component k.
- colnames : colnames[k][i] is the list of column names for
mats[k][i].
The groups in colnames and mats must be in sorted order.
"""
def __init__(self, names, colnames, mats):
self.names = names
self.colnames = colnames
self.mats = mats
def _get_exog_re_names(self, exog_re):
"""
Passes through if given a list of names. Otherwise, gets pandas names
or creates some generic variable names as needed.
"""
if self.k_re == 0:
return []
if isinstance(exog_re, pd.DataFrame):
return exog_re.columns.tolist()
elif isinstance(exog_re, pd.Series) and exog_re.name is not None:
return [exog_re.name]
elif isinstance(exog_re, list):
return exog_re
# Default names
defnames = ["x_re{0:1d}".format(k + 1) for k in range(exog_re.shape[1])]
return defnames
class MixedLMParams(object):
"""
This class represents a parameter state for a mixed linear model.
Parameters
----------
k_fe : int
The number of covariates with fixed effects.
k_re : int
The number of covariates with random coefficients (excluding
variance components).
k_vc : int
The number of variance components parameters.
Notes
-----
This object represents the parameter state for the model in which
the scale parameter has been profiled out.
"""
def __init__(self, k_fe, k_re, k_vc):
self.k_fe = k_fe
self.k_re = k_re
self.k_re2 = k_re * (k_re + 1) // 2
self.k_vc = k_vc
self.k_tot = self.k_fe + self.k_re2 + self.k_vc
self._ix = np.tril_indices(self.k_re)
def from_packed(params, k_fe, k_re, use_sqrt, has_fe):
"""
Create a MixedLMParams object from packed parameter vector.
Parameters
----------
params : array_like
The mode parameters packed into a single vector.
k_fe : int
The number of covariates with fixed effects
k_re : int
The number of covariates with random effects (excluding
variance components).
use_sqrt : bool
If True, the random effects covariance matrix is provided
as its Cholesky factor, otherwise the lower triangle of
the covariance matrix is stored.
has_fe : bool
If True, `params` contains fixed effects parameters.
Otherwise, the fixed effects parameters are set to zero.
Returns
-------
A MixedLMParams object.
"""
k_re2 = int(k_re * (k_re + 1) / 2)
# The number of covariance parameters.
if has_fe:
k_vc = len(params) - k_fe - k_re2
else:
k_vc = len(params) - k_re2
pa = MixedLMParams(k_fe, k_re, k_vc)
cov_re = np.zeros((k_re, k_re))
ix = pa._ix
if has_fe:
pa.fe_params = params[0:k_fe]
cov_re[ix] = params[k_fe:k_fe+k_re2]
else:
pa.fe_params = np.zeros(k_fe)
cov_re[ix] = params[0:k_re2]
if use_sqrt:
cov_re = np.dot(cov_re, cov_re.T)
else:
cov_re = (cov_re + cov_re.T) - np.diag(np.diag(cov_re))
pa.cov_re = cov_re
if k_vc > 0:
if use_sqrt:
pa.vcomp = params[-k_vc:]**2
else:
pa.vcomp = params[-k_vc:]
else:
pa.vcomp = np.array([])
return pa
from_packed = staticmethod(from_packed)
def from_components(fe_params=None, cov_re=None, cov_re_sqrt=None,
vcomp=None):
"""
Create a MixedLMParams object from each parameter component.
Parameters
Loading ...