2.3.1 Ordinary least squares estimation

Ordinary least squares (OLS) parameter estimation is achieved by minimizing the sum of all squared differences between the observed sample values and the predicted values (the residuals). Let $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ be the prediction for $Y$ based on the $i$-th value of $X$. Then $e_i = Y_i - \hat{Y}_i$ is the $i$-th residual. A residual is simply the difference between the $i$-th observed response value and the $i$-th response value that is predicted by our linear model. We define the residual sum of squares (RSS) as
\[
\mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2
\]
or equivalently as
\[
\mathrm{RSS} = (y_1 - \hat{\beta}_0 - \hat{\beta}_1 x_1)^2 + (y_2 - \hat{\beta}_0 - \hat{\beta}_1 x_2)^2 + \cdots + (y_n - \hat{\beta}_0 - \hat{\beta}_1 x_n)^2
\]
The least squares approach chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the RSS. From the formulas above we can easily obtain
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},
\]
where $\bar{y} \equiv \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} \equiv \frac{1}{n}\sum_{i=1}^{n} x_i$ are the sample means. In other words, these are the least squares coefficient estimates for simple linear regression (James et al., 2013). In R we can use the lm() function, which fits a linear model with the OLS estimation method by default.
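As a minimal sketch (with simulated data; the variable names are illustrative only), the closed-form formulas above reproduce what lm() returns:

# Minimal sketch: OLS estimates from the closed-form formulas above,
# compared with lm(); the data are simulated purely for illustration
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50)

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

c(b0 = b0, b1 = b1)
coef(lm(y ~ x))  # lm() fits by OLS by default; same estimates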
2.3.2 Maximum likelihood estimation and the log likelihood
The maximum likelihood (ML) approach estimates population parameters such that the probability of obtaining the observed sample values under the assumed probability distribution is maximized. In other words, the ML parameter estimates are the values that make the observed data most likely. The likelihood of the observed data, given a specific set of distribution parameters (for example the mean $\mu$ and variance $\sigma^2$ of a normal distribution), is the product of the individual densities of each observation for those parameters (and distribution).
The likelihood function for a model with independent and normally distributed observations is specified as:
\[
L_{\mathrm{Full}}(\beta, \sigma^2; y) \equiv (2\pi\sigma^2)^{-n/2} \prod_{i=1}^{n} \exp\left\{ -\frac{(y_i - x_i'\beta)^2}{2\sigma^2} \right\}
\]
It is important to mention that when the residuals of a linear model are independent and normally distributed, the ML estimates of $\beta$ are the same as those obtained with the OLS estimation method.
ML estimation has some drawbacks that make working with the log likelihood more useful. For instance, the likelihood is known to be unstable near horizontal asymptotes, whereas log-likelihood functions are steeper, have narrower curvature, and thus have more prominent maxima. ML is also known to yield biased parameter estimates, particularly in small samples. Another major drawback of this method is that it typically requires strong assumptions about the underlying distributions of the parameters.
The log likelihood is specified as follows:
\[
\ell_{\mathrm{Full}}(\beta, \sigma^2; y) \equiv -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2
\]
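As a small illustration (again with simulated data), the log likelihood of a fitted linear model can be evaluated with R's logLik(); note that logLik() includes the constant term involving $2\pi$ that the proportional expression above drops:

# Illustration: evaluating the log likelihood of a fitted linear model;
# the data are simulated purely for demonstration
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50)
fit <- lm(y ~ x)

n <- length(y)
sigma2_ml <- sum(resid(fit)^2) / n  # ML estimate of sigma^2 (divides by n, hence biased)
ll <- -n / 2 * log(2 * pi * sigma2_ml) - sum(resid(fit)^2) / (2 * sigma2_ml)
c(manual = ll, logLik = as.numeric(logLik(fit)))  # identical values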
2.3.3 REML estimation method
An alternative to ML is restricted maximum likelihood (REML). REML maximizes the likelihood of the residuals rather than of the data and therefore takes into account the number of estimated fixed-effects parameters. While REML yields unbiased estimates of the random-effects (variance) parameters, REML models differing in their fixed effects cannot be compared via log-likelihood ratio tests or information criteria (such as AIC). For relatively simple regression models, the gls() function in the nlme package is sufficient.
To obtain an unbiased estimate of $\sigma^2$, the likelihood function based on a set of $n - p$ independent contrasts of $y$ is used. The resulting log-restricted-likelihood function is given by:
\[
\ell_{\mathrm{REML}}(\sigma^2; y) \equiv -\frac{n-p}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} r_i^2
\]
where
\[
r_i \equiv y_i - x_i'\left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \sum_{i=1}^{n} x_i y_i
\]
Maximization of the previous formula with respect to $\sigma^2$ yields the following REML estimator:
\[
\hat{\sigma}^2_{\mathrm{REML}} \equiv \frac{1}{n-p}\sum_{i=1}^{n} r_i^2
\]
(Gałecki and Burzykowski, 2013).
The REML estimation method is the default in both the gls() function, which additionally allows the covariance structure to be specified, and the lme() function, which fits linear mixed models in R.
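To illustrate (using the Orthodont data shipped with nlme as a stand-in model):

library(nlme)
# method = "REML" is the default in gls(); ML can be requested explicitly
m_reml <- gls(distance ~ age, data = Orthodont)
m_ml   <- gls(distance ~ age, data = Orthodont, method = "ML")
c(REML = as.numeric(logLik(m_reml)), ML = as.numeric(logLik(m_ml)))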
2.3.4 Estimation in LMMs
As we already mentioned, a fixed effect is an unknown constant that we try to estimate from the data, and in R the REML method is used by default to do so. A random effect, on the other hand, is a random variable, so we do not estimate the random effect itself, since that would not make much sense. Instead, we estimate the parameters that describe the distribution of this random effect. A random-effects approach to modeling is more ambitious in the sense that it attempts to say something about the wider population beyond the particular sample (West, Welch, and Galecki, 2006).
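A brief sketch (again with the Orthodont data as a stand-in): what is reported for the random effect is the estimated standard deviation of its distribution, not the effect itself.

library(nlme)
# Random intercept per subject; REML is the default estimation method
m <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
VarCorr(m)  # estimated variance/sd of the random intercept and of the residual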
2.4 Variance functions
The variance functions in this project carry the same names they are given in R. All of the functions can be found in the nlme package.
2.4.1 Variance function varIdent()
The varIdent() variance function depends on the variance covariate $v_i$ and on the parameters $\delta = (\delta_1, \ldots, \delta_S)$, but not on $\mu_i$. Thus it is a mean-independent variance function.
We assign the weights inside the gls() function to an object of the varIdent class, created with the help of the varIdent() constructor function. The varIdent() function allows for different variances across $S$ strata, which means that we allow for different variances of measurements at different time points. In mathematical terms,
this means:
\[
\mathrm{Var}(\mathrm{Glucose}_{it}) \equiv \sigma_t^2 =
\begin{cases}
\sigma^2 & \text{for } t = 1 \text{ (0 minutes)}, \\
\sigma^2 \delta_2^2 & \text{for } t = 2 \text{ (5 minutes)}, \\
\sigma^2 \delta_3^2 & \text{for } t = 3 \text{ (10 minutes)}, \\
\quad \vdots \\
\sigma^2 \delta_{25}^2 & \text{for } t = 25 \text{ (120 minutes)},
\end{cases}
\]
where $\delta_2 \equiv \sigma_2/\sigma_1$, $\delta_3 \equiv \sigma_3/\sigma_1$, \ldots, $\delta_{25} \equiv \sigma_{25}/\sigma_1$. Since the variance model uses $S + 1$ parameters to represent $S$ variances, it is unidentifiable; by imposing restrictions on the variance parameters $\delta$, the model achieves identifiability. Therefore $\delta_1 = 1$, so that $\delta_l$, $l = 2, \ldots, S$, represents the ratio between the standard deviations of the $l$-th stratum and the first stratum. By definition, $\delta_l > 0$, $l = 2, \ldots, S$ (Gałecki and Burzykowski, 2013; Pinheiro and Bates, 2000). Model glsmod2 uses the varIdent() variance function.
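A sketch of how glsmod2 could be specified, assuming a long-format data frame dat with columns Glucose, Time (a factor with 25 levels), and id; these names are illustrative, not necessarily the project's actual ones:

library(nlme)
# One variance per time point (stratum); delta_1 is fixed at 1 for identifiability
glsmod2 <- gls(Glucose ~ Time, data = dat,
               weights = varIdent(form = ~ 1 | Time))
glsmod2$modelStruct$varStruct  # estimated ratios delta_2, ..., delta_25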
2.4.2 Variance function varPower() class
The varPower() function specifies that the variances are a power function of the mean value:
\[
\sigma_{it} = \sigma \lambda_{it} = \sigma \lambda(\mu_{it}, \delta) = \sigma (\mu_{it})^{\delta}
\]
where $\mu_{it}$ is the predicted (mean) value of $\mathrm{Glucose}_{it}$. The function $\lambda(\cdot)$ is an example of the varPower($\cdot$) variance function from the $\langle \delta, \mu \rangle$-group, with $\delta \in \mathbb{R}$ and no strata (no arguments are used in the varPower() function). This model is fitted with REML-based GLS (Gałecki and Burzykowski, 2013). The model that uses the varPower() function has been stored in the glsmod3 object.
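A sketch of glsmod3 under the same illustrative data frame; by default varPower() uses the fitted values as the variance covariate, matching $\sigma_{it} = \sigma(\mu_{it})^{\delta}$ above:

library(nlme)
# The default form is ~ fitted(.), i.e. a power of the predicted mean
glsmod3 <- gls(Glucose ~ Time, data = dat, weights = varPower())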
2.4.3 Variance function varExp() class
The varExp() function specifies that the variances are an exponential function of the variance covariate:
\[
\lambda(v_{it}, \delta) = \exp(\delta v_{it})
\]
where $v_{it}$ is the variance covariate (Gałecki and Burzykowski, 2013). The model that includes the varExp() function has been stored in the glsmod4 object.
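A sketch of glsmod4, assuming a numeric time covariate in the same illustrative data frame (here a hypothetical column TimeNum, time in minutes):

library(nlme)
# Variance grows (or shrinks) exponentially with the time covariate
glsmod4 <- gls(Glucose ~ Time, data = dat,
               weights = varExp(form = ~ TimeNum))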
2.5 Correlation function
2.5.1 Autoregressive model
Autoregressive models are part of the large family of correlation structures that make up the linear stationary autoregressive-moving average (ARMA) models. Autoregressive models assume that the correlation decreases as time points grow farther apart, meaning that the highest correlation is between adjacent time points. The number of past observations included in the model, $p$, is called the order of the autoregressive model, denoted AR($p$). There are $p$ correlation parameters in an AR($p$) model, given by $\phi = (\phi_1, \ldots, \phi_p)$ (Pinheiro and Bates, 2000).
The autoregressive model of order one (AR(1), as it is specified in R) is the simplest and also one of the most useful autoregressive models. Its correlation function decreases in absolute value exponentially with lag. The correlation parameter $\phi$ can take values between $-1$ and $1$. This structure is only applicable when the repeated measures are taken at evenly spaced time intervals. The correlation structure can be written as:
\[
h(k, \phi) \equiv \phi^{k}, \qquad k = 0, 1, \ldots; \; |\phi| < 1
\]
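A sketch of adding an AR(1) within-subject correlation structure to a gls() fit, using the same illustrative data frame as before; corAR1() estimates the single parameter $\phi$:

library(nlme)
# form = ~ 1 | id: use the observation order within each subject,
# which presumes evenly spaced measurement times
m_ar1 <- gls(Glucose ~ Time, data = dat,
             correlation = corAR1(form = ~ 1 | id))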
