3. Mean squared error

Isabel Molina, J.N.K. Rao and Gauri Sankar Datta

Note that the BLUP $\tilde{θ}_{i}(A)$ of the small area mean $θ_{i}$ is a linear function of $y$. Hence, its MSE is easily calculated and is given by the sum of two terms:

$MSE\{\tilde{θ}_{i}(A)\} = g_{1i}(A) + g_{2i}(A),$

where $g_{1 i} (A)$ is due to the estimation of the random area effect $v_{i}$ and $g_{2 i} (A)$ is due to the estimation of the regression parameter $β,$ with

$\begin{aligned} g_{1i}(A) &= D_{i}\{1 - B_{i}(A)\}, \\ g_{2i}(A) &= B_{i}^{2}(A)\, x_{i}'\{X'Σ^{-1}(A)X\}^{-1} x_{i}. \end{aligned}$
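
The two terms can be evaluated for all areas at once. The Python sketch below assumes the area-level (Fay-Herriot) model introduced earlier, with $Σ(A) = \operatorname{diag}(A + D_1, \dots, A + D_m)$ and shrinkage factor $B_i(A) = D_i/(A + D_i)$; neither definition is restated in this section, so treat both as assumptions. The function names are hypothetical, and the small Gaussian-elimination solver only stands in for a linear-algebra library so that the sketch runs with the standard library alone.

```python
from typing import List

Matrix = List[List[float]]


def solve(M: Matrix, b: List[float]) -> List[float]:
    """Solve M z = b by Gaussian elimination with partial pivoting (small p x p systems)."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(M)]  # augmented matrix [M | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def quad_forms(X: Matrix, w: List[float]) -> List[float]:
    """Return x_i' (X' W X)^{-1} x_i for every row x_i of X, where W = diag(w)."""
    p = len(X[0])
    xtwx = [[sum(wj * xj[a] * xj[c] for wj, xj in zip(w, X)) for c in range(p)]
            for a in range(p)]
    return [sum(xa * za for xa, za in zip(x, solve(xtwx, x))) for x in X]


def g1_g2(A: float, D: List[float], X: Matrix):
    """g_{1i}(A) = D_i {1 - B_i(A)} and g_{2i}(A) = B_i^2(A) x_i' {X' Σ^{-1}(A) X}^{-1} x_i."""
    B = [d / (A + d) for d in D]                   # assumed B_i(A) = D_i / (A + D_i)
    g1 = [d * (1.0 - b) for d, b in zip(D, B)]
    h = quad_forms(X, [1.0 / (A + d) for d in D])  # assumed Σ(A) = diag(A + D_i)
    g2 = [b * b * hi for b, hi in zip(B, h)]
    return g1, g2
```

For instance, `g1_g2(0.0, [2.0] * 4, [[1.0]] * 4)` returns $g_{1i} = 0$ and $g_{2i} = 0.5 = D/m$ for each of the $m = 4$ areas.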

However, the EBLUP $\hat{θ}_{i}$ given in (2.7) is not linear in $y$ because the random effects variance $A$ must be estimated. Using a moment estimator of $A$, Prasad and Rao (1990) obtained a second-order correct approximation to the MSE of the EBLUP. Later, Datta and Lahiri (2000) and Das, Jiang and Rao (2004) obtained second-order correct MSE approximations under ML and REML estimation of $A$. When $A$ is estimated by REML, their approximation to the MSE for large $m$ is

$MSE(\hat{θ}_{RE,i}) = g_{1i}(A) + g_{2i}(A) + g_{3i}(A) + o(m^{-1}), \qquad (3.1)$

where

$g_{3i}(A) = B_{i}^{2}(A)\,\frac{V_{RE}(A)}{A + D_{i}} \quad \text{and} \quad V_{RE}(A) = \frac{2}{\sum_{j=1}^{m} (A + D_{j})^{-2}}.$
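
The third term depends only on $A$ and the sampling variances $D_1, \dots, D_m$, so it can be sketched without the design matrix. As in the previous sketch, $B_i(A) = D_i/(A + D_i)$ is an assumption carried over from the area-level model introduced earlier, and the function names are hypothetical.

```python
from typing import List


def v_re(A: float, D: List[float]) -> float:
    """V_RE(A) = 2 / sum_j (A + D_j)^{-2}."""
    return 2.0 / sum((A + d) ** -2 for d in D)


def g3(A: float, D: List[float]) -> List[float]:
    """g_{3i}(A) = B_i^2(A) V_RE(A) / (A + D_i) for every area i."""
    V = v_re(A, D)
    # assumed shrinkage factor B_i(A) = D_i / (A + D_i)
    return [(d / (A + d)) ** 2 * V / (A + d) for d in D]
```

With $A = 0$, $D_i = 2$ and $m = 4$, `g3(0.0, [2.0] * 4)` gives $1.0 = 2D/m$ for every area, matching the example that follows.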

Note that, as $m \to \infty$, $g_{1i}(A) = O(1)$, $g_{2i}(A) = O(m^{-1})$ and $g_{3i}(A) = O(m^{-1})$, so $g_{1i}(A)$ is the leading term in the MSE for large $m$. However, for small $A$, $g_{1i}(A)$ is approximately zero, and then $g_{3i}(A)$ might be the leading term for small $m$. For example, taking only one covariate ($p = 1$) with constant values $x_{i} = 1$ and constant sampling variances $D_{i} = D$, $i = 1, \dots, m$, and letting $A = 0$, we obtain $g_{1i}(0) = 0$, $g_{2i}(0) = D/m$ and $g_{3i}(0) = 2D/m$; that is, $g_{3i}(0)$ is twice as large as $g_{2i}(0)$.
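
The three values in this example can be checked term by term. The derivation below assumes, as in the area-level (Fay-Herriot) model introduced earlier, that $B_i(A) = D_i/(A + D_i)$ and $Σ(A) = \operatorname{diag}(A + D_1, \dots, A + D_m)$.

```latex
\begin{aligned}
B_i(0) &= \frac{D}{0 + D} = 1, \qquad
X'\Sigma^{-1}(0)X = \sum_{j=1}^{m} \frac{1}{D} = \frac{m}{D},\\
g_{1i}(0) &= D\{1 - B_i(0)\} = 0,\\
g_{2i}(0) &= B_i^2(0)\Bigl(\frac{m}{D}\Bigr)^{-1} = \frac{D}{m},\\
V_{RE}(0) &= \frac{2}{\sum_{j=1}^{m} D^{-2}} = \frac{2D^2}{m}, \qquad
g_{3i}(0) = B_i^2(0)\,\frac{V_{RE}(0)}{0 + D} = \frac{2D}{m}.
\end{aligned}
```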

Datta and Lahiri (2000) obtained an estimator of the MSE of the EBLUP ${\hat{θ}}_{RE, i}$ given by

$mse(\hat{θ}_{RE,i}) = g_{1i}(\hat{A}_{RE}) + g_{2i}(\hat{A}_{RE}) + 2g_{3i}(\hat{A}_{RE}). \qquad (3.2)$

The MSE estimator (3.2) is second-order unbiased in the sense that

$E\{mse(\hat{θ}_{RE,i})\} = MSE(\hat{θ}_{RE,i}) + o(m^{-1}).$
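
A minimal, self-contained Python sketch of the plug-in estimator (3.2) follows. It takes the REML estimate $\hat{A}_{RE}$ as an input (fitting REML is not shown) and again assumes $B_i(A) = D_i/(A + D_i)$ and $Σ(A) = \operatorname{diag}(A + D_i)$ from the area-level model introduced earlier. The names `solve`, `quad_forms` and `mse_dl` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(M: Matrix, b: List[float]) -> List[float]:
    """Solve M z = b by Gaussian elimination with partial pivoting (small p x p systems)."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(M)]  # augmented matrix [M | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def quad_forms(X: Matrix, w: List[float]) -> List[float]:
    """Return x_i' (X' W X)^{-1} x_i for every row x_i of X, where W = diag(w)."""
    p = len(X[0])
    xtwx = [[sum(wj * xj[a] * xj[c] for wj, xj in zip(w, X)) for c in range(p)]
            for a in range(p)]
    return [sum(xa * za for xa, za in zip(x, solve(xtwx, x))) for x in X]


def mse_dl(A_hat: float, D: List[float], X: Matrix) -> List[float]:
    """Estimator (3.2): g_1(Â) + g_2(Â) + 2 g_3(Â), one value per area."""
    B = [d / (A_hat + d) for d in D]                   # assumed B_i(A) = D_i / (A + D_i)
    g1 = [d * (1.0 - b) for d, b in zip(D, B)]
    h = quad_forms(X, [1.0 / (A_hat + d) for d in D])  # assumed Σ(A) = diag(A + D_i)
    g2 = [b * b * hi for b, hi in zip(B, h)]
    V = 2.0 / sum((A_hat + d) ** -2 for d in D)        # V_RE evaluated at Â
    g3 = [b * b * V / (A_hat + d) for b, d in zip(B, D)]
    return [a + b + 2.0 * c for a, b, c in zip(g1, g2, g3)]
```

With $\hat{A}_{RE} = 0$, $D_i = 2$, $x_i = 1$ and $m = 4$, `mse_dl` returns $2.5$ for every area, i.e., $g_{2i}(0) + 2g_{3i}(0) = 5D/m$.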

When $A = 0$, the BLUP $\tilde{θ}_{i}(0)$ of $θ_{i}$ becomes the regression-synthetic estimator $\hat{θ}_{SYN,i} = x_{i}'\tilde{β}(0)$. Surprisingly, however, the approximation (3.1) to the MSE of the EBLUP can be very different from the MSE of the synthetic estimator. The latter is

$MSE ({\hat{θ}}_{SYN, i}) = g_{2 i} (0) < g_{2 i} (0) + g_{3 i} (0),$

because $g_{3i}(0)$ is strictly positive even when $A = 0$. In the simple example with only one covariate ($p = 1$), constant values $x_{i} = 1$ and constant sampling variances $D_{i} = D$, $i = 1, \dots, m$, we have $MSE(\hat{θ}_{SYN,i}) = g_{2i}(0) = D/m$, whereas the approximation (3.1) with $A = 0$ gives $MSE(\hat{θ}_{RE,i}) \approx g_{2i}(0) + g_{3i}(0) = 3D/m$, three times as large. Thus (3.1) is not a good approximation to the MSE of the EBLUP when $A = 0$; instead, we should use $MSE(\hat{θ}_{RE,i}) = g_{2i}(0)$. Moreover, since for $A = 0$ this quantity does not depend on any unknown parameter, it can also serve as the MSE estimator, i.e., $mse(\hat{θ}_{RE,i}) = g_{2i}(0)$.

In practice, the true value of $A$ is unknown, but we have the consistent estimator $\hat{A}_{RE}$. When $\hat{A}_{RE} = 0$, the EBLUP reduces to the regression-synthetic estimator for all areas, that is,

${\hat{θ}}_{RE, i} = {\hat{θ}}_{SYN, i} = {x^{'}}_{i} \tilde{β} (0), i = 1, \dots, m .$

In this case, $g_{1 i} ({\hat{A}}_{RE}) = 0$ for all areas and the MSE estimator given in (3.2) reduces to

$mse ({\hat{θ}}_{RE, i}) = g_{2 i} (0) + 2 g_{3 i} (0) > g_{2 i} (0) =MSE ({\hat{θ}}_{SYN, i}), i = 1, \dots, m .$

Thus, the MSE estimator (3.2) can seriously overestimate the MSE when $\hat{A}_{RE} = 0$. To reduce this overestimation, we consider a modified MSE estimator of $\hat{θ}_{RE,i}$ given by

$mse_{0}(\hat{θ}_{RE,i}) = \begin{cases} g_{2i}, & \text{if } \hat{A}_{RE} = 0, \\ g_{1i}(\hat{A}_{RE}) + g_{2i}(\hat{A}_{RE}) + 2g_{3i}(\hat{A}_{RE}), & \text{if } \hat{A}_{RE} > 0, \end{cases} \qquad (3.3)$

where $g_{2 i} = g_{2 i} (0) = {x^{'}}_{i} {(X^{'} D^{- 1} X)}^{- 1} x_{i}, i = 1, \dots, m .$
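
The modified estimator (3.3) only changes the branch taken when the REML estimate sits on the boundary. The Python sketch below assumes that the REML fit (not shown) truncates negative solutions to exactly $\hat{A}_{RE} = 0$, so an exact comparison with zero selects the first branch; it also assumes $B_i(A) = D_i/(A + D_i)$ and $Σ(A) = \operatorname{diag}(A + D_i)$, so that $Σ(0) = \operatorname{diag}(D_1, \dots, D_m)$. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(M: Matrix, b: List[float]) -> List[float]:
    """Solve M z = b by Gaussian elimination with partial pivoting (small p x p systems)."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(M)]  # augmented matrix [M | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def quad_forms(X: Matrix, w: List[float]) -> List[float]:
    """Return x_i' (X' W X)^{-1} x_i for every row x_i of X, where W = diag(w)."""
    p = len(X[0])
    xtwx = [[sum(wj * xj[a] * xj[c] for wj, xj in zip(w, X)) for c in range(p)]
            for a in range(p)]
    return [sum(xa * za for xa, za in zip(x, solve(xtwx, x))) for x in X]


def mse0(A_hat: float, D: List[float], X: Matrix) -> List[float]:
    """Estimator (3.3): g_{2i} = x_i'(X'D^{-1}X)^{-1}x_i if Â_RE = 0, else estimator (3.2)."""
    if A_hat == 0.0:  # assumes REML truncates to exactly zero at the boundary
        return quad_forms(X, [1.0 / d for d in D])
    B = [d / (A_hat + d) for d in D]                   # assumed B_i(A) = D_i / (A + D_i)
    g1 = [d * (1.0 - b) for d, b in zip(D, B)]
    h = quad_forms(X, [1.0 / (A_hat + d) for d in D])  # assumed Σ(A) = diag(A + D_i)
    g2 = [b * b * hi for b, hi in zip(B, h)]
    V = 2.0 / sum((A_hat + d) ** -2 for d in D)        # V_RE evaluated at Â
    g3 = [b * b * V / (A_hat + d) for b, d in zip(B, D)]
    return [a + b + 2.0 * c for a, b, c in zip(g1, g2, g3)]
```

In the constant example with $D_i = 2$ and $m = 4$, `mse0(0.0, [2.0] * 4, [[1.0]] * 4)` returns $0.5 = D/m$ for every area, whereas estimator (3.2) would return $2.5$; for $\hat{A}_{RE} > 0$ the two estimators coincide.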

In fact, for $A$ close to zero, $g_{2i}$ may be closer to the true MSE than the full MSE estimator $mse(\hat{θ}_{RE,i})$, but this raises the question of when $A$ is close enough to zero. This question motivates the use of a preliminary test of $A = 0$ to define alternative MSE estimators of the EBLUP in Section 4.

Date modified: 2015-11-27


Survey Methodology
