1. Introduction
Jiming Jiang, Thuan Nguyen and J. Sunil Rao
Observed
best prediction (OBP; Jiang, Nguyen and Rao 2011) is a new method for small
area estimation (SAE; e.g., Rao 2003). It is motivated by the fact that the
best linear unbiased prediction (BLUP) is a hybrid of best prediction (BP) and
maximum likelihood (ML) estimation, while the main interest in SAE is typically
a prediction problem. The OBP derives the parameter estimators from a purely
predictive consideration, leading to the so-called best predictive estimator
(BPE) of the model parameters. The development of the OBP in Jiang et al. (2011) mainly focuses on
the Fay-Herriot model (Fay and Herriot 1979). Another important class of SAE
models is the nested-error regression (NER) model, introduced by Battese,
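Harter and Fuller (1988). The NER model may be expressed as
$$ y_{ij} = x_{ij}'\beta + v_i + e_{ij}, \quad i = 1, \dots, m, \; j = 1, \dots, n_i, \tag{1.1} $$
where the $v_i$ are the area-specific random effects and the $e_{ij}$ are errors, which
are assumed to be independent and normally distributed with mean zero, $\mathrm{var}(v_i) = \sigma_v^2$ and $\mathrm{var}(e_{ij}) = \sigma_e^2$, where $\sigma_v^2$ and $\sigma_e^2$ are unknown.
Under the NER model, the small area mean, assuming infinite population, is $\theta_i = \bar{X}_i'\beta + v_i$ for the $i$th small area,
where $\bar{X}_i$ is the population mean of the $x_{ij}$ (assumed known; e.g.,
Rao 2003). It is seen that $\theta_i$ is a (linear)
mixed effect. Let $y_i = (y_{ij})_{1 \le j \le n_i}$. Then, the best
predictor (BP) of $\theta_i$ is obtained by
minimizing the model-based mean squared prediction error (MSPE),
$$ \mathrm{MSPE}(\tilde{\theta}_i) = E_M(\tilde{\theta}_i - \theta_i)^2, \tag{1.2} $$
where $E_M$ denotes
expectation under the assumed NER model, and $\tilde{\theta}_i$ denotes a
predictor of $\theta_i$. By normal theory
(e.g., Jiang 2007, page 237), the BP is given by
$$ \tilde{\theta}_i = E_M(\theta_i \mid y_i) = \bar{X}_i'\beta + \frac{n_i\sigma_v^2}{\sigma_e^2 + n_i\sigma_v^2}\,(\bar{y}_{i\cdot} - \bar{x}_{i\cdot}'\beta), \tag{1.3} $$
where $\beta$ and $(\sigma_v^2, \sigma_e^2)$ are the true
parameters, and $\bar{y}_{i\cdot} = n_i^{-1}\sum_{j=1}^{n_i} y_{ij}$, $\bar{x}_{i\cdot} = n_i^{-1}\sum_{j=1}^{n_i} x_{ij}$. The traditional
best linear unbiased prediction (BLUP) method is based on (1.3) with $\beta$ replaced by its
ML estimator, assuming that $(\sigma_v^2, \sigma_e^2)$ is known; and
the empirical BLUP (EBLUP) is derived from the BLUP with $(\sigma_v^2, \sigma_e^2)$ replaced by a
consistent estimator.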
The
OBP procedure (Jiang et al.
2011) derives estimators of $\beta$ and the variance ratio $\sigma_v^2/\sigma_e^2$, namely the BPE,
by minimizing the observed, design-based MSPE, which is completely different
from the traditional methods such as maximum likelihood (ML) and restricted
maximum likelihood (REML; e.g., Jiang 2007). Throughout this paper, we assume
that the samples are drawn via simple random sampling, without replacement,
from each small area, which is what the design-based approach is based upon.
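Write $\gamma = \sigma_v^2/\sigma_e^2$. Note that, in
practice, the small area populations are finite. Following Jiang et al. (2011), we consider a
super-population NER model. Suppose that the subpopulations of responses $\{Y_{ik}, k = 1, \dots, N_i\}$ and auxiliary
data $\{X_{ik}, k = 1, \dots, N_i\}$ are realizations
from corresponding super-populations that are assumed to satisfy the NER model.
It follows that
$$ Y_{ik} = X_{ik}'\beta + v_i + e_{ik}, \quad i = 1, \dots, m, \; k = 1, \dots, N_i, \tag{1.4} $$
where the $v_i$ and $e_{ik}$ satisfy the same
assumptions as in (1.1). Under the finite-population setting, the true small
area mean is $\theta_i = \bar{Y}_i = N_i^{-1}\sum_{k=1}^{N_i} Y_{ik}$ (as opposed to $\theta_i = \bar{X}_i'\beta + v_i$ under the
infinite-population setting) for $1 \le i \le m$. Furthermore,
write $r_i = n_i/N_i$. Then, the
finite-population version of the BP (1.3) has the expression (e.g., Rao 2003, Section
7.2.5)
$$ \tilde{\theta}_i = E_M(\theta_i \mid y_i) = \bar{X}_i'\beta + \left\{ r_i + (1 - r_i)\frac{n_i\gamma}{1 + n_i\gamma} \right\}(\bar{y}_{i\cdot} - \bar{x}_{i\cdot}'\beta), \tag{1.5} $$
where $E_M(\cdot \mid y_i)$ denotes
(conditional) expectation under the assumed super-population NER model, and $\beta$ and $\gamma$ are the true
parameters. Note that the BP is model-dependent.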
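As a minimal sketch of how (1.5) is evaluated once parameter values are supplied, the following Python function computes the finite-population BP. It assumes the standard form in Rao (2003, Section 7.2.5), with shrinkage weight $r_i + (1 - r_i) n_i\gamma/(1 + n_i\gamma)$, where $r_i = n_i/N_i$ and $\gamma = \sigma_v^2/\sigma_e^2$. The function name, argument names and numerical inputs are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def finite_population_bp(ybar, xbar, Xbar, n, N, beta, gamma):
    """Finite-population BP of the small area means under the NER model.

    ybar : (m,)   sample means of the response, one per area
    xbar : (m, p) sample means of the auxiliary variables
    Xbar : (m, p) population means of the auxiliary variables (assumed known)
    n, N : (m,)   sample and population sizes
    beta : (p,)   regression coefficients
    gamma: float  variance ratio sigma_v^2 / sigma_e^2
    """
    r = n / N  # sampling fractions
    weight = r + (1.0 - r) * n * gamma / (1.0 + n * gamma)
    return Xbar @ beta + weight * (ybar - xbar @ beta)

# Hypothetical inputs for two areas; the second area is a census (n = N).
ybar = np.array([10.0, 12.0])
xbar = np.array([[1.0, 2.0], [1.0, 3.0]])
Xbar = np.array([[1.0, 2.5], [1.0, 2.8]])
n = np.array([5.0, 10.0])
N = np.array([50.0, 10.0])
beta = np.array([1.0, 3.0])
theta_tilde = finite_population_bp(ybar, xbar, Xbar, n, N, beta, gamma=0.5)
print(theta_tilde)
```

When an area is fully enumerated ($n_i = N_i$), the weight equals one regardless of $\gamma$; as $r_i \to 0$ the weight reduces to the infinite-population factor $n_i\gamma/(1 + n_i\gamma)$ of (1.3).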
In
practice, any assumed model is subject to misspecification. Jiang et al. (2011) considered
misspecification of the mean function, while assuming that the
variance-covariance structure of the data is correctly specified. However, the
latter, too, may be misspecified in practice. In this paper, we extend the
potential model misspecification to both the mean function and the
variance-covariance structure. One possible misspecification of the
variance-covariance structure is heteroscedasticity, defined in terms of $\mathrm{var}(e_{ij}) = \sigma_i^2$ for area $i$, where the $\sigma_i^2$ are unknown and
possibly different. However, in spite of the potential model misspecification,
there are reasons that one cannot "abandon" the assumed model, and the
model-based BP. First, the assumed model and BP are relatively simple to use,
and therefore attractive to practitioners; in particular, they utilize a simple
(linear) relationship between the response and auxiliary variables. For
example, in contrast to (1.4), which may be subject to misspecification of the
mean function, one may assume $E(Y_{ik}) = \mu_{ik}$, where the $\mu_{ik}$ are completely
unspecified, unknown constants. The latter model is almost always correct, but
is useless, because it does not utilize any relationship between $Y_{ik}$ and $X_{ik}$ at all. In fact,
in practice, if auxiliary data are available, it is often "politically
incorrect" not to use them. Secondly, even though there is a concern about
model misspecification, there is often a lack of (statistical) evidence on why something
else is more reasonable, or whether a complication is necessary. For example,
sometimes there is a concern about the normality assumption, but there is no
indication of why any particular alternative distribution is more
reasonable. As another example, suppose that one fits a quadratic model and
finds that the coefficient of the quadratic term is insignificant. Then, one is
not sure whether the complication of quadratic modeling is necessary as opposed
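to linear modeling. Thus, as far as this paper is concerned, we are not
attempting to change the assumed model, or the BP, (1.5), based on the assumed
model. In particular, we assume a single parameter, $\gamma$, in (1.5) for the
ratio $\sigma_v^2/\sigma_e^2$ rather than
considering a heteroscedastic NER model such as in Jiang and Nguyen (2012), and
Nandram and Sun (2012). Our goal is to find a better way to estimate the
parameters, $\psi = (\beta', \gamma)'$, under the
assumed model that are involved in (1.5), so that the resulting BP, (1.5), is
more robust against model misspecifications. We do so by considering an
objective MSPE that is not model-dependent, defined as follows. Let $\theta = (\theta_i)_{1 \le i \le m}$ denote the
vector of small area means, and $\tilde{\theta} = (\tilde{\theta}_i)_{1 \le i \le m}$ the vector of
BPs. Note that $\tilde{\theta}$ depends on $\psi$; that is, $\tilde{\theta} = \tilde{\theta}(\psi)$. The design-based
MSPE is
$$ \mathrm{MSPE}\{\tilde{\theta}(\psi)\} = E\{|\tilde{\theta}(\psi) - \theta|^2\} = \sum_{i=1}^{m} E\{\tilde{\theta}_i(\psi) - \theta_i\}^2. \tag{1.6} $$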
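Note that the $E$ in (1.6) is
different from the $E_M$ in (1.2), (1.3),
or (1.5) in that $E$ is completely
model-free; namely, the expectation in (1.6) is with respect to the simple
random sampling from the areas, which has nothing to do with the assumed model.
Jiang et al. (2011) showed
that the MSPE in (1.6) has an alternative expression, which is a key idea of
the OBP. Namely, we have $\mathrm{MSPE}\{\tilde{\theta}(\psi)\} = E\{Q(\psi)\} + c$, where $c = \sum_{i=1}^{m} \theta_i^2$ does not depend
on $\psi$, and
$$ Q(\psi) = \sum_{i=1}^{m} \left[ \tilde{\theta}_i^2(\psi) - 2\{1 - a_i(\gamma)\}\,\bar{X}_i'\beta\,\bar{y}_{i\cdot} - 2 a_i(\gamma)\,\hat{\mu}_i^2 \right]. \tag{1.7} $$
In (1.7), $\psi$ is considered as
a parameter vector, rather than the true parameter vector, with $a_i(\gamma) = r_i + (1 - r_i) n_i\gamma/(1 + n_i\gamma)$ and $r_i = n_i/N_i$. Furthermore, $\hat{\mu}_i^2$ is a
design-unbiased estimator of $\theta_i^2 = \bar{Y}_i^2$ that has the
following expression:
$$ \hat{\mu}_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{N_i - 1}{N_i(n_i - 1)}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2. \tag{1.8} $$
The BPE of $\psi$ is the minimizer
of $Q(\psi)$ with respect to $\psi$. For the reader's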
convenience, the derivations of (1.7) and (1.8) are provided in the Appendix. Also
note that the BP is based on the (model-based) area-specific MSPE (so it is
optimal for every small area, if the assumed model is correct), while the BPE
is based on the (design-based) overall MSPE. This is because we do not want the
estimator of the parameter vector $\psi$ to be
area-dependent. One reason is that area-dependent estimators are often unstable
due to the small sample size from the area, while an estimator obtained by
utilizing all of the areas, such as the BPE defined in this paper, tends to be
much more stable.
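The minimization that defines the BPE can be sketched numerically. The code below assumes the objective $Q(\psi) = \sum_i [\tilde{\theta}_i^2(\psi) - 2\{1 - a_i(\gamma)\}\bar{X}_i'\beta\,\bar{y}_{i\cdot} - 2a_i(\gamma)\hat{\mu}_i^2]$, which equals the design-based MSPE up to a term free of $\psi$, where $a_i(\gamma) = r_i + (1 - r_i)n_i\gamma/(1 + n_i\gamma)$ is the weight in the finite-population BP and $\hat{\mu}_i^2 = n_i^{-1}\sum_j y_{ij}^2 - \{(N_i - 1)/(N_i(n_i - 1))\}\sum_j (y_{ij} - \bar{y}_{i\cdot})^2$. For fixed $\gamma$, this $Q$ is quadratic in $\beta$, so $\beta$ is obtained from a linear system and $\gamma$ is then chosen by a grid search. This profiling strategy, the function names, the grid and the synthetic data are all assumptions made for illustration; they are not the authors' implementation.

```python
import numpy as np

def mu2_hat(y, N):
    """Design-unbiased estimator of the squared population mean under SRSWOR."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return np.mean(y ** 2) - (N - 1.0) / (N * (n - 1.0)) * np.sum((y - y.mean()) ** 2)

def shrink_weight(n, N, gamma):
    """Weight a_i(gamma) multiplying the residual in the finite-population BP."""
    r = n / N
    return r + (1.0 - r) * n * gamma / (1.0 + n * gamma)

def q_objective(beta, gamma, ybar, xbar, Xbar, n, N, mu2):
    """Assumed observed objective Q(psi); see the lead-in for its form."""
    a = shrink_weight(n, N, gamma)
    theta = Xbar @ beta + a * (ybar - xbar @ beta)
    return np.sum(theta ** 2 - 2.0 * (1.0 - a) * (Xbar @ beta) * ybar - 2.0 * a * mu2)

def bpe(ybar, xbar, Xbar, n, N, mu2, gamma_grid):
    """Profile search: closed-form beta for each gamma, then pick the best gamma."""
    best = None
    for gamma in gamma_grid:
        a = shrink_weight(n, N, gamma)
        D = Xbar - a[:, None] * xbar          # theta_i = a_i * ybar_i + D_i' beta
        rhs = ((1.0 - a)[:, None] * Xbar - a[:, None] * D).T @ ybar
        beta = np.linalg.solve(D.T @ D, rhs)  # stationary point of Q in beta
        q = q_objective(beta, gamma, ybar, xbar, Xbar, n, N, mu2)
        if best is None or q < best[2]:
            best = (beta, gamma, q)
    return best

# Synthetic finite populations (hypothetical values), one SRSWOR sample per area.
rng = np.random.default_rng(0)
m, N_i, n_i = 8, 40, 5
N = np.full(m, float(N_i))
n = np.full(m, float(n_i))
ybar, xbar, Xbar, mu2 = [], [], [], []
for i in range(m):
    X_pop = np.column_stack([np.ones(N_i), rng.uniform(0.0, 2.0, N_i)])
    Y_pop = X_pop @ np.array([1.0, 2.0]) + rng.normal() + rng.normal(size=N_i)
    s = rng.choice(N_i, size=n_i, replace=False)
    ybar.append(Y_pop[s].mean())
    xbar.append(X_pop[s].mean(axis=0))
    Xbar.append(X_pop.mean(axis=0))
    mu2.append(mu2_hat(Y_pop[s], N_i))
ybar, xbar, Xbar, mu2 = map(np.array, (ybar, xbar, Xbar, mu2))

beta_hat, gamma_hat, q_min = bpe(ybar, xbar, Xbar, n, N, mu2, np.linspace(0.0, 5.0, 51))
print(beta_hat, gamma_hat, q_min)
```

The grid search keeps the sketch dependency-free; a one-dimensional numerical optimizer over $\gamma$ could replace it. Note that all areas enter a single objective, so the resulting estimate is not area-dependent.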
We
consider the design-based MSPE in this paper because it is
completely model-free. Note that, in Jiang et al. (2011), where the authors
considered the Fay-Herriot model, it is not possible to evaluate the
design-based MSPE, because the actual samples from the areas are not available
(only summaries of the data are available at the area level). Thus, instead,
the authors considered model-based MSPE under the most general, or least
restrictive, model, which simply assumes that the mean function is $E(y_i) = \mu_i$, where $\mu_i$ is completely
unknown, for the $i$th small area. In
general, there is a "rule of thumb" on what kind of MSPE one should consider.
Essentially, the rule is that one should make the MSPE as model-free as
possible, so that it would be objective and (relatively) robust to
model-misspecifications.
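To illustrate why the design-based MSPE requires no model, the sketch below computes it exactly for a single area by enumerating every simple random sample without replacement from a fixed finite population. The function name, the toy population and the two predictors are hypothetical. Enumeration is feasible only for very small populations; a Monte Carlo average over repeated samples would be the natural substitute otherwise.

```python
import itertools
import numpy as np

def exact_design_mspe(Y_pop, n, predictor):
    """Average squared prediction error over all SRSWOR samples of size n."""
    Y_pop = np.asarray(Y_pop, dtype=float)
    target = Y_pop.mean()  # true finite-population area mean
    sq_errors = [
        (predictor(Y_pop[list(idx)]) - target) ** 2
        for idx in itertools.combinations(range(Y_pop.size), n)
    ]
    return float(np.mean(sq_errors))

# Toy population, treated as fixed: no distributional model is assumed.
Y_pop = [1.0, 2.0, 4.0, 7.0]

# Direct estimator (sample mean) versus a predictor shrunk toward a
# hypothetical synthetic value of 3.0 with a fixed weight of 0.5.
mspe_direct = exact_design_mspe(Y_pop, 2, np.mean)
mspe_shrunk = exact_design_mspe(Y_pop, 2, lambda y: 0.5 * y.mean() + 0.5 * 3.0)
print(mspe_direct, mspe_shrunk)
```

For the sample mean, the result agrees with the textbook variance under simple random sampling without replacement, $(1 - n/N)S^2/n$, where $S^2$ is the population variance with divisor $N - 1$.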
In
Section 2, we first consider a simulated example in which we compare the
design-based predictive performance of the OBP with that of the EBLUP. Such
comparisons were made in Jiang et al.
(2011) under the Fay-Herriot model, but have never been made under the NER
model. Furthermore, the simulation setting involves misspecification of both
the mean function and the variance function, which, again, has not been
considered. The simulation results show that the OBP can outperform the EBLUP
not just in the overall design-based MSPE but also in the (design-based)
area-specific MSPE for every one of a large number of small areas. To our
knowledge, this has not been observed before. For example, in Jiang et al. (2011), the OBP is shown
to outperform the EBLUP in the overall MSPE but not necessarily for every small
area.
An important problem of
practical interest is estimation of the area-specific MSPEs, here the
design-based MSPEs. In Section 3, we propose a bootstrap estimator for the
area-specific MSPE, which has the advantages of being simple and always
positive. Another simulation study is carried out to evaluate the performance
of the proposed MSPE estimator. An application to the Television School and
Family Smoking Prevention and Cessation Project (TVSFP) is discussed in Section
4.