2. One-step calibration weighting
Phillip S. Kott and Dan Liao
2.1 Calibration weighting and unit nonresponse
In the absence of nonresponse (or frame errors), calibration weighting is a sampling-weight-adjustment method that creates a set of weights, $\{w_k;\ k \in S\}$, asymptotically close to the original design weights, $d_k = 1/\pi_k$, that satisfy a set of calibration equations (one for each component of $\mathbf{z}_k$):
$$\sum_S w_k \mathbf{z}_k = \sum_U \mathbf{z}_k,$$
where $S$ denotes the sample, $\pi_k$ the sample-selection probability of unit $k$, $U$ the population of size $N$, $\mathbf{z}_k$ a vector with $P$ components each having a known population total, and $\sum_A$ means $\sum_{k \in A}$. Kott (2009) describes a conservative set of mild conditions under which $t_y = \sum_S w_k y_k$ is a nearly unbiased estimator for the population total $T_y = \sum_U y_k$ (i.e., the relative bias of $t_y$ is asymptotically zero). Most
importantly, each
is assumed to be bounded from
below by a positive value as $N$ and the (expected) sample size, $n$, grow arbitrarily large (we add the parenthetical “expected” in case the sample size is random). In addition, the first four central population moments of each component of $\mathbf{z}_k$ are assumed to be bounded from above, while $N^{-1}\sum_U \mathbf{z}_k \mathbf{z}_k^T$ converges to a positive definite matrix.
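Using calibration weighting will tend to reduce mean squared error relative to the expansion estimator, $\sum_S d_k y_k$, when the survey variable $y_k$ is correlated with some components of the calibration vector $\mathbf{z}_k$. One should keep in mind, however, that most surveys have many survey variables. A simple way to compute calibration weights is linearly, with the following formula:
$$w_k = d_k\left[1 + \Big(\sum_U \mathbf{z}_j - \sum_S d_j \mathbf{z}_j\Big)^T \Big(\sum_S d_j \mathbf{z}_j \mathbf{z}_j^T\Big)^{-1} \mathbf{z}_k\right] = d_k\left(1 + \mathbf{g}^T \mathbf{z}_k\right).$$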
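As a minimal sketch of linear calibration, assuming the standard form $w_k = d_k(1 + \mathbf{g}^T\mathbf{z}_k)$ with $\mathbf{g}$ chosen so that the weighted sample totals of $\mathbf{z}_k$ match the known totals, the weights can be computed with a single linear solve. The function name, the simulated sample, and the totals below are hypothetical illustrations and do not come from the paper.

```python
import numpy as np

def linear_calibration_weights(d, Z, totals):
    """Return w_k = d_k * (1 + g' z_k), where g solves the calibration equation."""
    d = np.asarray(d, dtype=float)
    Z = np.asarray(Z, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Weighted cross-product matrix: sum over the sample of d_k z_k z_k'
    M = Z.T @ (d[:, None] * Z)
    # Gap between the known totals and the expansion estimates of those totals
    gap = totals - Z.T @ d
    g = np.linalg.solve(M, gap)
    return d * (1.0 + Z @ g)

# Hypothetical sample: an intercept plus one continuous calibration variable
rng = np.random.default_rng(0)
n = 50
Z = np.column_stack([np.ones(n), rng.uniform(1.0, 5.0, size=n)])
d = np.full(n, 20.0)                  # equal design weights, so N is about 1000
totals = np.array([1000.0, 3100.0])   # assumed known population totals of z
w = linear_calibration_weights(d, Z, totals)
```

By construction the calibrated weights reproduce the supplied totals, but nothing in the linear form keeps an individual weight positive.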
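Fuller et al. (1994) and later Lundström and Särndal (1999) argued that this linear calibration can also be used to handle unit nonresponse. The sample $S$ is replaced by the respondent sample $R$, while the target total $\mathbf{T}_z$ is $\sum_U \mathbf{z}_k$ or $\sum_S d_k \mathbf{z}_k$, depending on whether the respondent sample is calibrated to the population or calibrated to the original sample, so that
$$w_k = d_k\left(1 + \mathbf{g}^T \mathbf{z}_k\right), \qquad \mathbf{g} = \Big(\sum_R d_j \mathbf{z}_j \mathbf{z}_j^T\Big)^{-1}\Big(\mathbf{T}_z - \sum_R d_j \mathbf{z}_j\Big).$$
Either way, the estimate is nearly unbiased under the quasi-sample-design that treats response as a second phase of random sampling so long as each unit’s probability of response has the form:
$$p_k = \frac{1}{1 + \boldsymbol{\gamma}^T \mathbf{z}_k}, \tag{2.1}$$
and $\mathbf{g}$ is a consistent estimator for the unknown parameter vector $\boldsymbol{\gamma}$ in equation (2.1).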
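The problem with the response function in equation (2.1) is that the implicit estimator for $p_k$, $1/(1 + \mathbf{g}^T \mathbf{z}_k)$, can be negative. A nonlinear form of calibration weighting avoiding this possibility was suggested by Kott and Liao (2012) based on the generalized exponential form of Folsom and Singh (2000). It uses Newton’s method (iterative Taylor-series approximations) to find a $\mathbf{g}$ such that the calibration equation (from here on, we refer to the vector of component calibration equations as the calibration equation):
$$\sum_R w_k \mathbf{z}_k = \sum_R d_k\, \alpha(\mathbf{g}^T \mathbf{z}_k)\, \mathbf{z}_k = \mathbf{T}_z \tag{2.2}$$
holds, where $\mathbf{T}_z = \sum_U \mathbf{z}_k$ or $\sum_S d_k \mathbf{z}_k$,
$$\alpha(\mathbf{g}^T \mathbf{z}_k) = \frac{\ell + \exp(\mathbf{g}^T \mathbf{z}_k)}{1 + \exp(\mathbf{g}^T \mathbf{z}_k)/u}, \tag{2.3}$$
the lower bound of $\alpha(\cdot)$, $\ell$, is nonnegative (so that calibration weights are likewise nonnegative), and the upper bound of $\alpha(\cdot)$, $u > \ell$, can be either finite or infinite.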
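The Newton iteration described above can be sketched as follows. This is a minimal illustration, assuming the bounded adjustment function $\alpha(\eta) = (\ell + e^{\eta})/(1 + e^{\eta}/u)$ and a start at $\mathbf{g} = \mathbf{0}$; the function names, the simulated respondents, and the totals are hypothetical. The sketch has no step-halving or other safeguards and is not the implementation used by the authors.

```python
import numpy as np

def alpha(eta, lower=1.0, upper=np.inf):
    """Assumed adjustment function (lower + e^eta) / (1 + e^eta / upper)."""
    e = np.exp(eta)
    return (lower + e) / (1.0 + e / upper)

def alpha_prime(eta, lower=1.0, upper=np.inf):
    """Derivative of alpha with respect to eta."""
    e = np.exp(eta)
    return e * (1.0 - lower / upper) / (1.0 + e / upper) ** 2

def calibrate_newton(d, Z, totals, lower=1.0, upper=np.inf,
                     tol=1e-10, max_iter=50):
    """Solve sum_R d_k alpha(g' z_k) z_k = totals for g by Newton's method."""
    g = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        eta = Z @ g
        # Residual of the calibration equation at the current g
        resid = Z.T @ (d * alpha(eta, lower, upper)) - totals
        # Jacobian: sum_R d_k alpha'(g' z_k) z_k z_k'
        jac = Z.T @ ((d * alpha_prime(eta, lower, upper))[:, None] * Z)
        step = np.linalg.solve(jac, resid)
        g = g - step
        if np.max(np.abs(step)) < tol:
            return g, d * alpha(Z @ g, lower, upper)
    raise RuntimeError("Newton iterations did not converge")

# Hypothetical respondents; totals are built from a known g so a solution exists
rng = np.random.default_rng(1)
m = 200
Z = np.column_stack([np.ones(m), rng.uniform(0.0, 2.0, size=m)])
d = np.full(m, 10.0)
g_true = np.array([-0.5, 0.3])
totals = Z.T @ (d * alpha(Z @ g_true))
g_hat, w = calibrate_newton(d, Z, totals)
```

With `lower=1.0` every adjustment factor is at least 1, so the implied response probability `1 / alpha` cannot exceed 1 and no calibrated weight falls below its design weight.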
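Although there are other reasonable forms the weight-adjustment function $\alpha(\cdot)$ can take, we will restrict our attention to functions in the form in equation (2.3). This is a generalization of both raking, where $\alpha(\mathbf{g}^T \mathbf{z}_k) = \exp(\mathbf{g}^T \mathbf{z}_k)$ ($\ell = 0$, $u = \infty$), and the implicit estimation of a logistic response model, where $\alpha(\mathbf{g}^T \mathbf{z}_k) = 1 + \exp(\mathbf{g}^T \mathbf{z}_k)$ ($\ell = 1$, $u = \infty$). In Deming and Stephan’s original (1940) iterative-proportional-fitting algorithm for raking, the components of $\mathbf{z}_k$ were restricted to indicator functions. We use “raking” more broadly here to mean calibration weighting with a weight-adjustment function of the form $\alpha(\mathbf{g}^T \mathbf{z}_k) = \exp(\mathbf{g}^T \mathbf{z}_k)$.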
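The two special cases can be checked numerically. The sketch below assumes the adjustment function has the form $\alpha(\eta) = (\ell + e^{\eta})/(1 + e^{\eta}/u)$; the function name, argument names, and grid of $\eta$ values are hypothetical.

```python
import numpy as np

def alpha(eta, lower, upper):
    """Assumed adjustment function (lower + e^eta) / (1 + e^eta / upper).

    Not guarded against overflow for very large eta with a finite upper bound.
    """
    e = np.exp(eta)
    return (lower + e) / (1.0 + e / upper)

eta = np.linspace(-5.0, 5.0, 11)

raking = alpha(eta, lower=0.0, upper=np.inf)     # reduces to exp(eta)
logistic = alpha(eta, lower=1.0, upper=np.inf)   # reduces to 1 + exp(eta)
bounded = alpha(eta, lower=0.5, upper=3.0)       # stays strictly inside (0.5, 3)

# Under the logistic case, 1 / alpha is a logistic response probability
implied_probability = 1.0 / logistic
```

Setting the lower bound to 1 is what keeps the implied response probability at or below 1, whereas raking allows adjustment factors below 1.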
When
equation (2.3) becomes the
generalized-raking adjustment introduced in Deville and Särndal (1992) and
discussed further in Deville, Särndal and Sautory (1993). Generalized raking
not only lets the components of $\mathbf{z}_k$ be continuous but also allows the range of the $\alpha(\mathbf{g}^T \mathbf{z}_k)$ to be constrained between a positive $\ell$ and a (possibly) finite $u$. Deville and Särndal (1992) required $\ell < 1 < u$.
Since the authors were not
treating samples with nonresponse (or incorrect frames),
needed to converge to 0 and
to 1 as the (expected) sample
size grew arbitrarily large. When adjusting design weights for nonresponse, however, setting $\ell \geq 1$ is a more sensible strategy, so that the implicit estimated probability of response, $1/\alpha(\mathbf{g}^T \mathbf{z}_k)$, does not exceed 1.
Although the original definition of calibration weighting in Deville and Särndal (1992) involved minimizing the differences between the $w_k$ and $d_k$ in $S$ as measured by some loss function, later formulations (e.g., Estevao and Särndal 2000) removed the loss function from the definition. Forcing $w_k$ and $d_k$ to be close makes little sense when calibration weighting is used to adjust for unit nonresponse, since if a sampled unit $k$ has a relatively small probability of response, then the difference between $w_k$ and $d_k$ should be relatively large.
Rather than assuming a response model with a particular functional form, an alternative justification for using calibration weighting as a means of removing unit-nonresponse bias assumes a prediction model in which the survey variable $y_k$ is itself a random variable such that $E(y_k \mid \mathbf{z}_k) = \mathbf{z}_k^T \boldsymbol{\beta}$ for some unknown $\boldsymbol{\beta}$ whether or not $k$ is sampled or whether it responds when sampled. Kott (2006) and others have observed the calibration-weighted estimator for $T_y = \sum_U y_k$ will be nearly unbiased under the prediction model when calibration is done to the population (when $\mathbf{T}_z = \sum_U \mathbf{z}_k$ in equation (2.2)) and under the combination of the prediction model and the original sample-selection mechanism when calibration is done to the original sample (when $\mathbf{T}_z = \sum_S d_k \mathbf{z}_k$).
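The prediction-model argument can be illustrated with a noise-free special case: if $y_k = \mathbf{z}_k^T\boldsymbol{\beta}$ exactly, any weights that satisfy the calibration equation for the population totals reproduce $T_y$ exactly, whatever the response mechanism. The sketch below uses linear calibration on a simulated population; the population, the selection and response probabilities, and all variable names are hypothetical. With a random error term in the model, the estimator would be nearly unbiased rather than exact.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population: intercept plus one auxiliary variable
N = 500
x = rng.uniform(0.0, 10.0, size=N)
Z_pop = np.column_stack([np.ones(N), x])
beta = np.array([2.0, 3.0])
y_pop = Z_pop @ beta                      # noise-free prediction model (assumption)

# Poisson sampling with equal probabilities, then response that depends on x
sampled = rng.random(N) < 0.3
response_prob = 1.0 / (1.0 + np.exp(-(x - 5.0) / 2.0))
responded = sampled & (rng.random(N) < response_prob)

Z_r = Z_pop[responded]
y_r = y_pop[responded]
d_r = np.full(responded.sum(), 1.0 / 0.3)  # design weights of the respondents

# Linear calibration of the respondents to the population totals of z
T_z = Z_pop.sum(axis=0)
M = Z_r.T @ (d_r[:, None] * Z_r)
g = np.linalg.solve(M, T_z - Z_r.T @ d_r)
w_r = d_r * (1.0 + Z_r @ g)

t_y_calibrated = w_r @ y_r                 # calibration-weighted estimate
t_y_expansion = d_r @ y_r                  # unadjusted estimate, for comparison
T_y = y_pop.sum()                          # true population total
```

Comparing `t_y_calibrated` and `t_y_expansion` with `T_y` shows how calibration removes the nonresponse bias in this idealized setting.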
The property that
a calibration-weighted estimator is nearly unbiased in some sense when either an assumed response model or an assumed prediction model holds has been called
“double protection against nonresponse bias” by Kim and Park (2006). It is
known as “double robustness” in the biostatistics literature (Bang and Robins
2005) and attributed to Robins, Rotnitzky and Zhao
(1994), which dealt with item rather than unit nonresponse.
The distribution of $y_k$ given $\mathbf{z}_k$ under the prediction model is often assumed to be the same for sampled and nonsampled population members. That is to say, the sampling mechanism is assumed to be ignorable. In addition, the distribution of $y_k$ given $\mathbf{z}_k$ is often assumed to be the same whether or not a population member responds when sampled; that is, the response mechanism is also assumed to be ignorable (Little and Rubin 2002). Here, we make weaker analogous assumptions under the prediction model, namely, that $E(y_k \mid \mathbf{z}_k)$ does not depend on whether $k$ is sampled or, when sampled, responds. Let us say that the sampling and response mechanisms are assumed to be “first-moment ignorable”.
2.2 Instrumental variables
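Deville (2000) observed that instrumental-variable calibration can be used to adjust for potential nonresponse bias by assuming a response model that depended on a vector $\mathbf{x}_k$,
$$p_k = \frac{1}{\alpha(\boldsymbol{\gamma}^T \mathbf{x}_k)}, \tag{2.4}$$
but fitting calibration equations with $\mathbf{z}_k$,
$$\sum_R d_k\, \alpha(\mathbf{g}^T \mathbf{x}_k)\, \mathbf{z}_k = \mathbf{T}_z, \tag{2.5}$$
where the $\mathbf{g}$ satisfying equation (2.5), with $\mathbf{T}_z = \sum_U \mathbf{z}_k$ or $\sum_S d_k \mathbf{z}_k$, is a consistent estimator of the unknown parameter vector $\boldsymbol{\gamma}$ in equation (2.4). Some mild conditions are needed for this. Sufficient are the following: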
is a consistent and bounded estimator for
is everywhere twice differentiable, and
is always invertible and bounded as the sample
grows arbitrarily large.
Let
when
otherwise. It is not hard to show that
for some
between
and
as Kott and Liao (2012)
demonstrated when
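Instrumental-variable calibration can be sketched with the same Newton iteration used for ordinary nonlinear calibration, except that the adjustment factor depends on the model vector $\mathbf{x}_k$ while the calibration equation is written in terms of $\mathbf{z}_k$. The example below is a minimal sketch, assuming a logistic-type adjustment $\alpha(\eta) = 1 + e^{\eta}$ and vectors $\mathbf{x}_k$ and $\mathbf{z}_k$ of equal dimension; the function name and the simulated data are hypothetical.

```python
import numpy as np

def iv_calibrate(d, X, Z, totals, tol=1e-10, max_iter=50):
    """Solve sum_R d_k (1 + exp(g' x_k)) z_k = totals for g by Newton's method."""
    g = np.zeros(X.shape[1])
    for _ in range(max_iter):
        e = np.exp(X @ g)
        # Residual of the calibration equation at the current g
        resid = Z.T @ (d * (1.0 + e)) - totals
        # Jacobian: sum_R d_k exp(g' x_k) z_k x_k' (square but not symmetric)
        jac = Z.T @ ((d * e)[:, None] * X)
        step = np.linalg.solve(jac, resid)
        g = g - step
        if np.max(np.abs(step)) < tol:
            return g, d * (1.0 + np.exp(X @ g))
    raise RuntimeError("Newton iterations did not converge")

# Hypothetical respondents: z is a noisy version of the model variable x
rng = np.random.default_rng(3)
m = 300
x2 = rng.uniform(0.0, 2.0, size=m)
z2 = x2 + 0.3 * rng.standard_normal(m)
X = np.column_stack([np.ones(m), x2])     # model variables
Z = np.column_stack([np.ones(m), z2])     # calibration variables
d = np.full(m, 10.0)

# Totals are generated from a known g so that a solution exists
g_true = np.array([-0.3, 0.2])
totals = Z.T @ (d * (1.0 + np.exp(X @ g_true)))
g_hat, w = iv_calibrate(d, X, Z, totals)
```

The Jacobian here is the cross-product of $\mathbf{z}_k$ and $\mathbf{x}_k$, which is why the calibration and model variables need to be sufficiently related for it to be invertible.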
Deville also noted that it is possible for components of the $\mathbf{x}_k$ to be survey variables with values known only for respondents. Chang and Kott (2008) extended the notion of calibration weighting to allow the dimension of the $\mathbf{z}_k$ vector to be greater than that of the $\mathbf{x}_k$ vector. We will not treat either possibility in the following sections.
Kim and Shao (2013), in treating nonignorable nonresponse, call the components of $\mathbf{z}_k$ that are not wholly functions of the components of $\mathbf{x}_k$ “instrumental variables”. To limit future confusion, we will henceforth use the term “model variables” to refer to the components of $\mathbf{x}_k$.