3. Proposed method
Jae Kwang Kim and Shu Yang
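We first consider a particular fractional hot deck imputation method, called full fractional imputation, where the imputed values are taken from the set of respondents, denoted as $A_R = \{i \in A;\ \delta_i = 1\}$. That is, the $j$-th imputed value of missing $y_i$, denoted by $y_i^{*(j)}$, is equal to the $j$-th value of $y$ among the set in $A_R$. We propose a fractional hot deck imputation approach that makes use of the parametric model assumption $f(y \mid x; \theta)$. If all of the elements in $A_R$ are selected as the imputed values for missing $y_i$, we can treat $\{y_j;\ j \in A_R\}$ as a realization from $f(y \mid \delta = 1)$, and the fractional weight assigned to donor $y_j$ for the missing item $y_i$ is, by choosing $h(y_j \mid x_i) = f(y_j \mid \delta_j = 1)$ in (2.6),

$$
\begin{aligned}
w_{ij}^* &\propto \frac{f(y_j \mid x_i, \delta_i = 0; \hat\theta)}{f(y_j \mid \delta_j = 1)} \\
&= \frac{f(y_j \mid x_i; \hat\theta)}{f(y_j \mid \delta_j = 1)}
\end{aligned}
\qquad (3.1)
$$

with $\sum_{j \in A_R} w_{ij}^* = 1$, and $\hat\theta$ being the MLE obtained from (2.4).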
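The second line follows from the MAR assumption. Furthermore, we can write

$$
f(y_j \mid \delta_j = 1) = \int f(y_j \mid x, \delta_j = 1)\, f(x \mid \delta_j = 1)\, dx = \int f(y_j \mid x)\, f(x \mid \delta_j = 1)\, dx \approx \frac{1}{N_R} \sum_{k=1}^{N} \delta_k f(y_j \mid x_k),
$$

where the second equality follows from the MAR assumption, the last (approximate) equality follows by approximating the integral by the population empirical distribution, and $N_R$ is the number of respondents in the population. Using the survey weights, we can approximate

$$
f(y_j \mid \delta_j = 1) \approx \frac{\sum_{k \in A} w_k \delta_k f(y_j \mid x_k; \hat\theta)}{\sum_{k \in A} w_k \delta_k},
$$

and the fractional weights in (3.1) are computed from

$$
w_{ij}^* \propto \frac{f(y_j \mid x_i; \hat\theta)}{\sum_{k \in A_R} w_k f(y_j \mid x_k; \hat\theta)} \qquad (3.3)
$$

with $\sum_{j \in A_R} w_{ij}^* = 1$. In (3.3), the point mass $w_{ij}^*$ assigned to donor $y_j$ for missing unit $i$ is expressed by the ratio of the density $f(y_j \mid x_i; \hat\theta) / \sum_{k \in A_R} w_k f(y_j \mid x_k; \hat\theta)$. Thus, for each missing unit $i$, all $r$ observations in $A_R$ are used as donors for the hot deck imputation, using $w_{ij}^*$ as the fractional weights. Such fractional imputation can be called full fractional imputation (FFI) because there is no randomness due to the imputation mechanism. The FFI estimator of $\eta$, defined by $E\{U(\eta; x, y)\} = 0$, is then computed by solving

$$
\sum_{i \in A} w_i \Big\{ \delta_i U(\eta; x_i, y_i) + (1 - \delta_i) \sum_{j \in A_R} w_{ij}^* U(\eta; x_i, y_j) \Big\} = 0, \qquad (3.4)
$$

where $w_{ij}^*$ is defined in (3.3). Note that the imputed estimating equation (3.4) is a good approximation to the expected estimating equation in (2.2).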
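As a rough illustration of the weight computation in (3.3), the sketch below evaluates the FFI fractional weights for a single nonrespondent. It assumes, purely for illustration, a normal linear working model $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ with parameter values treated as already estimated; the function names, the toy donor data, and the parameter values are hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def ffi_fractional_weights(x_miss, donors, beta0, beta1, sigma):
    """Fractional weights in the spirit of (3.3) for one nonrespondent.

    x_miss : covariate value of the unit whose y is missing.
    donors : list of (x_k, y_k, w_k) for the respondents (delta_k = 1).
    Assumed working model (illustrative): y | x ~ N(beta0 + beta1 * x, sigma^2).
    """
    raw = []
    for (_, y_j, _) in donors:
        # Numerator: model density of donor value y_j given the recipient's x.
        num = normal_pdf(y_j, beta0 + beta1 * x_miss, sigma)
        # Denominator: survey-weighted mixture of model densities over respondents.
        den = sum(w_k * normal_pdf(y_j, beta0 + beta1 * x_k, sigma)
                  for (x_k, _, w_k) in donors)
        raw.append(num / den)
    total = sum(raw)
    return [v / total for v in raw]  # normalize so the weights sum to one

# Toy respondent data (x, y, survey weight); values are made up.
donors = [(1.0, 2.1, 10.0), (2.0, 3.9, 12.0), (3.0, 6.2, 8.0), (4.0, 8.1, 10.0)]
weights = ffi_fractional_weights(2.5, donors, beta0=0.0, beta1=2.0, sigma=1.0)

# Fractionally imputed value of y for this unit (weighted mean over donors).
imputed_mean = sum(w * y for w, (_, y, _) in zip(weights, donors))
```

Donors whose $y$ values are plausible under the model at the recipient's $x$ receive most of the weight, while the denominator discounts values that are already common among the respondents.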
In survey sampling, an imputed data set with a large imputation size may not be desirable. Thus, instead of taking all the observations in $A_R$ as donors for each missing item, a subset of $A_R$ can be selected to reduce the size of the donor set of missing $y_i$. The selection of the donors is then viewed as a sampling problem, and we use an efficient sampling design and weighting techniques to obtain efficient imputation estimators. For the donor selection mechanism, efficient sampling designs, such as a stratified sampling design or systematic probability-proportional-to-size (PPS) sampling, can be used to select donors of size $M$. A systematic PPS sampling for fractional hot deck imputation can be described as follows:
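- Within each $i$ with $\delta_i = 0$, sort the donors in the full respondent set $A_R$ in ascending order as $y_{(1)} \le \cdots \le y_{(r)}$ and use $w_{i(j)}^*$ to denote the fractional weight associated with $y_{(j)}$. That is, $w_{i(j)}^* = w_{ik}^*$ for $y_{(j)} = y_k$.
- Partition $[0, 1]$ by $\{I_j;\ j = 1, \ldots, r\}$, where $I_j = \big( \sum_{k=0}^{j-1} w_{i(k)}^*,\ \sum_{k=0}^{j} w_{i(k)}^* \big]$ with $w_{i(0)}^* = 0$.
- Generate $u \sim U(0, 1/M)$ and let $u_k = u + (k-1)/M$, $k = 1, \ldots, M$. For $k = 1, \ldots, M$, if $u_k \in I_j$ for some $j$, include $y_{(j)}$ in the sample $D_i$.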
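The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `systematic_pps_donors`, its arguments, and the toy inputs are hypothetical, and the fractional weights are assumed to sum to one already.

```python
import random

def systematic_pps_donors(y_values, frac_weights, m, rng=random):
    """Select m donor draws by systematic PPS, using the fractional
    weights as size measures. Returns indices into y_values; a donor whose
    weight exceeds 1/m can be selected more than once.
    """
    # Step 1: sort donors by y in ascending order.
    order = sorted(range(len(y_values)), key=lambda j: y_values[j])
    # Step 2: upper end points of the intervals I_j that partition [0, 1].
    uppers = []
    cum = 0.0
    for j in order:
        cum += frac_weights[j]
        uppers.append(cum)
    uppers[-1] = 1.0  # guard against floating-point rounding
    # Step 3: random start in (0, 1/m), then equally spaced points u_k.
    u = rng.uniform(0.0, 1.0 / m)
    selected = []
    pos = 0
    for k in range(m):
        u_k = u + k / m
        # Advance to the first interval whose upper end point covers u_k.
        while pos < len(uppers) - 1 and uppers[pos] < u_k:
            pos += 1
        selected.append(order[pos])
    return selected

# Toy usage with made-up values: equal weights and m equal to the number
# of donors, so every donor is picked exactly once.
rng = random.Random(0)
equal_pick = systematic_pps_donors([3.0, 1.0, 2.0, 4.0], [0.25] * 4, 4, rng)

# A dominant donor (weight 0.9) is picked several times when m = 5.
skewed_pick = systematic_pps_donors([1.0, 2.0, 3.0], [0.05, 0.9, 0.05], 5, rng)
```

Because the donors are sorted by $y$ before the systematic pass, the selected values are spread across the range of the donor distribution, which is the usual motivation for combining sorting with systematic selection.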
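After we select $D_i$ from the complete set of respondents, the selected donors in $D_i$ are assigned the initial fractional weights $w_{ij0}^* = 1/M$. The fractional weights are further adjusted to satisfy

$$
\sum_{i \in A} w_i (1 - \delta_i) \sum_{j \in D_i} w_{ij,c}^*\, q(x_i, y_j) = \sum_{i \in A} w_i (1 - \delta_i) \sum_{j \in A_R} w_{ij}^*\, q(x_i, y_j) \qquad (3.5)
$$

for some $q(x_i, y_j)$, and $\sum_{j \in D_i} w_{ij,c}^* = 1$ for all $i$ with $\delta_i = 0$, where $w_{ij}^*$ is the fractional weight for the FFI method, as defined in (3.3). Regarding the choice of the control function $q(x, y)$ in (3.5), we can use $q(x, y) = (y, y^2)^T$, which keeps the empirical distributions of $y$ for the selected donor sets $D_i$ and the full respondent set $A_R$ as close as possible in the sense that the first and second moments of $y$ are the same. Other choices can also be considered. See Fuller and Kim (2005).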
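The problem of adjusting the initial weights to satisfy certain constraints is often called calibration, and the resulting fractional weights can be called calibrated fractional weights. Using the idea of regression weighting, the final calibration fractional weights that satisfy (3.5) and $\sum_{j \in D_i} w_{ij,c}^* = 1$ can be computed by

$$
w_{ij,c}^* = w_{ij0}^* + w_{ij0}^*\, \Delta\, (q_{ij}^* - \bar{q}_{i\cdot}^*), \qquad (3.6)
$$

where $q_{ij}^* = q(x_i, y_j)$, $\bar{q}_{i\cdot}^* = \sum_{j \in D_i} w_{ij0}^* q_{ij}^*$, and

$$
\Delta = \Big\{ C_q - \sum_{i \in A} w_i (1 - \delta_i) \sum_{j \in D_i} w_{ij0}^* q_{ij}^* \Big\}^T \Big\{ \sum_{i \in A} w_i (1 - \delta_i) \sum_{j \in D_i} w_{ij0}^* (q_{ij}^* - \bar{q}_{i\cdot}^*)^{\otimes 2} \Big\}^{-1},
$$

with $C_q = \sum_{i \in A} w_i (1 - \delta_i) \sum_{j \in A_R} w_{ij}^*\, q(x_i, y_j)$ being the right side of (3.5). Here, $B^{\otimes 2}$ denotes $B B^T$. Some of the fractional weights computed by (3.6) can take negative values. If that happens, algorithms alternative to regression weighting should be used. For example, consider entropy weighting, where the fractional weights of the form

$$
w_{ij,c}^* = \frac{w_{ij0}^* \exp(\Delta q_{ij}^*)}{\sum_{k \in D_i} w_{ik0}^* \exp(\Delta q_{ik}^*)}
$$

are approximately equal to the regression fractional weights in (3.6) and are always positive. Once the calibration fractional weights are obtained, the FHDI estimator of $\eta$ is then computed by solving

$$
\sum_{i \in A} w_i \Big\{ \delta_i U(\eta; x_i, y_i) + (1 - \delta_i) \sum_{j \in D_i} w_{ij,c}^* U(\eta; x_i, y_j) \Big\} = 0.
$$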
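To make the regression calibration step concrete, the following sketch adjusts equal initial fractional weights so that the weighted total of a scalar control variable over the selected donors matches the corresponding FFI total. It assumes the simplest control function $q(x, y) = y$ (so the adjustment coefficient is a scalar) instead of the two-component choice discussed above; the function name, the dictionary layout, and all numbers are hypothetical.

```python
def calibrate_fractional_weights(units, y_donors):
    """Regression-type calibration of fractional weights with q(x, y) = y.

    units    : list of dicts, one per nonrespondent, with keys
               'w'   : survey weight of the unit,
               'ffi' : FFI fractional weights over all donors (sum to 1),
               'sel' : indices of the selected donors for the unit.
    y_donors : y values of all respondents.
    Returns one list of calibrated weights per unit, aligned with 'sel'.
    """
    target = 0.0   # FFI benchmark total of q over all donors
    current = 0.0  # same total using the initial weights 1/M
    spread = 0.0   # weighted sum of squared deviations of q within units
    centred = []
    for unit in units:
        w0 = 1.0 / len(unit['sel'])            # initial fractional weight
        q = [y_donors[j] for j in unit['sel']]
        qbar = sum(w0 * v for v in q)
        target += unit['w'] * sum(f * y for f, y in zip(unit['ffi'], y_donors))
        current += unit['w'] * qbar
        spread += unit['w'] * sum(w0 * (v - qbar) ** 2 for v in q)
        centred.append((w0, [v - qbar for v in q]))
    delta = (target - current) / spread        # scalar adjustment coefficient
    # Adjusted weight: w0 + w0 * delta * (q - qbar); may be negative in general.
    return [[w0 * (1.0 + delta * d) for d in devs] for (w0, devs) in centred]

# Toy inputs (made up): four donors, two nonrespondents with two donors each.
y_donors = [2.1, 3.9, 6.2, 8.1]
units = [
    {'w': 10.0, 'ffi': [0.10, 0.50, 0.30, 0.10], 'sel': [1, 2]},
    {'w': 5.0,  'ffi': [0.05, 0.15, 0.40, 0.40], 'sel': [2, 3]},
]
calibrated = calibrate_fractional_weights(units, y_donors)

# Totals of y under the calibrated weights and under FFI, for comparison.
fhdi_total = sum(u['w'] * sum(c * y_donors[j] for c, j in zip(cw, u['sel']))
                 for u, cw in zip(units, calibrated))
ffi_total = sum(u['w'] * sum(f * y for f, y in zip(u['ffi'], y_donors))
                for u in units)
```

Centering $q$ within each unit is what keeps every unit's weights summing to one after the adjustment. If any calibrated weight turns out negative, a positive-by-construction alternative such as the entropy form would be needed, which this sketch does not implement.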
For variance estimation, a replication method
can be used. See Appendix A.1 for a brief discussion of the replication
variance estimator for the proposed method.
Furthermore, the proposed method can handle
non-ignorable non-response under the correct specification of the response
model. See Appendix A.3 for the extension to the non-ignorable non-response case.