3. Proposed method

Jae Kwang Kim and Shu Yang

We first consider a particular fractional hot deck imputation method, called full fractional imputation, where the imputed values are taken from the set of respondents, denoted by $A_{R} = \{i \in A; δ_{i} = 1\}$. That is, the $j$-th imputed value of missing $y_{i}$, denoted by $y_{i}^{* (j)}$, is equal to the $j$-th value of $y$ in $A_{R}$. We propose a fractional hot deck imputation approach that makes use of the parametric model assumption $f (y | x; θ)$. If all of the elements in $A_{R}$ are selected as the imputed values for missing $y_{i}$, we can treat $\{y_{j}; j \in A_{R}\}$ as a realization from $f (y_{j} | δ_{j} = 1)$, and the fractional weight assigned to donor $y_{j}$ for the missing item $y_{i}$ is, by choosing $h (y_{j} | x_{i}) = f (y_{j} | δ_{j} = 1)$ in (2.6),

$\begin{array}{lcll} w_{i j}^{*} & \propto & f (y_{j} | x_{i}, δ_{i} = 0; \hat{θ}) / f (y_{j} | δ_{j} = 1) & (3.1) \\ & \propto & f (y_{j} | x_{i}; \hat{θ}) / f (y_{j} | δ_{j} = 1), & \end{array}$

with $\sum_{j; δ_{j} = 1} w_{i j}^{*} = 1$, and $\hat{θ}$ being the MLE obtained from (2.4). The second line follows from the MAR assumption. Furthermore, we can write

$\begin{array}{lcll} f (y_{j} | δ_{j} = 1) & = & \int f (y_{j} | x, δ_{j} = 1) f (x | δ_{j} = 1) d x & (3.2) \\ & = & \int f (y_{j} | x) f (x | δ_{j} = 1) d x & \\ & \cong & \frac{1}{N_{R}} \sum_{k = 1}^{N} δ_{k} f (y_{j} | x_{k}), & \end{array}$

where the second equality follows from the MAR assumption, and the last (approximate) equality follows by approximating the integral by the population empirical distribution, and $N_{R}$ is the number of respondents in the population. Using the survey weights, we can approximate

$f (y_{j} | δ_{j} = 1) ≅ \frac{\sum_{k \in A_{R}} w_{k} f (y_{j} | x_{k})}{\sum_{k \in A_{R}} w_{k}}$

and the fractional weights in (3.1) are computed from

$w_{i j}^{*} \propto \frac{f (y_{j} | x_{i}; \hat{θ})}{\sum_{k \in A_{R}} w_{k} f (y_{j} | x_{k}; \hat{θ})} \qquad (3.3)$

with $\sum_{j \in A_{R}} w_{i j}^{*} = 1$. In (3.3), the point mass $w_{i j}^{*}$ assigned to donor $y_{j}$ for missing unit $i$ is expressed as a ratio of densities of the form $f (y | x)$. Thus, for each missing unit $i$, all $n_{R} = | A_{R} |$ observations are used as donors for the hot deck imputation, with $w_{i j}^{*}$ as the fractional weights. Such fractional imputation can be called full fractional imputation (FFI) because there is no randomness due to the imputation mechanism. The FFI estimator of $η$, where $η$ is defined by $\sum_{i = 1}^{N} U (η; x_{i}, y_{i}) = 0$, is then computed by solving

$\sum_{i \in A} w_{i} \left\{δ_{i} U (η; x_{i}, y_{i}) + (1 - δ_{i}) \sum_{j \in A_{R}} w_{i j}^{*} U (η; x_{i}, y_{j})\right\} = 0, \qquad (3.4)$

where $w_{i j}^{*}$ is defined in (3.3). Note that the imputed estimating equation (3.4) is a good approximation to the expected estimating equation in (2.2).
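As an illustration of (3.3) and (3.4), the following is a minimal Python sketch rather than the authors' implementation. It assumes a normal linear working model $f (y | x; θ) = N (β_{0} + β_{1} x, σ^{2})$ fitted by survey-weighted least squares on the respondents (a stand-in for the MLE from (2.4), which is not reproduced here), and it takes $U (η; x, y) = y - η$ so that $η$ is the population mean. The function names `normal_pdf`, `ffi_weights` and `ffi_mean` and the simulated data are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y (assumed working model)."""
    z = (y - mean) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def ffi_weights(x, y, delta, w, beta, sigma):
    """Fractional weights w*_ij of (3.3); rows are missing units, columns are donors."""
    resp = delta == 1
    x_mis, x_resp, y_resp, w_resp = x[~resp], x[resp], y[resp], w[resp]
    # Numerator f(y_j | x_i; theta_hat), shape (n_mis, n_resp).
    num = normal_pdf(y_resp[None, :], beta[0] + beta[1] * x_mis[:, None], sigma)
    # Denominator sum_k w_k f(y_j | x_k; theta_hat) over respondents k, shape (n_resp,).
    dens = normal_pdf(y_resp[None, :], beta[0] + beta[1] * x_resp[:, None], sigma)
    den = w_resp @ dens
    ratio = num / den[None, :]
    # Normalize so that the weights of each missing unit sum to one.
    return ratio / ratio.sum(axis=1, keepdims=True)

def ffi_mean(y, delta, w, W):
    """Solve (3.4) for U(eta; x, y) = y - eta, i.e. the FFI estimator of the mean."""
    resp = delta == 1
    imputed = W @ y[resp]  # weighted donor average for each missing unit
    total = np.sum(w[resp] * y[resp]) + np.sum(w[~resp] * imputed)
    return total / np.sum(w)

# Simulated example (hypothetical data): response depends on x only, so MAR holds.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
delta = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))).astype(int)
w = np.full(n, 5.0)                            # equal survey weights for simplicity
y_obs = np.where(delta == 1, y_full, np.nan)   # y is unobserved when delta = 0

# Survey-weighted least squares on respondents as the assumed estimator of theta.
resp = delta == 1
X = np.column_stack([np.ones(resp.sum()), x[resp]])
sw = np.sqrt(w[resp])
beta, *_ = np.linalg.lstsq(X * sw[:, None], y_obs[resp] * sw, rcond=None)
resid = y_obs[resp] - X @ beta
sigma = np.sqrt(np.sum(w[resp] * resid ** 2) / np.sum(w[resp]))

W = ffi_weights(x, y_obs, delta, w, beta, sigma)
eta_hat = ffi_mean(y_obs, delta, w, W)
print(W.shape, eta_hat)
```

Each row of `W` holds the $n_{R}$ fractional weights of one missing unit, so no random selection is involved at this stage.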

In survey sampling, an imputed data set with a large imputation size may not be desirable. Instead of taking all the observations in $A_{R}$ as donors for each missing item, a subset of $A_{R}$ can be selected to reduce the size of the donor set for missing $y_{i}$. The selection of donors is thus viewed as a sampling problem, and we use an efficient sampling design and weighting techniques to obtain efficient imputation estimators. For the donor selection mechanism, efficient sampling designs, such as stratified sampling or systematic probability-proportional-to-size (PPS) sampling, can be used to select $m$ donors. A systematic PPS sampling procedure for fractional hot deck imputation can be described as follows:

Step 1. For each $i$ with $δ_{i} = 0$, sort the donors in the full respondent set $\{y_{j}; δ_{j} = 1\}$ in ascending order as $y_{(1)} \leq \dots \leq y_{(r)}$, where $r = n_{R}$, and let $w_{i (j)}^{*}$ denote the fractional weight associated with $y_{(j)}$. That is, $w_{i (j)}^{*} = w_{i k}^{*}$ for $y_{(j)} = y_{k}$.
Step 2. Partition $[0,1]$ by $\{I_{j} \equiv [\sum_{k = 0}^{j} w_{i (k)}^{*}, \sum_{k = 0}^{j + 1} w_{i (k)}^{*}), j = 0, \dots, r - 1\}$, where $w_{i (0)}^{*} = 0$, so that $I_{j}$ has length $w_{i (j + 1)}^{*}$.
Step 3. Generate $u \sim \text{Uniform} (0, 1 / m)$ and let $u_{k} = u + k / m$, $k = 0, \dots, m - 1$. For $k = 0, \dots, m - 1$, if $u_{k} \in I_{j}$ for some $0 \leq j \leq r - 1$, include the donor $y_{(j + 1)}$ in the sample $D_{i}$.
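The three steps above can be sketched for a single missing unit as follows. This is an illustrative Python sketch, not the authors' code: the function name `systematic_pps_donors` is hypothetical, the fractional weights are assumed to sum to one, and a donor whose weight exceeds $1 / m$ may be returned more than once, which the sketch keeps as repeated entries.

```python
import numpy as np

def systematic_pps_donors(y_resp, frac_w, m, rng):
    """Select m donor indices for one missing unit by systematic PPS sampling.

    y_resp : donor values y_j for j in A_R
    frac_w : FFI fractional weights w*_ij for this unit (assumed to sum to 1)
    m      : number of donors to select
    """
    # Sort donors by y in ascending order and carry the weights along.
    order = np.argsort(y_resp, kind="stable")
    w_sorted = frac_w[order]
    # Right endpoints of the consecutive intervals; the interval ending at
    # cum[j] has length w_sorted[j] and belongs to the sorted donor j (0-based).
    cum = np.cumsum(w_sorted)
    cum[-1] = 1.0  # guard against floating-point round-off
    # Random start in [0, 1/m), then m equally spaced points.
    u = rng.uniform(0.0, 1.0 / m)
    points = u + np.arange(m) / m
    # Index of the interval that contains each point.
    pos = np.searchsorted(cum, points, side="right")
    pos = np.minimum(pos, len(cum) - 1)
    # Map back to the original (unsorted) donor indices.
    return order[pos]

# Small example: the donor with weight 0.5 covers [0.5, 1), so the second
# sampling point u + 1/2 always falls in its interval.
rng = np.random.default_rng(1)
y_resp = np.array([3.0, 1.0, 2.0])
frac_w = np.array([0.5, 0.2, 0.3])
donors = systematic_pps_donors(y_resp, frac_w, m=2, rng=rng)
print(donors)
```

Sorting by $y$ before taking equally spaced points spreads the selected donors across the range of respondent values, which is the usual motivation for systematic selection from an ordered list.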

After we select $D_{i}$ from the complete set of respondents, the selected donors in $D_{i}$ are assigned the initial fractional weights $w_{i j 0}^{*} = 1 / m$. The fractional weights are further adjusted to satisfy

$\sum_{i \in A} w_{i} \left\{(1 - δ_{i}) \sum_{j \in D_{i}} w_{i j, c}^{*} q (x_{i}, y_{j})\right\} = \sum_{i \in A} w_{i} \left\{(1 - δ_{i}) \sum_{j \in A_{R}} w_{i j}^{*} q (x_{i}, y_{j})\right\}, \qquad (3.5)$

for some $q (x_{i}, y_{j})$, and $\sum_{j \in D_{i}} w_{i j, c}^{*} = 1$ for all $i$ with $δ_{i} = 0$, where the $w_{i j}^{*}$ are the fractional weights for the FFI method, as defined in (3.3). Regarding the choice of the control function $q (x, y)$ in (3.5), we can use $q (x, y) = (y, y^{2})^{\prime}$, which keeps the empirical distributions of $y$ for $D_{i}$ and $A_{R}$ as close as possible in the sense that the first and second moments of $y$ are the same. Other choices can also be considered. See Fuller and Kim (2005).

The problem of adjusting the initial weights to satisfy certain constraints is often called calibration and the resulting fractional weights can be called calibrated fractional weights. Using the idea of regression weighting, the final calibration fractional weights that satisfy (3.5) and $\sum_{j} w_{i j, c}^{*} = 1$ can be computed by

$w_{i j, c}^{*} = w_{i j 0}^{*} + w_{i j 0}^{*} Δ (q_{i j}^{*} - \bar{q}_{i \cdot}^{*}), \qquad (3.6)$

where $q_{i j}^{*} = q (x_{i}, y_{j}), {\bar{q}}_{i \cdot}^{*} = \sum_{j \in A_{R}} w_{i j 0}^{*} q_{i j}^{*},$

$Δ = \left\{C_{q} - \sum_{i \in A} w_{i} (1 - δ_{i}) \sum_{j \in A_{R}} w_{i j 0}^{*} q_{i j}^{*}\right\}^{T} \left\{\sum_{i \in A} w_{i} (1 - δ_{i}) \sum_{j \in A_{R}} w_{i j 0}^{*} (q_{i j}^{*} - \bar{q}_{i \cdot}^{*})^{\otimes 2}\right\}^{- 1}$

and $C_{q} = \sum_{i \in A} w_{i} \left\{(1 - δ_{i}) \sum_{j \in A_{R}} w_{i j}^{*} q (x_{i}, y_{j})\right\}$. Here, $B^{\otimes 2}$ denotes $B B^{T}$. Some of the fractional weights computed by (3.6) can take negative values. If that happens, an alternative to regression weighting should be used. For example, consider entropy weighting, where the fractional weights of the form

$w_{i j, c}^{*} = \frac{w_{i j}^{*} \exp (Δ q_{i j}^{*})}{\sum_{k \in A_{R}} w_{i k}^{*} \exp (Δ q_{i k}^{*})} \qquad (3.7)$

are approximately equal to the regression fractional weights in (3.6) and are always positive. Once the calibration fractional weights are obtained, the FHDI estimator of $η$ is then computed by solving

$\sum_{i \in A} w_{i} \left\{δ_{i} U (η; x_{i}, y_{i}) + (1 - δ_{i}) \sum_{j \in D_{i}} w_{i j, c}^{*} U (η; x_{i}, y_{j})\right\} = 0. \qquad (3.8)$
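To make the regression calibration in (3.6) concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It uses $q (x, y) = (y, y^{2})^{\prime}$, treats the initial weights as $1 / m$ on the selected donors and zero elsewhere (so sums over $A_{R}$ involving $w_{i j 0}^{*}$ reduce to sums over $D_{i}$), and draws the FFI weights and donor sets at random purely as placeholders. The function name `calibrate_regression` and all data are hypothetical.

```python
import numpy as np

def calibrate_regression(w_mis, W_ffi, q_resp, D):
    """Regression-calibrated fractional weights as in (3.6).

    w_mis  : (n_mis,) survey weights of the missing units
    W_ffi  : (n_mis, n_resp) FFI fractional weights w*_ij
    q_resp : (n_resp, p) control values q(x_i, y_j); here they depend on y_j only
    D      : (n_mis, m) indices of the selected donors for each missing unit
    """
    n_mis, m = D.shape
    w0 = 1.0 / m                             # initial fractional weights
    C_q = (w_mis @ W_ffi) @ q_resp           # right-hand side of (3.5)
    q_sel = q_resp[D]                        # (n_mis, m, p)
    q_bar = w0 * q_sel.sum(axis=1)           # weighted donor mean per missing unit
    dev = q_sel - q_bar[:, None, :]
    T0 = w_mis @ q_bar                       # total under the initial weights
    S = w0 * np.einsum("i,ijp,ijq->pq", w_mis, dev, dev)
    # S is symmetric, so solving S d = (C_q - T0) gives the transpose of Delta.
    Delta = np.linalg.solve(S, C_q - T0)
    return w0 + w0 * (dev @ Delta)           # (n_mis, m); entries may be negative

# Placeholder inputs (hypothetical): random FFI weights and random donor sets.
rng = np.random.default_rng(2)
n_resp, n_mis, m = 30, 10, 5
y_resp = rng.normal(size=n_resp)
q_resp = np.column_stack([y_resp, y_resp ** 2])
W_ffi = rng.dirichlet(np.ones(n_resp), size=n_mis)
w_mis = np.full(n_mis, 4.0)
D = np.array([rng.choice(n_resp, size=m, replace=False) for _ in range(n_mis)])

w_c = calibrate_regression(w_mis, W_ffi, q_resp, D)

# Check the calibration constraint (3.5) and the sum-to-one constraint.
lhs = np.einsum("i,ij,ijp->p", w_mis, w_c, q_resp[D])
C_q = (w_mis @ W_ffi) @ q_resp
print(np.allclose(lhs, C_q), np.allclose(w_c.sum(axis=1), 1.0))
```

Because the adjustment is linear in $q_{i j}^{*} - \bar{q}_{i \cdot}^{*}$, the sum-to-one constraint holds automatically, but nothing prevents individual weights from falling below zero, which is the situation the entropy form (3.7) is meant to avoid.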

For variance estimation, a replication method can be used. See Appendix A.1 for a brief discussion of the replication variance estimator for the proposed method.

Furthermore, the proposed method can handle non-ignorable non-response, provided that the response model is correctly specified. See Appendix A.3 for the extension to the non-ignorable non-response case.

Date modified: 2017-09-20
