1. Introduction

Alina Matei and M. Giovanna Ranalli


Nonresponse is an increasingly common problem in surveys. It is a problem because it causes missing data and, more importantly, because such gaps are a potential source of bias for survey estimates. In the presence of unit nonresponse, it is often assumed that each unit in the population has an associated probability of responding to the survey. This response probability is unknown, and several methods have been proposed to estimate it either explicitly, using response propensity models such as logistic regression (see, e.g., Kim and Kim 2007), or implicitly, using response homogeneity groups or, more generally, calibration (see Särndal and Lundström 2005 for an overview). Once estimates are computed, a commonly used method to deal with unit nonresponse is reweighting: the sampling weights of the respondents are multiplied by the inverse of the estimated response probability, which yields new weights. Estimation of response probabilities typically requires auxiliary information, either in the form of the values of some auxiliary variables for all units in the originally selected sample or in the form of their population means or totals.
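As an illustration of the reweighting step described above, the following minimal sketch estimates response probabilities implicitly through response homogeneity groups and divides each respondent's sampling weight by the estimated probability for its group. The function name `rhg_adjusted_weights`, the group labels and the toy data are hypothetical and introduced only for this example; the weighted response rate within a group is one common choice of estimator, assumed here, and is not claimed to be the method developed in this paper.

```python
from collections import defaultdict


def rhg_adjusted_weights(design_weights, groups, responded):
    """Return nonresponse-adjusted weights (None for nonrespondents).

    Assumption of this sketch: the response probability is constant
    within each response homogeneity group and is estimated by the
    design-weighted response rate of that group.
    """
    total = defaultdict(float)  # weighted count of all sampled units
    resp = defaultdict(float)   # weighted count of respondents only
    for d, g, r in zip(design_weights, groups, responded):
        total[g] += d
        if r:
            resp[g] += d

    # Estimated response probability per group.
    p_hat = {g: resp[g] / total[g] for g in total}

    # New weight = sampling weight times inverse estimated probability.
    return [d / p_hat[g] if r else None
            for d, g, r in zip(design_weights, groups, responded)]


# Toy sample of six units in two hypothetical groups "a" and "b".
design_weights = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0]
groups = ["a", "a", "a", "a", "b", "b"]
responded = [True, True, False, False, True, False]
y = [3.0, 5.0, None, None, 4.0, None]  # observed only for respondents

weights = rhg_adjusted_weights(design_weights, groups, responded)
estimated_total = sum(w * v for w, v in zip(weights, y) if w is not None)
```

In this toy sample half of the weighted units respond in each group, so every respondent's weight is doubled and the adjusted weights again sum to the total of the original sampling weights. Note that the groups themselves are auxiliary information that must be known for every sampled unit, which is the requirement discussed above.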

In this paper, we are particularly interested in the case where the missing data mechanism is non-ignorable, because nonresponse depends on characteristics of interest that are either observed only on the respondents or are completely unobserved, which leads to data that are Not Missing At Random (NMAR). This is typical of, but not limited to, surveys with sensitive questions (concerning drug abuse, sexual attitudes, politics, income, etc.). Various approaches have been proposed in the survey sampling literature to deal with non-ignorable nonresponse. These approaches can be roughly divided into likelihood-based methods and reweighting methods. Note that all of these methods make use of observed auxiliary information. Survey problems with non-ignorable nonrespondents are discussed, e.g., in Greenlees, Reece and Zieschang (1982), Little and Rubin (1987), Beaumont (2000), Qin, Leung and Shao (2002), and Zhang (2002). Copas and Farewell (1998) introduce into the British National Survey of Sexual Attitudes and Lifestyles a variable called ‘enthusiasm-to-respond’ to the survey, which is expected to be related to probabilities of unit and item response. A method is proposed that estimates these probabilities using this variable to achieve unbiased estimates of population parameters. An approach based on the use of latent variables for modeling non-ignorable nonresponse is given in Biemer and Link (2007), extending the ideas in Drew and Fuller (1980) and using a discrete latent variable based on call history data available for all sample units. The latent variable is computed using some indicators of level of effort based on call attempts.

We propose here a method of reweighting to reduce nonresponse bias in the case of non-ignorable nonresponse. The method does not require the availability of auxiliary information, on the sample or population level, but different assumptions are made. First, it is assumed that item nonresponse is present in the survey and that it affects $m$ variables of particular interest. Thus a response indicator can be defined for each variable $\ell$, for $\ell = 1, \ldots, m$, taking value 1 if item $\ell$ is observed on unit $k$ and 0 otherwise. Next, the response indicators are assumed to be manifestations of an underlying continuous scale which determines a latent variable that is related to the response propensity of the units and to the variable of interest. It is possible to compute such a latent variable for all units in the sample, not only for the respondents, and thus to use it as an auxiliary variable in a response probability estimation procedure. The outcome of this estimation procedure can finally be used in a reweighting fashion.
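The response indicators defined above can be made concrete with a small sketch that builds, for each sampled unit, a vector of 0/1 indicators over the items of interest. The item names, the use of `None` to mark a missing answer and the helper `response_indicators` are assumptions made for this illustration only. The sketch stops at the indicator matrix; it does not fit the latent trait model that the paper uses to turn these indicators into a latent variable.

```python
def response_indicators(records, items):
    """Return one row per unit and one 0/1 column per item.

    An entry is 1 if the item is observed on the unit and 0 otherwise.
    Assumption of this sketch: a missing answer is stored as None or
    is absent from the record.
    """
    return [[1 if rec.get(item) is not None else 0 for item in items]
            for rec in records]


# Hypothetical survey with m = 3 items of interest.
items = ["income", "alcohol_use", "vote"]

# Toy sample of four units; the last unit answered none of the items.
records = [
    {"income": 42000, "alcohol_use": 2, "vote": "yes"},
    {"income": None, "alcohol_use": 1, "vote": "no"},
    {"income": None, "alcohol_use": None, "vote": "yes"},
    {},
]

indicators = response_indicators(records, items)

# Item-level response rates (unweighted), one per item.
n_units = len(indicators)
item_rates = [sum(row[j] for row in indicators) / n_units
              for j in range(len(items))]
```

Each row of `indicators` is defined for every unit in the sample, including a unit that answered nothing, which is what allows a quantity derived from the indicators to be computed for respondents and nonrespondents alike.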

The use of continuous latent variables to model item nonresponse is considered in Moustaki and Knott (2000). In this paper, we take a different perspective and use latent variable models to address non-ignorable unit nonresponse. We propose to use a latent variable called here ‘will to respond to the survey’, which is expected to be related to the probability of unit response, similar to the ‘enthusiasm-to-respond’ variable defined by Copas and Farewell (1998). Following Moustaki and Knott (2000), ‘weighting through latent variable modeling is expected to perform well under non-ignorable nonresponse where conditioning on observed covariates only is not enough.’ Moreover, in the absence of any covariate, we expect that an estimator based on the proposed weighting system using latent variables will perform better in terms of bias reduction than the naive estimator computed on the set of respondents. Moustaki and Knott (2000) propose a reweighting system for item nonresponse using covariates and one or more latent variables. Our major contribution over the existing literature is to construct a weighting system that deals with unit and item nonresponse, is based only on latent variables, and can also be used in the absence of any other covariate. On the other hand, our approach is different from that of Copas and Farewell (1998), because they survey their ‘enthusiasm-to-respond’ variable on the respondents to quantify the interest in answering the survey and a set of covariates, while we infer it from the data.

The paper is organized as follows. Section 2 introduces the survey framework and notation. Section 3 illustrates estimation of response probabilities. Section 4 describes the latent trait model used to this end. The proposed estimator and its variance estimation are shown in Section 5. In Section 6, the empirical properties of the proposed estimator are evaluated via simulation studies. In Section 7 we summarize our conclusions.

