A few remarks on a small example by Jean-Claude Deville regarding non-ignorable non-response
Section 1. Deville’s example
During a conference at the University of Neuchâtel, Jean-Claude Deville (2005) presented a simple example to illustrate the value of generalized calibration for dealing with non-ignorable non-response (regarding generalized calibration, see Deville 2000, 2002 and 2004; Kott 2006; Chang and Kott 2008; Kott and Chang 2010; and Lesage and Haziza 2015). The example is reproduced below in its entirety.
Adjustments to offset the effects of non-response require very accurate knowledge of the factors that cause it. In particular, if what is to be measured directly influences the response probability, we must take risks with the data. Here is a small fictional example: A group of students is interviewed about their use of drugs. The survey results are as follows:
Gender | Yes | No | Non-response | Combined |
---|---|---|---|---|
Boys | 40 | 80 | 180 | 300 |
Girls | 20 | 160 | 120 | 300 |
Combined | 60 | 240 | 300 | 600 |
Naively, we would think that the percentage of drug users is estimated at 60/(240 + 60) = 20%. This estimate is made under the assumption that non-respondents have the same behaviour as respondents. However, we notice that the response rate for girls is greater than the response rate for boys. To correct that, we calculate the rate of drug users among girls, or 1/9, and among boys, or 3/9, and we conclude that the rate of drug users in the student population is 2/9 = 22.2%. Now, if we think that drug use is causing the non-response, the model has two parameters: the response probabilities of users and of non-users. We find that these probabilities equal 0.2 and 0.8, respectively. The estimated number of users is therefore 200 among boys and 100 among girls, and the estimated overall percentage is 50%!
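The arithmetic behind the three figures can be checked with a short script. This is only a sketch under stated assumptions: the function names and the `counts` layout are illustrative and are not part of Deville's presentation, and the third estimate is obtained by requiring that, within each gender, the respondents weighted by the inverse of their response probability add up to the 300 students interviewed. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Counts from the table: (yes, no, non-response) for each gender.
counts = {"boys": (40, 80, 180), "girls": (20, 160, 120)}


def naive_estimate(counts):
    """Share of users among all respondents (non-response ignored)."""
    yes = sum(y for y, n, nr in counts.values())
    no = sum(n for y, n, nr in counts.values())
    return Fraction(yes, yes + no)


def by_gender_estimate(counts):
    """Apply each gender's respondent rate of users to that whole gender."""
    total = sum(y + n + nr for y, n, nr in counts.values())
    users = sum(Fraction(y, y + n) * (y + n + nr) for y, n, nr in counts.values())
    return users / total


def drug_use_estimate(counts):
    """Response probability depends on drug use only (assumed model).

    Solve, for a = 1/p_user and b = 1/p_nonuser, the two equations
        yes_g * a + no_g * b = N_g   (g = boys, girls)
    by Cramer's rule.
    """
    (y1, n1, r1), (y2, n2, r2) = counts["boys"], counts["girls"]
    size1, size2 = y1 + n1 + r1, y2 + n2 + r2
    det = Fraction(y1 * n2 - y2 * n1)
    a = (size1 * n2 - size2 * n1) / det
    b = (y1 * size2 - y2 * size1) / det
    users = a * (y1 + y2)  # estimated number of users overall
    return 1 / a, 1 / b, users / (size1 + size2)


print(naive_estimate(counts))      # 1/5
print(by_gender_estimate(counts))  # 2/9
print(drug_use_estimate(counts))   # (Fraction(1, 5), Fraction(4, 5), Fraction(1, 2))
```

The last line returns the response probabilities 0.2 and 0.8 and the overall share of 50%, which corresponds to 200 estimated users among boys (40/0.2) and 100 among girls (20/0.2).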
At first glance, the example is simple, and it perfectly explains the usual typology of the three non-response mechanisms. Each of the three estimates proposed in the example corresponds to one of the three categories below:
- Missing completely at random (MCAR): The response probability does not depend on the variable of interest (drug use) or on the auxiliary variable (gender).
- Missing at random (MAR): The response probability does not depend on the variable of interest after conditioning on the auxiliary variable (gender). In this case, the response probability would therefore depend on gender only.
- Not missing at random (NMAR): The response probability depends on the variable of interest itself (drug use), even after conditioning on the auxiliary variable (gender).
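The three categories can be written compactly in symbols. The notation here is an assumption made for illustration and may differ from the notation used later in the article: $R$ is the response indicator, $Y$ the variable of interest (drug use) and $X$ the auxiliary variable (gender).

```latex
\begin{align*}
\text{MCAR:}\quad & \Pr(R = 1 \mid X, Y) = \Pr(R = 1), \\
\text{MAR:}\quad  & \Pr(R = 1 \mid X, Y) = \Pr(R = 1 \mid X), \\
\text{NMAR:}\quad & \Pr(R = 1 \mid X, Y) \text{ depends on } Y \text{ even given } X.
\end{align*}
```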
The example shows the value of generalized calibration, which can deal directly with NMAR. Jean-Claude Deville addresses the problem by treating the response probabilities of users and of non-users as parameters to be estimated. This example can be dealt with in several ways, depending on one’s point of view on inference.
In the following, we will show that there are at least three methods to address the problem, namely the method of moments, the maximum likelihood method and calibration. The maximum likelihood method was not dealt with by Jean-Claude Deville. We develop calculations completely for the first two estimation methods by considering the two models. We also calculate the calibration and generalized calibration results.
We show that the three results obtained are identical. The estimated likelihood function could be used to choose between the two models. Unfortunately, the function has the same value for both models, which does not make it possible to choose the model. However, we propose a way to make a choice.
In Section 2, we present the notation used. Section 3 is devoted to estimation using the method of moments, and Section 4 is devoted to estimation using the maximum likelihood method. In Section 5, we apply the calibration and generalized calibration methods. We close with a discussion on the value of each method in Section 6.