2. Motivation
Andrés Gutiérrez, Leonardo Trujillo and Pedro Luis do Nascimento Silva
2.1 Sampling designs and estimators
Consider a finite population as a set of $N$ units, where $N < \infty$, forming the universe of study. $N$ is known as the population size. Each element belonging to the population can be identified with an index $k$. Let $U$ be the index set given by $U = \{1, \ldots, k, \ldots, N\}$. The selection of a sample $s$ is done according to a sampling design, defined as the multivariate probability distribution $p(\cdot)$ over a support $Q$ in a way that $p(s) > 0$ for every $s \in Q$ and
$$\sum_{s \in Q} p(s) = 1.$$
Under a sampling design $p(\cdot)$, an inclusion probability is assigned to every element in the population in order to denote the probability that the element belongs to the sample. For the $k$-th element in the population this probability is denoted as $\pi_k$ and it is known as the first-order inclusion probability, given by
$$\pi_k = \Pr(k \in s) = \Pr(I_k = 1) = \sum_{s \ni k} p(s)$$
where $I_k$ is a random variable denoting the membership of the element $k$ to the sample, and the subindex $s \ni k$ refers to the sum over all the possible samples containing the $k$-th element. Analogously, $\pi_{kl}$ is known as the second-order inclusion probability; it denotes the probability that the elements $k$ and $l$ both belong to the sample and it is given by
$$\pi_{kl} = \Pr(k \in s, l \in s) = \Pr(I_k = 1, I_l = 1) = \sum_{s \ni k, l} p(s).$$
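As an illustration of these definitions, the following minimal sketch enumerates the support of a toy design and computes the inclusion probabilities by summing $p(s)$ over the relevant samples. The population of four units and the choice of simple random sampling without replacement are assumptions made only for this example.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical toy design (an assumption for illustration): simple random
# sampling without replacement of n = 2 units from U = {1, 2, 3, 4}.
U = [1, 2, 3, 4]
n = 2

# Support Q: all subsets of size n; each sample gets the same probability p(s).
support = [frozenset(s) for s in combinations(U, n)]
p = {s: Fraction(1, len(support)) for s in support}

# A sampling design is a probability distribution over its support.
assert sum(p.values()) == 1


def first_order(k):
    """pi_k: sum of p(s) over all samples s containing unit k."""
    return sum(prob for s, prob in p.items() if k in s)


def second_order(k, l):
    """pi_kl: sum of p(s) over all samples s containing both k and l."""
    return sum(prob for s, prob in p.items() if k in s and l in s)


pi = {k: first_order(k) for k in U}
pi_12 = second_order(1, 2)
```

For this design the first-order inclusion probabilities add up to the fixed sample size $n$, which gives a quick sanity check on any enumeration of this kind.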
The aim of the sample survey is to study a characteristic of interest $y$ associated with every unit in the population and to estimate a function of interest of the population values, called a parameter.

This inferential approach is known as design-based inference. Under this approach, the estimates of the parameters and their properties depend directly on the discrete probability measure related to the chosen sampling design and do not take into account the properties of the finite population. The value $y_k$ is taken as the observation of the characteristic of interest $y$ for the individual $k$, and it is considered a fixed quantity rather than a random variable.
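Then, the Horvitz-Thompson (HT) estimator can be defined as:
$$\hat{t}_{y,\pi} = \sum_{k \in s} d_k y_k$$
where $d_k = 1/\pi_k$ is the reciprocal of the first-order inclusion probability and it is known as the expansion factor or basic design weight. The HT estimator is unbiased for the population total $t_y = \sum_{k \in U} y_k$ (assuming all the first-order inclusion probabilities are greater than zero) and its variance is given by
$$Var(\hat{t}_{y,\pi}) = \sum_{k \in U} \sum_{l \in U} \Delta_{kl} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} \qquad (2.1)$$
where $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$, with $\pi_{kk} = \pi_k$. If all the second-order inclusion probabilities are greater than zero, an unbiased estimator of (2.1) is given by
$$\widehat{Var}(\hat{t}_{y,\pi}) = \sum_{k \in s} \sum_{l \in s} \frac{\Delta_{kl}}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}.$$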
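The Horvitz-Thompson point estimator and its variance estimator can be sketched in a few lines. The function names, the sample values and the design (simple random sampling without replacement of 3 units out of 10) are assumptions chosen for illustration; the double sum follows the variance estimator with $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$ and $\pi_{kk} = \pi_k$.

```python
def ht_total(y, pi):
    """Horvitz-Thompson estimator: sum over the sample of y_k / pi_k."""
    return sum(yk / pk for yk, pk in zip(y, pi))


def ht_variance_estimate(y, pi, pi_joint):
    """Double sum over the sample of (Delta_kl / pi_kl) (y_k/pi_k) (y_l/pi_l).

    pi_joint[k][l] holds the second-order inclusion probability pi_kl,
    with pi_joint[k][k] equal to pi_k.
    """
    m = len(y)
    total = 0.0
    for k in range(m):
        for l in range(m):
            delta = pi_joint[k][l] - pi[k] * pi[l]
            total += (delta / pi_joint[k][l]) * (y[k] / pi[k]) * (y[l] / pi[l])
    return total


# Hypothetical sample: SRSWOR of n = 3 units from a population of N = 10.
N, n = 10, 3
y = [4.0, 7.0, 10.0]
pi = [n / N] * n
pi_kl = n * (n - 1) / (N * (N - 1))
pi_joint = [[pi[k] if k == l else pi_kl for l in range(n)] for k in range(n)]

t_hat = ht_total(y, pi)
v_hat = ht_variance_estimate(y, pi, pi_joint)
```

Under this particular design the general formula reduces to the familiar expression $N^2 (1 - n/N) s_y^2 / n$, where $s_y^2$ is the sample variance, so the two computations should agree up to rounding.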
Gambino and Silva (2009) suggest that in a household survey, the main interest is in characteristics of particular household members, which could be related to health variables, educational variables, income/expenses, employment status, etc. In general, the sampling designs used for this kind of survey are complex and use techniques such as stratification, clustering or unequal probabilities of selection. Typical targets of repeated surveys include the estimation of the level at a particular point in time, the estimation of changes between two survey rounds and the estimation of average level parameters over repeated rounds of a survey. Different rotation schemes and the frequency of the survey can considerably affect the precision of the estimators.
2.2 Pseudo-likelihood
Some authors such as Fuller (2009), Chambers and Skinner (2003, p. 179), and Pessoa and Silva (1998, chapter 5) consider the problem where maximum likelihood estimation is appropriate for simple random samples, as is the case in Stasny (1987), but not for samples resulting from a complex survey design. Under this scheme, it is assumed that the population density function is $f(y, \theta)$, where the parameter of interest is $\theta$. If there is access to the information for the whole population, through a census, the maximum likelihood estimator of $\theta$ can be obtained by maximizing
$$L(\theta) = \sum_{k \in U} \log f(y_k, \theta)$$
with respect to $\theta$. We will denote $\theta_U$ as the value maximizing the last expression. The likelihood equations for the population are given by
$$\sum_{k \in U} u_k(\theta) = 0.$$
The $u_k(\theta)$ are known as scores and they are defined as
$$u_k(\theta) = \frac{\partial \log f(y_k, \theta)}{\partial \theta}.$$
The pseudo-likelihood approach considers that $\theta_U$ is the parameter of interest, to be estimated from the information collected in a complex sample. If the population total of the scores, $T(\theta) = \sum_{k \in U} u_k(\theta)$, is considered as the parameter of interest, it is possible to estimate it using a weighted linear estimator
$$\hat{T}(\theta) = \sum_{k \in s} w_k u_k(\theta)$$
where $w_k$ is a sampling design weight such as the inverse of the inclusion probability of the individual $k$. Then, it is possible to obtain an estimator for $\theta_U$ by solving the resulting equation system $\hat{T}(\theta) = 0$.
Definition 2.1 A maximum pseudo-likelihood estimator $\hat{\theta}_{MPL}$ for $\theta_U$ corresponds to the solution of the pseudo-likelihood equations given by
$$\sum_{k \in s} w_k u_k(\theta) = 0.$$
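To make the definition concrete, the sketch below solves a pseudo-likelihood equation by Newton's method. The exponential density $f(y, \theta) = \theta e^{-\theta y}$, with score $u_k(\theta) = 1/\theta - y_k$, is an assumption chosen only because its weighted solution has the closed form $\sum_s w_k / \sum_s w_k y_k$, which makes the iteration easy to check; the data, weights and function names are hypothetical.

```python
def score(theta, y_k):
    """Score of the assumed exponential density: d/dtheta log(theta * exp(-theta * y))."""
    return 1.0 / theta - y_k


def score_derivative(theta, y_k):
    """Derivative of the score with respect to theta (free of y_k in this model)."""
    return -1.0 / theta ** 2


def max_pseudo_likelihood(y, w, theta0, tol=1e-10, max_iter=50):
    """Solve sum_{k in s} w_k u_k(theta) = 0 by Newton's method."""
    theta = theta0
    for _ in range(max_iter):
        g = sum(wk * score(theta, yk) for yk, wk in zip(y, w))
        dg = sum(wk * score_derivative(theta, yk) for yk, wk in zip(y, w))
        step = g / dg
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical sample values and design weights (inverse inclusion probabilities).
y = [0.5, 1.2, 2.0, 0.8]
w = [10.0, 20.0, 5.0, 15.0]

# Start from the unweighted maximum likelihood estimate.
theta_hat = max_pseudo_likelihood(y, w, theta0=len(y) / sum(y))
closed_form = sum(w) / sum(wk * yk for yk, wk in zip(y, w))
```

For models without a closed-form solution the same loop applies with a different `score` and `score_derivative`, although convergence then depends on the starting value.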
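Using the Taylor linearization method, the asymptotic variance of a maximum pseudo-likelihood estimator based on the sampling design is given by
$$V_p(\hat{\theta}_{MPL}) = [J(\theta_U)]^{-1} \, V_p\Big[\sum_{k \in s} w_k u_k(\theta_U)\Big] \, [J(\theta_U)]^{-1}$$
where $V_p[\sum_{k \in s} w_k u_k(\theta_U)]$ is the variance of the estimator for the population total of the scores based on the sampling design and
$$J(\theta_U) = \frac{\partial \sum_{k \in U} u_k(\theta)}{\partial \theta} \bigg|_{\theta = \theta_U}.$$
An estimator for $V_p(\hat{\theta}_{MPL})$ is given by
$$\hat{V}_p(\hat{\theta}_{MPL}) = [\hat{J}(\hat{\theta}_{MPL})]^{-1} \, \hat{V}_p\Big[\sum_{k \in s} w_k u_k(\hat{\theta}_{MPL})\Big] \, [\hat{J}(\hat{\theta}_{MPL})]^{-1}$$
where $\hat{V}_p[\sum_{k \in s} w_k u_k(\hat{\theta}_{MPL})]$ is a consistent estimator for the variance of the estimator of the population total of the scores and
$$\hat{J}(\hat{\theta}_{MPL}) = \frac{\partial \sum_{k \in s} w_k u_k(\theta)}{\partial \theta} \bigg|_{\theta = \hat{\theta}_{MPL}}.$$
Then, following Binder (1983), the asymptotic distribution of $\hat{\theta}_{MPL}$ is normal since
$$[\hat{V}_p(\hat{\theta}_{MPL})]^{-1/2} (\hat{\theta}_{MPL} - \theta_U) \rightarrow N(0, I).$$
These definitions offer a solid background for correct inference when using large samples, as is the case in labour force surveys.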
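The linearization variance can be illustrated for a scalar parameter, where the matrix inverses reduce to a division. This is a minimal sketch under stated assumptions: an exponential density $f(y, \theta) = \theta e^{-\theta y}$ with score $1/\theta - y_k$, and simple random sampling without replacement, so that the variance of the estimated total of the scores can be estimated by $N^2 (1 - n/N) s_u^2 / n$. The data and the population size are hypothetical.

```python
from statistics import variance

# Hypothetical setting: SRSWOR of n = 5 units from N = 100, equal weights N / n.
N = 100
y = [0.5, 1.2, 2.0, 0.8, 1.5]
n = len(y)
w = [N / n] * n

# Maximum pseudo-likelihood estimate; closed form for the assumed exponential model.
theta_hat = sum(w) / sum(wk * yk for yk, wk in zip(y, w))

# Scores evaluated at the estimate: u_k(theta) = 1 / theta - y_k.
scores = [1.0 / theta_hat - yk for yk in y]

# Design-based variance estimate of the weighted total of the scores under SRSWOR.
v_total = N ** 2 * (1 - n / N) * variance(scores) / n

# J-hat: derivative of the weighted score total with respect to theta at theta_hat.
j_hat = -sum(w) / theta_hat ** 2

# Sandwich form J^{-1} V J^{-1}, which is V / J^2 in the scalar case.
v_theta = v_total / j_hat ** 2
se = v_theta ** 0.5

# Approximate 95% interval based on the asymptotic normality result.
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
```

With only five observations the normal approximation is not to be trusted; the example only shows how the pieces of the formula fit together. Under a stratified or clustered design, `v_total` would be replaced by the variance estimator appropriate to that design.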
2.3 Nonresponse
Särndal and Lundström (2005) state that nonresponse has been a topic of increasing interest in national statistical offices during the last decades. Attention to this topic has also increased considerably in the survey sampling literature. Nonresponse is a common and undesirable issue in the development of a survey, and it can considerably affect the quality of the estimates.
Lohr (1999) discusses several types of nonresponse mechanisms:
- The nonresponse mechanism is ignorable when the probability of an individual responding to the survey does not depend on the characteristic of interest. Note that the word "ignorable" makes reference to a model explaining the mechanism.
- On the other hand, the nonresponse mechanism is nonignorable when the probability of an individual responding to the survey depends on the characteristic of interest. For example, in a labour survey, the probability of response may depend on the labour force classification of the individuals in a household.
Lumley (2009, chapter 9) analyses individual nonresponse with partial data for a respondent, considering a design-based approach that adjusts the sampling weights. Fuller (2009, chapter 5) considers some imputation techniques for the treatment of nonresponse through probabilistic models and sampling weights. Särndal (2011) considers a model-based approach through balanced sets in order to achieve higher representativeness of the estimates. In the same way, Särndal and Lundström (2010) propose a set of indicators to judge the effectiveness of auxiliary information in controlling the bias generated by nonresponse. Särndal and Lundström (2005) give a large number of references about nonresponse. These references examine two main complementary aspects in a survey: prevention of the problem of nonresponse (before it happens) and estimation techniques that take nonresponse into account in the inference process. This second aspect is known as adjustment for nonresponse.
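One common form of adjustment for nonresponse is a weighting-class adjustment, in which the design weights of respondents are inflated so that they also represent the nonrespondents in the same class. The sketch below is a generic illustration of that idea and not the method of any of the references cited here; the function name, the classes and the data are hypothetical, and it assumes that response is ignorable within each class.

```python
from collections import defaultdict


def adjust_weights(design_weights, classes, responded):
    """Weighting-class adjustment (illustrative sketch).

    Within each class, respondent weights are multiplied by
    (sum of design weights of all sampled units) /
    (sum of design weights of respondents). Nonrespondents get weight 0.
    """
    total = defaultdict(float)
    resp = defaultdict(float)
    for d, c, r in zip(design_weights, classes, responded):
        total[c] += d
        if r:
            resp[c] += d

    adjusted = []
    for d, c, r in zip(design_weights, classes, responded):
        # Division only happens for respondents, so resp[c] > 0 here.
        adjusted.append(d * total[c] / resp[c] if r else 0.0)
    return adjusted


# Hypothetical sample of five units in two adjustment classes.
design_weights = [10.0, 10.0, 10.0, 20.0, 20.0]
classes = ["a", "a", "a", "b", "b"]
responded = [True, False, True, True, True]

adjusted = adjust_weights(design_weights, classes, responded)
```

When every class contains at least one respondent, the adjusted weights add up to the same total as the original design weights. A class with no respondents would silently lose its weight in this sketch, so in practice such classes are usually collapsed with neighbouring ones.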