4. Estimation of the parameters of interest
Andrés Gutiérrez, Leonardo Trujillo and Pedro Luis do Nascimento Silva
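Let $N_{ij}$ be the total number of respondents in the population of interest having classification $i$ at time $t-1$ and $j$ at time $t$. Let $R_i$ be the total number of individuals in the population not responding at time $t$ but responding at time $t-1$ with classification $i$. Let $C_j$ denote the total number of individuals in the population not responding at time $t-1$ but responding at time $t$ with classification $j$, and finally let $M$ be the total number of individuals in the population not responding in either of the two periods of observation. It follows that the total size of the population, $N$, must satisfy:

$$
N = \sum_i \sum_j N_{ij} + \sum_i R_i + \sum_j C_j + M.
$$

The parameters of interest can be expressed through the following characteristics of interest, defined for the $k$-th individual of the population $U$:

$$
y_{1ik} = \begin{cases} 1, & \text{if the } k\text{-th individual responds at time } t-1 \text{ and is classified in category } i, \\ 0, & \text{otherwise,} \end{cases}
\qquad
y_{2jk} = \begin{cases} 1, & \text{if the } k\text{-th individual responds at time } t \text{ and is classified in category } j, \\ 0, & \text{otherwise.} \end{cases}
$$

Then, the product of these quantities, defined as $y_{1ik}\,y_{2jk}$, corresponds to a new characteristic of interest taking the value one if the individual has responded at both times and is classified in the cell $ij$, or zero otherwise. Also,

$$
N_{ij} = \sum_{k \in U} y_{1ik}\, y_{2jk}.
$$

Define the following dichotomous characteristics:

$$
z_{1k} = \begin{cases} 1, & \text{if the } k\text{-th individual responds at time } t-1, \\ 0, & \text{otherwise,} \end{cases}
\qquad
z_{2k} = \begin{cases} 1, & \text{if the } k\text{-th individual responds at time } t, \\ 0, & \text{otherwise.} \end{cases}
$$

It follows that

$$
R_i = \sum_{k \in U} y_{1ik}\,(1 - z_{2k}), \qquad
C_j = \sum_{k \in U} (1 - z_{1k})\, y_{2jk}, \qquad
M = \sum_{k \in U} (1 - z_{1k})(1 - z_{2k}).
$$

Let $w_k$ denote the weight for the $k$-th individual corresponding to a specific sampling strategy (sampling design and estimator) in both waves, and let $S$ denote the sample. Then the following expressions represent the estimators of the parameters of interest:

$$
\hat{N}_{ij} = \sum_{k \in S} w_k\, y_{1ik}\, y_{2jk}, \qquad
\hat{R}_i = \sum_{k \in S} w_k\, y_{1ik}\,(1 - z_{2k}), \qquad
\hat{C}_j = \sum_{k \in S} w_k\, (1 - z_{1k})\, y_{2jk}, \qquad
\hat{M} = \sum_{k \in S} w_k\, (1 - z_{1k})(1 - z_{2k}),
$$

for $N_{ij}$, $R_i$, $C_j$ and $M$, respectively. Note that an unbiased estimator of the population size is given by

$$
\hat{N} = \sum_i \sum_j \hat{N}_{ij} + \sum_i \hat{R}_i + \sum_j \hat{C}_j + \hat{M},
$$

where the sums run over all classification categories.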
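As an illustration of how the weighted totals can be computed from unit-level data, the following minimal sketch accumulates the sampling weights into the four groups described above. The function name `weighted_totals`, the use of `None` to mark nonresponse, and the 0-based class labels are assumptions made for this example only; they are not part of the original paper.

```python
def weighted_totals(prev, curr, weights, n_classes):
    """Weighted estimates of the two-wave counts and of the population size.

    prev, curr: class labels 0..n_classes-1 at times t-1 and t, with None
                marking nonresponse (an encoding assumed for this sketch).
    weights:    sampling weights w_k, one per sampled individual.
    """
    N_hat = [[0.0] * n_classes for _ in range(n_classes)]
    R_hat = [0.0] * n_classes
    C_hat = [0.0] * n_classes
    M_hat = 0.0
    for i, j, w in zip(prev, curr, weights):
        if i is not None and j is not None:
            N_hat[i][j] += w      # responded at both times, cell (i, j)
        elif i is not None:
            R_hat[i] += w         # responded at t-1 only, class i
        elif j is not None:
            C_hat[j] += w         # responded at t only, class j
        else:
            M_hat += w            # responded at neither time
    # Every unit falls in exactly one group, so this equals the sum of weights.
    N_total = sum(map(sum, N_hat)) + sum(R_hat) + sum(C_hat) + M_hat
    return N_hat, R_hat, C_hat, M_hat, N_total


# Toy sample of six individuals and two classes (hypothetical data).
prev = [0, 0, 1, None, 1, None]
curr = [0, 1, None, 1, 1, None]
weights = [10, 20, 5, 8, 12, 3]
N_hat, R_hat, C_hat, M_hat, N_total = weighted_totals(prev, curr, weights, 2)
```

Because each sampled individual contributes its weight to exactly one of the four groups, the estimated population size returned here coincides with the sum of the sampling weights.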
Taking into
account the functional form of all the parameters of interest, and noticing
that the likelihood function of the model is proportional to (3.1), we arrive
at the following result.
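Result 4.1 The log-likelihood for the observed data in the population can be rewritten as a sum of individual contributions,

$$
\ell_U = \sum_{k \in U} f_k(\mathbf{y}_1, \mathbf{y}_2, \mathbf{z}_1, \mathbf{z}_2),
$$

where $f_k$ is the contribution of the $k$-th individual to the logarithm of (3.1), $\mathbf{y}_1$ is a vector containing the characteristics $y_{1ik}$, $\mathbf{y}_2$ is a vector containing the characteristics $y_{2jk}$, $\mathbf{z}_1$ is a vector containing the characteristics $z_{1k}$, and $\mathbf{z}_2$ is a vector containing the characteristics $z_{2k}$ (for every $k \in U$ and every $i$, $j$).

Now, in order to obtain estimators of the parameters, it is necessary to maximize this function. Using standard maximum likelihood techniques, the corresponding likelihood equations are given by

$$
\sum_{k \in U} \mathbf{u}_k(\boldsymbol{\theta}) = \mathbf{0},
$$

where $\boldsymbol{\theta}$ is the vector of model parameters and the vector $\mathbf{u}_k(\boldsymbol{\theta})$, commonly known as the score, is defined by

$$
\mathbf{u}_k(\boldsymbol{\theta}) = \frac{\partial f_k}{\partial \boldsymbol{\theta}}.
$$

Also, as it is not usual to survey the whole population, a probability sample $S$ is selected and the expression $\sum_{k \in U} \mathbf{u}_k(\boldsymbol{\theta})$ is considered as a population parameter. In this way, considering $w_k$ as the corresponding sampling weights, an unbiased estimator for this sum of scores is defined as $\sum_{k \in S} w_k\, \mathbf{u}_k(\boldsymbol{\theta})$. The next expression is known as the pseudo-likelihood equation, and it is an effective way to find estimators for the model parameters that take the sampling weights into account:

$$
\sum_{k \in S} w_k\, \mathbf{u}_k(\boldsymbol{\theta}) = \mathbf{0}.
$$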
It is assumed that, for the model in this paper, the initial probability of an individual responding at time $t-1$ is the same for all the possible classifications in the survey. Also, the transition probabilities between respondents and nonrespondents do not depend on the classification of the individual in the survey; they are denoted $\rho_{RR}$ (respondent to respondent) and $\rho_{MM}$ (nonrespondent to nonrespondent). Under these assumptions, the following results allow the estimation of the Markov model probabilities to take the sampling weights into account.
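Result 4.2 Under the assumptions of the model, the resulting maximum pseudo-likelihood estimators for the initial response probability $\psi$ and the response transition probabilities $\rho_{RR}$ and $\rho_{MM}$ are given by

$$
\hat{\psi} = \frac{\sum_i \sum_j \hat{N}_{ij} + \sum_i \hat{R}_i}{\hat{N}}, \qquad
\hat{\rho}_{RR} = \frac{\sum_i \sum_j \hat{N}_{ij}}{\sum_i \sum_j \hat{N}_{ij} + \sum_i \hat{R}_i}, \qquad
\hat{\rho}_{MM} = \frac{\hat{M}}{\sum_j \hat{C}_j + \hat{M}},
$$

respectively.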
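The closed-form estimators of Result 4.2 can be sketched in a few lines once the weighted totals are available. The sketch below assumes the forms implied by the constant-response model described above: the weighted share of individuals responding at the first time, the weighted share of first-time respondents who respond again, and the weighted share of first-time nonrespondents who remain nonrespondents. The function name `response_estimators` and the toy inputs are hypothetical.

```python
def response_estimators(N_hat, R_hat, C_hat, M_hat):
    """Closed-form estimates of the response-mechanism probabilities.

    Assumed forms (constant response probabilities across classes):
      psi    = weighted share responding at t-1
      rho_rr = weighted share of t-1 respondents who also respond at t
      rho_mm = weighted share of t-1 nonrespondents who stay nonrespondents
    """
    n_both = sum(map(sum, N_hat))   # responded at both times
    n_r = sum(R_hat)                # responded at t-1 only
    n_c = sum(C_hat)                # responded at t only
    n_total = n_both + n_r + n_c + M_hat
    psi = (n_both + n_r) / n_total
    rho_rr = n_both / (n_both + n_r)
    rho_mm = M_hat / (n_c + M_hat)
    return psi, rho_rr, rho_mm


# Hypothetical weighted totals for two classes.
psi, rho_rr, rho_mm = response_estimators(
    [[10.0, 20.0], [0.0, 12.0]], [0.0, 5.0], [0.0, 8.0], 3.0)
```

The sketch requires at least one respondent and one nonrespondent at the first time so that the denominators are positive.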
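Result 4.3 Under the assumptions of the model, the resulting maximum pseudo-likelihood estimators for the initial classification probabilities $\eta_i$ and the transition probabilities $p_{ij}$ are obtained by iterating the following expressions until convergence:

$$
\hat{\eta}_i^{(v+1)} = \frac{\sum_j \hat{N}_{ij} + \hat{R}_i + \sum_j \hat{C}_j \dfrac{\hat{\eta}_i^{(v)} \hat{p}_{ij}^{(v)}}{\sum_{i'} \hat{\eta}_{i'}^{(v)} \hat{p}_{i'j}^{(v)}}}{\sum_i \sum_j \hat{N}_{ij} + \sum_i \hat{R}_i + \sum_j \hat{C}_j}
$$

and

$$
\hat{p}_{ij}^{(v+1)} = \frac{\hat{N}_{ij} + \hat{C}_j \dfrac{\hat{\eta}_i^{(v)} \hat{p}_{ij}^{(v)}}{\sum_{i'} \hat{\eta}_{i'}^{(v)} \hat{p}_{i'j}^{(v)}}}{\sum_j \hat{N}_{ij} + \sum_j \hat{C}_j \dfrac{\hat{\eta}_i^{(v)} \hat{p}_{ij}^{(v)}}{\sum_{i'} \hat{\eta}_{i'}^{(v)} \hat{p}_{i'j}^{(v)}}},
$$

respectively. The superscript $(v)$ denotes the value of the estimate of the parameters of interest at the $v$-th iteration.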
The preceding results provide a complete framework for implementing the two-stage Markovian model so that the sampling weights are taken into account in longitudinal surveys. Another question of interest is how to choose the initial values $\hat{\eta}_i^{(0)}$ and $\hat{p}_{ij}^{(0)}$. In general, any set of values is valid as long as it satisfies the initial restrictions. These are

$$
\sum_i \hat{\eta}_i^{(0)} = 1 \qquad \text{and} \qquad \sum_j \hat{p}_{ij}^{(0)} = 1 \ \text{ for every } i.
$$

However, following the guidelines in Chen and Fienberg (1974) and considering the hypothetical case where all of the individuals responded in both periods, then $R_i = 0$ (for every $i$) and $C_j = 0$ (for every $j$), and their sample estimates are also null. Given this, and considering the expressions of the resulting estimators, a sensible choice is given by

$$
\hat{\eta}_i^{(0)} = \frac{\sum_j \hat{N}_{ij}}{\sum_i \sum_j \hat{N}_{ij}}, \qquad
\hat{p}_{ij}^{(0)} = \frac{\hat{N}_{ij}}{\sum_j \hat{N}_{ij}}.
$$
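The iteration of Result 4.3, started from the values suggested above, can be sketched as follows. This is only an illustration under stated assumptions: the update used here is an EM-type step in which each weighted count of individuals observed only at the second time is allocated to first-time classes in proportion to the current product of initial and transition probabilities. The function name `fit_eta_p`, the tolerance, and the iteration cap are choices made for the example, and every class is assumed to contain at least one individual who responded at both times.

```python
def fit_eta_p(N_hat, R_hat, C_hat, tol=1e-10, max_iter=1000):
    """Iterate initial-class (eta) and transition (p) probability estimates.

    N_hat: weighted two-wave table, R_hat: t-1-only totals by class,
    C_hat: t-only totals by class. Assumes every row of N_hat is positive.
    """
    G = len(N_hat)
    row = [sum(N_hat[i]) for i in range(G)]
    n_both = sum(row)
    # Start values computed from complete respondents only.
    eta = [row[i] / n_both for i in range(G)]
    p = [[N_hat[i][j] / row[i] for j in range(G)] for i in range(G)]
    denom_eta = n_both + sum(R_hat) + sum(C_hat)
    for _ in range(max_iter):
        # Current probability of being in class j at time t.
        col = [sum(eta[i] * p[i][j] for i in range(G)) for j in range(G)]
        # Allocate C_hat[j] across t-1 classes i, proportional to eta_i * p_ij.
        alloc = [[C_hat[j] * eta[i] * p[i][j] / col[j] if col[j] > 0 else 0.0
                  for j in range(G)] for i in range(G)]
        new_eta = [(row[i] + R_hat[i] + sum(alloc[i])) / denom_eta
                   for i in range(G)]
        new_p = [[(N_hat[i][j] + alloc[i][j]) / (row[i] + sum(alloc[i]))
                  for j in range(G)] for i in range(G)]
        change = max(
            max(abs(a - b) for a, b in zip(new_eta, eta)),
            max(abs(new_p[i][j] - p[i][j]) for i in range(G) for j in range(G)),
        )
        eta, p = new_eta, new_p
        if change < tol:
            break
    return eta, p


# Hypothetical weighted totals for two classes.
eta, p = fit_eta_p([[10.0, 20.0], [5.0, 12.0]], [4.0, 5.0], [6.0, 8.0])
```

When there is no partial nonresponse (all entries of `R_hat` and `C_hat` equal to zero), the first update reproduces the start values, so the loop stops after one pass.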
Lastly, this iterative approach is commonly used for maximum likelihood estimation in contingency tables. Other approaches to fitting log-linear models in contingency tables under complex survey designs can be found in Clogg and Eliason (1987), Rao and Thomas (1988) and Skinner and Vallet (2010), among others. The next result provides an approach to gross flow estimation that takes into account the sampling weights at both periods of interest.
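Result 4.4 Under the assumptions of the model, a sampling estimator of the gross flow $\mu_{ij}$, the expected number of individuals in the population classified in $i$ at time $t-1$ and in $j$ at time $t$, is

$$
\hat{\mu}_{ij} = \hat{N}\, \hat{\eta}_i\, \hat{p}_{ij},
$$

where $\hat{N}$ is the estimated population size and $\hat{\eta}_i$ and $\hat{p}_{ij}$ are the maximum pseudo-likelihood estimators of the initial classification and transition probabilities.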
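To make the last step concrete, the sketch below builds a table of estimated gross flows by scaling the product of an initial classification probability and a transition probability by the estimated population size. This product form is an assumption of the example, chosen because it is what the Markov model structure suggests; the function name `gross_flows` and the numbers are hypothetical.

```python
def gross_flows(n_total, eta, p):
    """Estimated gross-flow table: n_total * eta[i] * p[i][j] for each cell.

    n_total: estimated population size.
    eta:     estimated initial classification probabilities (sum to 1).
    p:       estimated transition probabilities (each row sums to 1).
    """
    return [[n_total * eta[i] * p[i][j] for j in range(len(p[i]))]
            for i in range(len(eta))]


# Hypothetical estimates for two classes.
flows = gross_flows(100.0, [0.4, 0.6], [[0.75, 0.25], [0.5, 0.5]])
```

Because the initial probabilities sum to one and each row of the transition matrix sums to one, the cells of the resulting table add up to the estimated population size, so nonrespondents are redistributed across cells rather than dropped.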