3. Modified regression estimation for changing survey frames
John Preston
The MR estimators can be extended to the situation of
changing survey frames by adding "births" into the population at the previous
time period, and adding "deaths" into the population at the current time period
to create a "pseudo-population" (Diagram 3.1). These "pseudo-populations" will
satisfy the requirement that the units in the population remain unchanged
between the previous and current time periods. A full description of the
extension to the MR estimator for changing survey frames is outlined below.
Consider a dynamic population which changes over time
due to the addition of "births" and the deletion of "deaths". At time $t$,
the union of $U_{t-1}$ and $U_{t}$ can be divided into three components. The
first component consists of units in the population in stratum $h$ at time
$t-1$ but not at time $t$, referred to as the "death" population
$U^{D}_{t-1,h}$ in stratum $h$, comprised of $N^{D}_{t-1,h}$ units. The second
component consists of units in the population in stratum $h$ at time $t-1$
and time $t$, referred to as the "common" population $U^{C}_{t,h}$ in stratum
$h$, comprised of $N^{C}_{t,h}$ units. The third component consists of units
in the population in stratum $h$ at time $t$ but not at time $t-1$, referred
to as the "birth" population $U^{B}_{t,h}$ in stratum $h$, comprised of
$N^{B}_{t,h}$ units. Those units in the population which change stratum
between time $t-1$ and $t$ are included in the "death" population under their
stratum at time $t-1$, and are also included in the "birth" population under
their stratum at time $t$.
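The three-way split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `partition_frames`, the representation of a frame as a dict from unit id to stratum label, and the toy frames are all hypothetical and not taken from the paper.

```python
def partition_frames(prev_frame, curr_frame):
    """Split the union of two frames into death, common and birth units.

    prev_frame, curr_frame: dict mapping unit id -> stratum label
    (an assumed layout). Returns three dicts mapping stratum -> set of ids.
    """
    death, common, birth = {}, {}, {}
    for unit, h_prev in prev_frame.items():
        if curr_frame.get(unit) == h_prev:
            # Same stratum at both time periods: a common unit.
            common.setdefault(h_prev, set()).add(unit)
        else:
            # Left the frame or changed stratum: a death in its old stratum.
            death.setdefault(h_prev, set()).add(unit)
    for unit, h_curr in curr_frame.items():
        if prev_frame.get(unit) != h_curr:
            # New to the frame or changed stratum: a birth in its new stratum.
            birth.setdefault(h_curr, set()).add(unit)
    return death, common, birth


# Toy frames: unit "a" leaves, unit "e" joins, unit "c" moves from h2 to h1.
prev_frame = {"a": "h1", "b": "h1", "c": "h2", "d": "h2"}
curr_frame = {"b": "h1", "c": "h1", "d": "h2", "e": "h2"}
death, common, birth = partition_frames(prev_frame, curr_frame)
```

In this toy case the stratum changer "c" appears twice, once as a death in its old stratum and once as a birth in its new stratum, which mirrors the treatment of stratum changers in the text.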
Diagram 3.1 Standard and pseudo populations and samples
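At time $t$, define the "pseudo-population" $U^{*}_{t,h}$ in stratum $h$ as
the union of $U_{t-1,h}$ and $U_{t,h}$, comprised of $N^{*}_{t,h}$ units. It is
important to note that the "pseudo-population" formed at time $t$ is different
to the "pseudo-population" formed at time $t-1$, as the "pseudo-population" at
time $t$ is based on the union of $U_{t-1,h}$ and $U_{t,h}$, while the
"pseudo-population" at time $t-1$ is based on the union of $U_{t-2,h}$ and
$U_{t-1,h}$. Hence the "pseudo-populations" for the current and previous time
periods need to be calculated at each time period. Define the "pseudo-values"
for the variable of interest $y$ for unit $i$ at time $t-1$ and time $t$ as:

$$
y^{*}_{t-1,i} =
\begin{cases}
y_{t-1,i} & \text{if } i \in U_{t-1} \\
0 & \text{otherwise}
\end{cases}
\qquad
y^{*}_{t,i} =
\begin{cases}
y_{t,i} & \text{if } i \in U_{t} \\
0 & \text{otherwise}
\end{cases}
$$

and define the "pseudo-values" for the auxiliary variables $\mathbf{x}$ for
unit $i$ at time $t-1$ and time $t$ as:

$$
\mathbf{x}^{*}_{t-1,i} =
\begin{cases}
\mathbf{x}_{t-1,i} & \text{if } i \in U_{t-1} \\
\mathbf{0} & \text{otherwise}
\end{cases}
\qquad
\mathbf{x}^{*}_{t,i} =
\begin{cases}
\mathbf{x}_{t,i} & \text{if } i \in U_{t} \\
\mathbf{0} & \text{otherwise}
\end{cases}
$$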
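As a small illustration, the "pseudo-values" can be read as the recorded value for a unit that is in the population at the given time and zero otherwise, which is consistent with the zero values the text attributes to the additional "birth" and "death" units. The sketch below assumes that reading; the helper name `pseudo_values` and the toy data are hypothetical.

```python
def pseudo_values(pseudo_population, recorded):
    """Map each unit to its recorded value, or 0.0 if it has none at that time."""
    return {unit: recorded.get(unit, 0.0) for unit in pseudo_population}


# Toy pseudo-population: "a" is a death, "b" is common, "e" is a birth.
pseudo_population = ["a", "b", "e"]
y_prev = {"a": 10.0, "b": 20.0}   # values recorded at the previous time period
y_curr = {"b": 22.0, "e": 5.0}    # values recorded at the current time period

y_prev_star = pseudo_values(pseudo_population, y_prev)
y_curr_star = pseudo_values(pseudo_population, y_curr)
```

Because the added units carry zeros, the population totals of the "pseudo-values" match the totals of the recorded values at each time period.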
At time $t$, denote $s^{*}_{t-1,h}$ and $s^{*}_{t,h}$ as the "pseudo-samples"
in stratum $h$, where $s^{*}_{t-1,h}$ consists of all units selected in the
original sample $s_{t-1,h}$ in stratum $h$ at time $t-1$ plus a random sample
of units from the "birth" population in stratum $h$ at time $t$ selected with
inclusion probabilities $\pi^{*}_{t-1,i}$, and $s^{*}_{t,h}$ consists of all
units selected in the original sample $s_{t,h}$ in stratum $h$ at time $t$ plus
a random sample of units from the "death" population in stratum $h$ at time
$t-1$ selected with inclusion probabilities $\pi^{*}_{t,i}$. Let
$n^{*}_{t-1,h}$ and $n^{*}_{t,h}$ denote the sample sizes in the
"pseudo-samples" $s^{*}_{t-1,h}$ and $s^{*}_{t,h}$ respectively. Once again it
is important to note that the "pseudo-sample" for time $t-1$ formed at time $t$
is different to the "pseudo-sample" for time $t-1$ formed at time $t-1$, as the
former includes a random sample of units from the "birth" population at time
$t$, while the latter includes a random sample of units from the "death"
population at time $t-2$. Hence the "pseudo-samples" for the current and
previous time periods need to be calculated at each time period.
The choice of an appropriate sample selection technique,
for the selection of the additional random samples of units from the "birth"
and "death" populations, will depend on the sample selection technique used to
select the original samples. Many repeated business surveys select their
samples using a permanent random number (PRN) selection technique, to enable
some control of the rotation of units into and out of sample from one time period
to the next. Consider the simplest case where the original samples $s_{t-1,h}$
and $s_{t,h}$ in stratum $h$ are described by
$s_{t-1,h} = \{ i \in U_{t-1,h} : a_{t-1,h} \le r_{i} < b_{t-1,h} \}$ and
$s_{t,h} = \{ i \in U_{t,h} : a_{t,h} \le r_{i} < b_{t,h} \}$, where $a_{t,h}$
and $b_{t,h}$ are the selection interval start and end points in stratum $h$ at
time $t$, and $r_{i}$ is the permanent random number for unit $i$. In this case
the "pseudo-samples" $s^{*}_{t-1,h}$ and $s^{*}_{t,h}$ in stratum $h$ are
described by
$s^{*}_{t-1,h} = \{ i \in U^{*}_{t,h} : a_{t-1,h} \le r_{i} < b_{t-1,h} \}$ and
$s^{*}_{t,h} = \{ i \in U^{*}_{t,h} : a_{t,h} \le r_{i} < b_{t,h} \}$, where
$U^{*}_{t,h}$ is the "pseudo-population" in stratum $h$. This selection
technique will give a similar amount of overlap between the samples from the
"death" population at time $t-1$ and $t$, and between the samples from the
"birth" population at time $t-1$ and $t$, as between the samples from the
"common" population at time $t-1$ and $t$. Clearly the amount of overlap
between the samples from the "death" and "birth" populations will affect the
behaviour of the estimates, and optimising the amount of overlap could be
investigated.
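The PRN case can be sketched as follows: a unit is selected when its permanent random number falls inside the stratum's selection interval, and the same interval is then applied to the whole "pseudo-population" to obtain the "pseudo-sample". The half-open interval convention, the function name `prn_select`, and the toy numbers are assumptions made for illustration only.

```python
def prn_select(units, prn, start, end):
    """Select the units whose permanent random number lies in [start, end)."""
    return {u for u in units if start <= prn[u] < end}


# One stratum: "a" is a death, "b" and "c" are common, "e" is a birth.
prev_pop = {"a", "b", "c"}
curr_pop = {"b", "c", "e"}
pseudo_pop = prev_pop | curr_pop

prn = {"a": 0.12, "b": 0.05, "c": 0.30, "e": 0.22}
prev_interval = (0.00, 0.25)   # assumed selection interval, previous period
curr_interval = (0.10, 0.35)   # assumed selection interval, current period

# Original samples are drawn from the actual populations.
s_prev = prn_select(prev_pop, prn, *prev_interval)
s_curr = prn_select(curr_pop, prn, *curr_interval)

# Pseudo-samples apply the same intervals to the pseudo-population.
s_prev_star = prn_select(pseudo_pop, prn, *prev_interval)
s_curr_star = prn_select(pseudo_pop, prn, *curr_interval)
```

Here the previous "pseudo-sample" gains the birth "e" and the current "pseudo-sample" gains the death "a", while every originally sampled unit is retained.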
Define the "pseudo-design weights" $w^{*}_{t-1,i} = 1/\pi^{*}_{t-1,i}$ for all
units in the "pseudo-sample" $s^{*}_{t-1,h}$ and
$w^{*}_{t,i} = 1/\pi^{*}_{t,i}$ for all units in the "pseudo-sample"
$s^{*}_{t,h}$. Since the "pseudo-design weights" for the original sampled
units are equal to the original design weights, and the "pseudo-values" for the
variable of interest are equal to zero for the additional sampled units from
the "birth" and "death" populations, the HT estimator based on the
"pseudo-sample", "pseudo-values" and "pseudo-design weights" is equivalent to
the HT estimator based on the original sample, original values and original
design weights. Hence the inclusion of these additional sampled units into the
"pseudo-sample" from the "birth" and "death" populations will not introduce any
extra variability into the point-in-time estimates.
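A short numerical check makes the equivalence concrete. The sketch below computes a Horvitz-Thompson (HT) total as a weighted sum over sampled units, once on a toy original sample and once on a toy "pseudo-sample" containing one added unit whose "pseudo-value" is zero. The function name `ht_total`, the weights and the values are hypothetical.

```python
def ht_total(sample, weights, values):
    """Horvitz-Thompson total: weighted sum of values over the sampled units."""
    return sum(weights[u] * values[u] for u in sample)


# Original current-period sample and its design weights and values.
original_sample = ["c", "e"]
design_weights = {"c": 4.0, "e": 4.0}
y = {"c": 30.0, "e": 12.0}

# Pseudo-sample: the same units plus the sampled death "a" with pseudo-value 0.
pseudo_sample = ["a", "c", "e"]
pseudo_weights = {"a": 4.0, "c": 4.0, "e": 4.0}
y_star = {"a": 0.0, "c": 30.0, "e": 12.0}

ht_original = ht_total(original_sample, design_weights, y)
ht_pseudo = ht_total(pseudo_sample, pseudo_weights, y_star)
```

The added unit contributes its weight times zero, so both totals coincide regardless of the weight assigned to it.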
The proposed MR estimator for the special case of
changing survey frames can be written as:
where
and
is the "pseudo weight" for unit $i$ at time $t$, given by:
and the MR1, MR2 and MRR values for the "pseudo-composite auxiliary variables"
are given by:
where
is a correction factor applied to the MR1, MR2 and MRR values to account for
the relative change in the population size in stratum $h$ between time $t-1$
and time $t$. The other adjustments to the MR2 and MRR values were made to
ensure that the HT estimator for the "pseudo-composite auxiliary variables" at
time $t$ is unbiased for the corresponding key survey variables at time $t-1$.
A simple proof of the unbiasedness of the HT estimator for the
"pseudo-composite auxiliary variables" is shown in the Appendix.
The HT estimator based on the "pseudo-sample" is equivalent to the HT estimator
based on the original sample, since the "pseudo-values" for the variable of
interest are equal to zero for the additional sampled units from the "birth"
and "death" populations. Similarly, the GR estimator based on the
"pseudo-sample" is equivalent to the GR estimator based on the original sample,
since the "pseudo-values" for the variable of interest and the auxiliary
variables are equal to zero for the additional sampled units from the "birth"
and "death" populations. However, the MR estimator based on the "pseudo-sample"
is not equivalent to the MR estimator based on the original sample, since the
"pseudo-values" for the composite auxiliary variables are not equal to zero for
the additional sampled units from the "birth" and "death" populations.
The proposed procedure of adding "births" into the
population at the previous time period and adding "deaths" into the population
at the current time period is performed independently at each time period, so
there is no accumulation of "births" and "deaths" in the "pseudo-population"
over time.