3. Composite generalized regression estimation for design (c)
Takis Merkouris
A computationally very convenient, but generally suboptimal, variant of
in (2.6) is
obtained by replacing the matrix
with the
diagonal "weighting matrix"
having
as
diagonal entry,
where
are the design
weights of
and
are positive
constants. This gives the multivariate composite generalized regression (CGR)
estimator of
where
is the
associated matrix regression coefficient. For an extensive discussion of the
generalized regression estimator in a single sample, see Särndal et al. (1992, Chapter 6). The
CGR estimator may be compactly written as
i.e., as a sum
of weighted sample regression residuals. The coefficient
is optimal in
the sense of generalized least squares, i.e., it minimizes the quadratic form
in these
residuals. Like the COR estimator, the CGR estimator can also be
obtained in calibration form as
where the vector
minimizes the
generalized least-squares distance
and satisfies
the constraints
and
This extends to the present context the well-known equivalence of generalized regression estimation and calibration estimation in the single-sample setting (Deville and Särndal 1992). Now, using the subvector of calibrated weights
for sample
only, we obtain
the composite estimators in (3.1) in the simple linear forms
and
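The single-sample equivalence of generalized regression and calibration estimation cited above can be checked numerically. The sketch below is a hypothetical single-sample illustration, not the multi-sample CGR estimator of (3.1): it assumes a diagonal weighting matrix with entries d_k q_k, computes the regression coefficient by weighted least squares, and builds calibrated weights w_k = d_k(1 + q_k x_k' lam) that satisfy the calibration constraints. The function name `greg_and_calibration`, the data and the totals `tx` are all made up for the example.

```python
import numpy as np


def greg_and_calibration(X, y, d, q, tx):
    """Single-sample GR estimator and its calibration form (illustrative sketch).

    X  : (n, p) auxiliary values of the sampled units
    y  : (n,) values of the study variable
    d  : (n,) design weights
    q  : (n,) positive constants
    tx : (p,) known population totals of the auxiliary variables
    """
    c = d * q                                  # diagonal of the weighting matrix
    T = X.T @ (c[:, None] * X)                 # X' diag(d q) X
    B = np.linalg.solve(T, X.T @ (c * y))      # weighted least-squares coefficient
    tx_ht = X.T @ d                            # HT estimates of the auxiliary totals
    y_gr = d @ y + (tx - tx_ht) @ B            # GR estimate of the y total

    # Calibration form: w_k = d_k (1 + q_k x_k' lam), with lam solving the
    # calibration constraints sum_k w_k x_k = tx.
    lam = np.linalg.solve(T, tx - tx_ht)
    w = d * (1.0 + q * (X @ lam))
    return y_gr, w, B


# Hypothetical data: an SRS of n = 50 units from a population of N = 1000.
rng = np.random.default_rng(0)
n, N = 50, 1000
X = np.column_stack([np.ones(n), rng.gamma(2.0, 3.0, n)])
y = 4.0 + 1.5 * X[:, 1] + rng.normal(0.0, 1.0, n)
d = np.full(n, N / n)
q = np.ones(n)
tx = np.array([N, 6100.0])                     # hypothetical known totals (N, total of x)

y_gr, w, B = greg_and_calibration(X, y, d, q, tx)
```

The calibrated weights reproduce the auxiliary totals (`X.T @ w` equals `tx`), and the calibration estimate `w @ y` equals the GR estimate. Algebraically the same estimate can also be written as `tx @ B + d @ (y - X @ B)`, a prediction term plus HT-weighted sample residuals.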
Using Lemma 1
and the diagonal structure of
it can be shown
that
can be written
as
where
is the
generalized regression (GR) counterpart of
The matrix
regression coefficient
is written
explicitly as
where
If
and
were
uncorrelated, or if information on
were not used in
the estimation of
then it would be
and
But the GR
estimator
is generally
more efficient than the HT estimator
and since
(in the partial
ordering of non-negative definite matrices), it follows that more weight is
given to
in (3.2),
through
than would have
been given to the component estimator
in the simple
composite estimator involving only information on
This suggests
that the CGR estimator in (3.2), incorporating information from sample
is more
efficient than that simple composite estimator. The efficiency of
is also suggested by its
alternative expression, obtained using (2.11),
where
is the composite
regression estimator of
using
information on
from
and
In
general, the computationally simpler CGR estimator
involving the
coefficient
is less
efficient than the optimal composite regression estimator
which involves
the estimated optimal coefficient
and has the same
asymptotic variance as the BLUE in (2.3); the efficiency loss may be larger in
nested matrix sampling, for which the matrix
is not block-diagonal.
On the other hand,
may be unstable
in small samples, when few degrees of freedom are available
for the estimation of
a problem that is
particularly acute in nested matrix sampling; for a discussion of the relative
stability of the optimal versus the generalized regression estimator in the
single-sample case, see Rao (1994) or Montanari (1998). For certain sampling
strategies, described in the following theorem,
and the CGR
estimator is the COR estimator and is therefore asymptotically BLUE; the proof is given
in the Appendix.
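The contrast between the estimated optimal coefficient and the GR coefficient can be made concrete in the single-sample case. The sketch below assumes Poisson sampling with one auxiliary variable plus an intercept, estimates the optimal coefficient from the unbiased HT variance and covariance estimators, and compares it with the GR coefficient computed with weights d_k q_k. In this parametrization the two coincide when q_k = (1 - pi_k)/pi_k; that choice is our own derivation for the illustration, not a quotation of the specification in Theorem 1, and the population is simulated.

```python
import numpy as np


def optimal_coef_poisson(X, y, pi):
    """Estimated optimal coefficient v(X_hat)^{-1} c(X_hat, Y_hat) under Poisson sampling."""
    a = (1.0 - pi) / pi**2                     # factors of the unbiased HT variance estimator
    V = X.T @ (a[:, None] * X)                 # estimated Var of the HT totals of x
    C = X.T @ (a * y)                          # estimated Cov(HT totals of x, HT total of y)
    return np.linalg.solve(V, C)


def gr_coef(X, y, d, q):
    """GR (weighted least-squares) coefficient with weights d_k q_k."""
    c = d * q
    return np.linalg.solve(X.T @ (c[:, None] * X), X.T @ (c * y))


# Hypothetical population and one Poisson sample drawn from it.
rng = np.random.default_rng(1)
N = 2000
size = rng.gamma(2.0, 2.0, N)
pi_pop = np.clip(0.1 * size / size.mean(), 0.01, 0.9)
x_pop = size + rng.normal(0.0, 0.5, N)
y_pop = 3.0 + 2.0 * x_pop + rng.normal(0.0, 2.0, N)
s = rng.random(N) < pi_pop                     # Poisson sampling: independent inclusions

pi = pi_pop[s]
X = np.column_stack([np.ones(s.sum()), x_pop[s]])
y = y_pop[s]
d = 1.0 / pi

b_opt = optimal_coef_poisson(X, y, pi)
b_gr_q1 = gr_coef(X, y, d, np.ones_like(d))          # q_k = 1
b_gr_matched = gr_coef(X, y, d, (1.0 - pi) / pi)     # q_k = (1 - pi_k) / pi_k
```

With q_k = 1 the GR coefficient generally differs from the estimated optimal one; with the matched constants the two agree to rounding error, which is the single-sample analogue of the conversion described in the theorem.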
Theorem 1 Consider the following sampling strategies.
Non-nested design
- For all three samples
and
assume stratified simple random sampling
without replacement (STRSRS) with sampling fraction
in stratum
of sample
and
denoting stratum size, and specify the
constants
in
as
for all units of stratum
Furthermore, assume that within each sample
the units are sorted by stratum, and consider the augmented design matrix
in (2.7), where
is the block diagonal matrix
and
is the diagonal matrix
with diagonal element
being a vector of ones for all units of
stratum
in sample
and consider the corresponding augmented
vector of calibration totals
where
is the vector of strata sizes for sample
- For all three samples
and
assume stratified Poisson sampling and specify
the constants
in the entries of
as
for the units of stratum
where
is the inclusion probability of unit
in stratum
of the
survey.
Nested design
- Assume that an initial stratified simple random
sample
is split by stratum into three simple random
subsamples
and
Specify the sampling fractions
the constants
in
the design matrix
and the vector of calibration totals
as in part
- Assume that an initial stratified Poisson
sample
is randomly split by stratum into three
subsamples
and
with unequal inclusion probabilities for the
units of each subsample. Specify the constants
in
as
for the units of stratum
where
is the marginal inclusion probability of unit
in stratum
of the
subsample.
Under each of strategies
and
the calibration procedure with matrix
in the least-squares distance measure gives
the CGR estimator in (3.1) with
implying that the CGR estimator is the COR
estimator. For
and
this holds approximately when the strata
sampling fractions are negligible.
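The STRSRS strategy can be mimicked in a single-sample, single-variable setting. With one intercept per stratum and weights d_k q_k constant within strata, the GR slope is a weighted pooling of the within-stratum slopes; it reproduces the estimated optimal slope when d_k q_k is proportional to N_h^2(1 - f_h)/{n_h(n_h - 1)}, i.e. q_k = (N_h - n_h)/(n_h - 1) for units in stratum h. This constant is our own derivation, assuming the usual stratified variance estimator with divisor n_h - 1, and may differ from the paper's specification by a constant factor or the choice of divisor. Stratum sizes and sample values are hypothetical.

```python
import numpy as np


def strsrs_optimal_coef(x, y, strata, N_h, n_h):
    """Estimated optimal slope v(X_hat)^{-1} c(X_hat, Y_hat) under STRSRS."""
    v = c = 0.0
    for h in range(len(N_h)):
        m = strata == h
        factor = N_h[h] ** 2 * (1.0 - n_h[h] / N_h[h]) / n_h[h]
        v += factor * np.var(x[m], ddof=1)
        c += factor * np.cov(x[m], y[m], ddof=1)[0, 1]
    return c / v


def strsrs_gr_coef(x, y, strata, N_h, n_h, q_h):
    """GR slope with one intercept per stratum and weights d_k q_k (q constant within strata)."""
    D = (strata[:, None] == np.arange(len(N_h))[None, :]).astype(float)
    X = np.column_stack([D, x])
    c = (N_h / n_h)[strata] * q_h[strata]      # d_k q_k
    B = np.linalg.solve(X.T @ (c[:, None] * X), X.T @ (c * y))
    return B[-1]                               # coefficient of x


# Hypothetical strata and directly simulated sample values.
rng = np.random.default_rng(2)
N_h = np.array([400, 900, 700])
n_h = np.array([20, 45, 60])
strata = np.repeat(np.arange(3), n_h)
x = rng.gamma(2.0 + strata, 2.0)
y = 5.0 * strata + 1.2 * x + rng.normal(0.0, 1.0 + strata)

b_opt = strsrs_optimal_coef(x, y, strata, N_h, n_h)
b_gr_q1 = strsrs_gr_coef(x, y, strata, N_h, n_h, np.ones(3))
b_gr_matched = strsrs_gr_coef(x, y, strata, N_h, n_h, (N_h - n_h) / (n_h - 1))
```

Under STRSRS the design weights in each stratum sum exactly to N_h, so calibrating to the stratum sizes leaves the HT-based totals unchanged; the stratum intercepts only alter the coefficient, which with the matched constants becomes the estimated optimal one.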
Corollary 1 The result of Theorem 1 also holds for the unstratified
versions of all four designs. For simple random sampling without replacement
(SRS), in particular, the matrix
reduces to the diagonal matrix
having as its
diagonal element the
dimensional unit vector
and the vector of calibration totals is then
Corollary 2 In non-nested sampling, when the sampling design for each of
the three samples is one of the designs in
and
or one of their unstratified versions, but not
the same for all samples, the result of Theorem 1 holds provided that the
matrix
in
and the vector
are reduced so as to correspond only to the
samples for which SRS or STRSRS is used.
The
extended calibration scheme in Theorem
includes
calibration to the stratum sizes (or to the population size in the SRS
version), through the inclusion of an intercept for each stratum in the design
matrix
No additional
information is used beyond what is assumed in the sampling design in
and
and the form of
the resulting CGR estimator remains the same as in (3.1) because the HT estimates
of the population and strata sizes are exact. The effect of this extended
calibration (with the specified values of
) is only to
convert the CGR coefficient
to the optimal
coefficient
and, thus, the
CGR estimator to the COR estimator. The practical significance of this
conversion lies in carrying out optimal composite regression estimation through
the much simpler calibration procedure of generalized regression estimation.
Subsampling
as in part
with a priori fixed sample sizes, is a
natural procedure in matrix sampling, which involves splitting a questionnaire. In
contrast, in the subsampling scheme of part
is the expected
sample size of
the actual size
being random. Unequal subsampling probabilities may be determined adaptively
for increased efficiency; see Gonzalez and Eltinge (2008).
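To illustrate why the subsample size is only an expected size under the Poisson subsampling scheme, the sketch below splits a realized initial sample into three subsamples. It assumes a single stratum and that each sampled unit is assigned independently to exactly one subsample with hypothetical unit-specific probabilities p_k1, p_k2, p_k3 summing to one; the paper's exact splitting mechanism may differ. Under this assumption the marginal inclusion probability of unit k in subsample i is pi_k p_ki, and the expected subsample sizes (conditional on the initial sample) are the column sums of p.

```python
import numpy as np


def split_sample(p, rng):
    """Assign each unit independently to one of three subsamples.

    p : (n, 3) array of subsampling probabilities for each unit; rows sum to 1.
    Returns the subsample index (0, 1 or 2) of each unit.
    """
    u = rng.random(p.shape[0])
    cum = np.cumsum(p, axis=1)[:, :2]          # first two cumulative thresholds
    return (u[:, None] > cum).sum(axis=1)


# Hypothetical initial Poisson sample (one stratum) with inclusion probabilities pi0.
rng = np.random.default_rng(3)
n0 = 600
pi0 = rng.uniform(0.05, 0.3, n0)
raw = rng.uniform(0.5, 1.5, (n0, 3))
p = raw / raw.sum(axis=1, keepdims=True)       # unequal subsampling probabilities

expected_sizes = p.sum(axis=0)                 # expected subsample sizes, given the initial sample
marginal_pi = pi0[:, None] * p                 # marginal inclusion probability of unit k in subsample i

labels = split_sample(p, rng)
sizes = np.bincount(labels, minlength=3)       # realized sizes: random, but summing to n0

# Averaging the realized sizes over repeated splits approaches the expected sizes.
reps = np.array([np.bincount(split_sample(p, rng), minlength=3) for _ in range(2000)])
mean_sizes = reps.mean(axis=0)
```

Each realized split gives subsample sizes that vary around `expected_sizes`, whereas the fixed-size subsampling of the STRSRS-based nested strategy would return the same sizes every time.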
The
results of Theorem 1 could extend to other sampling designs, e.g., stratified
two-stage simple random sampling in non-nested matrix sampling. However, the
required adjustments to the matrices
would be no
simpler than directly using the matrices
in the
calibration to obtain the optimal composite regression estimator.
For
sampling designs other than those assumed in Theorem 1, the value of
in the entries
of
should be set to
where
denoting the design
effect, to account for differences in effective sample size among
the three samples. If the same design is used for all samples, then
The
justification for this adjustment is based on the argument given in Merkouris
(2010) for a similar problem of composite regression estimation.