Dealing with small sample sizes, rotation group bias and discontinuities in a rotating panel design
3. Estimating monthly labour force figures
In this section a multivariate structural time series model is developed for the LFS data that are observed under the rotating panel design. The model deals with small sample sizes by borrowing strength over time to improve the precision of the GREG estimates. It also accounts for the RGB and for the autocorrelation between the subsequent panels of the rotating panel, and it models the discontinuities due to the redesign of the LFS in 2010.
Let $y_t^j$ denote the GREG estimate for the unknown population parameter, say $\theta_t$, based on the $j$th panel observed at time $t$. Since responding households are interviewed at quarterly intervals, it follows that the $j$th panel at time $t$ was sampled for the first time at time $t - 3(j - 1)$. Due to the applied rotation pattern, each month data are collected in five different panels and a vector $y_t = (y_t^1, \ldots, y_t^5)'$ is observed. A five dimensional time series with GREG estimates for the monthly employed and unemployed labour force is obtained as a result. Pfeffermann (1991) proposed a multivariate structural time series model for this kind of time series to model the population parameter of interest, and to account for the RGB and the autocorrelation in the sampling errors. This approach is extended with an intervention component to model the discontinuities of the survey redesign. This results in the following time series model for the five series of GREG estimates:
$$y_t = 1_5 \theta_t + \lambda_t + \Delta_t \beta + e_t, \qquad (3.1)$$
with $1_5$ a five dimensional vector with each element equal to one, $\lambda_t = (\lambda_t^1, \ldots, \lambda_t^5)'$ a vector with time dependent components that account for the RGB, $\Delta_t = \mathrm{diag}(\delta_t^1, \ldots, \delta_t^5)$ a diagonal matrix with dummy variables that change from zero to one at the moment that the survey changes from the old to the new design, $\beta = (\beta^1, \ldots, \beta^5)'$ a five dimensional vector with regression coefficients, and $e_t = (e_t^1, \ldots, e_t^5)'$ the corresponding survey errors for each panel estimate.
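As a minimal sketch of how the four terms of model (3.1) combine for a single month, the following Python fragment evaluates the right-hand side panel by panel. All numbers and the function name `panel_signal` are hypothetical and serve only as illustration; they are not LFS figures.

```python
def panel_signal(theta, lam, delta, beta, e):
    """Evaluate y_t = 1_5*theta_t + lambda_t + Delta_t*beta + e_t for one month.

    delta holds the diagonal of Delta_t (0 = old design, 1 = new design).
    """
    return [theta + lam[j] + delta[j] * beta[j] + e[j] for j in range(5)]

# Hypothetical values for one month (not taken from the LFS).
theta = 400.0                              # population parameter theta_t
lam = [0.0, -10.0, -12.0, -15.0, -15.0]    # RGB relative to the first panel
delta = [1, 1, 0, 0, 0]                    # panels 1 and 2 already redesigned
beta = [20.0, 18.0, 18.0, 18.0, 18.0]      # discontinuity per panel
e = [0.0] * 5                              # survey errors switched off

y = panel_signal(theta, lam, delta, beta, e)
print(y)
```

With the survey errors set to zero, the differences between the five values reflect only the RGB and the panels that have already switched to the new design.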
The population parameter $\theta_t$ in (3.1) can be decomposed into a trend component, a seasonal component, and an irregular component, i.e.,
$$\theta_t = L_t + S_t + \varepsilon_t. \qquad (3.2)$$
Here $L_t$ denotes a stochastic trend component, using the so-called smooth trend model,
$$L_t = L_{t-1} + R_{t-1}, \quad R_t = R_{t-1} + \eta_{R,t}, \quad E(\eta_{R,t}) = 0, \quad \mathrm{Cov}(\eta_{R,t}, \eta_{R,t'}) = \begin{cases} \sigma_R^2 & \text{if } t = t', \\ 0 & \text{if } t \neq t'. \end{cases} \qquad (3.3)$$
A likelihood ratio test indicates that in this application the more general local linear trend model, which has a disturbance term for the slope parameter $R_t$ as well as a disturbance term for the level parameter $L_t$, does not improve the fit to the data. Inclusion of a disturbance term for the level increases the log-likelihood of (3.1) by 0.05 units. This results in a likelihood ratio test statistic of 0.1. Under the null hypothesis that the variance of the level disturbance term is equal to zero, this test statistic is a chi-squared distributed random variable with 1 degree of freedom. As a result, this null hypothesis is not rejected, with a $p$-value of 0.75.
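The arithmetic of this likelihood ratio test can be checked with a few lines of Python. The sketch below only uses the log-likelihood gain of 0.05 reported in the text; the helper name `chi2_sf_1df` is introduced here for illustration.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # If Z is standard normal, Z**2 is chi-squared(1), so
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

loglik_gain = 0.05                 # increase in log-likelihood reported in the text
lr_statistic = 2.0 * loglik_gain   # likelihood ratio statistic
p_value = chi2_sf_1df(lr_statistic)
print(round(lr_statistic, 2), round(p_value, 2))
```

The statistic is twice the gain in log-likelihood, and the resulting tail probability rounds to the reported value of 0.75.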
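Furthermore, $S_t$ denotes a trigonometric stochastic seasonal component,
$$S_t = \sum_{l=1}^{6} S_{l,t}, \qquad (3.4)$$
where
$$S_{l,t} = S_{l,t-1} \cos(h_l) + S_{l,t-1}^{*} \sin(h_l) + \omega_{l,t}, \quad S_{l,t}^{*} = S_{l,t-1}^{*} \cos(h_l) - S_{l,t-1} \sin(h_l) + \omega_{l,t}^{*}, \quad h_l = \frac{\pi l}{6}, \quad l = 1, \ldots, 6, \qquad (3.5)$$
with $E(\omega_{l,t}) = E(\omega_{l,t}^{*}) = 0$, $\mathrm{Cov}(\omega_{l,t}, \omega_{l',t'}) = \mathrm{Cov}(\omega_{l,t}^{*}, \omega_{l',t'}^{*}) = \sigma_\omega^2$ if $l = l'$ and $t = t'$ and zero otherwise, and $\mathrm{Cov}(\omega_{l,t}, \omega_{l',t'}^{*}) = 0$ for all $l$, $l'$, $t$ and $t'$.
Finally, $\varepsilon_t$ denotes the irregular component, which contains the unexplained variation of the population parameter and is modelled as a white noise process:
$$E(\varepsilon_t) = 0, \quad \mathrm{Cov}(\varepsilon_t, \varepsilon_{t'}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } t = t', \\ 0 & \text{if } t \neq t'. \end{cases} \qquad (3.6)$$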
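To illustrate how a trigonometric seasonal component behaves, the following sketch runs the harmonic recursions with all disturbances set to zero, assuming the standard form with six harmonics at frequencies $\pi l / 6$ (see Durbin and Koopman, 2001). The starting values are hypothetical. In this deterministic special case the pattern repeats every twelve months and sums to zero over a year; the stochastic version lets the pattern change gradually over time.

```python
import math

def seasonal_step(s, s_star, h):
    """One step of a harmonic recursion with the disturbances set to zero."""
    s_new = s * math.cos(h) + s_star * math.sin(h)
    s_star_new = s_star * math.cos(h) - s * math.sin(h)
    return s_new, s_star_new

# Six harmonics with frequencies h_l = pi*l/6; starting values are hypothetical.
freqs = [math.pi * l / 6 for l in range(1, 7)]
state = [(1.0, 0.5)] * 6

pattern = []
for month in range(24):
    pattern.append(sum(s for s, _ in state))   # seasonal effect = sum of harmonics
    state = [seasonal_step(s, ss, h) for (s, ss), h in zip(state, freqs)]
```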
It is not immediately obvious that the white noise component $\varepsilon_t$ in (3.2) and the sampling errors $e_t$ in (3.1) are both identifiable. The sampling errors can be separated from the white noise component because each sample is observed five times and because the variance of the sampling errors, as well as the autocorrelation in the sampling errors induced by the sample overlap of the panel, are calculated directly from the survey data. Details are explained below.
The trend (3.3) describes the gradual change in the population parameter, while the seasonal component (3.4) captures the systematic monthly deviations from the trend within a year. See, e.g., Durbin and Koopman (2001) for details. Through component (3.2), values for $\theta_t$ are related to the population values from preceding periods. This component shows how sample information observed in preceding periods is used to improve the precision of the estimates for $\theta_t$ in a particular time period.
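The systematic differences between the subsequent panels, i.e., the RGB, are modelled in (3.1) with $\lambda_t$. The absolute bias in the monthly labour force figures cannot be estimated from the sample data only. Therefore additional restrictions for the elements of $\lambda_t$ are required to identify the model. Here it is assumed that an unbiased estimate for $\theta_t$ is obtained with the first panel, i.e., $y_t^1$. This implies that the first component of $\lambda_t$ equals zero. The other elements of $\lambda_t$ measure the time dependent differences with respect to the first panel. Contrary to Pfeffermann (1991), where time independent RGB is assumed, the $\lambda_t^j$ are modelled as random walks for $j = 2, \ldots, 5$. As a result it follows that
$$\lambda_t^1 = 0, \quad \lambda_t^j = \lambda_{t-1}^j + \eta_{\lambda,j,t}, \quad j = 2, \ldots, 5, \quad E(\eta_{\lambda,j,t}) = 0, \quad \mathrm{Cov}(\eta_{\lambda,j,t}, \eta_{\lambda,j',t'}) = \begin{cases} \sigma_\lambda^2 & \text{if } t = t' \text{ and } j = j', \\ 0 & \text{otherwise}. \end{cases} \qquad (3.7)$$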
The discontinuities induced by the redesign in 2010 are modelled with the third term in (3.1). The diagonal matrix $\Delta_t$ contains five intervention variables:
$$\delta_t^j = \begin{cases} 1 & \text{if } t \geq T_R^j, \\ 0 & \text{if } t < T_R^j, \end{cases} \qquad (3.8)$$
where $T_R^j$ denotes the moment that panel $j$ changes from the old to the new survey design. Under the assumption that (3.2) correctly models the evolution of the population variable, the regression coefficients in $\beta$ can be interpreted as the systematic effects of the redesign on the level of the series observed in the five panels. The intervention approach with state-space models was originally proposed by Harvey and Durbin (1986) to estimate the effect of seat belt legislation on British road casualties. With step intervention (3.8) it is assumed that the redesign only has a systematic effect on the level of the series. Alternative interventions, e.g., for the slope or the seasonal components, are also possible; see Durbin and Koopman (2001), Chapter 3. A redesign might not only affect the point estimates, but also the variance of the GREG estimates. This issue is discussed under the time series model for the survey errors.
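The step intervention variables can be sketched in a few lines of Python. The change-over months below are hypothetical month indices chosen for illustration, assuming only that each panel has its own change-over moment; they are not the actual dates of the 2010 redesign.

```python
def step_dummy(t, t_change):
    """Step intervention: 0 before the change-over, 1 from t_change onwards."""
    return 1 if t >= t_change else 0

# Hypothetical change-over months, one per panel (illustrative values only).
t_change = [100, 103, 106, 109, 112]

def delta_diag(t):
    """Diagonal of the intervention matrix for month t."""
    return [step_dummy(t, tc) for tc in t_change]

print(delta_diag(99), delta_diag(104), delta_diag(112))
```

Once a dummy has switched to one it stays there, which is what restricts the modelled effect of the redesign to a shift in the level of the series.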
Finally a time series model for the survey errors $e_t$ in (3.1) is developed. The direct estimates for the design variances of the survey errors are available from the micro data and are incorporated in the time series model using the survey error model $e_t^j = k_t^j \tilde{e}_t^j$, where $k_t^j = \sqrt{\widehat{\mathrm{Var}}(y_t^j)}$, proposed by Binder and Dick (1990). Here $\widehat{\mathrm{Var}}(y_t^j)$ denotes the estimated variance of the GREG estimator. Choosing the survey errors proportional to the standard error of the GREG estimators allows for non-homogeneous variance in the survey errors, which arises, e.g., from the gradually decreasing sample size over the last decade.
The sample of the first panel has no sample overlap with panels observed in the past. Consequently, the survey errors of the first panel, $e_t^1$, are not correlated with survey errors in the past. It is, therefore, assumed that $\tilde{e}_t^1$ is white noise with $E(\tilde{e}_t^1) = 0$ and $\mathrm{Var}(\tilde{e}_t^1) = \sigma_{e_1}^2$. As a result, the variance of the survey error equals $\mathrm{Var}(e_t^1) = \widehat{\mathrm{Var}}(y_t^1) \, \sigma_{e_1}^2$, which is approximately equal to the direct estimate of the variance of the GREG estimate for the first panel if the maximum likelihood (ML) estimate for $\sigma_{e_1}^2$ is close to one.
The survey errors of the second, third, fourth and fifth panel are correlated with survey errors of preceding periods. The autocorrelations between the survey errors of the subsequent panels are estimated from the survey data, using the approach proposed by Pfeffermann, Feder and Signorelli (1998). In this application it appears that the autocorrelation structure for the second, third, fourth and fifth panel can be modelled conveniently with an AR(1) model; see van den Brakel and Krieg (2009). Therefore it is assumed that $\tilde{e}_t^j = \rho \, \tilde{e}_{t-3}^{j-1} + \nu_t^j$, with $\rho$ the first order autocorrelation coefficient, $E(\nu_t^j) = 0$, and $\mathrm{Var}(\nu_t^j) = \sigma_{e_j}^2$ for $j = 2, \ldots, 5$. Since $\tilde{e}_t^j$ is an AR(1) process, $\mathrm{Var}(\tilde{e}_t^j) = \sigma_{e_j}^2 / (1 - \rho^2)$. As a result $\mathrm{Var}(e_t^j)$ is approximately equal to $\widehat{\mathrm{Var}}(y_t^j)$ provided that the ML estimates for $\sigma_{e_j}^2$ are close to $1 - \rho^2$.
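The scaling argument for the survey errors can be verified numerically. The sketch below assumes that the standardized error of panel $j$ depends on the standardized error of panel $j - 1$ three months earlier through a first order autoregression, that the first panel has unit variance, and that the innovation variance of the later panels equals $1 - \rho^2$. The value of `rho` is hypothetical and is not an estimate from the LFS.

```python
def standardized_error_variances(rho, sigma2_first, sigma2_rest):
    """Variance of the standardized survey error in panels 1 to 5.

    Panel 1 is white noise; for panel j >= 2 the recursion
    Var(panel j) = rho**2 * Var(panel j-1) + sigma2_rest is applied.
    """
    variances = [sigma2_first]
    for _ in range(4):
        variances.append(rho ** 2 * variances[-1] + sigma2_rest)
    return variances

rho = 0.3   # hypothetical autocorrelation, not an estimate from the paper
variances = standardized_error_variances(rho, 1.0, 1.0 - rho ** 2)
print(variances)
```

All five standardized variances stay at one under these assumptions, so multiplying by the standard error of the GREG estimate reproduces its directly estimated variance.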
The survey redesign in 2010 might affect the variance of the GREG estimates. Systematic differences in these variances are automatically taken into account, since they are used as a-priori information in the time series model for the survey error. An alternative possibility would be to allow for different values for the variance hyperparameters $\sigma_{e_j}^2$ of the survey error model before and after the survey redesign, which can be interpreted as an intervention on the variance hyperparameter of the survey error.
Auxiliary time
series can be incorporated in the model to improve the estimates for the
discontinuities. Reliable auxiliary series contain valuable information for
correctly separating real developments from discontinuities in the intervention
model. The auxiliary information will also increase the precision of the model
estimates for the monthly unemployment figures. For the unemployed labour
force, the number of people formally registered at the employment office is a
potential auxiliary variable to be included in the model.
There are different ways to incorporate auxiliary information in the model. One straightforward possibility is to extend the time series model (3.2) for the population parameter of the LFS with a regression component for the auxiliary series, i.e.,
$$\theta_t = L_t + S_t + b x_t + \varepsilon_t,$$
where $x_t$ denotes the auxiliary series and $b$ the regression coefficient. The major drawback of this approach is that the auxiliary series will partially explain the trend and seasonal effect in $\theta_t$, leaving only a residual trend and seasonal effect for $L_t$ and $S_t$. This hampers the estimation of a trend for the target variable.
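An alternative approach, which allows the direct estimation of a filtered trend for $\theta_t$, is to extend model (3.1) with the auxiliary series and model the correlation between the trends of the series of the LFS and the auxiliary series. This gives rise to the following model:
$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} 1_5 \theta_t^{LFS} \\ \theta_t^{R} \end{pmatrix} + \begin{pmatrix} \lambda_t \\ 0 \end{pmatrix} + \begin{pmatrix} \Delta_t \beta \\ 0 \end{pmatrix} + \begin{pmatrix} e_t \\ 0 \end{pmatrix}, \qquad (3.9)$$
with $x_t$ the auxiliary series. The series of the LFS and the auxiliary series from the register both have their own population parameter that can be modelled with two separate time series models, i.e.,
$$\theta_t^z = L_t^z + S_t^z + \varepsilon_t^z,$$
where $z \in \{LFS, R\}$ ($R$ stands for register), defined similarly to (3.2). Since the auxiliary series is based on a registration, this series does not have an RGB, a discontinuity at the moment that the LFS is redesigned, or a survey error component.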
The model allows for correlation between the disturbances of the slope of the trend component of the LFS and the auxiliary series. This results in the following definition for the smooth trend model for the LFS and the auxiliary series:
$$L_t^z = L_{t-1}^z + R_{t-1}^z, \quad R_t^z = R_{t-1}^z + \eta_{R,t}^z, \quad E(\eta_{R,t}^z) = 0,$$
$$\mathrm{Cov}(\eta_{R,t}^z, \eta_{R,t'}^{z'}) = \begin{cases} (\sigma_R^z)^2 & \text{if } t = t' \text{ and } z = z', \\ \rho_{LFS,R} \, \sigma_R^{LFS} \sigma_R^{R} & \text{if } t = t' \text{ and } z \neq z', \\ 0 & \text{if } t \neq t', \end{cases}$$
with $z, z' \in \{LFS, R\}$ and $\rho_{LFS,R}$ the correlation coefficient between these series. The correlation between both series is determined by the model. If the model detects a strong correlation, then the trends of both series will move in the same direction more or less simultaneously. Model (3.9) does not allow for correlation between the disturbances of the seasonal component of the LFS series and the auxiliary series. Both series have their own seasonal component $S_t^z$ defined by (3.5). In a similar way both series have their own white noise $\varepsilon_t^z$ for the unexplained variation, which are assumed to be uncorrelated and are defined by (3.6).
Models (3.1) and (3.9)
explicitly account for discontinuities in the different panels through the
intervention component. Estimates for the target variables, obtained with these
models, are therefore not affected by the systematic effect of the change-over.
As a result, the models correct for the discontinuities induced by the
redesign. Model estimates for the target variables can be interpreted as the
results observed under the old method, also after the change-over to the new
survey design. The discontinuity of the first panel must be added to the model
estimates for the target variables to produce figures that can be interpreted
as being obtained under the new design.
The general way to proceed is to express the model in the so-called state-space representation and apply the Kalman filter to obtain optimal estimates for the state variables; see, e.g., Durbin and Koopman (2001). It is assumed that the disturbances are normally distributed. Under this assumption, the Kalman filter gives optimal estimates for the state vector and the signals. Estimates for state variables for period $t$ based on the information available up to and including period $t$ are referred to as the filtered estimates. The filtered estimates of past state vectors can be updated if new data become available. This procedure is referred to as smoothing and results in smoothed estimates that are based on the completely observed time series. In this application, interest is mainly focussed on the filtered estimates, since they are based on the complete set of information that would be available in the regular production process to produce a model-based estimate for month $t$.
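The filtering recursion can be illustrated with a strongly simplified sketch: a single observed series consisting of a smooth trend plus white noise, instead of the five dimensional model with RGB, interventions and survey errors. The function name, the variance arguments and the large initial variance `big` (a crude stand-in for a diffuse prior) are assumptions made for this illustration; the sketch is not the implementation used for the results in this paper.

```python
def kalman_filter_smooth_trend(y, var_obs, var_slope, big=1e7):
    """Filtered level estimates for y_t = L_t + noise with
    L_t = L_{t-1} + R_{t-1} and R_t = R_{t-1} + eta_t."""
    # State (L, R) with a large initial variance as an approximate diffuse prior.
    level, slope = 0.0, 0.0
    p_ll, p_lr, p_rr = big, 0.0, big
    filtered = []
    for obs in y:
        # Update step: condition the predicted state on the new observation.
        f = p_ll + var_obs                 # prediction error variance
        k_l, k_r = p_ll / f, p_lr / f      # Kalman gain for level and slope
        v = obs - level                    # one-step-ahead prediction error
        level, slope = level + k_l * v, slope + k_r * v
        p_ll, p_lr, p_rr = (p_ll - k_l * p_ll,
                            p_lr - k_l * p_lr,
                            p_rr - k_r * p_lr)
        filtered.append(level)             # estimate for month t given data up to t
        # Prediction step: move the state one month ahead.
        level = level + slope
        p_ll, p_lr, p_rr = (p_ll + 2.0 * p_lr + p_rr,
                            p_lr + p_rr,
                            p_rr + var_slope)
    return filtered

# Hypothetical input: an exactly linear series, which the filter should track.
y = [2.0 * t + 1.0 for t in range(10)]
filtered = kalman_filter_smooth_trend(y, var_obs=1.0, var_slope=0.01)
```

Each filtered value uses only the observations up to and including that month, which mirrors the information available in regular production.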
The analysis is
conducted with software developed in OxMetrics in combination with the
subroutines of SsfPack 3.0, see Doornik (2009) and Koopman, Shephard and
Doornik (2008). All state variables are non-stationary with the exception of
the survey errors. The non-stationary variables are initialised with a diffuse
prior, i.e., the expectation of the initial states is equal to zero and the
initial covariance matrix of the states is diagonal with large diagonal
elements. The survey errors are stationary and therefore initialised with a
proper prior. The initial values for the survey errors are equal to zero and
the covariance matrix is available from the aforementioned model for the survey
errors. In SsfPack 3.0 an exact diffuse log-likelihood function is obtained
with the procedure proposed by Koopman (1997).