Dealing with small sample sizes, rotation group bias and discontinuities in a rotating panel design 3. Estimating monthly labour force figures

In this section a multivariate structural time series model is developed for the LFS data that are observed under the rotating panel design. The model deals with small sample sizes by borrowing strength over time to improve the precision of the GREG estimates, accounts for the RGB and for the autocorrelation between the subsequent panels of the rotating panel, and models the discontinuities due to the redesign of the LFS in 2010.

Let ${\hat{Y}}_{t}^{j}$ denote the GREG estimate for the unknown population parameter, say $θ_{t},$ based on the $j^{th}$ panel observed at time $t, j = 1, \dots, 5.$ Since responding households are interviewed at quarterly intervals, it follows that the $j^{th}$ panel observed at time $t$ was sampled for the first time at time $t - 3 j + 3.$ Due to the applied rotation pattern, each month data are collected in five different panels and a vector ${\hat{Y}}_{t} = {({\hat{Y}}_{t}^{1}, {\hat{Y}}_{t}^{2}, {\hat{Y}}_{t}^{3}, {\hat{Y}}_{t}^{4}, {\hat{Y}}_{t}^{5})}^{T}$ is observed. A five dimensional time series with GREG estimates for the monthly employed and unemployed labour force is obtained as a result. Pfeffermann (1991) proposed a multivariate structural time series model for this kind of time series to model the population parameter of interest, and to account for the RGB and the autocorrelation in the sampling errors. This approach is extended with an intervention component to model the discontinuities of the survey redesign. This results in the following time series model for the five series of GREG estimates:
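The rotation pattern described above can be illustrated with a small sketch. It only encodes the relation $t - 3 j + 3$ from the text; the function names `first_sampled` and `waves_at` and the integer month index are assumptions made for this illustration.

```python
def first_sampled(t: int, j: int) -> int:
    """Month in which the panel observed as wave j in month t entered the sample.

    Implements t - 3j + 3 from the text; months are plain integers (an assumption).
    """
    if not 1 <= j <= 5:
        raise ValueError("wave index j must be in 1..5")
    return t - 3 * j + 3


def waves_at(t: int) -> dict:
    """Map each wave j = 1..5 observed in month t to its month of first interview."""
    return {j: first_sampled(t, j) for j in range(1, 6)}


# Hypothetical example: the five panels observed in month 13.
panels_in_month_13 = waves_at(13)
```

In this example the first wave entered the sample in month 13 itself, while the fifth wave was first interviewed twelve months earlier.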

${\hat{Y}}_{t} = 1_{5} θ_{t} + λ_{t} + Δ_{t} β + e_{t}, (3.1)$

with $1_{5}$ a five dimensional vector with each element equal to one, $λ_{t} = {(λ_{t}^{1}, λ_{t}^{2}, λ_{t}^{3}, λ_{t}^{4}, λ_{t}^{5})}^{T}$ a vector with time dependent components that account for the RGB, $Δ_{t} = Diag (δ_{t}^{1}, δ_{t}^{2}, δ_{t}^{3}, δ_{t}^{4}, δ_{t}^{5})$ a diagonal matrix with dummy variables that change from zero to one at the moment that the survey changes from the old to the new design, $β = {(β^{1}, β^{2}, β^{3}, β^{4}, β^{5})}^{T}$ a five dimensional vector with regression coefficients, and $e_{t} = {(e_{t}^{1}, e_{t}^{2}, e_{t}^{3}, e_{t}^{4}, e_{t}^{5})}^{T}$ the corresponding survey errors for each panel estimate.

The population parameter $θ_{t}$ in (3.1) can be decomposed in a trend component, a seasonal component, and an irregular component, i.e.,

$θ_{t} = L_{t} + S_{t} + ε_{t} . (3.2)$

Here $L_{t}$ denotes a stochastic trend component, using the so-called smooth trend model,

$\begin{array}{rcl} L_{t} & = & L_{t - 1} + R_{t - 1}, \\ R_{t} & = & R_{t - 1} + η_{t}, \qquad (3.3) \\ E (η_{t}) & = & 0, \quad Cov (η_{t}, η_{t^{'}}) = \left\{ \begin{array}{ll} σ_{η}^{2} & \text{if } t = t^{'} \\ 0 & \text{if } t \neq t^{'} . \end{array} \right. \end{array}$

A likelihood ratio test indicates that in this application the more general local linear trend model, which has a disturbance term for the slope parameter $R_{t}$ as well as a disturbance term for the level parameter $L_{t},$ does not improve the fit to the data. Inclusion of a disturbance term for the level increases the log-likelihood of (3.1) by 0.05 units. This results in a likelihood ratio test statistic of 0.1. Under the null hypothesis that the variance of the level disturbance term is equal to zero, this test statistic is a chi-squared distributed random variable with 1 degree of freedom. As a result, this null hypothesis is not rejected, with a $p -$ value of 0.75.
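The arithmetic of this test can be reproduced with a few lines of Python. This is a minimal sketch: only the log-likelihood difference of 0.05 comes from the text, the absolute log-likelihood values passed in are placeholders, and the helper name `lr_test_1df` is hypothetical. It uses the identity that the upper tail of a chi-squared distribution with 1 degree of freedom at $x$ equals $erfc (\sqrt{x / 2}) .$

```python
import math


def lr_test_1df(loglik_restricted: float, loglik_general: float):
    """Likelihood ratio statistic and chi-squared(1) p-value.

    The statistic is twice the gain in log-likelihood; for 1 degree of
    freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_general - loglik_restricted), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Placeholder log-likelihoods; only their difference (0.05) is taken from the text.
stat, p_value = lr_test_1df(loglik_restricted=0.0, loglik_general=0.05)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.2f}")
```

The statistic equals 0.1 and the $p -$ value rounds to 0.75, in line with the figures reported above.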

Furthermore, $S_{t}$ denotes a trigonometric stochastic seasonal component,

$S_{t} = \sum_{l = 1}^{6} S_{l, t}, (3.4)$

where

$\begin{array}{rcl} S_{l, t} & = & S_{l, t - 1} \cos (h_{l}) + S_{l, t - 1}^{*} \sin (h_{l}) + ω_{l, t}, \\ S_{l, t}^{*} & = & S_{l, t - 1}^{*} \cos (h_{l}) - S_{l, t - 1} \sin (h_{l}) + ω_{l, t}^{*}, \quad h_{l} = \frac{π l}{6}, \quad l = 1, \dots, 6, \\ E (ω_{l, t}) & = & E (ω_{l, t}^{*}) = 0, \qquad (3.5) \\ Cov (ω_{l, t}, ω_{l^{'}, t^{'}}) & = & Cov (ω_{l, t}^{*}, ω_{l^{'}, t^{'}}^{*}) = \left\{ \begin{array}{ll} σ_{ω}^{2} & \text{if } l = l^{'} \text{ and } t = t^{'} \\ 0 & \text{if } l \neq l^{'} \text{ or } t \neq t^{'} , \end{array} \right. \\ Cov (ω_{l, t}, ω_{l, t}^{*}) & = & 0, \quad \forall l, \forall t . \end{array}$
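To make the recursion in (3.5) concrete, the sketch below propagates the six harmonics with all disturbances set to zero, which is an assumption made here so that the result is deterministic. The function names and the starting values are illustrative only. In that special case the seasonal pattern repeats every 12 months and sums to zero over any 12 consecutive months.

```python
import math


def harmonic_step(s, s_star, l, omega=0.0, omega_star=0.0):
    """One step of the recursion (3.5) for harmonic l with frequency h_l = pi * l / 6."""
    h = math.pi * l / 6.0
    s_new = s * math.cos(h) + s_star * math.sin(h) + omega
    s_star_new = s_star * math.cos(h) - s * math.sin(h) + omega_star
    return s_new, s_star_new


def seasonal_path(init, n_months):
    """Return S_t = sum over l of S_{l,t} for n_months, without disturbances.

    init is a list of six (S_l, S_l^*) starting pairs, for l = 1..6.
    """
    state = list(init)
    path = []
    for _ in range(n_months):
        path.append(sum(s for s, _ in state))
        state = [harmonic_step(s, ss, l) for l, (s, ss) in enumerate(state, start=1)]
    return path


# Hypothetical starting values for the six harmonics.
init = [(1.0, 0.5), (0.3, -0.2), (0.1, 0.4), (-0.6, 0.2), (0.05, 0.0), (0.2, 0.0)]
path = seasonal_path(init, 24)
```

With non-zero disturbance variance $σ_{ω}^{2}$ the pattern is allowed to change gradually over time, which is what makes the seasonal component stochastic.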

Finally, $ε_{t}$ denotes the irregular component, which contains the unexplained variation of the population parameter and is modelled as a white noise process:

$E (ε_{t}) = 0, \quad Cov (ε_{t}, ε_{t^{'}}) = \left\{ \begin{array}{ll} σ_{ε}^{2} & \text{if } t = t^{'} \\ 0 & \text{if } t \neq t^{'} . \end{array} \right. \qquad (3.6)$

It is not immediately obvious that the white noise component $ε_{t}$ in (3.2) and the sampling errors $e_{t}$ in (3.1) are both identifiable. The sampling errors can be separated from the white noise component because each sample is observed five times and because the variance of the sampling errors, as well as the autocorrelation in the sampling errors induced by the sample overlap of the panel, are calculated directly from the survey data. Details are explained below.

The trend (3.3) describes the gradual change in the population parameter, while the seasonal component (3.4) captures the systematic monthly deviations from the trend within a year. See e.g., Durbin and Koopman (2001) for details. Through component (3.2) values for $θ_{t}$ are related to the population values from preceding periods. This component shows how sample information observed in preceding periods is used to improve the precision of the estimates for $θ_{t}$ in a particular time period.

The systematic differences between the subsequent panels, i.e., the RGB, are modelled in (3.1) with $λ_{t} .$ The absolute bias in the monthly labour force figures cannot be estimated from the sample data only. Therefore additional restrictions for the elements of $λ_{t}$ are required to identify the model. Here it is assumed that an unbiased estimate for $θ_{t}$ is obtained with the first panel, i.e., ${\hat{Y}}_{t}^{1} .$ This implies that the first component of $λ_{t}$ equals zero. The other elements of $λ_{t}$ measure the time dependent differences with respect to the first panel. Contrary to Pfeffermann (1991), where time independent RGB is assumed, $λ_{t}^{j}$ are modelled as random walks for $j = 2, 3, 4$ and $5.$ As a result it follows that

$λ_{t}^{1} = 0, λ_{t}^{j} = λ_{t - 1}^{j} + η_{λ, j, t}, j = 2, 3, 4, 5, (3.7)$

$E (η_{λ, j, t}) = 0, \quad Cov (η_{λ, j, t}, η_{λ, j^{'}, t^{'}}) = \left\{ \begin{array}{ll} σ_{λ}^{2} & \text{if } t = t^{'} \text{ and } j = j^{'} \\ 0 & \text{if } t \neq t^{'} \text{ or } j \neq j^{'} . \end{array} \right.$

The discontinuities induced by the redesign in 2010 are modelled with the third term in (3.1). The diagonal matrix $Δ_{t}$ contains five intervention variables:

$δ_{t}^{j} = \left\{ \begin{array}{ll} 0 & \text{if } t < T_{R}^{j} \\ 1 & \text{if } t \geq T_{R}^{j} \end{array} \right. , \quad \text{for } j = 1, 2, \dots, 5, \qquad (3.8)$

where $T_{R}^{j}$ denotes the moment that panel $j$ changes from the old to the new survey design. Under the assumption that (3.2) correctly models the evolution of the population variable, the regression coefficients in $β$ can be interpreted as the systematic effects of the redesign on the level of the series observed in the five panels. The intervention approach with state-space models was originally proposed by Harvey and Durbin (1986) to estimate the effect of seat belt legislation on British road casualties. With step intervention (3.8) it is assumed that the redesign only has a systematic effect on the level of the series. Alternative interventions, e.g., for the slope or the seasonal components are also possible, see Durbin and Koopman (2001), Chapter 3. A redesign might not only affect the point estimates, but also the variance of the GREG estimates. This issue is discussed under the time series model for the survey errors.
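The step intervention (3.8) and the resulting term $Δ_{t} β$ in (3.1) can be sketched as follows. The change-over months and the coefficients used in the example are hypothetical values chosen for illustration; they are not taken from the LFS, and the function names are likewise assumptions.

```python
def step_dummies(t, change_months):
    """delta_t^j from (3.8): 0 before panel j switches design, 1 from then on."""
    return [1 if t >= t_r else 0 for t_r in change_months]


def intervention_term(t, change_months, beta):
    """Elementwise product delta_t^j * beta^j, i.e. the vector Delta_t beta in (3.1)."""
    return [d * b for d, b in zip(step_dummies(t, change_months), beta)]


# Hypothetical change-over months T_R^j and discontinuities beta^j for the five panels.
change_months = [10, 13, 16, 19, 22]
beta = [2.0, 1.5, 1.0, 0.5, 0.25]

# In month 15 only the first two panels are observed under the new design.
effect_month_15 = intervention_term(15, change_months, beta)
```

Because $Δ_{t}$ is diagonal, the matrix product reduces to an elementwise product, so each panel only picks up its own discontinuity once it has switched to the new design.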

Finally, a time series model for the survey errors $e_{t}$ in (3.1) is developed. The direct estimates for the design variances of the survey errors are available from the micro data and are incorporated in the time series model using the survey error model $e_{t}^{j} = k_{t}^{j} {\tilde{e}}_{t}^{j},$ where $k_{t}^{j} = \sqrt{\widehat{Var} ({\hat{Y}}_{t}^{j})},$ proposed by Binder and Dick (1990). Here $\widehat{Var} ({\hat{Y}}_{t}^{j})$ denotes the estimated variance of the GREG estimator. Choosing the survey errors proportional to the standard error of the GREG estimators allows for non-homogeneous variance in the survey errors, which arises, e.g., due to the gradually decreasing sample size over the last decade.

The sample of the first panel has no sample overlap with panels observed in the past. Consequently, the survey errors of the first panel, $e_{t}^{1},$ are not correlated with survey errors in the past. It is, therefore, assumed that ${\tilde{e}}_{t}^{1}$ is white noise with $E ({\tilde{e}}_{t}^{1}) = 0$ and $Var ({\tilde{e}}_{t}^{1}) = σ_{e 1}^{2} .$ As a result, the variance of the survey error equals $Var (e_{t}^{1}) = {(k_{t}^{1})}^{2} σ_{e 1}^{2},$ which is approximately equal to the direct estimate of the variance of the GREG estimate for the first panel if the maximum likelihood (ML) estimate for $σ_{e 1}^{2}$ is close to one.

The survey errors of the second, third, fourth and fifth panel are correlated with survey errors of preceding periods. The autocorrelations between the survey errors of the subsequent panels are estimated from the survey data, using the approach proposed by Pfeffermann, Feder and Signorelli (1998). In this application it appears that the autocorrelation structure for the second, third, fourth and fifth panel can be modelled conveniently with an AR(1) model, see van den Brakel and Krieg (2009). Therefore it is assumed that ${\tilde{e}}_{t}^{j} = ρ {\tilde{e}}_{t - 3}^{j - 1} + ν_{t}^{j},$ with $ρ$ the first order autocorrelation coefficient, $E (ν_{t}^{j}) = 0,$ and $Var (ν_{t}^{j}) = σ_{e j}^{2}$ for $j = 2, 3, 4, 5.$ Since ${\tilde{e}}_{t}^{j}$ is an AR(1) process, $Var (e_{t}^{j}) = σ_{e j}^{2} {(k_{t}^{j})}^{2} / (1 - ρ^{2}) .$ As a result $Var (e_{t}^{j})$ is approximately equal to $\widehat{Var} ({\hat{Y}}_{t}^{j})$ provided that the ML estimates for $σ_{e j}^{2}$ are close to $(1 - ρ^{2}) .$
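One way to see why $(1 - ρ^{2})$ is the natural benchmark for $σ_{e j}^{2}$ is to propagate the variance of the scaled errors from wave to wave. The sketch below assumes that $ν_{t}^{j}$ is uncorrelated with ${\tilde{e}}_{t - 3}^{j - 1},$ so that $Var ({\tilde{e}}_{t}^{j}) = ρ^{2} Var ({\tilde{e}}_{t - 3}^{j - 1}) + σ_{e j}^{2} .$ The function names and the numerical values are illustrative assumptions.

```python
def scaled_error_variances(rho, sigma2):
    """Variances of the scaled errors for waves 1..5.

    sigma2 holds the five disturbance variances; wave 1 is white noise and
    wave j > 1 follows e~_t^j = rho * e~_{t-3}^{j-1} + nu_t^j.
    """
    variances = [sigma2[0]]
    for j in range(1, 5):
        variances.append(rho ** 2 * variances[-1] + sigma2[j])
    return variances


def survey_error_variances(k, rho, sigma2):
    """Var(e_t^j) = (k_t^j)^2 * Var(e~_t^j) for the five waves."""
    return [kj ** 2 * v for kj, v in zip(k, scaled_error_variances(rho, sigma2))]


# Hypothetical values: rho = 0.5, benchmark variances 1 and (1 - rho^2),
# and GREG standard errors k_t^j for the five waves.
rho = 0.5
sigma2 = [1.0] + [1.0 - rho ** 2] * 4
k = [3.0, 3.2, 3.1, 3.3, 3.4]
variances = survey_error_variances(k, rho, sigma2)
```

With these benchmark values every scaled error has unit variance, so $Var (e_{t}^{j})$ reduces to ${(k_{t}^{j})}^{2},$ i.e., to the direct variance estimate of the GREG estimator, which agrees with the expressions given above.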

The survey redesign in 2010 might affect the variance of the GREG estimates. Systematic differences in these variances are automatically taken into account, since they are used as a-priori information in the time series model for the survey error. An alternative possibility would be to allow for different values for $σ_{e j}^{2}$ before and after the survey redesign, which can be interpreted as an intervention on the variance hyperparameter of the survey error.

Auxiliary time series can be incorporated in the model to improve the estimates for the discontinuities. Reliable auxiliary series contain valuable information for correctly separating real developments from discontinuities in the intervention model. The auxiliary information will also increase the precision of the model estimates for the monthly unemployment figures. For the unemployed labour force, the number of people formally registered at the employment office is a potential auxiliary variable to be included in the model.

There are different ways to incorporate auxiliary information in the model. One straightforward possibility is to extend the time series model (3.2) for the population parameter of the LFS with a regression component for the auxiliary series, i.e., $θ_{t} = L_{t} + S_{t} + b X_{t} + ε_{t},$ where $X_{t}$ denotes the auxiliary series and $b$ the regression coefficient. The major drawback of this approach is that the auxiliary series will partially explain the trend and seasonal effect in $θ_{t},$ leaving only a residual trend and seasonal effect for $L_{t}$ and $S_{t} .$ This hampers the estimation of a trend for the target variable.

An alternative approach, that allows the direct estimation of a filtered trend for $θ_{t},$ is to extend model (3.1) with the auxiliary series and model the correlation between the trends of the series of the LFS and the auxiliary series. This gives rise to the following model:

$\begin{pmatrix} {\hat{Y}}_{t} \\ X_{t} \end{pmatrix} = \begin{pmatrix} 1_{5} θ_{t}^{LFS} \\ θ_{t}^{R} \end{pmatrix} + \begin{pmatrix} λ_{t} \\ 0 \end{pmatrix} + \begin{pmatrix} Δ_{t} β \\ 0 \end{pmatrix} + \begin{pmatrix} e_{t} \\ 0 \end{pmatrix} . \qquad (3.9)$

The series of the LFS and the auxiliary series from the register both have their own population parameter that can be modelled with two separate time series models, i.e., $θ_{t}^{z} = L_{t}^{z} + S_{t}^{z} + ε_{t}^{z},$ where $z = LFS$ or $z = R$ ($R$ stands for register), defined similarly to (3.2). Since the auxiliary series is based on a registration, this series does not have an RGB, a discontinuity at the moment that the LFS is redesigned or a survey error component.

The model allows for correlation between the disturbances of the slope of the trend component of the LFS and the auxiliary series. This results in the following definition for the smooth trend model for the LFS and the auxiliary series:

$\begin{array}{rcl} L_{t}^{z} & = & L_{t - 1}^{z} + R_{t - 1}^{z}, \\ R_{t}^{z} & = & R_{t - 1}^{z} + η_{t}^{z}, \\ E (η_{t}^{z}) & = & 0, \\ Cov (η_{t}^{z}, η_{t^{'}}^{z}) & = & \left\{ \begin{array}{ll} σ_{η z}^{2} & \text{if } t = t^{'} \\ 0 & \text{if } t \neq t^{'} \end{array} \right. , \quad z = LFS, R, \\ Cov (η_{t}^{LFS}, η_{t^{'}}^{R}) & = & \left\{ \begin{array}{ll} ϑ σ_{η LFS} σ_{η R} & \text{if } t = t^{'} \\ 0 & \text{if } t \neq t^{'} , \end{array} \right. \end{array}$

with $ϑ$ the correlation coefficient between the slope disturbances of both series. This correlation is not fixed in advance but is estimated within the model. If the model detects a strong correlation, then the trends of both series will develop in the same direction more or less simultaneously. Model (3.9) does not allow for correlation between the disturbances of the seasonal component of the LFS series and the auxiliary series. Both series have their own seasonal component $S_{t}^{z}$ defined by (3.5). In a similar way both series have their own white noise $ε_{t}^{z}$ for the unexplained variation, which are assumed to be uncorrelated and are defined by (3.6).
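The covariance structure of the two slope disturbances can be written as a 2 by 2 matrix. The sketch below builds that matrix and its Cholesky factor, which is one common way to generate or parameterise correlated disturbances. The function names and the numerical values are assumptions made for illustration, not part of the model specification.

```python
import math


def slope_disturbance_cov(sd_lfs, sd_r, corr):
    """Covariance matrix of (eta_t^LFS, eta_t^R) at the same time point t."""
    if not -1.0 <= corr <= 1.0:
        raise ValueError("the correlation must lie in [-1, 1]")
    cross = corr * sd_lfs * sd_r
    return [[sd_lfs ** 2, cross], [cross, sd_r ** 2]]


def cholesky_2x2(cov):
    """Lower-triangular factor C with C C^T = cov, for a 2 x 2 covariance matrix."""
    c11 = math.sqrt(cov[0][0])
    c21 = cov[1][0] / c11
    c22 = math.sqrt(max(cov[1][1] - c21 ** 2, 0.0))
    return [[c11, 0.0], [c21, c22]]


# Hypothetical standard deviations and correlation.
cov = slope_disturbance_cov(sd_lfs=2.0, sd_r=3.0, corr=0.5)
chol = cholesky_2x2(cov)
```

When the correlation approaches one in absolute value, the lower-right element of the factor shrinks to zero, so both slopes are then driven by a single common disturbance.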

Models (3.1) and (3.9) explicitly account for discontinuities in the different panels through the intervention component. Estimates for the target variables, obtained with these models, are therefore not affected by the systematic effect of the change-over. As a result, the models correct for the discontinuities induced by the redesign. Model estimates for the target variables can be interpreted as the results observed under the old method, also after the change-over to the new survey design. The discontinuity of the first panel must be added to the model estimates for the target variables to produce figures that can be interpreted as being obtained under the new design.

The general way to proceed is to express the model in the so-called state-space representation and apply the Kalman filter to obtain optimal estimates for the state variables, see e.g., Durbin and Koopman (2001). It is assumed that the disturbances are normally distributed. Under this assumption, the Kalman filter gives optimal estimates for the state vector and the signals. Estimates for state variables for period $t$ based on the information available up to and including period $t$ are referred to as the filtered estimates. The filtered estimates of past state vectors can be updated if new data become available. This procedure is referred to as smoothing and results in smoothed estimates that are based on the completely observed time series. In this application, interest is mainly focussed on the filtered estimates, since they are based on the complete set of information that would be available in the regular production process to produce a model-based estimate for month $t .$
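The filtering recursion can be illustrated on a much simpler model than (3.1). The sketch below applies the Kalman filter to a single series that follows the smooth trend model (3.3) plus white noise. It is a simplified, univariate stand-in and not the multivariate model of this paper: there is no seasonal component, RGB, intervention or survey error model. The function name, the variance values and the test series are assumptions, and the large initial variance `kappa` only approximates a diffuse prior.

```python
def kalman_filter_smooth_trend(ys, var_eps, var_eta, kappa=1e7):
    """Filtered (level, slope) estimates for y_t = L_t + eps_t with trend (3.3).

    State vector: (L_t, R_t). The state is initialised at zero with a large
    variance kappa on the diagonal, as an approximation to a diffuse prior.
    """
    a = [0.0, 0.0]                      # predicted state for the first period
    P = [[kappa, 0.0], [0.0, kappa]]    # predicted state covariance
    filtered = []
    for y in ys:
        # Update step: combine the prediction with the new observation.
        v = y - a[0]                    # innovation
        F = P[0][0] + var_eps           # innovation variance
        K = [P[0][0] / F, P[1][0] / F]  # Kalman gain
        a = [a[0] + K[0] * v, a[1] + K[1] * v]
        P = [[P[i][m] - K[i] * P[0][m] for m in range(2)] for i in range(2)]
        filtered.append((a[0], a[1]))
        # Prediction step: L_{t+1} = L_t + R_t and R_{t+1} = R_t + eta_{t+1}.
        a = [a[0] + a[1], a[1]]
        P = [[P[0][0] + P[1][0] + P[0][1] + P[1][1], P[0][1] + P[1][1]],
             [P[1][0] + P[1][1], P[1][1] + var_eta]]
    return filtered


# Hypothetical noise-free linear series: level 2.0 at the start, slope 0.5 per month.
ys = [2.0 + 0.5 * t for t in range(30)]
filtered = kalman_filter_smooth_trend(ys, var_eps=1.0, var_eta=0.01)
last_level, last_slope = filtered[-1]
```

Each element of `filtered` uses only the observations up to and including that period, which corresponds to the filtered estimates described above. A smoother would add a backward pass over the stored quantities.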

The analysis is conducted with software developed in OxMetrics in combination with the subroutines of SsfPack 3.0, see Doornik (2009) and Koopman, Shephard and Doornik (2008). All state variables are non-stationary with the exception of the survey errors. The non-stationary variables are initialised with a diffuse prior, i.e., the expectation of the initial states is equal to zero and the initial covariance matrix of the states is diagonal with large diagonal elements. The survey errors are stationary and therefore initialised with a proper prior. The initial values for the survey errors are equal to zero and the covariance matrix is available from the aforementioned model for the survey errors. In SsfPack 3.0 an exact diffuse log-likelihood function is obtained with the procedure proposed by Koopman (1997).

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Catalogue no. 12-001-X

Frequency: semi-annual

Ottawa

Date modified: 2017-09-20
