Section 6: Data processing



Data capture

Since 1994, responses to survey questions have been captured directly by the interviewer at the time of the interview, using a computerized questionnaire on a laptop or desktop computer. The computerized questionnaire reduces processing time, along with the costs associated with data entry, transcription errors and data transmission. The response data are encrypted to ensure confidentiality and sent electronically to the appropriate Statistics Canada Regional Office. From there they are transmitted over a secure line to the head office in Ottawa for further processing.

Editing and imputation

Some editing is done directly at the time of the interview. Where the information entered falls outside the range of expected values (too large or too small) or is inconsistent with previous entries, message screens on the computer prompt the interviewer to modify the information. However, interviewers have the option of bypassing the edits and of skipping questions if the respondent does not know the answer or refuses to answer. The response data are therefore subjected to further editing and imputation once they arrive at head office.

The editing and imputation phases of processing involve the identification of logically inconsistent or missing information items and the modification of such data. Since the true value of each entry on the questionnaire is not known, errors can be identified only through the recognition of obvious inconsistencies (for example, a 15-year-old respondent who is recorded as having last worked in 1940). If a value is suspicious but plausible, it will pass the edits and find its way into the monthly statistics. For that reason, emphasis must be placed on quality control and interviewer training to ensure that errors are both minimal in number and non-systematic in nature.
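To make the idea of range and consistency edits concrete, here is a minimal sketch of the kind of check that would flag the 15-year-old example. The record layout, the age bounds and the minimum working age are all hypothetical illustration values, not the LFS's actual edit rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    age: int
    last_worked_year: Optional[int]  # None if never worked or not answered
    survey_year: int


def edit_failures(rec: Record, min_age: int = 15, max_age: int = 120,
                  min_work_age: int = 10) -> list[str]:
    """Return the names of failed edit rules; an empty list means the record passes.

    All thresholds are hypothetical illustration values.
    """
    failures = []
    # Range edit: the value lies outside the expected bounds.
    if not (min_age <= rec.age <= max_age):
        failures.append("age_out_of_range")
    if rec.last_worked_year is not None:
        # Range edit: a year of last work cannot be in the future.
        if rec.last_worked_year > rec.survey_year:
            failures.append("last_worked_in_future")
        # Consistency edit: compare against the respondent's own age.
        birth_year = rec.survey_year - rec.age
        if rec.last_worked_year < birth_year + min_work_age:
            failures.append("last_worked_too_early")
    return failures
```

For instance, `edit_failures(Record(15, 1940, 2015))` fails the consistency rule, while a suspicious but plausible record would pass untouched, which is exactly why the text stresses quality control.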

During the editing phase of processing, it may be observed that all questionnaire items for individuals (persons) in the household are missing. This is referred to as complete (or total) non-response. Item non-response occurs when only some questionnaire data items are missing. Imputation and non-response weight adjustment are the methods used to resolve complete non-response. Imputation alone is the method used to resolve item non-response. The imputation methods employed for the LFS include carry-forward, deterministic and donor (hot-deck) imputation. The non-response adjustment method is discussed below in the sub-section entitled Weighting.

Where errors or omissions are detected, the erroneous or missing items are replaced by the imputation of logically consistent values. This is referred to as deterministic (or substitution) imputation. Such changes are made automatically by the edit and imputation system or through the intervention of experts. These changes are based on pre-specified criteria and may involve the internal logic of the questionnaire, reference to earlier months' information (if available) or the use of similar records to impute one or more values.

Some missing items are resolved by carrying forward last month’s data, if available and appropriate. Other missing items may require the use of donor (hot-deck) imputation, which involves the copying of data from another person (i.e., a ‘donor’) with similar characteristics. In all cases, editing and imputation changes are recorded and this information is used to assess various aspects of survey performance. These records of errors are also used to advise interviewers of mistakes made in the past in order to avoid repetition of these mistakes in the future.
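The order described above (carry forward last month's value when available, otherwise borrow from a similar donor, and record what was done) can be sketched as follows. This is a simplified illustration: the item `usual_hours`, the matching variables `age_group` and `sex`, and the flag values are hypothetical, and random selection within the matching class is only one way to pick a donor.

```python
import random
from typing import Optional


def impute_hours(record: dict, last_month: Optional[dict],
                 donors: list[dict], rng: random.Random) -> dict:
    """Fill a missing 'usual_hours' item and flag how it was filled (hypothetical fields)."""
    if record["usual_hours"] is not None:
        return record
    out = dict(record)  # never modify the input record in place
    # 1) Carry forward: reuse the same person's value from last month, if available.
    if last_month is not None and last_month.get("usual_hours") is not None:
        out["usual_hours"] = last_month["usual_hours"]
        out["imputation_flag"] = "carry_forward"
        return out
    # 2) Donor (hot-deck) imputation: copy the value from a respondent
    #    with similar characteristics (here: same age group and sex).
    pool = [d for d in donors
            if d["usual_hours"] is not None
            and d["age_group"] == record["age_group"]
            and d["sex"] == record["sex"]]
    if pool:
        out["usual_hours"] = rng.choice(pool)["usual_hours"]
        out["imputation_flag"] = "donor"
    else:
        out["imputation_flag"] = "unresolved"
    return out
```

Keeping the flag on every imputed record mirrors the point that editing and imputation changes are recorded so they can later be used to assess survey performance.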

Industry and occupation coding

In this process, industry and occupation codes are assigned using the respondent's job description on the questionnaire. The first step is an attempt to code each record using a computerized procedure. If this is unsuccessful, the coding is performed manually. In both cases, codes assigned are based on the classifications described in the North American Industry Classification System (NAICS, 2007) and the National Occupational Classification for Statistics (NOC-S, 2006) manuals.
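The two-step flow (try automated coding first, fall back to manual coding) can be illustrated with a toy exact-match lookup. The lookup table below is a hypothetical stand-in with only three entries mapped to NAICS sectors; a production coder would use a far larger reference file and more tolerant text matching.

```python
from typing import Optional

# Hypothetical toy table: normalized descriptions -> NAICS 2007 sector codes.
AUTO_CODE_TABLE = {
    "hospital": "62",          # Health care and social assistance
    "restaurant": "72",        # Accommodation and food services
    "grocery store": "44-45",  # Retail trade
}


def code_industry(description: str,
                  table: dict = AUTO_CODE_TABLE) -> tuple[Optional[str], str]:
    """Return (code, route): an automated code if one matches, else send to manual coding."""
    # Normalize case and whitespace before the lookup.
    key = " ".join(description.lower().split())
    code = table.get(key)
    if code is not None:
        return code, "automated"
    return None, "manual_queue"
```

Any description the automated step cannot resolve is routed to a manual coder rather than forced into a code.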

Creation of derived variables

A number of data items (variables) on the microdata file are derived by combining items on the questionnaire according to classification rules. For example, labour force status is derived from specific combinations of responses to a number of survey questions regarding work activity, status in employment, job search, availability, etc.
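A deliberately simplified sketch of how a derived variable such as labour force status combines several responses is shown below. The field names are hypothetical, and the real derivation uses many more questions (for example, temporary layoff and future job starts), which are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Responses:
    worked_last_week: bool
    absent_from_job: bool        # had a job but was absent from it
    searched_past_4_weeks: bool  # looked for work in the past four weeks
    available_for_work: bool


def labour_force_status(r: Responses) -> str:
    """Derive a simplified labour force status from several survey items."""
    if r.worked_last_week or r.absent_from_job:
        return "employed"
    if r.searched_past_4_weeks and r.available_for_work:
        return "unemployed"
    return "not in labour force"
```

The rule order matters: employment is checked first, so someone who worked and also searched for another job is still classified as employed.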

Weighting

The sample data are weighted to enable tabulations of estimates at national, provincial, and sub-provincial levels of aggregation.

The sample design determines a certain number of weighting factors to be used in the calculation of the individual weights. The main component is the inverse of the probability of selection, known as the basic weight. For example, in an area where 2% of the households are sampled, each household would be assigned a basic weight of 1/0.02 = 50. The basic weight is then adjusted for any sub-sampling due to growth that may have occurred in the area. This weight is then adjusted for non-response and coverage error.
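The arithmetic of the basic weight, followed by the growth sub-sampling adjustment, can be sketched as below. Modelling the growth adjustment as division by the fraction of the grown area that stays in sample is an assumption made for illustration.

```python
def basic_weight(selection_prob: float) -> float:
    """Basic weight: the inverse of the probability of selection."""
    if not 0 < selection_prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_prob


def growth_adjusted_weight(selection_prob: float, subsample_fraction: float = 1.0) -> float:
    """Basic weight inflated for sub-sampling due to growth (assumed form)."""
    if not 0 < subsample_fraction <= 1:
        raise ValueError("sub-sample fraction must be in (0, 1]")
    return basic_weight(selection_prob) / subsample_fraction
```

With a 2% sampling rate, `basic_weight(0.02)` gives the weight of 50 from the text; if growth forced the area to be sub-sampled by half, the weight would double to 100.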

In the LFS, some survey non-response is compensated for by imputation: carry forward, substitution or donor imputation methods (as discussed above in the sub-section entitled Editing and imputation). Any remaining non-response is accounted for by adjusting the weights for the responding households in the same area. This non-response adjustment assumes that the characteristics of the responding households are not significantly different from the non-responding households.
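One common way to spread the weight of non-responding households over respondents in the same area is a ratio adjustment, sketched below under the stated assumption that respondents resemble non-respondents. The field names are hypothetical, and an area with no respondents at all (which would need to be collapsed with a neighbouring area) is not handled.

```python
from collections import defaultdict


def adjust_for_nonresponse(households: list[dict]) -> list[dict]:
    """Return responding households with a 'subweight' that absorbs non-respondents' weight.

    Each household dict has hypothetical keys 'area', 'weight' and 'responded'.
    """
    total = defaultdict(float)      # weight of all eligible households per area
    responded = defaultdict(float)  # weight of responding households per area
    for h in households:
        total[h["area"]] += h["weight"]
        if h["responded"]:
            responded[h["area"]] += h["weight"]
    out = []
    for h in households:
        if h["responded"]:
            factor = total[h["area"]] / responded[h["area"]]
            out.append({**h, "subweight": h["weight"] * factor})
    return out
```

The subweights in each area add up to the area's original total weight, so the area is still fully represented.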

To the extent that this assumption is true, non-response will not be a source of bias in the LFS estimates. The weights derived after the non-response adjustments are called the subweights. The final adjustment to the weight is made to correct for coverage errors. The subweights are adjusted so that the survey estimates of population conform to control totals. These final weights are used in the LFS tabulations.
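Making survey estimates of population conform to control totals can be illustrated with simple post-stratification, where each subweight in a group is scaled by the ratio of the control total to the weighted estimate. This is a swapped-in, simpler technique for illustration only; the group labels and control totals are hypothetical, and the LFS's actual calibration is more elaborate.

```python
from collections import defaultdict


def poststratify(people: list[dict], control_totals: dict[str, float]) -> list[dict]:
    """Scale each 'subweight' so weighted group counts match the control totals."""
    estimated = defaultdict(float)
    for p in people:
        estimated[p["group"]] += p["subweight"]
    return [
        {**p, "final_weight": p["subweight"] * control_totals[p["group"]] / estimated[p["group"]]}
        for p in people
    ]
```

After this step, the final weights in each group add up exactly to that group's control total.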

Seasonal adjustment

Most estimates associated with the labour market are subject to seasonal variation, that is, annually recurring fluctuations attributable to climate and regular institutional events such as vacations and holiday seasons. Seasonal adjustment is used to remove these seasonal variations from more than 3,000 series in the LFS in order to facilitate analysis of short-term change for major indicators such as employment and unemployment by age and sex, employment by industry and employment by class of worker (public and private employees or self-employed). Many of these indicators are adjusted at national and provincial levels. Main labour force status estimates are also seasonally adjusted for census metropolitan areas (CMAs) and published as three-month moving averages to reduce irregular movements caused by relatively small sample sizes.
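A three-month moving average smooths irregular month-to-month movement by averaging each month with its neighbours. The sketch below assumes a trailing window (the current month and the two preceding months); the exact window alignment used for CMA publication is an assumption here.

```python
def three_month_moving_average(series: list[float]) -> list[float]:
    """Trailing three-month averages; the first value corresponds to the third month."""
    return [sum(series[i - 2:i + 1]) / 3 for i in range(2, len(series))]
```

For example, `three_month_moving_average([3, 6, 9, 12])` returns `[6.0, 9.0]`.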

Procedures used in seasonal adjustment

The method being used for seasonal adjustment is X-12-ARIMA, as implemented in SAS (version 9.2) Proc X12.

Seasonally adjusted estimates of overall employment and unemployment for Canada are derived by summing adjusted estimates for major age/sex groups (men aged 15 to 24, 25 to 54 and 55+; women aged 15 to 24, 25 to 54 and 55+). The resulting overall estimate is used as a benchmark for other seasonally adjusted series. For example, employment estimates by industry and class of worker are adjusted independently and then increased or decreased proportionately so that their total sums to the overall benchmark. This procedure is known as raking. Starting in January 2010, Statistics Canada's in-house SAS Proc TSRaking program has been used for this purpose.
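The basic idea of raking, scaling independently adjusted components up or down in proportion so that they sum to the benchmark, can be sketched as a simple pro-rata adjustment. This is not the Proc TSRaking algorithm, which is more sophisticated; it only illustrates the proportional principle described here, and the component names are hypothetical.

```python
def rake_to_benchmark(components: dict[str, float], benchmark: float) -> dict[str, float]:
    """Scale all components by one common factor so that they sum to the benchmark."""
    total = sum(components.values())
    if total == 0:
        raise ValueError("components sum to zero; cannot rake proportionally")
    factor = benchmark / total
    return {name: value * factor for name, value in components.items()}
```

For instance, raking `{"goods": 300, "services": 700}` to a benchmark of 1010 gives roughly 303 and 707. The same function applies to the provincial case: the provincial estimates for one age/sex group would be raked to the national estimate for that group.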

Overall employment and unemployment estimates for the provinces are also derived by summing adjusted estimates for major age/sex groups (men aged 15 to 24 and 25+; women aged 15 to 24 and 25+). However, prior to the summation, the estimate for each age/sex group is raked to the corresponding national estimate. Similarly, estimates of employment by industry are raked to the provincial employment total.

Seasonally adjusted estimates of labour force for any particular group are derived by adding the seasonally adjusted estimates of employment and unemployment for that group. Similarly, seasonally adjusted rates (for example, unemployment rate) are calculated by dividing the seasonally adjusted numerator by the seasonally adjusted denominator. In the case of the participation rate and employment rate, only the numerator is seasonally adjusted.
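The derivation rules above can be written out directly. The sketch below takes seasonally adjusted employment and unemployment and an unadjusted population figure, since only the numerator of the participation and employment rates is seasonally adjusted; the function and argument names are hypothetical.

```python
def derived_sa_indicators(sa_employment: float, sa_unemployment: float,
                          population: float) -> dict[str, float]:
    """Labour force and rates from seasonally adjusted (SA) components."""
    labour_force = sa_employment + sa_unemployment  # SA labour force = SA employment + SA unemployment
    return {
        "labour_force": labour_force,
        # Both numerator and denominator are seasonally adjusted.
        "unemployment_rate": 100 * sa_unemployment / labour_force,
        # Only the numerator is seasonally adjusted; population is not.
        "participation_rate": 100 * labour_force / population,
        "employment_rate": 100 * sa_employment / population,
    }
```

With 900 employed, 100 unemployed and a population of 1,600, this yields a labour force of 1,000, an unemployment rate of 10.0%, a participation rate of 62.5% and an employment rate of 56.25%.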

Adjustment for reference week effect

The definition of the LFS reference week (usually the week containing the 15th day of the month) implies that the actual dates of the week vary from year to year. This variability may affect the month-to-month change in major labour market estimates. For example, more students may have finished exams and entered the labour market before the end of the reference week in years when June 15 falls near the beginning of the week than in years when it falls near the end of the reference week. The reference week effects are removed from the series so that the underlying trend is easier to interpret. These adjustments compensate for early or late reference weeks.

These effects are estimated by the seasonal adjustment method X-12-ARIMA using a regression model with ARIMA residuals.
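How the reference week shifts from year to year can be seen by computing it from the calendar. The sketch assumes a Sunday-to-Saturday week that always contains the 15th; the source says "usually", and any exceptions are not modelled here.

```python
import datetime as dt


def reference_week(year: int, month: int) -> tuple[dt.date, dt.date]:
    """Start (Sunday) and end (Saturday) of the week containing the 15th (assumed rule)."""
    fifteenth = dt.date(year, month, 15)
    # date.weekday(): Monday == 0 ... Sunday == 6, so this counts days back to Sunday.
    days_since_sunday = (fifteenth.weekday() + 1) % 7
    start = fifteenth - dt.timedelta(days=days_since_sunday)
    return start, start + dt.timedelta(days=6)
```

Under this rule, the June 2015 reference week runs from June 14 to June 20 (a late week, ending five days after the 15th), whereas in June 2013 it runs from June 9 to June 15 (an early week). It is this kind of shift that the regression effect is meant to capture.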

Adjustment for holiday effects on actual hours worked

In addition, actual hours of work are particularly affected by variability in the dates of the reference week combined with the presence of fixed (Thanksgiving, Remembrance Day) or moving (Easter) holidays during the reference week in some years but not in others. Similarly, fluctuations can also occur in July, depending on the timing of the reference week relative to the usual vacation period that tends to peak in the latter half of the month. This variability could introduce significant fluctuations in estimates of actual hours worked and is therefore removed from the series prior to seasonal adjustment.

Starting in January 2010, the LFS adopted a method used by the System of National Accounts labour statistics. Permanent prior adjustments are now generated by adding back the hours lost due to the holiday, as reported by Labour Force Survey respondents. The historical series have been revised using this new method. The holidays that may fall in the reference week and are adjusted for (by adding back the hours lost) include Family Day (for certain provinces), March break (for certain provinces), Good Friday or Easter Monday, the July construction holiday in Quebec, Thanksgiving, and Remembrance Day.
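The add-back prior adjustment amounts to summing, for each respondent, the actual hours and the hours reported as lost to the holiday before weighting up. The sketch below aggregates by industry because the adjustments are performed separately for each major industry group; the record fields are hypothetical.

```python
from collections import defaultdict


def prior_adjusted_hours(records: list[dict]) -> dict[str, float]:
    """Weighted total actual hours by industry, with reported holiday hours added back.

    Each record has hypothetical keys 'industry', 'weight', 'actual_hours'
    and 'holiday_hours_lost' (0 when no holiday fell in the reference week).
    """
    totals = defaultdict(float)
    for r in records:
        totals[r["industry"]] += r["weight"] * (r["actual_hours"] + r["holiday_hours_lost"])
    return dict(totals)
```

A retail employee who worked 32 hours and reported 8 hours lost to Thanksgiving therefore contributes 40 hours to the adjusted total, just like a colleague whose week had no holiday.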

As hours lost due to holidays are not reported for the self-employed, a model is used to estimate and remove systematic fluctuations due to holiday occurrence in the reference weeks. This model is based on special time series regression in a manner similar to the calendar adjustment performed for reference week location.

Starting in January 2015, to better reflect the actual hours of self-employed workers, the seasonally adjusted total actual hours worked series has been derived as the sum of the three seasonally adjusted classes of worker (public employees, private employees and self-employed). The provincial series are slightly modified to match this improved seasonally adjusted total. All actual hours series have been revised back to the start of the series based on this new methodology.

Since holiday effects on actual hours worked vary a great deal from industry to industry, depending on the characteristics of each regarding the observance of holidays and summer vacation practices, prior adjustments are calculated and performed separately for each major industry group.

Regular annual revisions for seasonal adjustment

Each year, the Labour Force Survey revises its estimates for the previous three years, using the latest seasonal factors.

Seasonal adjustment uses past, current and future values of a series. As new data become available, the various time series components can be better estimated, which leads to revised and more accurate seasonally adjusted estimates.

Seasonal adjustment models and options for each series are also reviewed each year. When appropriate, updated options are used to produce the revised seasonally adjusted estimates (and the ongoing monthly seasonally adjusted estimates for the year to come).

Other revisions and redesigns

Every five years, population estimates are rebased, or reweighted, to the most recent census population counts. As of January 2015, LFS estimates have been adjusted to reflect population counts from the 2011 Census, adjusted for net undercoverage, with revisions going back to 2001. The latest classification systems for industry, occupation and geography, along with other changes, are generally introduced at this time. For more information, see The 2015 Revisions of the Labour Force Survey (LFS).

The LFS undergoes a sample redesign every ten years to reflect changes in population characteristics and new definitions of geographical boundaries. The most recent redesign defines new strata based on the Census information of 2011. For more information, see The 2015 Revisions of the Labour Force Survey (LFS).

Redesigns of the questionnaire and of the data collection, processing and dissemination systems occur approximately every 20 years. The next such redesign is scheduled for 2017-2018. Its main goals will be to: 1) transition to a corporate data collection platform capable of supporting personal, telephone and respondent self-complete modes of interviewing; 2) modernise computer systems and processes used to edit, code and process data; and 3) align survey outputs with Statistics Canada's New Dissemination Model.