Section 6: Data processing

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.

Data capture

Since 1994, responses to survey questions have been captured directly by the interviewer at the time of the interview, using a computerized questionnaire on a laptop or desktop computer. The computerized questionnaire reduces the processing time and costs associated with data entry, transcription errors and data transmission. The response data are encrypted to ensure confidentiality and sent by modem to the appropriate Statistics Canada Regional Office, from where they are transmitted over a secure line to Ottawa for further processing. Before computer-assisted interviewing (CAI) was introduced, interviewers recorded information on a paper questionnaire, which was then sent to the Regional Office for data capture before transmission to Ottawa.

Editing

Some editing is done directly at the time of the interview. Where the information entered is outside the expected range of values (too large or too small), or is inconsistent with previous entries, message screens on the computer prompt the interviewer to modify it. However, interviewers can bypass the edits, and can skip questions if the respondent does not know the answer or refuses to answer. The response data are therefore subjected to further edit and imputation processes once they arrive at head office.

The editing and imputation phases of processing involve identifying logically inconsistent or missing items of information and correcting them. Since the true value of each entry on the questionnaire is not known, errors can be identified only by recognizing obvious inconsistencies (for example, a 15-year-old respondent who is recorded as having last worked in 1940). An erroneous value that is suspicious but still plausible will pass the edits and find its way into the monthly statistics. For that reason, emphasis must be placed on quality controls and interviewer training to ensure that errors are both few in number and non-systematic in nature.
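The two kinds of edit described above, range edits and consistency edits, can be sketched as simple rules applied to a record. This is a minimal illustration only: the `Record` layout, the field names and the `MAX_WEEKLY_HOURS` bound are hypothetical and are not the actual LFS edit specifications.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical upper bound for weekly hours; the real LFS edit limits are not given here.
MAX_WEEKLY_HOURS = 99.0


@dataclass
class Record:
    """Hypothetical, simplified respondent record used only for illustration."""
    age: int
    survey_year: int
    last_worked_year: Optional[int]  # None if never worked or not reported
    usual_hours: Optional[float]     # None if not reported


def edit_failures(rec: Record) -> List[str]:
    """Return the names of the edits a record fails; an empty list means it passes."""
    failures = []
    # Range edit: flag a value that is too large or too small.
    if rec.usual_hours is not None and not 0 <= rec.usual_hours <= MAX_WEEKLY_HOURS:
        failures.append("hours_out_of_range")
    # Consistency edit: a person cannot have last worked before birth or after the survey year.
    if rec.last_worked_year is not None:
        birth_year = rec.survey_year - rec.age  # approximate, ignores the birthday
        if not birth_year <= rec.last_worked_year <= rec.survey_year:
            failures.append("last_worked_inconsistent")
    return failures


# The example from the text: a 15-year-old reported as having last worked in 1940.
print(edit_failures(Record(age=15, survey_year=2010, last_worked_year=1940, usual_hours=20.0)))
```

Note that a wrong but plausible entry, such as a 40-year-old recorded as last working in 2008 instead of 2009, passes both rules; this is the limitation the text describes.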

Where errors or omissions are detected, the erroneous or missing items are replaced by imputed values that are logically consistent with the rest of the record. Such changes are made either automatically by the edit and imputation system or through the intervention of experts. They are based on pre-specified criteria, and may use the internal logic of the questionnaire, information reported in earlier months (if available), or similar records from which one or more values are imputed. In all cases, editing changes are recorded, and this information is used to assess various aspects of survey performance. These records of errors are also used to inform interviewers of past mistakes so that they are not repeated.
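A minimal sketch of two of the imputation sources mentioned above: the previous month's value for the same person, and a value borrowed from a similar record (a "donor"). The field names, the matching rule (same age group and industry) and the log format are assumptions made for illustration; the actual LFS criteria are not described here.

```python
from typing import List, Optional


def impute_hours(current: dict, previous: Optional[dict], donors: List[dict], log: list) -> dict:
    """Fill a missing 'hours' item and record every change in `log`.

    Hypothetical record fields: 'id', 'hours', 'age_group', 'industry'.
    """
    rec = dict(current)
    if rec.get("hours") is not None:
        return rec  # nothing to impute
    # 1. Carry forward last month's value for the same person, if available.
    if previous is not None and previous.get("hours") is not None:
        rec["hours"] = previous["hours"]
        log.append((rec["id"], "hours", "carried_forward"))
        return rec
    # 2. Otherwise borrow the value from the first donor in the same (assumed) matching class.
    for d in donors:
        if (d.get("hours") is not None and d["age_group"] == rec["age_group"]
                and d["industry"] == rec["industry"]):
            rec["hours"] = d["hours"]
            log.append((rec["id"], "hours", f"donor:{d['id']}"))
            return rec
    # 3. Leave unresolved cases for expert review, but still record them.
    log.append((rec["id"], "hours", "unresolved"))
    return rec
```

Keeping the log as a list of (record, item, method) entries mirrors the text's point that every editing change is recorded and can later be analysed.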

Industry and occupation coding

In this process, industry and occupation codes are assigned based on the respondent's job description on the questionnaire. Each record is first coded by a computerized procedure; if this is unsuccessful, it is coded manually. In both cases, the codes assigned are based on the classifications described in the North American Industry Classification System (NAICS 2007) and the National Occupational Classification for Statistics (NOC-S 2006) manuals.
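The two-step flow (automated coding first, manual coding as a fallback) can be sketched as a lookup with a queue for unresolved records. This is an assumption-laden illustration: a production autocoder uses far more than exact matching on a normalized description, and the codes below are placeholders, not actual NAICS 2007 or NOC-S 2006 codes.

```python
from typing import List, Optional, Tuple

# Hypothetical reference file; the codes are placeholders, not real NAICS/NOC-S codes.
CODING_INDEX = {
    "registered nurse": ("IND-HOSP", "OCC-NURSE"),
    "bus driver": ("IND-TRANSIT", "OCC-DRIVER"),
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def code_record(description: str, manual_queue: List[str]) -> Optional[Tuple[str, str]]:
    """Return (industry, occupation) codes, or queue the description for a human coder."""
    codes = CODING_INDEX.get(normalize(description))
    if codes is None:
        manual_queue.append(description)
    return codes
```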

Creation of derived variables

A number of data items (variables) on the microdata file are derived by combining items on the questionnaire according to classification rules. For example, labour force status is derived from specific combinations of responses to a number of survey questions regarding work activity, status in employment, job search, availability, etc.
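As an illustration of a derived variable, the sketch below combines a few hypothetical yes/no items into a labour force status. It is deliberately simplified and is not the official LFS derivation: for example, it ignores persons on temporary layoff and persons with a job starting in the near future, and the item names are assumptions.

```python
from enum import Enum


class LFStatus(Enum):
    EMPLOYED = "employed"
    UNEMPLOYED = "unemployed"
    NOT_IN_LABOUR_FORCE = "not in labour force"


def derive_lfs_status(worked: bool, absent_from_job: bool,
                      searched: bool, available: bool) -> LFStatus:
    """Simplified classification rule built from hypothetical questionnaire items."""
    # Did any work in the reference week, or had a job but was absent from it.
    if worked or absent_from_job:
        return LFStatus.EMPLOYED
    # No job, but looked for work and was available to work.
    if searched and available:
        return LFStatus.UNEMPLOYED
    return LFStatus.NOT_IN_LABOUR_FORCE
```

The order of the checks matters: employment is tested first, so someone who worked and also searched for another job is still classified as employed.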

Weighting

The sample data are weighted to enable tabulations of estimates at national, provincial, and sub-provincial levels of aggregation.

The sample design determines a certain number of weighting factors to be used in the calculation of the individual weights. The main component is the inverse of the probability of selection, known as the basic weight. For example, in an area where 2 percent of the households are sampled, each household would be assigned a basic weight of 1/0.02 = 50. The basic weight is then adjusted for any sub-sampling due to growth that may have occurred in the area. This weight is then adjusted for non-response and coverage error.
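The basic weight arithmetic above is a one-line computation. The growth adjustment shown here is an assumption about its form (if only a fraction of the dwellings in a fast-growing area is retained, each retained dwelling also represents those that were dropped); the text does not give the exact formula.

```python
def basic_weight(sampling_fraction: float) -> float:
    """Basic weight = inverse of the probability of selection."""
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling fraction must be in (0, 1]")
    return 1.0 / sampling_fraction


def growth_adjusted_weight(weight: float, subsampling_rate: float) -> float:
    """Hypothetical growth adjustment: divide by the fraction of dwellings retained."""
    if not 0 < subsampling_rate <= 1:
        raise ValueError("subsampling rate must be in (0, 1]")
    return weight / subsampling_rate


w = basic_weight(0.02)                      # about 50, as in the text
w_growth = growth_adjusted_weight(w, 0.5)   # about 100 if half the new dwellings are kept
```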

In the LFS, some survey non-response is compensated for by carrying forward last month's data if they are available and appropriate. Any remaining non-response is accounted for by adjusting the weights of the responding households in the same area. This non-response adjustment assumes that the characteristics of the responding households are not significantly different from those of the non-responding households. To the extent that this assumption is true, non-response will not be a source of bias in the LFS estimates. The weights obtained after the non-response adjustments are applied are called the subweights. The final adjustment to the weight is made to correct for coverage errors. The subweights are compared to independently derived estimates of population and adjusted so that the survey estimates of population conform to these control totals. These final weights are used in the LFS tabulations.
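The two adjustments described above can be sketched as ratio adjustments. This is a simplified stand-in under stated assumptions: the non-response step spreads each area's total weight over its responding households, and the coverage step uses one-dimensional post-stratification by a single hypothetical grouping. The actual LFS calibration to control totals is more elaborate than this.

```python
from collections import defaultdict
from typing import Dict, List


def nonresponse_adjust(households: List[dict]) -> List[dict]:
    """Return responding households with a 'subweight' that absorbs non-respondents' weight.

    Each household dict has 'area', 'weight' and 'responded'.
    """
    total = defaultdict(float)       # weight of all sampled households, by area
    responding = defaultdict(float)  # weight of responding households, by area
    for h in households:
        total[h["area"]] += h["weight"]
        if h["responded"]:
            responding[h["area"]] += h["weight"]
    return [
        {**h, "subweight": h["weight"] * total[h["area"]] / responding[h["area"]]}
        for h in households
        if h["responded"]
    ]


def calibrate(people: List[dict], control_totals: Dict[str, float]) -> List[dict]:
    """Ratio-adjust subweights so weighted counts match the control total of each group."""
    estimate = defaultdict(float)
    for p in people:
        estimate[p["group"]] += p["subweight"]
    return [
        {**p, "final_weight": p["subweight"] * control_totals[p["group"]] / estimate[p["group"]]}
        for p in people
    ]
```

In the first step, an area with four sampled households of weight 50 and three respondents gives each respondent a subweight of about 66.7, so the area still represents 200 households.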

Seasonal adjustment: LFS procedures

Most estimates associated with the labour market are subject to seasonal variation, that is, annually recurring fluctuations attributable to climate and to regular institutional events such as vacations and holiday seasons. Seasonal adjustment is used to remove these seasonal variations from more than 3,000 LFS series, in order to facilitate analysis of short-term change for major indicators such as employment and unemployment by age and sex, employment by industry, and employment by class of worker (public and private employee or self-employed). Many of these indicators are adjusted at the national and provincial levels. Main labour force status estimates are also seasonally adjusted for Census Metropolitan Areas (CMAs), and published as three-month moving averages to reduce the irregular movements caused by relatively small sample sizes.
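A three-month moving average smooths a monthly series by averaging each month with its neighbours. The sketch below assumes a trailing average (the current month and the two preceding months); the text does not state whether the published CMA averages are trailing or centred.

```python
from typing import List, Optional


def three_month_moving_average(series: List[float]) -> List[Optional[float]]:
    """Trailing 3-month average; the first two months have no value (None)."""
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i < 2:
            out.append(None)
        else:
            out.append(sum(series[i - 2:i + 1]) / 3)
    return out
```

For example, monthly values of 3, 6, 9 and 12 give averages of 6 and 9 for the third and fourth months.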

Procedures

Beginning in January 2010, X-12-ARIMA, as implemented in SAS (version 9.2) Proc X12, has been used for seasonal adjustment, replacing X-11-ARIMA used since 1980. In January 2010, all seasonally adjusted estimates were revised historically using the X-12-ARIMA program.

Seasonally adjusted estimates of overall employment and unemployment for Canada are derived by summing the adjusted estimates for major age/sex groups (men aged 15 to 24, 25 to 54 and 55+; women aged 15 to 24, 25 to 54 and 55+). The resulting overall estimate is used as a benchmark for other seasonally adjusted series. For example, employment estimates by industry and by class of worker are adjusted independently and then increased or decreased proportionately so that they sum to the overall benchmark. This procedure is known as raking. Starting in January 2010, Statistics Canada's in-house SAS procedure, Proc TSRaking, is used for this purpose.
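The simple pro-rata raking described above can be sketched as follows: sum the age/sex groups to get the benchmark, then scale every component series by the same factor. The figures are made up, and this is the plain proportional version of the idea, not a reproduction of Proc TSRaking.

```python
from typing import Dict


def rake_to_total(components: Dict[str, float], benchmark: float) -> Dict[str, float]:
    """Scale every component by the same factor so the components sum to `benchmark`."""
    current = sum(components.values())
    if current == 0:
        raise ValueError("cannot rake components that sum to zero")
    factor = benchmark / current
    return {name: value * factor for name, value in components.items()}


# Illustrative seasonally adjusted employment by age/sex group (thousands, made up).
age_sex = {"m15-24": 150.0, "m25-54": 600.0, "m55+": 100.0,
           "f15-24": 140.0, "f25-54": 560.0, "f55+": 150.0}
benchmark = sum(age_sex.values())  # overall employment: 1700.0

# Industry series adjusted independently; their total (1600.0) disagrees with the benchmark.
by_industry = {"goods": 400.0, "services": 1200.0}
raked = rake_to_total(by_industry, benchmark)  # goods 425.0, services 1275.0
```

Scaling proportionately preserves each industry's share of the total, which is the usual justification for this simple form of raking.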

Overall employment and unemployment estimates for the provinces are also derived by summing the adjusted estimates for major age/sex groups (men aged 15 to 24 and 25+; women aged 15 to 24 and 25+). However, before the summation, the estimate for each age/sex group is raked to the corresponding national estimate. Similarly, estimates of employment by industry are raked to the provincial employment total.
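The provincial procedure adds one step: for each age/sex group, the provincial estimates are first scaled so that they add up to the national estimate for that group, and only then summed within each province. The two provinces, the group labels and the numbers below are hypothetical.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]  # province -> age/sex group -> estimate


def rake_provinces(prov: Table, national: Dict[str, float]) -> Table:
    """For each age/sex group, scale provincial estimates so they add to the national one."""
    raked: Table = {p: {} for p in prov}
    for group, national_value in national.items():
        total = sum(prov[p][group] for p in prov)
        factor = national_value / total
        for p in prov:
            raked[p][group] = prov[p][group] * factor
    return raked


def provincial_totals(raked: Table) -> Dict[str, float]:
    """Overall provincial estimate = sum of its raked age/sex groups."""
    return {p: sum(groups.values()) for p, groups in raked.items()}


prov = {"P1": {"m15-24": 30.0, "m25+": 200.0}, "P2": {"m15-24": 10.0, "m25+": 200.0}}
national = {"m15-24": 50.0, "m25+": 300.0}
totals = provincial_totals(rake_provinces(prov, national))  # P1 187.5, P2 162.5
```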

Seasonally adjusted estimates of labour force for any particular group are derived by adding the seasonally adjusted estimates of employment and unemployment for that group. Similarly, seasonally adjusted rates (for example, unemployment rate) are calculated by dividing the seasonally adjusted numerator by the seasonally adjusted denominator. In the case of the participation rate and employment rate, only the numerator is seasonally adjusted.
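The derivation rules above translate directly into arithmetic. In this sketch the inputs are hypothetical seasonally adjusted employment and unemployment levels and an unadjusted population figure, reflecting the statement that only the numerator of the participation and employment rates is seasonally adjusted.

```python
from typing import Dict


def derived_sa_estimates(emp_sa: float, unemp_sa: float, population: float) -> Dict[str, float]:
    """Labour force and rates built from seasonally adjusted components (rates in percent)."""
    lf_sa = emp_sa + unemp_sa  # labour force = employment + unemployment
    return {
        "labour_force": lf_sa,
        # Both numerator and denominator are seasonally adjusted.
        "unemployment_rate": 100 * unemp_sa / lf_sa,
        # Only the numerator is adjusted; population is used as is.
        "participation_rate": 100 * lf_sa / population,
        "employment_rate": 100 * emp_sa / population,
    }
```

With 900 employed, 100 unemployed and a population of 1,600, this gives a labour force of 1,000, an unemployment rate of 10%, a participation rate of 62.5% and an employment rate of 56.25%.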

Adjustment for reference week effect

The LFS reference week is usually defined as the week containing the 15th day of the month, so the actual dates of the reference week vary from year to year. This variability can affect the month-to-month change in major labour market estimates. For example, in years when June 15 falls near the beginning of the reference week, more students may have finished exams and entered the labour market before the end of the reference week than in years when the 15th falls near the end of the week. These reference week effects are removed from the series so that the underlying trend is easier to interpret. The adjustments compensate for early or late reference weeks.
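The sketch below computes the week containing the 15th, assuming a Sunday-to-Saturday week and ignoring the exceptions implied by "usually". It shows why the reference week shifts from year to year: June 15 fell on a Tuesday in 2010 (early in the week) but on a Saturday in 2013 (the last day of the week).

```python
import datetime as dt
from typing import Tuple


def reference_week(year: int, month: int) -> Tuple[dt.date, dt.date]:
    """Sunday-to-Saturday week containing the 15th (assumed rule, no exceptions)."""
    fifteenth = dt.date(year, month, 15)
    # date.weekday() returns Monday=0 ... Sunday=6, so this is the number of days
    # back to the most recent Sunday (0 if the 15th is itself a Sunday).
    days_since_sunday = (fifteenth.weekday() + 1) % 7
    start = fifteenth - dt.timedelta(days=days_since_sunday)
    return start, start + dt.timedelta(days=6)


for year in (2010, 2013):
    start, end = reference_week(year, 6)
    print(year, start, end)
```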

Starting in 2010, these effects are estimated by the X-12-ARIMA seasonal adjustment program using a regression model with ARIMA residuals.

Adjustment for holiday effects on actual hours worked

In addition, actual hours worked are particularly affected by the variability in the dates of the reference week, combined with the presence of fixed holidays (Thanksgiving, Remembrance Day) or moving holidays (Easter) during the reference week in some years but not in others. Similar fluctuations can occur in July, depending on the timing of the reference week relative to the usual vacation period, which tends to peak in the latter half of July. Because these effects introduce significant fluctuations in estimates of actual hours worked, they are removed from the series prior to seasonal adjustment.

Starting in January 2010, a method used for the labour statistics of the System of National Accounts was adopted. Permanent prior adjustments are now generated by adding back the hours lost because of the holiday, as reported by Labour Force Survey respondents. The historical series have been revised using this new method. The holidays that may fall in the reference week and are adjusted with priors (by adding back the hours lost) are Family Day, March break, Good Friday or Easter Monday, the July construction holiday in Quebec, Thanksgiving and Remembrance Day.

Since the effects of holidays on actual hours worked vary a great deal from industry to industry, depending on each industry's practices regarding holidays and summer vacations, prior adjustments are calculated and applied separately for each major industry group.
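The prior adjustment described above can be sketched per industry: weighted actual hours, the same total with respondents' reported holiday losses added back, and the ratio of the two as a prior factor. The record fields, industry labels and figures are hypothetical, and expressing the prior as the factor `actual / adjusted` is an assumed convention (dividing the series by it gives the prior-adjusted series).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def holiday_priors(records: Iterable[dict]) -> Dict[str, Tuple[float, float, float]]:
    """Per industry: (actual hours, hours with holiday losses added back, prior factor).

    Each record has 'industry', 'weight', 'actual_hours' and 'holiday_hours_lost'.
    """
    actual = defaultdict(float)
    adjusted = defaultdict(float)
    for r in records:
        actual[r["industry"]] += r["weight"] * r["actual_hours"]
        adjusted[r["industry"]] += r["weight"] * (r["actual_hours"] + r["holiday_hours_lost"])
    return {ind: (actual[ind], adjusted[ind], actual[ind] / adjusted[ind]) for ind in actual}


records = [
    {"industry": "manufacturing", "weight": 100.0, "actual_hours": 32.0, "holiday_hours_lost": 8.0},
    {"industry": "retail", "weight": 50.0, "actual_hours": 40.0, "holiday_hours_lost": 0.0},
]
priors = holiday_priors(records)
```

Computing the totals by industry, rather than for all industries combined, reflects the point that holiday observance differs across industries.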
