7. Data quality

7.1 Overview of data quality evaluation

The objective of the 2012 APS is to produce quality estimates in the areas of education, employment, health and other core indicators for First Nations people living off reserve, Métis and Inuit aged 6 years and over.

Sections 7.2 and 7.3, below, explain the two types of errors that occur in surveys: sampling errors and non-sampling errors. Each type of error is evaluated in the context of the 2012 APS.

Sampling error is the difference between the data obtained from the survey sample and the data that would have resulted from a complete census of the entire population taken under similar conditions. Thus, sampling error can be described as the difference arising from sample-to-sample variability.

Non-sampling errors are all other errors, that is, those unrelated to sampling. Non-sampling errors can occur at any stage of the survey process and include non-response to the survey as well as errors introduced during data collection or computer processing. Respondents may make errors in their responses, for example when trying to recall facts from the past, or when a proxy stands in for a respondent. A response may be incorrectly captured because of interviewer fatigue or a computer malfunction. An error may be made in programming when the data are processed or totalled. These are all examples of non-sampling errors.

This chapter describes the various measures adopted to prevent errors from occurring wherever possible and to adjust for any errors found throughout the different stages of the APS. Areas of caution for interpreting APS data are noted.

7.2 Sampling errors and bootstrap method

The estimates that can be derived from the 2012 Aboriginal Peoples Survey are based on a sample of individuals. Somewhat different estimates might be obtained if a complete census had been taken using the same questionnaires, interviewers, supervisors, processing methods, etc. as those actually used. The difference between an estimate obtained from the sample and the one resulting from a complete count taken under similar conditions is called the “sampling error” of the estimate.

In order to provide estimates of sampling error for statistics produced in the APS, a particular type of bootstrap method (the bootstrap being itself a specific resampling method) was developed. Several bootstrap methods exist in the literature, but none of them was appropriate for the APS sampling design. The particularities of the APS design that made the estimation of sampling errors difficult were the following:

  • Three-phase sampling design in which households (or dwellings) were selected in the first two phases and individuals in the third phase (section 3.2.3);
  • The sampling fraction of the first-phase sample (NHS sample) was non-negligible (about 1/3 in the N1 regions), and the APS sampling fraction was relatively high in most strata;
  • The APS strata (combinations of domains of estimation, N1 or N2 regions, initial respondent vs. non-response follow-up (NRFU) respondent, identity vs. ancestry-only) were not nested within the NHS strata (collection units or groups of collection units);
  • The method used had to be flexible enough to produce standard statistics such as proportions, totals, means and ratios but also more sophisticated statistics, including percentiles, logistic regression coefficients, etc.

Some of these particularities were encountered during the 2006 APS. However, in 2006, the survey frame was constructed from the Census long form. The sampling fraction was then about one in five households, and the response rate was high because of the mandatory nature of the Census. In 2011, the NHS had an even larger sampling fraction of about one in three households. Since the NHS was a voluntary survey, non-response was higher than it had been for the 2006 long-form questionnaire, and a subsample of non-respondents was selected for non-response follow-up, which made the sampling design more complex. A more detailed description of the NHS sampling design can be found in Chapter 3 of the National Household Survey User Guide.

Because non-response to the NHS was relatively high, for the purpose of calculating variance only (variance being a particular measure of sampling error), NRFU respondents were treated as a third-phase sample, in which the probability of inclusion of a household was equal to its own response probability and did not depend on the response probabilities of other households.

Several bootstrap methods exist in the literature for single-phase sampling and for multi-stage sampling. The most common one is called the “with-replacement bootstrap” and consists of selecting M with-replacement subsamples from the main sample and producing estimates for each subsample. The bootstrap variance estimate is then derived as a function of the squared differences between estimates coming from each of the M bootstrap subsamples and the estimate coming from the survey sample.
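To make the calculation concrete, here is a minimal sketch (not APS production code; the estimates below are hypothetical) that computes a with-replacement bootstrap variance estimate by averaging the squared differences between the M bootstrap estimates and the full-sample estimate:

```python
# Minimal sketch of a with-replacement bootstrap variance estimate:
# average of squared deviations of the M replicate estimates around
# the full-sample estimate. All values below are hypothetical.
import numpy as np

def bootstrap_variance(theta_boot, theta_full):
    """Mean squared deviation of bootstrap estimates from the
    full-sample estimate."""
    theta_boot = np.asarray(theta_boot, dtype=float)
    return np.mean((theta_boot - theta_full) ** 2)

theta_hat = 0.42                              # full-sample estimate
replicates = [0.40, 0.44, 0.41, 0.45, 0.39]   # M = 5 bootstrap estimates
print(bootstrap_variance(replicates, theta_hat))  # ≈ 0.00054
```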

Variance calculation is greatly simplified through the use of bootstrap weights. For each subsample (bootstrap replicate), the initial sampling weight first has to be adjusted for bootstrap subsampling, which produces what are called "initial bootstrap weights". Since each bootstrap sample is drawn by selecting units with replacement, a unit can appear several times in a particular bootstrap sample. It can be shown that the bootstrap weights are a function of the initial sampling weight of the observation multiplied by the "multiplicity" of the unit in the bootstrap sample, that is, the number of times the unit is selected into the bootstrap sample. The multiplicity of a unit in the bootstrap sample is a random variable following a multinomial distribution. Hence, the bootstrap weights can be seen as the product of the initial sampling weights and a random adjustment factor (in this case, a function of the multiplicity of the unit).

Once the initial bootstrap weights have been derived, all weight adjustments applied to the initial sampling weights are applied to the initial bootstrap weights to obtain the final bootstrap weights. The final bootstrap weights therefore capture not only the variance associated with the particular sampling design, but also the variance associated with all the weight adjustments applied to the full sample to derive the final weights.
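As a concrete illustration of the multiplicity mechanism, the sketch below generates initial bootstrap weights for a simple single-stage design. It is illustrative only: the design weights are hypothetical, the APS itself used the more general two-phase method described next, and the rescaling refinements used in practice are ignored.

```python
# Illustrative sketch: initial bootstrap weights as design weight times
# multiplicity, where the vector of multiplicities for each replicate
# follows a multinomial distribution (n draws, equal probabilities).
import numpy as np

rng = np.random.default_rng(2012)

def initial_bootstrap_weights(design_weights, n_replicates):
    w = np.asarray(design_weights, dtype=float)
    n = len(w)
    # One multinomial draw of n trials per replicate: entry [r, i] is the
    # number of times unit i is selected into bootstrap sample r.
    mult = rng.multinomial(n, np.full(n, 1.0 / n), size=n_replicates)
    return w * mult                      # shape: (n_replicates, n)

weights = [10.0, 12.5, 8.0, 20.0]        # hypothetical design weights
bw = initial_bootstrap_weights(weights, n_replicates=1000)
print(bw.shape)                          # (1000, 4)
```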

For the 2006 APS, a general bootstrap method for two-phase sampling was developed (Note 1).

As mentioned earlier, bootstrap weights can be seen as the product of the initial sampling weight and a random adjustment factor. This is the idea behind the general bootstrap methodology used in 2006. In the case of that two-phase sample, the variance was decomposed into two components, each one associated with a phase of sampling. The general two-phase bootstrap methodology produced a random adjustment factor for each phase of sampling. In the case of the 2006 APS, the initial bootstrap weight of a unit was the product of the initial sampling weight and these two random adjustment factors.

In 2011, however, to estimate variance only, the NHS was seen to have two additional phases, one corresponding to the NRFU subsample and one associated with non-response to the NRFU subsample. For the 2012 APS, the three phases of the NHS were combined into a single phase, and the general two-phase bootstrap methodology (one NHS phase and one APS phase) was then applied. In the general bootstrap method for two-phase designs, random adjustment factors are functions of the simple and double inclusion probabilities associated with each of the phases. To combine the three phases into one, the simple and double inclusion probabilities were combined for the three NHS phases: the simple and double inclusion probabilities of the combined phases are given by the product of the simple and double inclusion probabilities of each of the three phases. The details of the methodology used are found in Haddou (2013) (Note 2).
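In symbols, under notation assumed here for illustration (it is not drawn from the APS documentation): if the simple and double inclusion probabilities of unit i and pair (i, j) at NHS phase p are denoted as below, the combined probabilities are:

```latex
% Combined inclusion probabilities over the three NHS phases
% (illustrative notation):
\pi_i^{\mathrm{NHS}} = \pi_i^{(1)} \, \pi_i^{(2)} \, \pi_i^{(3)},
\qquad
\pi_{ij}^{\mathrm{NHS}} = \pi_{ij}^{(1)} \, \pi_{ij}^{(2)} \, \pi_{ij}^{(3)}.
```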

Once the three NHS phases were combined into a single phase, the general bootstrap method for two-phase sampling was applied, which involved calculating two sets of random adjustment factors, that is, one set for each phase.

The presence of these two sets of random adjustment factors had a major advantage. The first set could be used for estimates based on the first phase only, that is, estimates based on the NHS sample. These estimates were used when the weights were adjusted based on the NHS totals at the time of post-stratification (section 6.5). This produced variable NHS totals for each bootstrap sample and reflected the fact that NHS totals were based on a sample and not on known, fixed totals.

For the APS, 1,000 sets of bootstrap weights were generated using the method described above. The method is slightly biased upward, in the sense that it slightly overestimates the variance; however, the amount of overestimation was found to be negligible for the APS. The method can also lead to negative bootstrap weights. To overcome this problem, a transformation that reduced their variability was applied to the bootstrap weights. Consequently, the variance calculated from these transformed bootstrap weights has to be multiplied by a factor that is a function of a parameter called phi, whose value corresponds to the smallest integer that makes all bootstrap weights positive. For the APS, this value is 4: the variances calculated from the transformed bootstrap weights have to be multiplied by 4² = 16, and the CVs obtained (square root of the variance divided by the estimate itself) have to be multiplied by 4. However, most software packages that produce sampling error estimates from bootstrap weights have an option to specify this adjustment factor, so that the correct variance estimate is obtained without an extra step to multiply by the constant.

It is extremely important to use the appropriate multiplicative factor for any estimate of sampling error, such as the variance, standard error or CV. Omitting this factor would lead to erroneous results and conclusions. This factor is often specified as the "Fay adjustment factor" in software that produces sampling error estimates from bootstrap weights.

Note that if C is the variance multiplicative factor, some software uses the parameter k instead, where k = 1 − 1/√C. In our case, since C = 16, then k = 1 − 1/4 = 0.75. For examples of procedures using the Fay adjustment factor, see the Aboriginal Peoples Survey, 2012: User's Guide to the Analytical File.
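As a minimal sketch (hypothetical replicate estimates; not tied to any particular software package), the multiplicative factor and the equivalent Fay parameter can be applied as follows:

```python
# Sketch of applying the APS multiplicative factor to variances computed
# from the transformed bootstrap weights. Replicate estimates are assumed
# to come from the 1,000 transformed bootstrap weight sets.
import numpy as np

PHI = 4                 # smallest integer making all bootstrap weights positive
C = PHI ** 2            # variance multiplicative factor: 16
K = 1 - 1 / np.sqrt(C)  # equivalent Fay parameter: 0.75

def adjusted_variance(theta_boot, theta_full):
    """Bootstrap variance from transformed weights, corrected by C."""
    theta_boot = np.asarray(theta_boot, dtype=float)
    return C * np.mean((theta_boot - theta_full) ** 2)

def adjusted_cv_percent(theta_boot, theta_full):
    """CV in percent: standard error divided by the estimate itself."""
    return 100 * np.sqrt(adjusted_variance(theta_boot, theta_full)) / theta_full

print(C, K)             # 16 0.75
```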

The sampling error measure used for the APS is the coefficient of variation (CV) of the estimate, which is the standard error of the estimate divided by the estimate itself. When the CV of an estimate is less than or equal to 16.6%, the estimate can be used without restriction. When the CV of an estimate is greater than 16.6% but less than or equal to 33.3%, the estimate is accompanied by the letter "E" to indicate that the data should be used with caution. When the CV of an estimate is greater than 33.3%, the cell estimate is replaced by the letter "F" to indicate that the data are suppressed for reasons of reliability. An "X" is used to indicate that an estimate is suppressed to meet the confidentiality requirements of the Statistics Act. These rules are summarized in Table 5.

Table 5
Sampling variability guidelines

Acceptable (c.v. ≤ 16.6%): Estimates can be considered for general unrestricted release. Requires no special notation. Symbol: none.

Marginal (16.6% < c.v. ≤ 33.3%): Estimates can be considered for general unrestricted release but should be accompanied by a warning cautioning subsequent users of the high sampling variability associated with the estimates. Such estimates should be identified by the letter E (or in some other similar fashion). Symbol: E – use with caution.

Unacceptable (c.v. > 33.3%): Statistics Canada recommends not releasing estimates of unacceptable quality. However, if the user chooses to do so, then estimates should be flagged with the letter F (or in some other fashion) and the following warning should accompany them: “The user is advised that . . . (specify the data) . . . do not meet Statistics Canada’s quality standards for this statistical program. Conclusions based on these data will be unreliable and most likely invalid. These data and any consequent findings should not be published. If the user chooses to publish these data or findings, then this disclaimer must be published with the data.” Symbol: F – too unreliable to be published.
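As an illustration, a small helper (hypothetical; not part of any Statistics Canada tool) that applies these release rules to a CV expressed in percent could look like the following:

```python
# Hypothetical helper applying the Table 5 release guidelines.
def quality_flag(cv_percent):
    """Return the release symbol implied by the guidelines:
    '' (acceptable), 'E' (use with caution) or 'F' (suppressed)."""
    if cv_percent <= 16.6:
        return ""            # acceptable: general unrestricted release
    elif cv_percent <= 33.3:
        return "E"           # marginal: use with caution
    else:
        return "F"           # unacceptable: too unreliable to be published

print(quality_flag(12.0), quality_flag(25.0), quality_flag(40.0))  # "" E F
```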

7.3 Non-sampling errors

Besides sampling, a number of factors at almost every stage of a survey can cause errors in survey results. Non-sampling errors arise primarily from four sources: non-response, coverage, measurement and processing. For each of these areas, the following sections discuss the various measures used to minimize and correct errors. For example, measurement errors may occur when respondents misunderstand questions and answer them inaccurately; responses may also be entered incorrectly during data capture, and errors may be introduced in the processing and tabulation of the data. Moving from a paper questionnaire in 2006 to Computer Assisted Interviewing (CAI) in 2012 greatly reduced the level of non-sampling error, because CAI allows for the direct capture of responses, automated flows between questions, built-in edits that eliminate inconsistencies and outliers, etc. (for more information on CAI, please refer to section 2.1).

Over a large number of observations, randomly occurring errors will have little effect on the estimates from the survey. However, errors occurring systematically will contribute to biases in the survey estimates. Thus, much time and effort was devoted to reducing non-sampling errors in the survey, as described in the following sections.

7.3.1 Non-response errors

Non-response errors result from a failure to collect complete information on all units in the selected sample. Non-response produces errors in the survey estimates in two ways. First, non-respondents often have characteristics that differ from those of respondents, which can result in biased survey estimates if non-response is not corrected for properly; the larger the non-response rate, the greater the risk of bias. Second, a larger number of non-respondents reduces the effective size of the sample, so the precision of the estimates decreases (the sampling error of the estimates increases). This second aspect can be overcome by selecting a larger initial sample, but doing so will not reduce the potential bias in the estimates.
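The loss of precision can be illustrated with a simple calculation, assuming a simple random sample in which the standard error scales as 1/√n (an assumption made here only for illustration; the APS design and weighting are more complex):

```python
# Illustration: inflation of standard errors when non-response shrinks
# the effective sample size, under a simple-random-sampling assumption.
import math

n_selected = 1000            # hypothetical selected sample size
response_rate = 0.75         # hypothetical response rate
n_effective = int(n_selected * response_rate)

se_inflation = math.sqrt(n_selected / n_effective)
print(f"Standard errors inflate by a factor of about {se_inflation:.2f}")  # ≈ 1.15
```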

There are many types of non-response. One form of non-response is item non-response (or partial non-response), where the respondent does not respond to one or more questions, but has completed a significant portion of the overall questionnaire.  Item non-response can be due to difficulty understanding a particular question.

Generally, the extent of item non-response was relatively small in the APS. Extensive qualitative review and testing of the questionnaire was done prior to the survey, which helped reduce item non-response. A response to key pre-defined questions was required before a case could be classified as a "respondent", as described in section 5.3.1. There were some cases, however, where a large proportion of responses to key questions were missing. These cases were removed from the database of respondents (they did not satisfy the definition of a respondent) and were treated during weighting as a special case of total non-response (section 6.4). Finally, total non-response occurs when the person selected to participate in the survey could not be contacted or did not participate once contacted. The weights of respondents were inflated to compensate for those who did not respond, as described in section 6.3.

To mitigate the number of non-response cases, many initiatives were undertaken. In the months leading up to the survey, a comprehensive communications strategy was implemented to encourage participation, as described in section 4. In addition, in-depth interviewer training, supported by detailed interviewer manuals, was conducted by experienced Statistics Canada training staff, who oversaw activities in the field. Senior interviewers also made call-backs and follow-ups to reach non-respondents and encourage them to participate in the survey. Field follow-up, using CAPI interviewers, was also conducted in many specific regions.

A detailed table of final response rates obtained for the 2012 APS is provided in section 3.3 of this guide (Table 4).

7.3.2 Coverage errors

As mentioned in section 3.1, the target population of the 2012 APS was the Aboriginal identity population of Canada, aged 6 years and over as of February 1, 2012, living in private dwellings, excluding persons living on Indian reserves or settlements and in certain First Nations communities in Yukon and the NWT. The population sampled or covered by the survey corresponded to NHS respondents reporting Aboriginal ancestry or identity (see section 3.1.1), with the same restrictions as those for the target population in terms of age and geography. For data on First Nations people living on reserve, researchers are directed to use the 2011 NHS. Alternatively, information on that population will be made available through the First Nations Regional Early Childhood, Education and Employment Survey conducted by the First Nations Information Governance Centre (see section 2.2).

Coverage errors occur when there are differences between the target population and the sampled population (the population covered by the frame). Over-coverage is generally not an issue, since out-of-scope units in the sample are typically identified during data collection, and their number can be estimated for the entire survey frame. However, under-coverage can exist. Because the APS sample was selected from those who had participated in the NHS, individuals who did not participate in the NHS could not be sampled for the APS (the NHS had an unweighted response rate of 68.6%). As such, non-response bias in the NHS could translate into coverage bias in the APS (although, technically, this could also be considered a non-response bias for the APS). Statistics Canada conducted several studies, before and after NHS data collection, to assess the risk and extent of potential non-response bias in the NHS, and a number of measures were taken to mitigate its effects. Namely, particular non-response follow-up procedures were used to reduce the potential bias for populations at risk, such as the Aboriginal population, and particular weighting strategies were also used to reduce this bias. For a full discussion of data quality for the NHS, please refer to the National Household Survey User Guide.

7.3.3 Measurement errors

Measurement errors occur when a provided response differs from the real value. Such errors may be attributable to the respondent, the interviewer, the questionnaire, the collection method or the respondent's record-keeping system. Extensive efforts were made to develop questions for the 2012 APS which would be understood, relevant and culturally sensitive.

Following the release of data from the 2006 APS, an extensive content review of existing APS questions was conducted. The review brought together a diverse group of researchers and subject matter experts from within and outside Statistics Canada. An analysis was conducted to determine which questions worked best and were most effective in producing valid indicators. The process also included an extensive search for relevant questions among other standardized Statistics Canada survey questions. Questions selected for potential inclusion on the 2012 questionnaire then underwent several rounds of qualitative testing using one-on-one interviews with respondents in eight different communities across various regions of Canada, including Iqaluit in the North. Testing was done among First Nations people, Métis and Inuit. Qualitative testing of the survey questionnaire was carried out by Statistics Canada's Questionnaire Design Resource Centre (QDRC). To minimize measurement error, adjustments were made to question wording and flows based on those results.

Many other measures were also taken to specifically reduce measurement error, including the use of skilled interviewers, extensive training of interviewers with respect to the survey procedures and content, and observation and monitoring of interviewers to detect problems of questionnaire design or misunderstanding of instructions.

7.3.4 Processing errors

Processing errors may occur at various stages of the survey process, including data capture, coding and editing. Quality control procedures were applied at every stage of APS data processing to minimize this type of error. Compared with the 2006 APS, processing errors in 2012 were greatly reduced because the paper-and-pencil data collection method used in 2006 was replaced by Computer Assisted Interviewing.

APS questionnaires were first reviewed in the field by the interviewer supervisor. At the data processing stage, a detailed set of procedures and edit rules was used to identify and correct any inconsistencies between the responses provided. A set of thorough, systematized procedures was developed to assess the quality of every variable and to correct any errors found. A snapshot of the output files was taken at each step, and verification was done by comparing the files at the current and previous steps. The programming of all edit rules was exhaustively tested before being applied to the data. Examples of the data processing verifications include:

  • the review of all question flows, including very complex sequences, to ensure skip values were accurately assigned and distinguished from different types of missing values;
  • quality control double-coding of “other-specify” responses;
  • experienced supervision of coding to standardized classifications; and
  • the review of all derived variables against their component variables to ensure accurate programming of the derivation logic, including very complex derivations.

See the data processing chapter (section 5) of this guide for more details.


Notes

  1. Langlet, É., Beaumont, J.-F., and Lavallée, P. (2008). Bootstrap Methods for Two-Phase Sampling Applicable to Postcensal Surveys. Paper presented at Statistics Canada’s Advisory Committee on Statistical Methods, May 2008, Ottawa.
  2. Haddou, M. (2013). Bootstrap Variance Estimation Specifications - Aboriginal Peoples Survey. Internal document, January 2013.