Data quality, concepts and methodology: Methodology and data quality

11-526-x

Introduction

This section provides an overview of the underlying methodology of the survey and of key aspects of the data quality. It will also provide an understanding of the strengths and limitations of the data. The information may be of particular relevance when making comparisons with data from other surveys or sources of information and when drawing conclusions from time series.

Reference period

Respondents to the Households and the Environment Survey (HES) were asked to report on behaviours and activities undertaken by the household during the twelve months prior to the date of the interview.

Target population

The target population consisted of households in Canada excluding households located in Yukon, Northwest Territories and Nunavut, households located on Indian reserves or Crown lands, and households consisting entirely of full-time members of the Canadian Armed Forces. Institutions and households of certain remote regions were also excluded.

Variables measured

  1. Water quality concerns of households
  2. Consumption and conservation of water
  3. Conservation of energy
  4. Home heating and cooling
  5. The indoor environment
  6. Use of household lawn and garden equipment
  7. Use of gasoline-powered recreation equipment
  8. Pesticide and fertilizer use on lawns and gardens
  9. Composting and hazardous waste disposal practices
  10. Impacts of air and water quality on households
  11. Purchasing decisions

Instrument design

The questionnaire was designed by Statistics Canada in consultation with stakeholders involved in the Canadian Environment Sustainability Indicators project and in consideration of the data needs of both the project and the larger research and policy communities. Testing of the questionnaire was done by Statistics Canada's Questionnaire Design Research Centre (QDRC). One-on-one focus sessions were conducted in both English and French by the QDRC in Ottawa and Montreal in January and February 2009.

The questionnaire was designed to follow standard practices and wording, where applicable, in a computer-assisted telephone interviewing environment. This included the automatic control of question wording and flows that depended upon answers to earlier questions and the use of online edits to check for logical inconsistencies and gross capture errors.

The computer application for data collection was subjected to extensive testing before its use in the survey.

Sampling

The Households and the Environment Survey (HES) was administered from October 2009 to November 2009 to a sub-sample of the dwellings that were part of the Canadian Community Health Survey – Annual Component, 2009 (CCHS 2009) between January 1st and June 30th, 2009. The details of the CCHS sample design are available upon request. The resulting sample size for HES 2009 consisted of 20,000 dwellings.

Data collection

Data collection took place from October 2009 to December 2009. Participation in the survey was voluntary and data were collected directly from a representative of the selected household by telephone interview. Depending on this person's availability and operational constraints, the HES interview was completed immediately or arrangements were made to call back in order to complete the interview. An automated call scheduler managed follow-up calls in order to try to make contact with the respondent at different times of day throughout the collection period.

Interviews for the HES were conducted by Statistics Canada's regional offices using a computer-assisted telephone interviewing (CATI) application. The initial sample size consisted of 20,000 dwellings. A total of 14,754 responding units yielded a final response rate of 73.8% to the HES.
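The response rate quoted above can be reproduced from the two counts given in the text. The sketch below assumes the rate is simply responding units divided by the initial sample; the function name is illustrative and is not part of any Statistics Canada tool.

```python
def response_rate(responding_units: int, sample_size: int) -> float:
    """Return the unweighted response rate as a percentage.

    Assumption: rate = responding units / initial sample size,
    with no adjustment for out-of-scope dwellings.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 100.0 * responding_units / sample_size


# Counts taken from the text: 14,754 responding units out of 20,000 dwellings.
rate = response_rate(14_754, 20_000)
print(f"{rate:.1f}%")  # prints "73.8%"
```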

Error detection

The HES questionnaire incorporated many features to maximize the quality of the data collected. Multiple edits in the computer-assisted interview application checked the entered data for unusual values and for logical inconsistencies between sections of the questionnaire. When an edit failed, the interviewer was prompted to correct the information with the help of the respondent. The interviewer could also enter a response of "Don't know" or "Refused" if the respondent did not answer a question.

Estimation

Estimates representing in-scope households were produced by assigning weights to each sampled household. The weight of a sampled household indicated the number of households in the population that the unit represented. The initial weight was provided by the CCHS and incorporated the probability of selecting the unit in their sample, as well as other adjustments such as the treatment of non-response to the CCHS.
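As an illustration of how such weights turn sample responses into population estimates, the sketch below computes a weighted proportion: the sum of the weights of households reporting a characteristic, divided by the sum of all weights. The weights and responses are hypothetical values invented for the example, not HES data.

```python
from typing import Sequence


def weighted_proportion(weights: Sequence[float], has_trait: Sequence[bool]) -> float:
    """Estimate the share of households with a trait using survey weights."""
    if len(weights) != len(has_trait):
        raise ValueError("weights and has_trait must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    # Each weight is the number of population households the unit represents.
    with_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return with_trait / total


# Hypothetical sample of four households (illustrative values only).
weights = [250.0, 400.0, 150.0, 200.0]
has_trait = [True, False, True, False]

print(f"{weighted_proportion(weights, has_trait):.1%}")  # prints "40.0%"
```

Here the four sampled households stand for 1,000 households in total, of which 400 are estimated to have the characteristic.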

In order to produce the HES weights, a first adjustment was made to the initial weight to reflect the fact that only a subsample of the CCHS was used. A second adjustment was made to account for HES non-response. Finally, a third adjustment, a post-stratification to the Census projections, produced the final weight. The quality of the estimates was assessed using estimates of their coefficient of variation (CV). Given the complexity of the HES design, CVs cannot be calculated using a simple formula; bootstrap replicate weights were therefore used to obtain the CVs of the estimates.
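A minimal sketch of how a CV might be obtained from bootstrap replicate weights follows. It assumes a common form of the bootstrap variance estimator, the mean squared deviation of the replicate estimates from the full-sample estimate; the exact estimator used for the HES is not stated here, and all names and values in the example are hypothetical.

```python
import math
from typing import Sequence


def weighted_proportion(weights: Sequence[float], flags: Sequence[bool]) -> float:
    """Weighted share of units whose flag is True."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    return sum(w for w, f in zip(weights, flags) if f) / total


def bootstrap_cv(
    final_weights: Sequence[float],
    replicate_weights: Sequence[Sequence[float]],
    flags: Sequence[bool],
) -> float:
    """Return the CV (in percent) of a weighted proportion.

    Assumed estimator: variance = mean of (replicate estimate - estimate)^2.
    """
    if not replicate_weights:
        raise ValueError("at least one set of replicate weights is required")
    estimate = weighted_proportion(final_weights, flags)
    if estimate == 0:
        raise ValueError("CV is undefined when the estimate is zero")
    replicates = [weighted_proportion(rw, flags) for rw in replicate_weights]
    variance = sum((r - estimate) ** 2 for r in replicates) / len(replicates)
    return 100.0 * math.sqrt(variance) / estimate


# Toy data (hypothetical): four households and only two replicates.
# A production file would carry many more replicate weight columns.
flags = [True, True, False, False]
final_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 1.0, 1.0, 0.0],  # replicate estimate 0.75
    [0.0, 1.0, 1.0, 2.0],  # replicate estimate 0.25
]

print(f"{bootstrap_cv(final_weights, replicate_weights, flags):.1f}%")  # prints "50.0%"
```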

Quality evaluation

All published data were compared to data from previous cycles of the survey to ensure consistency. Subject-matter experts also compared the data against other sources and identified and researched any values that were not consistent with others in the same domain.

Disclosure control

Statistics Canada is prohibited by law from releasing any data that would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Coverage error

The coverage error of the CCHS, of which the HES is a subsample, is estimated at less than 2%.

Response rates and sampling error

The response rate for this survey was 73.8%. Provincial response rates ranged from 68.8% to 75.5%.

Sampling error is defined as the error that results from estimating a population characteristic by measuring a portion of the population rather than the entire population. For probability sample surveys, methods exist to calculate sampling error.

The coefficient of variation (CV) provides such a measure. It is the ratio of the standard error of the survey estimate to the average value of the estimate across all possible samples. In practice, it is usually computed as the ratio of the estimated standard error of the survey estimate to the estimate itself. This relative measure of sampling error is usually expressed as a percentage (10% instead of 0.1). It is particularly useful for comparing the precision of sample estimates whose sizes or scales differ from one another.
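The computation described above amounts to one division. The sketch below uses hypothetical estimates and standard errors to show why a relative measure helps when estimates are on different scales; none of the figures come from the HES.

```python
def cv_percent(estimate: float, standard_error: float) -> float:
    """Return the coefficient of variation as a percentage."""
    if estimate == 0:
        raise ValueError("CV is undefined when the estimate is zero")
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    return 100.0 * standard_error / abs(estimate)


# Hypothetical proportion: 0.40 with a standard error of 0.04.
print(f"{cv_percent(0.40, 0.04):.1f}%")  # prints "10.0%"

# Hypothetical total: 2,000,000 households with a standard error of 100,000.
# The scale is very different, yet the CVs are directly comparable.
print(f"{cv_percent(2_000_000, 100_000):.1f}%")  # prints "5.0%"
```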

The extent of this sampling error is quantified by the CV with the following guidelines:

  1. 16.5% and below: acceptable estimate;
  2. 16.6% to 33.3%: marginal estimate requiring cautionary note to users; and
  3. above 33.3%: unacceptable estimate.

Estimates that do not meet an acceptable level of quality are either flagged for caution or suppressed. CV tables are prepared by Statistics Canada and made available to help users understand the quality of individual estimates.

For example, CVs for the proportion of households that gave a correct description of radon in 2009 for Canada and the provinces are as follows:

Geography                    CV
Canada                       3.17%
Newfoundland and Labrador    17.74%
Prince Edward Island         17.01%
Nova Scotia                  10.05%
New Brunswick                8.78%
Quebec                       6.35%
Ontario                      5.54%
Manitoba                     13.89%
Saskatchewan                 9.33%
Alberta                      9.96%
British Columbia             8.83%
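The quality guidelines can be applied to the radon CVs listed above. The sketch below is illustrative only: it assumes that any CV above 16.5% and up to 33.3% counts as marginal (the published ranges leave the boundaries slightly ambiguous), and the function name is hypothetical.

```python
def quality_category(cv_percent: float) -> str:
    """Classify an estimate by its CV, following the guidelines in the text.

    Assumption: boundaries are treated as <= 16.5 acceptable,
    <= 33.3 marginal, otherwise unacceptable.
    """
    if cv_percent < 0:
        raise ValueError("cv_percent must be non-negative")
    if cv_percent <= 16.5:
        return "acceptable"
    if cv_percent <= 33.3:
        return "marginal"
    return "unacceptable"


# CVs (in percent) for the 2009 radon description estimate, from the text.
radon_cvs = {
    "Canada": 3.17,
    "Newfoundland and Labrador": 17.74,
    "Prince Edward Island": 17.01,
    "Nova Scotia": 10.05,
    "New Brunswick": 8.78,
    "Quebec": 6.35,
    "Ontario": 5.54,
    "Manitoba": 13.89,
    "Saskatchewan": 9.33,
    "Alberta": 9.96,
    "British Columbia": 8.83,
}

marginal = [name for name, cv in radon_cvs.items() if quality_category(cv) == "marginal"]
print(marginal)  # prints "['Newfoundland and Labrador', 'Prince Edward Island']"
```

Under these assumptions, the estimates for Newfoundland and Labrador and for Prince Edward Island would carry a cautionary note, while the others fall in the acceptable range.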

Data comparability over time

The HES 2007 sample was selected from respondents to the Canadian Community Health Survey (CCHS) between January and June 2007. Details of the CCHS sample design can be obtained upon request. In Quebec and Ontario, the HES sample was selected from among the CCHS respondents so as to allow for reliable estimates, that is, estimates with a coefficient of variation (CV) of 16.5% or better for proportions as small as 10%, in census metropolitan areas (CMAs) and in the non-CMA portion of each province. In the other provinces, all CCHS responding dwellings were selected in order to allow for the most reliable estimates possible. The initial HES 2007 sample size consisted of 29,980 dwellings.
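To give a rough sense of what a CV target of 16.5% for a proportion of 10% implies, the sketch below computes the number of respondents needed under simple random sampling. This is a strong simplifying assumption: it ignores the design effect of the complex CCHS/HES design, the finite population correction and non-response, so the actual sizes required per domain would differ. The function name is hypothetical.

```python
import math


def srs_sample_size(proportion: float, target_cv: float) -> int:
    """Respondents needed for a target CV under simple random sampling.

    Uses CV^2 = (1 - p) / (n * p), solved for n and rounded up.
    Assumption: no design effect and no finite population correction.
    """
    if not 0 < proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    if target_cv <= 0:
        raise ValueError("target_cv must be positive")
    return math.ceil((1 - proportion) / (proportion * target_cv ** 2))


# Target from the text: CV of 16.5% for a proportion of 10%.
print(srs_sample_size(0.10, 0.165))  # prints "331"
```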

Data collection

Topic: Radon awareness

Discussion: In 2007, respondents were asked if they were "aware of radon gas and its impacts on human health", while in 2009 respondents were initially asked if they had "heard of radon" and then asked a series of follow-up questions if they indicated they had. This change separated awareness of radon (i.e. they had heard of it) from knowledge of the properties of radon. Thus, care must be taken when making direct comparisons for this response.

Potential impact on comparability: Moderate impact – Comparisons should be made with caution.