Data quality, concepts and methodology: Methodology and data quality

Introduction

This section provides an overview of the underlying methodology of the survey and of key aspects of the data quality. It will also provide an understanding of the strengths and limitations of the data. The information may be of particular relevance when making comparisons with data from other surveys or sources of information and when drawing conclusions from time series.

Reference period

Respondents to the Households and the Environment Survey (HES) were asked about behaviours and activities undertaken by the household during the following reference periods. Examples of the questions or modules that used each reference period are listed under it.

Reference period: At the time of the interview

  1. Dwelling's water source
  2. Type of heating equipment
  3. Energy conservation
  4. Septic system
  5. Water conservation (water meters, low-flow showerheads, reduced volume toilets, rain barrel or cistern)
  6. Recycling programs
  7. Radon awareness
  8. Ethanol blended gasoline and bio-diesel availability

Reference period: During the previous summer

  1. Lawn and garden watering

Reference period: During the "last twelve months"

  1. Drinking water choice
  2. Water treatment
  3. Water testing
  4. Water conservation (indoor water use)
  5. Fertilizer and pesticide use
  6. Recycling behaviour
  7. Hazardous waste disposal
  8. Composting
  9. Cleaning and chemical products
  10. Recreation vehicles and gas-powered equipment
  11. Motor vehicles
  12. Public transit
  13. Ethanol blended gasoline and bio-diesel use
  14. Air quality
  15. Purchasing decisions
  16. Total household income

Reference period: Warmer months and colder months

  1. Mode of transport to work

Reference period: Winter season and summer season

  1. Thermostat use
  2. Indoor temperature

Reference period: "In the last five years"

  1. Major appliance purchases

Target population

The target population consisted of households in Canada excluding households located in Yukon, Northwest Territories and Nunavut, households located on Indian reserves or Crown lands, and households consisting entirely of full-time members of the Canadian Armed Forces. Institutions and households of certain remote regions were also excluded.

Variables measured

Broadly, the 2007 HES measured variables that explored the following themes:

  1. Water quality concerns of households
  2. Consumption and conservation of water
  3. Consumption and conservation of energy
  4. Home heating and cooling
  5. Use of household lawn and garden equipment
  6. Use of gasoline-powered recreation equipment
  7. Pesticide and fertilizer use on lawns and gardens
  8. Recycling, composting and waste disposal practices
  9. Impacts of air and water quality on households
  10. Transportation decisions
  11. Purchasing decisions

Instrument design

The questionnaire was designed by Statistics Canada in consultation with stakeholders involved in the Canadian Environment Sustainability Indicators project and in consideration of the data needs of both the project and the larger research and policy communities. The questionnaire was tested by Statistics Canada's Questionnaire Design Resource Centre (QDRC), which conducted focus group sessions and a number of one-on-one interviews, in both English and French, in five cities across Canada in January and February 2007.

The questionnaire was designed to follow standard practices and wording, when applicable, in a computer-assisted interviewing environment. This included the automatic control of question wording and flows that depended upon answers to earlier questions and the use of online edits to check for logical inconsistencies and gross capture errors.

The computer application for data collection was subjected to extensive testing before its use in the survey.

Sampling

The HES sample was selected from the 2007 (January to June) respondents to the Canadian Community Health Survey (CCHS). Details of the CCHS sample design are available upon request. In Quebec and in Ontario, a subsample of the CCHS respondents was selected, sized to allow for reliable estimates; i.e., with a coefficient of variation (CV) of 16.5% or better for proportions as small as 10% in census metropolitan areas (CMAs) and in the non-CMA portion of each province. In the other provinces, all the CCHS responding dwellings were selected in order to allow for the most reliable estimates possible. The initial HES 2007 sample consisted of 29,980 dwellings.
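The precision target above (a CV of 16.5% or better for a proportion of 10%) can be related to a sample size with a rough calculation. The sketch below assumes simple random sampling, where the CV of an estimated proportion is sqrt((1 - p) / (n p)); the function name and the optional `design_effect` parameter are illustrative assumptions. The actual HES allocation depended on the CCHS design and is not described here.

```python
from math import ceil


def min_sample_size_for_cv(p, cv_target, design_effect=1.0):
    """Smallest n for which the CV of an estimated proportion p is at most cv_target.

    Assumes simple random sampling, where CV(p_hat) = sqrt((1 - p) / (n * p)),
    optionally inflated by a design effect. Both are simplifying assumptions
    and not the method used for the HES.
    """
    n_srs = (1.0 - p) / (p * cv_target ** 2)
    return ceil(n_srs * design_effect)


# Proportion of 10%, target CV of 16.5%, no design effect.
n_required = min_sample_size_for_cv(0.10, 0.165)
print(n_required)
```

Under these assumptions, roughly 331 responding households per domain would be needed. A complex design such as the one inherited from the CCHS would generally require more.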

Data collection

Data collection took place from October 2007 to February 2008. Participation in the survey was voluntary and data were collected directly from a representative of the selected household by telephone interview. Depending on this person's availability and operational constraints, the HES interview was completed immediately or arrangements were made to call back in order to complete the interview. An automated call scheduler managed follow-up calls in order to try to make contact with the respondent at different times of day throughout the collection period.

Interviews for the HES were conducted from Statistics Canada's regional offices using a computer-assisted telephone interviewing (CATI) application. Of the 29,980 dwellings in the initial sample, 21,690 responded, for a final response rate of 72.3%.
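The response rate follows from the two counts reported above. The short check below treats it as a simple ratio of responding units to the initial sample; the function name is illustrative, and any treatment of out-of-scope units in the official calculation is not covered by this sketch.

```python
def response_rate(responding_units, initial_sample):
    """Response rate in percent, as a simple ratio (an assumed definition)."""
    return 100.0 * responding_units / initial_sample


rate = response_rate(21_690, 29_980)
print(f"{rate:.1f}%")
```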

Error detection

The HES questionnaire incorporated many features to maximize the quality of the data collected. The computer-assisted interview questionnaire contained multiple edits that checked the entered data for unusual values and for logical inconsistencies between sections of the questionnaire. When an edit failed, the interviewer was prompted to correct the information with the help of the respondent. The interviewer could also enter a response of "Don't Know" or "Refused" if the respondent did not answer the question.
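The two kinds of edit described above, a check for unusual values and a check for logical inconsistencies, can be illustrated with a minimal sketch. The field names, the temperature bounds and the specific rules are hypothetical and are not taken from the HES application.

```python
def run_edits(record):
    """Return a list of edit-failure messages for one interview record.

    All field names and thresholds here are hypothetical examples.
    """
    failures = []

    # Range edit: flag an unusual numeric value. Non-numeric answers such as
    # "Don't Know" or "Refused" are accepted without a range check.
    temp = record.get("winter_daytime_temp_c")
    if isinstance(temp, (int, float)) and not 5 <= temp <= 35:
        failures.append("winter_daytime_temp_c is outside the expected range")

    # Consistency edit: a programmable thermostat implies a thermostat.
    if (record.get("has_thermostat") == "No"
            and record.get("has_programmable_thermostat") == "Yes"):
        failures.append("programmable thermostat reported without a thermostat")

    return failures


suspect = {
    "winter_daytime_temp_c": 50,
    "has_thermostat": "No",
    "has_programmable_thermostat": "Yes",
}
for message in run_edits(suspect):
    print(message)
```

In a CATI setting, each message would correspond to a prompt asking the interviewer to confirm or correct the entry with the respondent.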

Once the data were received at Statistics Canada's head office, an extensive series of processing steps was undertaken to examine each record received. A top-down flow edit was used to clean up any question paths that may have been mistakenly followed during the interview.
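A top-down flow edit can be sketched as a pass over the questionnaire in question order that blanks out answers to follow-up questions the respondent should not have been asked. The question names, the flow rules and the "Valid skip" code below are hypothetical; the actual HES processing rules are not given in this document.

```python
VALID_SKIP = "Valid skip"

# Hypothetical flow rules, listed in questionnaire order: each gate question
# maps to the answer that opens the path and the follow-ups on that path.
FLOW = {
    "has_lawn_or_garden": ("Yes", ["used_fertilizer", "used_pesticide"]),
    "used_pesticide": ("Yes", ["pesticide_type"]),
}


def flow_edit(record):
    """Return a copy of the record with off-path answers set to VALID_SKIP."""
    cleaned = dict(record)
    # Dicts preserve insertion order, so gates are processed top-down and a
    # skip set by an earlier gate cascades to later gates.
    for gate, (opening_answer, followups) in FLOW.items():
        if cleaned.get(gate) != opening_answer:
            for question in followups:
                cleaned[question] = VALID_SKIP
    return cleaned


raw = {
    "has_lawn_or_garden": "No",
    "used_fertilizer": "Yes",      # mistakenly collected
    "used_pesticide": "Yes",       # mistakenly collected
    "pesticide_type": "Chemical",  # mistakenly collected
}
print(flow_edit(raw))
```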

Estimation

Estimates representing in-scope households were produced by assigning weights to each sampled household. The weight of a sampled household indicated the number of households in the population that the unit represented. The initial weight was provided by the CCHS and incorporated the probability of selecting the unit in their sample, as well as other adjustments such as the treatment of non-response to the CCHS.

In order to produce the HES weights, a first adjustment was made to the initial weight to reflect the fact that only a subsample of the CCHS was used. A second adjustment was made to account for non-response to the HES. A third and final adjustment, a post-stratification to the Census projections, produced the final weight. The quality of the estimates was assessed using estimates of their CV. Given the complexity of the HES design, CVs cannot be calculated using a simple formula; bootstrap replicate weights were therefore used to obtain the CVs of the estimates.
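The three weight adjustments and the bootstrap CV can be sketched as follows. This is a simplified illustration, not the HES production method: it assumes that the same strata serve as subsampling strata, non-response adjustment classes and post-strata, and that the bootstrap variance is the mean squared deviation of the replicate estimates from the full-sample estimate. All names and figures are hypothetical.

```python
from math import sqrt


def final_weights(units, subsample_rate, projected_totals):
    """Apply three successive adjustments to the initial weights 'w0'.

    units: list of dicts with keys 'w0', 'stratum' and 'responded'.
    Non-respondents receive a final weight of 0.
    """
    # Adjustment 1: inflate for subsampling of the parent survey.
    w1 = [u["w0"] / subsample_rate[u["stratum"]] for u in units]

    # Adjustment 2: respondents carry the weight of non-respondents
    # in the same stratum.
    all_sum, resp_sum = {}, {}
    for u, w in zip(units, w1):
        s = u["stratum"]
        all_sum[s] = all_sum.get(s, 0.0) + w
        if u["responded"]:
            resp_sum[s] = resp_sum.get(s, 0.0) + w
    w2 = [w * all_sum[u["stratum"]] / resp_sum[u["stratum"]] if u["responded"] else 0.0
          for u, w in zip(units, w1)]

    # Adjustment 3: post-stratify so weights sum to the projected totals.
    w2_sum = {}
    for u, w in zip(units, w2):
        w2_sum[u["stratum"]] = w2_sum.get(u["stratum"], 0.0) + w
    return [w * projected_totals[u["stratum"]] / w2_sum[u["stratum"]] if u["responded"] else 0.0
            for u, w in zip(units, w2)]


def bootstrap_cv(full_estimate, replicate_estimates):
    """CV of an estimate from its bootstrap replicate estimates."""
    b = len(replicate_estimates)
    variance = sum((r - full_estimate) ** 2 for r in replicate_estimates) / b
    return sqrt(variance) / full_estimate


units = [
    {"w0": 100.0, "stratum": "A", "responded": True},
    {"w0": 100.0, "stratum": "A", "responded": False},
    {"w0": 50.0, "stratum": "B", "responded": True},
]
weights = final_weights(units, {"A": 0.5, "B": 1.0}, {"A": 420.0, "B": 60.0})
print(weights)
print(bootstrap_cv(0.5, [0.45, 0.55, 0.5, 0.5]))
```

In practice each replicate estimate would be obtained by recomputing the statistic with one set of bootstrap replicate weights, so the CV reflects the full chain of weight adjustments.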

Quality evaluation

All published data were compared to identical or similar HES data from previous surveys to ensure consistency. Explanations were found for any significant changes. Subject-matter experts validated the data against other sources and identified and researched any values that were not consistent with others in the same domain.

Disclosure control

Statistics Canada is prohibited by law from releasing any data that would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Coverage

The coverage error of the CCHS, of which the HES is a subsample, is estimated at less than 2%.

Response rates and sampling error

The response rate for this survey was 72.3%. Provincial response rates ranged from 68% to 75%.

The results estimated from the HES are based on a sample of households in Canada. The results that would be obtained by asking the same questions of all Canadian households would differ to some extent. The extent of this sampling error is quantified by the CV, which is interpreted using the following guidelines:

  1. 16.5% and below: acceptable estimate;
  2. 16.6% to 33.3%: marginal estimate requiring cautionary note to users; and
  3. Above 33.3%: unacceptable estimate.

Estimates that do not meet an acceptable level of quality are either flagged for caution or suppressed. CV tables are prepared by Statistics Canada and made available to help users understand the quality of individual estimates. For example, CVs for the estimated proportion of households that had a compact fluorescent light bulb in 2007 for Canada and the provinces are as follows:

Canada: 0.90%
Newfoundland and Labrador: 4.44%
Prince Edward Island: 4.64%
Nova Scotia: 2.47%
New Brunswick: 3.06%
Quebec: 2.33%
Ontario: 1.54%
Manitoba: 3.61%
Saskatchewan: 3.07%
Alberta: 2.88%
British Columbia: 1.98%
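The CV guidelines can be applied mechanically to the values listed above. The sketch below is illustrative only: the function name is an assumption, and so is the handling of the boundaries (a CV of exactly 33.3% is treated as marginal, and values between 16.5% and 16.6% are treated as marginal).

```python
def classify_cv(cv_percent):
    """Map a CV in percent to a quality category (boundary handling assumed)."""
    if cv_percent <= 16.5:
        return "acceptable"
    if cv_percent <= 33.3:
        return "marginal"
    return "unacceptable"


# CVs for the proportion of households with a compact fluorescent
# light bulb in 2007, as listed above.
CFL_CVS = {
    "Canada": 0.90,
    "Newfoundland and Labrador": 4.44,
    "Prince Edward Island": 4.64,
    "Nova Scotia": 2.47,
    "New Brunswick": 3.06,
    "Quebec": 2.33,
    "Ontario": 1.54,
    "Manitoba": 3.61,
    "Saskatchewan": 3.07,
    "Alberta": 2.88,
    "British Columbia": 1.98,
}

labels = {region: classify_cv(cv) for region, cv in CFL_CVS.items()}
for region, label in labels.items():
    print(f"{region}: {label}")
```

All eleven estimates fall well within the acceptable range, so none would be flagged or suppressed under these guidelines.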

Data comparability over time

For the 2007 version of the survey, improvements were made to some questions. Some were reworded or reordered to reflect what was learned during the 2006 collection cycle. While these quality improvements were necessary, they have impacted the comparability of some of the 2007 data with those of 2006. Thus, care should be exercised when making direct year-to-year comparisons for certain topics.

Data obtained from the 2007 survey are directly comparable with data from the 2006 survey for the following variables:

  1. Main source of water
  2. Access to and use of recycling programs
  3. Household composting
  4. Presence of a thermostat and a programmable thermostat
  5. Presence of energy-saving light bulbs
  6. Presence of low-flow shower heads
  7. Presence of a low-flow toilet or a toilet tank with the water volume modified
  8. Presence of water purifiers or filters
  9. Presence of a yard

The following topics describe some of the more significant changes and offer some guidance when making such comparisons for these topics. Further information on making comparisons for topics not listed here can be obtained upon request.

Topic: Thermostats and dwelling temperature

Discussion: In 2006, the questions referred to the "heating" and "cooling" seasons, while in 2007 this was changed to "winter" and "summer" seasons. It is expected that most respondents interpreted these concepts similarly, so the results should be comparable.

Potential impact on comparability: Little or no impact – Data can be readily compared.

Topic: Drinking water

Discussion: In 2006, the question asked "What type of water does your household primarily drink at home?", while the corresponding question in 2007 asked "…what type of water did your household primarily use for drinking?". The wording of the question in 2006 addressed only water consumed in the home while the wording of the question in 2007 potentially includes water consumed by household members outside the home, such as at work or in restaurants.

Potential impact on comparability: Moderate impact – Comparisons should be made with caution.

Topic: Drinking water treatment

Discussion: In 2006, respondents were asked if they "usually boiled their water before drinking it", while the corresponding question in 2007 asked whether respondents had boiled their water in order to make it safe for drinking in the last twelve months. This wording change could include one-time events that were not a usual mode of water treatment for the respondents. Thus, care must be taken when making direct comparisons for this response.

Potential impact on comparability: Moderate impact – Comparisons should be made with caution.

Topic: Pesticide use

Discussion: In 2006, respondents were asked whether "any weed killers, pesticides or fungicides were applied to their lawn or garden", without any distinction being made about whether they were using "chemical" or "natural or organic" pesticides. Such a distinction was made in 2007. It is felt that respondents in 2006 generally did not consider natural or organic pesticides when answering the question, so comparisons between the two years should use the 2007 figures for chemical pesticide use.

Potential impact on comparability: Significant impact – Comparisons should only be made using the chemical pesticide data.

Topic: Fertilizer and pesticide use

Discussion: In both 2006 and 2007, the universe for the questions related to fertilizer and pesticide use was based on the presence of a lawn or garden. Apartments were considered in-scope for the universe in 2006, but were out-of-scope in 2007. Users of the 2006 Public Use Microdata File (catalogue no. 16M0001X) should exclude apartments when making comparisons between 2006 and 2007.

Potential impact on comparability: Moderate impact – Comparisons should be made with caution.
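The apartment exclusion recommended for users of the 2006 Public Use Microdata File can be sketched as a filter applied before computing a weighted estimate. The record layout, variable names and category labels below are hypothetical and do not reflect the actual file; users should consult the file's documentation for the real variable names and codes.

```python
def exclude_apartments(records, dwelling_key="dwelling_type",
                       apartment_codes=("Apartment",)):
    """Keep only records whose dwelling type is not an apartment."""
    return [r for r in records if r.get(dwelling_key) not in apartment_codes]


def weighted_share(records, flag_key, weight_key="weight"):
    """Weighted proportion of records for which flag_key is true."""
    total = sum(r[weight_key] for r in records)
    hits = sum(r[weight_key] for r in records if r[flag_key])
    return hits / total


# Made-up 2006 records for illustration only.
records_2006 = [
    {"dwelling_type": "Apartment", "weight": 100.0, "used_pesticide": False},
    {"dwelling_type": "Single detached", "weight": 300.0, "used_pesticide": True},
    {"dwelling_type": "Single detached", "weight": 100.0, "used_pesticide": False},
]

comparable_2006 = exclude_apartments(records_2006)
print(weighted_share(comparable_2006, "used_pesticide"))
```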