Data quality, concepts and methodology: Data quality

16-401-x

All data, from whatever source, are subject to error. The Industrial Water Survey is no exception. There are two general categories of error in surveys. The first is sampling error, which arises from the fact that a sample or subset of the target population is used to represent the population. The size of sampling error is quantifiable. The second category is referred to as non-sampling error and is not as easily quantified. Non-sampling error refers to all the other kinds of error that arise in surveys. Examples of non-sampling errors include an incomplete or inaccurate list of the general population, respondents misinterpreting questions or providing erroneous information, failure to respond, and errors in information processing.

Sampling error is typically measured by the expected variability of the estimate around the true value, expressed as a percentage of the estimate. This measure is referred to as the coefficient of variation (CV): the standard error of the estimate divided by the estimate itself. Coefficients of variation of the final estimates were computed for the Industrial Water Survey and are indicated on the statistical tables. The quality of the estimates was classified as follows:

A. Excellent: CV is 0.01% to 4.99%
B. Very good: CV is 5.00% to 9.99%
C. Good: CV is 10.00% to 14.99%
D. Acceptable: CV is 15.00% to 24.99%
E. Use caution: CV is 25.00% to 49.99%
F. Unreliable: CV is > 49.99% (data are suppressed)
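
The classification above can be sketched in a few lines of Python. This is an illustrative sketch only, not the survey's production code: the function names are hypothetical, the CV is assumed to be rounded to two decimal places before it is compared with the published bands, and a CV of exactly 0.00% (which the bands do not cover) is assumed to fall under A.

```python
def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Return the CV: the standard error as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("the CV is undefined for an estimate of zero")
    return abs(standard_error / estimate) * 100.0


def quality_rating(cv_percent: float) -> str:
    """Map a CV (in percent) to the letter grades listed above."""
    cv = round(cv_percent, 2)  # assumption: bands apply to two-decimal CVs
    # Upper bound of each band, taken from the published list.
    bands = [(4.99, "A"), (9.99, "B"), (14.99, "C"), (24.99, "D"), (49.99, "E")]
    for upper_bound, letter in bands:
        if cv <= upper_bound:
            return letter
    return "F"  # above 49.99%: unreliable, data are suppressed


# Hypothetical figures: an estimate of 200.0 with a standard error of 8.0.
cv = coefficient_of_variation(200.0, 8.0)
rating = quality_rating(cv)
```

With these made-up figures the CV is 4.0%, which falls in band A.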

As mentioned in the previous section on "data collection and processing", every attempt was made to minimize non-sampling error through collection and data validation techniques.

Response rates

The 2007 response rate for the manufacturing component of the Industrial Water Survey was 72%. For the mining component of the survey, it was 79%. The response rate was 92% for the thermal-electric component. The total water intake variable and the total water discharge variable were considered mandatory. Without these two variables, a record was considered to be a "total non-response" to the survey. At the end of the collection cycle, the sample was re-weighted to account for the "total non-response" units.

Error detection

Many factors affect the accuracy of data produced in a survey. For example, respondents may have made errors in interpreting questions, answers may have been incorrectly entered on the questionnaires, and errors may have been introduced during the data capture or tabulation process. Every effort was made to reduce the occurrence of such errors in the survey.

Returned data were first checked using an automated edit-check program (BLAISE) immediately after capture. This first procedure verified that all mandatory cells had been filled in, that certain values lay within acceptable ranges, that questionnaire flow patterns had been respected, and that totals equalled the sum of their components. Collection officers evaluated the edit failures and concentrated follow-up efforts accordingly.
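
To illustrate the kinds of checks described above, the following Python sketch tests a single record for missing mandatory fields, out-of-range values and totals that do not match their components. It is not BLAISE code and does not reproduce the survey's actual edit rules: the field names, range limits and tolerance are hypothetical, and the questionnaire flow check is omitted for brevity.

```python
def run_edit_checks(record, mandatory, ranges, totals, tolerance=0.5):
    """Return a list of edit-failure messages for one questionnaire record."""
    failures = []

    # 1. Every mandatory cell must be filled in.
    for field in mandatory:
        if record.get(field) is None:
            failures.append(f"missing mandatory field: {field}")

    # 2. Reported values must lie within acceptable ranges.
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            failures.append(f"out of range: {field}")

    # 3. Totals must equal the sum of their components.
    for total_field, parts in totals.items():
        total = record.get(total_field)
        values = [record.get(part) for part in parts]
        if total is None or any(v is None for v in values):
            continue  # incomplete data is handled by the mandatory check
        if abs(total - sum(values)) > tolerance:
            failures.append(f"total does not match components: {total_field}")

    return failures


# Hypothetical record: discharge is missing and intake components sum to 90.
record = {
    "total_intake": 100.0,
    "intake_surface": 60.0,
    "intake_ground": 30.0,
    "total_discharge": None,
}
failures = run_edit_checks(
    record,
    mandatory=["total_intake", "total_discharge"],
    ranges={"total_intake": (0.0, 1_000_000.0)},
    totals={"total_intake": ["intake_surface", "intake_ground"]},
)
```

Returning a list of messages, rather than stopping at the first failure, matches the workflow described above, in which collection officers review all edit failures for a record before deciding where to follow up.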

Further data checking was performed by subject matter officers, who compared historical data (if available) with returned data to determine whether differences between survey cycles were reasonable. If not, collection officers were asked to confirm the responses with respondents. Subject matter officers also researched companies (annual reports, web sites, etc.) in an effort to verify information submitted by respondents.
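
A comparison between survey cycles of this kind could be automated roughly as follows. The source does not state what counts as a "reasonable" difference, so the 50% threshold below is purely an assumption, as are the function and parameter names.

```python
def needs_follow_up(previous, current, max_relative_change=0.5):
    """Flag a value whose change since the previous cycle looks unreasonable.

    max_relative_change is a hypothetical threshold (0.5 means 50%).
    """
    if previous is None:
        return False  # no historical data available for comparison
    if previous == 0:
        return current != 0  # any change from zero is worth a look
    return abs(current - previous) / abs(previous) > max_relative_change


# Hypothetical water intake values for the same unit in two cycles.
small_change = needs_follow_up(previous=1000.0, current=1200.0)
large_change = needs_follow_up(previous=1000.0, current=2500.0)
```

Here the 20% increase is not flagged, while the 150% increase is flagged for confirmation with the respondent.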

Imputation

Statistical imputation was used for partial-response records. Five methods of imputation were used for the Industrial Water Survey: deterministic imputation (only one possible value for the field to impute), historical imputation, imputation by ratio, donor imputation (using a "nearest neighbour" approach to find a valid record that is most similar to the record requiring imputation) and manual imputation. Ratios were calculated and donors were selected for imputation purposes based on the same or closest industry group within specified geographic areas.
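
Two of these methods, imputation by ratio and nearest-neighbour donor imputation, can be sketched as below. This is a simplified illustration under stated assumptions, not the survey's implementation: each record is reduced to a pair of one auxiliary value (known for every unit) and one target value (missing for the unit being imputed), similarity is measured on the auxiliary value alone, and the records passed in are assumed to come from the same industry group and geographic area.

```python
def ratio_impute(aux_value, respondents):
    """Impute a target value as aux_value times the group ratio.

    respondents: list of (auxiliary, target) pairs from complete records.
    """
    total_aux = sum(aux for aux, _ in respondents)
    total_target = sum(target for _, target in respondents)
    if total_aux == 0:
        raise ValueError("cannot compute a ratio from a zero auxiliary total")
    return aux_value * total_target / total_aux


def nearest_neighbour_donor(aux_value, donors):
    """Return the (auxiliary, target) pair closest to aux_value."""
    if not donors:
        raise ValueError("no donor records available")
    return min(donors, key=lambda donor: abs(donor[0] - aux_value))


# Hypothetical complete records in one imputation group.
group = [(10.0, 100.0), (30.0, 300.0)]

imputed_by_ratio = ratio_impute(5.0, group)        # 5 * 400 / 40
donor = nearest_neighbour_donor(12.0, group)       # closest auxiliary is 10
imputed_from_donor = donor[1]
```

Ratio imputation produces a value scaled to the recipient's own size, whereas donor imputation copies a value that was actually reported by a similar unit; which behaviour is preferable depends on the variable being imputed.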

Estimation

The response values for sampled units were multiplied by a sampling weight in order to estimate for the entire population. The sampling weight was calculated using a number of factors, including the probability of the unit being selected in the sample. Finally, the weights were adjusted to account for the uncovered portion and for respondents who could not be contacted or who refused to complete the survey.
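
The weighting steps can be illustrated with a small Python sketch. It assumes a common textbook approach rather than the survey's documented formulas: the design weight is the inverse of the selection probability, and the non-response adjustment spreads the weight of non-responding units over the respondents in the same weighting class. The adjustment for the uncovered portion is not shown, and all field names and figures are hypothetical.

```python
def estimate_total(units):
    """Estimate a population total from one weighting class.

    units: list of dicts with keys 'prob' (selection probability),
    'responded' (bool) and 'value' (reported value, ignored if no response).
    """
    design_weights = [1.0 / unit["prob"] for unit in units]

    weight_all = sum(design_weights)
    weight_respondents = sum(
        w for w, unit in zip(design_weights, units) if unit["responded"]
    )
    if weight_respondents == 0:
        raise ValueError("no respondents in this weighting class")

    # Inflate respondent weights so they also represent non-respondents.
    adjustment = weight_all / weight_respondents

    return sum(
        w * adjustment * unit["value"]
        for w, unit in zip(design_weights, units)
        if unit["responded"]
    )


# Hypothetical sample of three units, one of which did not respond.
sample = [
    {"prob": 0.5, "responded": True, "value": 10.0},
    {"prob": 0.5, "responded": False, "value": None},
    {"prob": 0.25, "responded": True, "value": 20.0},
]
total = estimate_total(sample)
```

In this example the design weights are 2, 2 and 4, the adjustment factor is 8 / 6, and the estimated total is 400 / 3, or about 133.3.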

Quality evaluation

When the Industrial Water Survey was reinstituted for reference year 2005, it had been almost ten years since the survey had last been conducted. In addition to the extended lapse of time between survey years, the use of different industrial classification systems and different sampling strategies made historical comparisons difficult. Reported data for 2005 were therefore evaluated only for consistency within the reporting unit and within a reporting unit's industry. With the survey conducted again for reference year 2007, a comparison of the two years became possible. An important result of this comparison was the discovery of inconsistencies between the 2005 and 2007 results of the survey. Some of these inconsistencies resulted from respondents misunderstanding what they were being asked to report. Design changes to the questionnaires and the use of a "Reporting Guide", both implemented with the 2007 survey, should minimize this type of problem for the 2007 and future versions of the survey. Revisions to the 2005 data have been made and the revised results are available at the following link: www.statcan.gc.ca/pub/16-401-x/16-401-x2008001-eng.htm.
