Data quality, concepts and methodology: Data quality evaluation


Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.

The methodology of this survey has been designed to control errors and to reduce their potential effects on estimates. However, the survey results remain subject to error, and sampling error is only one component of the total survey error. Sampling error arises because observations are made on a sample rather than on the entire population. All other errors arising from the various phases of a survey are referred to as non-sampling errors. For example, these errors can occur when a respondent provides incorrect information or does not answer certain questions; when a unit in the target population is omitted or covered more than once; when the GST data used to model a record for a particular month are not representative of the actual record; when a unit that is out of scope for the survey is included by mistake; or when errors occur in data processing, such as coding or capture errors.

Prior to publication, combined survey results are analyzed for comparability; in general, this includes a detailed review of individual responses (especially for large businesses), general economic conditions and historical trends.

A common measure of data quality for surveys is the coefficient of variation (CV). The coefficient of variation, defined as the standard error divided by the sample estimate, is a measure of precision in relative terms. Since the coefficient of variation is calculated from responses of individual units, it also measures some non-sampling errors.

The formula used to calculate coefficients of variation (CV) as percentages is:

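$$\mathrm{CV}(X) = \frac{S(X)}{X} \times 100\%$$

where $X$ denotes the estimate and $S(X)$ denotes the standard error of $X$.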

Confidence intervals can be constructed around the estimates using the estimate and the CV. Thus, for our sample, it is possible to state with a given level of confidence that the expected value will fall within the confidence interval constructed around the estimate. For example, if an estimate of $12,000,000 has a CV of 2%, the standard error is $240,000 (the estimate multiplied by the CV). It can be stated with 68% confidence that the expected value will fall within one standard error of the estimate, i.e. between $11,760,000 and $12,240,000. Alternatively, it can be stated with 95% confidence that the expected value will fall within two standard errors of the estimate, i.e. between $11,520,000 and $12,480,000.
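The arithmetic in this example can be reproduced with a short sketch. The function names below are illustrative only and are not part of any survey system. The multipliers of 1 and 2 standard errors for the 68% and 95% levels follow the text and assume the estimate is approximately normally distributed.

```python
def standard_error(estimate: float, cv_percent: float) -> float:
    """Recover the standard error from an estimate and its CV in percent."""
    return estimate * cv_percent / 100.0


def confidence_interval(estimate: float, cv_percent: float, k: float = 1.0):
    """Return (lower, upper) bounds at k standard errors around the estimate.

    Assumption: k = 1 corresponds to roughly 68% confidence and k = 2 to
    roughly 95% confidence, as in the worked example in the text.
    """
    se = standard_error(estimate, cv_percent)
    return estimate - k * se, estimate + k * se


estimate = 12_000_000  # estimate in dollars, from the example in the text
cv = 2.0               # CV in percent, from the example in the text

print(standard_error(estimate, cv))            # 240000.0
print(confidence_interval(estimate, cv, k=1))  # (11760000.0, 12240000.0)
print(confidence_interval(estimate, cv, k=2))  # (11520000.0, 12480000.0)
```

Keeping the multiplier `k` as a parameter makes it easy to use another confidence level, for instance 1.96 for a closer approximation to 95% under the same normality assumption.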

Finally, because the non-surveyed portion contributes little to the total estimates, bias in the non-surveyed portion has a negligible impact on the CVs. Therefore, the CV from the surveyed portion is used for the total estimate, which is the sum of the estimates from the surveyed and non-surveyed portions.
