Statistics Canada Quality Guidelines

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived.

Data quality evaluation

Scope and purpose

Data quality evaluation is a process used to determine whether final products meet the original objectives of the statistical activity, in particular in terms of the data's accuracy, timeliness and reliability. It allows users to better interpret survey results and the Agency to improve the quality of its surveys.

There are two broad methods of evaluating data quality:

Certification or validation is the process whereby data are analysed before official release with a view to avoiding gross errors and eliminating poor-quality data. This process frequently coincides with an interpretative analysis of the data and is usually subject to tight deadlines, so only methods that yield rapid results can be used.

Sources of error studies generally provide quantitative information on the specific sources of errors in the data. While timeliness is important, the results of these studies are often available only after the official release of the data.

Principles

Users must be able to determine the extent to which data errors affect their use of the data; however, users are rarely able to evaluate independently the accuracy of data produced by a statistical agency. It is therefore up to each agency to evaluate data quality and to provide users promptly with the results in a usable form.

Data quality evaluations are also useful to the Agency. To the extent that errors can be associated with certain stages of the survey process, evaluations can be used to improve the quality of the next iteration of the survey, as well as of other similar surveys. Evaluations may include, for instance, reviewing survey plans, assessing the significance of nonresponse and examining questionable imputation practices.

The timeliness of data quality evaluations is just as important as the timeliness of the data. Ideally, evaluation results are valid and timely enough to improve the released data; for example, an evaluation of coverage can be used to compensate for differences between the frame and the target population. When this is not possible, evaluation results should at least be timely enough to help users analyse the data and to help staff design the next iteration of the survey.
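As an illustration of using a coverage evaluation to compensate for frame deficiencies, one common approach (assumed here; these guidelines do not prescribe a method) is post-stratification: within each group, design weights are ratio-adjusted so that weighted counts match independent population estimates. The sketch below is minimal; the function name `poststratify`, the age groups and all numbers are hypothetical.

```python
from collections import defaultdict


def poststratify(weights, groups, control_totals):
    """Ratio-adjust design weights so that, within each group (post-stratum),
    the sum of adjusted weights equals an external population control total.

    Returns the adjusted weights and the adjustment factor for each group.
    """
    # Sum the design weights within each group.
    weighted_counts = defaultdict(float)
    for w, g in zip(weights, groups):
        weighted_counts[g] += w

    # Adjustment factor = control total / weighted sample count.
    # A factor above 1 suggests the group is undercovered.
    factors = {g: control_totals[g] / weighted_counts[g] for g in weighted_counts}

    adjusted = [w * factors[g] for w, g in zip(weights, groups)]
    return adjusted, factors


# Hypothetical example: the frame appears to miss some 15-24 year-olds.
weights = [100.0, 100.0, 100.0, 150.0, 150.0]
groups = ["15-24", "15-24", "15-24", "25+", "25+"]
controls = {"15-24": 360.0, "25+": 300.0}  # independent population estimates

adjusted, factors = poststratify(weights, groups, controls)
print(factors)  # {'15-24': 1.2, '25+': 1.0}
print(adjusted)
```

Such an adjustment removes coverage bias only to the extent that units missed by the frame resemble covered units in the same group; it cannot correct for differences within groups.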

Guidelines

Determine the extent of data quality evaluation required for a program or a product. The factors to be considered are: data uses and users; the risk of errors and their impact on data use; variation in quality over time; the cost of the evaluation relative to the total cost of the program; the potential for improving quality and for increasing efficiency and productivity; the usefulness of the measures to users and their ease of interpretation; and whether or not the survey will be repeated.
Data quality evaluations at Statistics Canada must be designed to meet the mandatory and minimum requirements of the Policy on Informing Users of Data Quality and Methodology (Statistics Canada, 2000d). In the case of census and survey data, the minimum requirements include measuring or evaluating coverage errors, response or imputation rates and, for sample surveys, sampling errors for key characteristics.
Provide a quality evaluation based on expert opinion or subjective analysis whenever data quality evaluations will not yield quantitative measurements because of the nature of the product, the user, time constraints, cost or technical feasibility.
Make the planning of data quality evaluations part of the overall survey design, since the information needed for such evaluations must often be collected during the survey process. Include data quality reports in the dissemination schedule for the survey.
In the case of repeated surveys and statistical activities, it may not be necessary, or even possible, to produce detailed quality evaluations for every iteration. However, review each activity periodically to ensure that it continues to meet its objectives, not only when problems arise.
Involve users of evaluation results, whether they are associated with a statistical agency or not, in establishing the data quality evaluation program objectives. When circumstances permit, also involve them in the evaluation process.
Among certification methods, consider:

- checking coherence with external data sources, for example other surveys, other iterations of the same survey or administrative data
- checking internal coherence, for example by calculating ratios that are known to lie within certain limits (male-female ratios, average values of properties, etc.)
- analysing the largest units individually with respect to their contributions to overall estimates (generally applied to business surveys)
- calculating data quality indicators, for example nonresponse rates, imputation rates and coefficients of variation
- holding feedback sessions with staff involved in data collection and processing
- having well-informed experts carry out "reasonableness" checks, including pre-release external review of the data as "work in progress"
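Several of the certification methods above, including the data quality indicators that also form part of the minimum requirements, lend themselves to simple automation. The following is a minimal sketch, not a Statistics Canada system: the function names, the status codes 'R', 'NR' and 'OOS', the ratio bounds and all data are hypothetical, and the response rate shown is an unweighted simplification (official definitions may be more detailed and may use weights). Flagged items call for review by analysts, not automatic correction.

```python
import math
from collections import Counter


def response_rate(statuses):
    """Unweighted response rate among in-scope sampled units.
    Hypothetical codes: 'R' respondent, 'NR' in-scope nonrespondent,
    'OOS' out of scope (excluded from the denominator)."""
    counts = Counter(statuses)
    return counts["R"] / (counts["R"] + counts["NR"])


def imputation_rate(imputed_flags):
    """Share of the values of one variable that were imputed."""
    return sum(imputed_flags) / len(imputed_flags)


def cv_percent(estimate, variance):
    """Coefficient of variation in percent: standard error / |estimate| * 100."""
    return 100.0 * math.sqrt(variance) / abs(estimate)


def ratios_outside_bounds(numerators, denominators, low, high):
    """Indices of domains whose ratio lies outside [low, high]; a zero
    denominator is also flagged because the ratio cannot be checked."""
    return [i for i, (num, den) in enumerate(zip(numerators, denominators))
            if den == 0 or not (low <= num / den <= high)]


def largest_contributors(unit_ids, weights, values, top_k=5):
    """Units ranked by weighted contribution (weight * value), returned as
    (unit, share of the estimated total) for the top_k units."""
    contributions = [(u, w * y) for u, w, y in zip(unit_ids, weights, values)]
    total = sum(c for _, c in contributions)
    contributions.sort(key=lambda uc: uc[1], reverse=True)
    return [(u, c / total) for u, c in contributions[:top_k]]


# Hypothetical sample: 840 respondents, 160 nonrespondents, 50 out of scope.
statuses = ["R"] * 840 + ["NR"] * 160 + ["OOS"] * 50
income_imputed = [True] * 63 + [False] * 777  # flags for the 840 respondents
print(response_rate(statuses))          # 840 / 1000
print(imputation_rate(income_imputed))  # 63 / 840
print(cv_percent(12_500.0, 250_000.0))  # 4.0 (standard error of 500)

# Internal coherence: males per female by region, expected within [0.9, 1.1].
print(ratios_outside_bounds([4_900, 5_100, 7_000], [5_000, 5_000, 5_000], 0.9, 1.1))  # [2]

# Largest units: share of the weighted total for the two biggest contributors.
print(largest_contributors(["A", "B", "C", "D"], [1.0, 5.0, 10.0, 2.0],
                           [600.0, 50.0, 10.0, 25.0], top_k=2))  # [('A', 0.6), ('B', 0.25)]
```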

Evaluate the following sources of error:
- Coverage errors consist of omissions, erroneous inclusions and duplications in the frame used to conduct the survey. Since they affect all survey estimates, they constitute one of the most important types of error. Coverage errors may translate into a negative or positive bias in the data, and the impact may vary across subgroups of the survey universe. Classification errors, notably industrial and geographical ones, are also a concern: for example, badly defined boundaries or erroneous coding may lead to part of the territory being omitted.
- Nonresponse errors occur when there is no response to some or all of the survey questions. Nonresponse increases variance, because it reduces the effective sample size and requires recourse to imputation, and it produces bias if nonrespondents differ from respondents with respect to the characteristics of interest. Furthermore, there is a risk of significantly underestimating the sampling error if imputed data are treated as though they were observed data.
- Measurement errors occur when the response provided differs from the true value; such errors may be attributable to the respondent, the interviewer, the questionnaire, the collection method or the respondent's record-keeping system. Such errors may be random, or they may be systematic and result in bias.
- Processing errors occur at subsequent stages of the process, when data are checked, coded, entered, imputed and tabulated. Like measurement errors, processing errors may increase variance and introduce bias. It is also necessary to look into the potential impact of problems in the survey process, such as uneven staff training, unusually high staff turnover or procedural changes in mid-operation.
- Sampling errors arise because survey results are obtained from a sample rather than from the population as a whole. In practice, these errors may also include estimation errors attributable to the use of estimators that, deliberately or otherwise, are biased (e.g., some small area estimators).
A good discussion of the subject can be found in Lessler and Kalsbeek (1992) and Lyberg et al. (1997).
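Two of the nonresponse effects described above can be quantified with short calculations. Under a deterministic view of response (a simplifying assumption made here), the bias of the respondent mean equals the share of nonrespondents times the difference between the respondent and nonrespondent means. The second function illustrates why treating imputed values as observed understates sampling error: with mean imputation under simple random sampling (finite population correction ignored), the point estimate is unchanged, but the naive standard error computed on the completed file is smaller than the one based on respondents only. Function names and data are hypothetical.

```python
import statistics


def nonresponse_bias(nonresponse_share, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean under a deterministic response model:
    share of nonrespondents * (respondent mean - nonrespondent mean)."""
    return nonresponse_share * (mean_respondents - mean_nonrespondents)


def naive_and_respondent_se(respondent_values, n_missing):
    """Standard error of the mean after mean imputation, computed two ways:
    naive (imputed values treated as observed) and respondent-based
    (only the r respondents). Both use the same point estimate."""
    r = len(respondent_values)
    ybar_r = statistics.mean(respondent_values)
    completed = list(respondent_values) + [ybar_r] * n_missing
    n = len(completed)
    # Imputed values sit exactly at the mean, shrinking the apparent spread
    # while inflating the apparent sample size.
    naive_se = (statistics.variance(completed) / n) ** 0.5
    respondent_se = (statistics.variance(respondent_values) / r) ** 0.5
    return naive_se, respondent_se


# Hypothetical: 30% nonresponse, respondents average 52, nonrespondents 40.
print(nonresponse_bias(0.30, 52.0, 40.0))  # about 3.6

# Hypothetical: 4 respondents, 4 nonrespondents imputed with the respondent mean.
naive, resp = naive_and_respondent_se([2.0, 4.0, 6.0, 8.0], n_missing=4)
print(round(naive, 3), round(resp, 3))  # 0.598 1.291
```

In practice, variance estimation methods designed for imputed data (for example, multiple imputation or adjusted jackknife variance estimators) are used to avoid this understatement.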

References

Biemer, P., Groves, R.M., Mathiowetz, N.A., Lyberg, L. and Sudman, S. (eds.) (1991). Measurement Errors in Surveys. Wiley, New York.

Fuller, W. (1987). Measurement Error Models. Wiley, New York.

Gosselin, J.-F., Chinnappa, B.N., Ghangurde, P.D. and Tourigny, J. (1978). Coverage. Chapter 2 in A Compendium of Methods of Error Evaluation in Censuses and Surveys, Statistics Canada Catalogue no. 13-564E, 7-9.

Lessler, J.T. and Kalsbeek, W.D. (1992). Nonsampling Errors in Surveys. Wiley, New York.

Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and Trewin, D. (eds.) (1997). Survey Measurement and Process Quality. Wiley, New York.

Statistics Canada (1996b). Proceedings of Symposium 1996: Nonsampling Errors.

Statistics Canada (2000d). Policy on Informing Users of Data Quality and Methodology. Policy Manual, 2.3.

Statistics Canada (2001e). Proceedings of Symposium 2001: Achieving Quality in a Statistical Agency: A Methodological Perspective.

Statistics Canada (2002c). Statistics Canada’s Quality Assurance Framework - 2002. Statistics Canada Catalogue no. 12-586-XIE.

Statistics Canada (2003b). Guidelines on the Use of Data Accuracy Criteria in the Dissemination of Statistical Information. Issued by the Methods and Standards Committee, July 2003.

Date Modified: 2014-04-10