Data quality evaluation

Scope and purpose
Principles
Guidelines
Quality indicators
References

Scope and purpose

Sound data quality practices are built into all survey steps, as described in the other chapters of this document. A data quality evaluation is a process to determine how well final products meet the original objectives of the statistical activity, in particular in terms of accuracy, timeliness and coherence. It allows users to better interpret survey results and helps the Agency improve the quality of its surveys.

There are two broad methods of evaluating data quality:

  • Certification or validation: data are analysed before official release with a view to avoiding gross errors and eliminating poor quality data. Comparisons with external or auxiliary sources are often the tool of choice at this stage;

  • Studies of the sources of error: these generally provide quantitative information on the specific sources of errors in the data.

The data quality evaluation draws on the quality indicators produced in all survey steps. Methods for evaluating quality at each of those steps are given throughout this document. Although a single one-dimensional index of quality is often considered impossible to compile, the various indicators can be summarized and compared in terms of their relative importance and consequences.

Principles

Data quality evaluations should be conducted to determine to what extent the statistical information is relevant and representative. Users, however, are rarely able to evaluate independently the quality of data produced by a statistical agency. It is therefore up to each agency to evaluate data quality and promptly provide users with the results in a usable form.

Data quality evaluations should be conducted to determine the extent to which errors can be associated with certain stages of the survey process. Such evaluations can be used to improve the quality of the next iteration of the survey, as well as other similar surveys.

Data quality evaluations at Statistics Canada must be designed to meet the mandatory and minimum requirements of the Policy on Informing Users of Data Quality and Methodology (Statistics Canada, 2000). The minimum requirements include measuring or evaluating coverage errors and response or imputation rates and, for sample surveys, measuring the sampling errors of key characteristics.

Guidelines

Design

  • Determine the extent of data quality evaluation required for a program or a product. The factors to consider are: data uses and users; the risk of errors and their impact on data use; quality variation over time; the cost of the evaluation in relation to the total cost of the program; the potential for improving quality, efficiency and productivity; the usefulness of the measures to users and their ease of interpretation; and whether the survey is one-time or repeated.

  • Make planning of data quality evaluations part of the overall survey design, as the information needed for such evaluations must often be collected during the survey process. Data quality reports should be included in the dissemination schedule for the survey.

  • Data quality evaluation results should be valid and timely enough to improve released data. When this is not possible, evaluation results should at least be timely enough to help users to analyse the data and survey takers to improve the design of the next iteration of the survey or similar surveys.

Execution

  • Provide a quality evaluation based on expert opinion or subjective analysis whenever data quality evaluations will not yield quantitative measurements because of the nature of the product, the user, time constraints, cost or technical feasibility.

  • In the case of repeated surveys and statistical activities, it may not be necessary, or even possible, to produce detailed quality evaluations every time. However, review each activity periodically to ensure that it meets its objectives, not only when problems arise.

  • Involve users of evaluation results, whether they are associated with a statistical agency or not, in establishing the data quality evaluation program objectives. When circumstances permit, also involve them in the evaluation process.

  • To perform these evaluations, the survey manager or survey management team must first identify the targets or standards to be attained.

Certification or validation

  • Certification or validation of the statistical information should be conducted whenever appropriate or possible.

  • Certification or validation should challenge rather than rationalize the data. Involving analysts who did not take part in producing the data is recommended.

  • Check coherence in relation to external data sources such as other surveys, other iterations of the same survey or administrative data.

  • Check internal coherence, for example, by calculating ratios that are known to be within certain limits (male-female ratios, average values of properties, etc.).

  • Analyse the largest units individually with regard to their contribution to overall estimates (generally applied to business surveys).

  • Review and interpret data quality indicators suggested in the other sections of this document and compare them to production targets.

  • Hold feedback sessions with staff involved in data collection and processing.

  • Conduct "reasonableness" checks by well-informed experts, including prerelease external review in the form of "work in progress".
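The internal coherence and largest-unit checks above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a Statistics Canada procedure: the function names, the input layout (dictionaries of weighted domain totals, and (unit_id, value, weight) tuples) and the bounds used in the example are all hypothetical.

```python
from typing import Dict, List, Tuple


def ratios_out_of_bounds(
    numerators: Dict[str, float],
    denominators: Dict[str, float],
    low: float,
    high: float,
) -> Dict[str, float]:
    """Flag domains whose ratio of weighted totals falls outside [low, high].

    For example, numerators could hold weighted male counts and denominators
    weighted female counts by domain (hypothetical inputs).
    """
    flagged = {}
    for domain, num in numerators.items():
        den = denominators.get(domain, 0.0)
        # A missing or zero denominator is treated as an infinite ratio and flagged.
        ratio = num / den if den else float("inf")
        if not low <= ratio <= high:
            flagged[domain] = ratio
    return flagged


def largest_contributors(
    units: List[Tuple[str, float, float]], k: int
) -> List[Tuple[str, float, float]]:
    """Return the k units with the largest absolute weighted contribution.

    Each unit is (unit_id, reported_value, survey_weight); each result is
    (unit_id, weighted_contribution, share_of_weighted_total).
    """
    contributions = [(uid, value * weight) for uid, value, weight in units]
    total = sum(c for _, c in contributions)
    if total == 0:
        raise ValueError("weighted total is zero; shares are undefined")
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    return [(uid, c, c / total) for uid, c in contributions[:k]]


# Hypothetical weighted totals by domain, checked against illustrative bounds.
males = {"A": 10_200.0, "B": 15_000.0}
females = {"A": 10_000.0, "B": 9_000.0}
print(ratios_out_of_bounds(males, females, low=0.9, high=1.1))

# Hypothetical business units: (unit_id, reported revenue, survey weight).
units = [("u1", 500.0, 1.0), ("u2", 20.0, 10.0), ("u3", 5.0, 20.0), ("u4", 1.0, 50.0)]
for uid, contribution, share in largest_contributors(units, k=2):
    print(f"{uid}: {contribution:.0f} ({share:.1%} of total)")
```

Flagged domains and dominant units are prompts for the individual review described above, not automatic edits to the data.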

Studies of sources of errors

  • Studies of sources of errors should be conducted frequently on annual or longer statistical programs, and conducted occasionally on more frequent programs.

  • Consider evaluating, among other sources, errors due to coverage, sampling, nonresponse, measurement and processing based on the results of studies conducted in other survey steps.

Quality indicators

  • Quality indicators have been suggested in each of the preceding chapters to measure characteristics specifically related to the topic of that chapter. However, it is also suggested that some measures relate to the project as a whole rather than to any one particular step. These indicators often cannot be measured until the product has been released and, in some cases, not until long after the release. Indicators that are not available until after release cannot be included in the documentation of the product, but they can serve as an indication of the potential quality of a subsequent iteration of the program or of a similar program. These indicators could include the following:

Timeliness

  • How long did the project take from start to finish? How much time elapsed between the end of the reference period and the release?

  • How long after collection did the estimates for the main characteristics become available?
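The timeliness questions above reduce to date arithmetic. The sketch below assumes, hypothetically, that the project is considered finished on the release date of the main estimates and that four dates are recorded for each iteration; the function name and the example dates are illustrative only.

```python
from datetime import date
from typing import Dict


def timeliness_indicators(
    project_start: date,
    reference_period_end: date,
    collection_end: date,
    release: date,
) -> Dict[str, int]:
    """Elapsed days to the release of the main estimates (release assumed to end the project)."""
    for label, earlier in (
        ("project_start", project_start),
        ("reference_period_end", reference_period_end),
        ("collection_end", collection_end),
    ):
        if release < earlier:
            raise ValueError(f"release precedes {label}")
    return {
        "project_duration": (release - project_start).days,
        "after_reference_period": (release - reference_period_end).days,
        "after_collection": (release - collection_end).days,
    }


# Hypothetical dates for one iteration of an annual survey.
print(
    timeliness_indicators(
        project_start=date(2023, 1, 9),
        reference_period_end=date(2023, 12, 31),
        collection_end=date(2024, 3, 15),
        release=date(2024, 6, 20),
    )
)
```

Recording these values for each iteration makes it easy to see whether timeliness is improving or deteriorating over time.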

Relevance

  • Do the results respond to the goals of the project and the analytical needs of the community?

  • Were there any operational steps or constraints which meant that certain populations may not have been included or certain questions could not be asked?

  • Contrast planned outcomes and realized outcomes; justify discrepancies. 

Interpretability

  • Review the completeness of the documentation.

  • Track the number of requests for information, in particular requests for clarification. This is especially important for repeated surveys. Determine whether such requests reflect a fundamental flaw in the conceptual framework or gaps in the available documentation.

Accuracy

  • Was the project able to produce estimates of the desired quality for all of the domains and variables that were planned? This could be expressed as a percentage, e.g., 86% of all planned estimates met the CV targets.

  • For repeated surveys, compare key estimates and their quality (CV) with previous results, and be able to explain any changes. Express changes in the CV as a percentage increase or decrease relative to previous iterations. Similar statistics can be produced for imputation rates, error rates, etc.

  • For non-repeated surveys, consider comparing the estimates with related administrative data or estimates from other surveys. Since the populations may differ somewhat, the reasons for any differences may need to be explained.
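The accuracy indicators above can be computed from the estimates and their standard errors. The sketch below assumes the usual definition of the coefficient of variation (standard error divided by the estimate, expressed as a percentage), counts any planned estimate that could not be produced as missing its target, and reports changes as relative percentages rather than percentage points. The function names, the 15% target and the example values are hypothetical.

```python
from typing import Dict, Optional


def cv_percent(estimate: float, standard_error: float) -> float:
    """Coefficient of variation in percent: 100 * SE / |estimate|."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    return 100.0 * standard_error / abs(estimate)


def percent_meeting_cv_target(cvs: Dict[str, Optional[float]], target: float) -> float:
    """Percentage of planned estimates whose CV is at or below the target.

    A value of None marks a planned estimate that could not be produced;
    it is counted as not meeting the target.
    """
    if not cvs:
        raise ValueError("no planned estimates")
    met = sum(1 for cv in cvs.values() if cv is not None and cv <= target)
    return 100.0 * met / len(cvs)


def relative_change_percent(current: float, previous: float) -> float:
    """How much higher (+) or lower (-) the current value is than the previous one, in percent."""
    if previous == 0:
        raise ValueError("relative change is undefined when the previous value is zero")
    return 100.0 * (current - previous) / previous


# Hypothetical planned estimates and their CVs (in %); None = not produced.
cvs = {
    "national total": cv_percent(250.0, 10.0),
    "domain A": 4.8,
    "domain B": 17.5,
    "domain C": None,
}
print(f"{percent_meeting_cv_target(cvs, target=15.0):.0f}% of planned estimates met the CV target")

# The same formula applies to CVs, imputation rates, error rates, etc.
print(f"National CV changed by {relative_change_percent(4.0, 5.0):+.1f}% since the previous iteration")
```

Whether a change is reported as a relative percentage or in percentage points is a presentation choice; stating which one is used avoids misreading.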

Coherence

  • Review reasons for differences in results from previous iterations and try to quantify them (e.g., "The survey now covers the Territories. If they had not been included as in previous iterations, the national estimate would have been 31.4% rather than 31.5%").

  • Review the survey results and those of external sources; address discrepancies.
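Quantifying the effect of a coverage change, as in the Territories example above, amounts to recomputing the estimate without the newly covered domain. The sketch below does this for a weighted percentage. It is a simplification that keeps the current weights unchanged; a production estimate would normally be recomputed with the weighting that applied under the previous coverage. The record layout, domain labels and values are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Record = Tuple[str, bool, float]  # (domain, has_characteristic, survey_weight)


def weighted_percent(
    records: Iterable[Record], exclude: FrozenSet[str] = frozenset()
) -> float:
    """Weighted percentage of units with the characteristic, skipping excluded domains."""
    numerator = denominator = 0.0
    for domain, has_characteristic, weight in records:
        if domain in exclude:
            continue
        denominator += weight
        if has_characteristic:
            numerator += weight
    if denominator == 0:
        raise ValueError("no weight left after exclusions")
    return 100.0 * numerator / denominator


# Hypothetical microdata summarised to two coverage groups.
records = [
    ("provinces", True, 300.0),
    ("provinces", False, 700.0),
    ("territories", True, 6.0),
    ("territories", False, 4.0),
]
current = weighted_percent(records)
previous_coverage = weighted_percent(records, exclude=frozenset({"territories"}))
print(f"With new coverage: {current:.1f}%; with previous coverage: {previous_coverage:.1f}%")
print(f"Effect of the coverage change: {current - previous_coverage:+.1f} percentage points")
```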

Accessibility

  • Provide a description of the types and formats of products from the survey.

  • Report the number of times a survey product was viewed or accessed on a publicly accessible internet site.

  • Indicate whether the survey data are available in a Public Use Microdata File, whether any free data products exist, and whether the data are available in the Research Data Centres.

References

Biemer, P., R.M. Groves, N.A. Mathiowetz, L. Lyberg and S. Sudman (eds.) 1991. Measurement Errors in Surveys. New York. Wiley. 760 p.

Biemer, P. and L. Lyberg. 2003. Introduction to Survey Quality. New York. Wiley. 424 p.

Fuller, W. 1987. Measurement Error Models. New York. Wiley. 440 p.

Lessler, J.T. and W.D. Kalsbeek. 1992. Nonsampling Errors in Surveys. New York. Wiley. 432 p.

Lyberg, L., P. Biemer, M. Collins, E. de Leeuw, C. Dippo, N. Schwarz and D. Trewin (eds.) 1997. Survey Measurement and Process Quality. New York. Wiley. 808 p.

Statistics Canada. 2000. "Policy on Informing Users of Data Quality and Methodology." Statistics Canada Policy Manual. Section 2.3. Last updated March 4, 2009.

Statistics Canada. 2002. Statistics Canada's Quality Assurance Framework - 2002. Statistics Canada Catalogue no. 12-586-XIE. Ottawa, Ontario. 28 p. /bsolc/olc-cel/olc-cel?lang=eng&catno=12-586-X.

Statistics Canada. 2003. Survey Methods and Practices. Statistics Canada Catalogue no. 12-587-XPE. Ottawa, Ontario. 396 p. /bsolc/olc-cel/olc-cel?lang=eng&catno=12-587-X.
