Policy
- Statistics Canada will make available to users indicators of the quality of data it disseminates and descriptions of the underlying concepts and methodology.
- Statistical products will be accompanied by or make explicit reference to documentation on quality and methodology.
- Documentation on quality and methodology will conform to such standards and guidelines issued under this Policy.
- Exemption from the requirements of this Policy may occur in special circumstances.
- Sponsors of cost recovery surveys and statistical consultation work, for which no data will be disseminated by Statistics Canada, are to be made aware of and encouraged to conform to the applicable elements of the standards and guidelines issued under this Policy.
Standards and Guidelines on Documentation
These standards and guidelines describe the kind of documentation on data quality and methodology required to meet the Policy on Informing Users of Data Quality and Methodology.
The Elements of Quality
Among statistical agencies there is no commonly accepted definition of data quality for official statistics. Statistics Canada has defined data quality in terms of "fitness for use". Whether data and statistical information are fit for use depends on the intended uses and on intrinsic characteristics of the data or information. The essence of this Policy is that users must be provided with the information necessary to judge the fitness of the data for their intended use.
Six dimensions of quality have been identified within the concept of "fitness for use".
- The relevance of statistical information reflects the degree to which it meets the real needs of users. It is concerned with whether the available information sheds light on the issues of most importance to users. The assessment of relevance needs to take into account the varying needs of users.
- The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure. It is usually characterized in terms of error in statistical estimates and is traditionally decomposed into bias (systematic error) and variance (random error) components. It may also be described in terms of the major sources of error that potentially cause inaccuracy (e.g., coverage, sampling, nonresponse, response).
- The timeliness of statistical information refers to the delay between the reference point (or the end of the reference period) to which the information pertains, and the date on which the information becomes available. It is typically involved in a tradeoff against accuracy. The timeliness of information will influence its relevance.
- The accessibility of statistical information refers to the ease with which it can be obtained by users. This includes the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which the information can be accessed. The cost of the information may also be an aspect of accessibility for some users.
- The interpretability of statistical information reflects the availability of the supplementary information and metadata necessary to interpret and utilize it appropriately. This information normally covers the underlying concepts, variables and classifications used, the methodology of collection, and indicators of the accuracy of the statistical information. This Policy aims to ensure the interpretability of our information.
- The coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytic framework and over time. The use of standard concepts, classifications and target populations promotes coherence, as does the use of common methodology across surveys. Coherence does not necessarily imply full numerical consistency.
Documentation on data quality and methodology is an integral component of statistical data and analytical results based on these data. Such documentation provides the means of assessing fitness for use and contributes directly to their interpretability.
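The decomposition of accuracy into bias and variance, noted under the accuracy dimension above, is commonly summarized by the mean squared error. As an illustration only, with \hat{\theta} denoting an estimate and \theta the quantity the statistical activity was designed to measure (this notation is assumed here and is not part of the Policy):

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathrm{E}\big[(\hat{\theta} - \theta)^{2}\big]
  = \mathrm{Var}(\hat{\theta}) + \big(\mathrm{E}[\hat{\theta}] - \theta\big)^{2}
```

The first term on the right is the random error (variance); the second is the square of the systematic error (bias).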
Definitions
For purposes of these standards and guidelines the following definitions are used.
- Data accuracy measure: a numeric value, or symbol corresponding to numeric values, which quantifies or summarizes the likely magnitude and important sources of differences between the published data and the quantities that the statistical activity was designed to estimate.
- Data accuracy rating: a categorization or quantification of the accuracy of data based on expert judgement or analysis. It summarizes the accuracy of the data, or indicates the level of confidence with which the data may be used. Data accuracy ratings are appropriate when, by the nature of the data product, or for reasons of timeliness, cost or technical feasibility, data accuracy measures could not be given.
Data accuracy ratings need to be based on sound evidence and good judgement. They may assess the effect of a single source of error, or the overall accuracy. They may be based on macro comparisons with data from other sources, or on conclusions drawn from a review of "data accuracy measures". They may be simply statements or numeric rankings based on an expert's assessment of data sources or of a methodology.
- Indicators of data accuracy: data accuracy measures or data accuracy ratings. These may also be termed "accuracy indicators".
- Documentation on methodology: the description of the underlying concepts and methodology used in the implementation of a statistical program, including detailed definitions of the variables, terminology, indices, models, and estimators used. It also includes descriptions of changes affecting comparability of data over time, or other features of methodology affecting data quality.
- Statistical statement or analytical result: any statement or result that explicitly or implicitly indicates the underlying meaning or statistical significance of an estimate or finding. These include highlights, interpretations, statistical test results, and statements of trend, change or significance.
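The distinction between a data accuracy measure and a data accuracy rating can be illustrated with a minimal sketch. It assumes the measure is a coefficient of variation (CV) and that a rating is derived by reviewing that measure against letter bands. The thresholds, letters and function names below are hypothetical and are not prescribed by the Policy.

```python
# Hypothetical upper bounds on the CV (in percent) for each letter rating.
# These bands are illustrative only; they are not taken from the Policy.
RATING_BANDS = [(5.0, "A"), (15.0, "B"), (25.0, "C"), (35.0, "D")]
LOWEST_RATING = "E"


def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Data accuracy measure: standard error as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


def accuracy_rating(cv_percent: float) -> str:
    """Data accuracy rating: map a CV to the first band that contains it."""
    for upper_bound, letter in RATING_BANDS:
        if cv_percent <= upper_bound:
            return letter
    return LOWEST_RATING


cv = coefficient_of_variation(estimate=12500.0, standard_error=500.0)
print(f"CV = {cv:.1f}%, rating = {accuracy_rating(cv)}")  # CV = 4.0%, rating = A
```

A rating of this kind summarizes a single source of error (sampling). A rating of overall accuracy would need to draw on other evidence as well, as the definition above indicates.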
General Principles
The following general principles govern the implementation of the standards and guidelines.
- Users must be provided with the information necessary to understand both the strengths and limitations of the data being disseminated.
- The documentation provided to users on data quality should engender an awareness of quality as an issue in the proper use of the data.
- The documentation on methodology must permit users to assess whether the data adequately approximate what they wish to measure, and whether the estimates were produced with tolerances acceptable for their intended purpose.
- The documentation provided should be clear, well organized and accessible. Accuracy indicators should not be technically difficult for the intended clientele to understand or use.
- The descriptions of methodology and the indicators of data accuracy should be carefully integrated whenever this will benefit the user's understanding.
- Specific standards for the level of detail to be provided in documentation on data quality or methodology are given in the Mandatory Documentation Requirements section below. These are mandatory but minimum requirements. The need to go beyond these standards will depend on the benefit to users or, more specifically, on:
- the type of data collection, data sources, and analysis;
- the nature and purpose of the product;
- the range and impact of uses of the data;
- the medium of dissemination; and
- the total budget of the statistical program.
- The level of detail and the frequency of updating of the documentation on data quality for the purposes of the Policy should take into account:
- the intended uses of the data;
- the potential for error and its significance to the use of the data;
- variation in accuracy and coherence over time;
- cost of the evaluation of data quality relative to the overall cost of the statistical program;
- potential for subsequent improvement of quality and efficiency;
- applicability and utility of the indicators of accuracy to users.
Mandatory Documentation Requirements
A specific set of summary information on data quality and methodology must be presented or made available to users for each statistical product. The information should reflect the individual product. However, much of this summary documentation will be common to many products from the same statistical program.
Topics to be Included in Documentation
The summary documentation required by the Policy is to be organized according to the structure below. The bullets under each heading indicate the information that should be included (wherever applicable) in summary documentation. They are not intended to exclude any other information necessary for the proper interpretation or use of a particular information product, nor to disallow variation in the placement of material in the interests of clarity to the user. The exact content under each heading will depend on the individual program, on the types of data or results included in the product, and on whether there are important accuracy issues to describe. The numbering system used below is not part of the standard; it is provided here only for clarity.
1. Note(s) to Users (if applicable)
(Explanatory note: This item is to be included only if applicable. This topic may consist of highlights of information provided in one or more of the sections listed below, or particular explanations or warnings of which users should be aware.)
2. Name of survey or program - Concepts, Methodology and Data Quality
- a standardized message introducing the information on data quality and methodology and emphasising the importance of taking it into account.
2.1 Data Sources and Methodology
(i) General methodology
- the conceptual universe and the target population;
- a statement on the data source(s) and the sampling and collection methodology;
- a statement on the processing and estimation methodology.
(ii) Reference period
- a statement on the time frame or reference period of the data.
(iii) Revisions (if applicable)
- if applicable, a statement advising that the data are subject to revision and an indication of what the size of the revision might be - for example, a measure based on past revisions.
(iv) Adjustments (if applicable)
- if applicable, a description of benchmarking, calendarization or seasonal adjustments made to the data and their impact.
2.2 Concepts and Variables Measured
- key concepts, variables (or characteristics) and classifications used;
- key indicators, indices, or other key data or analytical results being disseminated.
2.3 Data Accuracy
- a statement of the key data accuracy issues, as well as an acknowledgement that the data are subject to error, and that the level of error may vary across geography and by characteristic (as applicable, such statements may emphasize the presence of coverage error, sampling error, error due to non-response, response error, and processing error, and may be incorporated in text with measures of accuracy);
- for census, survey or administrative data, a data accuracy measure of coverage, or at least a coverage rating;
- for sample survey data (or data from the sample component of a census), estimates of sampling error for key characteristics and a brief summary of the sample design;
- for census, survey or administrative data, a response rate and a statement on how non-response is handled, or an imputation rate or other measure of the extent of imputation and its contribution to the estimates.
2.4 Comparability of Data and Related Sources
- if applicable, a statement advising that the data are not or may not be comparable over time and why (including any significant change in data accuracy from one reference period to another).
2.5 Other Quality Indicators and Assessments (if applicable)
- for analytical results, a summary of the analytic approach or methods, as well as a brief description and discussion of the possible effects of accuracy issues, assumptions and caveats on the results and their statistical significance;
- a description of other important potential sources of error, or of any events (for example, a strike) which have likely influenced the accuracy, timeliness and interpretation or use of the data.
3. Appendices (as necessary)
The numbering system above is not part of the standard; it is used here only for clarity.
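Several of the accuracy items listed under Data Accuracy (a response rate, a measure of the contribution of imputation, and estimates of sampling error) reduce to simple ratios. The following is a minimal sketch under stated assumptions: the class, field and function names are hypothetical, the counts are invented for illustration, and the interval assumes an approximately normal estimator with a 95% multiplier of 1.96. Individual programs may define these measures differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CycleSummary:
    """Hypothetical inputs for one survey cycle (all names are illustrative)."""
    sampled_units: int      # in-scope units selected for the sample
    responding_units: int   # units that supplied usable data
    estimate: float         # published estimate of a total
    imputed_part: float     # portion of the estimate built from imputed values
    standard_error: float   # estimated sampling standard error of the estimate


def response_rate(s: CycleSummary) -> float:
    """Responding units as a percentage of sampled units."""
    if s.sampled_units <= 0:
        raise ValueError("sampled_units must be positive")
    return 100.0 * s.responding_units / s.sampled_units


def imputation_contribution(s: CycleSummary) -> float:
    """Share of the estimate, in percent, that comes from imputed values."""
    if s.estimate == 0:
        raise ValueError("estimate must be non-zero")
    return 100.0 * s.imputed_part / s.estimate


def confidence_interval(s: CycleSummary, z: float = 1.96) -> Tuple[float, float]:
    """Approximate interval: the estimate plus or minus z standard errors."""
    half_width = z * s.standard_error
    return (s.estimate - half_width, s.estimate + half_width)


cycle = CycleSummary(
    sampled_units=2000,
    responding_units=1600,
    estimate=50000.0,
    imputed_part=6000.0,
    standard_error=1000.0,
)
low, high = confidence_interval(cycle)
print(f"Response rate: {response_rate(cycle):.1f}%")
print(f"Contribution of imputation: {imputation_contribution(cycle):.1f}%")
print(f"Approximate 95% interval: {low:.0f} to {high:.0f}")
```

Reporting the contribution of imputation to the estimate, rather than only the count of imputed records, reflects the wording of the requirement, which asks for the extent of imputation and its contribution to the estimates.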
Guidelines for Additional Documentation
For major surveys and statistical programs there is good reason to provide users with more detailed or more specialized data quality and methodology information than that required by the Standards of the previous section. The supplementary documentation might cover topics specified in more detail, or might address topics not covered by the summary documentation.
The supplementary documentation might include "technical" documentation to afford analysts a greater understanding of accuracy issues and a fuller appreciation of the methodology. Such documentation may take many forms, from a comprehensive report to separate reports or chapters on specific aspects of methodology or data quality evaluation.
Potential topics or documentation to include in this supplementary documentation are:
- topics covered in the standards;
- historical quality trend or record - for any category or indicator of accuracy the long term record or trend;
- the questionnaire(s) used;
- the sampling frame - creation, updating, and quality assurance;
- the detailed sample design and estimation procedures;
- other processing - description of methods and indicators of the extent of coding errors, data capture errors, impact of edits, etc.;
- a description of the imputation approach and examples of key imputation rules;
- quality control procedures used;
- the form in which the final data are stored and the tabulation or retrieval system, including confidentiality protection requirements and procedures;
- any special procedures or other steps that might be relevant to the particular content of the product;
- total variance (or total standard error) or its components by source - the overall variability of the statistics, including the effect of sampling error, response error, and processing error;
- non-response bias - an assessment of the effect of non-response on the results;
- response bias - evidence of response bias problems stemming from respondent misunderstanding, questionnaire problems, or other sources;
- seasonal adjustment - description of the methodology and measures of the impact and significance of the adjustment together with an explanation of how these measures should be interpreted (for example, the mean absolute percent change of the last year's revisions of the seasonal factor, or the MCD - months for cyclical dominance - statistics); and
- data quality validation and evaluation - results and descriptions of the methodologies of the studies, processes and methods used to assess, measure or evaluate the accuracy of the data.
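One of the measures mentioned in the list above, a mean absolute percent change between preliminary and revised values, can be sketched as follows. This is an illustration under assumptions: the function name and the sample values are hypothetical, and a program may define its revision measure differently (for example, relative to the revised rather than the preliminary value).

```python
from typing import Sequence


def mean_absolute_percent_revision(
    preliminary: Sequence[float], revised: Sequence[float]
) -> float:
    """Average of |revised - preliminary| / |preliminary|, in percent."""
    if len(preliminary) != len(revised) or len(preliminary) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = 0.0
    for first, final in zip(preliminary, revised):
        if first == 0:
            raise ValueError("preliminary values must be non-zero")
        total += abs(final - first) / abs(first)
    return 100.0 * total / len(preliminary)


# Hypothetical preliminary and revised values for four periods.
preliminary_values = [100.0, 200.0, 50.0, 80.0]
revised_values = [101.0, 198.0, 50.0, 80.0]
mapr = mean_absolute_percent_revision(preliminary_values, revised_values)
print(f"Mean absolute percent revision: {mapr:.2f}%")
```

The same calculation applies whether the inputs are seasonal factors or published estimates, so it can also serve as the kind of measure based on past revisions that the summary documentation calls for.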
Statistics Canada's Quality Guidelines is a useful source of information to help identify what may be the important quality issues to be considered for inclusion in the supplementary documentation, as well as the potentially significant sources of error that might be examined in greater depth.
Electronic products for which additional documentation on data quality and methodology exists will normally have links to the additional documentation, either embedded in the product or accompanying it. Other products will contain an explicit reference to such additional documentation whenever it exists.
Special Requirements Based on the Type of Data
As a supplement to the general standards, the following items should be included, as applicable, in documentation for the corresponding types of products.
1. Index Numbers
For index numbers of prices or quantities, the conceptual basis presents an additional dimension in describing the data quality and methodology. Particular attention might be given to any substitutions made in developing the estimates, with special reference to product changes and changes in product quality. In addition, particular attention should be paid to specific conceptual and methodological aspects of the indices. Their proper description, in many cases, may be more important for users than a strict assessment of the quality of input data. The following elements should be developed:
- definitions - precise definitions of the underlying economic concepts that the index numbers are intended to measure. Reference should be made to any application or class of application (e.g., deflation of macro-economic aggregates) for which the index numbers are not suitable.
- the methodology adopted - documentation should cover topics such as the index formula, weighting system, computation of the index at various aggregation levels, basing, re-basing, linking of indices, treatment of changes in the varieties or qualities of goods available on the market. The adopted methodology should be compared with the underlying index concepts and possible distortions discussed.
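To make the terms index formula, weighting system and re-basing concrete, here is a minimal sketch that uses a fixed-basket (Laspeyres) price index. The choice of formula is an assumption made for illustration: the text above does not state which formula any particular index uses, and the goods, prices and function names are hypothetical.

```python
from typing import Dict


def laspeyres_index(
    base_prices: Dict[str, float],
    current_prices: Dict[str, float],
    base_quantities: Dict[str, float],
) -> float:
    """Cost of the base-period basket at current prices, base period = 100."""
    base_cost = sum(base_prices[g] * q for g, q in base_quantities.items())
    current_cost = sum(current_prices[g] * q for g, q in base_quantities.items())
    if base_cost == 0:
        raise ValueError("base-period basket cost must be non-zero")
    return 100.0 * current_cost / base_cost


def rebase(series: Dict[str, float], new_base_period: str) -> Dict[str, float]:
    """Re-express an index series so that new_base_period equals 100."""
    base_value = series[new_base_period]
    if base_value == 0:
        raise ValueError("index value in the new base period must be non-zero")
    return {period: 100.0 * value / base_value for period, value in series.items()}


# Hypothetical two-good basket: the quantities act as the weighting system.
quantities = {"bread": 10.0, "milk": 20.0}
prices_base = {"bread": 2.0, "milk": 1.0}
prices_now = {"bread": 2.5, "milk": 1.0}
print(laspeyres_index(prices_base, prices_now, quantities))  # 112.5

# Re-basing a hypothetical series so that 2021 becomes the reference period.
series = {"2020": 100.0, "2021": 112.5, "2022": 125.0}
print({p: round(v, 1) for p, v in rebase(series, "2021").items()})
```

Because the basket is held fixed, this sketch ignores changes in the varieties or qualities of goods. The treatment of those changes is exactly the kind of methodological choice the documentation is expected to describe.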
2. National Accounts and data resulting from other data integration activities
In the case of National Accounts and data resulting from other data integration activities, both the impact of quality problems in the source data, and the impact of the methods of analysis, integration, benchmarking and adjustments used, have to be taken into account. Given the multiplicity of data sources and the complexity of methods, it may be necessary to use data accuracy ratings. In particular, it may be necessary and desirable to consolidate the ratings for all major and assessable components or sources of error into a single set of data accuracy ratings.
Documentation for data and analytical results based on data integration activities (including the System of National Accounts) should, in particular, cover the following topics:
- the conceptual framework for the analysis and integration;
- the major definitions and concepts used and how they are defined operationally;
- the data sources used, and the extent to which they measure the target concepts, as well as gaps and deficiencies in these data sources. Non-comparability of data elements available from different sources should be noted. Reference should be made to the quality of the primary data underlying the analysis;
- the methods used in integrating and analysing the data from feeder sources including, where relevant, the adjustments made to data from different sources, the methods used for price deflation, the methods used for seasonal adjustment and benchmarking, and a description of the revision process; and
- any discrepancy arising in the integration or analysis of data from different sources, and the procedures by which these discrepancies were handled (e.g., the statistical discrepancy arising in the estimation of income and expenditure accounts).
3. Statistics derived from administrative data or from data not collected by Statistics Canada
The topics listed under Mandatory Documentation Requirements should be covered to the extent applicable. However, since these statistics may be based on data not originally collected for statistical purposes, the following topics take on particular importance and should be covered:
- the data sources;
- the purposes for which the data were originally collected;
- the merits and shortcomings of the data for the statistical purpose for which they are being used (e.g., in terms of conceptual and coverage biases);
- how the data are processed after being received and what, if anything, is done to correct problems in the original data set; and
- the reliability of the estimates, including caveats where necessary.
4. Documentation for geographic and cartographic data products
Documentation should include descriptions of the data sources and transformations, along with descriptions or references to the methodology and indicators of data accuracy corresponding to these sources. Documentation should also include descriptions or indicators of the positional accuracy, logical consistency and completeness of the product data.
5. Products that include primarily or only analytical results
Documentation should be provided on both the source data and the method of analysis. The requirements for documentation on the source data are similar to those for other products and can be met by including, linking to, or referring to the corresponding information for the data source(s). The documentation of the methods of analysis may be incorporated into the product either as part of the presentation of the analytical results in the body of the report, or in separate "text boxes". Such "text boxes" might also include summary information on the data source (in addition to links or references to the source documentation). Documentation on the analysis should also note the use of the Policy on Review of Information Products as a quality assurance methodology.
For products that consist of a series of analytic reports in the same broad subject area, it may be possible to present or embed the mandatory information common to all or most of the individual reports at the beginning of the product. Information specific to the individual reports would then be included in those individual reports.
Specifically, for products that present analytical results, the documentation of the analysis should cover the following:
- data source(s) used;
- key features of the methodology and accuracy of the source data pertinent to the analysis;
- analytical objectives, concepts and variables;
- analytic methods used, and their assumptions and caveats;
- statistical significance of the results, and any relevant conflicting or corroborating results; and
- appropriate use of the results.