Quality assurance

Results

All (24) (results 1 to 10 of 24)

1. Creation of a composite quality indicator for administrative data-based estimates using clustering (Archived)
Articles and reports: 11-522-X202100100015
Description:
National statistical agencies such as Statistics Canada have a responsibility to convey the quality of statistical information to users. The methods traditionally used to do this are based on measures of sampling error, so they are not suited to estimates produced from administrative data, for which the main sources of error are not due to sampling. This paper describes a more suitable approach to reporting the quality of estimates presented in a multidimensional table. Quality indicators were derived, by estimation domain, for various post-acquisition processing steps such as linkage, geocoding and imputation. A clustering algorithm was then used to combine domains with similar quality levels for a given estimate, and the resulting groups were assigned ratings that inform users of the relative quality of estimates across domains. This indicator, called the composite quality indicator (CQI), was developed and tested in the Canadian Housing Statistics Program (CHSP), which aims to produce official statistics on the residential housing sector in Canada using multiple administrative data sources.
Keywords: Unsupervised machine learning, quality assurance, administrative data, data integration, clustering.
Release date: 2021-10-22
2. Distributions of Household Economic Accounts, estimates of asset, liability and net worth distributions, 2010 to 2020, technical methodology and quality report
Articles and reports: 13-604-M2021001
Description:
This documentation outlines the methodology used to develop the Distributions of household economic accounts published in September 2021 for the reference years 2010 to 2020. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.
Release date: 2021-09-07
3. Distributions of Household Economic Accounts, estimates of asset, liability and net worth distributions, 2010 to 2019, technical methodology and quality report
Articles and reports: 13-604-M2020002
Description:
This documentation outlines the methodology used to develop the Distributions of household economic accounts published in June 2020 for the reference years 2010 to 2019. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.
Release date: 2020-06-26
4. The challenges of linking and using administrative data from different sources (Archived)
Articles and reports: 11-522-X201700014724
Description:
At the Institut national de santé publique du Québec, the Quebec Integrated Chronic Disease Surveillance System (QICDSS) has been used daily for approximately four years. The system offers many benefits: it allows the extent of diseases to be measured more accurately, the use of health services to be evaluated properly and certain at-risk groups to be identified. However, in recent months, various problems have arisen that have required a great deal of careful thought. These problems have affected several areas of activity, such as data linkage, data quality, coordination among multiple users and compliance with legal obligations. The purpose of this presentation is to describe the main challenges associated with using QICDSS data and to present some possible solutions. In particular, it discusses the processing of five data files that not only come from five different sources but were also not designed primarily for chronic disease surveillance. The varying quality of the data, both across files and within a given file, is also discussed, as are certain situations arising from the simultaneous use of the system by multiple users. Examples are given of analyses of large data sets that have caused problems. Finally, a few challenges involving disclosure and the fulfillment of legal agreements are briefly discussed.
Release date: 2016-03-24
5. Comparison of Physical Activity Adult Questionnaire results with accelerometer data (Archived)
Articles and reports: 82-003-X201500714205
Description:
Discrepancies between self-reported and objectively measured physical activity are well known. For the purpose of validation, this study compares a new self-reported physical activity questionnaire with an existing one and with accelerometer data.
Release date: 2015-07-15
6. Validation of cognitive functioning categories in the Canadian Community Health Survey - Healthy Aging (Archived)
Articles and reports: 82-003-X201000411391
Geography: Canada
Description:
This analysis uses data from the Cognition Module of the 2009 Canadian Community Health Survey - Healthy Aging to validate a categorization of levels of cognitive functioning in the household population aged 45 or older.
Release date: 2010-12-15
7. Validation of self-rated mental health (Archived)
Articles and reports: 82-003-X201000311288
Geography: Canada
Description:
This article assesses the association between self-rated mental health and selected disorders measured by the World Mental Health - Composite International Diagnostic Interview (WMH-CIDI), self-reported diagnoses of mental disorders, and psychological distress in the Canadian population.
Release date: 2010-07-21
8. Accounting for uncertainty in the evaluation of data collection costs and data quality under partitioned designs for the U.S. Consumer Expenditure Surveys (Archived)
Articles and reports: 11-522-X200800010991
Description:
In the evaluation of prospective survey designs, statistical agencies generally must consider a large number of design factors that may have a substantial impact on both survey costs and data quality. Assessments of trade-offs between cost and quality are often complicated by limited information on the fixed and marginal costs related to the following factors: instrument redesign and field testing; the number of primary sample units and sample elements included in the sample; the assignment of instrument sections and collection modes to specific sample elements; and (for longitudinal surveys) the number and periodicity of interviews. Similarly, designers often have limited information on the impact of these design factors on data quality.
This paper extends standard design-optimization approaches to account for uncertainty in the above-mentioned components of cost and quality. Special attention is directed toward the level of precision required for cost and quality information to provide useful input into the design process; the sensitivity of cost-quality trade-offs to changes in assumptions regarding functional forms; and the implications for preliminary work focused on the collection of cost and quality information. In addition, the paper considers the distinctions between cost and quality components encountered in field testing and in production work; the incorporation of production-level cost and quality information into adaptive design work; and the costs and operational risks arising from the collection of detailed cost and quality data during production work. The proposed methods are motivated by, and applied to, work on the partitioned redesign of the interview and diary components of the U.S. Consumer Expenditure Surveys.
Release date: 2009-12-03
9. Under-reporting of energy intake in the Canadian Community Health Survey (Archived)
Articles and reports: 82-003-X200800410703
Geography: Canada
Description:
Data from 16,190 respondents to the 2004 Canadian Community Health Survey - Nutrition were used to estimate under-reporting of food intake for the population aged 12 or older in the 10 provinces.
Release date: 2008-10-15
10. Common metadata constructs for statistical data (Archived)
Articles and reports: 11-522-X20050019436
Description:
Regardless of the specifics of any given metadata scheme, there are common metadata constructs used to describe statistical data. This paper gives an overview of the different approaches taken to achieve the common goal of providing consistent information.
Release date: 2007-03-02

Data (0)

No content available at this time.

Analysis (16) (results 1 to 10 of 16)

Results 1 to 10 in this category are the same ten articles listed under All above.

Reference (8) (8 results)

1. Survey of Household Spending 2002: Data Quality Indicators (Archived)
Surveys and statistical programs – Documentation: 62F0026M2004001
Description:
This report describes the quality indicators produced for the 2002 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.
Release date: 2004-09-15
2. 2000 Survey of Household Spending - Data Quality Indicators (Archived)
Surveys and statistical programs – Documentation: 62F0026M2002001
Description:
This report describes the quality indicators produced for the 2000 Survey of Household Spending. It covers the usual quality indicators that help users interpret the data, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates.
Release date: 2002-06-28
3. 1999 Survey of Household Spending Data Quality Indicators (Archived)
Surveys and statistical programs – Documentation: 62F0026M2001002
Description:
This report describes the quality indicators produced for the 1999 Survey of Household Spending. It covers the usual quality indicators that help users interpret data, such as coefficients of variation, nonresponse rates, imputation rates and the impact of imputed data on the estimates. It also includes less commonly used indicators, such as slippage rates and measures of the representativity of the sample for particular characteristics, which are useful for evaluating the survey methodology.
Release date: 2001-10-15
4. A comparison of two record linkage procedures (Archived)
Surveys and statistical programs – Documentation: 11-522-X19990015664
Description:
Much work on probabilistic methods of linkage can be found in the statistical literature. However, although many groups undoubtedly still use deterministic procedures, little literature is available on these strategies. Furthermore, there appears to be no documentation comparing the results of the two strategies. Such a comparison is pertinent when the only identifiers common to the databases to be linked are non-unique ones, such as name, sex and race. In this work, we compare a stepwise deterministic linkage strategy with the probabilistic strategy, as implemented in AUTOMATCH, for such a situation. The comparison was carried out on a linkage between medical records from the Regional Perinatal Intensive Care Centers database and education records from the Florida Department of Education. Social security numbers, available in both databases, were used to determine the true status of each record pair after matching. Match rates and error rates for the two strategies are compared, and their similarities and differences, strengths and weaknesses are discussed.
Release date: 2000-03-02
5. Multilevel models for repeated binary outcomes: Attitudes and vote over the electoral cycle (Archived)
Surveys and statistical programs – Documentation: 11-522-X19980015016
Description:
Models for fitting longitudinal binary responses are explored using a panel study of voting intentions. A standard repeated-measures multilevel logistic model is shown to be inadequate because a substantial proportion of respondents maintain a constant response over time. A multivariate binary response model is shown to fit the data better.
Release date: 1999-10-22
6. Multilevel modeling of complex data structures with multiple unit membership and missing unit identifications (Archived)
Surveys and statistical programs – Documentation: 11-522-X19980015018
Description:
This paper presents a method for handling longitudinal data in which individuals belong to more than one unit at a higher level, and in which information identifying the units to which they belong is missing. In education, for example, a student might be classified as belonging sequentially to a particular combination of primary and secondary school, but for some students, the identity of either the primary or secondary school may be unknown. Likewise, in a longitudinal study, students may change school or class from one period to the next, thus 'belonging' to more than one higher-level unit. The procedures used to model these structures are extensions of a random-effects cross-classified multilevel model.
Release date: 1999-10-22
7. A latent class model for the transition from school to working life in presence of missing data (Archived)
Surveys and statistical programs – Documentation: 11-522-X19980015024
Description:
A longitudinal study of a cohort of secondary school pupils has been conducted in an Italian region since 1986 to study the transition from school to working life. Information was collected at every sweep by a mail questionnaire and, at the final sweep, by a face-to-face interview in which retrospective questions covering the whole observation period were asked. The gross flows between different discrete states (still in the school system, in the labour force without a job, in the labour force with a job) may then be estimated from both prospective and retrospective data, and the recall effect may be evaluated. Moreover, the conditions observed by the two techniques may be regarded as two indicators of the 'true' unobservable condition, which leads to the specification and estimation of a latent class model. In this framework, a Markov chain hypothesis may be introduced and evaluated in order to estimate the transition probabilities between states, once they are corrected for classification errors. Since the information collected by mail contains a certain amount of missing data due to unit nonresponse, a 'missing' category is also introduced in the model specification.
Release date: 1999-10-22
8. Data Quality of Income Data Using Computer-assisted Interviewing: SLID Experience (Archived)
Surveys and statistical programs – Documentation: 75F0002M1994015
Description:
This paper describes how the computer-assisted interviewing (CAI) income application was programmed for a Survey of Labour and Income Dynamics (SLID) test conducted in 1993.
Release date: 1995-12-30


Date modified: 2024-04-19