Statistics by subject – Statistical methods

Other available resources to support your research: browse our central repository of key standard concepts, definitions, data sources and methods.

Data (0 results)

Analysis (25 of 172 results shown)

  • Articles and reports: 12-001-X201700254897
    Description:

    This note by Chris Skinner presents a discussion of the paper “Sample survey theory and methods: Past, present, and future directions” where J.N.K. Rao and Wayne A. Fuller share their views regarding the developments in sample survey theory and methods covering the past 100 years.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201700254871
    Description:

    This paper addresses the question of how alternative data sources, such as administrative and social media data, can be used in the production of official statistics. Since most surveys at national statistical institutes are conducted repeatedly over time, a multivariate structural time series modelling approach is proposed to model the series observed by a repeated survey together with related series obtained from such alternative data sources. Generally, this improves the precision of the direct survey estimates by using sample information observed in preceding periods and information from related auxiliary series. The model also makes it possible to exploit the higher frequency of the social media series to produce more precise estimates for the sample survey in real time, at moments when statistics from the social media are available but the sample data are not yet. The concept of cointegration is applied to address the extent to which the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey and a sentiment index derived from social media.

    Release date: 2017-12-21
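
    As a rough illustration of the idea (not the authors' multivariate model), the sketch below fits a univariate local-level structural time series model to a simulated survey series with a simulated sentiment index as an auxiliary regressor, using statsmodels; all series and settings are made up.

```python
# Sketch: local-level structural time series with an auxiliary regressor,
# illustrating (in simplified, univariate form) the idea of combining a
# repeated survey series with a related social media series.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 120                                              # monthly observations (illustrative)
level = np.cumsum(rng.normal(0, 0.3, n))             # slowly evolving "true" level
sentiment = level + rng.normal(0, 0.5, n)            # auxiliary series tracking the level
survey = level + rng.normal(0, 1.0, n)               # noisier direct survey estimates

idx = pd.period_range("2008-01", periods=n, freq="M").to_timestamp()
y = pd.Series(survey, index=idx, name="consumer_confidence")
x = pd.DataFrame({"sentiment": sentiment}, index=idx)

# Local level ("random walk plus noise") model with the sentiment index as exog.
model = sm.tsa.UnobservedComponents(y, level="local level", exog=x)
result = model.fit(disp=False)
print(result.summary())

# The smoothed level is less noisy than the direct estimates because it borrows
# strength over time and from the auxiliary series.
smoothed_level = result.smoothed_state[0]
```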

  • Articles and reports: 13-605-X201700114840
    Description:

    Statistics Canada is preparing the statistical system to gauge the impact of the transition from illegal to legal non-medical cannabis use and to shed light on the social and economic activities related to the use of cannabis thereafter. While the system of social statistics captures some information on the use of cannabis, updates will be required to measure health effects and the impact on the judicial system more accurately. The statistical infrastructure currently used to measure the use and impacts of substances such as tobacco and alcohol could be adapted to do the same for cannabis. However, available economic statistics are largely silent on the role illegal drugs play in the economy. Both social and economic statistics will need to be updated to reflect the legalization of cannabis, and the challenge is especially great for economic statistics. This paper summarizes the work now under way toward these ends.

    Release date: 2017-09-28

  • Articles and reports: 82-003-X201601214687
    Description:

    This study describes record linkage of the Canadian Community Health Survey and the Canadian Mortality Database. The article explains the record linkage process and presents results about associations between health behaviours and mortality among a representative sample of Canadians.

    Release date: 2016-12-21

  • Articles and reports: 12-001-X201600214663
    Description:

    We present theoretical evidence that efforts during data collection to balance the survey response with respect to selected auxiliary variables will improve the chances of low nonresponse bias in the estimates that are ultimately produced by calibrated weighting. One of our results shows that the variance of the bias – measured here as the deviation of the calibration estimator from the (unrealized) full-sample unbiased estimator – decreases linearly as a function of the response imbalance, which we assume is measured and controlled continuously over the data collection period. An attractive prospect is thus a lower risk of bias if one can manage the data collection to achieve low imbalance. The theoretical results are validated in a simulation study with real data from an Estonian household survey.

    Release date: 2016-12-20
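
    For readers unfamiliar with calibrated weighting, the sketch below shows a standard linear (GREG-type) calibration of respondent design weights to known auxiliary totals; the data, weights and population totals are invented, and the paper's imbalance measures are not reproduced.

```python
# Sketch: linear calibration of respondents' design weights so that the weighted
# auxiliary totals match known population totals (the "calibrated weighting"
# referred to in the abstract). Illustrative data only.
import numpy as np

rng = np.random.default_rng(1)
n_resp = 200
d = np.full(n_resp, 25.0)                          # design weights of respondents
x = np.column_stack([np.ones(n_resp),              # intercept (population size)
                     rng.integers(0, 2, n_resp),   # e.g., sex indicator
                     rng.normal(40, 12, n_resp)])  # e.g., age
t_x = np.array([5000.0, 2450.0, 5000.0 * 41.0])    # known population totals for x

# Linear calibration: w_i = d_i * (1 + x_i' lambda), with lambda solving the
# calibration equations  sum_i w_i x_i = t_x.
A = (d[:, None] * x).T @ x
lam = np.linalg.solve(A, t_x - d @ x)
w = d * (1.0 + x @ lam)

print("calibrated totals:", w @ x)   # matches t_x up to rounding
print("target totals:   ", t_x)
```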

  • Articles and reports: 12-001-X201600114545
    Description:

    The estimation of quantiles is an important topic not only in the regression framework, but also in sampling theory. A natural alternative or addition to quantiles is the expectile. Expectiles, as a generalization of the mean, have become popular in recent years because they not only give a more detailed picture of the data than the ordinary mean, but can also serve as a basis for calculating quantiles through their close relationship. We show how to estimate expectiles under sampling with unequal probabilities and how expectiles can be used to estimate the distribution function. The resulting fitted distribution function estimator can be inverted to obtain quantile estimates. We run a simulation study to investigate and compare the efficiency of the expectile-based estimator.

    Release date: 2016-06-22
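
    The sketch below illustrates the expectile idea in a simplified, design-weighted form: expectiles computed by asymmetric least squares and a quantile obtained by inverting an estimated distribution function (here simply the weighted empirical CDF, not the authors' expectile-based estimator); all data and weights are simulated.

```python
# Sketch: design-weighted expectiles via asymmetric least squares, and a crude
# quantile estimate obtained by inverting an estimated distribution function.
import numpy as np

def weighted_expectile(y, w, tau, tol=1e-10, max_iter=200):
    """tau-expectile of y with survey weights w (tau = 0.5 gives the weighted mean)."""
    m = np.average(y, weights=w)
    for _ in range(max_iter):
        a = w * np.where(y < m, 1.0 - tau, tau)   # asymmetric weights
        m_new = np.average(y, weights=a)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

rng = np.random.default_rng(7)
y = rng.lognormal(mean=3.0, sigma=0.6, size=500)   # study variable
w = 1.0 / rng.uniform(0.002, 0.01, size=500)       # inverse inclusion probabilities

# Expectile curve on a grid of asymmetry levels tau.
taus = np.linspace(0.01, 0.99, 99)
expectiles = np.array([weighted_expectile(y, w, t) for t in taus])

# Distribution function evaluated at each expectile (here the weighted empirical
# CDF); inverting the fitted curve gives quantile estimates.
F_hat = np.array([np.sum(w[y <= e]) / np.sum(w) for e in expectiles])
p = 0.5
q_est = np.interp(p, F_hat, expectiles)   # approximate design-weighted median
print("estimated median:", q_est)
```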

  • Articles and reports: 12-001-X201600114538
    Description:

    The aim of automatic editing is to use a computer to detect and amend erroneous values in a data set, without human intervention. Most automatic editing methods that are currently used in official statistics are based on the seminal work of Fellegi and Holt (1976). Applications of this methodology in practice have shown systematic differences between data that are edited manually and automatically, because human editors may perform complex edit operations. In this paper, a generalization of the Fellegi-Holt paradigm is proposed that can incorporate a large class of edit operations in a natural way. In addition, an algorithm is outlined that solves the resulting generalized error localization problem. It is hoped that this generalization may be used to increase the suitability of automatic editing in practice, and hence to improve the efficiency of data editing processes. Some first results on synthetic data are promising in this respect.

    Release date: 2016-06-22
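
    The toy sketch below shows the error-localization problem at the heart of the (hard-edit) Fellegi-Holt paradigm that the paper generalizes: find a minimum set of fields that can be changed so that a record satisfies all edits. The record, edits and brute-force search are purely illustrative; production systems use set-covering or branch-and-bound algorithms.

```python
# Sketch: Fellegi-Holt-style error localization by brute force on a tiny
# categorical record: find a minimum set of fields that, if changed, allows
# all edits to be satisfied.
from itertools import combinations, product

DOMAIN = {"age_group": ["child", "adult"],
          "marital":   ["single", "married"],
          "employed":  ["yes", "no"]}

EDITS = [  # each edit returns True if the record PASSES the edit
    lambda r: not (r["age_group"] == "child" and r["marital"] == "married"),
    lambda r: not (r["age_group"] == "child" and r["employed"] == "yes"),
]

def passes_all(record):
    return all(edit(record) for edit in EDITS)

def error_localization(record):
    """Return a smallest set of fields whose values can be changed to satisfy all edits."""
    fields = list(DOMAIN)
    for k in range(len(fields) + 1):               # try smaller change sets first
        for subset in combinations(fields, k):
            for values in product(*(DOMAIN[f] for f in subset)):
                candidate = dict(record, **dict(zip(subset, values)))
                if passes_all(candidate):
                    return set(subset)
    return set(fields)

record = {"age_group": "child", "marital": "married", "employed": "no"}
print(error_localization(record))   # {'age_group'}: changing one field is enough
```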

  • Articles and reports: 82-003-X201600114306
    Description:

    This article is an overview of the creation, content, and quality of the 2006 Canadian Birth-Census Cohort Database.

    Release date: 2016-01-20

  • Articles and reports: 12-001-X201500214229
    Description:

    Self-weighting estimation through equal probability selection methods (epsem) is desirable for variance efficiency. Traditionally, the epsem property for (one-phase) two-stage designs for estimating population-level parameters is realized by using each primary sampling unit (PSU) population count as the measure of size for PSU selection, along with equal sample size allocation per PSU under simple random sampling (SRS) of elementary units. However, when self-weighting estimates are desired for parameters corresponding to multiple domains under a pre-specified sample allocation to domains, Folsom, Potter and Williams (1987) showed that a composite measure of size can be used to select PSUs to obtain epsem designs when, in addition to domain-level PSU counts (i.e., the distribution of the domain population over PSUs), frame-level domain identifiers for elementary units are assumed to be available. The term depsem-A will be used to denote such (one-phase) two-stage designs to obtain domain-level epsem estimation. Folsom et al. also considered two-phase two-stage designs when domain-level PSU counts are unknown, but whole PSU counts are known. For these designs (to be termed depsem-B), with PSUs selected proportional to the usual size measure (i.e., the total PSU count) at the first stage, all elementary units within each selected PSU are first screened for classification into domains in the first phase of data collection before SRS selection at the second stage. Domain-stratified samples are then selected within PSUs with suitably chosen domain sampling rates such that the desired domain sample sizes are achieved and the resulting design is self-weighting. In this paper, we first present a simple justification of composite measures of size for the depsem-A design and of the domain sampling rates for the depsem-B design. Then, for depsem-A and -B designs, we propose generalizations, first to cases where frame-level domain identifiers for elementary units are not available and domain-level PSU counts are only approximately known from alternative sources, and second to cases where PSU size measures are pre-specified based on other practical and desirable considerations of over- and under-sampling of certain domains. We also present a further generalization in the presence of subsampling of elementary units and nonresponse within selected PSUs at the first phase, before selecting phase-two elementary units from domains within each selected PSU. This final generalization of depsem-B is illustrated for an area sample of housing units.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214249
    Description:

    The problem of optimal allocation of samples in surveys using a stratified sampling plan was first discussed by Neyman in 1934. Since then, many researchers have studied the problem of the sample allocation in multivariate surveys and several methods have been proposed. Basically, these methods are divided into two classes: The first class comprises methods that seek an allocation which minimizes survey costs while keeping the coefficients of variation of estimators of totals below specified thresholds for all survey variables of interest. The second aims to minimize a weighted average of the relative variances of the estimators of totals given a maximum overall sample size or a maximum cost. This paper proposes a new optimization approach for the sample allocation problem in multivariate surveys. This approach is based on a binary integer programming formulation. Several numerical experiments showed that the proposed approach provides efficient solutions to this problem, which improve upon a ‘textbook algorithm’ and can be more efficient than the algorithm by Bethel (1985, 1989).

    Release date: 2015-12-17
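
    The sketch below illustrates the underlying allocation problem in a deliberately simplified, univariate form: an integer allocation chosen greedily to minimize a stratified variance objective subject to a total sample size. It is a stand-in for, not a reproduction of, the paper's multivariate binary integer programming formulation, and all inputs are invented.

```python
# Sketch: integer sample allocation in stratified sampling by greedy marginal
# allocation, minimizing an (fpc-ignored) variance objective
#   sum_h W_h^2 S_h^2 / n_h
# for one variable subject to a total sample size. Illustrative inputs.
import numpy as np

N_h = np.array([5000, 3000, 1500, 500])     # stratum population sizes
S_h = np.array([10.0, 20.0, 40.0, 80.0])    # stratum standard deviations
n_total = 400                               # total sample size available

W_h = N_h / N_h.sum()
n_h = np.ones_like(N_h)                     # start with one unit per stratum

def variance(n):
    return np.sum(W_h**2 * S_h**2 / n)

# Add units one at a time where the variance reduction is largest; for a
# separable convex objective this greedy allocation is optimal in integers.
for _ in range(n_total - n_h.sum()):
    gain = W_h**2 * S_h**2 * (1.0 / n_h - 1.0 / (n_h + 1))
    n_h[np.argmax(gain)] += 1

print("allocation:", n_h, "variance:", round(variance(n_h), 4))
# For comparison, (continuous) Neyman allocation: n_h proportional to N_h * S_h.
print("Neyman:    ", np.round(n_total * N_h * S_h / np.sum(N_h * S_h)).astype(int))
```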

  • Articles and reports: 12-001-X201500214237
    Description:

    Careful design of a dual-frame random digit dial (RDD) telephone survey requires selecting from among many options that have varying impacts on cost, precision, and coverage in order to obtain the best possible implementation of the study goals. One such consideration is whether to screen cell-phone households in order to interview cell-phone-only (CPO) households and exclude dual-user households, or to take all interviews obtained via the cell-phone sample. We present a framework in which to consider the tradeoffs between these two options and a method to select the optimal design. We derive and discuss the optimum allocation of sample size between the two sampling frames and explore the choice of the optimum p, the mixing parameter for the dual-user domain. We illustrate our methods using the National Immunization Survey, sponsored by the Centers for Disease Control and Prevention.

    Release date: 2015-12-17
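
    The sketch below illustrates the generic composite-estimation idea behind the mixing parameter p: the dual-user domain is estimated from both frames and combined with a p chosen to minimize the variance of the composite (inverse-variance weighting under independence). The numbers are invented, and the paper's cost and screening trade-offs are not modelled.

```python
# Sketch: composite dual-frame estimation for the dual-user domain, with the
# mixing parameter p chosen to minimize the variance of the composite estimate,
# assuming the two samples are independent. Illustrative numbers only.
def optimal_p(var_landline_dual, var_cell_dual):
    """p minimizing Var(p*Yhat_LL + (1-p)*Yhat_cell) for independent estimates."""
    return var_cell_dual / (var_landline_dual + var_cell_dual)

# Domain estimates of a total from each frame (made up).
yhat_landline_only = 120_000.0
yhat_cpo           = 310_000.0                 # cell-phone-only households
yhat_dual_landline = 540_000.0                 # dual users, landline sample
yhat_dual_cell     = 560_000.0                 # dual users, cell sample
var_dual_landline  = 9.0e8
var_dual_cell      = 4.0e8

p = optimal_p(var_dual_landline, var_dual_cell)
yhat_dual = p * yhat_dual_landline + (1 - p) * yhat_dual_cell
var_dual = p**2 * var_dual_landline + (1 - p)**2 * var_dual_cell

# Dual-frame total: landline-only + cell-only + composite dual-user domain.
yhat_total = yhat_landline_only + yhat_cpo + yhat_dual
print(f"p = {p:.3f}, dual-user estimate = {yhat_dual:,.0f}, total = {yhat_total:,.0f}")
```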

  • Articles and reports: 12-001-X201500214250
    Description:

    Assessing the impact of mode effects on survey estimates has become a crucial research objective due to the increasing use of mixed-mode designs. Despite the advantages of a mixed-mode design, such as lower costs and increased coverage, there is sufficient evidence that mode effects may be large relative to the precision of a survey. They may lead to statistics that are incomparable over time or across population subgroups, and they may increase bias. Adaptive survey designs offer a flexible mathematical framework to obtain an optimal balance between survey quality and costs. In this paper, we employ adaptive designs in order to minimize mode effects. We illustrate our optimization model by means of a case study on the Dutch Labor Force Survey. We focus on item-dependent mode effects and we evaluate the impact on survey quality by comparison to a gold standard.

    Release date: 2015-12-17

  • Articles and reports: 82-003-X201501214295
    Description:

    Using the Wisconsin Cancer Intervention and Surveillance Monitoring Network breast cancer simulation model adapted to the Canadian context, costs and quality-adjusted life years were evaluated for 11 mammography screening strategies that varied by start/stop age and screening frequency for the general population. Incremental cost-effectiveness ratios are presented, and sensitivity analyses are used to assess the robustness of model conclusions.

    Release date: 2015-12-16

  • Articles and reports: 82-003-X201501114243
    Description:

    A surveillance tool was developed to assess dietary intake collected by surveys in relation to Eating Well with Canada’s Food Guide (CFG). The tool classifies foods in the Canadian Nutrient File (CNF) according to how closely they reflect CFG. This article describes the validation exercise conducted to ensure that CNF foods determined to be “in line with CFG” were appropriately classified.

    Release date: 2015-11-18

  • Articles and reports: 82-003-X201501014228
    Description:

    This study presents the results of a hierarchical exact matching approach to link the 2006 Census of Population with hospital data for all provinces and territories (excluding Quebec) to the 2006/2007-to-2008/2009 Discharge Abstract Database. The purpose is to determine if the Census–DAD linkage performed similarly in different jurisdictions, and if linkage and coverage rates declined as time passed since the census.

    Release date: 2015-10-21

  • Articles and reports: 12-001-X201500114161
    Description:

    A popular area-level model used for the estimation of small area means is the Fay-Herriot model. This model involves unobservable random effects for the areas, in addition to the (fixed) linear regression based on area-level covariates. Empirical best linear unbiased predictors of small area means are obtained by estimating the area random effects, and they can be expressed as a weighted average of area-specific direct estimators and regression-synthetic estimators. In some cases the observed data do not support the inclusion of the area random effects in the model. Excluding these area effects leads to the regression-synthetic estimator, that is, a zero weight is attached to the direct estimator. A preliminary test estimator of a small area mean, obtained after testing for the presence of area random effects, is studied. Empirical best linear unbiased predictors of small area means that always give non-zero weights to the direct estimators in all areas, together with alternative estimators based on the preliminary test, are also studied. The preliminary testing procedure is also used to define new mean squared error estimators of the point estimators of small area means. Results of a limited simulation study show that, for a small number of areas, the preliminary testing procedure leads to mean squared error estimators with considerably smaller average absolute relative bias than the usual mean squared error estimators, especially when the variance of the area effects is small relative to the sampling variances.

    Release date: 2015-06-29
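
    The sketch below shows the basic Fay-Herriot machinery the abstract refers to: the EBLUP as a weighted average of the direct and regression-synthetic estimators, with a crude preliminary "test" that falls back to the synthetic estimator when the estimated area-effect variance is zero. The data are simulated, and the variance-component estimator and test are simplified stand-ins, not the paper's procedures.

```python
# Sketch: Fay-Herriot area-level model, EBLUP as a weighted average of direct
# and regression-synthetic estimators, with a crude fallback when the estimated
# area-effect variance is zero. Simulated data; simplified moment estimator.
import numpy as np

rng = np.random.default_rng(3)
m = 30                                   # number of small areas
x = np.column_stack([np.ones(m), rng.uniform(0, 10, m)])   # area-level covariates
beta_true = np.array([2.0, 1.5])
sigma_v2_true = 0.8                      # area random-effect variance
D = rng.uniform(0.5, 2.0, m)             # known sampling variances of direct estimates

theta = x @ beta_true + rng.normal(0, np.sqrt(sigma_v2_true), m)   # true area means
y = theta + rng.normal(0, np.sqrt(D))                              # direct estimates

# OLS fit and a crude moment estimator of the area-effect variance.
beta_ols, *_ = np.linalg.lstsq(x, y, rcond=None)
resid = y - x @ beta_ols
p = x.shape[1]
sigma_v2_hat = max(0.0, (np.sum(resid**2) - np.sum(D)) / (m - p))

if sigma_v2_hat <= 0.0:                  # crude preliminary "test": no area effects
    theta_hat = x @ beta_ols             # regression-synthetic estimator (zero weight on y)
else:
    gamma = sigma_v2_hat / (sigma_v2_hat + D)          # shrinkage weights in (0, 1)
    w = 1.0 / (sigma_v2_hat + D)                       # WLS for beta given variances
    beta_wls = np.linalg.solve((x * w[:, None]).T @ x, (x * w[:, None]).T @ y)
    theta_hat = gamma * y + (1 - gamma) * (x @ beta_wls)   # EBLUP

print("mean squared error, direct:", np.mean((y - theta) ** 2))
print("mean squared error, EBLUP: ", np.mean((theta_hat - theta) ** 2))
```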

  • Articles and reports: 82-003-X201500614196
    Description:

    This study investigates the feasibility and validity of using personal health insurance numbers to deterministically link the CCR and the Discharge Abstract Database to obtain hospitalization information about people with primary cancers.

    Release date: 2015-06-17

  • Articles and reports: 12-001-X201400214119
    Description:

    When considering sample stratification by several variables, we often face the case where the expected number of sample units to be selected in each stratum is very small and the total number of units to be selected is smaller than the total number of strata. These stratified sample designs are represented by tabular arrays with real numbers, called controlled selection problems, and are beyond the reach of conventional methods of allocation. Many algorithms for solving these problems have been studied over about 60 years, beginning with Goodman and Kish (1950). Those developed more recently are especially computer intensive and always find solutions. However, there still remains the unanswered question: in what sense are the solutions to a controlled selection problem obtained from those algorithms optimal? We introduce the general concept of optimal solutions, and propose a new controlled selection algorithm based on typical distance functions to achieve them. This algorithm can be easily implemented in new SAS-based software. This study focuses on two-way stratification designs. The controlled selection solutions from the new algorithm are compared with those from existing algorithms using several examples. The new algorithm successfully obtains robust solutions to two-way controlled selection problems that meet the optimality criteria.

    Release date: 2014-12-19

  • Articles and reports: 12-001-X201400214110
    Description:

    In developing the sample design for a survey we attempt to produce a good design for the funds available. Information on costs can be used to develop sample designs that minimise the sampling variance of an estimator of total for fixed cost. Improvements in survey management systems mean that it is now sometimes possible to estimate the cost of including each unit in the sample. This paper develops relatively simple approaches to determine whether the potential gains arising from using this unit level cost information are likely to be of practical use. It is shown that the key factor is the coefficient of variation of the costs relative to the coefficient of variation of the relative error on the estimated cost coefficients.

    Release date: 2014-12-19

  • Articles and reports: 82-003-X201401014098
    Description:

    This study compares registry and non-registry approaches to linking 2006 Census of Population data for Manitoba and Ontario to Hospital data from the Discharge Abstract Database.

    Release date: 2014-10-15

  • Articles and reports: 12-001-X201400114004
    Description:

    In 2009, two major surveys in the Governments Division of the U.S. Census Bureau were redesigned to reduce sample size, save resources, and improve the precision of the estimates (Cheng, Corcoran, Barth and Hogue 2009). The new design divides each of the traditional state by government-type strata with sufficiently many units into two sub-strata according to each governmental unit’s total payroll, in order to sample less from the sub-stratum with small size units. The model-assisted approach is adopted in estimating population totals. Regression estimators using auxiliary variables are obtained either within each created sub-stratum or within the original stratum by collapsing two sub-strata. A decision-based method was proposed in Cheng, Slud and Hogue (2010), applying a hypothesis test to decide which regression estimator is used within each original stratum. Consistency and asymptotic normality of these model-assisted estimators are established here, under a design-based or model-assisted asymptotic framework. Our asymptotic results also suggest two types of consistent variance estimators, one obtained by substituting unknown quantities in the asymptotic variances and the other by applying the bootstrap. The performance of all the estimators of totals and of their variance estimators are examined in some empirical studies. The U.S. Annual Survey of Public Employment and Payroll (ASPEP) is used to motivate and illustrate our study.

    Release date: 2014-06-27

  • Articles and reports: 12-001-X201300111824
    Description:

    In most surveys all sample units receive the same treatment and the same design features apply to all selected people and households. In this paper, it is explained how survey designs may be tailored to optimize quality given constraints on costs. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with various examples.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111823
    Description:

    Although weights are widely used in survey sampling, their ultimate justification from the design perspective is often problematic. Here we argue for a stepwise Bayes justification for weights that does not depend explicitly on the sampling design. This approach makes use of the standard kind of information present in auxiliary variables; however, it does not assume a model relating the auxiliary variables to the characteristic of interest. The resulting weight for a unit in the sample can be given the usual interpretation as the number of units in the population that it represents.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111831
    Description:

    We consider conservative variance estimation for the Horvitz-Thompson estimator of a population total in sampling designs with zero pairwise inclusion probabilities, known as "non-measurable" designs. We decompose the standard Horvitz-Thompson variance estimator under such designs and characterize the bias precisely. We develop a bias correction that is guaranteed to be weakly conservative (nonnegatively biased) regardless of the nature of the non-measurability. The analysis sheds light on conditions under which the standard Horvitz-Thompson variance estimator performs well despite non-measurability and where the conservative bias correction may outperform commonly-used approximations.

    Release date: 2013-06-28
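
    The sketch below computes the Horvitz-Thompson total and the standard Horvitz-Thompson variance estimator on a small invented sample, simply skipping pairs whose joint inclusion probability is zero; this is the non-measurable situation whose bias the paper characterizes. The paper's conservative bias correction is not reproduced here.

```python
# Sketch: Horvitz-Thompson total and the standard HT variance estimator, with
# terms for sample pairs whose joint inclusion probability is zero skipped;
# these skipped terms are the source of the bias discussed in the abstract.
import numpy as np

# Illustrative sample of n = 4 units drawn from some design.
y  = np.array([12.0, 7.5, 20.0, 15.0])      # observed study variable
pi = np.array([0.20, 0.10, 0.25, 0.15])     # first-order inclusion probabilities
# Joint (second-order) inclusion probabilities for sampled pairs; zeros mark
# pairs that could never appear together under the design.
pi_ij = np.array([
    [0.20, 0.02, 0.05, 0.00],
    [0.02, 0.10, 0.02, 0.01],
    [0.05, 0.02, 0.25, 0.03],
    [0.00, 0.01, 0.03, 0.15],
])

t_ht = np.sum(y / pi)                        # Horvitz-Thompson estimate of the total

v_ht = np.sum((1 - pi) / pi**2 * y**2)       # single-unit terms
n = len(y)
for i in range(n):
    for j in range(n):
        if i != j and pi_ij[i, j] > 0:       # zero pairs are skipped
            v_ht += ((pi_ij[i, j] - pi[i] * pi[j]) / pi_ij[i, j]
                     * (y[i] / pi[i]) * (y[j] / pi[j]))

print(f"HT total: {t_ht:.1f}, HT variance estimate: {v_ht:.1f}")
```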

  • Articles and reports: 12-001-X201300111825
    Description:

    A considerable limitation of current methods for automatic data editing is that they treat all edits as hard constraints. That is to say, an edit failure is always attributed to an error in the data. In manual editing, however, subject-matter specialists also make extensive use of soft edits, i.e., constraints that identify (combinations of) values that are suspicious but not necessarily incorrect. The inability of automatic editing methods to handle soft edits partly explains why in practice many differences are found between manually edited and automatically edited data. The object of this article is to present a new formulation of the error localisation problem which can distinguish between hard and soft edits. Moreover, it is shown how this problem may be solved by an extension of the error localisation algorithm of De Waal and Quere (2003).

    Release date: 2013-06-28

Reference (25 of 94 results shown)

  • Technical products: 11-522-X201700014745
    Description:

    In the design of surveys, a number of parameters, such as contact propensities, participation propensities and costs per sample unit, play a decisive role. In ongoing surveys, these survey design parameters are usually estimated from previous experience and updated gradually with new experience. In new surveys, they are estimated from expert opinion and experience with similar surveys. Although survey institutes have considerable expertise and experience, the postulation, estimation and updating of survey design parameters is rarely done in a systematic way. This paper presents a Bayesian framework to include and update prior knowledge and expert opinion about the parameters. The framework is set in the context of adaptive survey designs, in which different population units may receive different treatment given quality and cost objectives. For this type of survey, the accuracy of the design parameters becomes even more crucial to effective design decisions. The framework allows for a Bayesian analysis of the performance of a survey during data collection and in between waves of a survey. We demonstrate the Bayesian analysis using a realistic simulation study.

    Release date: 2016-03-24
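
    The sketch below shows the simplest version of this kind of Bayesian updating: a Beta prior on a contact propensity, elicited from expert opinion, updated with binomial contact outcomes as collection proceeds. The numbers are illustrative, and the paper's full adaptive-design framework is not reproduced.

```python
# Sketch: Bayesian updating of a survey design parameter (a contact propensity),
# starting from a prior based on expert opinion / similar surveys and updated
# with outcomes observed during data collection. Illustrative numbers only.
from scipy import stats

# Prior: experts expect roughly a 60% contact rate, worth about 50 "prior cases".
a_prior, b_prior = 30.0, 20.0                 # Beta(30, 20): prior mean 0.60

# Observed so far in the current wave of data collection.
attempts, contacts = 400, 212

# Beta prior + binomial likelihood gives a Beta posterior (conjugate update).
a_post = a_prior + contacts
b_post = b_prior + (attempts - contacts)
posterior = stats.beta(a_post, b_post)

print(f"posterior mean contact propensity: {posterior.mean():.3f}")
lo, hi = posterior.ppf([0.025, 0.975])
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```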

  • Technical products: 11-522-X201700014708
    Description:

    Statistics Canada’s Household Survey Frames (HSF) Programme provides various universe files that can be used alone or in combination to improve survey design, sampling, collection, and processing in the traditional “need to contact a household” model. Even as surveys are migrating onto this core suite of products, the HSF is starting to plan the changes to infrastructure, organisation, and linkages with other data assets in Statistics Canada that will help enable a shift to increased use of a wide variety of administrative data as input to the social statistics programme. The presentation will provide an overview of the HSF Programme, outline foundational concepts that will need to be implemented to expand linkage potential, and identify strategic research being undertaken toward 2021.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014743
    Description:

    Probabilistic linkage is susceptible to linkage errors such as false positives and false negatives. In many cases, these errors may be reliably measured through clerical reviews, i.e., the visual inspection of a sample of record pairs to determine whether they are matched. A framework is described for carrying out such clerical reviews effectively, based on a probabilistic sample of pairs, repeated independent reviews of the same pairs, and latent class analysis to account for clerical errors.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014748
    Description:

    This paper describes the creation of a database developed in Switzerland to analyze migration and the structural integration of the foreign national population. The database is created from various registers (register of residents, social insurance, unemployment) and surveys, and covers 15 years (1998 to 2013). Information on migration status and socioeconomic characteristics is also available for nearly 4 million foreign nationals who lived in Switzerland between 1998 and 2013. This database is the result of a collaboration between the Federal Statistics Office and researchers from the National Center of Competence in Research (NCCR)–On the Move.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.

    Release date: 2016-03-24
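
    The sketch below shows one simple version of such an approach: a stratified clerical-review sample of accepted links, with the overall false-positive rate estimated by weighting each stratum's review results by its share of links. Strata, counts and outcomes are invented; the paper's stratification choices and census application are not reproduced.

```python
# Sketch: estimating the false-positive rate of a set of accepted links from a
# stratified clerical-review sample, weighting each stratum by its number of links.
import numpy as np

# Accepted links grouped into strata (e.g., by linkage score band).
N_links   = np.array([50_000, 20_000, 5_000])    # links in each stratum
n_sampled = np.array([200, 200, 200])            # pairs clerically reviewed per stratum
n_false   = np.array([1, 9, 31])                 # reviewed pairs judged to be false matches

p_hat_h = n_false / n_sampled                    # stratum false-positive rates
fp_rate = np.sum(N_links * p_hat_h) / np.sum(N_links)   # overall (stratified) estimate

# Approximate variance treating each stratum as a binomial sample (no fpc).
var = np.sum((N_links / N_links.sum())**2 * p_hat_h * (1 - p_hat_h) / n_sampled)
print(f"estimated false-positive rate: {fp_rate:.4f} (s.e. {np.sqrt(var):.4f})")
```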

  • Technical products: 11-522-X201700014755
    Description:

    The National Children’s Study Vanguard Study was a pilot epidemiological cohort study of children and their parents. Measures were to be taken from pre-pregnancy until adulthood. The use of extant data was planned to supplement direct data collection from the respondents. Our paper outlines a strategy for cataloging and evaluating extant data sources for use with large-scale longitudinal studies. Through our review we selected five evaluation factors to guide a researcher through available data sources: 1) relevance, 2) timeliness, 3) spatiality, 4) accessibility, and 5) accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014754
    Description:

    Background: There is increasing interest in measuring and benchmarking health system performance. We compared Canada’s health system with those of other countries in the Organisation for Economic Co-operation and Development (OECD) at both the national and provincial levels, across 50 indicators of health system performance. This analysis can help provinces identify potential areas for improvement, using an optimal comparator for international comparisons. Methods: OECD Health Data from 2013 were used to compare Canada’s results internationally. We also calculated provincial results for the OECD’s indicators of health system performance, using OECD methodology. We normalized the indicator results to present multiple indicators on the same scale and compared them to the OECD average and the 25th and 75th percentiles. Results: Presenting normalized values allows Canada’s results to be compared across multiple OECD indicators on the same scale. No country or province consistently has higher results than the others. For most indicators, Canadian results are similar to those of other countries, but there remain areas where Canada performs particularly well (e.g., smoking rates) or poorly (e.g., patient safety). These data were presented in an interactive eTool. Conclusion: Comparing Canada’s provinces internationally can highlight areas where improvement is needed and help to identify potential strategies for improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014753
    Description:

    The fact that the world is in continuous change and that new technologies are becoming widely available creates new opportunities and challenges for National Statistical Institutes (NSIs) worldwide. What if NSIs could access vast amounts of sophisticated data for free (or at a low cost) from enterprises? Could this make it possible for NSIs to disseminate more accurate indicators for policy-makers and users, significantly reduce the response burden for companies, reduce costs for the NSIs and, in the long run, improve the living standards of the people in a country? The time has now come for NSIs to find best practices to align legislation, regulations and practices in relation to scanner data and big data. Without common ground, there is little prospect of reaching consensus. The discussions need to start with how to define quality. If NSIs define and approach quality differently, this will lead to a highly undesirable situation, as NSIs will move further away from harmonisation. Sweden was one of the leading countries that put these issues on the agenda for European cooperation; in 2012, Sweden implemented scanner data in the national Consumer Price Index after research studies and statistical analyses showed that scanner data were significantly better than the manually collected data.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014715
    Description:

    In preparation for the 2021 UK Census, the ONS has committed to an extensive research programme exploring how linked administrative data can be used to support conventional statistical processes. Item-level edit and imputation (E&I) will play an important role in adjusting the 2021 Census database. However, uncertainty associated with the accuracy and quality of available administrative data renders the efficacy of an integrated census-administrative data approach to E&I unclear. Current constraints, which dictate an anonymised ‘hash-key’ approach to record linkage to ensure confidentiality, add to that uncertainty. Here, we provide preliminary results from a simulation study comparing the predictive and distributional accuracy of the conventional E&I strategy implemented in CANCEIS for the 2011 UK Census with that of an integrated approach using synthetic administrative data with systematically increasing error as auxiliary information. In this initial phase of research we focus on imputing single year of age. The aim of the study is to gain insight into whether auxiliary information from administrative data can improve imputation estimates and where the different strategies fall on a continuum of accuracy.

    Release date: 2016-03-24

  • Technical products: 12-002-X201500114147
    Description:

    Influential observations in logistic regression are those that have a notable effect on certain aspects of the model fit. Large sample size alone does not eliminate this concern; it is still important to examine potentially influential observations, especially in complex survey data. This paper describes a straightforward algorithm for examining potentially influential observations in complex survey data using SAS software. This algorithm was applied in a study using the 2005 Canadian Community Health Survey that examined factors associated with family physician utilization for adolescents.
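
    The general idea of screening for influential observations in a survey-weighted logistic regression can be sketched as a leave-one-out refit. The data, the hand-rolled fitting routine and the cut-off below are illustrative assumptions in Python, not the SAS algorithm described in the paper:

        # Sketch: flag potentially influential observations in a survey-weighted
        # logistic regression by refitting with each observation removed and measuring
        # how far the coefficients move (a "dfbeta"-style diagnostic).  The data and
        # the hand-rolled IRLS fitter are illustrative; this is not the SAS algorithm
        # described in the paper.
        import numpy as np

        def fit_weighted_logit(X, y, w, n_iter=25):
            """Weighted logistic regression via Newton-Raphson (IRLS)."""
            beta = np.zeros(X.shape[1])
            for _ in range(n_iter):
                p = 1.0 / (1.0 + np.exp(-X @ beta))
                grad = X.T @ (w * (y - p))
                hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
                beta = beta + np.linalg.solve(hess, grad)
            return beta

        rng = np.random.default_rng(0)
        n = 200
        x = rng.normal(size=n)
        w = rng.uniform(0.5, 3.0, size=n)            # survey weights
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x))))
        X = np.column_stack([np.ones(n), x])

        beta_full = fit_weighted_logit(X, y, w)
        shift = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i
            beta_i = fit_weighted_logit(X[keep], y[keep], w[keep])
            shift[i] = np.max(np.abs(beta_full - beta_i))

        # Observations whose deletion moves a coefficient the most are candidates
        # for closer examination.
        print("largest coefficient shifts at observations:", np.argsort(shift)[-5:])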

    Release date: 2015-03-25

  • Technical products: 11-522-X201300014280
    Description:

    During the last decade, web panel surveys have become established as a fast and cost-efficient method in market surveys. The rationale for this is new developments in information technology, in particular the continued rapid growth of internet and computer use among the public. Growing nonresponse rates and downward pressure on prices in the survey industry also lie behind this change. However, there are some serious inherent risks connected with web panel surveys, not least selection bias due to the self-selection of respondents. There are also risks of coverage and measurement errors. The absence of an inferential framework and of data quality indicators is an obstacle to using the web panel approach for high-quality statistics about general populations. Still, some national statistical institutes appear to face increasing challenges from a new form of competition for ad hoc statistics, and even official statistics, from web panel surveys. This paper explores the question of how to design and use web panels in a scientifically sound way. An outline is given of a standard from the Swedish Survey Society for performance metrics to assess some quality aspects of results from web panel surveys. Decomposition of bias and mitigation of bias risks are discussed in some detail. Some ideas are presented for combining web panel surveys and traditional surveys to achieve controlled, cost-efficient inference.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014291
    Description:

    Occupational coding in Germany is mostly done using dictionary approaches, with subsequent manual revision of cases that could not be coded. Since manual coding is expensive, it is desirable to assign a higher number of codes automatically. At the same time, the quality of the automatic coding must at least reach that of the manual coding. As a possible solution, we employ different machine learning algorithms for the task, using a substantial amount of manually coded occupations available from recent studies as training data. We assess the feasibility of these methods by evaluating the performance and quality of the algorithms.
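
    A minimal sketch of occupation coding as supervised text classification is given below, using scikit-learn on a handful of invented job titles and codes with an illustrative confidence threshold; the actual training data, features and algorithms evaluated in the study differ:

        # Toy sketch: occupation coding as supervised text classification.  Train on
        # manually coded job titles, predict codes for new titles, and accept only
        # predictions above a confidence threshold (the rest go to manual coding).
        # Titles, codes and the threshold are invented examples.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        titles = ["assembly line worker", "machine operator", "primary school teacher",
                  "secondary school teacher", "registered nurse", "nursing assistant"]
        codes = ["8211", "8211", "2341", "2330", "2221", "5321"]

        model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                              LogisticRegression(max_iter=1000))
        model.fit(titles, codes)

        new_titles = ["school teacher", "factory machine operator"]
        for title, probs in zip(new_titles, model.predict_proba(new_titles)):
            best = probs.argmax()
            if probs[best] >= 0.5:          # threshold would be tuned on held-out data
                print(f"{title!r} -> {model.classes_[best]} (p={probs[best]:.2f})")
            else:
                print(f"{title!r} -> manual coding (best guess p={probs[best]:.2f})")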

    Release date: 2014-10-31

  • Technical products: 11-522-X200800011011
    Description:

    The Federation of Canadian Municipalities' (FCM) Quality of Life Reporting System (QOLRS) is a means by which to measure, monitor, and report on the quality of life in Canadian municipalities. To address the challenge of collecting administrative data across member municipalities, the QOLRS technical team collaborated on the development of the Municipal Data Collection Tool (MDCT), which has become a key component of the QOLRS data acquisition methodology. Offered as a case study on administrative data collection, this paper argues that the recent launch of the MDCT has enabled the FCM to access reliable pan-Canadian municipal administrative data for the QOLRS.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010955
    Description:

    Survey managers are still discovering the usefulness of digital audio recording for monitoring and managing field staff. Its value so far has been for confirming the authenticity of interviews, detecting curbstoning, offering a concrete basis for feedback on interviewing performance and giving data collection managers an intimate view of in-person interviews. In addition, computer audio-recorded interviewing (CARI) can improve other aspects of survey data quality, offering corroboration or correction of response coding by field staff. Audio recordings may replace or supplement in-field verbatim transcription of free responses, and speech-to-text technology might make this technique more efficient in the future.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010983
    Description:

    The US Census Bureau conducts monthly, quarterly, and annual surveys of the American economy and a census every 5 years. These programs require significant business effort. New technologies, new forms of organization, and scarce resources affect the ability of businesses to respond. Changes also affect what businesses expect from the Census Bureau, the Census Bureau's internal systems, and the way businesses interact with the Census Bureau.

    For several years, the Census Bureau has maintained a special relationship with large companies to help them prepare for the census. We have also worked toward company-centric communication across all programs. A relationship model has emerged that focuses on infrastructure and business practices, and allows the Census Bureau to be more responsive.

    This paper focuses on the Census Bureau's company-centric communications and systems. We describe important initiatives and challenges, and we review their impact on Census Bureau practices and respondent behavior.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010951
    Description:

    Missing values caused by item nonresponse represent one type of non-sampling error that occurs in surveys. When cases with missing values are discarded in statistical analyses, estimates may be biased because of differences between responders with missing values and responders without missing values. Also, when variables in the data have different patterns of missingness among sampled cases, and cases with missing values are discarded in statistical analyses, those analyses may yield inconsistent results because they are based on different subsets of sampled cases that may not be comparable. However, analyses that discard cases with missing values may be valid provided those values are missing completely at random (MCAR). Are those missing values MCAR?

    To compensate, missing values are often imputed or survey weights are adjusted using weighting class methods. Subsequent analyses based on those compensations may be valid provided that missing values are missing at random (MAR) within each of the categorizations of the data implied by the independent variables of the models that underlie those adjustment approaches. Are those missing values MAR?

    Because missing values are not observed, the MCAR and MAR assumptions made by statistical analyses are infrequently examined. This paper describes a selection model from which statistical significance tests of the MCAR and MAR assumptions can be constructed even though the missing values are not observed. Data from the National Immunization Survey conducted by the U.S. Department of Health and Human Services are used to illustrate the methods.
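
    One common, simple diagnostic related to these questions (though not the selection-model tests developed in the paper) is to regress an indicator of item missingness on fully observed covariates; a predictive covariate is evidence against MCAR. The sketch below uses invented data:

        # A simple diagnostic (not the paper's selection-model tests): regress an
        # indicator of item missingness on fully observed covariates.  If a covariate
        # predicts missingness, the data are not MCAR (though MAR may still hold).
        # Data are invented for illustration.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        n = 2_000
        age = rng.integers(18, 80, size=n)
        income = rng.lognormal(10.0, 0.5, size=n)
        # Simulate missingness that depends on age: MAR given age, but not MCAR.
        miss = rng.random(n) < 1.0 / (1.0 + np.exp(-(age - 50) / 10.0))

        X = sm.add_constant(np.column_stack([age, np.log(income)]))
        fit = sm.Logit(miss.astype(float), X).fit(disp=False)
        print(fit.pvalues)   # a small p-value on the age term is evidence against MCAR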

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010993
    Description:

    Until now, years of experience in questionnaire design were required to estimate how long it would take a respondent, on average, to complete a CATI questionnaire for a new survey. This presentation focuses on a new method that produces interview time estimates for questionnaires at the development stage. The method uses Blaise Audit Trail data and previous surveys. It was developed, tested and verified for accuracy on some large-scale surveys.

    First, audit trail data was used to determine the average time previous respondents have taken to answer specific types of questions. These would include questions that require a yes/no answer, scaled questions, "mark all that apply" questions, etc. Second, for any given questionnaire, the paths taken by population sub-groups were mapped to identify the series of questions answered by different types of respondents, and timed to determine what the longest possible interview time would be. Finally, the overall expected time it takes to complete the questionnaire is calculated using estimated proportions of the population expected to answer each question.

    So far, we have used paradata to accurately estimate average respondent interview completion times. We note that the method we developed could also be used to estimate interview completion times for specific respondents.
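
    The calculation described above can be sketched as follows; the per-question timings, question types and answering proportions are invented for illustration:

        # Sketch of the expected-time calculation: combine average per-question timings
        # (by question type, from audit trail data of past surveys) with the estimated
        # proportion of respondents expected to answer each question.  All timings,
        # question types and proportions below are invented for illustration.
        avg_seconds = {"yes_no": 6.0, "scaled": 11.0, "mark_all_that_apply": 18.0}

        # (question id, question type, expected proportion of respondents answering)
        questionnaire = [
            ("Q1", "yes_no", 1.00),
            ("Q2", "scaled", 1.00),
            ("Q3", "mark_all_that_apply", 0.40),   # asked only of a sub-group
            ("Q4", "yes_no", 0.40),
            ("Q5", "scaled", 0.85),
        ]

        expected = sum(p * avg_seconds[qtype] for _, qtype, p in questionnaire)
        # Crude upper bound: every question answered (a real questionnaire would
        # instead follow the longest mapped path through the skip patterns).
        longest = sum(avg_seconds[qtype] for _, qtype, _ in questionnaire)

        print(f"expected completion time: {expected / 60:.1f} minutes")
        print(f"longest completion time:  {longest / 60:.1f} minutes")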

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010960
    Description:

    Non-response is inevitable in any survey, despite all the effort put into reducing it at the various stages of the survey. In particular, non-response can cause bias in the estimates. In addition, non-response is an especially serious problem in longitudinal studies because the sample shrinks over time. France's ELFE (Étude Longitudinale Française depuis l'Enfance) is a project that aims to track 20,000 children from birth to adulthood using a multidisciplinary approach. This paper is based on the results of the initial pilot studies conducted in 2007 to test the survey's feasibility and acceptance. The participation rates are presented (response rate, non-response factors) along with a preliminary description of the non-response treatment methods being considered.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010969
    Description:

    In a multi-divisional initiative within the U.S. Census Bureau, a highly sophisticated and innovative system was developed and implemented for capturing, tracking, and scanning respondent data, using Intelligent Character Recognition (ICR), Optical Character Recognition (OCR), Optical Mark Recognition (OMR) and keying technology, with heavy emphasis on error detection and control. The system, known as the integrated Computer Assisted Data Entry (iCADE) System, provides digital imaging of respondent questionnaires, which are then processed by a combination of imaging algorithms and sent through OMR to collect check-box data; only the write-in areas are automatically extracted and sent to data-keying staff for data capture. These capabilities have produced great efficiencies in the data capture process and have led to a novel and efficient approach to post-collection activities.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011003
    Description:

    This study examined the feasibility of developing correction factors to adjust self-reported measures of Body Mass Index to more closely approximate measured values. Data are from the 2005 Canadian Community Health Survey where respondents were asked to report their height and weight and were subsequently measured. Regression analyses were used to determine which socio-demographic and health characteristics were associated with the discrepancies between reported and measured values. The sample was then split into two groups. In the first, the self-reported BMI and the predictors of the discrepancies were regressed on the measured BMI. Correction equations were generated using all predictor variables that were significant at the p<0.05 level. These correction equations were then tested in the second group to derive estimates of sensitivity, specificity and of obesity prevalence. Logistic regression was used to examine the relationship between measured, reported and corrected BMI and obesity-related health conditions. Corrected estimates provided more accurate measures of obesity prevalence, mean BMI and sensitivity levels. Self-reported data exaggerated the relationship between BMI and health conditions, while in most cases the corrected estimates provided odds ratios that were more similar to those generated with the measured BMI.
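
    The split-sample correction-equation approach can be sketched as follows on simulated data (not CCHS values); the predictor, error model and obesity cut-off are illustrative assumptions:

        # Sketch of the split-sample correction approach on simulated data (not CCHS
        # values): fit a correction equation in one half, apply it in the other half,
        # and compare obesity classification.  The predictor, error model and BMI
        # cut-off of 30 are illustrative assumptions.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(7)
        n = 4_000
        measured = rng.normal(27.0, 5.0, size=n)              # measured BMI
        sex = rng.integers(0, 2, size=n)                       # example predictor
        # Simulate under-reporting that grows with BMI and differs by group.
        reported = measured - (0.5 + 0.05 * (measured - 27.0) + 0.3 * sex
                               + rng.normal(0.0, 0.8, size=n))

        half = n // 2
        X_fit = sm.add_constant(np.column_stack([reported[:half], sex[:half]]))
        correction = sm.OLS(measured[:half], X_fit).fit()      # correction equation

        X_apply = sm.add_constant(np.column_stack([reported[half:], sex[half:]]))
        corrected = correction.predict(X_apply)

        def obese(bmi):
            return bmi >= 30.0

        truth = obese(measured[half:])
        for label, est in [("self-reported", obese(reported[half:])),
                           ("corrected", obese(corrected))]:
            sensitivity = (est & truth).sum() / truth.sum()
            print(f"{label}: prevalence {est.mean():.1%}, sensitivity {sensitivity:.1%}")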

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010968
    Description:

    Statistics Canada has embarked on a program of increasing and improving the use of imaging technology for paper survey questionnaires. The goal is to make the process an efficient, reliable and cost-effective method of capturing survey data. The objective is to continue using Optical Character Recognition (OCR) to capture the data from questionnaires, documents and faxes received, while improving process integration and the Quality Assurance/Quality Control (QA/QC) of the data capture process. These improvements are discussed in this paper.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010975
    Description:

    A major issue in official statistics is the availability of objective measures supporting fact-based decision making. Istat has developed an Information System to assess survey quality. Among other standard quality indicators, nonresponse rates are systematically computed and stored for all surveys. Such a rich information base permits analysis over time and comparisons among surveys. The paper focuses on analysing the effects of the data collection mode and other survey characteristics on total nonresponse. Particular attention is devoted to the extent to which multi-mode data collection improves response rates.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010994
    Description:

    The growing difficulty of reaching respondents has a general impact on non-response in telephone surveys, especially those that use random digit dialling (RDD), such as the General Social Survey (GSS). The GSS is an annual multipurpose survey with 25,000 respondents. Its aim is to monitor the characteristics of and major changes in Canada's social structure. GSS Cycle 21 (2007) was about the family, social support and retirement. Its target population consisted of persons aged 45 and over living in the 10 Canadian provinces. For more effective coverage, part of the sample was taken from a follow-up with the respondents of GSS Cycle 20 (2006), which was on family transitions. The remainder was a new RDD sample. In this paper, we describe the survey's sampling plan and the random digit dialling method used. Then we discuss the challenges of calculating the non-response rate in an RDD survey that targets a subset of a population, for which the in-scope population must be estimated or modelled. This is done primarily through the use of paradata. The methodology used in GSS Cycle 21 is presented in detail.
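
    When the in-scope status of unresolved cases must be estimated, a common approach is to apply an eligibility rate estimated from the resolved cases to the unresolved ones. The sketch below shows one such variant with invented counts and is not necessarily the exact formula used in GSS Cycle 21:

        # Sketch of a response-rate calculation when the in-scope status of unresolved
        # numbers must be estimated: apply an eligibility rate "e", estimated from the
        # resolved cases, to the unresolved cases (similar in spirit to AAPOR RR3).
        # Counts are invented and the formula is not necessarily GSS Cycle 21's.
        interviews = 2_500       # completed interviews (in scope)
        other_in_scope = 1_800   # in-scope non-respondents (refusals, no contact, ...)
        out_of_scope = 5_700     # resolved numbers found to be out of scope
        unresolved = 3_000       # never resolved (always busy, never answered, ...)

        resolved_in_scope = interviews + other_in_scope
        e = resolved_in_scope / (resolved_in_scope + out_of_scope)

        response_rate = interviews / (resolved_in_scope + e * unresolved)
        print(f"estimated eligibility rate e = {e:.3f}")
        print(f"response rate = {response_rate:.1%}")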

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010976
    Description:

    Many survey organizations use the response rate as an indicator for the quality of survey data. As a consequence, a variety of measures are implemented to reduce non-response or to maintain response at an acceptable level. However, the response rate is not necessarily a good indicator of non-response bias. A higher response rate does not imply smaller non-response bias. What matters is how the composition of the response differs from the composition of the sample as a whole. This paper describes the concept of R-indicators to assess potential differences between the sample and the response. Such indicators may facilitate analysis of survey response over time, between various fieldwork strategies or data collection modes. Some practical examples are given.
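
    A minimal sketch of an R-indicator computation is shown below, using the standard formula R = 1 - 2S(p), where S is the standard deviation of the estimated response propensities (design-weighted in practice); the auxiliary variables and response model here are simulated assumptions, not this paper's examples:

        # Sketch of an R-indicator: estimate response propensities from auxiliary
        # variables known for the whole sample, then compute R = 1 - 2 * S(propensity),
        # where S is the standard deviation (design-weighted in practice; unweighted
        # here) of the estimated propensities.  R = 1 means a perfectly representative
        # response.  Data and the response model are simulated assumptions.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        n = 5_000
        age = rng.integers(18, 90, size=n)
        urban = rng.integers(0, 2, size=n)
        # Simulated response behaviour that depends on the auxiliary variables.
        p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 0.02 * (age - 50) - 0.4 * urban)))
        responded = (rng.random(n) < p_true).astype(float)

        X = sm.add_constant(np.column_stack([age, urban]).astype(float))
        rho = sm.Logit(responded, X).fit(disp=False).predict(X)   # estimated propensities

        R = 1.0 - 2.0 * rho.std(ddof=1)
        print(f"estimated R-indicator: {R:.3f}")    # closer to 1 = more representative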

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011002
    Description:

    Based on a representative sample of the Canadian population, this article quantifies the bias resulting from the use of self-reported rather than directly measured height, weight and body mass index (BMI). Associations between BMI categories and selected health conditions are compared to see if the misclassification resulting from the use of self-reported data alters associations between obesity and obesity-related health conditions. The analysis is based on 4,567 respondents to the 2005 Canadian Community Health Survey (CCHS) who, during a face-to-face interview, provided self-reported values for height and weight and were then measured by trained interviewers. Based on self-reported data, a substantial proportion of individuals with excess body weight were erroneously placed in lower BMI categories. This misclassification resulted in elevated associations between overweight/obesity and morbidity.

    Release date: 2009-12-03
