Statistics by subject – Statistical methods

All (176) (25 of 176 results)

  • Articles and reports: 12-001-X201700114823
    Description:

    The derivation of estimators in a multi-phase calibration process requires sequential computation of the estimators and calibrated weights of earlier phases in order to obtain those of later ones. After only two phases of calibration, the estimators and their variances already involve calibration factors from both phases, and the formulae become cumbersome and uninformative. As a consequence, the literature so far deals mainly with two phases, while three or more phases are rarely considered. In some cases the analysis is ad hoc for a specific design, and no comprehensive methodology has been formed for constructing calibrated estimators and, more challengingly, for estimating their variances in three or more phases. We provide a closed-form formula for the variance of multi-phase calibrated estimators that holds for any number of phases. By specifying a new presentation of multi-phase calibrated weights, it is possible to construct calibrated estimators that have the form of multivariate regression estimators, which enables computation of a consistent estimator for their variance. This new variance estimator is not only general for any number of phases but also has some favorable characteristics. A comparison with other estimators in the special case of two-phase calibration and another independent study for three phases are presented.

    Release date: 2017-06-22
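
    As a toy illustration of the calibration step that this paper chains across phases (a minimal sketch with invented data, not the authors' multi-phase estimator): linear calibration adjusts design weights so that the weighted auxiliary total reproduces a known benchmark, and the resulting estimator has the familiar regression (GREG) form.

        import numpy as np

        rng = np.random.default_rng(0)
        n, N = 200, 10_000
        x = rng.lognormal(3.0, 0.5, size=n)          # auxiliary variable with known population total
        y = 2.0 * x + rng.normal(0.0, 5.0, size=n)   # study variable
        d = np.full(n, N / n)                        # design weights under simple random sampling
        X_tot = float(N * np.exp(3.0 + 0.5**2 / 2))  # benchmark total of x, assumed known

        # Linear (chi-square distance) calibration: w_i = d_i * (1 + x_i * lam)
        lam = (X_tot - np.sum(d * x)) / np.sum(d * x * x)
        w = d * (1.0 + x * lam)

        print(np.sum(w * x) - X_tot)   # ~0: the calibration constraint is met exactly
        print(np.sum(w * y))           # calibrated (GREG-form) estimate of the y total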

  • Articles and reports: 12-001-X201700114820
    Description:

    Measurement errors can induce bias in the estimation of transitions, leading to erroneous conclusions about labour market dynamics. The traditional literature on gross flows estimation is based on the assumption that measurement errors are uncorrelated over time. This assumption is not realistic in many contexts, because of survey design and data collection strategies. In this work, we use a model-based approach to correct observed gross flows for classification errors by means of latent class Markov models. We refer to data collected with the Italian Continuous Labour Force Survey, which is cross-sectional and quarterly, with a 2-2-2 rotating design. The questionnaire allows us to use multiple indicators of labour force conditions for each quarter: two collected in the first interview, and a third collected one year later. Our approach provides a method to estimate labour market mobility, taking into account correlated errors and the rotating design of the survey. The best-fitting model is a mixed latent class Markov model with covariates affecting latent transitions and correlated errors among indicators; the mixture components are of mover-stayer type. The better fit of the mixture specification is due to more accurately estimated latent transitions.

    Release date: 2017-06-22
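
    A hedged toy simulation of the core problem described above (my own illustration, not the authors' latent class model): when each wave's labour force status is independently misclassified with small probability, the observed quarter-to-quarter transition rate overstates true mobility.

        import numpy as np

        rng = np.random.default_rng(1)
        n, err = 100_000, 0.05                   # sample size, per-wave misclassification rate
        s1 = (rng.random(n) < 0.2).astype(int)   # true state, quarter 1 (0=employed, 1=unemployed)
        move_p = np.where(s1 == 0, 0.03, 0.10)   # true transition probabilities by state
        s2 = np.where(rng.random(n) < move_p, 1 - s1, s1)  # true state, quarter 2

        flip = lambda s: np.where(rng.random(n) < err, 1 - s, s)
        o1, o2 = flip(s1), flip(s2)              # observed states with independent errors

        print("true transition rate:    ", (s1 != s2).mean())
        print("observed transition rate:", (o1 != o2).mean())  # inflated by misclassification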

  • Technical products: 11-522-X201700014708
    Description:

    Statistics Canada’s Household Survey Frames (HSF) Programme provides various universe files that can be used alone or in combination to improve survey design, sampling, collection, and processing in the traditional “need to contact a household” model. Even as surveys are migrating onto this core suite of products, the HSF is starting to plan the changes to infrastructure, organisation, and linkages with other data assets at Statistics Canada that will help enable a shift to increased use of a wide variety of administrative data as input to the social statistics programme. The presentation will provide an overview of the HSF Programme, describe the foundational concepts that will need to be implemented to expand linkage potential, and identify strategic research being undertaken toward 2021.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014721
    Description:

    Open data is becoming an increasingly important expectation of Canadians, researchers, and developers. Learn how and why the Government of Canada has centralized the distribution of all Government of Canada open data through Open.Canada.ca and how this initiative will continue to support the consumption of statistical information.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014746
    Description:

    Paradata research has focused on identifying opportunities for strategic improvement in data collection that could be operationally viable and lead to enhancements in quality or cost efficiency. To that end, Statistics Canada has developed and implemented a responsive collection design (RCD) strategy for computer-assisted telephone interview (CATI) household surveys to maximize quality and efficiency and to potentially reduce costs. RCD is an adaptive approach to survey data collection that uses information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. In practice, survey managers monitor and analyze collection progress against a predetermined set of indicators for two purposes: to identify critical data-collection milestones that require significant changes to the collection approach, and to adjust collection strategies to make the most efficient use of remaining available resources. In the RCD context, numerous considerations come into play when determining which aspects of data collection to adjust and how to adjust them. Paradata sources play a key role in the planning, development and implementation of active management for RCD surveys. Since 2009, Statistics Canada has conducted several RCD surveys. This paper describes Statistics Canada’s experiences in implementing and monitoring this type of survey.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014756
    Description:

    How can we bring together multidimensional health system performance data in a simplified way that is easy to access and provides comparable and actionable information to accelerate improvements in health care? The Canadian Institute for Health Information has developed a suite of tools to meet performance measurement needs across different audiences, to identify improvement priorities, to help health regions and facilities understand how they compare with peers, and to support transparency and accountability. The pan-Canadian tool, Your Health System (YHS), consolidates reporting of 45 key performance indicators in a structured way that is comparable over time and at different geographic levels. This paper outlines the development of the tool and the methodological approaches and considerations taken to create a dynamic tool that facilitates benchmarking and meaningful comparisons for health system performance improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014752
    Description:

    This paper presents a new price index method for processing electronic transaction (scanner) data. Price indices are calculated as the ratio of a turnover index and a weighted quantity index. Product weights of quantities sold are computed from the deflated prices of each month in the current publication year. New products can be incorporated in a timely manner without price imputations, so that all transactions can be processed. Product weights are updated monthly and are used to calculate direct indices with respect to a fixed base month. By this construction, the price indices are free of chain drift. The results are robust under departures from the methodological choices. The method has been part of the Dutch CPI since January 2016, when it was first applied to mobile phones.

    Release date: 2016-03-24
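
    A schematic numeric sketch of the headline identity above (invented numbers; the Dutch method's deflated-price weights and monthly weight updates are not reproduced here): the price index is obtained as a turnover index divided by a weighted quantity index.

        import numpy as np

        # Toy scanner data: prices p and quantities q for three products,
        # in a fixed base month (0) and the current month (1).
        p0 = np.array([2.00, 5.00, 1.50]); q0 = np.array([100, 40, 250])
        p1 = np.array([2.10, 4.80, 1.60]); q1 = np.array([ 90, 55, 240])

        turnover_index = (p1 @ q1) / (p0 @ q0)         # change in total sales value
        quantity_index = (p0 @ q1) / (p0 @ q0)         # base-price-weighted quantity change
        price_index = turnover_index / quantity_index  # a direct index vs. the base month

        print(round(price_index, 4))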

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014754
    Description:

    Background: There is increasing interest in measuring and benchmarking health system performance. We compared Canada’s health system with those of other countries in the Organisation for Economic Co-operation and Development (OECD) at both the national and provincial levels, across 50 indicators of health system performance. This analysis can help provinces identify potential areas for improvement, considering an optimal comparator for international comparisons. Methods: OECD Health Data from 2013 was used to compare Canada’s results internationally. We also calculated provincial results for the OECD’s indicators of health system performance, using OECD methodology. We normalized the indicator results to present multiple indicators on the same scale and compared them to the OECD average and 25th and 75th percentiles. Results: Presenting normalized values allows Canada’s results to be compared across multiple OECD indicators on the same scale. No country or province consistently outperforms the others. For most indicators, Canadian results are similar to those of other countries, but there remain areas where Canada performs particularly well (e.g., smoking rates) or poorly (e.g., patient safety). These data were presented in an interactive eTool. Conclusion: Comparing Canada’s provinces internationally can highlight areas where improvement is needed and help to identify potential strategies for improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014749
    Description:

    As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data, such as Passport Canada files, Canada Border Services Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.

    Release date: 2016-03-24

  • Articles and reports: 12-001-X201500214250
    Description:

    Assessing the impact of mode effects on survey estimates has become a crucial research objective due to the increasing use of mixed-mode designs. Despite the advantages of a mixed-mode design, such as lower costs and increased coverage, there is sufficient evidence that mode effects may be large relative to the precision of a survey. They may lead to statistics that are not comparable over time or across population subgroups, and they may increase bias. Adaptive survey designs offer a flexible mathematical framework for obtaining an optimal balance between survey quality and costs. In this paper, we employ adaptive designs in order to minimize mode effects. We illustrate our optimization model by means of a case study on the Dutch Labor Force Survey. We focus on item-dependent mode effects and evaluate the impact on survey quality by comparison to a gold standard.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214230
    Description:

    This paper develops allocation methods for stratified sample surveys where composite small area estimators are a priority, and areas are used as strata. Longford (2006) proposed an objective criterion for this situation, based on a weighted combination of the mean squared errors of small area means and a grand mean. Here, we redefine this approach within a model-assisted framework, allowing regressor variables and a more natural interpretation of results using an intra-class correlation parameter. We also consider several uses of power allocation, and allow the placing of other constraints such as maximum relative root mean squared errors for stratum estimators. We find that a simple power allocation can perform very nearly as well as the optimal design even when the objective is to minimize Longford’s (2006) criterion.

    Release date: 2015-12-17

  • Articles and reports: 82-003-X201500714205
    Description:

    Discrepancies between self-reported and objectively measured physical activity are well-known. For the purpose of validation, this study compares a new self-reported physical activity questionnaire with an existing one and with accelerometer data.

    Release date: 2015-07-15

  • Articles and reports: 82-003-X201500614196
    Description:

    This study investigates the feasibility and validity of using personal health insurance numbers to deterministically link the Canadian Cancer Registry (CCR) and the Discharge Abstract Database in order to obtain hospitalization information about people with primary cancers.

    Release date: 2015-06-17

  • Technical products: 12-002-X201500114147
    Description:

    Influential observations in logistic regression are those that have a notable effect on certain aspects of the model fit. Large sample size alone does not eliminate this concern; it is still important to examine potentially influential observations, especially in complex survey data. This paper describes a straightforward algorithm for examining potentially influential observations in complex survey data using SAS software. This algorithm was applied in a study using the 2005 Canadian Community Health Survey that examined factors associated with family physician utilization for adolescents.

    Release date: 2015-03-25
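
    The paper's algorithm is implemented in SAS for complex survey data; as a rough unweighted analogue in Python (a sketch with invented data, ignoring survey weights and design), the leverages and Pearson residuals of a fitted logistic regression can be combined into a crude one-step influence measure for shortlisting observations to examine.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(2)
        n = 500
        x = rng.normal(size=n)
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x))))

        X = sm.add_constant(x)
        fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
        mu = fit.fittedvalues
        v = mu * (1.0 - mu)                        # binomial variance = IRLS working weights
        Xw = np.sqrt(v)[:, None] * X
        H = Xw @ np.linalg.inv(X.T @ (v[:, None] * X)) @ Xw.T
        leverage = np.diag(H)                      # hat values of the weighted fit
        pearson = (y - mu) / np.sqrt(v)            # Pearson residuals
        influence = leverage * pearson**2 / (1.0 - leverage)
        print(np.argsort(influence)[-5:])          # indices of the most influential points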

  • Articles and reports: 12-001-X201400214110
    Description:

    In developing the sample design for a survey we attempt to produce a good design for the funds available. Information on costs can be used to develop sample designs that minimise the sampling variance of an estimator of total for fixed cost. Improvements in survey management systems mean that it is now sometimes possible to estimate the cost of including each unit in the sample. This paper develops relatively simple approaches to determine whether the potential gains arising from using this unit level cost information are likely to be of practical use. It is shown that the key factor is the coefficient of variation of the costs relative to the coefficient of variation of the relative error on the estimated cost coefficients.

    Release date: 2014-12-19
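
    A minimal sketch of how cost information can enter a design (invented strata, variances and per-unit costs, not the paper's unit-level approach): for a fixed budget, allocating n_h proportional to N_h S_h / sqrt(c_h) yields a lower variance than an allocation that ignores the per-unit costs c_h.

        import numpy as np

        N = np.array([5000.0, 3000.0, 2000.0])   # stratum sizes
        S = np.array([10.0, 20.0, 40.0])         # stratum standard deviations
        c = np.array([1.0, 4.0, 9.0])            # per-unit collection costs
        budget = 400.0

        def var_total(n):                        # variance of the stratified total (fpc ignored)
            return np.sum(N**2 * S**2 / n)

        # Cost-optimal allocation: n_h proportional to N_h * S_h / sqrt(c_h), scaled to the budget
        n_opt = budget * (N * S / np.sqrt(c)) / np.sum(N * S * np.sqrt(c))
        # Neyman-style allocation ignoring costs, scaled to the same budget
        n_ney = budget * (N * S) / np.sum(c * N * S)

        print(var_total(n_opt), var_total(n_ney))  # the cost-aware design has lower variance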

  • Articles and reports: 12-001-X201400214128
    Description:

    Users, funders and providers of official statistics want estimates that are “wider, deeper, quicker, better, cheaper” (channeling Tim Holt, former head of the UK Office for National Statistics), to which I would add “more relevant” and “less burdensome”. Since World War II, we have relied heavily on the probability sample survey as the best we could do - and that best being very good - to meet these goals for estimates of household income and unemployment, self-reported health status, time use, crime victimization, business activity, commodity flows, consumer and business expenditures, et al. Faced with secularly declining unit and item response rates and evidence of reporting error, we have responded in many ways, including the use of multiple survey modes, more sophisticated weighting and imputation methods, adaptive design, cognitive testing of survey items, and other means to maintain data quality. For statistics on the business sector, in order to reduce burden and costs, we long ago moved away from relying solely on surveys to produce needed estimates, but, to date, we have not done that for household surveys, at least not in the United States. I argue that we can and must move from a paradigm of producing the best estimates possible from a survey to that of producing the best possible estimates to meet user needs from multiple data sources. Such sources include administrative records and, increasingly, transaction and Internet-based data. I provide two examples - household income and plumbing facilities - to illustrate my thesis. I suggest ways to inculcate a culture of official statistics that focuses on the end result of relevant, timely, accurate and cost-effective statistics and treats surveys, along with other data sources, as means to that end.

    Release date: 2014-12-19

  • Articles and reports: 12-001-X201400214090
    Description:

    When studying a finite population, it is sometimes necessary to select samples from several sampling frames in order to represent all individuals. Here we are interested in the scenario where two samples are selected using a two-stage design, with common first-stage selection. We apply the Hartley (1962), Bankier (1986) and Kalton and Anderson (1986) methods, and we show that these methods can be applied conditional on first-stage selection. We also compare the performance of several estimators as part of a simulation study. Our results suggest that the estimator should be chosen carefully when there are multiple sampling frames, and that a simple estimator is sometimes preferable, even if it uses only part of the information collected.

    Release date: 2014-12-19
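
    A schematic dual-frame sketch (invented domain estimates, and without the paper's two-stage structure): Hartley's (1962) composite estimator combines the two samples' estimates of the overlap domain through a mixing weight theta, chosen in practice to minimize variance.

        # Toy dual-frame totals (hypothetical values)
        Y_a = 120.0     # total for units covered only by frame A
        Y_b = 80.0      # total for units covered only by frame B
        Y_ab_A = 200.0  # overlap-domain total estimated from the frame-A sample
        Y_ab_B = 190.0  # overlap-domain total estimated from the frame-B sample

        theta = 0.5     # Hartley mixing weight in [0, 1]
        Y_hartley = Y_a + Y_b + theta * Y_ab_A + (1.0 - theta) * Y_ab_B
        print(Y_hartley)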

  • Technical products: 11-522-X201300014289
    Description:

    This paper provides an overview of the main new features that will be added to the forthcoming version of the Demosim microsimulation projection model based on microdata from the 2011 National Household Survey. The paper first describes the additions to the base population, namely new variables, some of which are added to the National Household Survey data by means of data linkage. This is followed by a brief description of the methods being considered for the projection of language variables, citizenship and religion as examples of the new features for events simulated by the model.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014251
    Description:

    I present a modeller's perspective on the current status quo in survey-based inference for official statistics. In doing so, I try to identify the strengths and weaknesses of the design-based and model-based inferential positions in which survey sampling, at least as far as the official statistics world is concerned, currently finds itself. I close with an example from adaptive survey design that illustrates why taking a model-based perspective (either frequentist or Bayesian) represents the best way for official statistics to avoid the debilitating 'inferential schizophrenia' that seems inevitable if current methodologies are applied to the emerging information requirements of today's world (and possibly even tomorrow's).

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014286
    Description:

    The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate when they were in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspective of social science, health and environmental health. In randomly selected maternity hospitals, all infants in the target population, who were born on one of 25 days distributed across the four seasons, were chosen. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of the cross-tabulation between two independent samples: a sampling of maternity hospitals and a sampling of days. While it is easy to imagine a cluster effect due to the sampling of maternity hospitals, one can also imagine a cluster effect due to the sampling of days. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014272
    Description:

    Two converging trends raise questions about the future of large-scale probability surveys conducted by or for National Statistical Institutes (NSIs). First, increasing costs and rising rates of nonresponse potentially threaten the cost-effectiveness and inferential value of surveys. Second, there is growing interest in Big Data as a replacement for surveys. There are many different types of Big Data, but the primary focus here is on data generated through social media. This paper supplements and updates an earlier paper on the topic (Couper, 2013). I review some of the concerns about Big Data, particularly from the survey perspective. I argue that there is a role for both high-quality surveys and big data analytics in the work of NSIs. While Big Data is unlikely to replace high-quality surveys, I believe the two methods can serve complementary functions. I attempt to identify some of the criteria that need to be met, and questions that need to be answered, before Big Data can be used for reliable population-based inference.

    Release date: 2014-10-31

  • Articles and reports: 82-003-X201401014098
    Description:

    This study compares registry and non-registry approaches to linking 2006 Census of Population data for Manitoba and Ontario to hospital data from the Discharge Abstract Database.

    Release date: 2014-10-15

  • Articles and reports: 12-001-X201400114004
    Description:

    In 2009, two major surveys in the Governments Division of the U.S. Census Bureau were redesigned to reduce sample size, save resources, and improve the precision of the estimates (Cheng, Corcoran, Barth and Hogue 2009). The new design divides each of the traditional state by government-type strata with sufficiently many units into two sub-strata according to each governmental unit’s total payroll, in order to sample less from the sub-stratum with small size units. The model-assisted approach is adopted in estimating population totals. Regression estimators using auxiliary variables are obtained either within each created sub-stratum or within the original stratum by collapsing two sub-strata. A decision-based method was proposed in Cheng, Slud and Hogue (2010), applying a hypothesis test to decide which regression estimator is used within each original stratum. Consistency and asymptotic normality of these model-assisted estimators are established here, under a design-based or model-assisted asymptotic framework. Our asymptotic results also suggest two types of consistent variance estimators, one obtained by substituting unknown quantities in the asymptotic variances and the other by applying the bootstrap. The performance of all the estimators of totals and of their variance estimators are examined in some empirical studies. The U.S. Annual Survey of Public Employment and Payroll (ASPEP) is used to motivate and illustrate our study.

    Release date: 2014-06-27
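
    A loose sketch of the "decision-based" idea (a generic test of slope equality on invented data, not the Cheng, Slud and Hogue (2010) procedure): fit the regression separately in the two sub-strata, test whether the slopes differ, and collapse the sub-strata when the test does not reject.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)

        def slope_and_se(x, y):
            b, a = np.polyfit(x, y, 1)                   # slope, intercept
            resid = y - (a + b * x)
            se2 = resid @ resid / (len(x) - 2) / np.sum((x - x.mean())**2)
            return b, np.sqrt(se2)

        x1 = rng.uniform(0, 10, 40); y1 = 2.0 * x1 + rng.normal(0, 1, 40)  # sub-stratum 1
        x2 = rng.uniform(0, 10, 40); y2 = 2.3 * x2 + rng.normal(0, 1, 40)  # sub-stratum 2

        b1, se1 = slope_and_se(x1, y1)
        b2, se2 = slope_and_se(x2, y2)
        z = (b1 - b2) / np.hypot(se1, se2)
        p_value = 2 * stats.norm.sf(abs(z))
        print(p_value, "collapse" if p_value >= 0.05 else "keep separate sub-strata")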

  • Articles and reports: 12-001-X201300211888
    Description:

    When the study variables are functional and storage capacities are limited or transmission costs are high, using survey techniques to select a portion of the observations of the population is an interesting alternative to using signal compression techniques. In this context of functional data, our focus in this study is on estimating the mean electricity consumption curve over a one-week period. We compare different estimation strategies that take account of a piece of auxiliary information such as the mean consumption for the previous period. The first strategy consists in using a simple random sampling design without replacement, then incorporating the auxiliary information into the estimator by introducing a functional linear model. The second approach consists in incorporating the auxiliary information into the sampling designs by considering unequal probability designs, such as stratified and πps designs. We then address the issue of constructing confidence bands for these estimators of the mean. When effective estimators of the covariance function are available and the mean estimator satisfies a functional central limit theorem, it is possible to use a fast technique for constructing confidence bands, based on the simulation of Gaussian processes. This approach is compared with bootstrap techniques that have been adapted to take account of the functional nature of the data.

    Release date: 2014-01-15
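
    A small simulation in the spirit of the first strategy above (a hedged sketch with synthetic load curves; the paper's functional linear model is replaced by a plain pointwise difference estimator): last week's curves, known for every unit, serve as auxiliary information for estimating this week's mean curve from a simple random sample.

        import numpy as np

        rng = np.random.default_rng(5)
        N, n, T = 5000, 200, 168                          # population, sample, hourly grid (one week)
        t = np.arange(T)
        pattern = 1.0 + 0.3 * np.sin(2 * np.pi * t / 24)  # shared daily load shape
        x = rng.lognormal(0.0, 0.4, (N, 1)) * pattern     # previous week's curves (known for all units)
        y = x * rng.normal(1.05, 0.10, (N, 1))            # current week's curves (observed on sample only)

        s = rng.choice(N, size=n, replace=False)          # simple random sample without replacement
        naive = y[s].mean(axis=0)
        diff = x.mean(axis=0) + (y[s] - x[s]).mean(axis=0)  # difference estimator using auxiliary curves

        true = y.mean(axis=0)
        print(np.abs(naive - true).mean(), np.abs(diff - true).mean())  # diff is typically closer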

Data (0) (0 results)

Analysis (96)

  • Articles and reports: 12-001-X201300211871
    Description:

    Regression models are routinely used in the analysis of survey data, where one common issue of interest is to identify influential factors that are associated with certain behavioral, social, or economic indices within a target population. When data are collected through complex surveys, the properties of classical variable selection approaches developed in i.i.d. non-survey settings need to be re-examined. In this paper, we derive a pseudo-likelihood-based BIC criterion for variable selection in the analysis of survey data and suggest a sample-based penalized likelihood approach for its implementation. The sampling weights are appropriately assigned to correct the biased selection result caused by the distortion between the sample and the target population. Under a joint randomization framework, we establish the consistency of the proposed selection procedure. The finite-sample performance of the approach is assessed through analysis and computer simulations based on data from the hypertension component of the 2009 Survey on Living with Chronic Diseases in Canada.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300111824
    Description:

    In most surveys all sample units receive the same treatment and the same design features apply to all selected people and households. In this paper, it is explained how survey designs may be tailored to optimize quality given constraints on costs. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with various examples.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111828
    Description:

    A question that commonly arises in longitudinal surveys is the issue of how to combine differing cohorts of the survey. In this paper we present a novel method for combining different cohorts, and using all available data, in a longitudinal survey to estimate parameters of a semiparametric model, which relates the response variable to a set of covariates. The procedure builds upon the Weighted Generalized Estimation Equation method for handling missing waves in longitudinal studies. Our method is set up under a joint-randomization framework for estimation of model parameters, which takes into account the superpopulation model as well as the survey design randomization. We also propose a design-based, and a joint-randomization, variance estimation method. To illustrate the methodology we apply it to the Survey of Doctorate Recipients, conducted by the U.S. National Science Foundation.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201200211758
    Description:

    This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression through the origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have closer to nominal confidence coverage than the sample-weighted estimator.

    Release date: 2012-12-19
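
    For contrast with the paper's two Bayesian estimators, the sample-weighted comparator it mentions is easy to sketch (a minimal version with invented data): estimate the distribution function by the weighted empirical CDF and invert it.

        import numpy as np

        def weighted_quantile(y, w, q):
            """Invert the weighted (Hajek-type) empirical CDF at level q."""
            order = np.argsort(y)
            y_sorted, w_sorted = y[order], w[order]
            cdf = np.cumsum(w_sorted) / np.sum(w_sorted)
            return y_sorted[np.searchsorted(cdf, q)]

        rng = np.random.default_rng(3)
        y = rng.lognormal(0.0, 1.0, size=300)
        pi = rng.uniform(0.1, 1.0, size=300)   # unequal inclusion probabilities
        w = 1.0 / pi                           # design weights
        print(weighted_quantile(y, w, 0.5))    # sample-weighted median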

  • Articles and reports: 12-001-X201200211755
    Description:

    Non-response in longitudinal studies is addressed by assessing the accuracy of response propensity models constructed to discriminate between and predict different types of non-response. Particular attention is paid to summary measures derived from receiver operating characteristic (ROC) curves and logit rank plots. The ideas are applied to data from the UK Millennium Cohort Study. The results suggest that the ability to discriminate between and predict non-respondents is not high. Weights generated from the response propensity models lead to only small adjustments in employment transitions. Conclusions are drawn in terms of the potential of interventions to prevent non-response.

    Release date: 2012-12-19
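
    A brief sketch of the kind of assessment described above (synthetic data; scikit-learn stands in for whatever software the authors used): fit a response propensity model, then summarize how well it discriminates between respondents and non-respondents with the area under the ROC curve.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(4)
        n = 2000
        X = rng.normal(size=(n, 3))                     # stand-ins for covariates from prior waves
        logit = 0.8 + 0.3 * X[:, 0] - 0.2 * X[:, 1]     # weakly predictive, as the paper reports
        response = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

        model = LogisticRegression().fit(X, response)
        propensity = model.predict_proba(X)[:, 1]       # estimated response propensities
        print(roc_auc_score(response, propensity))      # AUC near 0.5 means poor discrimination
        weights = 1.0 / propensity                      # basis for nonresponse weighting adjustments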

  • Articles and reports: 12-001-X201200111682
    Description:

    Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004) based on compromise allocation and Longford (2006) based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains, cutting across strata, are also specified. Performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments.

    Release date: 2012-06-27
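
    A minimal version of the optimization described above (invented strata; scipy's SLSQP solver stands in for whatever the authors used): minimize the total sample size subject to CV tolerances on the stratum means and on the overall mean.

        import numpy as np
        from scipy.optimize import minimize

        N = np.array([4000.0, 2500.0, 1500.0])   # stratum sizes
        S = np.array([8.0, 15.0, 30.0])          # stratum standard deviations
        Ybar = np.array([50.0, 60.0, 80.0])      # stratum means
        cv_h_max, cv_pop_max = 0.10, 0.03        # CV tolerances

        def constraints(n):
            cv_h = S / (np.sqrt(n) * Ybar)                    # stratum-mean CVs (fpc ignored)
            W = N / N.sum()
            cv_pop = np.sqrt(np.sum(W**2 * S**2 / n)) / np.sum(W * Ybar)
            return np.concatenate([cv_h_max - cv_h, [cv_pop_max - cv_pop]])

        res = minimize(lambda n: n.sum(), x0=np.full(3, 50.0),
                       bounds=[(2.0, None)] * 3,
                       constraints={"type": "ineq", "fun": constraints})
        print(np.ceil(res.x), np.ceil(res.x).sum())           # "optimal" allocation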

  • Articles and reports: 12-001-X201200111683
    Description:

    We consider alternatives to poststratification for doubly classified data in which at least one of the two-way cells is too small to allow poststratification based upon this double classification. In our study data set, the expected count in the smallest cell is 0.36. One approach is simply to collapse cells. This is likely, however, to destroy the double classification structure. Our alternative approaches allow one to maintain the original double classification of the data. The approaches are based upon the calibration study by Chang and Kott (2008). We choose weight adjustments dependent upon the marginal classifications (but not the full cross-classification) to minimize an objective function of the differences between the population counts of the two-way cells and their sample estimates. In the terminology of Chang and Kott (2008), if the row and column classifications have I and J cells respectively, this results in IJ benchmark variables and I + J - 1 model variables. We study the performance of these estimators by constructing simulated simple random samples from the 2005 Quarterly Census of Employment and Wages, which is maintained by the Bureau of Labor Statistics. We use the double classification of state and industry group. In our study, the calibration approaches introduced an asymptotically trivial bias but reduced the MSE, compared to the unbiased estimator, by as much as 20% for a small sample.

    Release date: 2012-06-27
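
    The weight adjustments described above generalize classical raking; as a rough sketch of the basic marginal-calibration idea (iterative proportional fitting on invented counts, not Chang and Kott's (2008) estimator), cell weights are scaled until both margins of the two-way table match the known population counts.

        import numpy as np

        # Invented 3x4 table of design-weighted sample counts, plus known population margins.
        w = np.array([[30.0,  5.0, 12.0,  8.0],
                      [10.0, 22.0,  9.0,  6.0],
                      [ 4.0,  7.0, 15.0, 20.0]])
        row_pop = np.array([600.0, 500.0, 450.0])
        col_pop = np.array([420.0, 380.0, 400.0, 350.0])   # margins share one grand total

        for _ in range(100):                 # iterative proportional fitting (raking)
            w *= (row_pop / w.sum(axis=1))[:, None]
            w *= (col_pop / w.sum(axis=0))[None, :]

        print(w.sum(axis=1), w.sum(axis=0))  # both sets of margins now match the benchmarks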

  • Articles and reports: 82-003-X201200111633
    Description:

    This paper explains the methodology for creating Geozones, area-based thresholds of population characteristics derived from census data, which can be used in the analysis of social or economic differences in health and health service utilization.

    Release date: 2012-03-21

  • Articles and reports: 12-001-X201100211604
    Description:

    We propose a method of mean squared error (MSE) estimation for estimators of finite population domain means that can be expressed in pseudo-linear form, i.e., as weighted sums of sample values. In particular, it can be used for estimating the MSE of the empirical best linear unbiased predictor, the model-based direct estimator and the M-quantile predictor. The proposed method represents an extension of the ideas in Royall and Cumberland (1978) and leads to MSE estimators that are simpler to implement, and potentially more bias-robust, than those suggested in the small area literature. However, it should be noted that the MSE estimators defined using this method can also exhibit large variability when the area-specific sample sizes are very small. We illustrate the performance of the method through extensive model-based and design-based simulation, with the latter based on two realistic survey data sets containing small area information.

    Release date: 2011-12-21

  • Articles and reports: 82-003-X201100311533
    Description:

    This study compares the bias in self-reported height, weight and body mass index in the 2008 and 2005 Canadian Community Health Surveys and the 2007 to 2009 Canadian Health Measures Survey. The feasibility of using correction equations to adjust self-reported 2008 Canadian Community Health Survey values to more closely approximate measured values is assessed.

    Release date: 2011-08-17

  • Articles and reports: 82-003-X201100311534
    Description:

    Using data from the 2007 to 2009 Canadian Health Measures Survey, this study investigates the bias that exists when height, weight and body mass index are based on parent-reported values. Factors associated with reporting error are used to establish the feasibility of developing correction equations to adjust parent-reported estimates.

    Release date: 2011-08-17

  • Articles and reports: 12-001-X201100111446
    Description:

    Small area estimation based on linear mixed models can be inefficient when the underlying relationships are non-linear. In this paper we introduce SAE techniques for variables that can be modelled linearly following a non-linear transformation. In particular, we extend the model-based direct estimator of Chandra and Chambers (2005, 2009) to data that are consistent with a linear mixed model in the logarithmic scale, using model calibration to define appropriate weights for use in this estimator. Our results show that the resulting transformation-based estimator is both efficient and robust with respect to the distribution of the random effects in the model. An application to business survey data demonstrates the satisfactory performance of the method.

    Release date: 2011-06-29

  • Articles and reports: 12-001-X201100111444
    Description:

    Data linkage is the act of bringing together records that are believed to belong to the same unit (e.g., person or business) from two or more files. It is a very common way to enhance dimensions such as time and breadth or depth of detail. Data linkage is often not an error-free process and can lead to linking a pair of records that do not belong to the same unit. There is an explosion of record linkage applications, yet there has been little work on assuring the quality of analyses using such linked files. Naively treating such a linked file as if it were linked without errors will, in general, lead to biased estimates. This paper develops a maximum likelihood estimator for contingency tables and logistic regression with incorrectly linked records. The estimation technique is simple and is implemented using the well-known EM algorithm. A well known method of linking records in the present context is probabilistic data linking. The paper demonstrates the effectiveness of the proposed estimators in an empirical study which uses probabilistic data linkage.

    Release date: 2011-06-29

Reference (80)

Reference (80) (25 of 80 results)

  • Technical products: 11-522-X201700014708
    Description:

    Statistics Canada’s Household Survey Frames (HSF) Programme provides various universe files that can be used alone or in combination to improve survey design, sampling, collection, and processing in the traditional “need to contact a household model.” Even as surveys are migrating onto these core suite of products, the HSF is starting to plan the changes to infrastructure, organisation, and linkages with other data assets in Statistics Canada that will help enable a shift to increased use of a wide variety of administrative data as input to the social statistics programme. The presentation will provide an overview of the HSF Programme, foundational concepts that will need to be implemented to expand linkage potential, and will identify strategic research being under-taken toward 2021.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014721
    Description:

    Open data is becoming an increasingly important expectation of Canadians, researchers, and developers. Learn how and why the Government of Canada has centralized the distribution of all Government of Canada open data through Open.Canada.ca and how this initiative will continue to support the consumption of statistical information.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014746
    Description:

    Paradata research has focused on identifying opportunities for strategic improvement in data collection that could be operationally viable and lead to enhancements in quality or cost efficiency. To that end, Statistics Canada has developed and implemented a responsive collection design (RCD) strategy for computer-assisted telephone interview (CATI) household surveys to maximize quality and efficiency and to potentially reduce costs. RCD is an adaptive approach to survey data collection that uses information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. In practice, the survey managers monitor and analyze collection progress against a predetermined set of indicators for two purposes: to identify critical data-collection milestones that require significant changes to the collection approach and to adjust collection strategies to make the most efficient use of remaining available resources. In the RCD context, numerous considerations come into play when determining which aspects of data collection to adjust and how to adjust them. Paradata sources play a key role in the planning, development and implementation of active management for RCD surveys. Since 2009, Statistics Canada has conducted several RCD surveys. This paper describes Statistics Canada’s experiences in implementing and monitoring this type of surveys.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014756
    Description:

    How can we bring together multidimensional health system performance data in a simplified way that is easy to access and provides comparable and actionable information to accelerate improvements in health care? The Canadian Institute for Health Information has developed a suite of tools to meet performance measurement needs across different audiences, to identify improvement priorities, understand how health regions and facilities compare with peers and support transparency and accountability. The pan-Canadian tools [Your Health System (YHS)] consolidates reporting of 45 key performance indicators in a structured way, and are comparable over time and at different geographic levels. This paper outlines the development and the methodological approaches and considerations taken to create a dynamic tool that facilitates benchmarking and meaningful comparisons for health system performance improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014752
    Description:

    This paper presents a new price index method for processing electronic transaction (scanner) data. Price indices are calculated as a ratio of a turnover index and a weighted quantity index. Product weights of quantities sold are computed from the deflated prices of each month in the current publication year. New products can be timely incorporated without price imputations, so that all transactions can be processed. Product weights are monthly updated and are used to calculate direct indices with respect to a fixed base month. Price indices are free of chain drift by this construction. The results are robust under departures from the methodological choices. The method is part of the Dutch CPI since January 2016, when it was first applied to mobile phones.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014754
    Description:

    Background: There is increasing interest in measuring and benchmarking health system performance. We compared Canada’s health system with other countries in the Organisation for Economic Co-operation and Development (OECD) on both the national and provincial levels, across 50 indicators of health system performance. This analysis can help provinces identify potential areas for improvement, considering an optimal comparator for international comparisons. Methods: OECD Health Data from 2013 was used to compare Canada’s results internationally. We also calculated provincial results for OECD’s indicators on health system performance, using OECD methodology. We normalized the indicator results to present multiple indicators on the same scale and compared them to the OECD average, 25th and 75th percentiles. Results: Presenting normalized values allow Canada’s results to be compared across multiple OECD indicators on the same scale. No country or province consistently has higher results than the others. For most indicators, Canadian results are similar to other countries, but there remain areas where Canada performs particularly well (i.e. smoking rates) or poorly (i.e. patient safety). This data was presented in an interactive eTool. Conclusion: Comparing Canada’s provinces internationally can highlight areas where improvement is needed, and help to identify potential strategies for improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014749
    Description:

    As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data such as Passport Canada files, Canada Border Service Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.

    Release date: 2016-03-24

  • Technical products: 12-002-X201500114147
    Description:

    Influential observations in logistic regression are those that have a notable effect on certain aspects of the model fit. Large sample size alone does not eliminate this concern; it is still important to examine potentially influential observations, especially in complex survey data. This paper describes a straightforward algorithm for examining potentially influential observations in complex survey data using SAS software. This algorithm was applied in a study using the 2005 Canadian Community Health Survey that examined factors associated with family physician utilization for adolescents.
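
    The paper's own SAS algorithm is not reproduced here; as a rough Python analogue (names and data invented), one can refit a weighted logistic regression leaving out one observation at a time and flag the cases that shift the coefficients most. A full design-based analysis would also account for strata and clusters.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 200
        X = sm.add_constant(rng.normal(size=(n, 2)))
        y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.5, 1.0, -1.0]))))
        w = rng.uniform(0.5, 3.0, size=n)          # fictional survey weights

        full = sm.GLM(y, X, family=sm.families.Binomial(), freq_weights=w).fit()

        # Brute-force dfbetas: coefficient change when observation i is dropped.
        dfbeta = np.empty((n, X.shape[1]))
        for i in range(n):
            keep = np.arange(n) != i
            fit_i = sm.GLM(y[keep], X[keep], family=sm.families.Binomial(),
                           freq_weights=w[keep]).fit()
            dfbeta[i] = full.params - fit_i.params

        print("largest shifts:", np.argsort(np.abs(dfbeta).max(axis=1))[-5:])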

    Release date: 2015-03-25

  • Technical products: 11-522-X201300014289
    Description:

    This paper provides an overview of the main new features that will be added to the forthcoming version of the Demosim microsimulation projection model based on microdata from the 2011 National Household Survey. The paper first describes the additions to the base population, namely new variables, some of which are added to the National Household Survey data by means of data linkage. This is followed by a brief description of the methods being considered for the projection of language variables, citizenship and religion as examples of the new features for events simulated by the model.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014251
    Description:

    I present a modeller's perspective on the status quo in survey-based inference for official statistics. In doing so, I try to identify the strengths and weaknesses of the design-based and model-based inferential positions in which survey sampling, at least as far as the official statistics world is concerned, currently finds itself. I close with an example from adaptive survey design that illustrates why taking a model-based perspective (either frequentist or Bayesian) represents the best way for official statistics to avoid the debilitating 'inferential schizophrenia' that seems inevitable if current methodologies are applied to the emerging information requirements of today's world (and possibly even tomorrow's).

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014286
    Description:

    The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate while in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspectives of the social sciences, health and environmental health. In randomly selected maternity hospitals, all infants in the target population who were born on one of 25 days distributed across the four seasons were selected. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of a cross-tabulation of two independent samples: a sample of maternity hospitals and a sample of days. While it is easy to imagine a cluster effect due to the sampling of maternity hospitals, one can also imagine a cluster effect due to the sampling of days. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.
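
    A sketch of the idea, with notation invented here and under the simplifying assumption of simple random sampling without replacement of m hospitals from M and, independently, of k days from K: with y_{hd} the count for hospital h on day d, the expansion estimator and the law of total variance give

        \[
        \widehat{Y} = \frac{MK}{mk} \sum_{h \in s_H} \sum_{d \in s_D} y_{hd},
        \qquad
        \mathrm{Var}(\widehat{Y})
          = M^2\left(\frac{1}{m}-\frac{1}{M}\right) S_H^2
          + \mathrm{E}_H\!\left[\left(\frac{M}{m}\right)^{2}
            K^2\left(\frac{1}{k}-\frac{1}{K}\right) S_D^2(s_H)\right],
        \]

    where S_H^2 is the population variance of the hospital totals and S_D^2(s_H) is the variance across the K days of the day totals within the sampled hospitals. The first term is the familiar hospital cluster effect; the second is the day cluster effect that a formula ignoring the time dimension would miss.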

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014272
    Description:

    Two converging trends raise questions about the future of large-scale probability surveys conducted by or for National Statistical Institutes (NSIs). First, increasing costs and rising rates of nonresponse potentially threaten the cost-effectiveness and inferential value of surveys. Second, there is growing interest in Big Data as a replacement for surveys. There are many different types of Big Data, but the primary focus here is on data generated through social media. This paper supplements and updates an earlier paper on the topic (Couper, 2013). I review some of the concerns about Big Data, particularly from the survey perspective. I argue that there is a role for both high-quality surveys and big data analytics in the work of NSIs. While Big Data is unlikely to replace high-quality surveys, I believe the two methods can serve complementary functions. I attempt to identify some of the criteria that need to be met, and questions that need to be answered, before Big Data can be used for reliable population-based inference.

    Release date: 2014-10-31

  • Technical products: 11-522-X200800011011
    Description:

    The Federation of Canadian Municipalities' (FCM) Quality of Life Reporting System (QOLRS) is a means by which to measure, monitor and report on the quality of life in Canadian municipalities. To address the challenge of collecting administrative data across member municipalities, the QOLRS technical team collaborated on the development of the Municipal Data Collection Tool (MDCT), which has become a key component of the QOLRS data acquisition methodology. Offered as a case study on administrative data collection, this paper argues that the recent launch of the MDCT has enabled the FCM to access reliable pan-Canadian municipal administrative data for the QOLRS.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010963
    Description:

    Background: There has been a reluctance to conduct child-related research studies with, rather than on, children on the African continent. Several studies have embarked on this method; however, ethical and privacy challenges still prevail. The Amajuba Child Health and Wellbeing Research Project (ACHWRP) is a longitudinal study conducted with 725 children aged 9 to 15 in KwaZulu-Natal, South Africa, which faced the same challenges.

    Methods: Focus group discussions and self-administered questionnaires were used as data collection techniques for the ACHWRP. One of the ACHWRP's objectives is to document the consequences of parental or caregiver death on the health and well-being of orphans in the Amajuba District of KwaZulu-Natal. Ethical clearance was received from two ethical review boards.

    Lessons learned: Ethical challenges included coercion to participate; the roles of gatekeepers and partners; obtaining consent and assent; and the recruitment and referral system. Privacy challenges included the data collection techniques; curiosity and disruption by family members during interviews; the settings where interviews were conducted; logistical issues; and the potential of the recruitment method to compromise confidentiality. Resolutions: Detailed consent and assent forms containing all relevant information are necessary. Careful selection of partnerships is crucial. Using a venue far from the community is necessary, while ensuring that participants' travel expenses to the venue are covered.

    Conclusion: Conducting research with children entails investing more time and attention in the planning stage of the study.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011002
    Description:

    Based on a representative sample of the Canadian population, this article quantifies the bias resulting from the use of self-reported rather than directly measured height, weight and body mass index (BMI). Associations between BMI categories and selected health conditions are compared to see if the misclassification resulting from the use of self-reported data alters associations between obesity and obesity-related health conditions. The analysis is based on 4,567 respondents to the 2005 Canadian Community Health Survey (CCHS) who, during a face-to-face interview, provided self-reported values for height and weight and were then measured by trained interviewers. Based on self-reported data, a substantial proportion of individuals with excess body weight were erroneously placed in lower BMI categories. This misclassification resulted in elevated associations between overweight/obesity and morbidity.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010976
    Description:

    Many survey organizations use the response rate as an indicator for the quality of survey data. As a consequence, a variety of measures are implemented to reduce non-response or to maintain response at an acceptable level. However, the response rate is not necessarily a good indicator of non-response bias. A higher response rate does not imply smaller non-response bias. What matters is how the composition of the response differs from the composition of the sample as a whole. This paper describes the concept of R-indicators to assess potential differences between the sample and the response. Such indicators may facilitate analysis of survey response over time, between various fieldwork strategies or data collection modes. Some practical examples are given.
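
    A minimal sketch of the computation (simulated data; an unweighted version for brevity): estimate response propensities from frame covariates, then take R = 1 - 2 S(rho-hat), where S is the standard deviation of the estimated propensities, so that R near 1 indicates a response whose composition resembles the sample.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(1)
        X = rng.normal(size=(1000, 3))             # frame covariates (fictional)
        p_true = 1 / (1 + np.exp(-(0.2 + 0.8 * X[:, 0])))
        responded = rng.binomial(1, p_true)        # 1 = responded to the survey

        # Estimated propensities; their spread drives the indicator.
        rho = LogisticRegression().fit(X, responded).predict_proba(X)[:, 1]
        print("R-indicator:", 1 - 2 * rho.std())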

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010986
    Description:

    Major changes were made to the data collection process for the 2006 Census. One of those changes was the Internet response option, which was offered to all private households in Canada. Nearly one in five households chose to complete and return the questionnaire on-line. In addition, a new method of promoting Internet response was tested via the Internet Response Promotion (IRP) Study. The new approach proved very effective at increasing the on-line response rate. Planning for the 2011 Census, which is under way, calls for the use of a wave collection strategy, and wave 1 would be the IRP method. This paper provides an overview of Internet data collection in the 2006 Census - evaluations, results, lessons learned - and the methodology that will be used in the next census in 2011.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011003
    Description:

    This study examined the feasibility of developing correction factors to adjust self-reported measures of body mass index (BMI) to more closely approximate measured values. Data are from the 2005 Canadian Community Health Survey, in which respondents were asked to report their height and weight and were subsequently measured. Regression analyses were used to determine which socio-demographic and health characteristics were associated with the discrepancies between reported and measured values. The sample was then split into two groups. In the first, the measured BMI was regressed on the self-reported BMI and the predictors of the discrepancies. Correction equations were generated using all predictor variables that were significant at the p < 0.05 level. These correction equations were then tested in the second group to derive estimates of sensitivity, specificity and obesity prevalence. Logistic regression was used to examine the relationship between measured, reported and corrected BMI and obesity-related health conditions. Corrected estimates provided more accurate measures of obesity prevalence, mean BMI and sensitivity levels. Self-reported data exaggerated the relationship between BMI and health conditions, while in most cases the corrected estimates provided odds ratios more similar to those generated with the measured BMI.
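
    A hedged sketch of the split-sample approach (fictional data and a single illustrative covariate): fit the correction equation in one half-sample, then apply it in the other.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(2)
        n = 1000
        measured = rng.normal(27, 5, n)                 # measured BMI
        reported = measured - rng.normal(0.7, 1.0, n)   # systematic under-report
        sex = rng.binomial(1, 0.5, n)                   # example covariate

        half = n // 2
        Z = sm.add_constant(np.column_stack([reported, sex]))
        correction = sm.OLS(measured[:half], Z[:half]).fit()   # fit half

        corrected = correction.predict(Z[half:])               # apply half
        print("mean reported, corrected, measured:",
              reported[half:].mean(), corrected.mean(), measured[half:].mean())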

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011001
    Description:

    The Québec Population Health Survey (EQSP), currently under way with collection wrapping up in February 2009, provides an opportunity, because of the size of its sample, to assess in a controlled environment the impact that sending introductory letters to respondents has on the response rate. Since this regional telephone survey is expected to have more than 38,000 respondents, it was possible to use part of its sample for this study without unduly affecting its overall response rate. In random digit dialling (RDD) surveys such as the EQSP, one of the main challenges in sending out introductory letters is reaching the survey units. Doing so depends largely on our capacity to associate an address with the sample units and on the quality of that information.

    This article describes the controlled study proposed by the Institut de la statistique du Québec to measure the effect that sending out introductory letters to respondents had on the survey's response rate.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010994
    Description:

    The growing difficulty of reaching respondents has a general impact on non-response in telephone surveys, especially those that use random digit dialling (RDD), such as the General Social Survey (GSS). The GSS is an annual multipurpose survey with 25,000 respondents. Its aim is to monitor the characteristics of and major changes in Canada's social structure. GSS Cycle 21 (2007) was about the family, social support and retirement. Its target population consisted of persons aged 45 and over living in the 10 Canadian provinces. For more effective coverage, part of the sample was taken from a follow-up with the respondents of GSS Cycle 20 (2006), which was on family transitions. The remainder was a new RDD sample. In this paper, we describe the survey's sampling plan and the random digit dialling method used. Then we discuss the challenges of calculating the non-response rate in an RDD survey that targets a subset of a population, for which the in-scope population must be estimated or modelled. This is done primarily through the use of paradata. The methodology used in GSS Cycle 21 is presented in detail.
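
    The GSS methodology itself relies on paradata; as a simplified illustration (all numbers invented), a response rate of this kind can be written with an eligibility factor e applied to unresolved cases, in the spirit of the AAPOR RR3 formula.

        # e: estimated share of unresolved numbers that are in scope
        # (here, households with a person aged 45 or over).
        def response_rate(respondents, eligible_nonresp, unresolved, e):
            return respondents / (respondents + eligible_nonresp + e * unresolved)

        e = 3200 / 8000                       # share in scope among resolved cases
        print(response_rate(respondents=2500, eligible_nonresp=1500,
                            unresolved=4000, e=e))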

    Release date: 2009-12-03

  • Technical products: 12-002-X200900110692
    Description:

    By examining responses to questions asked repeatedly of the same respondents over several cycles of longitudinal data, researchers can examine changes in trends over time. Working with these repeatedly measured responses can often be challenging. This article examines trends in youths' volunteering activities, using data from the National Longitudinal Survey of Children and Youth, to highlight several issues that researchers should consider when working with repeated measures.

    Release date: 2009-04-22

  • Technical products: 11-522-X200600110436
    Description:

    The 2006/2007 New Zealand Health Survey sample was designed to meet a range of objectives, the most challenging of which was to achieve sufficient precision for subpopulations of interest, particularly the indigenous Maori population. About 14% of New Zealand's population are Maori. This group is geographically clustered to some extent, but even so most Maori live in areas which have relatively low proportions of Maori, making it difficult to sample this population efficiently. Disproportionate sampling and screening were used to achieve sufficient sample size while maintaining low design effects.
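
    The trade-off can be gauged with Kish's approximation deff = n * sum(w^2) / (sum(w))^2, which links the weight variation created by disproportionate sampling to the loss of effective sample size; the figures below are invented.

        def kish_deff(weights):
            """Kish's design effect from unequal weights."""
            n = len(weights)
            return n * sum(w * w for w in weights) / sum(weights) ** 2

        # Fictional weights: an oversampled stratum gets small weights,
        # the remainder of the sample large ones.
        weights = [1.0] * 300 + [3.5] * 700
        print(kish_deff(weights))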

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110423
    Description:

    Statistics Canada's Canadian Community Health Survey uses two sample frames and two data collection methods. In cycle 2.1, a change was made in the sample allocation between the two frames. A study of the collection method effect by Statistics Canada revealed comparability problems between cycles 1.1 and 2.1. In contrast, the Institut de la statistique du Québec took a comprehensive look at the changes and classified 178 variables as "comparable" or "non-comparable". It made recommendations to Quebec users concerning chronological and interregional comparisons.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110429
    Description:

    During the last three decades, there has been general acceptance of an approach to describing the health states of individuals in terms of multiple domains of health, and of developing self-report instruments that seek information on each of these domains. A health state is thus a multi-dimensional attribute of an individual, reflecting his or her levels on the various components or domains of health; it differs from pathology, risk factors or etiology, and from health service encounters or interventions.

    How to describe health states is a central challenge in measuring health. The relationship of health states to other aspects of health, such as future non-fatal health outcomes or risk of mortality, needs to be examined. The way people report their own health varies systematically with education, sex, age and other cultural factors. Response category cut-points differ across cultures and population sub-groups, and this 'response shift' implies that categorical self-report data are not comparable across individuals; the responses cannot be used to measure health without adjustment.

    In recognition of this, the WHO World Health Survey (WHS) used a set of questions across a core set of domains to measure health states, employing vignettes to detect and correct for self-report biases arising from response category cut-point shifts. This paper describes the instrument used in the WHS and the methods used to produce cross-population comparable data. It presents results from the WHS demonstrating the existence of systematic reporting biases, respondents' ability to rate vignettes, and the use of vignettes to adjust for biases and make the data more comparable. Future strategies to address these problems are discussed.
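
    One widely used formalization of this adjustment (a sketch, with illustrative notation) is the hierarchical ordered probit, or HOPIT, model: respondent i's latent health on a domain is

        \[
        y_i^* = x_i'\beta + \varepsilon_i,
        \qquad \text{report category } k \iff \tau_i^{k-1} < y_i^* \le \tau_i^k,
        \qquad \tau_i^k = \gamma_k' z_i,
        \]

    so the response category cut-points vary with respondent characteristics z_i. Because a vignette describes the same fixed latent level for every respondent, vignette ratings are generated through the same cut-points, which identifies the gamma parameters and allows self-reports to be rescaled to a common metric.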

    Release date: 2008-03-17
