Statistics by subject – Statistical methods


Other available resources to support your research: browse our central repository of key standard concepts, definitions, data sources and methods.

Analysis (58) (25 of 58 results)

  • Articles and reports: 82-003-X201700614829
    Description:

    POHEM-BMI is a microsimulation tool that includes a model of adult body mass index (BMI) and a model of childhood BMI history. This overview describes the development of BMI prediction models for adults and of childhood BMI history, and compares projected BMI estimates with those from nationally representative survey data to establish validity.

    Release date: 2017-06-21

  • Articles and reports: 82-003-X201601014665
    Description:

    The purpose of this analysis was to use data from the 2007 to 2013 Canadian Health Measures Survey to develop reference equations for maximum, right-hand and left-hand grip strength for Canadians aged 6 to 79, based on a healthy, nationally representative population. These equations can be used to determine reference values against which to assess an individual’s grip strength.

    Release date: 2016-10-19

  • Articles and reports: 82-003-X201600114306
    Description:

    This article is an overview of the creation, content, and quality of the 2006 Canadian Birth-Census Cohort Database.

    Release date: 2016-01-20

  • Articles and reports: 12-001-X201500214237
    Description:

    Careful design of a dual-frame random digit dial (RDD) telephone survey requires selecting from among many options that have varying impacts on cost, precision, and coverage in order to obtain the best possible implementation of the study goals. One such consideration is whether to screen cell-phone households in order to interview cell-phone-only (CPO) households and exclude dual-user households, or to take all interviews obtained via the cell-phone sample. We present a framework in which to consider the tradeoffs between these two options and a method to select the optimal design. We derive and discuss the optimum allocation of sample size between the two sampling frames and explore the choice of the optimum mixing parameter p for the dual-user domain. We illustrate our methods using the National Immunization Survey, sponsored by the Centers for Disease Control and Prevention.

    Release date: 2015-12-17
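
As a rough illustration of the allocation problem this abstract describes, the sketch below (with invented inputs) computes the classical cost-constrained allocation n_h ∝ W_h·S_h/√c_h between a landline frame and a cell-phone frame, treating the two domains like strata; the paper's full optimization, including the mixing parameter p for the dual-user domain, is not reproduced here.

```python
# Hypothetical illustration: cost-constrained allocation between a landline
# frame and a cell-phone frame, treated like two strata.
import numpy as np

def optimal_allocation(W, S, c, budget):
    """Minimise sum(W_h^2 S_h^2 / n_h) subject to sum(c_h n_h) = budget.
    Classical solution: n_h = B * (W_h S_h / sqrt(c_h)) / sum_j(W_j S_j sqrt(c_j))."""
    W, S, c = map(np.asarray, (W, S, c))
    num = W * S / np.sqrt(c)
    denom = np.sum(W * S * np.sqrt(c))
    return budget * num / denom

# Illustrative numbers only: population shares, domain SDs, cost per interview.
W = [0.6, 0.4]
S = [0.5, 0.5]
c = [10.0, 25.0]
n = optimal_allocation(W, S, c, budget=50_000)
print(dict(zip(["landline", "cell_only"], n.round(1))))
```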

  • Articles and reports: 12-001-X201500114192
    Description:

    We are concerned with optimal linear estimation of means on subsequent occasions under sample rotation, where the evolution of samples over time follows a cascade pattern. It has been known since the seminal paper of Patterson (1950) that when units are not allowed to return to the sample after leaving it for a certain period (i.e., there are no gaps in the rotation pattern), a one-step recursion holds for the optimal estimator. However, in some important real surveys, e.g., the Current Population Survey in the US or the Labour Force Survey in many European countries, units return to the sample after being absent for several occasions (there are gaps in the rotation patterns). In such situations, the question of the form of the recurrence for the optimal estimator becomes drastically more difficult. This issue had not been resolved; instead, alternative sub-optimal approaches were developed, such as K-composite estimation (see, e.g., Hansen, Hurwitz, Nisselson and Steinberg (1955)), AK-composite estimation (see, e.g., Gurney and Daly (1965)) or the time series approach (see, e.g., Binder and Hidiroglou (1988)).

    In the present paper we overcome this long-standing difficulty: we present analytical recursion formulas for the optimal linear estimator of the mean for schemes with gaps in the rotation pattern. This is achieved under two technical conditions, Assumption I and Assumption II (numerical experiments suggest that these assumptions might be universally satisfied). To attain the goal we develop an algebraic operator approach that reduces the problem of recursion for the optimal linear estimator to two issues: (1) localization of the roots (possibly complex) of a polynomial Qp defined in terms of the rotation pattern (Qp happens to be conveniently expressed through Chebyshev polynomials of the first kind); and (2) the rank of a matrix S defined in terms of the rotation pattern and the roots of the polynomial Qp. In particular, it is shown that the order of the recursion is equal to one plus the size of the largest gap in the rotation pattern. Exact formulas for calculating the recurrence coefficients are given; of course, to use them one has to check (in many cases, numerically) that Assumptions I and II are satisfied. The solution is illustrated through several examples of rotation schemes arising in real surveys.

    Release date: 2015-06-29
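
For readers unfamiliar with the sub-optimal alternatives the abstract mentions, here is a minimal sketch of K-composite estimation in the form usually attributed to Hansen, Hurwitz, Nisselson and Steinberg (1955); the inputs are invented, and the paper's optimal recursion is not attempted.

```python
# K-composite estimation sketch: blend the current direct estimate with the
# previous composite estimate carried forward by an overlap-based change
# estimate. y[t] are per-occasion direct estimates; delta[t] estimates
# y[t] - y[t-1] from the overlapping portion of the sample.
def k_composite(y, delta, K=0.5):
    comp = [y[0]]                      # no prior information on occasion 0
    for t in range(1, len(y)):
        comp.append((1 - K) * y[t] + K * (comp[-1] + delta[t]))
    return comp

print(k_composite([10.0, 10.4, 10.1, 10.6], [0.0, 0.5, -0.2, 0.4]))
```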

  • Articles and reports: 82-003-X201500614196
    Description:

    This study investigates the feasibility and validity of using personal health insurance numbers to deterministically link the CCR and the Discharge Abstract Database to obtain hospitalization information about people with primary cancers.

    Release date: 2015-06-17

  • Articles and reports: 12-001-X201400214092
    Description:

    Survey methodologists have long studied the effects of interviewers on the variance of survey estimates. Statistical models including random interviewer effects are often fitted in such investigations, and research interest lies in the magnitude of the interviewer variance component. One question that might arise in a methodological investigation is whether or not different groups of interviewers (e.g., those with prior experience on a given survey vs. new hires, or CAPI interviewers vs. CATI interviewers) have significantly different variance components in these models. Significant differences may indicate a need for additional training in particular subgroups, or sub-optimal properties of different modes or interviewing styles for particular survey items (in terms of the overall mean squared error of survey estimates). Survey researchers seeking answers to these types of questions have different statistical tools available to them. This paper aims to provide an overview of alternative frequentist and Bayesian approaches to the comparison of variance components in different groups of survey interviewers, using a hierarchical generalized linear modeling framework that accommodates a variety of different types of survey variables. We first consider the benefits and limitations of each approach, contrasting the methods used for estimation and inference. We next present a simulation study, empirically evaluating the ability of each approach to efficiently estimate differences in variance components. We then apply the two approaches to an analysis of real survey data collected in the U.S. National Survey of Family Growth (NSFG). We conclude that the two approaches tend to result in very similar inferences, and we provide suggestions for practice given some of the subtle differences observed.

    Release date: 2014-12-19
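
A toy version of the frequentist side of this comparison, under strong simplifying assumptions (balanced data, a continuous outcome, and method-of-moments estimation rather than the paper's hierarchical generalized linear models):

```python
# Simulate interviewer random effects in two groups and estimate the
# interviewer variance component per group via the balanced one-way ANOVA
# method of moments: (MSB - MSW) / m, truncated at zero.
import numpy as np

rng = np.random.default_rng(42)

def interviewer_var(data):
    """data: (n_interviewers, m_respondents) array of item values."""
    n, m = data.shape
    msb = m * data.mean(axis=1).var(ddof=1)   # between-interviewer mean square
    msw = data.var(axis=1, ddof=1).mean()     # within-interviewer mean square
    return max(0.0, (msb - msw) / m)

def simulate(sigma_int, n=50, m=20):
    effects = rng.normal(0, sigma_int, size=(n, 1))
    return effects + rng.normal(0, 1.0, size=(n, m))

for group, s in [("experienced", 0.2), ("new hires", 0.5)]:
    est = interviewer_var(simulate(s))
    print(f"{group}: true variance {s**2:.3f}, estimated {est:.3f}")
```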

  • Articles and reports: 12-001-X201400214118
    Description:

    Bagging is a powerful computational method used to improve the performance of inefficient estimators. This article is a first exploration of the use of bagging in survey estimation, and we investigate the effects of bagging on non-differentiable survey estimators including sample distribution functions and quantiles, among others. The theoretical properties of bagged survey estimators are investigated under both design-based and model-based regimes. In particular, we show the design consistency of the bagged estimators, and obtain the asymptotic normality of the estimators in the model-based context. The article describes how implementation of bagging for survey estimators can take advantage of replicates developed for survey variance estimation, providing an easy way for practitioners to apply bagging in existing surveys. A major remaining challenge in implementing bagging in the survey context is variance estimation for the bagged estimators themselves, and we explore two possible variance estimation approaches. Simulation experiments reveal the improvement of the proposed bagging estimator relative to the original estimator and compare the two variance estimation approaches.

    Release date: 2014-12-19
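
A minimal sketch of the core idea, bagging a non-differentiable estimator (here the sample median) by averaging it over bootstrap resamples; it ignores the survey design and the replicate-weight machinery the article actually uses.

```python
# Bagged median: average the median over bootstrap resamples of the data.
import numpy as np

rng = np.random.default_rng(0)

def bagged_median(y, n_boot=500):
    y = np.asarray(y)
    boots = rng.choice(y, size=(n_boot, y.size), replace=True)
    return np.median(boots, axis=1).mean()

y = rng.lognormal(mean=0.0, sigma=1.0, size=200)
print("plain median:", np.median(y), "bagged median:", bagged_median(y))
```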

  • Articles and reports: 12-001-X201400114004
    Description:

    In 2009, two major surveys in the Governments Division of the U.S. Census Bureau were redesigned to reduce sample size, save resources, and improve the precision of the estimates (Cheng, Corcoran, Barth and Hogue 2009). The new design divides each of the traditional state-by-government-type strata with sufficiently many units into two sub-strata according to each governmental unit’s total payroll, in order to sample less from the sub-stratum with small units. The model-assisted approach is adopted in estimating population totals. Regression estimators using auxiliary variables are obtained either within each created sub-stratum or within the original stratum by collapsing the two sub-strata. A decision-based method was proposed in Cheng, Slud and Hogue (2010), applying a hypothesis test to decide which regression estimator is used within each original stratum. Consistency and asymptotic normality of these model-assisted estimators are established here, under a design-based or model-assisted asymptotic framework. Our asymptotic results also suggest two types of consistent variance estimators, one obtained by substituting unknown quantities in the asymptotic variances and the other by applying the bootstrap. The performance of all the estimators of totals, and of their variance estimators, is examined in some empirical studies. The U.S. Annual Survey of Public Employment and Payroll (ASPEP) is used to motivate and illustrate our study.

    Release date: 2014-06-27

  • Articles and reports: 82-003-X201301011873
    Description:

    A computer simulation model of physical activity was developed for the Canadian adult population using longitudinal data from the National Population Health Survey and cross-sectional data from the Canadian Community Health Survey. The model is based on the Population Health Model (POHEM) platform developed by Statistics Canada. This article presents an overview of POHEM and describes the additions that were made to create the physical activity module (POHEM-PA). These additions include changes in physical activity over time, and the relationship between physical activity levels and health-adjusted life expectancy, life expectancy and the onset of selected chronic conditions. Estimates from simulation projections are compared with nationally representative survey data to provide an indication of the validity of POHEM-PA.

    Release date: 2013-10-16

  • Articles and reports: 12-001-X201300111826
    Description:

    It is routine practice for survey organizations to provide replication weights as part of survey data files. These replication weights are meant to produce valid and efficient variance estimates for a variety of estimators in a simple and systematic manner. Most existing methods for constructing replication weights, however, are only valid for specific sampling designs and typically require a very large number of replicates. In this paper we first show how to produce replication weights based on the method outlined in Fay (1984) such that the resulting replication variance estimator is algebraically equivalent to the fully efficient linearization variance estimator for any given sampling design. We then propose a novel weight-calibration method to simultaneously achieve efficiency and sparsity in the sense that a small number of sets of replication weights can produce valid and efficient replication variance estimators for key population parameters. Our proposed method can be used in conjunction with existing resampling techniques for large-scale complex surveys. Validity of the proposed methods and extensions to some balanced sampling designs are also discussed. Simulation results showed that our proposed variance estimators perform very well in tracking coverage probabilities of confidence intervals. Our proposed strategies will likely have an impact on how public-use survey data files are produced and how these data sets are analyzed.

    Release date: 2013-06-28
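
To make the replication idea concrete, here is a hedged sketch of Fay-type replicate weights and the corresponding variance estimator; the half-samples are drawn at random for brevity (production BRR balances them with a Hadamard matrix), and this is not the paper's weight-calibration method.

```python
# Fay-type replicate weights: units in the half-sample get w*(2-rho), the
# rest get w*rho; the variance estimator rescales by 1/(R*(1-rho)^2).
import numpy as np

rng = np.random.default_rng(1)

def fay_replicates(w, n_reps=80, rho=0.5):
    half = rng.integers(0, 2, size=(n_reps, w.size)).astype(bool)
    return np.where(half, (2 - rho) * w, rho * w)

def fay_variance(estimates, full_estimate, rho=0.5):
    r = len(estimates)
    return np.sum((estimates - full_estimate) ** 2) / (r * (1 - rho) ** 2)

w = np.ones(1000)                       # illustrative base weights
y = rng.normal(50, 10, size=1000)
reps = fay_replicates(w)
theta_full = np.average(y, weights=w)
theta_reps = np.array([np.average(y, weights=wr) for wr in reps])
print("estimate:", round(theta_full, 2),
      "replication SE:", round(np.sqrt(fay_variance(theta_reps, theta_full)), 3))
```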

  • Articles and reports: 82-003-X201300511792
    Description:

    This article describes implementation of the indoor air component of the 2009 to 2011 Canadian Health Measures Survey and presents information about response rates and results of field quality control samples.

    Release date: 2013-05-15

  • Articles and reports: 12-001-X201200111686
    Description:

    We present a generalized estimating equations approach for estimating the concordance correlation coefficient and the kappa coefficient from sample survey data. The estimates and their accompanying standard errors need to correctly account for the sampling design. Weighted measures of the concordance correlation coefficient and the kappa coefficient, along with the variances of these measures accounting for the sampling design, are presented. We use the Taylor series linearization method and the jackknife procedure for estimating the standard errors of the resulting parameter estimates. Body measurement and oral health data from the Third National Health and Nutrition Examination Survey are used to illustrate this methodology.

    Release date: 2012-06-27
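
A minimal sketch of one ingredient, a survey-weighted concordance correlation coefficient computed from weighted moments; the design-based variance estimation (Taylor linearization, jackknife) described in the abstract is omitted, and the data are simulated.

```python
# Survey-weighted version of Lin's concordance correlation coefficient:
# rho_c = 2*cov / (var_x + var_y + (mean_x - mean_y)^2), with weighted moments.
import numpy as np

def weighted_ccc(x, y, w):
    w = np.asarray(w, float) / np.sum(w)
    mx, my = np.sum(w * x), np.sum(w * y)
    vx = np.sum(w * (x - mx) ** 2)
    vy = np.sum(w * (y - my) ** 2)
    cov = np.sum(w * (x - mx) * (y - my))
    return 2 * cov / (vx + vy + (mx - my) ** 2)

rng = np.random.default_rng(2)
x = rng.normal(100, 15, 500)            # e.g., first measurement
y = x + rng.normal(1, 5, 500)           # e.g., repeat measurement
w = rng.uniform(0.5, 2.0, 500)          # illustrative survey weights
print("weighted CCC:", round(weighted_ccc(x, y, w), 3))
```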

  • Articles and reports: 82-003-X201200111625
    Description:

    This study compares estimates of the prevalence of cigarette smoking based on self-report with estimates based on urinary cotinine concentrations. The data are from the 2007 to 2009 Canadian Health Measures Survey, which included self-reported smoking status and the first nationally representative measures of urinary cotinine.

    Release date: 2012-02-15

  • Articles and reports: 11-010-X201100611501
    Description:

    A detailed exposition of how the pattern of quarterly growth affects the average annual growth rate, including the relative importance of each quarter in determining growth. These basic principles are applied to monthly and quarterly growth.

    Release date: 2011-06-16
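
The arithmetic is easy to verify: in the sketch below, two invented series have the same fourth-quarter-over-fourth-quarter growth, yet their average annual growth rates (computed from sums of quarterly levels) differ because the growth arrives at different points in the year.

```python
# Annual average growth = (sum of this year's quarterly levels) /
# (sum of last year's quarterly levels) - 1. Timing matters.
def annual_growth(prev_year, this_year):
    return sum(this_year) / sum(prev_year) - 1

base  = [100, 100, 100, 100]
early = [104, 104, 104, 104]   # all growth arrives in Q1
late  = [100, 100, 100, 104]   # all growth arrives in Q4
print("growth early in year:", round(annual_growth(base, early), 4))  # 0.04
print("growth late in year: ", round(annual_growth(base, late), 4))   # 0.01
```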

  • Articles and reports: 12-001-X201000211382
    Description:

    The size of the cell-phone-only population in the USA has increased rapidly in recent years and, correspondingly, researchers have begun to experiment with sampling and interviewing of cell-phone subscribers. We discuss statistical issues involved in the sampling design and estimation phases of cell-phone studies. This work is presented primarily in the context of a nonoverlapping dual-frame survey in which one frame and sample are employed for the landline population and a second frame and sample are employed for the cell-phone-only population. Additional considerations necessary for overlapping dual-frame surveys (where the cell-phone frame and sample include some of the landline population) are also discussed. We illustrate the methods using the design of the National Immunization Survey (NIS), which monitors the vaccination rates of children age 19-35 months and teens age 13-17 years. The NIS is a nationwide telephone survey, followed by a provider record check, conducted by the Centers for Disease Control and Prevention.

    Release date: 2010-12-21

  • Articles and reports: 12-001-X201000211383
    Description:

    Data collection for poverty assessments in Africa is time consuming, expensive and can be subject to numerous constraints. In this paper we present a procedure to collect data from poor households involved in small-scale inland fisheries as well as agricultural activities. A sampling scheme was developed that captures the heterogeneity in ecological conditions and the seasonality of livelihood options. Sampling includes a three-point panel survey of 300 households. The respondents belong to four different ethnic groups randomly chosen from three strata, each representing a different ecological zone. The first part of the paper gives some background information on the objectives of the research, the study site and the survey design that guided the data collection process. The second part discusses the typical constraints that hamper empirical work in Sub-Saharan Africa and shows how the various challenges were resolved. These lessons could guide researchers in designing appropriate socio-economic surveys in comparable settings.

    Release date: 2010-12-21

  • Articles and reports: 11-010-X201001111370
    Description:

    A look at how these different measures relate to each other, when they should be used and why statistical agencies have developed more sophisticated measures of volume data.

    Release date: 2010-11-12

  • Articles and reports: 11-010-X201000311141
    Description:

    A review of what seasonal adjustment does, and how it helps analysts focus on recent movements in the underlying trend of economic data.

    Release date: 2010-03-18

  • Articles and reports: 12-001-X200900211046
    Description:

    A semiparametric regression model is developed for complex surveys. In this model, the explanatory variables are represented separately as a nonparametric part and a parametric linear part. The estimation techniques combine nonparametric local polynomial regression estimation and least squares estimation. Asymptotic results, such as the consistency and normality of the estimators of the regression coefficients and the regression functions, are also developed. The performance of the methods and the properties of the estimates are demonstrated through simulations and empirical examples based on the 1990 Ontario Health Survey.

    Release date: 2009-12-23
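
A minimal sketch of the nonparametric ingredient, a local linear (degree-1 local polynomial) smoother with a Gaussian kernel and optional survey weights; the partially linear model and the design-based asymptotics are beyond this illustration.

```python
# Local linear regression: at each grid point x0, fit a weighted least
# squares line with kernel weights centred at x0; the intercept is the fit.
import numpy as np

def local_linear(x, y, grid, h, w=None):
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    fits = []
    for x0 in grid:
        k = w * np.exp(-0.5 * ((x - x0) / h) ** 2)   # kernel * survey weight
        X = np.column_stack([np.ones_like(x), x - x0])
        beta = np.linalg.lstsq(X * np.sqrt(k)[:, None],
                               y * np.sqrt(k), rcond=None)[0]
        fits.append(beta[0])                          # intercept = fit at x0
    return np.array(fits)

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 300)
grid = np.linspace(0.05, 0.95, 5)
print(local_linear(x, y, grid, h=0.1).round(2))
```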

  • Articles and reports: 82-003-X200900110795
    Description:

    This article presents methods of combining cycles of the Canadian Community Health Survey and discusses issues to consider if these data are to be combined.

    Release date: 2009-02-18

  • Articles and reports: 12-001-X200800210756
    Description:

    In longitudinal surveys nonresponse often occurs in a pattern that is not monotone. We consider estimation of time-dependent means under the assumption that the nonresponse mechanism is last-value-dependent. Since the last value itself may be missing when nonresponse is nonmonotone, the nonresponse mechanism under consideration is nonignorable. We propose an imputation method by first deriving some regression imputation models according to the nonresponse mechanism and then applying nonparametric regression imputation. We assume that the longitudinal data follow a Markov chain with finite second-order moments. No other assumption is imposed on the joint distribution of longitudinal data and their nonresponse indicators. A bootstrap method is applied for variance estimation. Some simulation results and an example concerning the Current Employment Survey are presented.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800110613
    Description:

    The International Tobacco Control (ITC) Policy Evaluation Survey of China uses a multi-stage unequal probability sampling design with upper level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper level clusters refuse to participate and have to be replaced by substitute units, selected from units not included in the initial sample, once again using the randomized systematic PPS sampling method. Under such a scenario the first order inclusion probabilities of the final selected units are very difficult to calculate and the second order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first and second order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and code for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.

    Release date: 2008-06-26
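
The simulation idea can be illustrated in a few lines: implement randomized systematic PPS sampling, repeat it many times, and estimate each unit's first-order inclusion probability as its selection frequency (second-order probabilities follow the same way from joint selection frequencies). The sizes below are invented, and the original used R/S-PLUS; this is a Python transcription of the general idea, not the paper's code.

```python
# Randomized systematic PPS: permute units, take systematic draws on the
# cumulative size scale, then estimate pi_i by Monte Carlo.
import numpy as np

rng = np.random.default_rng(4)

def randomized_systematic_pps(sizes, n):
    order = rng.permutation(sizes.size)   # the "randomized" part
    cum = np.cumsum(sizes[order])
    step = cum[-1] / n
    points = rng.uniform(0, step) + step * np.arange(n)
    return order[np.searchsorted(cum, points, side="right")]

# Sizes chosen so no unit exceeds the sampling step (no certainty units).
sizes = np.array([5., 10., 20., 25., 40., 50., 60., 90.])
n, reps = 3, 100_000
hits = np.zeros(sizes.size)
for _ in range(reps):
    hits[randomized_systematic_pps(sizes, n)] += 1
print("simulated pi:   ", (hits / reps).round(3))
print("theoretical n*p:", (n * sizes / sizes.sum()).round(3))
```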

  • Articles and reports: 12-001-X200800110619
    Description:

    Small area prediction based on random effects, called EBLUP, is a procedure for constructing estimates for small geographical areas or small subpopulations using existing survey data. The total of the small area predictors is often forced to equal the direct survey estimate, and such predictors are said to be calibrated. Several calibrated predictors are reviewed, and a criterion that unifies the derivation of these calibrated predictors is presented. The predictor that is the unique best linear unbiased predictor under the criterion is derived, and the mean square error of the calibrated predictors is discussed. Implicit in the imposition of the restriction is the possibility that the small area model is misspecified and the predictors are biased. Augmented models with one additional explanatory variable, for which the usual small area predictors achieve the self-calibrated property, are considered. Simulations demonstrate that the calibrated predictors have slightly smaller bias than the usual EBLUP predictor. However, if the bias is a concern, a better approach is to use an augmented model with an added auxiliary variable that is a function of area size. In the simulations, the predictors based on the augmented model had smaller MSE than EBLUP when the incorrect model was used for prediction. Furthermore, there was a very small increase in MSE relative to EBLUP if the auxiliary variable was added to the correct model.

    Release date: 2008-06-26
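
The calibration constraint itself is simple to state: force the small area predictors to sum to the direct survey estimate of the total. The sketch below shows a plain ratio adjustment with invented numbers; the paper's optimal calibrated predictors and augmented models are not reproduced.

```python
# Ratio benchmarking: scale area-level predictors so their total matches
# the design-based direct estimate.
import numpy as np

def ratio_benchmark(eblup, direct_total):
    eblup = np.asarray(eblup, float)
    return eblup * (direct_total / eblup.sum())

eblup = np.array([12.1, 45.3, 30.8, 8.4])   # illustrative area predictors
direct_total = 100.0                         # direct survey estimate of total
adj = ratio_benchmark(eblup, direct_total)
print(adj.round(2), "sum =", adj.sum())
```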

  • Articles and reports: 82-003-S200700010361
    Description:

    This article summarizes the background, history and rationale for the Canadian Health Measures Survey, and provides an overview of the objectives, methods and analysis plans.

    Release date: 2007-12-05

Reference (58) (25 of 58 results)

  • Technical products: 11-522-X201700014708
    Description:

    Statistics Canada’s Household Survey Frames (HSF) Programme provides various universe files that can be used alone or in combination to improve survey design, sampling, collection, and processing in the traditional “need to contact a household” model. Even as surveys migrate onto this core suite of products, the HSF is starting to plan the changes to infrastructure, organisation, and linkages with other data assets at Statistics Canada that will help enable a shift to increased use of a wide variety of administrative data as input to the social statistics programme. The presentation will provide an overview of the HSF Programme and the foundational concepts that will need to be implemented to expand linkage potential, and will identify strategic research being undertaken toward 2021.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014741
    Description:

    Statistics Canada’s mandate includes producing statistical data to shed light on current business issues. The linking of business records is an important aspect of the development, production, evaluation and analysis of these statistical data. As record linkage can intrude on one’s privacy, Statistics Canada uses it only when the public good is clear and outweighs the intrusion. Record linkage is experiencing a revival, triggered by a greater use of administrative data in many statistical programs. Business record linkage poses many challenges: for example, many administrative files do not have common identifiers, information is recorded in non-standardized formats, information contains typographical errors, administrative data files are usually large, and the evaluation of multiple record pairings makes absolute comparison impractical and sometimes impossible. Because of the importance of record linkage and the challenges associated with it, Statistics Canada has been developing a record linkage standard to help users optimize their business record linkage process. This process includes, for example, building on a record linkage blocking strategy that reduces the number of record pairs to compare and match, making use of Statistics Canada’s internal software to conduct deterministic and probabilistic matching, and creating standard business name and address fields on Statistics Canada’s Business Register. This article gives an overview of the business record linkage methodology and looks at various economic projects that use record linkage at Statistics Canada, including projects in the National Accounts, International Trade, Agriculture and the Business Register.

    Release date: 2016-03-24
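
A minimal sketch of the blocking strategy mentioned in the abstract: compare record pairs only within blocks sharing a cheap key, then apply a deterministic match rule. The records, blocking key and match rule below are invented; this is not Statistics Canada's internal software or standard.

```python
# Blocking for record linkage: index one file by a blocking key (here a
# postal code), then compare pairs only within matching blocks.
from collections import defaultdict

file_a = [{"id": 1, "name": "ACME LTD", "postal": "K1A0B1"},
          {"id": 2, "name": "BETA INC", "postal": "M5V2T6"}]
file_b = [{"id": 7, "name": "ACME LTD.", "postal": "K1A0B1"},
          {"id": 9, "name": "GAMMA CO", "postal": "V6B1A1"}]

def normalize(name):
    # Toy standardization: uppercase, keep alphanumerics only.
    return "".join(ch for ch in name.upper() if ch.isalnum())

blocks = defaultdict(list)
for rec in file_b:
    blocks[rec["postal"]].append(rec)

links = []
for a in file_a:
    for b in blocks.get(a["postal"], []):          # within-block comparisons
        if normalize(a["name"]) == normalize(b["name"]):  # deterministic rule
            links.append((a["id"], b["id"]))
print(links)   # [(1, 7)]
```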

  • Technical products: 11-522-X201700014748
    Description:

    This paper describes the creation of a database developed in Switzerland to analyze migration and the structural integration of the foreign national population. The database is created from various registers (register of residents, social insurance, unemployment) and surveys, and covers 15 years (1998 to 2013). Information on migration status and socioeconomic characteristics is also available for nearly 4 million foreign nationals who lived in Switzerland between 1998 and 2013. This database is the result of a collaboration between the Federal Statistics Office and researchers from the National Center of Competence in Research (NCCR)–On the Move.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014744
    Description:

    "This presentation will begin with Dr. West providing a summary of research that has been conducted on the quality and utility of paradata collected as part of the United States National Survey of Family Growth (NSFG). The NSFG is the major national fertility survey in the U.S., and an important source of data on sexual activity, sexual behavior, and reproductive health for policy makers. For many years, the NSFG has been collecting various forms of paradata, including keystroke information (e.g., Couper and Kreuter 2013), call record information, detailed case disposition information, and interviewer observations related to key NSFG measures (e.g., West 2013). Dr. West will discuss some of the challenges of working with these data, in addition to evidence of their utility for nonresponse adjustment, interviewer evaluation, and/or responsive survey design purposes. Dr. Kreuter will then present research done using paradata collected as part of two panel surveys: the Medical Expenditure Panel Survey (MEPS) in the United States, and the Panel Labour Market and Social Security (PASS) in Germany. In both surveys, information from contacts in prior waves were experimentally used to improve contact and response rates in subsequent waves. In addition, research from PASS will be presented where interviewer observations on key outcome variables were collected to be used in nonresponse adjustment or responsive survey design decisions. Dr. Kreuter will not only present the research results but also the practical challenges in implementing the collection and use of both sets of paradata. "

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014733
    Description:

    The social value of data collections is dramatically enhanced by the broad dissemination of research files and the resulting increase in scientific productivity. Currently, most studies are designed with a focus on collecting information that is analytically useful and accurate, with little forethought as to how it will be shared. Both literature and practice also presume that disclosure analysis will take place after data collection. But to produce public-use data of the highest analytical utility for the largest user group, disclosure risk must be considered at the beginning of the research process. Drawing upon economic and statistical decision-theoretic frameworks and survey methodology research, this study seeks to enhance the scientific productivity of shared research data by describing how disclosure risk can be addressed in the earliest stages of research through the formulation of "safe designs" and "disclosure simulations". An applied statistical approach is taken to: (1) developing and validating models that predict the composition of survey data under different sampling designs; (2) selecting and/or developing measures and methods for assessing disclosure risk, analytical utility, and disclosure survey costs that are best suited to evaluating sampling and database designs; and (3) conducting simulations to gather estimates of risk, utility, and cost for studies with a wide range of sampling and database design characteristics.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014731
    Description:

    Our study describes various factors that are of concern when evaluating disclosure risk of contextualized microdata and some of the empirical steps that are involved in their assessment. Utilizing synthetic sets of survey respondents, we illustrate how different postulates shape the assessment of risk when considering: (1) estimated probabilities that unidentified geographic areas are represented within a survey; (2) the number of people in the population who share the same personal and contextual identifiers as a respondent; and (3) the anticipated amount of coverage error in census population counts and extant files that provide identifying information (like names and addresses).

    Release date: 2016-03-24
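
Point (2) in the abstract can be illustrated with a toy computation: count how many population records share a respondent's combination of personal and contextual identifiers, with small counts flagging higher re-identification risk. All data below are invented.

```python
# k-anonymity-style risk screen: for each respondent, count population
# records sharing the same (sex, age, region) identifier combination.
from collections import Counter

population = [("F", 34, "region-A"), ("F", 34, "region-A"),
              ("M", 51, "region-B"), ("F", 34, "region-B")]
respondents = [("F", 34, "region-A"), ("M", 51, "region-B")]

counts = Counter(population)
for key in respondents:
    k = counts[key]
    print(key, "-> population matches:", k, "(population unique!)" if k == 1 else "")
```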

  • Technical products: 11-522-X201300014291
    Description:

    Occupational coding in Germany is mostly done using dictionary approaches, with subsequent manual revision of cases that could not be coded. Since manual coding is expensive, it is desirable to assign a higher number of codes automatically. At the same time, the quality of the automatic coding must at least match that of the manual coding. As a possible solution, we employ different machine learning algorithms for the task, using a substantial number of manually coded occupations available from recent studies as training data. We assess the feasibility of these methods by evaluating the performance and quality of the algorithms.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014278
    Description:

    In January and February 2014, Statistics Canada conducted a test aimed at measuring the effectiveness of different collection strategies for an online self-reporting survey. Sampled units were contacted using mailed introductory letters and asked to complete the online survey without any interviewer contact. The objectives of this test were to measure the take-up rates for completing an online survey and to profile respondents and non-respondents. Different samples and letters were tested to determine the relative effectiveness of the different approaches. The results of this project will be used to inform various social surveys that are preparing to include an internet response option. The paper presents the general methodology of the test as well as the results observed from collection and the analysis of profiles.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014258
    Description:

    The National Fuel Consumption Survey (FCS) was created in 2013. It is a quarterly survey designed to analyze distance driven and fuel consumption for passenger cars and other vehicles weighing less than 4,500 kilograms. The sampling frame consists of vehicles extracted from the vehicle registration files maintained by provincial ministries. For a portion of the sampled units, the FCS uses car chips to collect information about trips and fuel consumed. There are numerous advantages to this new technology: for example, reductions in response burden and collection costs, and positive effects on data quality. For the 2013 quarters, 95% of sampled units were surveyed via paper questionnaires and 5% with car chips; in Q1 2014, 40% of sampled units were surveyed with car chips. This study outlines the methodology of the survey process, examines the advantages and challenges in processing and imputation for the two collection modes, presents some initial results and concludes with a summary of the lessons learned.

    Release date: 2014-10-31

  • Technical products: 11-522-X200800010941
    Description:

    Prior to 2004, the design and development of collection functions at Statistics New Zealand (Statistics NZ) was done by a centralised team of data collection methodologists. In 2004, an organisational review considered whether the design and development of these functions was being done in the most effective way. A key issue was the rising costs of surveying as the organisation moved from paper-based data collection to electronic data collection. The review saw some collection functions decentralised. However, a smaller centralised team of data collection methodologists was retained to work with subject matter areas across Statistics NZ.

    This paper will discuss the strategy used by the smaller centralised team of data collection methodologists to support subject matter areas. The strategy has three key themes. The first is the development of best practice standards and a central standards repository. The second is training and the introduction of knowledge sharing forums. The third is the provision of advice and independent review to subject matter areas that design and develop collection instruments.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011010
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) is a monthly survey using two sources of data: a census of payroll deduction (PD7) forms (administrative data) and a survey of business establishments. This paper focuses on the processing of the administrative data, from the weekly receipt of data from the Canada Revenue Agency to the production of monthly estimates produced by SEPH.

    The edit and imputation methods used to process the administrative data have been revised over the last several years. The goals of this redesign were primarily to improve data quality and to increase consistency with another administrative data source (T4), which is a benchmark measure for Statistics Canada's System of National Accounts. An additional goal was to ensure that the new process would be easier to understand and to modify if needed. As a result, a new processing module was developed to edit and impute PD7 forms before their data are aggregated to the monthly level.

    This paper presents an overview of both the current and new processes, including a description of challenges that we faced during development. Improved quality is demonstrated both conceptually (by presenting examples of PD7 forms and their treatment under the old and new systems) and quantitatively (by comparison to T4 data).
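
    As a generic illustration of the kind of edit-and-imputation step described (a schematic, not the actual SEPH rules), a missing or edit-failing payroll value on a monthly record can be imputed from the unit's previous report, scaled by the trend observed among clean records:

        import pandas as pd

        def ratio_impute(df):
            """df: one row per establishment-month with columns 'payroll'
            (NaN where missing or edit-failed) and 'payroll_prev'."""
            clean = df['payroll'].notna() & (df['payroll'] > 0)
            # Month-over-month trend estimated from clean reports only.
            ratio = (df.loc[clean, 'payroll'].sum() /
                     df.loc[clean, 'payroll_prev'].sum())
            df.loc[~clean, 'payroll'] = df.loc[~clean, 'payroll_prev'] * ratio
            return df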

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010946
    Description:

    In the mid 1990s the first question testing unit was set-up in the UK Office for National Statistics (ONS). The key objective of the unit was to develop and test the questions and questionnaire for the 2001 Census. Since the establishment of this unit the area has been expanded into a Data Collection Methodology (DCM) Centre of Expertise which now sits in the Methodology Directorate. The DCM centre has three branches which support DCM work for social surveys, business surveys, the Census and external organisations.

    In the past ten years, DCM has achieved a great deal. For example, it has introduced survey methodology involvement in the development and testing of business survey questions and questionnaires; introduced a mixed-method approach to the development of questions and questionnaires; developed and implemented standards (e.g., for the 2011 Census questionnaire and showcards); and developed and delivered DCM training events.

    This paper will provide an overview of data collection methodology at the ONS from the perspective of achievements and challenges. It will cover areas such as methods, staff (e.g. recruitment, development and field security), and integration with the survey process.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010970
    Description:

    RTI International is currently conducting a longitudinal education study. One component of the study involved collecting transcripts and course catalogs from high schools that the sample members attended; information from the transcripts and course catalogs then needed to be keyed and coded. This presented a challenge because the transcripts and course catalogs were collected from different types of schools across the nation, including public, private and religious schools, and varied widely in both content and format. The challenge called for a sophisticated system that could be used by multiple users simultaneously. RTI developed such a system: a web-based, multi-user, multitask, user-friendly and low-maintenance application for keying and coding high school transcripts and course catalogs. The system has three major functions: transcript and catalog keying and coding; keying and coding quality control at the keyer-coder end; and coding quality control at the management end. Given the complex nature of the task, the system was designed to be flexible: it transports keyed and coded data throughout the system to reduce keying time, logically guides users through all the pages that a given activity requires, displays appropriate information to support keying performance, and tracks all keying, coding and quality control activities. Hundreds of catalogs and thousands of transcripts were successfully keyed, coded and verified using the system. This paper reports on the system needs and design, implementation tips, problems faced and their solutions, and lessons learned.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011006
    Description:

    The Office for National Statistics (ONS) has an obligation to measure and annually report on the burden that it places on businesses participating in its surveys. There are also targets for reduction of costs to businesses complying with government regulation as part of the 2005 Administrative Burdens Reduction Project (ABRP) coordinated by the Better Regulation Executive (BRE).

    Respondent burden is measured by looking at the economic costs to businesses. Over time the methodology for measuring this economic cost has changed with the most recent method being the development and piloting of a Standard Cost Model (SCM) approach.

    The SCM is commonly used in Europe and focuses on measuring objective administrative burdens for all government requests for information (e.g., tax returns and VAT) as well as survey participation. The method was therefore not developed specifically to measure statistical response burden. The SCM methodology is activity-based, meaning that the costs and time taken to fulfil requirements are broken down by activity.
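
    Schematically, the activity-based SCM cost is a price-times-quantity calculation summed over activities; a minimal sketch of the standard formulation (the activity set and figures are whatever a given measurement exercise defines):

        \text{Cost} \;=\; \sum_{a} P_a \, Q_a,
        \qquad P_a = \text{tariff}_a \times \text{time}_a,
        \qquad Q_a = N_a \times f_a

    where tariff_a is the hourly wage rate for activity a, time_a the hours needed per occurrence, N_a the number of businesses affected and f_a the annual frequency.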

    The SCM approach generally collects data using face-to-face interviews. The approach is therefore labour intensive both from a collection and analysis perspective but provides in depth information. The approach developed and piloted at ONS uses paper self-completion questionnaires.

    The objective of this paper is to provide an overview of respondent burden reporting and targets; and to review the different methodologies that ONS has used to measure respondent burden from the perspectives of sampling, data collection, analysis and usability.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010994
    Description:

    The growing difficulty of reaching respondents has a general impact on non-response in telephone surveys, especially those that use random digit dialling (RDD), such as the General Social Survey (GSS). The GSS is an annual multipurpose survey with 25,000 respondents. Its aim is to monitor the characteristics of and major changes in Canada's social structure. GSS Cycle 21 (2007) was about the family, social support and retirement. Its target population consisted of persons aged 45 and over living in the 10 Canadian provinces. For more effective coverage, part of the sample was taken from a follow-up with the respondents of GSS Cycle 20 (2006), which was on family transitions. The remainder was a new RDD sample. In this paper, we describe the survey's sampling plan and the random digit dialling method used. Then we discuss the challenges of calculating the non-response rate in an RDD survey that targets a subset of a population, for which the in-scope population must be estimated or modelled. This is done primarily through the use of paradata. The methodology used in GSS Cycle 21 is presented in detail.
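
    One common way to handle the unknown-eligibility problem raised here is to estimate the eligible share e of unresolved cases (e.g., from paradata) and fold it into the response-rate denominator, in the spirit of AAPOR's estimated-eligibility response rates. A minimal sketch with illustrative figures (not necessarily the GSS formula):

        def response_rate(interviews, eligible_nonresponse, unknown, e):
            """e: estimated proportion of unknown-eligibility cases in scope."""
            return interviews / (interviews + eligible_nonresponse + e * unknown)

        # e.g., 12,000 interviews, 8,000 eligible nonrespondents, and 20,000
        # unresolved numbers of which an estimated 30% are in-scope households:
        rr = response_rate(12000, 8000, 20000, e=0.30)   # ~= 0.46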

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010999
    Description:

    The choice of the number of call attempts in a telephone survey is an important decision. A large number of call attempts makes data collection costly and time-consuming, while a small number of attempts shrinks the respondent set from which conclusions are drawn and increases the variance. The decision can also affect the nonresponse bias. In this paper, we study the effects of the number of call attempts on the nonresponse rate and the nonresponse bias in two surveys conducted by Statistics Sweden: the Labour Force Survey (LFS) and Household Finances (HF).

    Using paradata, we calculate the response rate as a function of the number of call attempts. To estimate the nonresponse bias, we use register variables for which observations are available for both respondents and nonrespondents. We also calculate estimates of some real survey parameters as functions of the number of call attempts. The results indicate that it is possible to reduce the current number of call attempts without increasing the nonresponse bias.
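
    A minimal sketch of the truncation analysis described, assuming case-level paradata with the attempt number at which each interview was obtained and a register variable observed for the full sample (column names are hypothetical):

        import pandas as pd

        # df: one row per sampled case, with
        #   'attempts'   - attempt on which the interview was obtained (NaN if never)
        #   'register_y' - register variable known for respondents and nonrespondents
        def attempt_curves(df, max_attempts=10):
            benchmark = df['register_y'].mean()        # full-sample (register) mean
            rows = []
            for k in range(1, max_attempts + 1):
                resp = df[df['attempts'] <= k]         # respondents if stopped at k
                rows.append({'k': k,
                             'response_rate': len(resp) / len(df),
                             'bias': resp['register_y'].mean() - benchmark})
            return pd.DataFrame(rows)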

    Release date: 2009-12-03

  • Technical products: 11-536-X200900110806
    Description:

    Recent work using a pseudo empirical likelihood (EL) method for finite population inferences with complex survey data has focused primarily on a single survey sample, non-stratified or stratified, with considerable effort devoted to computational procedures. In this talk we present a pseudo empirical likelihood approach to inference from multiple surveys and multiple-frame surveys, two commonly encountered problems in survey practice. We show that inferences about the common parameter of interest, and the effective use of various types of auxiliary information, can be conveniently carried out through the constrained maximization of the joint pseudo EL function. We obtain asymptotic results that are used to construct pseudo EL ratio confidence intervals, using either a chi-square approximation or a bootstrap calibration. All related computational problems can be handled using existing algorithms for stratified sampling after suitable reformulation.
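
    For orientation, the single-survey building block (in the spirit of Wu and Rao's pseudo EL) maximizes a design-weighted log-likelihood subject to calibration constraints; the multiple-survey case sums such functions, with the common parameter entering through the constraints. Schematically:

        l_{ns}(\mathbf{p}) = n \sum_{i \in s} \tilde{d}_i \log p_i
        \quad \text{maximized subject to} \quad
        p_i > 0, \quad \sum_{i \in s} p_i = 1, \quad
        \sum_{i \in s} p_i \, g(y_i, \theta) = 0,

    where the \tilde{d}_i are normalized design weights and g encodes the parameter of interest and any auxiliary information.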

    Release date: 2009-08-11

  • Technical products: 11-522-X200600110424
    Description:

    The International Tobacco Control (ITC) Policy Evaluation China Survey uses a multi-stage unequal-probability sampling design, with upper-level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper-level clusters refuse to participate and must be replaced by substitute units, selected from units not included in the initial sample, once again using the randomized systematic PPS sampling method. Under such a scenario, the first-order inclusion probabilities of the final selected units are very difficult to calculate and the second-order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first- and second-order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and code for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios that may be encountered in practice.
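
    The paper's code is in R/S-PLUS; the core idea can be sketched in Python as follows, with a plain PPS-without-replacement draw standing in for the randomized systematic PPS method (an assumption of this sketch). Second-order probabilities can be estimated the same way by counting pairs:

        import numpy as np

        def mc_inclusion_probs(sizes, n, refuse_prob, reps=10000, seed=0):
            """Monte Carlo first-order inclusion probabilities under PPS
            sampling with refusal and PPS substitution from unsampled units."""
            rng = np.random.default_rng(seed)
            p = np.asarray(sizes, float) / np.sum(sizes)
            N = len(p)
            counts = np.zeros(N)
            for _ in range(reps):
                initial = rng.choice(N, size=n, replace=False, p=p)
                final, excluded = [], set(initial)
                for u in initial:
                    if rng.random() < refuse_prob[u]:          # unit refuses
                        pool = np.array([v for v in range(N)
                                         if v not in excluded])
                        w = p[pool] / p[pool].sum()
                        sub = rng.choice(pool, p=w)            # PPS substitute
                        final.append(sub)
                        excluded.add(sub)
                    else:
                        final.append(u)
                counts[np.array(final)] += 1
            return counts / reps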

    Release date: 2008-06-26

  • Technical products: 11-522-X200600110418
    Description:

    The current use of multilevel models to examine the effects of surrounding contexts on health outcomes attests to their value as a statistical method for analyzing grouped data. But the use of multilevel modeling with data from population-based surveys is often limited by the small number of cases per level-2 unit, prompting a recent trend in the neighborhood literature to apply cluster analysis techniques to address the problem of data sparseness. In this paper we use Monte Carlo simulations to investigate the effects of marginal group sizes and cluster analysis techniques on the validity of parameter estimates in both linear and non-linear multilevel models.
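
    A minimal sketch of one Monte Carlo replicate of the kind described, assuming a linear random-intercept model with deliberately small level-2 group sizes:

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(42)
        n_groups, group_size = 200, 3          # sparse: few cases per level-2 unit
        g = np.repeat(np.arange(n_groups), group_size)
        u = rng.normal(0, 1, n_groups)         # level-2 random intercepts
        x = rng.normal(size=g.size)
        y = 1.0 + 0.5 * x + u[g] + rng.normal(size=g.size)

        df = pd.DataFrame({'y': y, 'x': x, 'g': g})
        m = smf.mixedlm('y ~ x', df, groups=df['g']).fit()
        print(m.params['x'], m.cov_re)         # slope and random-intercept variance

    Repeating this over many replicates, and over different group sizes or clustered groupings, gives the bias and coverage summaries the simulation study examines.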

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110402
    Description:

    This paper explains how to append census area-level summary data to survey or administrative data. It uses examples from survey datasets present in Statistics Canada Research Data Centres, but the methods also apply to external datasets, including administrative datasets. Four examples illustrate common situations faced by researchers: (1) when the survey (or administrative) and census data both contain the same level of geographic identifiers, coded to the same year standard ("vintage") of census geography (for example, if both have 2001 DA); (2) when the two files contain geographic identifiers of the same vintage, but at different levels of census geography (for example, 1996 EA in the survey, but 1996 CT in the census data); (3) when the two files contain data coded to different vintages of census geography (such as 1996 EA for the survey, but 2001 DA for the census); (4) when the survey data are lacking in geographic identifiers, and those identifiers must first be generated from postal codes present on the file. The examples are shown using SAS syntax, but the principles apply to other programming languages or statistical packages.
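
    The paper's examples use SAS syntax; the same append for case (1), where both files carry 2001 DA identifiers, looks like this in Python/pandas (file and column names are hypothetical):

        import pandas as pd

        survey = pd.read_csv('survey.csv', dtype={'da2001': str})
        census = pd.read_csv('da2001_summaries.csv', dtype={'da2001': str})
        # Left join keeps every survey record; validate='m:1' guards against
        # duplicate DA rows on the census side.
        linked = survey.merge(census, on='da2001', how='left', validate='m:1')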

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110435
    Description:

    In 1999, the first nationally representative survey of the mental health of children and young people aged 5 to 15 was carried out in Great Britain. A second survey was carried out in 2004. The aim of these surveys was threefold: to estimate the prevalence of mental disorders among young people, to examine their use of health, social and educational services, and to investigate risk factors associated with mental disorders. The achieved numbers of interviews were 10,500 and 8,000 respectively. Key decisions had to be made on a large number of methodological issues, and the factors taken into account in reaching those decisions are discussed.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110417
    Description:

    The coefficients of regression equations are often parameters of interest for health surveys and such surveys are usually of complex design with differential sampling rates. We give estimators for the regression coefficients for complex surveys that are superior to ordinary expansion estimators under the subject matter model, but also retain desirable design properties. Theoretical and Monte Carlo properties are presented.
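
    For reference, the ordinary expansion (design-weighted) estimator that serves as the baseline here is

        \hat{\beta}_w = \Big( \sum_{i \in s} w_i \, x_i x_i^{\top} \Big)^{-1}
                        \sum_{i \in s} w_i \, x_i y_i,

    with w_i the survey weights; the paper's estimators improve on this under the subject-matter model while retaining desirable design properties.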

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110451
    Description:

    Household response rates have steadily declined across many large-scale social surveys. The Health Survey for England has observed a nine-percentage-point decline in response over an eleven-year period. Evidence from other studies has suggested that unconditional gifts or incentives of small monetary value can improve rates of co-operation. An incentive experiment conducted on the Health Survey for England aimed to replicate the findings of a previous experiment on the Family Resources Survey, which showed significant increases in household response among those who received a book of first-class stamps with the advance letter. The HSE incentive experiment, however, did not show any significant differences in household response rates, response to other stages of the survey, or respondent profile between the two experimental conditions (stamps included with the advance letter; a bookmark sent with the advance letter) and the control group (the advance letter alone).

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110400
    Description:

    Estimates of the attributable number of deaths (AD) from all causes can be obtained by first estimating the population attributable risk (AR) adjusted for confounding covariates, and then multiplying the AR by the number of deaths, determined from vital mortality statistics, that occurred in a specific time period. Proportional hazards regression estimates of adjusted relative hazards, obtained from mortality follow-up data from a cohort or a survey, are combined with a joint distribution of the risk factor and confounding covariates to compute an adjusted AR. Two estimators of adjusted AR are examined, which differ according to the reference population from which the joint distribution of the risk factor and confounders is obtained. The two types of reference populations considered are: (i) the population represented by the baseline cohort and (ii) a population external to the cohort. Methods based on influence function theory are applied to obtain expressions for estimating the variance of the AD estimator. These variance estimators can be applied to data ranging from simple random samples to weighted multi-stage stratified cluster samples from national household surveys. The variance estimation of AD is illustrated in an analysis of excess deaths due to a non-ideal body mass index, using data from the second National Health and Nutrition Examination Survey (NHANES) Mortality Study and the 1999-2002 NHANES. These methods can also be used to estimate the attributable number of cause-specific deaths or incident cases of a disease, and their standard errors, when the time period for the accrual of deaths or cases is short.
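
    In its simplest (unadjusted) form, the calculation chains Levin's attributable risk to the death count; the paper's adjusted version replaces the crude relative risks with covariate-adjusted relative hazards. Schematically:

        AR = \frac{\sum_j p_j (RR_j - 1)}{1 + \sum_j p_j (RR_j - 1)},
        \qquad AD = AR \times D,

    where p_j is the population prevalence of risk-factor level j, RR_j its relative risk (or relative hazard), and D the number of deaths in the period from vital statistics.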

    Release date: 2008-03-17

  • Technical products: 11-522-X20050019462
    Description:

    The traditional approach to presenting variance information to data users is to publish estimates of variance or related statistics, such as standard errors, coefficients of variation, confidence limits or simple grading systems. The paper examines potential sources of variance, such as sample design, sample allocation, sample selection and non-response, and considers what might best be done to reduce variance. Finally, the paper briefly assesses the financial costs to producers and users of reducing, or not reducing, variance, and how the costs of producing more accurate statistics might be traded off against the financial benefits of greater accuracy.

    Release date: 2007-03-02
