Statistics by subject – Statistical methods

All (169) (25 of 169 results)

  • Articles and reports: 12-001-X201700254896
    Description:

    This note by Sharon L. Lohr presents a discussion of the paper “Sample survey theory and methods: Past, present, and future directions,” in which J.N.K. Rao and Wayne A. Fuller share their views on the developments in sample survey theory and methods over the past 100 years.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201700114822
    Description:

    We use a Bayesian method to make inferences about a finite population proportion when binary data are collected from small areas using a two-fold sample design, which places a two-stage cluster sample design within each area. An earlier hierarchical Bayesian model assumes that, for each area, the first-stage binary responses follow independent Bernoulli distributions, and that the success probabilities have beta distributions parameterized by a mean and a correlation coefficient. The means vary with the areas, but the correlation is the same over areas. To gain some flexibility, we have now extended this model to accommodate different correlations; the means and the correlations have independent beta distributions. We call the earlier model a homogeneous model and the new model a heterogeneous model. All hyperparameters have proper noninformative priors. An additional complexity is that some of the parameters are weakly identified, making it difficult to use a standard Gibbs sampler for computation. We therefore use unimodal constraints for the beta prior distributions and a blocked Gibbs sampler to perform the computation. We compare the heterogeneous and homogeneous models using an illustrative example and a simulation study. As expected, the two-fold model with heterogeneous correlations is preferred.

    Release date: 2017-06-22
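
    A minimal sketch of the homogeneous model described above, in our own notation and under the standard mean-correlation parameterization of the beta distribution (an assumption; the paper's exact prior specification may differ). For unit k in cluster j of area i,

      y_{ijk} \mid p_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \qquad
      p_{ij} \mid \mu_i, \rho \sim \mathrm{Beta}\!\left( \mu_i \tfrac{1-\rho}{\rho},\; (1-\mu_i)\tfrac{1-\rho}{\rho} \right),

    so that E(p_{ij}) = \mu_i and the intra-cluster correlation of the binary responses equals \rho. The heterogeneous model replaces \rho with an area-specific \rho_i, with the \mu_i and \rho_i given independent beta priors.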

  • Articles and reports: 12-001-X201600214677
    Description:

    How do we tell whether weighting adjustments reduce nonresponse bias? If a variable is measured for everyone in the selected sample, then the design weights can be used to calculate an approximately unbiased estimate of the population mean or total for that variable. A second estimate of the population mean or total can be calculated using the survey respondents only, with weights that have been adjusted for nonresponse. If the two estimates disagree, then there is evidence that the weight adjustments may not have removed the nonresponse bias for that variable. In this paper we develop the theoretical properties of linearization and jackknife variance estimators for evaluating the bias of an estimated population mean or total by comparing estimates calculated from overlapping subsets of the same data with different sets of weights, when poststratification or inverse propensity weighting is used for the nonresponse adjustments to the weights. We provide sufficient conditions on the population, sample, and response mechanism for the variance estimators to be consistent, and demonstrate their small-sample properties through a simulation study.

    Release date: 2016-12-20
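
    The basic diagnostic described above can be illustrated with a short simulation; the weighting-class adjustment and the variable names below are illustrative assumptions, not the estimators studied in the paper.

      # Compare a design-weighted full-sample estimate of a frame variable with
      # the nonresponse-adjusted respondent estimate of the same variable.
      import numpy as np

      rng = np.random.default_rng(1)
      n = 1000
      d = rng.uniform(1, 3, n)            # design weights (known for the full sample)
      x = rng.normal(50, 10, n)           # frame variable measured for everyone
      cell = rng.integers(0, 4, n)        # weighting classes used for the adjustment
      resp = rng.random(n) < 0.6 + 0.05 * cell   # response indicator

      full_sample_mean = np.average(x, weights=d)

      # Weighting-class nonresponse adjustment: inflate respondent weights so they
      # reproduce the full-sample design-weight total within each class.
      w = d.copy()
      for c in np.unique(cell):
          in_c = cell == c
          w[in_c & resp] *= d[in_c].sum() / d[in_c & resp].sum()
      respondent_mean = np.average(x[resp], weights=w[resp])

      # A large gap suggests the adjustment has not removed nonresponse bias for x.
      print(full_sample_mean, respondent_mean, respondent_mean - full_sample_mean)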

  • Articles and reports: 12-001-X201600214663
    Description:

    We present theoretical evidence that efforts during data collection to balance the survey response with respect to selected auxiliary variables will improve the chances for low nonresponse bias in the estimates that are ultimately produced by calibrated weighting. One of our results shows that the variance of the bias – measured here as the deviation of the calibration estimator from the (unrealized) full-sample unbiased estimator – decreases linearly as a function of the response imbalance that we assume is measured and controlled continuously over the data collection period. An attractive prospect is thus a lower risk of bias if one can manage the data collection to get low imbalance. The theoretical results are validated in a simulation study with real data from an Estonian household survey.

    Release date: 2016-12-20

  • Technical products: 11-522-X201700014707
    Description:

    The Labour Force Survey (LFS) is a monthly household survey of about 56,000 households that provides information on the Canadian labour market. Audit Trail is a Blaise programming option for surveys like the LFS with computer-assisted interviewing (CAI); it creates files containing every keystroke, edit and timestamp of every data collection attempt for all households. Combining such a large survey with such a complete source of paradata opens the door to in-depth data quality analysis, but also quickly leads to Big Data challenges. How can meaningful information be extracted from this large set of keystrokes and timestamps? How can it help assess the quality of LFS data collection? The presentation describes some of the challenges that were encountered, the solutions used to address them, and the results of the data quality analysis.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014715
    Description:

    In preparation for the 2021 UK Census, the ONS has committed to an extensive research programme exploring how linked administrative data can be used to support conventional statistical processes. Item-level edit and imputation (E&I) will play an important role in adjusting the 2021 Census database. However, uncertainty associated with the accuracy and quality of available administrative data renders the efficacy of an integrated census-administrative data approach to E&I unclear. Current constraints that dictate an anonymised ‘hash-key’ approach to record linkage to ensure confidentiality add to that uncertainty. Here, we provide preliminary results from a simulation study comparing the predictive and distributional accuracy of the conventional E&I strategy implemented in CANCEIS for the 2011 UK Census to that of an integrated approach using synthetic administrative data with systematically increasing error as auxiliary information. In this initial phase of research, we focus on imputing single year of age. The aim of the study is to gain insight into whether auxiliary information from administrative data can improve imputation estimates and where the different strategies fall on a continuum of accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014728
    Description:

    Record linkage joins together two or more sources. The product of record linkage is a file with one record per individual containing all the information about the individual from the multiple files. The problem is difficult when a unique identification key is not available, there are errors in some variables, some data are missing, and the files are large. Probabilistic record linkage computes a probability that records on different files pertain to a single individual. Some true links are given low probabilities of matching, whereas some non-links are given high probabilities. Errors in linkage designations can cause bias in analyses based on the composite database. The SEER cancer registries contain information on breast cancer cases in their registry areas. A diagnostic test based on the Oncotype DX assay, performed by Genomic Health, Inc. (GHI), is often performed for certain types of breast cancers. Record linkage using personally identifiable information was conducted to associate Oncotype DX assay results with SEER cancer registry information. The software Link Plus was used to generate a score describing the similarity of records and to identify the apparent best match of SEER cancer registry individuals to the GHI database. Clerical review was used to check samples of likely matches, possible matches, and unlikely matches. Models are proposed for jointly modeling the record linkage process and subsequent statistical analysis in this and other applications.

    Release date: 2016-03-24
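
    For readers unfamiliar with probabilistic linkage scores, here is a generic Fellegi-Sunter-style match weight for one record pair; the m- and u-probabilities and field names are made-up values, and this is not the scoring used by Link Plus.

      # m = P(field agrees | true match), u = P(field agrees | non-match).
      import math

      m = {"surname": 0.95, "birth_year": 0.98, "postal_code": 0.90}
      u = {"surname": 0.01, "birth_year": 0.05, "postal_code": 0.02}

      def match_weight(agreements):
          """Sum of log2 likelihood ratios over the comparison fields."""
          total = 0.0
          for field, agrees in agreements.items():
              if agrees:
                  total += math.log2(m[field] / u[field])
              else:
                  total += math.log2((1 - m[field]) / (1 - u[field]))
          return total

      # A record pair agreeing on surname and birth year but not postal code.
      print(match_weight({"surname": True, "birth_year": True, "postal_code": False}))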

  • Technical products: 11-522-X201700014756
    Description:

    How can we bring together multidimensional health system performance data in a simplified way that is easy to access and provides comparable and actionable information to accelerate improvements in health care? The Canadian Institute for Health Information has developed a suite of tools to meet performance measurement needs across different audiences, to identify improvement priorities, to understand how health regions and facilities compare with peers, and to support transparency and accountability. The pan-Canadian tool suite, Your Health System (YHS), consolidates reporting of 45 key performance indicators in a structured way that is comparable over time and at different geographic levels. This paper outlines the development, methodological approaches and considerations taken to create a dynamic tool that facilitates benchmarking and meaningful comparisons for health system performance improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014746
    Description:

    Paradata research has focused on identifying opportunities for strategic improvement in data collection that could be operationally viable and lead to enhancements in quality or cost efficiency. To that end, Statistics Canada has developed and implemented a responsive collection design (RCD) strategy for computer-assisted telephone interview (CATI) household surveys to maximize quality and efficiency and to potentially reduce costs. RCD is an adaptive approach to survey data collection that uses information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. In practice, the survey managers monitor and analyze collection progress against a predetermined set of indicators for two purposes: to identify critical data-collection milestones that require significant changes to the collection approach and to adjust collection strategies to make the most efficient use of remaining available resources. In the RCD context, numerous considerations come into play when determining which aspects of data collection to adjust and how to adjust them. Paradata sources play a key role in the planning, development and implementation of active management for RCD surveys. Since 2009, Statistics Canada has conducted several RCD surveys. This paper describes Statistics Canada’s experiences in implementing and monitoring this type of survey.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014731
    Description:

    Our study describes various factors that are of concern when evaluating disclosure risk of contextualized microdata and some of the empirical steps that are involved in their assessment. Utilizing synthetic sets of survey respondents, we illustrate how different postulates shape the assessment of risk when considering: (1) estimated probabilities that unidentified geographic areas are represented within a survey; (2) the number of people in the population who share the same personal and contextual identifiers as a respondent; and (3) the anticipated amount of coverage error in census population counts and extant files that provide identifying information (like names and addresses).

    Release date: 2016-03-24

  • Articles and reports: 82-003-X201501214295
    Description:

    Using the Wisconsin Cancer Intervention and Surveillance Monitoring Network breast cancer simulation model adapted to the Canadian context, costs and quality-adjusted life years were evaluated for 11 mammography screening strategies that varied by start/stop age and screening frequency for the general population. Incremental cost-effectiveness ratios are presented, and sensitivity analyses are used to assess the robustness of model conclusions.

    Release date: 2015-12-16
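
    The comparison rests on incremental cost-effectiveness ratios; a minimal sketch of the calculation, using made-up costs and QALYs rather than the study's results:

      def icer(cost_new, qaly_new, cost_ref, qaly_ref):
          """Incremental cost per QALY gained of the new strategy versus the reference."""
          return (cost_new - cost_ref) / (qaly_new - qaly_ref)

      # Hypothetical strategies: biennial screening 50-69 vs. no organized screening.
      print(icer(cost_new=1500.0, qaly_new=14.25, cost_ref=1100.0, qaly_ref=14.24))
      # -> 40000.0 dollars per QALY gained (illustrative numbers only)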

  • Articles and reports: 12-001-X201500114172
    Description:

    When a random sample drawn from a complete list frame suffers from unit nonresponse, calibration weighting to population totals can be used to remove nonresponse bias under either an assumed response (selection) or an assumed prediction (outcome) model. Calibration weighting in this way can not only provide double protection against nonresponse bias, it can also decrease variance. By employing a simple trick, one can estimate the variance under the assumed prediction model and the mean squared error under the combination of an assumed response model and the probability-sampling mechanism simultaneously. Unfortunately, there is a practical limitation on what response model can be assumed when design weights are calibrated to population totals in a single step. In particular, the choice for the response function cannot always be logistic. That limitation does not hinder calibration weighting when performed in two steps: from the respondent sample to the full sample to remove the response bias and then from the full sample to the population to decrease variance. There are potential efficiency advantages from using the two-step approach as well, even when the calibration variables employed in each step are a subset of the calibration variables in the single step. Simultaneous mean-squared-error estimation using linearization is possible, but more complicated than when calibrating in a single step.

    Release date: 2015-06-29
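
    To fix ideas about what one calibration step does, here is a sketch of linear (GREG-type) calibration of weights to known totals on simulated data; it is the generic single-step form, not the paper's two-step estimators.

      # Adjust design weights d so that weighted totals of auxiliaries x hit
      # known control totals t_x, using the linear calibration distance.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 500
      d = rng.uniform(1, 4, n)                      # design (or respondent) weights
      x = np.column_stack([np.ones(n),              # intercept -> calibrates to N
                           rng.normal(40, 12, n)])  # an auxiliary variable
      t_x = np.array([1300.0, 52000.0])             # assumed known population totals

      # w = d * (1 + x @ lam), with lam solving (X' D X) lam = t_x - X' d
      A = x.T @ (d[:, None] * x)
      lam = np.linalg.solve(A, t_x - x.T @ d)
      w = d * (1 + x @ lam)

      print(x.T @ w)   # reproduces t_x up to floating-point error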

  • Articles and reports: 82-003-X201500614196
    Description:

    This study investigates the feasibility and validity of using personal health insurance numbers to deterministically link the Canadian Cancer Registry (CCR) and the Discharge Abstract Database to obtain hospitalization information about people with primary cancers.

    Release date: 2015-06-17
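
    Deterministic linkage on an exact key can be pictured as a simple join; the toy example below uses hypothetical column names and records, not the CCR or Discharge Abstract Database layouts.

      import pandas as pd

      registry = pd.DataFrame({"hin": ["A1", "B2", "C3"],
                               "diagnosis_year": [2008, 2009, 2010]})
      discharges = pd.DataFrame({"hin": ["B2", "C3", "C3", "D4"],
                                 "admission_year": [2010, 2011, 2012, 2011]})

      # Inner join keeps only registry cases with at least one matching discharge.
      linked = registry.merge(discharges, on="hin", how="inner")
      print(linked)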

  • Articles and reports: 12-001-X201400214096
    Description:

    To obtain better coverage of the population of interest and to reduce costs, a number of surveys employ a dual frame structure, in which independent samples are taken from two overlapping sampling frames. This research considers chi-squared tests in dual frame surveys when categorical data are encountered. We extend the generalized Wald test (Wald 1943) and the Rao-Scott first-order and second-order corrected tests (Rao and Scott 1981) from a single survey to a dual frame survey and derive their asymptotic distributions. Simulation studies show that both Rao-Scott type corrected tests work well and are thus recommended for use in dual frame surveys. An example is given to illustrate the use of the developed tests.

    Release date: 2014-12-19
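
    For reference, the single-survey Rao-Scott first-order correction that the paper extends divides the Pearson statistic by the average estimated generalized design effect (a standard form, in our notation; the dual frame versions derived in the paper are more involved):

      X^{2}_{RS} = \frac{X^{2}_{P}}{\hat{\delta}_{\bullet}}, \qquad
      \hat{\delta}_{\bullet} = \frac{1}{k-1} \sum_{i=1}^{k-1} \hat{\delta}_{i},

    where X^{2}_{P} is the Pearson statistic computed from the survey-weighted proportions, the \hat{\delta}_{i} are the estimated generalized design effects (eigenvalues of the design-effect matrix) for a goodness-of-fit test with k cells, and X^{2}_{RS} is referred to a chi-squared distribution with k - 1 degrees of freedom.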

  • Technical products: 11-522-X201300014260
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) produces monthly estimates and determines the month-to-month changes for variables such as employment, earnings and hours at detailed industrial levels for Canada, the provinces and territories. In order to improve the efficiency of collection activities for this survey, an electronic questionnaire (EQ) was introduced in the fall of 2012. Given the timeframe allowed for this transition as well as the production calendar of the survey, a conversion strategy was developed for the integration of this new mode. The goal of the strategy was to ensure a good adaptation of the collection environment and also to allow the implementation of a plan of analysis that would evaluate the impact of this change on the results of the survey. This paper will give an overview of the conversion strategy, the different adjustments that were made during the transition period and the results of various evaluations that were conducted. For example, the impact of the integration of the EQ on the collection process, the response rate and the follow-up rate will be presented. In addition, the effect that this new collection mode has on the survey estimates will also be discussed. More specifically, the results of a randomized experiment that was conducted in order to determine the presence of a mode effect will be presented.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014270
    Description:

    There is a wide range of character-string comparators in the record-linkage field. Comparison problems arise when factors affect the composition of the strings (for example, the use of a nickname instead of a given name, and typographical errors). In these cases, more sophisticated comparators must be used. Such tools help to reduce the number of potentially missed links. Unfortunately, some of the gains may be false links. In order to improve the matches, three sophisticated string comparators were developed; they are described in this paper. They are the Lachance comparator and its derivatives, the multi-word comparator and the multi-type comparator. This set of tools is currently available in a deterministic record-linkage prototype known as MixMatch. This application can use prior knowledge to reduce the volume of false links generated during matching. This paper also proposes a link-strength indicator.

    Release date: 2014-10-31
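
    As a toy illustration of the multi-word idea (not the Lachance comparator or the MixMatch implementation), one can pair each token of one name with its best-matching token of the other and average the similarities, here using the standard-library SequenceMatcher:

      from difflib import SequenceMatcher

      def token_similarity(a: str, b: str) -> float:
          return SequenceMatcher(None, a, b).ratio()

      def multi_word_similarity(name1: str, name2: str) -> float:
          tokens1, tokens2 = name1.lower().split(), name2.lower().split()
          if not tokens1 or not tokens2:
              return 0.0
          best = [max(token_similarity(t1, t2) for t2 in tokens2) for t1 in tokens1]
          return sum(best) / len(best)

      # Word order differs but the comparator still scores the pair highly.
      print(multi_word_similarity("Jean-Pierre Tremblay", "Tremblay Jean Pierre"))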

  • Technical products: 11-522-X201300014285
    Description:

    The 2011 National Household Survey (NHS) is a voluntary survey that replaced the traditional mandatory long-form questionnaire of the Canadian census of population. The NHS sampled about 30% of Canadian households and achieved a design-weighted response rate of 77%. In comparison, the last census long form was sent to 20% of households and achieved a response rate of 94%. Based on the long-form data, Statistics Canada traditionally produces two public use microdata files (PUMFs): the individual PUMF and the hierarchical PUMF. Both give information on individuals, but the hierarchical PUMF provides extra information on the household and family relationships between the individuals. To produce two PUMFs, based on the NHS data, that cover the whole country evenly and that do not overlap, we applied a special sub-sampling strategy. Difficulties in the confidentiality analyses have increased because of the numerous new variables, the more detailed geographic information and the voluntary nature of the NHS. This paper describes the 2011 PUMF methodology and how it balances the requirements for more information and for low risk of disclosure.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014284
    Description:

    The decline in response rates observed by several national statistical institutes, their desire to limit response burden and the significant budget pressures they face support greater use of administrative data to produce statistical information. The administrative data sources they must consider have to be evaluated according to several aspects to determine their fitness for use. Statistics Canada recently developed a process to evaluate administrative data sources for use as inputs to the statistical information production process. This evaluation is conducted in two phases. The initial phase requires access only to the metadata associated with the administrative data considered, whereas the second phase uses a version of data that can be evaluated. This article outlines the evaluation process and tool.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014281
    Description:

    Web surveys exclude the entire non-internet population and often have low response rates. Therefore, statistical inference based on Web survey samples will require availability of additional information about the non-covered population, careful choice of survey methods to account for potential biases, and caution with interpretation and generalization of the results to a target population. In this paper, we focus on non-coverage bias, and explore the use of weighted estimators and hot-deck imputation estimators for bias adjustment under the ideal scenario where covariate information was obtained for a simple random sample of individuals from the non-covered population. We illustrate empirically the performance of the proposed estimators under this scenario. Possible extensions of these approaches to more realistic scenarios are discussed.

    Release date: 2014-10-31
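
    A simplified sketch of one ingredient discussed above, random hot-deck imputation within adjustment cells, on simulated data; the cell definition and variables are illustrative assumptions, not the paper's estimators.

      import numpy as np

      rng = np.random.default_rng(42)
      n = 200
      age_group = rng.integers(0, 3, n)          # imputation cells (e.g., age groups)
      covered = rng.random(n) < 0.7              # internet (covered) indicator
      y = rng.normal(60, 15, n)
      y_obs = np.where(covered, y, np.nan)       # y observed only for the covered

      y_imp = y_obs.copy()
      for g in np.unique(age_group):
          donors = np.where((age_group == g) & covered)[0]
          recipients = np.where((age_group == g) & ~covered)[0]
          if len(donors) and len(recipients):
              # Draw a donor at random (with replacement) within the same cell.
              y_imp[recipients] = y_obs[rng.choice(donors, size=len(recipients))]

      print(np.nanmean(y_obs), y_imp.mean())     # covered-only vs. imputed-data mean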

  • Articles and reports: 12-001-X201400114030
    Description:

    The paper reports the results of a Monte Carlo simulation study that was conducted to compare the effectiveness of four different hierarchical Bayes small area models for producing state estimates of proportions based on data from stratified simple random samples from a fixed finite population. Two of the models adopted the commonly made assumptions that the survey weighted proportion for each sampled small area has a normal distribution and that the sampling variance of this proportion is known. One of these models used a linear linking model and the other used a logistic linking model. The other two models both employed logistic linking models and assumed that the sampling variance was unknown. One of these models assumed a normal distribution for the sampling model while the other assumed a beta distribution. The study found that for all four models the credible interval design-based coverage of the finite population state proportions deviated markedly from the 95 percent nominal level used in constructing the intervals.

    Release date: 2014-06-27
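
    One of the four specifications, written in the usual area-level form in our own notation (the known-sampling-variance, logistic-linking case); the other models relax the normality or known-variance assumptions:

      \hat{p}_i \mid \theta_i \sim N(\theta_i, \psi_i), \qquad
      \mathrm{logit}(\theta_i) = \boldsymbol{x}_i^{\top} \boldsymbol{\beta} + v_i, \qquad
      v_i \sim N(0, \sigma_v^2),

    with the sampling variance \psi_i of the survey-weighted proportion \hat{p}_i for area i treated as known.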

  • Articles and reports: 12-001-X201300211888
    Description:

    When the study variables are functional and storage capacities are limited or transmission costs are high, using survey techniques to select a portion of the observations of the population is an interesting alternative to using signal compression techniques. In this context of functional data, our focus in this study is on estimating the mean electricity consumption curve over a one-week period. We compare different estimation strategies that take account of a piece of auxiliary information such as the mean consumption for the previous period. The first strategy consists in using a simple random sampling design without replacement, then incorporating the auxiliary information into the estimator by introducing a functional linear model. The second approach consists in incorporating the auxiliary information into the sampling designs by considering unequal probability designs, such as stratified and πps designs. We then address the issue of constructing confidence bands for these estimators of the mean. When effective estimators of the covariance function are available and the mean estimator satisfies a functional central limit theorem, it is possible to use a fast technique for constructing confidence bands, based on the simulation of Gaussian processes. This approach is compared with bootstrap techniques that have been adapted to take account of the functional nature of the data.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300111824
    Description:

    In most surveys all sample units receive the same treatment and the same design features apply to all selected people and households. In this paper, it is explained how survey designs may be tailored to optimize quality given constraints on costs. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with various examples.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111829
    Description:

    Indirect Sampling is used when the sampling frame is not the same as the target population, but related to the latter. The estimation process for Indirect Sampling is carried out using the Generalised Weight Share Method (GWSM), which is an unbiased procedure (see Lavallée 2002, 2007). For business surveys, Indirect Sampling is applied as follows: the sampling frame is one of establishments, while the target population is one of enterprises. Enterprises are selected through their establishments. This allows stratifying according to the establishment characteristics, rather than those associated with enterprises. Because the variables of interest of establishments are generally highly skewed (a small portion of the establishments covers the major portion of the economy), the GWSM results in unbiased estimates, but their variance can be large. The purpose of this paper is to suggest some adjustments to the weights to reduce the variance of the estimates in the context of skewed populations, while keeping the method unbiased. After a brief overview of Indirect Sampling and the GWSM, we describe the required adjustments to the GWSM. The estimates produced with these adjustments are compared to those from the original GWSM, via a small numerical example, and using real data from Statistics Canada's Business Register.

    Release date: 2013-06-28
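
    As a reminder of the baseline being adjusted, the GWSM weight of an enterprise i can be written in its standard form (following Lavallée 2002, 2007; notation ours):

      w_i = \frac{1}{L_i} \sum_{j \in s_A} d_j \, l_{ji}, \qquad
      L_i = \sum_{j \in U_A} l_{ji},

    where s_A is the sample of establishments drawn from the establishment frame U_A, d_j is the design weight of establishment j, l_{ji} = 1 if establishment j is linked to enterprise i (0 otherwise), and L_i is the total number of links of enterprise i; the estimator is then \hat{Y} = \sum_i w_i y_i over the enterprises reached by the sample.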

  • Articles and reports: 82-003-X201300111764
    Description:

    This study compares two sources of information about prescription drug use by people aged 65 or older in Ontario: the Canadian Community Health Survey and the drug claims database of the Ontario Drug Benefit Program. The analysis pertains to cardiovascular and diabetes drugs because they are commonly used, and almost all are prescribed on a regular basis.

    Release date: 2013-01-16

  • Articles and reports: 12-001-X201200211751
    Description:

    Survey quality is a multi-faceted concept that originates from two different development paths. One path is the total survey error paradigm that rests on four pillars providing principles that guide survey design, survey implementation, survey evaluation, and survey data analysis. We should design surveys so that the mean squared error of an estimate is minimized given budget and other constraints. It is important to take all known error sources into account, to monitor major error sources during implementation, to periodically evaluate major error sources and combinations of these sources after the survey is completed, and to study the effects of errors on the survey analysis. In this context survey quality can be measured by the mean squared error and controlled by observations made during implementation and improved by evaluation studies. The paradigm has both strengths and weaknesses. One strength is that research can be defined by error sources and one weakness is that most total survey error assessments are incomplete in the sense that it is not possible to include the effects of all the error sources. The second path is influenced by ideas from the quality management sciences. These sciences concern business excellence in providing products and services with a focus on customers and competition from other providers. These ideas have had a great influence on many statistical organizations. One effect is the acceptance among data providers that product quality cannot be achieved without a sufficient underlying process quality and process quality cannot be achieved without a good organizational quality. These levels can be controlled and evaluated by service level agreements, customer surveys, paradata analysis using statistical process control, and organizational assessment using business excellence models or other sets of criteria. All levels can be improved by conducting improvement projects chosen by means of priority functions. The ultimate goal of improvement projects is that the processes involved should gradually approach a state where they are error-free. Of course, this might be an unattainable goal, albeit one to strive for. It is not realistic to hope for continuous measurements of the total survey error using the mean squared error. Instead one can hope that continuous quality improvement using management science ideas and statistical methods can minimize biases and other survey process problems so that the variance becomes an approximation of the mean squared error. If that can be achieved we have made the two development paths approximately coincide.

    Release date: 2012-12-19

Data (0) (0 results)

Analysis (97) (25 of 97 results)

  • Articles and reports: 12-001-X201200211758
    Description:

    This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression through the origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have closer to nominal confidence coverage than the sample-weighted estimator.

    Release date: 2012-12-19

  • Articles and reports: 12-001-X201200211757
    Description:

    Collinearities among explanatory variables in linear regression models affect estimates from survey data just as they do in non-survey data. Undesirable effects are unnecessarily inflated standard errors, spuriously low or high t-statistics, and parameter estimates with illogical signs. The available collinearity diagnostics are not generally appropriate for survey data because the variance estimators they incorporate do not properly account for stratification, clustering, and survey weights. In this article, we derive condition indexes and variance decompositions to diagnose collinearity problems in complex survey data. The adapted diagnostics are illustrated with data based on a survey of health characteristics.

    Release date: 2012-12-19
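
    A sketch of the unweighted Belsley-Kuh-Welsch condition indexes and variance-decomposition proportions that the paper adapts; the survey versions additionally account for weights, strata and clusters, which this toy example ignores.

      import numpy as np

      rng = np.random.default_rng(3)
      n = 300
      x1 = rng.normal(size=n)
      x2 = x1 + rng.normal(scale=0.05, size=n)        # nearly collinear with x1
      X = np.column_stack([np.ones(n), x1, x2])

      # Scale columns to unit length, then take the singular value decomposition.
      Xs = X / np.linalg.norm(X, axis=0)
      U, s, Vt = np.linalg.svd(Xs, full_matrices=False)

      condition_indexes = s.max() / s                 # large values flag collinearity
      phi = (Vt.T / s) ** 2                           # component contributions to var(b_k)
      var_decomp = phi / phi.sum(axis=1, keepdims=True)

      print(np.round(condition_indexes, 1))
      print(np.round(var_decomp, 2))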

  • Articles and reports: 12-001-X201200111686
    Description:

    We present a generalized estimating equations approach for estimating the concordance correlation coefficient and the kappa coefficient from sample survey data. The estimates and their accompanying standard error need to correctly account for the sampling design. Weighted measures of the concordance correlation coefficient and the kappa coefficient, along with the variance of these measures accounting for the sampling design, are presented. We use the Taylor series linearization method and the jackknife procedure for estimating the standard errors of the resulting parameter estimates. Body measurement and oral health data from the Third National Health and Nutrition Examination Survey are used to illustrate this methodology.

    Release date: 2012-06-27
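
    The point estimate can be sketched as a survey-weighted version of Lin's concordance correlation coefficient on simulated data; the design-consistent standard errors via linearization or the jackknife, which are the focus of the paper, are omitted here.

      import numpy as np

      rng = np.random.default_rng(7)
      n = 400
      w = rng.uniform(1, 5, n)                     # survey weights
      x = rng.normal(170, 8, n)                    # e.g., first body measurement
      y = x + rng.normal(0, 3, n) + 1.0            # second measurement, slight bias

      def weighted_ccc(x, y, w):
          mx, my = np.average(x, weights=w), np.average(y, weights=w)
          vx = np.average((x - mx) ** 2, weights=w)
          vy = np.average((y - my) ** 2, weights=w)
          cxy = np.average((x - mx) * (y - my), weights=w)
          return 2 * cxy / (vx + vy + (mx - my) ** 2)

      print(weighted_ccc(x, y, w))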

  • Articles and reports: 82-003-X201200111625
    Description:

    This study compares estimates of the prevalence of cigarette smoking based on self-report with estimates based on urinary cotinine concentrations. The data are from the 2007 to 2009 Canadian Health Measures Survey, which included self-reported smoking status and the first nationally representative measures of urinary cotinine.

    Release date: 2012-02-15

  • Articles and reports: 12-001-X201100211608
    Description:

    Designs and estimators for the single frame surveys currently used by U.S. government agencies were developed in response to practical problems. Federal household surveys now face challenges of decreasing response rates and frame coverage, higher data collection costs, and increasing demand for small area statistics. Multiple frame surveys, in which independent samples are drawn from separate frames, can be used to help meet some of these challenges. Examples include combining a list frame with an area frame or using two frames to sample landline telephone households and cellular telephone households. We review point estimators and weight adjustments that can be used to analyze multiple frame surveys with standard survey software, and summarize construction of replicate weights for variance estimation. Because of their increased complexity, multiple frame surveys face some challenges not found in single frame surveys. We investigate misclassification bias in multiple frame surveys, and propose a method for correcting for this bias when misclassification probabilities are known. Finally, we discuss research that is needed on nonsampling errors with multiple frame surveys.

    Release date: 2011-12-21

  • Articles and reports: 12-001-X201100111451
    Description:

    In the calibration method proposed by Deville and Särndal (1992), the calibration equations take only exact estimates of auxiliary variable totals into account. This article examines other parameters besides totals for calibration. Parameters that are considered complex include the ratio, median or variance of auxiliary variables.

    Release date: 2011-06-29

  • Articles and reports: 12-001-X201100111443
    Description:

    Dual frame telephone surveys are becoming common in the U.S. because of the incompleteness of the landline frame as people transition to cell phones. This article examines nonsampling errors in dual frame telephone surveys. Even though nonsampling errors are ignored in much of the dual frame literature, we find that under some conditions substantial biases may arise in dual frame telephone surveys due to these errors. We specifically explore biases due to nonresponse and measurement error in these telephone surveys. To reduce the bias resulting from these errors, we propose dual frame sampling and weighting methods. The compositing factor for combining the estimates from the two frames is shown to play an important role in reducing nonresponse bias.

    Release date: 2011-06-29

  • Articles and reports: 12-001-X201000211376
    Description:

    This article develops computational tools, called indicators, for judging the effectiveness of the auxiliary information used to control nonresponse bias in survey estimates, obtained in this article by calibration. This work is motivated by the survey environment in a number of countries, notably in northern Europe, where many potential auxiliary variables are derived from reliable administrative registers for households and individuals. Many auxiliary vectors can be composed. There is a need to compare these vectors to assess their potential for reducing bias. The indicators in this article are designed to meet that need. They are used in surveys at Statistics Sweden. General survey conditions are considered: There is probability sampling from the finite population, by an arbitrary sampling design; nonresponse occurs. The probability of inclusion in the sample is known for each population unit; the probability of response is unknown, causing bias. The study variable (the y-variable) is observed for the set of respondents only. No matter what auxiliary vector is used in a calibration estimator (or in any other estimation method), a residual bias will always remain. The choice of a "best possible" auxiliary vector is guided by the indicators proposed in the article. Their background and computational features are described in the early sections of the article. Their theoretical background is explained. The concluding sections are devoted to empirical studies. One of these illustrates the selection of auxiliary variables in a survey at Statistics Sweden. A second empirical illustration is a simulation with a constructed finite population; a number of potential auxiliary vectors are ranked in order of preference with the aid of the indicators.

    Release date: 2010-12-21

  • Articles and reports: 12-001-X201000111248
    Description:

    Gross flows are often used to study transitions in employment status or other categorical variables among individuals in a population. Dual frame longitudinal surveys, in which independent samples are selected from two frames to decrease survey costs or improve coverage, can present challenges for efficient and consistent estimation of gross flows because of complex designs and missing data in either or both samples. We propose estimators of gross flows in dual frame surveys and examine their asymptotic properties. We then estimate transitions in employment status using data from the Current Population Survey and the Survey of Income and Program Participation.

    Release date: 2010-06-29

  • Articles and reports: 12-001-X201000111249
    Description:

    For many designs, there is a nonzero probability of selecting a sample that provides poor estimates for known quantities. Stratified random sampling reduces the set of such possible samples by fixing the sample size within each stratum. However, undesirable samples are still possible with stratification. Rejective sampling removes poor performing samples by only retaining a sample if specified functions of sample estimates are within a tolerance of known values. The resulting samples are often said to be balanced on the function of the variables used in the rejection procedure. We provide modifications to the rejection procedure of Fuller (2009a) that allow more flexibility on the rejection rules. Through simulation, we compare estimation properties of a rejective sampling procedure to those of cube sampling.

    Release date: 2010-06-29
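
    A toy rejective-sampling loop under a deliberately simple rejection rule (keep a simple random sample only if the sample mean of a known auxiliary variable is close to its population mean); the paper's modifications allow more flexible rules and compare them with cube sampling.

      import numpy as np

      rng = np.random.default_rng(11)
      N, n = 5000, 100
      x = rng.gamma(shape=2.0, scale=10.0, size=N)   # auxiliary variable, known for all
      x_mean = x.mean()                              # known population mean
      tol = 0.02 * x_mean                            # rejection tolerance

      draws = 0
      while True:
          draws += 1
          sample = rng.choice(N, size=n, replace=False)
          if abs(x[sample].mean() - x_mean) <= tol:
              break                                  # retain the balanced sample

      print(draws, x[sample].mean(), x_mean)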

  • Articles and reports: 12-001-X201000111243
    Description:

    The 2003 National Assessment of Adult Literacy (NAAL) and the international Adult Literacy and Lifeskills (ALL) surveys each involved stratified multi-stage area sample designs. During the last stage, a household roster was constructed, the eligibility status of each individual was determined, and the selection procedure was invoked to randomly select one or two eligible persons within the household. The objective of this paper is to evaluate the within-household selection rules under a multi-stage design while improving the procedure in future literacy surveys. The analysis is based on the current US household size distribution and intracluster correlation coefficients using the adult literacy data. In our evaluation, several feasible household selection rules are studied, considering effects from clustering, differential sampling rates, cost per interview, and household burden. In doing so, an evaluation of within-household sampling under a two-stage design is extended to a four-stage design and some generalizations are made to multi-stage samples with different cost ratios.

    Release date: 2010-06-29

Reference (72) (25 of 72 results)

  • Technical products: 11-522-X201700014707
    Description:

    The Labour Force Survey (LFS) is a monthly household survey of about 56,000 households that provides information on the Canadian labour market. Audit Trail is a Blaise programming option, for surveys like LFS with Computer Assisted Interviewing (CAI), which creates files containing every keystroke and edit and timestamp of every data collection attempt on all households. Combining such a large survey with such a complete source of paradata opens the door to in-depth data quality analysis but also quickly leads to Big Data challenges. How can meaningful information be extracted from this large set of keystrokes and timestamps? How can it help assess the quality of LFS data collection? The presentation will describe some of the challenges that were encountered, solutions that were used to address them, and results of the analysis on data quality.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014715
    Description:

    In preparation for 2021 UK Census the ONS has committed to an extensive research programme exploring how linked administrative data can be used to support conventional statistical processes. Item-level edit and imputation (E&I) will play an important role in adjusting the 2021 Census database. However, uncertainty associated with the accuracy and quality of available administrative data renders the efficacy of an integrated census-administrative data approach to E&I unclear. Current constraints that dictate an anonymised ‘hash-key’ approach to record linkage to ensure confidentiality add to that uncertainty. Here, we provide preliminary results from a simulation study comparing the predictive and distributional accuracy of the conventional E&I strategy implemented in CANCEIS for the 2011 UK Census to that of an integrated approach using synthetic administrative data with systematically increasing error as auxiliary information. In this initial phase of research we focus on imputing single year of age. The aim of the study is to gain insight into whether auxiliary information from admin data can improve imputation estimates and where the different strategies fall on a continuum of accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014728
    Description:

    Record linkage joins together records from two or more sources. The product of record linkage is a file with one record per individual containing all the information about that individual from the multiple files. The problem is difficult when a unique identification key is not available, there are errors in some variables, some data are missing, and the files are large. Probabilistic record linkage computes a probability that records on different files pertain to a single individual. Some true links are given low probabilities of matching, whereas some non-links are given high probabilities. Errors in linkage designations can cause bias in analyses based on the composite database. The SEER cancer registries contain information on breast cancer cases in their registry areas. A diagnostic test based on the Oncotype DX assay, performed by Genomic Health, Inc. (GHI), is often performed for certain types of breast cancers. Record linkage using personally identifiable information was conducted to associate Oncotype DX assay results with SEER cancer registry information. The software Link Plus was used to generate a score describing the similarity of records and to identify the apparent best match of SEER cancer registry individuals to the GHI database. Clerical review was used to check samples of likely matches, possible matches, and unlikely matches. Models are proposed for jointly modeling the record linkage process and subsequent statistical analysis in this and other applications.
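
    As a generic illustration of how a similarity score is turned into the likely/possible/unlikely match classes mentioned above, the Python sketch below applies two cut-offs to a user-supplied scoring function; the cut-off values and the score function are assumptions, not the Link Plus scoring rules.

        def classify_pairs(pairs, score, upper=0.9, lower=0.6):
            """Split candidate record pairs into likely matches, possible matches
            (sent to clerical review) and unlikely matches, based on a similarity
            score in [0, 1].  Cut-offs are illustrative."""
            likely, possible, unlikely = [], [], []
            for a, b in pairs:
                s = score(a, b)
                if s >= upper:
                    likely.append((a, b, s))
                elif s >= lower:
                    possible.append((a, b, s))  # candidates for clerical review
                else:
                    unlikely.append((a, b, s))
            return likely, possible, unlikely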

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014756
    Description:

    How can we bring together multidimensional health system performance data in a simplified way that is easy to access and provides comparable and actionable information to accelerate improvements in health care? The Canadian Institute for Health Information has developed a suite of tools to meet performance measurement needs across different audiences, to identify improvement priorities, to help health regions and facilities understand how they compare with their peers, and to support transparency and accountability. The pan-Canadian tool, Your Health System (YHS), consolidates reporting of 45 key performance indicators in a structured way that is comparable over time and across geographic levels. This paper outlines the development and the methodological approaches and considerations behind this dynamic tool, which facilitates benchmarking and meaningful comparisons for health system performance improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014746
    Description:

    Paradata research has focused on identifying opportunities for strategic improvement in data collection that could be operationally viable and lead to enhancements in quality or cost efficiency. To that end, Statistics Canada has developed and implemented a responsive collection design (RCD) strategy for computer-assisted telephone interview (CATI) household surveys to maximize quality and efficiency and to potentially reduce costs. RCD is an adaptive approach to survey data collection that uses information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. In practice, the survey managers monitor and analyze collection progress against a predetermined set of indicators for two purposes: to identify critical data-collection milestones that require significant changes to the collection approach and to adjust collection strategies to make the most efficient use of remaining available resources. In the RCD context, numerous considerations come into play when determining which aspects of data collection to adjust and how to adjust them. Paradata sources play a key role in the planning, development and implementation of active management for RCD surveys. Since 2009, Statistics Canada has conducted several RCD surveys. This paper describes Statistics Canada’s experiences in implementing and monitoring this type of survey.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014731
    Description:

    Our study describes various factors that are of concern when evaluating disclosure risk of contextualized microdata and some of the empirical steps that are involved in their assessment. Utilizing synthetic sets of survey respondents, we illustrate how different postulates shape the assessment of risk when considering: (1) estimated probabilities that unidentified geographic areas are represented within a survey; (2) the number of people in the population who share the same personal and contextual identifiers as a respondent; and (3) the anticipated amount of coverage error in census population counts and extant files that provide identifying information (like names and addresses).
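
    For point (2), one simple empirical check is to count, for each respondent, how many people in a reference population file share the same combination of identifying attributes; smaller counts imply higher re-identification risk. The Python sketch below does this for hypothetical attribute names.

        from collections import Counter

        def equivalence_class_sizes(population, keys=("age_group", "sex", "geo_area")):
            """Count how many population records share each combination of the
            quasi-identifying attributes in `keys` (attribute names are invented)."""
            return Counter(tuple(rec[k] for k in keys) for rec in population)

        def reidentification_risk(respondent, class_sizes, keys=("age_group", "sex", "geo_area")):
            """1/k for an equivalence class of size k; a class of size 1 means the
            respondent is population-unique.  Returns None if the combination does
            not appear in the reference file (e.g., because of coverage error)."""
            size = class_sizes.get(tuple(respondent[k] for k in keys), 0)
            return 1.0 / size if size else None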

    Release date: 2016-03-24

  • Technical products: 11-522-X201300014260
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) produces monthly estimates and determines the month-to-month changes for variables such as employment, earnings and hours at detailed industrial levels for Canada, the provinces and territories. In order to improve the efficiency of collection activities for this survey, an electronic questionnaire (EQ) was introduced in the fall of 2012. Given the timeframe allowed for this transition as well as the production calendar of the survey, a conversion strategy was developed for the integration of this new mode. The goal of the strategy was to ensure that the collection environment adapted well to the new mode and to allow the implementation of an analysis plan for evaluating the impact of this change on the results of the survey. This paper will give an overview of the conversion strategy, the different adjustments that were made during the transition period and the results of various evaluations that were conducted. For example, the impact of the integration of the EQ on the collection process, the response rate and the follow-up rate will be presented. In addition, the effect that this new collection mode has on the survey estimates will also be discussed. More specifically, the results of a randomized experiment that was conducted in order to determine the presence of a mode effect will be presented.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014270
    Description:

    There is a wide range of character-string comparators in the record-linkage field. Comparison problems arise when factors affect the composition of the strings (for example, the use of a nickname instead of a given name, or typographical errors). In these cases, more sophisticated comparators must be used. Such tools help to reduce the number of potentially missed links. Unfortunately, some of the gains may be false links. In order to improve the matches, three sophisticated string comparators were developed; they are described in this paper. They are the Lachance comparator and its derivatives: the multi-word comparator and the multi-type comparator. This set of tools is currently available in a deterministic record-linkage prototype known as MixMatch. This application can use prior knowledge to reduce the volume of false links generated during matching. This paper also proposes a link-strength indicator.
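
    As a rough illustration of the idea behind a multi-word comparator (not the Lachance comparator or its actual derivatives), the Python sketch below pairs each word of the shorter name with its best-matching word in the other name and averages a normalized similarity, so that word order and extra words matter less; every detail here is an assumption made for illustration.

        from difflib import SequenceMatcher

        def word_similarity(a, b):
            """Similarity of two single words, in [0, 1]."""
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def multi_word_similarity(name1, name2):
            """Average, over the words of the shorter name, of the best similarity
            achieved against any word of the longer name."""
            words1, words2 = name1.split(), name2.split()
            if len(words1) > len(words2):
                words1, words2 = words2, words1
            if not words1:
                return 0.0
            return sum(max(word_similarity(a, b) for b in words2) for a in words1) / len(words1)

        # e.g. multi_word_similarity("John Paul Smith", "Smith John") is high despite
        # the different word order and the missing middle name.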

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014285
    Description:

    The 2011 National Household Survey (NHS) is a voluntary survey that replaced the traditional mandatory long-form questionnaire of the Canadian census of population. The NHS sampled about 30% of Canadian households and achieved a design-weighted response rate of 77%. In comparison, the last census long form was sent to 20% of households and achieved a response rate of 94%. Based on the long-form data, Statistics Canada traditionally produces two public use microdata files (PUMFs): the individual PUMF and the hierarchical PUMF. Both give information on individuals, but the hierarchical PUMF provides extra information on the household and family relationships between the individuals. To produce two PUMFs, based on the NHS data, that cover the whole country evenly and that do not overlap, we applied a special sub-sampling strategy. Difficulties in the confidentiality analyses have increased because of the numerous new variables, the more detailed geographic information and the voluntary nature of the NHS. This paper describes the 2011 PUMF methodology and how it balances the requirements for more information and for low risk of disclosure.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014284
    Description:

    The decline in response rates observed by several national statistical institutes, their desire to limit response burden and the significant budget pressures they face support greater use of administrative data to produce statistical information. The administrative data sources they must consider have to be evaluated according to several aspects to determine their fitness for use. Statistics Canada recently developed a process to evaluate administrative data sources for use as inputs to the statistical information production process. This evaluation is conducted in two phases. The initial phase requires access only to the metadata associated with the administrative data considered, whereas the second phase uses a version of data that can be evaluated. This article outlines the evaluation process and tool.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014281
    Description:

    Web surveys exclude the entire non-Internet population and often have low response rates. Therefore, statistical inference based on Web survey samples requires the availability of additional information about the non-covered population, careful choice of survey methods to account for potential biases, and caution when interpreting and generalizing the results to a target population. In this paper, we focus on non-coverage bias, and explore the use of weighted estimators and hot-deck imputation estimators for bias adjustment under the ideal scenario where covariate information is obtained for a simple random sample of individuals from the non-covered population. We illustrate empirically the performance of the proposed estimators under this scenario. Possible extensions of these approaches to more realistic scenarios are discussed.
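
    A minimal sketch of the hot-deck idea in this setting, assuming invented variable names: each individual in the auxiliary sample from the non-covered population receives the study value of a randomly chosen "donor" from the covered Web respondents in the same covariate class.

        import random
        from collections import defaultdict

        def hot_deck_impute(covered, noncovered, class_var, y_var):
            """Impute y for non-covered units by drawing a random donor from covered
            units in the same imputation class.  Illustrative sketch only."""
            donors = defaultdict(list)
            for rec in covered:
                donors[rec[class_var]].append(rec[y_var])
            imputed = []
            for rec in noncovered:
                pool = donors.get(rec[class_var])
                if pool:                               # donor available in the class
                    imputed.append({**rec, y_var: random.choice(pool)})
                else:                                  # no donor: leave y missing
                    imputed.append({**rec, y_var: None})
            return imputed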

    Release date: 2014-10-31

  • Technical products: 11-522-X200800010999
    Description:

    The choice of the number of call attempts in a telephone survey is an important decision. A large number of call attempts makes the data collection costly and time-consuming, while a small number of attempts decreases the response set from which conclusions are drawn and increases the variance. The decision can also have an effect on the nonresponse bias. In this paper we study the effects of the number of call attempts on the nonresponse rate and the nonresponse bias in two surveys conducted by Statistics Sweden: the Labour Force Survey (LFS) and Household Finances (HF).

    Using paradata, we calculate the response rate as a function of the number of call attempts. To estimate the nonresponse bias we use estimates of some register variables, for which observations are available for both respondents and nonrespondents. We also calculate estimates of some real survey parameters as functions of the number of call attempts. The results indicate that it is possible to reduce the current number of call attempts without increasing the nonresponse bias.
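
    Computing the response rate as a function of the number of call attempts is straightforward once call-record paradata are available. The Python sketch below assumes a simplified case-level file in which each case carries the attempt number at which it responded (None for nonrespondents); the field name is an assumption.

        def response_rate_by_attempt(cases, max_attempts):
            """Cumulative response rate had collection stopped after k attempts,
            for k = 1, ..., max_attempts.  `cases` is a list of dicts with
            'responded_at' = attempt number of the completed interview, or None."""
            n = len(cases)
            return {
                k: sum(1 for c in cases
                       if c["responded_at"] is not None and c["responded_at"] <= k) / n
                for k in range(1, max_attempts + 1)
            }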

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010950
    Description:

    The next census will be conducted in May 2011. Being a major survey, it presents a formidable challenge for Statistics Canada and requires a great deal of time and resources. Careful planning has been done to ensure that all deadlines are met. A number of steps have been planned in the questionnaire testing process. These tests apply to both census content and the proposed communications strategy. This paper presents an overview of the strategy, with a focus on combining qualitative studies with the 2008 quantitative study so that the results can be analyzed and the proposals properly evaluated.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010953
    Description:

    As survey researchers attempt to maintain traditionally high response rates, reluctant respondents have resulted in increasing data collection costs. This respondent reluctance may be related to the amount of time it takes to complete an interview in large-scale, multi-purpose surveys, such as the National Survey of Recent College Graduates (NSRCG). Recognizing that respondent burden or questionnaire length may contribute to lower response rates, in 2003, following several months of data collection under the standard data collection protocol, the NSRCG offered its nonrespondents monetary incentives about two months before the end of data collection. In conjunction with the incentive offer, the NSRCG also offered persistent nonrespondents an opportunity to complete a much-abbreviated interview consisting of a few critical items. The late respondents who completed the interviews as a result of the incentive and critical-items-only questionnaire offers may provide some insight into the issue of nonresponse bias and the likelihood that such interviewees would have remained survey nonrespondents if these refusal conversion efforts had not been made.

    In this paper, we define "reluctant respondents" as those who responded to the survey only after extra efforts were made beyond the ones initially planned in the standard data collection protocol. Specifically, reluctant respondents in the 2003 NSRCG are those who responded to the regular or shortened questionnaire following the incentive offer. Our conjecture was that the behavior of the reluctant respondents would be more like that of nonrespondents than of respondents to the surveys. This paper describes an investigation of reluctant respondents and the extent to which they are different from regular respondents. We compare different response groups on several key survey estimates. This comparison will expand our understanding of nonresponse bias in the NSRCG, and of the characteristics of nonrespondents themselves, thus providing a basis for changes in the NSRCG weighting system or estimation procedures in the future.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010986
    Description:

    Major changes were made to the data collection process for the 2006 Census. One of those changes was the Internet response option, which was offered to all private households in Canada. Nearly one in five households chose to complete and return the questionnaire on-line. In addition, a new method of promoting Internet response was tested via the Internet Response Promotion (IRP) Study. The new approach proved very effective at increasing the on-line response rate. Planning for the 2011 Census, which is under way, calls for the use of a wave collection strategy, and wave 1 would be the IRP method. This paper provides an overview of Internet data collection in the 2006 Census - evaluations, results, lessons learned - and the methodology that will be used in the next census in 2011.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010997
    Description:

    Over the past few years, Statistics Canada has conducted several analytical studies using paradata to learn more about various issues surrounding the data collection process and practices. In particular, these investigations have attempted to better understand how data collection progresses through its cycle, to identify strategic opportunities, to evaluate new collection initiatives and to improve the way the agency conducts and manages its surveys. The main objective of this paper is to present the key results of these past and ongoing investigations, describing Statistics Canada's experience with regard to paradata. Future research plans that focus on identifying viable operational strategies that could improve efficiency or data quality are also discussed.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010996
    Description:

    In recent years, the use of paradata has become increasingly important to the management of collection activities at Statistics Canada. Particular attention has been paid to social surveys conducted over the phone, like the Survey of Labour and Income Dynamics (SLID). For recent SLID data collections, the number of call attempts was capped at 40 calls. Investigations of the SLID Blaise Transaction History (BTH) files were undertaken to assess the impact of the cap on calls. The first study was intended to inform decisions on capping call attempts, while the second focused on the nature of nonresponse given the limit of 40 attempts.

    The use of paradata as auxiliary information for studying and accounting for survey nonresponse was also examined. Nonresponse adjustment models using different paradata variables gathered at the collection stage were compared to the current models based on available auxiliary information from the Labour Force Survey.
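
    A common way to use such paradata is to fit a response-propensity model and inflate the design weights of respondents by the inverse of the estimated propensity. The sketch below, a logistic regression on hypothetical paradata variables (for example, the number of call attempts or the time of first contact), is one generic version of this approach, not the SLID adjustment models themselves.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def propensity_adjusted_weights(X, responded, design_weights):
            """Fit P(response | paradata X) and return nonresponse-adjusted weights
            for respondents: w_adj = w_design / p_hat (zero for nonrespondents).
            X: (n, p) array of paradata variables; responded: 0/1 array."""
            model = LogisticRegression(max_iter=1000).fit(X, responded)
            p_hat = model.predict_proba(X)[:, 1]
            return np.where(responded == 1, design_weights / p_hat, 0.0), p_hat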

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010958
    Description:

    Telephone Data Entry (TDE) is a system by which survey respondents can return their data to the Office for National Statistics (ONS) using the keypad on their telephone; it currently accounts for approximately 12% of total responses to ONS business surveys. ONS is currently increasing the number of surveys that use TDE as the primary mode of response, and this paper gives an overview of the redevelopment project, covering the redevelopment of the paper questionnaire, the enhancements made to the TDE system and the results from piloting these changes. The improvements to the quality of the data received and the increased response via TDE resulting from these developments suggest that data quality improvements and cost savings are possible when TDE is promoted as the primary mode of response to short-term surveys.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011013
    Description:

    Collecting data using audio recordings for interviewing can be an effective and versatile data collection tool. These recordings, however, can lead to large files which are cumbersome to manage. Technological developments, including better audio software development tools and the increased adoption of broadband connections, have eased the burden of collecting audio data. This paper focuses on technologies and techniques used to record and manage audio-recorded surveys using laptops, telephones and internet connections. The process outlined involves devices connecting directly to the phone receiver which stream conversations directly to the laptop for storage and transmission.

    Release date: 2009-12-03

  • Technical products: 11-536-X200900110804
    Description:

    This paper deals with calibration estimation for surveys with nonresponse. Efficient weighting adjustment for unit nonresponse requires powerful auxiliary information. The weights in the calibration estimator are computed from information on a specified auxiliary vector. Even with the "best possible" auxiliary vector, some bias remains in the estimator. An indicator of the remaining bias is presented and analyzed.

    The many potential auxiliary variables allow the statistician to compose a wide variety of possible auxiliary vectors. The need arises to compare these vectors to assess their effectiveness for bias reduction. To this end we examine an indicator useful for ranking alternative auxiliary vectors in regard to their ability to reduce the bias. The indicator is computed on the auxiliary vector values for the sampled units, responding and nonresponding. An advantage is its independence of the study variables, of which there are many in a large survey.

    The properties of the indicator are examined in empirical studies. A synthetic population is constructed and potential auxiliary vectors are ranked with the aid of the indicator. A further empirical example shows how the indicator is used to select auxiliary variables in a large survey at Statistics Sweden.
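
    For reference, a minimal linear-calibration sketch in Python: the respondent design weights are adjusted so that the weighted auxiliary totals reproduce the known population totals of whatever auxiliary vector is being considered. This is the standard linear (GREG-type) calibration method written generically, not the authors' implementation.

        import numpy as np

        def linear_calibration_weights(d, X, tx):
            """d: design weights of respondents (n,); X: auxiliary matrix (n, p);
            tx: known population totals of the auxiliary variables (p,).
            Returns calibrated weights w satisfying X.T @ w == tx."""
            d = np.asarray(d, dtype=float)
            X = np.asarray(X, dtype=float)
            tx = np.asarray(tx, dtype=float)
            M = X.T @ (d[:, None] * X)               # sum_k d_k x_k x_k'
            lam = np.linalg.solve(M, tx - X.T @ d)   # solve the calibration equations
            return d * (1.0 + X @ lam)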

    Release date: 2009-08-11

  • Technical products: 11-536-X200900110803
    Description:

    "Classical GREG estimator" is used here to refer to the generalized regression estimator extensively discussed for example in Särndal, Swensson and Wretman (1992). This paper summarize some recent extensions of the classical GREG estimator when applied to the estimation of totals for population subgroups or domains. GREG estimation was introduced for domain estimation in Särndal (1981, 1984), Hidiroglou and Särndal (1985) and Särndal and Hidiroglou (1989), and was developed further in Estevao, Hidiroglou and Särndal (1995). For the classical GREG estimator, fixed-effects linear model serves as the underlying working or assisting model, and aggregate-level auxiliary totals are incorporated in the estimation procedure. In some recent developments, an access to unit-level auxiliary data is assumed for GREG estimation for domains. Obviously, an access to micro-merged register and survey data involves much flexibility for domain estimation. This view has been adopted for GREG estimation for example in Lehtonen and Veijanen (1998), Lehtonen, Särndal and Veijanen (2003, 2005), and Lehtonen, Myrskylä, Särndal and Veijanen (2007). These extensions cover the cases of continuous and binary or polytomous response variables, use of generalized linear mixed models as assisting models, and unequal probability sampling designs. Relative merits and challenges of the various GREG estimators will be discussed.

    Release date: 2009-08-11

  • Technical products: 11-522-X200600110420
    Description:

    Most major survey research organizations in the United States and Canada do not include wireless telephone numbers when conducting random-digit-dialed (RDD) household telephone surveys. In this paper, we offer the most up-to-date estimates available from the U.S. National Center for Health Statistics and Statistics Canada concerning the prevalence and demographic characteristics of the wireless-only population. We then present data from the U.S. National Health Interview Survey on the health and health care access of wireless-only adults, and we examine the potential for coverage bias when health research is conducted using RDD surveys that exclude wireless telephone numbers.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110434
    Description:

    Protecting respondents from disclosure of their identity in publicly released survey data is of practical concern to many government agencies. Methods for doing so include suppression of cluster and stratum identifiers and altering or swapping record values between respondents. Unfortunately, stratum and cluster identifiers are usually needed for variance estimation using linearization and for replication methods, as resampling is typically done on first-stage sampling units within strata. One might expect that releasing a set of replicate weights with stratum and cluster identifiers suppressed would circumvent this problem to some extent, especially when a random resampling method such as the bootstrap is used. In this article, we first demonstrate that, by viewing the replicate weights as observations in a high-dimensional space, one can easily use clustering algorithms to reconstruct the cluster identifiers irrespective of the resampling method, even if the resampling weights are randomly altered. We then propose a fast algorithm for swapping cluster and stratum identifiers of ultimate units before creating replicate weights, without significantly affecting the resulting variance estimates of characteristics of interest. The methods are illustrated by application to publicly released data from the National Health and Nutrition Examination Surveys, where such disclosure issues are extremely important.
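
    The reconstruction attack described above is simple to mimic: treat each unit's vector of released replicate weights as a point in a high-dimensional space and cluster the points, since units resampled together share a perturbation pattern. The Python sketch below uses k-means purely for illustration; the article's own choices may differ.

        import numpy as np
        from sklearn.cluster import KMeans

        def recover_cluster_labels(replicate_weights, n_clusters):
            """replicate_weights: (n_units, n_replicates) array of released
            replicate weights.  Scaling each row by its mean isolates the
            perturbation pattern, which k-means then groups.  Sketch only."""
            W = np.asarray(replicate_weights, dtype=float)
            pattern = W / W.mean(axis=1, keepdims=True)
            return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pattern)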

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110393
    Description:

    In this paper we derive a second-order unbiased (or nearly unbiased) mean squared prediction error (MSPE) estimator of the empirical best linear unbiased predictor (EBLUP) of a small area total for a non-normal extension to the well-known Fay-Herriot model. Specifically, we derive our MSPE estimator essentially assuming certain moment conditions on both the sampling and random effects distributions. The normality-based Prasad-Rao MSPE estimator has a surprising robustness property in that it remains second-order unbiased under non-normality of the random effects when a simple method-of-moments estimator is used for the variance component and the sampling error distribution is normal. We show that the normality-based MSPE estimator is no longer second-order unbiased when the sampling error distribution is non-normal or when the Fay-Herriot moment method is used to estimate the variance component, even when the sampling error distribution is normal. It is interesting to note that when the simple method-of-moments estimator is used for the variance component, our proposed MSPE estimator does not require estimation of the kurtosis of the random effects. Results of a simulation study on the accuracy of the proposed MSPE estimator, under non-normality of both the sampling and random effects distributions, are also presented.
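
    For context, the Fay-Herriot area-level model referred to above is usually written as follows (standard notation; the non-normal extension studied in the paper relaxes the distributional assumptions rather than this structure):

        \[
        \hat{\theta}_i = \theta_i + e_i, \qquad
        \theta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i, \qquad i = 1, \dots, m,
        \]

    with sampling errors \(e_i\) having mean zero and known variances \(D_i\), and area effects \(v_i\) having mean zero and variance \(A\). The BLUP of \(\theta_i\) is the weighted combination \(\gamma_i \hat{\theta}_i + (1 - \gamma_i)\,\mathbf{x}_i^{\top}\tilde{\boldsymbol{\beta}}\) with \(\gamma_i = A/(A + D_i)\); the EBLUP replaces \(A\) by an estimate such as the method-of-moments estimator mentioned above.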

    Release date: 2008-03-17

  • Technical products: 11-522-X20050019464
    Description:

    The Quarterly Services Survey has maintained comprehensive response data since the survey's inception. In analyzing the data, we concentrate on three fundamental features of response: rate, timeliness, and quality. We examine these three components across multiple dimensions. We observe the effects associated with NAICS classification, company size and response mode.

    Release date: 2007-03-02
