Statistics by subject – Statistical methods


All (153)

  • Articles and reports: 12-001-X201700254895
    Description:

    This note by Graham Kalton discusses the paper “Sample survey theory and methods: Past, present, and future directions,” in which J.N.K. Rao and Wayne A. Fuller share their views on developments in sample survey theory and methods over the past 100 years.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201700254872
    Description:

    This note discusses the theoretical foundations for extending the Wilson two-sided coverage interval to a proportion estimated from complex survey data. The interval is shown to be asymptotically equivalent to an interval derived from a logistic transformation. A mildly better version is discussed, but users may prefer an existing one-sided interval from the literature. (A sketch of the underlying interval follows this entry.)

    Release date: 2017-12-21
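
    For context, a minimal sketch of the two-sided Wilson interval, with the effective-sample-size substitution often used for complex surveys; the substitution is an illustrative assumption here, not necessarily the paper's exact construction. With estimated proportion \hat{p}, design-based variance estimate \widehat{\mathrm{Var}}(\hat{p}), and normal quantile z = z_{1-\alpha/2}:

        \tilde{n} = \frac{\hat{p}(1-\hat{p})}{\widehat{\mathrm{Var}}(\hat{p})}, \qquad
        \frac{\hat{p} + \frac{z^2}{2\tilde{n}} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{\tilde{n}} + \frac{z^2}{4\tilde{n}^2}}}{1 + z^2/\tilde{n}}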

  • Articles and reports: 12-001-X201700114822
    Description:

    We use a Bayesian method to make inferences about a finite population proportion when binary data are collected from small areas under a two-fold sample design, which nests a two-stage cluster sample design within each area. An earlier hierarchical Bayesian model assumes that, for each area, the first-stage binary responses are independent Bernoulli variables whose probabilities follow beta distributions parameterized by a mean and a correlation coefficient; the means vary across areas but the correlation is the same for all areas. To gain flexibility, we extend this model to accommodate different correlations, with the means and the correlations having independent beta distributions. We call the earlier model homogeneous and the new model heterogeneous. All hyperparameters have proper noninformative priors. An additional complexity is that some of the parameters are weakly identified, making it difficult to use a standard Gibbs sampler, so we impose unimodal constraints on the beta prior distributions and use a blocked Gibbs sampler for computation. We compare the heterogeneous and homogeneous models using an illustrative example and a simulation study. As expected, the two-fold model with heterogeneous correlations is preferred. (The core beta-Bernoulli layer is sketched after this entry.)

    Release date: 2017-06-22
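
    A hedged sketch of the beta-Bernoulli layer the abstract describes, using the common mean-correlation parameterization of the beta distribution (the paper's exact parameterization may differ). For unit k in cluster j of area i:

        y_{ijk} \mid p_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \qquad
        p_{ij} \mid \mu_i, \rho_i \sim \mathrm{Beta}\!\left(\mu_i\,\frac{1-\rho_i}{\rho_i},\; (1-\mu_i)\,\frac{1-\rho_i}{\rho_i}\right)

    The homogeneous model sets \rho_i = \rho for all areas; the heterogeneous model lets \rho_i vary, with independent beta priors on the means and correlations.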

  • Articles and reports: 12-001-X201700114817
    Description:

    We present research results on sample allocation for efficient model-based small area estimation when the areas of interest coincide with the strata. Although model-assisted and model-based estimation methods are common in the production of small area statistics, the underlying model and estimation method are rarely taken into account in the sample allocation scheme. We therefore develop a new model-based allocation, named g1-allocation, and present one recently developed model-assisted allocation for comparison. These two allocations are based on an adjusted measure of homogeneity, computed from an auxiliary variable, which approximates the intra-class correlation within areas. Five model-free area allocations from the literature serve as reference allocations: equal and proportional allocations need only the number of areas and area-specific numbers of basic statistical units, while the Neyman, Bankier and NLP (Non-Linear Programming) allocations need area-level parameter values for the study variable, such as standard deviations, coefficients of variation or totals (the classical Neyman allocation is sketched after this entry). In general, allocation methods can be classified by their optimization criteria and use of auxiliary data. The statistical properties of the various methods are assessed through simulation experiments using real population register data. The simulation results indicate that including the model and estimation method in the allocation improves estimation results.

    Release date: 2017-06-22
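
    For reference, the classical Neyman allocation listed among the model-free baselines assigns, for overall sample size n, stratum sizes N_h and stratum standard deviations S_h:

        n_h = n \cdot \frac{N_h S_h}{\sum_{g} N_g S_g}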

  • Articles and reports: 12-001-X201600214676
    Description:

    Winsorization procedures replace extreme values with less extreme values, effectively moving the original extreme values toward the center of the distribution; Winsorization therefore both detects and treats influential values. Mulry, Oliver and Kaputa (2014) compare the performance of the one-sided Winsorization method developed by Clark (1995), and described by Chambers, Kokic, Smith and Cruddas (2000), with that of M-estimation (Beaumont and Alavi 2004) in highly skewed business population data. One aspect of particular interest for methods that detect and treat influential values is the range of values designated as influential, called the detection region. The Clark Winsorization algorithm is easy to implement and can be extremely effective; however, the resulting detection region depends heavily on the number of influential values in the sample, especially when the survey totals are expected to vary greatly by collection period. In this note, we examine the effect of the number and magnitude of influential values on the detection regions from Clark Winsorization, using data simulated to realistically reflect the properties of the population for the Monthly Retail Trade Survey (MRTS) conducted by the U.S. Census Bureau. Estimates from the MRTS and other economic surveys feed economic indicators such as Gross Domestic Product (GDP). (A minimal weighted Winsorization sketch follows this entry.)

    Release date: 2016-12-20
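
    A minimal Python sketch of weighted one-sided Winsorization, in the form often used in business surveys where a value above the cutoff keeps only a weight-share of its excess. The rule and function name are illustrative assumptions, not Clark's exact algorithm:

        import numpy as np

        def winsorize_one_sided(y, w, cutoff):
            """Pull values above `cutoff` back to cutoff + (y - cutoff) / w.
            An illustrative weighted rule, not necessarily Clark (1995)."""
            y = np.asarray(y, dtype=float)
            w = np.asarray(w, dtype=float)
            return np.where(y > cutoff, cutoff + (y - cutoff) / w, y)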

  • Articles and reports: 12-001-X201600114539
    Description:

    Statistical matching is a technique for integrating two or more data sets when the information available for matching individual records across data sets is incomplete. It can be viewed as a missing data problem in which a researcher wants to perform a joint analysis of variables that are never jointly observed, and a conditional independence assumption is often used to create imputed data for it. We consider a general approach to statistical matching using the parametric fractional imputation of Kim (2011) to create imputed data under the assumption that the specified model is fully identified; the proposed method does not have a convergent EM sequence if the model is not identified. We also present variance estimators appropriate for the imputation procedure, and explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models.

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201600114545
    Description:

    The estimation of quantiles is an important topic not only in the regression framework but also in sampling theory. Expectiles are a natural alternative or complement to quantiles. As a generalization of the mean, expectiles have become popular in recent years because they give a more detailed picture of the data than the ordinary mean and, through their close relationship with quantiles, can serve as a basis for calculating them. We show how to estimate expectiles under sampling with unequal probabilities and how expectiles can be used to estimate the distribution function; the resulting fitted distribution function estimator can be inverted to yield quantile estimates. A simulation study investigates and compares the efficiency of the expectile-based estimator. (The defining loss function is sketched after this entry.)

    Release date: 2016-06-22
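
    For reference, the \tau-expectile of a variable Y is the minimizer of an asymmetrically weighted squared loss (\tau = 0.5 recovers the mean); a design-weighted sample version replaces the expectation with a sum weighted by inverse inclusion probabilities:

        \mu_\tau = \arg\min_m \, \mathrm{E}\left[\,\left|\tau - \mathbf{1}\{Y < m\}\right|\,(Y - m)^2\,\right]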

  • Articles and reports: 12-001-X201600114543
    Description:

    The regression estimator is used extensively in practice because it can improve the reliability of estimated parameters of interest such as means or totals. It uses control totals of variables, known at the population level, that are included in the regression setup. In this paper, we investigate the properties of the regression estimator that uses control totals estimated from the sample as well as those known at the population level. This estimator is compared, both theoretically and via a simulation study, with regression estimators that use only the known totals. (The classical form is sketched after this entry.)

    Release date: 2016-06-22
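
    The classical regression (GREG) estimator of a total with known control totals T_x has the form below; the paper's variant replaces T_x with a sample-based estimate:

        \hat{Y}_{\mathrm{reg}} = \hat{Y}_{\mathrm{HT}} + \left(T_x - \hat{T}_{x,\mathrm{HT}}\right)^{\top}\hat{B}

    where \hat{Y}_{\mathrm{HT}} and \hat{T}_{x,\mathrm{HT}} are Horvitz-Thompson estimators and \hat{B} is the estimated vector of regression coefficients.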

  • Technical products: 11-522-X201700014758
    Description:

    "Several Canadian jurisdictions including Ontario are using patient-based healthcare data in their funding models. These initiatives can influence the quality of this data both positively and negatively as people tend to pay more attention to the data and its quality when financial decisions are based upon it. Ontario’s funding formula uses data from several national databases housed at the Canadian Institute for Health Information (CIHI). These databases provide information on patient activity and clinical status across the continuum of care. As funding models may influence coding behaviour, CIHI is collaborating with the Ontario Ministry of Health and Long-Term Care to assess and monitor the quality of this data. CIHI is using data mining software and modelling techniques (that are often associated with “big data”) to identify data anomalies across multiple factors. The models identify what the “typical” clinical coding patterns are for key patient groups (for example, patients seen in special care units or discharged to home care), so that outliers can be identified, where patients do not fit the expected pattern. A key component of the modelling is segmenting the data based on patient, provider and hospital characteristics to take into account key differences in the delivery of health care and patient populations across the province. CIHI’s analysis identified several hospitals with coding practices that appear to be changing or significantly different from their peer group. Further investigation is required to understand why these differences exist and to develop appropriate strategies to mitigate variations. "

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014744
    Description:

    "This presentation will begin with Dr. West providing a summary of research that has been conducted on the quality and utility of paradata collected as part of the United States National Survey of Family Growth (NSFG). The NSFG is the major national fertility survey in the U.S., and an important source of data on sexual activity, sexual behavior, and reproductive health for policy makers. For many years, the NSFG has been collecting various forms of paradata, including keystroke information (e.g., Couper and Kreuter 2013), call record information, detailed case disposition information, and interviewer observations related to key NSFG measures (e.g., West 2013). Dr. West will discuss some of the challenges of working with these data, in addition to evidence of their utility for nonresponse adjustment, interviewer evaluation, and/or responsive survey design purposes. Dr. Kreuter will then present research done using paradata collected as part of two panel surveys: the Medical Expenditure Panel Survey (MEPS) in the United States, and the Panel Labour Market and Social Security (PASS) in Germany. In both surveys, information from contacts in prior waves were experimentally used to improve contact and response rates in subsequent waves. In addition, research from PASS will be presented where interviewer observations on key outcome variables were collected to be used in nonresponse adjustment or responsive survey design decisions. Dr. Kreuter will not only present the research results but also the practical challenges in implementing the collection and use of both sets of paradata. "

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014739
    Description:

    Vital statistics datasets such as the Canadian Mortality Database lack identifiers for certain populations of interest, such as First Nations, Métis and Inuit. Record linkage between vital statistics and survey or other administrative datasets can circumvent this limitation. This paper describes a linkage between the Canadian Mortality Database and the 2006 Census of Population, and the analysis planned for the linked data.

    Release date: 2016-03-24

  • Articles and reports: 82-003-X201600114307
    Description:

    Using the 2012 Aboriginal Peoples Survey, this study examined the psychometric properties of the 10-item Kessler Psychological Distress Scale (a short measure of non-specific psychological distress) for First Nations people living off reserve, Métis, and Inuit aged 15 or older.

    Release date: 2016-01-20

  • Articles and reports: 82-003-X201600114306
    Description:

    This article is an overview of the creation, content, and quality of the 2006 Canadian Birth-Census Cohort Database.

    Release date: 2016-01-20

  • Articles and reports: 12-001-X201500214231
    Description:

    Rotating panels are widely used by national statistical institutes, for example, to produce official statistics about the labour force. Estimation procedures are generally based on traditional design-based procedures known from classical sampling theory. A major drawback of this class of estimators is that small sample sizes result in large standard errors and that the estimators are not robust against measurement bias. Two examples of measurement bias effects are rotation group bias in rotating panels and systematic differences in survey outcomes due to a major redesign of the underlying process. In this paper we apply a multivariate structural time series model to the Dutch Labour Force Survey to produce model-based monthly figures about the labour force. The model reduces the standard errors of the estimates by taking advantage of sample information collected in previous periods, accounts for rotation group bias and the autocorrelation induced by the rotating panel, and models discontinuities due to a survey redesign. We also discuss the use of correlated auxiliary series in the model to further improve the accuracy of the model estimates. The method is applied by Statistics Netherlands to produce accurate official monthly statistics about the labour force that are consistent over time, despite a redesign of the survey process.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114192
    Description:

    We are concerned with optimal linear estimation of means on subsequent occasions under sample rotation, where the evolution of samples over time follows a cascade pattern. It has been known since the seminal paper of Patterson (1950) that when units are not allowed to return to the sample after leaving it for a certain period (no gaps in the rotation pattern), a one-step recursion holds for the optimal estimator. However, in some important real surveys, e.g., the Current Population Survey in the US or the Labour Force Survey in many European countries, units return to the sample after being absent for several occasions (there are gaps in the rotation pattern). In such situations, the question of the form of the recurrence for the optimal estimator becomes drastically more difficult and had not been resolved; instead, alternative sub-optimal approaches were developed, such as K-composite estimation (see, e.g., Hansen, Hurwitz, Nisselson and Steinberg 1955), AK-composite estimation (see, e.g., Gurney and Daly 1965) and the time series approach (see, e.g., Binder and Hidiroglou 1988).

    In the present paper we overcome this long-standing difficulty: we present analytical recursion formulas for the optimal linear estimator of the mean for schemes with gaps in the rotation pattern. This is achieved under two technical conditions, Assumption I and Assumption II (numerical experiments suggest that these assumptions might be universally satisfied). To attain the goal we develop an algebraic operator approach that reduces the recursion problem for the optimal linear estimator to two issues: (1) localization of the roots (possibly complex) of a polynomial Q_p defined in terms of the rotation pattern (Q_p happens to be conveniently expressed through Chebyshev polynomials of the first kind), and (2) the rank of a matrix S defined in terms of the rotation pattern and the roots of Q_p. In particular, it is shown that the order of the recursion equals one plus the size of the largest gap in the rotation pattern. Exact formulas for the recurrence coefficients are given; to use them, one has to check (in many cases, numerically) that Assumptions I and II are satisfied. The solution is illustrated through several examples of rotation schemes arising in real surveys.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114172
    Description:

    When a random sample drawn from a complete list frame suffers from unit nonresponse, calibration weighting to population totals can be used to remove nonresponse bias under either an assumed response (selection) model or an assumed prediction (outcome) model. Calibration weighting in this way can not only provide double protection against nonresponse bias, it can also decrease variance. By employing a simple trick, one can simultaneously estimate the variance under the assumed prediction model and the mean squared error under the combination of an assumed response model and the probability-sampling mechanism. Unfortunately, there is a practical limitation on what response model can be assumed when design weights are calibrated to population totals in a single step; in particular, the response function cannot always be logistic. That limitation does not hinder calibration weighting performed in two steps: from the respondent sample to the full sample to remove the response bias, and then from the full sample to the population to decrease variance. There are potential efficiency advantages to the two-step approach even when the calibration variables employed in each step are a subset of the calibration variables in the single step. Simultaneous mean-squared-error estimation using linearization is possible, but more complicated than when calibrating in a single step. (A minimal single-step calibration sketch follows this entry.)

    Release date: 2015-06-29
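
    A minimal Python sketch of one calibration step under the chi-square (linear) distance, which yields GREG-type weights; this is a generic single-step illustration under stated assumptions, not the paper's two-step procedure:

        import numpy as np

        def linear_calibration(d, X, totals):
            """Adjust design weights d so the weighted column totals of X
            match `totals`, using w_i = d_i * (1 + x_i' lambda)."""
            d = np.asarray(d, dtype=float)
            X = np.asarray(X, dtype=float)
            totals = np.asarray(totals, dtype=float)
            lam = np.linalg.solve(X.T @ (d[:, None] * X), totals - X.T @ d)
            return d * (1.0 + X @ lam)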

  • Articles and reports: 12-001-X201500114150
    Description:

    An area-level model approach to combining information from several sources is considered in the context of small area estimation. At each small area, several estimates are computed and linked through a system of structural error models. The best linear unbiased predictor of the small area parameter can be computed by the general least squares method. Parameters in the structural error models are estimated using the theory of measurement error models. Estimation of mean squared errors is also discussed. The proposed method is applied to the real problem of labor force surveys in Korea.

    Release date: 2015-06-29

  • Articles and reports: 82-003-X201500614196
    Description:

    This study investigates the feasibility and validity of using personal health insurance numbers to deterministically link the Canadian Cancer Registry (CCR) and the Discharge Abstract Database to obtain hospitalization information about people with primary cancers.

    Release date: 2015-06-17

  • Technical products: 12-002-X201500114147
    Description:

    Influential observations in logistic regression are those that have a notable effect on certain aspects of the model fit. Large sample size alone does not eliminate this concern; it is still important to examine potentially influential observations, especially in complex survey data. This paper describes a straightforward algorithm for examining potentially influential observations in complex survey data using SAS software. This algorithm was applied in a study using the 2005 Canadian Community Health Survey that examined factors associated with family physician utilization for adolescents.

    Release date: 2015-03-25

  • Articles and reports: 12-001-X201400214097
    Description:

    When monthly business surveys do not completely overlap, there are two different estimators for the monthly growth rate of turnover: (i) one based on the monthly estimated population totals and (ii) one based purely on enterprises observed on both occasions in the overlap of the corresponding surveys. The resulting estimates and variances might be quite different. This paper proposes an optimal composite estimator for the growth rate as well as for the population totals.

    Release date: 2014-12-19

  • Articles and reports: 12-001-X201400214091
    Description:

    Parametric fractional imputation (PFI), proposed by Kim (2011), is a tool for general-purpose parameter estimation under missing data. We propose fractional hot deck imputation (FHDI), which is more robust than PFI or multiple imputation. In the proposed method, the imputed values are chosen from the set of respondents and assigned proper fractional weights, which are then adjusted to meet certain calibration conditions; this makes the resulting FHDI estimator efficient. Two simulation studies compare the proposed method with existing methods. (A minimal donor-weighting sketch follows this entry.)

    Release date: 2014-12-19
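
    A minimal Python sketch of the donor-weighting idea: each missing value receives several donated values, each carrying a fraction of the recipient's weight. Equal fractions and simple random donor draws are illustrative assumptions; the paper further adjusts the fractional weights to meet calibration conditions:

        import numpy as np

        def fractional_hot_deck(y, w, n_donors=5, seed=0):
            """Return (value, weight) pairs; each missing y is replaced by
            n_donors donated values, each carrying weight w / n_donors."""
            rng = np.random.default_rng(seed)
            y = np.asarray(y, dtype=float)
            donors = y[~np.isnan(y)]
            out = []
            for yi, wi in zip(y, w):
                if np.isnan(yi):
                    for dv in rng.choice(donors, size=n_donors, replace=False):
                        out.append((dv, wi / n_donors))
                else:
                    out.append((yi, wi))
            return out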

  • Articles and reports: 12-001-X201400214119
    Description:

    When stratifying a sample by several variables, we often face the case where the expected number of sample units to be selected in each stratum is very small and the total number of units to be selected is smaller than the total number of strata. These stratified sample designs are represented by tabular arrays with real numbers, called controlled selection problems, and are beyond the reach of conventional allocation methods. Many algorithms for solving these problems have been studied over about 60 years, beginning with Goodman and Kish (1950). Those developed more recently are especially computer intensive and always find solutions. However, an unanswered question remains: in what sense are the solutions to a controlled selection problem obtained from those algorithms optimal? We introduce a general concept of optimal solutions and propose a new controlled selection algorithm, based on typical distance functions, to achieve them. The algorithm can be run easily with new SAS-based software. This study focuses on two-way stratification designs. The controlled selection solutions from the new algorithm are compared with those from existing algorithms using several examples. The new algorithm successfully obtains robust solutions to two-way controlled selection problems that meet the optimality criteria.

    Release date: 2014-12-19

  • Technical products: 11-522-X201300014282
    Description:

    The IAB Establishment Panel is the most comprehensive establishment survey in Germany, with almost 16,000 firms participating every year. Face-to-face interviews with paper and pencil (PAPI) have been conducted since 1993. An ongoing project examines possible effects of switching the survey to computer-assisted personal interviews (CAPI) combined with a web-based version of the questionnaire (CAWI). As a first step, questions about internet access, willingness to complete the questionnaire online, and reasons for refusal were included in the 2012 wave. First results indicate widespread refusal to take part in a web survey. A closer look reveals that smaller establishments, long-time participants and older respondents are reluctant to use the internet.

    Release date: 2014-10-31

  • Technical products: 12-002-X201400111901
    Description:

    This document is for analysts and researchers who are considering doing research with data from a survey where both survey weights and bootstrap weights are provided in the data files. It gives directions, for selected software packages, on how to get started using survey weights and bootstrap weights in an analysis of survey data. For each software package in turn, we give brief directions for obtaining survey-weighted estimates, bootstrap variance estimates (and other desired error quantities) and some typical test statistics. While these directions are provided only for the chosen examples, information is included on the range of weighted and bootstrapped analyses that each software package can carry out. (A generic replicate-weight variance computation is sketched after this entry.)

    Release date: 2014-08-07
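
    A generic Python illustration of how bootstrap replicate weights are typically used: recompute the estimate under each replicate-weight set and average the squared deviations from the full-sample estimate. The exact divisor and scaling can differ by survey, so the user guide for the specific survey should be checked:

        import numpy as np

        def bootstrap_variance(estimate, y, w, replicate_weights):
            """Mean squared deviation of replicate estimates around the
            full-sample estimate; some surveys use a different scale factor."""
            theta = estimate(y, w)
            reps = np.array([estimate(y, bw) for bw in replicate_weights])
            return np.mean((reps - theta) ** 2)

        # Example estimator: a survey-weighted mean.
        def weighted_mean(y, w):
            return np.sum(w * y) / np.sum(w)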

  • Articles and reports: 12-001-X201400114030
    Description:

    The paper reports the results of a Monte Carlo simulation study conducted to compare the effectiveness of four different hierarchical Bayes small area models for producing state estimates of proportions based on data from stratified simple random samples from a fixed finite population. Two of the models adopted the commonly made assumptions that the survey-weighted proportion for each sampled small area has a normal distribution and that the sampling variance of this proportion is known; one used a linear linking model and the other a logistic linking model. The other two models both employed logistic linking models and assumed that the sampling variance was unknown; one assumed a normal distribution for the sampling model and the other a beta distribution. The study found that, for all four models, the design-based coverage of the credible intervals for the finite population state proportions deviated markedly from the 95 percent nominal level used in constructing the intervals.

    Release date: 2014-06-27

Data (0)

Analysis (107)

  • Articles and reports: 12-001-X201300211868
    Description:

    Thompson and Sigman (2000) introduced a procedure for estimating medians from highly positively skewed population data. Their procedure interpolates over data-dependent intervals (bins), and the earlier paper demonstrated that it has good statistical properties for medians computed from highly skewed samples. This research extends the previous work to decile estimation for a positively skewed population using complex survey data. We present three interpolation methods along with the traditional decile estimation method (no bins) and evaluate each empirically, using residential housing data from the Survey of Construction and a simulation study. We found that a variant of the current procedure, using the 95th percentile as a scaling factor, produces decile estimates with the best statistical properties. (A generic weighted-quantile sketch follows this entry.)

    Release date: 2014-01-15
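
    A generic Python sketch of interpolation-based quantile estimation from weighted data; it illustrates the general idea, not the paper's data-dependent bins or its 95th-percentile scaling variant:

        import numpy as np

        def weighted_quantile(y, w, q):
            """Estimate the q-th quantile (0 < q < 1) by linear interpolation
            of the weighted empirical CDF."""
            y = np.asarray(y, dtype=float)
            w = np.asarray(w, dtype=float)
            order = np.argsort(y)
            y, w = y[order], w[order]
            cdf = (np.cumsum(w) - 0.5 * w) / w.sum()
            return np.interp(q, cdf, y)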

  • Articles and reports: 12-001-X201300111828
    Description:

    A question that commonly arises in longitudinal surveys is how to combine differing cohorts of the survey. In this paper we present a novel method for combining different cohorts, using all available data, to estimate the parameters of a semiparametric model relating the response variable to a set of covariates. The procedure builds on the weighted generalized estimating equations method for handling missing waves in longitudinal studies. Our method is set up under a joint-randomization framework for estimating model parameters, which accounts for the superpopulation model as well as the survey design randomization. We also propose a design-based and a joint-randomization variance estimation method. To illustrate the methodology, we apply it to the Survey of Doctorate Recipients, conducted by the U.S. National Science Foundation.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111826
    Description:

    It is routine practice for survey organizations to provide replication weights as part of survey data files. These replication weights are meant to produce valid and efficient variance estimates for a variety of estimators in a simple and systematic manner. Most existing methods for constructing replication weights, however, are valid only for specific sampling designs and typically require a very large number of replicates. In this paper we first show how to produce replication weights based on the method outlined in Fay (1984) such that the resulting replication variance estimator is algebraically equivalent to the fully efficient linearization variance estimator for any given sampling design. We then propose a novel weight-calibration method to achieve efficiency and sparsity simultaneously, in the sense that a small number of sets of replication weights can produce valid and efficient replication variance estimators for key population parameters. Our proposed method can be used in conjunction with existing resampling techniques for large-scale complex surveys. The validity of the proposed methods and extensions to some balanced sampling designs are also discussed. Simulation results show that our proposed variance estimators perform very well in tracking coverage probabilities of confidence intervals. Our proposed strategies will likely have an impact on how public-use survey data files are produced and how these data sets are analyzed.

    Release date: 2013-06-28

  • Articles and reports: 82-003-X201300611796
    Description:

    The study assesses the feasibility of using statistical modelling techniques to fill information gaps related to risk factors, specifically smoking status, in linked long-form census data.

    Release date: 2013-06-19

  • Articles and reports: 82-003-X201300111765
    Description:

    This study describes how items collected from parents/guardians for a nationally representative sample of Aboriginal children (off reserve) as part of the 2006 Aboriginal Children's Survey could be used as language indicators.

    Release date: 2013-01-16

  • Articles and reports: 12-001-X201200211754
    Description:

    The propensity-scoring-adjustment approach is commonly used to handle selection bias in survey sampling applications, including unit nonresponse and undercoverage. The propensity score is computed using auxiliary variables observed throughout the sample. We discuss some asymptotic properties of propensity-score-adjusted estimators and derive optimal estimators based on a regression model for the finite population. An optimal propensity-score-adjusted estimator can be implemented using an augmented propensity model. Variance estimation is discussed and the results from two simulation studies are presented. (A minimal inverse-propensity weighting sketch follows this entry.)

    Release date: 2012-12-19
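
    A minimal inverse-propensity weighting sketch in Python: fit a response model on auxiliary variables observed for the whole sample, then inflate respondents' design weights by the inverse estimated response probability. The logistic model and the function names are illustrative assumptions; the paper derives optimal adjustments beyond this basic form:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def propensity_adjusted_weights(d, X, responded):
            """d: design weights; X: auxiliary variables for the full sample;
            responded: boolean response indicator. Returns adjusted weights
            for respondents."""
            d = np.asarray(d, dtype=float)
            responded = np.asarray(responded, dtype=bool)
            model = LogisticRegression().fit(X, responded)
            p = model.predict_proba(X)[:, 1]  # estimated response propensity
            return d[responded] / p[responded]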

Reference (46)

  • Technical products: 11-522-X200800010941
    Description:

    Prior to 2004, the design and development of collection functions at Statistics New Zealand (Statistics NZ) was done by a centralised team of data collection methodologists. In 2004, an organisational review considered whether the design and development of these functions was being done in the most effective way. A key issue was the rising cost of surveying as the organisation moved from paper-based to electronic data collection. The review saw some collection functions decentralised; however, a smaller centralised team of data collection methodologists was retained to work with subject matter areas across Statistics NZ.

    This paper will discuss the strategy used by the smaller centralised team of data collection methodologists to support subject matter areas. The strategy has three key themes: first, developing best-practice standards and a central standards repository; second, providing training and introducing knowledge-sharing forums; and third, offering advice and independent review to the subject matter areas that design and develop collection instruments.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010952
    Description:

    For a survey whose results were estimated by simple averages, we compare the effect on those results of conducting a follow-up among non-respondents with that of weighting based on the last ten percent of respondents. The data come from the Survey of Living Conditions among Immigrants in Norway, carried out in 2006.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010953
    Description:

    As survey researchers attempt to maintain traditionally high response rates, reluctant respondents have led to increasing data collection costs. This respondent reluctance may be related to the amount of time it takes to complete an interview in large-scale, multi-purpose surveys, such as the National Survey of Recent College Graduates (NSRCG). Recognizing that respondent burden or questionnaire length may contribute to lower response rates, in 2003, following several months of data collection under the standard protocol, the NSRCG offered its nonrespondents monetary incentives about two months before the end of data collection. In conjunction with the incentive offer, the NSRCG also offered persistent nonrespondents an opportunity to complete a much-abbreviated interview consisting of a few critical items. The late respondents who completed the interviews as a result of the incentive and critical-items-only questionnaire offers may provide some insight into the issue of nonresponse bias and the likelihood that such interviewees would have remained survey nonrespondents if these refusal conversion efforts had not been made.

    In this paper, we define "reluctant respondents" as those who responded to the survey only after extra efforts were made beyond the ones initially planned in the standard data collection protocol. Specifically, reluctant respondents in the 2003 NSRCG are those who responded to the regular or shortened questionnaire following the incentive offer. Our conjecture was that the behavior of the reluctant respondents would be more like that of nonrespondents than of respondents to the surveys. This paper describes an investigation of reluctant respondents and the extent to which they are different from regular respondents. We compare different response groups on several key survey estimates. This comparison will expand our understanding of nonresponse bias in the NSRCG, and of the characteristics of nonrespondents themselves, thus providing a basis for changes in the NSRCG weighting system or estimation procedures in the future.

    Release date: 2009-12-03

  • Technical products: 11-536-X200900110813
    Description:

    The National Agricultural Statistics Service (NASS) has increasingly been using a delete-a-group (DAG) jackknife to estimate variances. In surveys where this technique is used, each sampled element is given 16 weights: the element's actual sampling weight after incorporating all nonresponse and calibration adjustments and 15 jackknife replicate weights. NASS recommends constructing confidence intervals for univariate statistics assuming its DAG jackknife has 14 degrees of freedom. This paper discusses methods of modifying the DAG jackknife to reduce the potential finite-sample bias. It also describes a method of measuring the effective degrees of freedom in situations where the NASS recommendation of 14 may be too generous.
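
    A minimal sketch of the computation the abstract describes, assuming the standard delete-a-group jackknife variance formula with 15 replicates and a t interval on 14 degrees of freedom; the data and replicate weights are simulated purely so the sketch runs:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        y = rng.normal(100, 15, size=1000)        # analysis variable
        w_full = rng.uniform(1, 3, size=1000)     # full sampling weight
        # The 15 replicate weights would come with the survey file.
        w_rep = w_full[:, None] * rng.uniform(0.9, 1.1, size=(1000, 15))

        def wmean(y, w):
            return float(np.sum(w * y) / np.sum(w))

        theta = wmean(y, w_full)
        reps = np.array([wmean(y, w_rep[:, r]) for r in range(15)])

        # Standard DAG jackknife variance: ((R - 1) / R) * sum of squared
        # deviations of the replicate estimates from the full-sample one.
        R = 15
        var_dag = (R - 1) / R * np.sum((reps - theta) ** 2)

        # Confidence interval on 14 degrees of freedom, per the NASS
        # recommendation quoted in the abstract.
        t14 = stats.t.ppf(0.975, df=14)
        print(theta - t14 * np.sqrt(var_dag), theta + t14 * np.sqrt(var_dag))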

    Release date: 2009-08-11

  • Technical products: 11-522-X200600110401
    Description:

    The Australian Bureau of Statistics (ABS) will begin the formation of a Statistical Longitudinal Census Data Set (SLCD) by choosing a 5% sample of people from the 2006 population census to be linked probabilistically with subsequent censuses. A long-term aim is to use the power of the rich longitudinal demographic data provided by the SLCD to shed light on a variety of issues which cannot be addressed using cross-sectional data. The SLCD may be further enhanced by probabilistically linking it with births, deaths, immigration settlements or disease registers. This paper gives a brief description of recent developments in data linking at the ABS, outlines the data linking methodology and quality measures we have considered and summarises preliminary results using Census Dress Rehearsal data.
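
    The abstract does not spell out the linkage model; as context, the classical starting point for probabilistic linking is the Fellegi-Sunter comparison weight, sketched below with invented m- and u-probabilities (this is not necessarily the ABS's exact method):

        import math

        # Hypothetical field-agreement probabilities:
        # m = P(fields agree | records truly match),
        # u = P(fields agree | records do not match).
        fields = {"surname":    (0.95, 0.01),
                  "birth_year": (0.98, 0.05),
                  "postcode":   (0.90, 0.02)}

        def match_weight(agreements):
            # Sum log2(m/u) over agreeing fields and log2((1-m)/(1-u)) over
            # disagreeing ones; pairs above an upper threshold are declared
            # links, and those below a lower threshold non-links.
            total = 0.0
            for name, agree in agreements.items():
                m, u = fields[name]
                total += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
            return total

        # A candidate pair agreeing on surname and birth year only.
        print(match_weight({"surname": True, "birth_year": True,
                            "postcode": False}))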

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110425
    Description:

    Suppose data for a survey with a multi-stage design are to be collected in two periods of time. This paper assesses the relative merits of keeping the same clusters in the sample versus sampling new clusters, under different statistical (correlation between clusters and over time) and logistical (survey cost) scenarios. The design effect of re-using the same clusters from the master sample over time is of the form 1 - Apπ/n, where p is the intertemporal correlation of the cluster totals, n is the number of clusters, π is the proportion of clusters retained from the previous round, and A > 0 is a fixed constant. Since the efficiency gains appear to be minor, the value of designs that reuse clusters comes from logistical (survey cost) considerations. An empirical demonstration using Demographic and Health Survey (DHS) data for Bangladesh, 1996 and 2000, is provided.
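
    Restated as a display (a direct transcription of the abstract's expression, introducing no symbols beyond those it defines):

        \[
          \mathrm{deff}(\pi) \;=\; 1 \;-\; \frac{A\,p\,\pi}{n}, \qquad A > 0 .
        \]

    Because the reduction below one is of order 1/n, the efficiency gain from retaining clusters shrinks as the number of clusters grows, which is consistent with the abstract's conclusion that the case for reuse rests on cost rather than precision.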

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110370
    Description:

    Many countries conduct surveys that focus specifically on their population's health. Because health plays a key role in most aspects of life, health data are also often collected in population surveys on other topics. The subject matter of population health surveys broadly encompasses physical and mental health, dental health, disabilities, substance abuse, health risk factors, nutrition, health promotion, health care utilization and quality, health coverage, and costs. Some surveys focus on specific health conditions, whereas others aim to obtain an overall health assessment. Health is often an important component in longitudinal studies, particularly in birth and aging cohorts. Information about health can be collected by respondents' reports (for themselves and sometimes for others), by medical examinations, and by collecting biological measures. There is a serious concern about the accuracy of health information collected by respondents' reports. Logistical issues, cost considerations, and respondent cooperation feature prominently when the information is collected by medical examinations. Ethical and privacy issues are often important, particularly when DNA and biomarkers are involved. International comparability of health measures is of growing importance. This paper reviews the methodology for a range of health surveys and discusses the challenges in obtaining accurate data in this field.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110447
    Description:

    The classification and identification of locations where persons report being more or less healthy, or having more or less social capital, within a specific area such as a health region is tremendously helpful for understanding place and health associations. The objective of the proposed study is to classify and map areas within the Zone 6 Health Region of Nova Scotia (Halifax Regional Municipality and Annapolis Valley regions) according to health status (Dimension 1) and social capital (Dimension 2). We abstracted responses to questions about self-reported health status, mental health, and social capital from the master files of the Canadian Community Health Survey (Cycles 1.1, 1.2 and 2.1), the National Population Health Survey (Cycle 5), and the General Social Survey (Cycles 13, 14, 17, and 18). Responses were geocoded using the Statistics Canada Postal Code Conversion File (PCCF+) and imported into a geographic information system (GIS) so that the postal code associated with each response could be assigned a latitude and longitude within the Nova Scotia Zone 6 health region. Kernel density estimators and additional spatial interpolators were used to develop statistically smoothed surfaces of the distribution of respondent values for each question. The smoothing process eliminates the possibility of revealing individual respondent locations or confidential Statistics Canada sampling frame information. Using responses from similar questions across multiple surveys improves the likelihood of detecting heterogeneity among the responses within the health region, as well as the accuracy of the smoothed map classification.
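
    A minimal sketch of the smoothing step, assuming coordinates are already in hand and using a Gaussian kernel density ratio as the smoother; the data, bandwidths and grid are illustrative, and the study's additional interpolators are not reproduced:

        import numpy as np
        from scipy.stats import gaussian_kde

        # Hypothetical geocoded respondent coordinates (lon, lat) and a
        # binary response such as self-reported good health; the real study
        # derives coordinates from postal codes via PCCF+.
        rng = np.random.default_rng(0)
        lon = rng.uniform(-64.5, -63.0, size=400)
        lat = rng.uniform(44.5, 45.2, size=400)
        good = rng.integers(0, 2, size=400).astype(bool)

        # Density surfaces for all respondents and for those reporting good
        # health; by Bayes' rule their scaled ratio is a smoothed proportion
        # surface, which also hides individual respondent locations.
        kde_all = gaussian_kde(np.vstack([lon, lat]))
        kde_good = gaussian_kde(np.vstack([lon[good], lat[good]]))

        gx, gy = np.meshgrid(np.linspace(-64.5, -63.0, 100),
                             np.linspace(44.5, 45.2, 100))
        grid = np.vstack([gx.ravel(), gy.ravel()])
        smoothed = good.mean() * kde_good(grid) / kde_all(grid)
        print(smoothed.min(), smoothed.max())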

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110432
    Description:

    The use of discrete random variables with known statistical distributions to mask data on discrete variables has been under study for some time. This paper presents a few results from our research on this topic. The consequences of sampling with and without replacement from finite populations are one principal interest. Estimates of first- and second-order moments that attenuate or adjust for the additional variation due to masking of a known type are developed. The impact of masking the original data on the correlation structure of concomitantly measured discrete variables is considered, and the need for further development of results for analyses of multivariate data is discussed.
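
    As a toy illustration of the moment-adjustment idea for additive masking with noise of known distribution (the textbook adjustment, not necessarily the paper's exact estimators, and ignoring the with/without-replacement refinements the paper studies):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.poisson(4, size=10_000)     # confidential discrete variable

        # Mask by adding independent noise with a KNOWN distribution: a
        # discrete uniform on {-2, ..., 2}, so mu_e = 0 and var_e = 2.
        support = np.arange(-2, 3)
        noise = rng.choice(support, size=x.size)
        z = x + noise
        mu_e = support.mean()
        var_e = np.mean((support - mu_e) ** 2)

        # Moment adjustment: subtract the known noise moments from the
        # masked data's moments to recover estimates for the original data.
        mean_hat = z.mean() - mu_e
        var_hat = z.var(ddof=1) - var_e
        print(mean_hat, var_hat)   # both should be near 4 for Poisson(4)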

    Release date: 2008-03-17

  • Technical products: 11-522-X20050019439
    Description:

    The data collection process is becoming increasingly challenging due to a number of factors: the ageing of the farm population, a decreasing number of farmers, increasing farm sizes, financial crises arising from BSE (mad cow disease) and avian influenza, and extreme climatic impacts causing drought conditions in some areas and flooding in others. There also seem to be rising levels of concern about privacy and confidentiality. This paper will describe how agriculture is an industry in transition, how the difficulties faced by the agricultural sector affect data collection, and how our subsequent responses and actions are addressing these challenges.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019475
    Description:

    To determine and measure the impact of informativeness, we compare design-based and model-based variances of estimated parameters, as well as the estimated parameters themselves, in a logistic model under the assumption that the postulated model is true. An approach for assessing the impact of informativeness is given. To address the additional complexity of the impact of informativeness on power, we propose a new approximation for a linear combination of non-central chi-square distributions, using generalized design effects. A large simulation study, based on generating a population under the postulated model using parameter estimates derived from the NPHS, allows us to detect and measure the informativeness and to compare the robustness of the studied approaches.
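
    The abstract does not reproduce the new approximation itself; as context, the sketch below shows the standard Satterthwaite-type moment-matching of a linear combination of non-central chi-squares by a scaled central chi-square, with invented design effects and noncentralities:

        import numpy as np
        from scipy import stats

        def satterthwaite_tail(d, lam, q):
            # Match the first two moments of sum_i d_i * chi2_1(lam_i) with
            # a scaled central chi-square a * chi2_nu, then use its tail.
            d, lam = np.asarray(d, float), np.asarray(lam, float)
            mean = np.sum(d * (1 + lam))        # E[chi2_1(lam)] = 1 + lam
            var = np.sum(d**2 * (2 + 4 * lam))  # Var[chi2_1(lam)] = 2 + 4 lam
            a, nu = var / (2 * mean), 2 * mean**2 / var
            return stats.chi2.sf(q / a, df=nu)

        # Illustrative generalized design effects and noncentralities.
        d, lam = [1.8, 1.3, 1.1], [0.5, 0.3, 2.0]
        print(satterthwaite_tail(d, lam, q=12.0))

        # Monte Carlo check of the approximation.
        rng = np.random.default_rng(0)
        sims = sum(di * rng.noncentral_chisquare(1, li, size=200_000)
                   for di, li in zip(d, lam))
        print((sims > 12.0).mean())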

    Release date: 2007-03-02

  • Technical products: 11-522-X20040018752
    Description:

    This paper outlines some possible applications of the permanent sample of households ready to respond for surveying difficult-to-reach population groups.

    Release date: 2005-10-27

  • Technical products: 11-522-X20040018748
    Description:

    Given the small number of Aboriginal people in the population, survey sample sizes are usually too small to permit meaningful analysis of these groups. This paper discusses the efforts being made by the Canadian Centre for Justice Statistics in this regard.

    Release date: 2005-10-27

  • Technical products: 11-522-X20040018745
    Description:

    Testing questionnaires with specialized populations such as Aboriginal people, homosexuals, bisexuals, children and victims of abuse brings challenges: identifying respondents, choosing a testing methodology and location, and establishing respondent rapport and trust.

    Release date: 2005-10-27

  • Technical products: 11-522-X20040018734
    Description:

    The Ethnic Diversity Survey generated methodological challenges such as choosing the sampling plan, developing the questionnaire, collecting the data, weighting the data and estimating the variance.

    Release date: 2005-10-27

  • Technical products: 11-522-X20040018751
    Description:

    This paper examines how adaptive sampling methods might be used to extend current national health surveys to enable effective tracking and monitoring of new forms of health threats and to trace exposed persons.

    Release date: 2005-10-27

  • Technical products: 11-522-X20030017596
    Description:

    This paper discusses the measurement problems that affected the Demographic Analysis (DA), a coverage measurement program used for Census 2000.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017521
    Description:

    This paper discusses the need for developments in areas such as international surveys, panel surveys, small area estimation, observational studies, secondary analyses, and modes of data collection to foster public co-operation in surveys and keep response rates from falling.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017692
    Description:

    This paper discusses regression servers, which are data dissemination systems that return some of the output generated by regression analyses in response to user queries. It details work on the special case where the data contain a sensitive variable whose regressions must be protected.

    Release date: 2005-01-26
