Weighting and estimation

Results

All (67) (0 to 10 of 67 results)

  • Articles and reports: 11-522-X202200100003
    Description: Estimation at fine levels of aggregation is necessary to better describe society. Model-based small area estimation approaches that combine sparse survey data with rich data from auxiliary sources have proven useful for improving the reliability of estimates for small domains. Considered here is a scenario where small area model-based estimates, produced at a given aggregation level, needed to be disaggregated to better describe the social structure at finer levels. For this scenario, an allocation method was developed to implement the disaggregation, overcoming challenges associated with data availability and model development at such fine levels. The method is applied to adult literacy and numeracy estimation at the county-by-group level, using data from the U.S. Program for the International Assessment of Adult Competencies. In this application the groups are defined in terms of age or education, but the method could be applied to estimation for other equity-deserving groups.
    Release date: 2024-03-25
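
    As a point of reference only, a generic pro-rata allocation rule (not necessarily the method developed in the paper) disaggregates a county-level estimated total across groups in proportion to auxiliary estimates of group size:

      \hat{Y}_{cg} = \hat{Y}_{c} \, \frac{\hat{N}_{cg}}{\sum_{g'} \hat{N}_{cg'}},

    where \hat{Y}_{c} is the model-based estimate for county c and \hat{N}_{cg} is an auxiliary estimate of the size of group g within that county.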

  • Articles and reports: 12-001-X202300100005
    Description: Weight smoothing is a useful technique for improving the efficiency of design-based estimators, at the risk of bias due to model misspecification. As an extension of the work of Kim and Skinner (2013), we propose using weight smoothing to construct the conditional likelihood for efficient analytic inference under informative sampling. The Beta prime distribution can be used to build a parametric model for the weights in the sample. A score test is developed to test for misspecification of the weight model. A pretest estimator using the score test follows naturally. The pretest estimator is nearly unbiased and can be more efficient than the design-based estimator when the weight model is correctly specified or the original weights are highly variable. A limited simulation study is presented to investigate the performance of the proposed methods.
    Release date: 2023-06-30
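
    As background, one common formulation of weight smoothing (stated generically; the paper's Beta prime specification and conditional-likelihood construction are more specific) replaces the design weight by a modeled conditional expectation and plugs it into the usual weighted estimator:

      \tilde{w}_i = E(w_i \mid x_i, y_i), \qquad \hat{\theta}_{sm} = \frac{\sum_{i \in s} \tilde{w}_i \, y_i}{\sum_{i \in s} \tilde{w}_i},

    which is more stable than the design-weighted estimator when the weight model holds, but biased when it is misspecified, hence the score test and pretest estimator.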

  • Articles and reports: 12-001-X202100100001
    Description:

    In a previous paper, we developed a model to make inference about small area proportions under selection bias, in which the binary responses and the selection probabilities are correlated. This is the homogeneous nonignorable selection model; nonignorable selection means that the selection probabilities and the binary responses are correlated. The homogeneous nonignorable selection model was shown to perform better than a baseline ignorable selection model. However, one limitation of the homogeneous model is that the distributions of the selection probabilities are assumed to be identical across areas. We therefore introduce a more general model, the heterogeneous nonignorable selection model, in which the selection probabilities are not identically distributed over areas. We use Markov chain Monte Carlo methods to fit the three models. We illustrate our methodology and compare the models using an example on severe activity limitation from the U.S. National Health Interview Survey. We also perform a simulation study to demonstrate that the heterogeneous nonignorable selection model is needed when there is moderate to strong selection bias.

    Release date: 2021-06-24
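
    For readers new to the terminology, selection is nonignorable when, in generic notation,

      \Pr(I_i = 1 \mid y_i, x_i) \neq \Pr(I_i = 1 \mid x_i),

    that is, the selection indicator I_i remains correlated with the binary response y_i even given the covariates; the homogeneous and heterogeneous models then differ in whether the distribution of the selection probabilities is assumed common to all areas.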

  • Articles and reports: 12-001-X202100100005
    Description:

    Bayesian pooling strategies are used to solve precision problems in statistical analyses of data from small areas. In such cases the subpopulation samples are usually small, even though the population might not be. As an alternative, similar data can be pooled in order to reduce the number of parameters in the model. Many surveys collect categorical data on each area, which can be arranged in a contingency table. We consider hierarchical Bayesian pooling models with a Dirichlet process prior for analyzing categorical data from small areas. However, the prior used to pool such data frequently results in an overshrinkage problem. To mitigate this problem, the parameters are separated into global and local effects. This study focuses on data pooling using a Dirichlet process prior. We compare the pooling models using bone mineral density (BMD) data taken from the Third National Health and Nutrition Examination Survey for the period 1988 to 1994 in the United States. Our analyses of the BMD data are carried out using a Gibbs sampler and slice sampling for the posterior computations.

    Release date: 2021-06-24
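
    For readers unfamiliar with the notation, the Dirichlet process prior at the core of such pooling models can be written generically as

      G \sim \mathrm{DP}(\alpha, G_0), \qquad \theta_a \mid G \overset{iid}{\sim} G, \quad a = 1, \dots, A,

    where \theta_a holds the cell probabilities for area a, G_0 is the base measure and \alpha governs how strongly areas are clustered, and hence pooled; the paper's full hierarchy, with its separation into global and local effects, is more elaborate than this sketch.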

  • Articles and reports: 12-001-X201900300001
    Description:

    Standard linearization estimators of the variance of the general regression estimator are often too small, leading to confidence intervals that do not cover at the desired rate. Hat matrix adjustments, which help remedy this problem, can be used in two-stage sampling. We present theory for several new variance estimators and compare them to standard estimators in a series of simulations. The proposed estimators correct negative biases and improve confidence interval coverage rates in a variety of situations that mirror those met in practice.

    Release date: 2019-12-17
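
    A familiar single-stage analogue of such an adjustment (the two-stage forms proposed in the paper are more involved) inflates each regression residual by its leverage before it enters the linearization variance formula, as in HC2/HC3-type corrections:

      e_i^{adj} = \frac{e_i}{1 - h_{ii}},

    where h_{ii} is the i-th diagonal element of the hat matrix from the survey-weighted regression fit; the inflation counteracts the negative bias of the standard linearization estimator.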

  • Articles and reports: 12-001-X201900300003
    Description:

    The widely used formulas for the variance of the ratio estimator may lead to serious underestimates when the sample size is small; see Sukhatme (1954), Koop (1968), Rao (1969), and Cochran (1977, pages 163-164). In order to solve this classical problem, we propose in this paper new estimators for the variance and the mean square error of the ratio estimator that do not suffer from such a large negative bias. Similar estimation formulas can be derived for alternative ratio estimators as discussed in Tin (1965). We compare three mean square error estimators for the ratio estimator in a simulation study.

    Release date: 2019-12-17
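
    For reference, under simple random sampling the ratio estimator and its widely used variance estimator are

      \hat{R} = \bar{y} / \bar{x}, \qquad v(\hat{R}) = \frac{1 - f}{n \, \bar{x}^2} \cdot \frac{1}{n - 1} \sum_{i \in s} (y_i - \hat{R} x_i)^2,

    with f = n/N the sampling fraction; it is the negative small-sample bias of v(\hat{R}) that the proposed estimators are designed to reduce.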

  • Articles and reports: 12-001-X201900200007
    Description:

    When fitting an ordered categorical variable with L > 2 levels to a set of covariates using complex survey data, it is common to assume that the elements of the population follow a simple cumulative logistic regression model (the proportional-odds logistic-regression model). This means the probability that the categorical variable is at or below some level is a binary logistic function of the model covariates. Moreover, except for the intercept, the values of the logistic-regression parameters are the same at each level. The conventional “design-based” method for fitting the proportional-odds model is based on pseudo-maximum likelihood. We compare estimates computed using pseudo-maximum likelihood with those computed under an alternative design-sensitive robust model-based framework. We show with a simple numerical example how estimates from the two approaches can differ. The alternative approach is easily extended to fit a general cumulative logistic model, in which the parallel-lines assumption can fail. A test of that assumption follows easily.

    Release date: 2019-06-27
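
    In symbols, the proportional-odds (cumulative logit) model described above is

      \Pr(Y_i \le \ell \mid x_i) = \frac{1}{1 + \exp\{-(\alpha_\ell + x_i^{\top} \beta)\}}, \qquad \ell = 1, \dots, L - 1,

    so only the intercepts \alpha_\ell vary with the level while the slope vector \beta is common to all levels; the general cumulative logistic model lets \beta depend on \ell, which is exactly where the parallel-lines assumption can fail.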

  • Articles and reports: 13-604-M2019002
    Description:

    The aim of this article is two-fold: first, to discuss concepts and methods of estimating Canada-U.S. purchasing power parity (PPP); and second, to present key estimates. The estimates incorporate the 2017 benchmark prices from the Organisation for Economic Cooperation and Development (OECD) PPP Program and corresponding national income data from the Canadian System of National Accounts. Furthermore, U.S. data were obtained from the Bureau of Economic Analysis (BEA) and the U.S. Census Bureau.

    Release date: 2019-04-26
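
    As a purely illustrative (hypothetical) example of the PPP concept: if a benchmark basket of final expenditure costs C$1.25 in Canada and US$1.00 in the United States, the PPP is 1.25 CAD per USD, and real expenditure comparisons convert Canadian nominal values at that rate rather than at the market exchange rate.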

  • Articles and reports: 12-001-X201600114543
    Description:

    The regression estimator is used extensively in practice because it can improve the reliability of estimated parameters of interest such as means or totals. It uses control totals of variables known at the population level that are included in the regression setup. In this paper, we investigate the properties of the regression estimator that uses control totals estimated from the sample as well as those known at the population level. This estimator is compared, both theoretically and via a simulation study, to regression estimators that use only the known totals.

    Release date: 2016-06-22
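
    In its usual form, the regression (GREG) estimator of a total combines the design-weighted estimator with the control totals; the paper studies the variant in which some components of X are themselves estimated from the sample:

      \hat{Y}_{reg} = \hat{Y}_{HT} + (X - \hat{X}_{HT})^{\top} \hat{B},

    where \hat{Y}_{HT} and \hat{X}_{HT} are the Horvitz-Thompson estimators of the study-variable and auxiliary totals and \hat{B} is the vector of survey-weighted regression coefficients.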

  • Articles and reports: 11-522-X201700014737
    Description:

    Standard statistical methods that do not take proper account of the complexity of the survey design can lead to erroneous inferences when applied to survey data. In particular, the actual type I error rates of hypothesis tests based on standard test statistics can be much larger than the nominal level. Methods that take account of survey design features in testing hypotheses have been proposed, including Wald tests and quasi-score tests (Rao, Scott and Skinner 1998) that involve the estimated covariance matrices of the parameter estimates. The bootstrap method of Rao and Wu (1983) is often applied at Statistics Canada to estimate the covariance matrices, using a data file containing columns of bootstrap weights. Standard statistical packages often permit the use of survey-weighted test statistics, and it is attractive to approximate their distributions under the null hypothesis by their bootstrap analogues computed from the bootstrap weights supplied in the data file. Beaumont and Bocci (2009) applied this bootstrap method to testing hypotheses on regression parameters under a linear regression model, using weighted F statistics. In this paper, we present a unified approach to the above method by constructing bootstrap approximations to weighted likelihood ratio statistics and weighted quasi-score statistics. We report the results of a simulation study on testing independence in a two-way table of categorical survey data, comparing the performance of the proposed method with alternative methods, including the Rao-Scott corrected chi-squared statistic for categorical survey data.

    Release date: 2016-03-24
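
    A minimal numerical sketch of the general idea, assuming numpy, a two-way categorical pair and a data file already carrying B columns of Rao-Wu bootstrap weights (all variable names are hypothetical, and a simple weighted Pearson-type distance stands in for the weighted likelihood ratio and quasi-score statistics developed in the paper): the observed statistic measures the departure of the weighted cell proportions from independence, and its null distribution is approximated by recomputing the same distance for each bootstrap replicate, centred at the full-sample estimates.

      import numpy as np

      def weighted_table(r, c, w, R, C):
          """Weighted cell proportions of an R x C table."""
          tab = np.zeros((R, C))
          np.add.at(tab, (r, c), w)          # accumulate weights into cells
          return tab / w.sum()

      def bootstrap_independence_test(r, c, w, boot_w):
          """r, c: integer category codes (0-based); w: full-sample weights;
          boot_w: (n, B) array of bootstrap weight columns."""
          R, C = r.max() + 1, c.max() + 1
          p_hat = weighted_table(r, c, w, R, C)
          p_indep = np.outer(p_hat.sum(axis=1), p_hat.sum(axis=0))   # product of margins
          d_hat = p_hat - p_indep                                    # observed departure
          t_obs = np.sum(d_hat ** 2 / p_indep)

          # Null reference distribution: distance of each bootstrap replicate's
          # departure from the full-sample departure (centring mimics the null).
          t_boot = np.empty(boot_w.shape[1])
          for b in range(boot_w.shape[1]):
              p_b = weighted_table(r, c, boot_w[:, b], R, C)
              d_b = p_b - np.outer(p_b.sum(axis=1), p_b.sum(axis=0))
              t_boot[b] = np.sum((d_b - d_hat) ** 2 / p_indep)
          return t_obs, np.mean(t_boot >= t_obs)                     # statistic, p-value

    With 500 or 1,000 bootstrap weight columns this yields a reference distribution that reflects the survey design, in the same spirit as, though much cruder than, the unified construction in the paper.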

Reference (1) (1 result)

  • Surveys and statistical programs – Documentation: 11-522-X19980015028
    Description:

    We address the problem of estimating income dynamics statistics calculated from complex longitudinal surveys. In addition, we compare two design-based estimators of longitudinal proportions and transition rates in terms of their variability under high attrition rates. One estimator is based on the cross-sectional samples for the estimation of the income class boundaries at each time period and on the longitudinal sample for the estimation of the longitudinal counts; the other estimator is based entirely on the longitudinal sample, both for the estimation of the class boundaries and for the longitudinal counts. We develop Taylor linearization-type variance estimators for both the longitudinal and the mixed estimator under the assumption of no change in the population, and for the mixed estimator when there is change.

    Release date: 1999-10-22
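
    For concreteness, a typical transition rate of the kind being estimated is, in generic notation,

      \hat{T}_{ab} = \frac{\sum_{i \in s_L} w_i \, 1\{y_{i,t} \in a, \; y_{i,t+1} \in b\}}{\sum_{i \in s_L} w_i \, 1\{y_{i,t} \in a\}},

    where s_L is the longitudinal sample, w_i a longitudinal weight and a, b income classes; the two estimators compared above differ in whether the class boundaries are estimated from the cross-sectional samples or from the longitudinal sample itself.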