Weighting and estimation

Results

All (15) (0 to 10 of 15 results)

  • Articles and reports: 12-001-X201900100001
    Description:

    Demographers are facing increasing pressure to disaggregate their estimates and forecasts by characteristics such as region, ethnicity, and income. Traditional demographic methods were designed for large samples and perform poorly with disaggregated data. Methods based on formal Bayesian statistical models offer better performance. We illustrate with examples from a long-term project to develop Bayesian approaches to demographic estimation and forecasting. In our first example, we estimate mortality rates disaggregated by age and sex for a small population. In our second example, we simultaneously estimate and forecast obesity prevalence disaggregated by age. We conclude by addressing two traditional objections to the use of Bayesian methods in statistical agencies. (A minimal gamma-Poisson sketch of this kind of Bayesian shrinkage appears after this list.)

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201900100007
    Description:

    The Horvitz-Thompson (HT) estimator is widely used in survey sampling. However, the variance of the HT estimator becomes large when the inclusion probabilities are highly heterogeneous. To overcome this shortcoming, we propose a hard-threshold method for the first-order inclusion probabilities: we carefully choose a threshold value, then replace the inclusion probabilities smaller than the threshold by the threshold itself. Through this shrinkage strategy, we construct a new estimator of the population total, the improved Horvitz-Thompson (IHT) estimator. The IHT estimator substantially improves estimation accuracy at the cost of a relatively small bias. We derive the IHT estimator's mean squared error and an unbiased estimator of it, and theoretically compare the IHT estimator with the HT estimator. We also apply the idea to construct an improved ratio estimator. Numerical analyses of simulated and real data sets illustrate that the proposed estimators are more efficient and robust than the classical ones. (A short numerical sketch of the thresholding idea appears after this list.)

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201900100008
    Description:

    This paper studies small area quantile estimation under a unit-level non-parametric nested-error regression model. We assume the small-area-specific error distributions satisfy a semi-parametric density ratio model. We fit the non-parametric model via the penalized spline regression method of Opsomer, Claeskens, Ranalli, Kauermann and Breidt (2008). Empirical likelihood is then applied to estimate the parameters in the density ratio model based on the residuals, which leads to natural area-specific estimates of the error distributions; a kernel method then yields smoothed estimates of these distributions. These estimates are used for quantile estimation in two situations: in the first, only covariate power means are known at the population level; in the second, covariate values are available for all units in the population. Simulation experiments indicate that the proposed methods work well for quantiles around the median in the first situation, and for a broad range of quantiles in the second. A bootstrap mean squared error estimator of the proposed estimators is also investigated. An empirical example based on Canadian income data is included. (A sketch of the kernel-smoothing step appears after this list.)

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201100111445
    Description:

    In this paper we study small area estimation using area level models. We first consider the Fay-Herriot model (Fay and Herriot 1979) for the case of smoothed known sampling variances and the You-Chapman model (You and Chapman 2006) for the case of sampling variance modeling. We then consider hierarchical Bayes (HB) spatial models that extend the Fay-Herriot and You-Chapman models by capturing both geographically unstructured heterogeneity and spatial correlation effects among areas for local smoothing. The proposed models are implemented using Gibbs sampling for fully Bayesian inference. We apply the proposed models to the analysis of health survey data and compare the HB model-based estimates with direct design-based estimates. Our results show that the HB model-based estimates perform much better than the direct estimates. In addition, the proposed area level spatial models achieve smaller CVs than the Fay-Herriot and You-Chapman models, particularly for areas with three or more neighbouring areas. Bayesian model comparison and model fit analysis are also presented. (A simpler empirical-Bayes sketch of the underlying Fay-Herriot shrinkage appears after this list.)

    Release date: 2011-06-29

  • Articles and reports: 12-001-X200900211041
    Description:

    Estimation of small area (or domain) compositions may suffer from informative missing data if the probability of missingness varies across both the categories of interest and the small areas. We develop a double mixed modeling approach that combines a random effects mixed model for the underlying complete data with a random effects mixed model of the differential missing-data mechanism. The effect of the sampling design can be incorporated through a quasi-likelihood sampling model. The associated conditional mean squared error of prediction is approximated by a three-part decomposition: a naive prediction variance, a positive correction that accounts for the hypothetical parameter estimation uncertainty based on the latent complete data, and another positive correction for the extra variation due to the missing data. We illustrate our approach with an application to the estimation of municipality household compositions based on Norwegian register household data, which suffer from informative under-registration of the dwelling identity number.

    Release date: 2009-12-23

  • Articles and reports: 12-001-X20060019264
    Description:

    Sampling for nonresponse follow-up (NRFU) was an innovation considered for the 2000 U.S. Decennial Census: field enumerators are sent to only a sample of the housing units that did not respond to the initial mailed questionnaire, reducing costs but creating a major small-area estimation problem. We propose a model to impute the characteristics of the housing units that did not respond to the mailed questionnaire, so as to capture the large cost savings of NRFU sampling while still attaining acceptable accuracy for small areas. Our strategy is to model household characteristics using low-dimensional covariates at detailed levels of geography and more detailed covariates at larger levels of geography. Households are first classified into a small number of types. A hierarchical loglinear model then estimates the distribution of household types among the nonsample nonrespondent households in each block; this distribution depends on the characteristics of mailback respondents in the same block and of sampled nonrespondents in nearby blocks. Nonsample nonrespondent households can then be imputed according to the estimated type distribution. We evaluate the performance of our loglinear model through simulation. Compared with estimates from alternative models, our loglinear model produces estimates with much smaller MSE in many cases and approximately equal MSE in most others. Although sampling for NRFU was not used in the 2000 census, the estimation and imputation strategy applies to any census or survey using NRFU sampling where units are clustered, so that the characteristics of nonrespondents are related to those of respondents in the same area and of sampled nonrespondents in nearby areas. (A toy sketch of the imputation step appears after this list.)

    Release date: 2006-07-20

  • Articles and reports: 12-001-X20050029047
    Description:

    This paper considers the problem of estimating, in the presence of considerable nonignorable nonresponse, the number of private households of various sizes and the total number of households in Norway. The approach is model-based, with a population model for household size given registered family size. We account for possible nonresponse biases by modeling the response mechanism conditional on household size. Various models are evaluated together with a maximum likelihood estimator and imputation-based poststratification. Comparisons are made with pure poststratification using registered family size as the stratifier, and with estimation methods used in official statistics for the Norwegian Consumer Expenditure Survey. The study indicates that response modeling, poststratification and imputation are all important ingredients of a satisfactory modeling approach. (A minimal poststratification sketch appears after this list.)

    Release date: 2006-02-17

  • Articles and reports: 12-001-X20050029052
    Description:

    Estimates of a sampling variance-covariance matrix are required in many statistical analyses, particularly for multilevel analysis. In univariate problems, functions relating the variance to the mean have been used to obtain variance estimates, pooling information across units or variables. We present variance and correlation functions for multivariate means of ordinal survey items, both for complete data and for data with structured non-response. Methods are also developed for assessing model fit and for computing composite estimators that combine direct and model-based predictions. Survey data from the Consumer Assessments of Health Plans Study (CAHPS®) illustrate the application of the methodology. (A sketch of the univariate variance-function idea appears after this list.)

    Release date: 2006-02-17

  • Articles and reports: 11-522-X20040018733
    Description:

    A survey of injecting drug users is designed to use information collected both from needle exchange centres and from sampled injecting drug users, and a methodology is developed to produce various estimates.

    Release date: 2005-10-27

  • Articles and reports: 12-001-X20040027753
    Description:

    Samplers often distrust model-based approaches to survey inference because of concerns about misspecification when models are applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take the sample design into account and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values to the inclusion probabilities are exchangeable; when this assumption is not met, the HT estimator can be very inefficient. In Zheng and Little (2003, 2004) we used penalized splines (p-splines) to model smoothly varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples. We showed that p-spline model-based estimators are in general more efficient than the HT estimator and can provide narrower confidence intervals with close to nominal coverage. In this article, we extend the approach to two-stage sampling designs, using a p-spline based mixed model that fits a nonparametric relationship between the primary sampling unit (PSU) means and a measure of PSU size, with random effects to model clustering. For variance estimation we consider the empirical Bayes model-based variance, the jackknife, and balanced repeated replication (BRR). Simulation studies on simulated data and on samples drawn from 1990 census public use microdata demonstrate gains for the model-based p-spline estimator over the HT estimator and linear model-assisted estimators, and show that the variance estimation methods yield confidence intervals with satisfactory coverage. Interestingly, these gains appear even for a common equal-probability design, in which the first-stage selection is PPS, the second-stage selection probabilities are proportional to the inverse of the first-stage inclusion probabilities, and the HT estimator reduces to the unweighted mean. In situations that most favour the HT estimator, the model-based estimators have comparable efficiency. (A single-stage sketch appears after this list.)

    Release date: 2005-02-03
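
Illustrative code sketches

The first article above (12-001-X201900100001) estimates mortality rates disaggregated by age and sex for a small population. The paper's models are richer, but the core Bayesian idea of shrinking noisy small-cell rates toward a prior can be seen in a minimal conjugate gamma-Poisson sketch; the prior parameters a and b and the data values below are illustrative assumptions, not taken from the paper.

    import numpy as np

    def smoothed_rates(deaths, exposure, a=0.5, b=500.0):
        """Posterior-mean rates under a conjugate gamma-Poisson model:
        deaths_i ~ Poisson(rate_i * exposure_i), rate_i ~ Gamma(a, b).
        Cells with little exposure are shrunk toward the prior mean a/b."""
        return (np.asarray(deaths) + a) / (np.asarray(exposure) + b)

    # Four hypothetical age-by-sex cells of a small population.
    print(smoothed_rates([0, 2, 1, 7], [120.0, 95.0, 410.0, 880.0]))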
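
The hard-threshold idea of 12-001-X201900100007 can be sketched directly: the Horvitz-Thompson (HT) estimator weights each sampled value by the inverse of its inclusion probability, and the improved (IHT) variant replaces inclusion probabilities below a threshold by the threshold itself. The threshold value tau=0.02 and the toy Poisson-sampling setup are assumptions for illustration; choosing the threshold well is the paper's actual contribution.

    import numpy as np

    rng = np.random.default_rng(1)

    def ht_total(y, pi):
        """Horvitz-Thompson estimator of a population total."""
        return np.sum(y / pi)

    def iht_total(y, pi, tau):
        """Thresholded ('improved') HT: inclusion probabilities below
        tau are replaced by tau, trading a small bias for lower variance."""
        return np.sum(y / np.maximum(pi, tau))

    # Toy Poisson sample with highly heterogeneous inclusion probabilities.
    N = 10_000
    pi = rng.uniform(0.001, 0.3, size=N)
    y = 50 * pi + rng.normal(0.0, 1.0, size=N)
    in_sample = rng.random(N) < pi
    ys, ps = y[in_sample], pi[in_sample]
    print(ht_total(ys, ps), iht_total(ys, ps, tau=0.02), y.sum())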
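
One step of the pipeline in 12-001-X201900100008, turning residuals into a smoothed error distribution and reading quantiles off it, can be sketched on its own. The spline fit, the density ratio model and the empirical likelihood steps are omitted, and the bandwidth rule (Silverman's rule of thumb) is a standard assumption rather than the paper's choice.

    import numpy as np
    from math import erf, sqrt

    def kernel_quantile(resid, p, h=None):
        """p-th quantile of a Gaussian-kernel-smoothed error distribution
        built from residuals, found by bisection on the smoothed CDF."""
        resid = np.asarray(resid, dtype=float)
        if h is None:                                 # Silverman's rule
            h = 1.06 * resid.std() * len(resid) ** (-0.2)
        cdf = lambda t: np.mean([0.5 * (1.0 + erf((t - r) / (h * sqrt(2))))
                                 for r in resid])
        lo, hi = resid.min() - 5 * h, resid.max() + 5 * h
        for _ in range(60):                           # bisection
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
        return 0.5 * (lo + hi)

    rng = np.random.default_rng(3)
    print(kernel_quantile(rng.normal(size=200), 0.5))  # near 0 for N(0,1)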
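
The paper 12-001-X201100111445 fits hierarchical Bayes extensions of the Fay-Herriot model by Gibbs sampling. As a simpler point of reference, here is an empirical-Bayes sketch of the basic Fay-Herriot shrinkage with known sampling variances D_i; the moment-based update for the model variance is a common textbook device, not the paper's method, and the simulated data are assumptions.

    import numpy as np

    def fay_herriot_eb(y, X, D, n_iter=50):
        """Empirical-Bayes Fay-Herriot estimates with known sampling
        variances D_i.  Area model: y_i = x_i'beta + v_i + e_i with
        v_i ~ N(0, sig2v), e_i ~ N(0, D_i).  beta is fit by weighted
        least squares; sig2v by a simple method-of-moments update."""
        sig2v = np.var(y)                       # crude starting value
        for _ in range(n_iter):
            w = 1.0 / (sig2v + D)
            beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            sig2v = max(0.0, np.mean((y - X @ beta) ** 2 - D))
        gamma = sig2v / (sig2v + D)             # shrinkage weights
        return gamma * y + (1.0 - gamma) * (X @ beta)

    rng = np.random.default_rng(4)
    m = 30
    X = np.column_stack([np.ones(m), rng.normal(size=m)])
    D = rng.uniform(0.5, 2.0, size=m)           # known sampling variances
    theta = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, size=m)
    y = theta + rng.normal(0.0, np.sqrt(D))
    print(fay_herriot_eb(y, X, D)[:5])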
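
In 12-001-X20060019264 the distribution of household types among nonsample nonrespondents is estimated from a hierarchical loglinear model. The sketch below replaces that model with a crude fixed-weight mixture of two empirical type distributions (same-block respondents and nearby sampled nonrespondents); the weight w_resp and the type labels are invented for illustration, and only the final imputation step matches the paper's logic.

    import numpy as np

    rng = np.random.default_rng(5)

    def impute_types(block_resp, nearby_nr, n_missing, w_resp=0.5):
        """Draw household types for nonsample nonrespondents in a block
        from a mixture of the block's respondent type distribution and
        that of sampled nonrespondents in nearby blocks."""
        types = sorted(set(block_resp) | set(nearby_nr))
        dist = lambda xs: np.array([xs.count(t) for t in types], float) / len(xs)
        p = w_resp * dist(block_resp) + (1.0 - w_resp) * dist(nearby_nr)
        return rng.choice(types, size=n_missing, p=p)

    print(impute_types(["single", "couple", "couple", "family"],
                       ["single", "single", "family"], n_missing=5))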
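
For 12-001-X20050029047, the simplest building block is poststratification with registered family size as the stratifier: scale each poststratum's sample mean by its known population count. All data values below are made up; the paper's response-mechanism modeling is not reproduced.

    import numpy as np

    def poststratified_total(y, stratum, pop_counts):
        """Poststratified estimate of a population total: each
        poststratum's sample mean, scaled by its population count N_h."""
        y, stratum = np.asarray(y), np.asarray(stratum)
        return sum(N_h * y[stratum == h].mean()
                   for h, N_h in pop_counts.items())

    y = np.array([1, 2, 2, 3, 4, 4, 5])   # household sizes in the sample
    s = np.array([1, 2, 2, 3, 4, 4, 4])   # registered family sizes
    print(poststratified_total(y, s, {1: 900, 2: 1200, 3: 800, 4: 600}))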
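
The univariate starting point of 12-001-X20050029052, a function relating item variances to item means used to pool information across items, can be sketched with an assumed quadratic form (natural for bounded ordinal scores, but an assumption here rather than the paper's specification).

    import numpy as np

    def pooled_variances(means, variances):
        """Least-squares fit of a quadratic variance function
        v ~ a + b*m + c*m^2 across items; returns smoothed variances."""
        m = np.asarray(means, dtype=float)
        M = np.column_stack([np.ones_like(m), m, m * m])
        coef, *_ = np.linalg.lstsq(M, np.asarray(variances, float),
                                   rcond=None)
        return M @ coef

    means = np.array([1.2, 1.9, 2.4, 3.1, 3.6])
    variances = np.array([0.30, 0.52, 0.60, 0.55, 0.41])
    print(pooled_variances(means, variances))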
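
Finally, for 12-001-X20040027753, a single-stage version of the p-spline idea can be sketched: fit a penalized truncated-linear spline of the outcome on the inclusion probabilities, then form a model-based total as the observed sample values plus predictions for the nonsampled units (whose pi are known under PPS). The knot count, penalty and simulated population are assumptions; the paper's two-stage mixed model and variance estimators are not reproduced.

    import numpy as np

    def pspline_fit(x, y, n_knots=10, lam=1.0):
        """Penalized truncated-linear spline regression of y on x,
        with a ridge penalty lam on the knot coefficients only."""
        knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
        basis = lambda t: np.column_stack(
            [np.ones_like(t), t, np.maximum(t[:, None] - knots, 0.0)])
        B = basis(x)
        pen = np.r_[0.0, 0.0, np.full(n_knots, lam)]
        coef = np.linalg.solve(B.T @ B + np.diag(pen), B.T @ y)
        return lambda t: basis(t) @ coef

    # Model-based total for a one-stage PPS sample: observed y for the
    # sample plus spline predictions for the nonsampled units.
    rng = np.random.default_rng(2)
    N = 5_000
    pi = rng.uniform(0.02, 0.5, size=N)
    y = 10 + 30 * np.sqrt(pi) + rng.normal(0.0, 1.0, size=N)
    s = rng.random(N) < pi
    f = pspline_fit(pi[s], y[s])
    print(y[s].sum() + f(pi[~s]).sum(), y.sum())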