Weighting and estimation


Results

All (32) (0 to 10 of 32 results)

  • Articles and reports: 12-001-X200800210757
    Description:

    Sample weights can be calibrated to reflect the known population totals of a set of auxiliary variables. Predictors of finite population totals calculated using these weights have low bias if these variables are related to the variable of interest, but can have high variance if too many auxiliary variables are used. This article develops an "adaptive calibration" approach, where the auxiliary variables to be used in weighting are selected using sample data. Adaptively calibrated estimators are shown to have lower mean squared error and better coverage properties than non-adaptive estimators in many cases.

    Release date: 2008-12-23
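
As a rough illustration of the calibration step this entry describes, the sketch below computes linear calibration weights of the form w_i = d_i (1 + x_i' lambda), chosen so that the weighted totals of the auxiliary variables match known population totals. It does not implement the adaptive selection of auxiliary variables developed in the article. The function names, design weights, auxiliary values and population totals are all assumptions made for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def linear_calibration(design_weights, aux, totals):
    """Return weights w_i = d_i * (1 + x_i' lam) whose weighted auxiliary totals equal `totals`."""
    p = len(totals)
    # Weighted cross-product matrix: sum_i d_i x_i x_i'
    xtx = [[sum(d * x[j] * x[k] for d, x in zip(design_weights, aux))
            for k in range(p)] for j in range(p)]
    # Gap between the known totals and the design-weighted estimates
    gap = [totals[j] - sum(d * x[j] for d, x in zip(design_weights, aux))
           for j in range(p)]
    lam = solve(xtx, gap)
    return [d * (1 + sum(l * xj for l, xj in zip(lam, x)))
            for d, x in zip(design_weights, aux)]


# Hypothetical sample: four units, an intercept and one auxiliary variable.
d = [10.0, 10.0, 10.0, 10.0]
x = [(1.0, 2.0), (1.0, 3.0), (1.0, 5.0), (1.0, 6.0)]
known_totals = [50.0, 210.0]  # assumed population size and auxiliary total

w = linear_calibration(d, x, known_totals)
calibrated_totals = [sum(wi * xi[j] for wi, xi in zip(w, x)) for j in range(2)]
print(w, calibrated_totals)
```

Each auxiliary variable added to `x` imposes one more constraint on the weights, which is the source of the variance inflation the abstract mentions when too many variables are used.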

  • Articles and reports: 12-001-X200800210758
    Description:

    We propose a method for estimating the variance of estimators of changes over time, a method that takes account of all the components of these estimators: the sampling design, treatment of non-response, treatment of large companies, correlation of non-response from one wave to another, the effect of using a panel, robustification, and calibration using a ratio estimator. This method, which serves to determine the confidence intervals of changes over time, is then applied to the Swiss survey of value added.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210759
    Description:

    The analysis of stratified multistage sample data requires the use of design information such as stratum and primary sampling unit (PSU) identifiers, or associated replicate weights, in variance estimation. In some public release data files, such design information is masked to limit disclosure risk while still allowing the user to obtain valid variance estimates. For example, in area surveys with a limited number of PSUs, the original PSUs are split and/or recombined to construct pseudo-PSUs with swapped second or subsequent stage sampling units. Such PSU masking methods, however, distort the clustering structure of the sample design, yielding biased variance estimates, possibly with systematic patterns between the variance estimates computed from the unmasked and masked PSU identifiers. Some previous work observed patterns in the ratio of the masked and unmasked variance estimates when plotted against the unmasked design effect. This paper investigates the effect of PSU masking on variance estimates under cluster sampling with respect to various aspects, including the clustering structure and the degree of masking. We also seek a PSU masking strategy, based on swapping subsequent stage sampling units, that helps reduce the resulting biases of the variance estimates. For illustration, we used data from the National Health Interview Survey (NHIS) with some artificial modification. The proposed strategy performs very well in reducing the biases of variance estimates. Both theory and empirical results indicate that the effect of PSU masking on variance estimates is modest with minimal swapping of subsequent stage sampling units. The proposed masking strategy has been applied to the 2003-2004 National Health and Nutrition Examination Survey (NHANES) data release.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210764
    Description:

    This paper considers situations where the target response value is either zero or an observation from a continuous distribution. A typical example analyzed in the paper is the assessment of literacy proficiency with the possible outcome being either zero, indicating illiteracy, or a positive score measuring the level of literacy. Our interest is in how to obtain valid estimates of the average response, or the proportion of positive responses in small areas, for which only small samples or no samples are available. As in other small area estimation problems, the small sample sizes in at least some of the sampled areas and/or the existence of nonsampled areas require the use of model-based methods. Available methods, however, are not suitable for this kind of data because of the mixed distribution of the responses, having a large peak at zero, juxtaposed with a continuous distribution for the rest of the responses. We develop, therefore, a suitable two-part random effects model and show how to fit the model and assess its goodness of fit, and how to compute the small area estimators of interest and measure their precision. The proposed method is illustrated using simulated data and data obtained from a literacy survey conducted in Cambodia.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800110610
    Description:

    A new generalized regression estimator of a finite population total based on the Box-Cox transformation technique and its variance estimator are proposed under a general unequal probability sampling design. By being design consistent, the proposed estimator maintains the robustness property of the GREG estimator even if the underlying model fails. Furthermore, the Box-Cox technique automatically finds a reasonable transformation for the dependent variable using the data. The robustness and efficiency of the new estimator are evaluated analytically and via Monte Carlo simulation studies.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110612
    Description:

    Lehtonen and Veijanen (1999) proposed a new model-assisted generalized regression (GREG) estimator of a small area mean under a two-level model. They have shown that the proposed estimator performs better than the customary GREG estimator in terms of average absolute relative bias and average median absolute relative error. We derive the mean squared error (MSE) of the new GREG estimator under the two-level model and compare it to the MSE of the best linear unbiased prediction (BLUP) estimator. We also provide empirical results on the relative efficiency of the estimators. We show that the new GREG estimator exhibits better performance relative to the customary GREG estimator in terms of average MSE and average absolute relative error. We also show that, due to borrowing strength from related small areas, the EBLUP estimator exhibits significantly better performance relative to the customary GREG and the new GREG estimators. We provide simulation results under a model-based set-up as well as under a real finite population.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110614
    Geography: Canada
    Description:

    The Canadian Labour Force Survey (LFS) produces monthly estimates of the unemployment rate at national and provincial levels. The LFS also releases unemployment estimates for sub-provincial areas such as Census Metropolitan Areas (CMAs) and Urban Centers (UCs). However, for some sub-provincial areas, the direct estimates are not reliable because the sample size is quite small. Small area estimation in the LFS concerns the estimation of unemployment rates for local sub-provincial areas such as CMA/UCs using small area models. In this paper, we discuss various models including the Fay-Herriot model and cross-sectional and time series models. In particular, an integrated non-linear mixed effects model is proposed under the hierarchical Bayes (HB) framework for the LFS unemployment rate estimation. Monthly Employment Insurance (EI) beneficiary data at the CMA/UC level are used as auxiliary covariates in the model. An HB approach with the Gibbs sampling method is used to obtain the estimates of posterior means and posterior variances of the CMA/UC level unemployment rates. The proposed HB model leads to reliable model-based estimates in terms of CV reduction. Model fit analysis and comparison of the model-based estimates with the direct estimates are presented in the paper.

    Release date: 2008-06-26
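
The Fay-Herriot model mentioned in this entry can be illustrated with its simplest ingredient: when the model variance and the sampling variance are treated as known, the best linear unbiased predictor for an area is a weighted average of the direct estimate and a regression-based (synthetic) estimate. The sketch below shows only that composite form. It is not the hierarchical Bayes model with Gibbs sampling proposed in the paper, and the function names and numeric values are assumptions made for this example.

```python
def shrinkage_weight(model_variance, sampling_variance):
    """Weight on the direct estimate: gamma = sigma_v^2 / (sigma_v^2 + psi)."""
    return model_variance / (model_variance + sampling_variance)


def composite_estimate(direct, synthetic, model_variance, sampling_variance):
    """Weighted average of the direct and synthetic estimates for one area."""
    gamma = shrinkage_weight(model_variance, sampling_variance)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical area: direct unemployment rate 8%, synthetic (regression) rate 6%.
# A large sampling variance pulls the estimate toward the synthetic value.
noisy_area = composite_estimate(0.08, 0.06, model_variance=1.0, sampling_variance=3.0)
# A precise direct estimate (zero sampling variance) is left unchanged.
precise_area = composite_estimate(0.08, 0.06, model_variance=1.0, sampling_variance=0.0)
print(noisy_area, precise_area)
```

Areas with small samples have large sampling variances, so their estimates lean more heavily on the model, which is how the model-based estimates achieve a lower CV than the direct ones.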

  • Articles and reports: 12-001-X200800110616
    Description:

    With complete multivariate data, the BACON algorithm (Billor, Hadi and Velleman 2000) yields a robust estimate of the covariance matrix. The corresponding Mahalanobis distance may be used for multivariate outlier detection. When items are missing, the EM algorithm is a convenient way to estimate the covariance matrix at each iteration step of the BACON algorithm. In finite population sampling, the EM algorithm must be enhanced to estimate the covariance matrix of the population rather than of the sample. A version of the EM algorithm for survey data following a multivariate normal model, the EEM algorithm (Estimated Expectation Maximization), is proposed. The combination of the two algorithms, the BACON-EEM algorithm, is applied to two datasets and compared with alternative methods.

    Release date: 2008-06-26
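
To make the role of the Mahalanobis distance in this entry concrete, the sketch below computes it for two-dimensional data from the ordinary sample mean and covariance matrix. This is the classical, non-robust version: it does not implement BACON, EM or the EEM algorithm, and it ignores survey weights. The function name and the toy data are assumptions made for this example.

```python
import math


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean,
    using the unweighted sample covariance matrix (divisor n - 1)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the covariance matrix
    distances = []
    for px, py in points:
        dx, dy = px - mx, py - my
        # Quadratic form (dx, dy) S^{-1} (dx, dy)' with the closed-form 2x2 inverse
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


# Hypothetical data: five points near a line and one point far above it.
data = [(1, 2), (2, 3), (3, 4.5), (4, 5), (5, 6.2), (3, 9)]
dist = mahalanobis_2d(data)
most_extreme = max(range(len(data)), key=lambda i: dist[i])
print(most_extreme, [round(v, 2) for v in dist])
```

Because the outlier itself inflates the classical covariance estimate, its distance is understated; this masking effect is the reason a robust covariance estimate, such as the one BACON yields, is preferred for outlier detection.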

  • Articles and reports: 12-001-X200800110619
    Description:

    Small area prediction based on random effects, called EBLUP, is a procedure for constructing estimates for small geographical areas or small subpopulations using existing survey data. The total of the small area predictors is often forced to equal the direct survey estimate, and such predictors are said to be calibrated. Several calibrated predictors are reviewed and a criterion that unifies the derivation of these calibrated predictors is presented. The predictor that is the unique best linear unbiased predictor under the criterion is derived and the mean square error of the calibrated predictors is discussed. Implicit in the imposition of the restriction is the possibility that the small area model is misspecified and the predictors are biased. Augmented models with one additional explanatory variable for which the usual small area predictors achieve the self-calibrated property are considered. Simulations demonstrate that calibrated predictors have slightly smaller bias than the usual EBLUP predictor. However, if the bias is a concern, a better approach is to use an augmented model with an added auxiliary variable that is a function of area size. In the simulation, the predictors based on the augmented model had smaller MSE than EBLUP when the incorrect model was used for prediction. Furthermore, there was a very small increase in MSE relative to EBLUP if the auxiliary variable was added to the correct model.

    Release date: 2008-06-26
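
The restriction described in this entry, that the small area predictors sum to the direct survey estimate, can be imposed in several ways. The sketch below uses a simple ratio adjustment, which is only one of the possible calibrated predictors and is not claimed to be the best linear unbiased predictor derived in the article. The function name and the numbers are assumptions made for this example.

```python
def ratio_benchmark(predictors, direct_total):
    """Scale the small area predictors by a common factor so that
    their sum equals the direct survey estimate of the overall total."""
    factor = direct_total / sum(predictors)
    return [factor * p for p in predictors]


# Hypothetical model-based predictors for three areas (sum = 250)
# and a direct estimate of 260 for the three areas combined.
model_predictors = [120.0, 80.0, 50.0]
benchmarked = ratio_benchmark(model_predictors, 260.0)
print(benchmarked, sum(benchmarked))
```

The adjustment multiplies every area by the same factor, so the relative sizes of the areas implied by the model are preserved while the overall total agrees with the direct estimate.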

  • Articles and reports: 11-522-X200600110390
    Description:

    We propose an aggregate level generalized linear model with additive random components (GLMARC) for binary count data from surveys. It has both linear (for random effects) and nonlinear (for fixed effects) parts in modeling the mean function and hence belongs to a class termed mixed linear non-linear models. The model allows for a linear mixed model (LMM)-type approach to small area estimation (SAE) somewhat similar to the well-known Fay-Herriot (1979) method and thus takes full account of the sampling design. Unlike the alternative hierarchical Bayes (HB) approach of You and Rao (2002), the proposed method gives rise to easily interpretable SAEs and frequentist diagnostics as well as self-benchmarking to reliable large area direct estimates. The usual LMM methodology is not appropriate for the problem with count data because of the lack of range restrictions on the mean function and the possibility of unrealistic (e.g. zero in the context of SAE) estimates of the variance component, as the model does not allow the random effect part of the conditional mean function to depend on the marginal mean. The proposed method is an improvement on the earlier method due to Vonesh and Carter (1992), which also uses mixed linear nonlinear models but in which the variance-mean relationship was not accounted for, although this is typically done via range restrictions on the random effect. In addition, neither the implications of the survey design nor the estimation of random effects was considered. In our application for SAE, however, it is important to obtain suitable estimates of both fixed and random effects. It may be noted that, unlike the generalized linear mixed model (GLMM), GLMARC, like LMM, offers considerable simplicity in model fitting. This was made possible by replacing the original fixed and random effects of GLMM with a new set of parameters of GLMARC with quite a different interpretation, as the random effect is no longer inside the nonlinear predictor function. However, this is of no consequence for SAE because the small area parameters correspond to the overall conditional means and not to individual model parameters. We propose a method of iterative BLUP for parameter estimation which allows for self-benchmarking after a suitable model enlargement. The problem of small areas with small or no sample sizes or zero direct estimates is addressed by collapsing domains only for the stage of parameter estimation. Application to the 2000-01 Canadian Community Health Survey for estimation of the proportion of daily smokers in subpopulations defined by provincial health regions by age-sex groups is presented as an illustration.

    Release date: 2008-03-17