Weighting and estimation

Results

All (29) (0 to 10 of 29 results)

  • Articles and reports: 12-001-X202300200002
    Description: Being able to quantify the accuracy (bias, variance) of published output is crucial in official statistics. Output in official statistics is nearly always divided into subpopulations according to some classification variable, such as mean income by categories of educational level. Such output is also referred to as domain statistics. In the current paper, we limit ourselves to binary classification variables. In practice, misclassifications occur and these contribute to the bias and variance of domain statistics. Existing analytical and numerical methods to estimate this effect have two disadvantages. The first disadvantage is that they require that the misclassification probabilities are known beforehand and the second is that the bias and variance estimates are biased themselves. In the current paper we present a new method, a Gaussian mixture model estimated by an Expectation-Maximisation (EM) algorithm combined with a bootstrap, referred to as the EM bootstrap method. This new method does not require that the misclassification probabilities are known beforehand, although it is more efficient when a small audit sample is used that yields a starting value for the misclassification probabilities in the EM algorithm. We compared the performance of the new method with currently available numerical methods: the bootstrap method and the SIMEX method. Previous research has shown that for non-linear parameters the bootstrap outperforms the analytical expressions. For nearly all conditions tested, the bias and variance estimates that are obtained by the EM bootstrap method are closer to their true values than those obtained by the bootstrap and SIMEX methods. We end this paper by discussing the results and possible future extensions of the method.
    Release date: 2024-01-03
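    The abstract above describes the EM bootstrap only at a high level; the following is a minimal sketch of that kind of computation, not the authors' implementation: a two-component Gaussian mixture fitted by EM, wrapped in a nonparametric bootstrap to obtain a variance estimate for a domain statistic. The target statistic (the overall mean), the starting values, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def em_two_component(y, pi0=0.5, max_iter=200, tol=1e-8):
    """Fit a two-component Gaussian mixture by EM; pi0 could come from a small audit sample."""
    pi = pi0
    mu = np.percentile(y, [25, 75]).astype(float)        # crude starting means
    sigma = np.array([np.std(y), np.std(y)])
    for _ in range(max_iter):
        # E-step: posterior class-membership probabilities
        dens = np.column_stack([(1 - pi) * norm.pdf(y, mu[0], sigma[0]),
                                pi * norm.pdf(y, mu[1], sigma[1])])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixing proportion, means and standard deviations
        pi_new = resp[:, 1].mean()
        mu_new = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)
        sigma_new = np.sqrt((resp * (y[:, None] - mu_new) ** 2).sum(axis=0) / resp.sum(axis=0))
        converged = abs(pi_new - pi) < tol
        pi, mu, sigma = pi_new, mu_new, sigma_new
        if converged:
            break
    return pi, mu, sigma

def em_bootstrap(y, n_boot=500, seed=0):
    """Nonparametric bootstrap around the EM fit; the domain mean is the illustrative target."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        y_star = rng.choice(y, size=y.size, replace=True)
        pi, mu, _ = em_two_component(y_star)
        stats.append((1 - pi) * mu[0] + pi * mu[1])
    stats = np.asarray(stats)
    return stats.mean(), stats.var(ddof=1)                # bootstrap mean and variance estimate
```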

  • Articles and reports: 12-001-X202300100004
    Description: The Dutch Health Survey (DHS), conducted by Statistics Netherlands, is designed to produce reliable direct estimates at an annual frequency. Data collection is based on a combination of web interviewing and face-to-face interviewing. Due to lockdown measures during the Covid-19 pandemic, little or no face-to-face interviewing was possible, which resulted in a sudden change in measurement and selection effects in the survey outcomes. Furthermore, producing annual data about the effect of Covid-19 on health-related themes with a delay of about one year compromises the relevance of the survey. The sample size of the DHS does not allow the production of figures for shorter reference periods. Both issues are solved by developing a bivariate structural time series model (STM) to estimate quarterly figures for eight key health indicators. This model combines two series of direct estimates, a series based on complete response and a series based on web response only, and provides model-based predictions for the indicators that are corrected for the loss of face-to-face interviews during the lockdown periods. The model is also used as a form of small area estimation and borrows sample information observed in previous reference periods. In this way, timely and relevant statistics describing the effects of the corona crisis on the development of Dutch health are published. In this paper, the method based on the bivariate STM is compared with two alternative methods. The first one uses a univariate STM where no correction for the lack of face-to-face observation is applied to the estimates. The second one uses a univariate STM that also contains an intervention variable that models the effect of the loss of face-to-face response during the lockdown.
    Release date: 2023-06-30
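    As a simplified illustration of the structural time series idea in the abstract above (not the paper's bivariate model, which additionally links a web-only series and corrects for the lockdown quarters), a univariate local level model can be fitted to a series of direct estimates with statsmodels; the quarterly series below is simulated.

```python
import numpy as np
import statsmodels.api as sm

# Simulated quarterly direct estimates for one health indicator (illustrative data only).
rng = np.random.default_rng(1)
signal = 0.45 + np.cumsum(rng.normal(0, 0.005, 40))        # slowly evolving true level
direct = signal + rng.normal(0, 0.02, 40)                   # direct estimates with sampling noise

# Local level model: direct estimate = level + sampling error, level follows a random walk.
model = sm.tsa.UnobservedComponents(direct, level='local level')
result = model.fit(disp=False)
smoothed_level = result.smoothed_state[0]                    # model-based quarterly signal
print(result.summary())
```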

  • Articles and reports: 12-001-X202200200010
    Description:

    Multilevel time series (MTS) models are applied to estimate trends in time series of antenatal care coverage at several administrative levels in Bangladesh, based on repeated editions of the Bangladesh Demographic and Health Survey (BDHS) within the period 1994-2014. MTS models are expressed in a hierarchical Bayesian framework and fitted using Markov Chain Monte Carlo simulations. The models account for varying time lags of three or four years between the editions of the BDHS and provide predictions for the intervening years as well. It is proposed to apply cross-sectional Fay-Herriot models to the survey years separately at district level, which is the most detailed regional level. Time series of these small domain predictions at the district level and their variance-covariance matrices are used as input series for the MTS models. Spatial correlations among districts, random intercepts and slopes at the district level, and different trend models at district level and higher regional levels are examined in the MTS models to borrow strength over time and space. Trend estimates at district level are obtained directly from the model outputs, while trend estimates at higher regional and national levels are obtained by aggregation of the district-level predictions, resulting in a numerically consistent set of trend estimates.

    Release date: 2022-12-15
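    For reference, the cross-sectional area-level Fay-Herriot model mentioned in the abstract above, applied separately per survey year at district level, takes the generic textbook form below; the notation is the standard one rather than the paper's exact specification.

```latex
\hat{\theta}_{it} = \theta_{it} + e_{it}, \qquad e_{it} \sim N(0,\psi_{it}) \quad \text{(design-based sampling error, variance treated as known)}
\theta_{it} = \mathbf{x}_{it}^{\top}\boldsymbol{\beta}_{t} + u_{it}, \qquad u_{it} \sim N(0,\sigma^{2}_{u,t}) \quad \text{(district-level random effect)}
```

    Here $\hat{\theta}_{it}$ is the direct estimate for district $i$ in survey year $t$; the resulting small domain predictions and their variance-covariance matrices form the input series for the MTS models.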

  • Articles and reports: 12-001-X202100100008
    Description:

    Changes in the design of a repeated survey generally result in systematic effects in the sample estimates, which are further referred to as discontinuities. To avoid confounding real period-to-period change with the effects of a redesign, discontinuities are often quantified by conducting the old and the new design in parallel for some period of time. Sample sizes of such parallel runs are generally too small to apply direct estimators for domain discontinuities. A bivariate hierarchical Bayesian Fay-Herriot (FH) model is proposed to obtain more precise predictions for domain discontinuities and is applied to a redesign of the Dutch Crime Victimization Survey. This method is compared with a univariate FH model where the direct estimates under the regular approach are used as covariates in a FH model for the alternative approach conducted on a reduced sample size, and with a univariate FH model where the direct estimates for the discontinuities are modeled directly. An adjusted stepwise forward selection procedure is proposed that minimizes the WAIC until the reduction of the WAIC is smaller than the standard error of this criterion. With this approach more parsimonious models are selected, which prevents selecting complex models that tend to overfit the data.

    Release date: 2021-06-24
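    The stopping rule of the adjusted forward selection described above can be sketched as follows; `fit_and_waic(covariates)` is a hypothetical callback that fits the FH model with the given covariates and returns the WAIC and its standard error. This is a sketch of the selection rule, not the authors' code.

```python
def forward_select_by_waic(candidates, fit_and_waic):
    """Adjusted forward selection: keep adding the covariate that lowers the WAIC most,
    and stop once the improvement is smaller than the standard error of the WAIC.
    `fit_and_waic(covariates)` is a hypothetical callback returning (waic, waic_standard_error)."""
    selected = []
    current_waic, _ = fit_and_waic(selected)
    remaining = list(candidates)
    while remaining:
        fits = {c: fit_and_waic(selected + [c]) for c in remaining}
        best = min(fits, key=lambda c: fits[c][0])
        best_waic, best_se = fits[best]
        if current_waic - best_waic <= best_se:      # gain no larger than its standard error: stop
            break
        selected.append(best)
        remaining.remove(best)
        current_waic = best_waic
    return selected
```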

  • Articles and reports: 12-001-X201900300001
    Description:

    Standard linearization estimators of the variance of the general regression estimator are often too small, leading to confidence intervals that do not cover at the desired rate. Hat matrix adjustments, which can be used in two-stage sampling, help remedy this problem. We present theory for several new variance estimators and compare them to standard estimators in a series of simulations. The proposed estimators correct negative biases and improve confidence interval coverage rates in a variety of situations that mirror those met in practice.

    Release date: 2019-12-17
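    A rough sketch of the hat matrix (leverage) idea behind such adjustments, assuming a weighted regression fit: residuals at high-leverage units are inflated before they enter a variance formula. This mirrors leverage-adjusted residual corrections in general, not the specific estimators derived in the paper.

```python
import numpy as np

def leverage_adjusted_residuals(X, y, w):
    """Weighted least squares fit with hat-matrix (leverage) adjusted residuals.
    The adjusted residuals are the building block that would feed a variance formula;
    the exact estimators in the paper differ in detail."""
    W = np.diag(w)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    H = X @ XtWX_inv @ X.T @ W                      # hat matrix of the weighted fit
    beta = XtWX_inv @ X.T @ W @ y
    resid = y - X @ beta
    h = np.diag(H)
    return resid / (1.0 - h)                        # inflate residuals at high-leverage units
```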

  • Articles and reports: 12-001-X201900300005
    Description:

    Monthly estimates of provincial unemployment based on the Dutch Labour Force Survey (LFS) are obtained using time series models. The models account for rotation group bias and serial correlation due to the rotating panel design of the LFS. This paper compares two approaches to estimating structural time series models (STMs). In the first approach, STMs are expressed as state space models, fitted using a Kalman filter and smoother in a frequentist framework. As an alternative, these STMs are expressed as time series multilevel models in a hierarchical Bayesian framework and estimated using a Gibbs sampler. Monthly unemployment estimates and standard errors based on these models are compared for the twelve provinces of the Netherlands. Pros and cons of the multilevel approach and the state space approach are discussed. Multivariate STMs are appropriate to borrow strength over time and space. Modeling the full correlation matrix between time series components rapidly increases the number of hyperparameters to be estimated. Modeling common factors is one possibility to obtain more parsimonious models that still account for cross-sectional correlation. In this paper an even more parsimonious approach is proposed, where domains share one overall trend and have their own independent trends for the domain-specific deviations from this overall trend. The time series modeling approach is particularly appropriate to estimate month-to-month change of unemployment.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900100004
    Description:

    In this paper, we make use of auxiliary information to improve the efficiency of the estimates of the censored quantile regression parameters. Utilizing the information available from previous studies, we compute empirical likelihood probabilities as weights and propose a weighted censored quantile regression. Theoretical properties of the proposed method are derived. Our simulation studies show that the proposed method has advantages over standard censored quantile regression.

    Release date: 2019-05-07
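    To make the idea concrete, here is a sketch of a weighted censored quantile regression objective in the style of Powell's estimator, with externally supplied weights standing in for the empirical likelihood probabilities; the right-censoring setup, the optimizer, and all names are assumptions rather than the authors' formulation.

```python
import numpy as np
from scipy.optimize import minimize

def weighted_censored_qr(X, y, cens, tau=0.5, weights=None):
    """Weighted censored quantile regression (Powell-style, right censoring at `cens`):
    minimise the weighted check loss of y against min(X @ beta, cens)."""
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights)

    def objective(beta):
        u = y - np.minimum(X @ beta, cens)           # residuals against the censored fit
        return np.sum(w * u * (tau - (u < 0)))       # check (pinball) loss

    beta0 = np.zeros(p)
    return minimize(objective, beta0, method='Nelder-Mead').x
```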

  • Articles and reports: 12-001-X201800154963
    Description:

    The probability-sampling-based framework has dominated survey research because it provides precise mathematical tools to assess sampling variability. However, increasing costs and declining response rates are expanding the use of non-probability samples, particularly in general population settings, where samples of individuals pulled from web surveys are becoming increasingly cheap and easy to access. But non-probability samples are at risk of selection bias due to differential access, degrees of interest, and other factors. Calibration to known statistical totals in the population provides a means of potentially diminishing the effect of selection bias in non-probability samples. Here we show that model calibration using adaptive LASSO can yield a consistent estimator of a population total as long as a subset of the true predictors is included in the prediction model, thus allowing large numbers of possible covariates to be included without risk of overfitting. We show that model calibration using adaptive LASSO provides improved estimation with respect to mean square error relative to standard competitors such as generalized regression (GREG) estimators when a large number of covariates are required to determine the true model, with effectively no loss in efficiency over GREG when smaller models will suffice. We also derive closed-form variance estimators of population totals and compare their behavior with bootstrap estimators. We conclude with a real-world example using data from the National Health Interview Survey.

    Release date: 2018-06-21
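    A sketch of how model calibration with an adaptive LASSO could be put together, assuming the population totals of the candidate covariates are known: penalty weights come from an initial ridge fit, the adaptive LASSO is fitted on the sample, and the total is estimated by predicting from the known covariate totals plus a weighted residual correction. The estimator below is a simplified difference-type stand-in, not the paper's exact calibration estimator or its variance formulas.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso_calibration_total(X_s, y_s, base_w, X_pop_totals, N, alpha=0.1, gamma=1.0):
    """Illustrative total estimate using an adaptive LASSO prediction model.

    X_s, y_s     : covariates and outcome for the (non-probability) sample
    base_w       : base weights attached to the sample units
    X_pop_totals : known population totals of the covariates
    N            : population size
    """
    # Step 1: initial coefficients to build adaptive penalty weights.
    init = Ridge(alpha=1.0).fit(X_s, y_s)
    scale = np.abs(init.coef_) ** gamma + 1e-12

    # Step 2: adaptive LASSO via feature rescaling (penalty ~ 1 / |beta_init|^gamma).
    lasso = Lasso(alpha=alpha).fit(X_s * scale, y_s)
    beta = lasso.coef_ * scale
    intercept = lasso.intercept_

    # Step 3: predict the population total of y from the known covariate totals,
    # then add the weighted residual correction from the sample.
    pop_prediction = N * intercept + X_pop_totals @ beta
    residual_term = np.sum(base_w * (y_s - (intercept + X_s @ beta)))
    return pop_prediction + residual_term
```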

  • Articles and reports: 12-001-X201700114819
    Description:

    Structural time series models are a powerful technique for variance reduction in the framework of small area estimation (SAE) based on repeatedly conducted surveys. Statistics Netherlands implemented a structural time series model to produce monthly figures about the labour force with the Dutch Labour Force Survey (DLFS). Such models, however, contain unknown hyperparameters that have to be estimated before the Kalman filter can be launched to estimate state variables of the model. This paper describes a simulation aimed at studying the properties of hyperparameter estimators in the model. Simulating distributions of the hyperparameter estimators under different model specifications complements standard model diagnostics for state space models. Uncertainty around the model hyperparameters is another major issue. To account for hyperparameter uncertainty in the mean squared errors (MSE) estimates of the DLFS, several estimation approaches known in the literature are considered in a simulation. Apart from the MSE bias comparison, this paper also provides insight into the variances and MSEs of the MSE estimators considered.

    Release date: 2017-06-22

  • Articles and reports: 12-001-X201600114544
    Description:

    In the Netherlands, statistical information about income and wealth is based on two large-scale household panels that are completely derived from administrative data. A problem with using households as sampling units in the sample design of panels is the instability of these units over time. Changes in household composition affect the inclusion probabilities required for design-based and model-assisted inference procedures. Such problems are circumvented in the two aforementioned household panels by sampling persons, who are followed over time. At each period, the household members of these sampled persons are included in the sample. This is equivalent to sampling with probabilities proportional to household size, where households can be selected more than once, but at most as many times as they have members. In this paper, the properties of this sample design are described and contrasted with the Generalized Weight Share method for indirect sampling (Lavallée 1995, 2007). Methods are illustrated with an application to the Dutch Regional Income Survey.

    Release date: 2016-06-22
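    The stated equivalence can be checked with a small simulation: drawing persons with equal probability and including their households gives each household an expected number of selections proportional to its size, i.e. sampling with probability proportional to household size, capped at the number of members. The toy population and all names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy population: household sizes between 1 and 5; persons carry their household id.
hh_sizes = rng.integers(1, 6, size=1000)
person_hh = np.repeat(np.arange(hh_sizes.size), hh_sizes)   # one entry per person
n_persons = 200                                             # person sample size per draw

counts = np.zeros(hh_sizes.size)
n_sim = 5000
for _ in range(n_sim):
    sampled_persons = rng.choice(person_hh.size, size=n_persons, replace=False)
    hits = np.bincount(person_hh[sampled_persons], minlength=hh_sizes.size)
    counts += hits                                           # a household is hit at most `size` times

# Expected number of selections per household is proportional to household size.
expected = n_persons * hh_sizes / person_hh.size
print(np.corrcoef(counts / n_sim, expected)[0, 1])           # close to 1 in this simulation
```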