Weighting and estimation

Results

All (16) (0 to 10 of 16 results)

  • Articles and reports: 75F0002M2007007
    Description:

    The Survey of Labour and Income Dynamics (SLID), introduced in the 1993 reference year, is a longitudinal panel survey of individuals. The purpose of the survey is to measure changes in the economic well-being of individuals and the factors that influence these changes. SLID's sample is divided into two overlapping panels, each six years in length. Longitudinal surveys like SLID are complex due to the dynamic nature of the sample, which in turn is due to the ever-changing composition of households and families over the years. For each reference year, SLID produces two sets of weights: one is representative of the initial population (the longitudinal weights), while the other is representative of the current population (the cross-sectional weights). Since 2002, SLID has been producing a third set of weights which combines two panels that overlap to form a new longitudinal sample. The new weights are referred to as combined longitudinal weights.

    For the production of the cross-sectional weights, SLID combines two independent samples and assigns a probability of selection to individuals who joined the sample after the panel was selected. Like cross-sectional weights, longitudinal weights are adjusted for non-response and influential values. In addition, the sample is adjusted to make it representative of the target population. The purpose of this document is to describe SLID's methodology for the longitudinal and cross-sectional weights, as well as to present problems that have been encountered, and solutions that have been proposed. For the purpose of illustration, results for the 2003 reference year are used. The methodology used to produce the combined longitudinal weights will not be presented in this document as there is a complete description in Naud (2004).

    Release date: 2007-10-18

  • Articles and reports: 12-001-X20070019847
    Description:

    We investigate the impact of cluster sampling on standard errors in the analysis of longitudinal survey data. We consider a widely used class of regression models for longitudinal data and a standard class of point estimators of a generalized least squares type. We argue theoretically that the impact of ignoring clustering in standard error estimation will tend to increase with the number of waves in the analysis, under some patterns of clustering which are realistic for many social surveys. The implication is that it is, in general, at least as important to allow for clustering in standard errors for longitudinal analyses as for cross-sectional analyses. We illustrate this theoretical argument with empirical evidence from a regression analysis of longitudinal data on gender role attitudes from the British Household Panel Survey. We also compare two approaches to variance estimation in the analysis of longitudinal survey data: a survey sampling approach based upon linearization and a multilevel modelling approach. We conclude that the impact of clustering can be seriously underestimated if it is simply handled by including an additive random effect to represent the clustering in a multilevel model.

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019849
    Description:

    In sample surveys where units have unequal probabilities of inclusion in the sample, associations between the probability of inclusion and the statistic of interest can induce bias. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have large weights, which can introduce undesirable variability in statistics such as the population mean estimator or population regression estimator. Weight trimming reduces large weights to a fixed cutpoint value and adjusts weights below this value to maintain the untrimmed weight sum, reducing variability at the cost of introducing some bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance tradeoffs. Data-driven approaches described in the literature are a little more efficient than fully weighted estimators. This paper develops Bayesian methods for weight trimming of linear and generalized linear regression estimators in unequal probability-of-inclusion designs. An application is considered that estimates the injury risk of children rear-seated in compact extended-cab pickup trucks, using the Partners for Child Passenger Safety surveillance survey.

    Release date: 2007-06-28
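
The conventional trimming rule described in the abstract above (cap large weights at a cutpoint, then adjust the remaining weights so the total is unchanged) might be sketched as follows. This is only an illustration of that ad hoc rule, not the Bayesian method the paper develops; the function name `trim_weights`, the proportional redistribution, and the iteration limit are assumptions made for this example.

```python
def trim_weights(weights, cutpoint, max_iter=100):
    """Cap weights at `cutpoint` and scale up the smaller weights so that
    the sum of the trimmed weights equals the sum of the original weights.

    Hypothetical helper: proportional redistribution is one common choice,
    assumed here for illustration.
    """
    total = sum(weights)
    if cutpoint * len(weights) < total:
        raise ValueError("cutpoint is too small to preserve the weight sum")
    w = list(weights)
    for _ in range(max_iter):
        capped = [min(x, cutpoint) for x in w]
        below = [x for x in capped if x < cutpoint]
        excess = total - sum(capped)  # weight removed by capping
        if excess <= 1e-12 * total or not below:
            return capped
        # Spread the removed weight proportionally over the uncapped units.
        factor = 1 + excess / sum(below)
        w = [x * factor if x < cutpoint else x for x in capped]
    return [min(x, cutpoint) for x in w]


trimmed = trim_weights([1, 1, 1, 1, 10], cutpoint=4)
print(trimmed, sum(trimmed))
```

The loop repeats because scaling up the smaller weights can push some of them past the cutpoint, in which case they are capped again on the next pass.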

  • Articles and reports: 12-001-X20070019850
    Description:

    Auxiliary information is often used to improve the precision of survey estimators of finite population means and totals through ratio or linear regression estimation techniques. Resulting estimators have good theoretical and practical properties, including invariance, calibration and design consistency. However, it is not always clear that ratio or linear models are good approximations to the true relationship between the auxiliary variables and the variable of interest in the survey, resulting in efficiency loss when the model is not appropriate. In this article, we explain how regression estimation can be extended to incorporate semiparametric regression models, in both simple and more complicated designs. While maintaining the good theoretical and practical properties of the linear models, semiparametric models are better able to capture complicated relationships between variables. This often results in substantial gains in efficiency. The applicability of the approach for complex designs using multiple types of auxiliary variables will be illustrated by estimating several acidification-related characteristics for a survey of lakes in the Northeastern US.

    Release date: 2007-06-28
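
As a point of reference for the abstract above, the classical ratio estimator of a population total can be sketched in a few lines. It is the simple linear baseline that the article extends, not the semiparametric estimator itself; the function name and the toy numbers are assumptions made for this example.

```python
def ratio_estimate_total(y, x, weights, x_pop_total):
    """Ratio estimator of the population total of y.

    y, x        : sample values of the study and auxiliary variables
    weights     : design weights (inverse inclusion probabilities)
    x_pop_total : known population total of the auxiliary variable
    """
    weighted_y = sum(w * yi for w, yi in zip(weights, y))
    weighted_x = sum(w * xi for w, xi in zip(weights, x))
    return x_pop_total * weighted_y / weighted_x


y = [2, 4, 6]
x = [1, 2, 3]
w = [10, 10, 10]
print(ratio_estimate_total(y, x, w, x_pop_total=100))
# Calibration property: applied to x itself, the estimator
# reproduces the known auxiliary total.
print(ratio_estimate_total(x, x, w, x_pop_total=100))
```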

  • Articles and reports: 12-001-X20070019851
    Description:

    To model economic depreciation, a database is used that contains information on assets discarded by companies. The acquisition and resale prices are known, along with the length of use of these assets. However, the assets for which prices are known are only those that were involved in a transaction. While an asset depreciates on a continuous basis during its service life, the value of the asset is only known when there has been a transaction. This article proposes an ex post weighting to offset the effect of this source of error when building econometric models.

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019852
    Description:

    A common class of survey designs involves selecting all people within selected households. Generalized regression estimators can be calculated at either the person or household level. Implementing the estimator at the household level has the convenience of equal estimation weights for people within households. In this article the two approaches are compared theoretically and empirically for the case of simple random sampling of households and selection of all persons in each selected household. We find that the household level approach is theoretically more efficient in large samples and any empirical inefficiency in small samples is limited.

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019853
    Description:

    Two-phase sampling is a useful design when the auxiliary variables are unavailable in advance. Variance estimation under this design, however, is complicated, particularly when sampling fractions are high. This article presents a simple bootstrap method for two-phase simple random sampling without replacement at each phase with high sampling fractions. It works for the estimation of distribution functions and quantiles since no rescaling is performed. The method can be extended to stratified two-phase sampling by independently repeating the proposed procedure in different strata. Variance estimation of some conventional estimators, such as the ratio and regression estimators, is studied for illustration. A simulation study is conducted to compare the proposed method with existing variance estimators for estimating distribution functions and quantiles.

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019854
    Description:

    We derive an estimator of the mean squared error (MSE) of the empirical Bayes and composite estimator of the local-area mean in the standard small-area setting. The MSE estimator is a composition of the established estimator based on the conditional expectation of the random deviation associated with the area and a naïve estimator of the design-based MSE. Its performance is assessed by simulations. Variants of this MSE estimator are explored and some extensions outlined.

    Release date: 2007-06-28
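
For readers unfamiliar with the composite estimator mentioned in the abstract above, a minimal sketch of the usual shrinkage form is given here. It shows only the point estimator under the standard area-level model with known variances, not the MSE estimator the article derives; the function and argument names are assumptions made for this example.

```python
def composite_estimate(direct, synthetic, between_var, sampling_var):
    """Composite (shrinkage) estimate of a small-area mean.

    direct       : direct survey estimate for the area
    synthetic    : model-based (synthetic) estimate for the area
    between_var  : variance of the area-level random deviation
    sampling_var : sampling variance of the direct estimate
    """
    # The weight on the direct estimate grows as it becomes more precise.
    gamma = between_var / (between_var + sampling_var)
    return gamma * direct + (1 - gamma) * synthetic


print(composite_estimate(direct=10.0, synthetic=20.0,
                         between_var=1.0, sampling_var=3.0))
```

In an empirical Bayes setting the variance components are estimated from the data rather than supplied, which is one reason the MSE of the resulting estimator is harder to assess.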

  • Articles and reports: 12-001-X20070019855
    Description:

    In surveys under cluster sampling, nonresponse on a variable is often dependent on a cluster level random effect and, hence, is nonignorable. Estimators of the population mean obtained by mean imputation or reweighting under the ignorable nonresponse assumption are then biased. We propose an unbiased estimator of the population mean by imputing or reweighting within each sampled cluster or a group of sampled clusters sharing some common feature. Some simulation results are presented to study the performance of the proposed estimator.

    Release date: 2007-06-28
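
The idea of reweighting within each sampled cluster, as described in the abstract above, can be illustrated roughly as follows. This is a simplified sketch, not the authors' estimator: it assumes every sampled cluster has at least one respondent, and the function name and input layout are hypothetical.

```python
def mean_reweighted_within_clusters(clusters):
    """Estimate a population mean by letting each cluster's respondents
    stand in for that cluster's nonrespondents.

    clusters : list of (cluster_weight, n_sampled, respondent_values)
    """
    numerator = 0.0
    denominator = 0.0
    for cluster_weight, n_sampled, responses in clusters:
        if not responses:
            raise ValueError("each sampled cluster needs at least one respondent")
        respondent_mean = sum(responses) / len(responses)
        # Respondents in the cluster represent all n_sampled units in it.
        numerator += cluster_weight * n_sampled * respondent_mean
        denominator += cluster_weight * n_sampled
    return numerator / denominator


clusters = [
    (1.0, 4, [10, 10]),          # half of the sampled units responded
    (1.0, 4, [20, 20, 20, 20]),  # all of the sampled units responded
]
print(mean_reweighted_within_clusters(clusters))
```

Pooling all respondents in this toy example would give a mean of about 16.7, because the cluster with the higher response rate is over-represented; adjusting within clusters restores the balance between the two clusters.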

  • Articles and reports: 11-522-X20050019449
    Description:

    The literature on multiple-frame estimation theory concentrates mainly on the dual-frame case and is only rarely concerned with the important practical issue of variance estimation. Using a multiplicity approach, a fixed-weights single-frame estimator for multiple-frame surveys is proposed.

    Release date: 2007-03-02