Editing and imputation

Results

All (12) (0 to 10 of 12 results)

  • Articles and reports: 12-001-X202100100009
    Description:

    Predictive mean matching is a commonly used imputation procedure for addressing the problem of item nonresponse in surveys. The customary approach relies upon the specification of a single outcome regression model. In this note, we propose a novel predictive mean matching procedure that allows the user to specify multiple outcome regression models. The resulting estimator is multiply robust in the sense that it remains consistent if one of the specified outcome regression models is correctly specified. The results from a simulation study suggest that the proposed method performs well in terms of bias and efficiency.

    Release date: 2021-06-24

  • Articles and reports: 12-001-X201700114823
    Description:

    Deriving estimators in a multi-phase calibration process requires sequentially computing the estimators and calibrated weights of earlier phases in order to obtain those of later ones. Already after two phases of calibration, the estimators and their variances involve calibration factors from both phases, and the formulae become cumbersome and uninformative. As a consequence, the literature so far deals mainly with two phases, while three or more phases are rarely considered. In some cases the analysis is ad hoc for a specific design, and no comprehensive methodology has been developed for constructing calibrated estimators and, more challengingly, for estimating their variances in three or more phases. We provide a closed-form formula for the variance of multi-phase calibrated estimators that holds for any number of phases. By specifying a new presentation of multi-phase calibrated weights, it is possible to construct calibrated estimators that have the form of multivariate regression estimators, which enables the computation of a consistent estimator of their variance. This new variance estimator is not only general for any number of phases but also has some favorable characteristics. A comparison with other estimators in the special case of two-phase calibration and an independent study for three phases are presented.

    Release date: 2017-06-22

  • Articles and reports: 12-001-X20050018088
    Description:

    When administrative records are geographically linked to census block groups, local-area characteristics from the census can be used as contextual variables, which may be useful supplements to variables that are not directly observable from the administrative records. Often databases contain records that have insufficient address information to permit geographical links with census block groups; the contextual variables for these records are therefore unobserved. We propose a new method that uses information from "matched cases" and multivariate regression models to create multiple imputations for the unobserved variables. Our method outperformed alternative methods in simulation evaluations using census data, and was applied to the dataset for a study on treatment patterns for colorectal cancer patients.

    Release date: 2005-07-21

  • Articles and reports: 11-522-X20030017724
    Description:

    This document presents results for two edit and imputation applications, the UK Annual Business Inquiry and the UK Census 1% household data file (the SARS), and for a missing data application based on the Danish Labour Force Survey.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20020016715
    Description:

    This paper will describe the multiple imputation of income in the National Health Interview Survey and discuss the methodological issues involved. In addition, the paper will present empirical summaries of the imputations as well as results of a Monte Carlo evaluation of inferences based on multiply imputed income items.

    Analysts of health data are often interested in studying relationships between income and health. The National Health Interview Survey, conducted by the National Center for Health Statistics of the U.S. Centers for Disease Control and Prevention, provides a rich source of data for studying such relationships. However, the nonresponse rates on two key income items, an individual's earned income and a family's total income, are over 20%. Moreover, these nonresponse rates appear to be increasing over time. A project is currently underway to multiply impute individual earnings and family income along with some other covariates for the National Health Interview Survey in 1997 and subsequent years.

    There are many challenges in developing appropriate multiple imputations for such large-scale surveys. First, there are many variables of different types, with different skip patterns and logical relationships. Second, it is not known what types of associations will be investigated by the analysts of multiply imputed data. Finally, some variables, such as family income, are collected at the family level and others, such as earned income, are collected at the individual level. To make the imputations for both the family- and individual-level variables conditional on as many predictors as possible, and to simplify modelling, we are using a modified version of the sequential regression imputation method described in Raghunathan et al. (Survey Methodology, 2001).

    Besides issues related to the hierarchical nature of the imputations just described, there are other methodological issues of interest such as the use of transformations of the income variables, the imposition of restrictions on the values of variables, the general validity of sequential regression imputation and, even more generally, the validity of multiple-imputation inferences for surveys with complex sample designs.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20010016303
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    In large-scale surveys, it is almost guaranteed that some level of non-response will occur. Generally, statistical agencies use imputation as a way to treat non-response items. A common preliminary step to imputation is the formation of imputation cells. In this article, the formation of these cells is studied using two methods. The first method is similar to that of Eltinge and Yansaneh (1997) in the case of weighting cells and the second is the method currently used in the Canadian Labour Force Survey. Using Labour Force data, simulation studies are performed to test the impact of the response rate, the response mechanism, and constraints on the quality of the point estimator in both methods.

    Release date: 2002-09-12

  • Surveys and statistical programs – Documentation: 75F0002M1998012
    Description:

    This paper looks at the work of the task force responsible for reviewing Statistics Canada's household and family income statistics programs, and at one of the associated program changes, namely, the integration of two major sources of annual income data in Canada, the Survey of Consumer Finances (SCF) and the Survey of Labour and Income Dynamics (SLID).

    Release date: 1998-12-30

  • Articles and reports: 12-001-X199400114433
    Description:

    Imputation is a common technique employed by survey-taking organizations in order to address the problem of item nonresponse. While in most cases the resulting completed data sets provide good estimates of means and totals, the corresponding variances are often grossly underestimated. A number of methods to remedy this problem exist, but most of them depend on the sampling design and the imputation method. Recently, Rao (1992) and Rao and Shao (1992) have proposed a unified jackknife approach to variance estimation of imputed data sets. The present paper explores this technique empirically, using a real population of businesses, under a simple random sampling design and a uniform nonresponse mechanism. Extensions to stratified multistage sample designs are considered, and the performance of the proposed variance estimator under non-uniform response mechanisms is briefly investigated.

    Release date: 1994-06-15

  • Articles and reports: 12-001-X198600214449
    Description:

    Nearly all surveys and censuses are subject to two types of nonresponse: unit (total) and item (partial). Several methods of compensating for nonresponse have been developed in an attempt to reduce the bias associated with nonresponse. This paper summarizes the nonresponse adjustment procedures used at the U.S. Census Bureau, focusing on unit nonresponse. Some discussion of current and future research in this area is also included.

    Release date: 1986-12-15

  • Articles and reports: 12-001-X198600114441
    Description:

    The analysis of survey data becomes difficult in the presence of incomplete responses. Using the maximum likelihood method, estimators for the parameters of interest and test statistics can be generated. In this paper, the maximum likelihood estimators are given for the case where the data are considered missing at random. A method for imputing the missing values is considered, along with the problem of estimating the change points in the mean. Possible extensions of the results to structured covariances and to non-randomly incomplete data are also proposed.

    Release date: 1986-06-16

  • Articles and reports: 12-001-X198600114444
    Description:

    A new processing system using the nearest neighbour (N-N) imputation method is being implemented for the National Farm Survey (NFS). An empirical study was conducted to determine if the NFS estimates would be affected by using imputation groups based on type of farm. For the specific imputation rule examined, the study showed evidence that the effect might be small.

    Release date: 1986-06-16
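
The nearest-neighbour imputation with imputation groups described in the entry above can be illustrated with a minimal sketch. The record layout (`group`, `x`, `y`), the function name `nn_impute` and the absolute-distance rule on a single auxiliary variable are assumptions made for illustration only; they are not taken from the National Farm Survey processing system.

```python
def nn_impute(records, use_groups=True):
    """Fill missing 'y' values by copying 'y' from the nearest donor on 'x'.

    records: list of dicts with keys 'group', 'x' and 'y' (y may be None).
    use_groups: if True, donors are restricted to the recipient's group.
    Returns a new list; the input records are not modified.
    """
    donors = [r for r in records if r["y"] is not None]
    if not donors:
        raise ValueError("no respondents available to act as donors")

    completed = []
    for r in records:
        if r["y"] is not None:
            completed.append(dict(r))  # respondent: keep the reported value
            continue
        pool = [d for d in donors if not use_groups or d["group"] == r["group"]]
        if not pool:
            pool = donors  # hypothetical fallback: group has no donor
        donor = min(pool, key=lambda d: abs(d["x"] - r["x"]))
        completed.append({**r, "y": donor["y"], "imputed": True})
    return completed


# Made-up records: 'x' is an auxiliary size measure, 'y' the survey item.
farms = [
    {"group": "dairy", "x": 100.0, "y": 40.0},
    {"group": "dairy", "x": 150.0, "y": 60.0},
    {"group": "grain", "x": 120.0, "y": 10.0},
    {"group": "dairy", "x": 118.0, "y": None},  # item nonrespondent
]

with_groups = nn_impute(farms, use_groups=True)
without_groups = nn_impute(farms, use_groups=False)
print(with_groups[3]["y"], without_groups[3]["y"])
```

In this toy data set the donor changes depending on whether imputation groups are used, which is the kind of effect an empirical comparison of grouping rules would measure on the resulting estimates.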