Editing and imputation

Results

All (14) (0 to 10 of 14 results)

  • Articles and reports: 12-001-X202100100004
    Description:

    Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining data from a probability survey and big found data. We focus on the case when the study variable is observed in the big data only, but the other auxiliary variables are observed in both data sources. Unlike the usual imputation for missing data analysis, we create imputed values for all units in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for data integration of survey data and big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency.

    Release date: 2021-06-24

  • Articles and reports: 12-001-X202100100009
    Description:

    Predictive mean matching is a commonly used imputation procedure for addressing the problem of item nonresponse in surveys. The customary approach relies upon the specification of a single outcome regression model. In this note, we propose a novel predictive mean matching procedure that allows the user to specify multiple outcome regression models. The resulting estimator is multiply robust in the sense that it remains consistent if one of the specified outcome regression models is correctly specified. The results from a simulation study suggest that the proposed method performs well in terms of bias and efficiency.

    Release date: 2021-06-24

  • Articles and reports: 12-001-X201900100009
    Description:

    The demand for small area estimates by users of Statistics Canada’s data has been steadily increasing over recent years. In this paper, we provide a summary of procedures that have been incorporated into a SAS-based production system for producing official small area estimates at Statistics Canada. This system includes: procedures based on unit- or area-level models; the incorporation of the sampling design; the ability to smooth the design variance for each small area if an area-level model is used; the ability to ensure that the small area estimates add up to reliable higher-level estimates; and the development of diagnostic tools to test the adequacy of the model. The production system has been used to produce small area estimates on an experimental basis for several surveys at Statistics Canada, including: the estimation of health characteristics, the estimation of under-coverage in the census, the estimation of manufacturing sales, and the estimation of unemployment rates and employment counts for the Labour Force Survey. Some of the diagnostics implemented in the system are illustrated using Labour Force Survey data along with administrative auxiliary data.

    Release date: 2019-05-07

  • Articles and reports: 12-001-X200700210493
    Description:

    In this paper, we study the problem of variance estimation for a ratio of two totals when marginal random hot deck imputation has been used to fill in missing data. We consider two approaches to inference. In the first approach, the validity of an imputation model is required. In the second approach, the validity of an imputation model is not required but response probabilities need to be estimated, in which case the validity of a nonresponse model is required. We derive variance estimators under two distinct frameworks: the customary two-phase framework and the reverse framework.

    Release date: 2008-01-03

  • Articles and reports: 11-522-X20050019458
    Description:

    The proposed paper presents an alternative methodology that lets the data define homogeneous groups through a bottom-up classification of the values of observed details. The problem is then to assign a non-respondent business to one of these groups. Several assignment procedures, based on explanatory variables available in the tax returns, are compared, using gross or distributed data: parametric and nonparametric classification analyses, log-linear models, etc.

    Release date: 2007-03-02

  • Articles and reports: 75F0002M2005010
    Description:

    For some time, Canada Mortgage and Housing Corporation (CMHC) has used data on housing characteristics and housing-related expenditures from the Census of Population. Although the Census data source serves CMHC's purposes to a large extent, the federal government agency turned to the annual household surveys of Statistics Canada to provide information on a more frequent basis. This would allow them to have a better picture of annual trends, and perhaps have a greater choice of other characteristics with which to cross housing data on Canadian households. In 2001, CMHC began to sponsor additional content in both the Survey of Labour and Income Dynamics (SLID) and the Survey of Household Spending (SHS), starting with reference year 2002.

    Release date: 2005-07-22

  • Articles and reports: 11-522-X20030017722
    Description:

    This paper shows how to adapt design-based and model-based frameworks to the case of two-stage sampling.

    Release date: 2005-01-26

  • Articles and reports: 12-001-X20030016610
    Description:

    In the presence of item nonresponse, unweighted imputation methods are often used in practice, but they generally lead to biased estimators under uniform response within imputation classes. Following Skinner and Rao (2002), we propose a bias-adjusted estimator of a population mean under unweighted ratio imputation and random hot-deck imputation and derive linearization variance estimators. A small simulation study is conducted to study the performance of the methods in terms of bias and mean square error. Relative bias and relative stability of the variance estimators are also studied.

    Release date: 2003-07-31

  • Articles and reports: 11-522-X20010016303
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    In large-scale surveys, it is almost guaranteed that some level of non-response will occur. Generally, statistical agencies use imputation as a way to treat non-response items. A common preliminary step to imputation is the formation of imputation cells. In this article, the formation of these cells is studied using two methods. The first method is similar to that of Eltinge and Yansaneh (1997) in the case of weighting cells and the second is the method currently used in the Canadian Labour Force Survey. Using Labour Force data, simulation studies are performed to test the impact of the response rate, the response mechanism, and constraints on the quality of the point estimator in both methods.

    Release date: 2002-09-12

  • Articles and reports: 12-001-X198600214450
    Description:

    From an annual sample of U.S. corporate tax returns, the U.S. Internal Revenue Service provides estimates of population and subpopulation totals for several hundred financial items. The basic sample design is highly stratified and fairly complex. Starting with the 1981 and 1982 samples, the design was altered to include a double sampling procedure. This was motivated by the need for better allocation of resources, in an environment of shrinking budgets. Items not observed in the subsample are predicted, using a modified hot deck imputation procedure. The present paper describes the design, estimation, and evaluation of the effects of the new procedure.

    Release date: 1986-12-15
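
Several of the entries above refer to random hot-deck imputation within imputation cells (or classes). As a rough illustration of the general idea only, and not the procedure of any listed paper, the sketch below fills each missing value with a value drawn at random from respondents in the same cell. The function name, the field names (`cell`, `income`, `imputed`) and the unweighted donor draw are assumptions made for this example; survey weights and variance estimation are not handled.

```python
import random
from collections import defaultdict


def random_hot_deck(records, cell_key, target, seed=0):
    """Return copies of `records` with missing `target` values filled by a
    randomly drawn donor value from the same imputation cell.

    Hypothetical helper for illustration: donors are drawn with equal
    probability (unweighted), and a record stays missing if its cell
    contains no respondents.
    """
    rng = random.Random(seed)

    # Collect observed (donor) values for each imputation cell.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[cell_key]].append(rec[target])

    completed = []
    for rec in records:
        new = dict(rec)          # leave the input records untouched
        new["imputed"] = False   # flag so imputed values remain identifiable
        if rec[target] is None:
            pool = donors.get(rec[cell_key])
            if pool:             # impute only when the cell has a donor
                new[target] = rng.choice(pool)
                new["imputed"] = True
        completed.append(new)
    return completed


# Made-up sample: cell C has no respondents, so record 6 stays missing.
sample = [
    {"id": 1, "cell": "A", "income": 40.0},
    {"id": 2, "cell": "A", "income": None},
    {"id": 3, "cell": "B", "income": 75.0},
    {"id": 4, "cell": "B", "income": 80.0},
    {"id": 5, "cell": "B", "income": None},
    {"id": 6, "cell": "C", "income": None},
]
completed = random_hot_deck(sample, "cell", "income", seed=1)
```

Keeping an `imputed` flag on each record is a design choice of this sketch: it leaves imputed values distinguishable from observed ones, which any later variance adjustment would need.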