Statistics by subject – Statistical methods

All (7 of 7 results)

  • Articles and reports: 12-001-X201300211871
    Description:

    Regression models are routinely used in the analysis of survey data, where a common objective is to identify influential factors associated with certain behavioral, social, or economic indices within a target population. When data are collected through complex surveys, the properties of classical variable selection approaches developed in i.i.d. non-survey settings need to be re-examined. In this paper, we derive a pseudo-likelihood-based BIC criterion for variable selection in the analysis of survey data and suggest a sample-based penalized likelihood approach for its implementation. Sampling weights are appropriately assigned to correct the selection bias caused by the distortion between the sample and the target population. Under a joint randomization framework, we establish the consistency of the proposed selection procedure. The finite-sample performance of the approach is assessed through analysis and computer simulations based on data from the hypertension component of the 2009 Survey on Living with Chronic Diseases in Canada. (A schematic weighted-BIC sketch appears after this list.)

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201200211753
    Description:

    Nonresponse in longitudinal studies often occurs in a nonmonotone pattern. In the Survey of Industrial Research and Development (SIRD), it is reasonable to assume that the nonresponse mechanism is past-value-dependent, in the sense that the response propensity of a study variable at time point t depends on the response status and the observed or missing values of the same variable at time points prior to t. Since this nonresponse is nonignorable, the parametric likelihood approach is sensitive to the specification of parametric models for both the joint distribution of variables at different time points and the nonresponse mechanism. The nonmonotone nonresponse also limits the application of inverse propensity weighting methods. By discarding all observed data from a subject after its first missing value, one can create a dataset with monotone, ignorable nonresponse and then apply established methods for ignorable nonresponse. However, discarding observed data is undesirable and may result in inefficient estimators when a large amount of observed data is discarded. We propose to impute nonrespondents through regression, under imputation models carefully constructed to reflect the past-value-dependent nonresponse mechanism. This method does not require any parametric model for the joint distribution of the variables across time points or for the nonresponse mechanism. The performance of estimated means based on the proposed imputation method is investigated through simulation studies and an empirical analysis of the SIRD data. (A simplified imputation sketch appears after this list.)

    Release date: 2012-12-19

  • Articles and reports: 12-001-X200900211038
    Description:

    We examine how to overcome the overestimation that arises when the generalized weight share method (GWSM) is used in indirect sampling with link nonresponse. Several adjustment methods that incorporate link nonresponse into the GWSM have been constructed, for situations both with and without auxiliary variables. A simulation study on a longitudinal survey is presented using some of the recommended adjustment methods. The simulation results show that these adjusted GWSMs perform well in reducing both estimation bias and variance, with a particularly substantial improvement in bias reduction. (A minimal GWSM sketch appears after this list.)

    Release date: 2009-12-23

  • Technical products: 11-522-X200800010959
    Description:

    The Unified Enterprise Survey (UES) at Statistics Canada is an annual business survey that unifies more than 60 surveys from different industries. Two types of collection follow-up score functions are currently used in the UES data collection. The objective of using a score function is to maximize the economically weighted response rates of the survey in terms of the primary variables of interest, under the constraint of a limited follow-up budget. Since the two types of score functions are based on different methodologies, they could have different impacts on the final estimates.

    This study compares the two types of score functions using collection data from two recent years. For comparison purposes, it applies each score function to the same data and computes, for the published financial and commodity variables, various estimates, their deviation from the pseudo-true value, and their mean square deviation under each method. These estimates of deviation and mean square deviation are then used to measure the impact of each score function on the final estimates of the financial and commodity variables. (A toy score-function sketch appears after this list.)

    Release date: 2009-12-03

  • Articles and reports: 12-001-X200800210756
    Description:

    In longitudinal surveys, nonresponse often occurs in a pattern that is not monotone. We consider the estimation of time-dependent means under the assumption that the nonresponse mechanism is last-value-dependent. Since the last value itself may be missing when nonresponse is nonmonotone, the nonresponse mechanism under consideration is nonignorable. We propose an imputation method that first derives regression imputation models according to the nonresponse mechanism and then applies nonparametric regression imputation. We assume that the longitudinal data follow a Markov chain with finite second-order moments. No other assumption is imposed on the joint distribution of the longitudinal data and their nonresponse indicators. A bootstrap method is applied for variance estimation. Simulation results and an example concerning the Current Employment Survey are presented. (A kernel-imputation sketch appears after this list.)

    Release date: 2008-12-23

  • Articles and reports: 12-001-X20040027750
    Description:

    Intelligent Character Recognition (ICR) is a relatively new technology that has been widely adopted for data capture processing. It was used for the first time at Statistics Canada to process the 2001 Canadian Census of Agriculture, which posed many new challenges, both operational and methodological. This paper presents an overview of the methodological tools used to put an efficient ICR system in place. Since the potential for high levels of error existed at various stages of the operation, Quality Assurance (QA) and Quality Control (QC) methods and procedures were built in to ensure a high degree of accuracy in the captured data. The paper describes these QA/QC methods and their results, shows how quality improvements were achieved in the ICR data capture operation, and identifies the positive impacts of these procedures on the operation.

    Release date: 2005-02-03

  • Articles and reports: 12-001-X199200114492
    Description:

    The scenario considered here is that of a sample survey with two major objectives: (1) identifying, for future follow-up studies, n^* subjects in each of H subdomains; and (2) estimating, at the time the survey is conducted, the level of some characteristic in each of these subdomains. An additional constraint is that the sample design is restricted to single-stage cluster sampling. A variation of single-stage cluster sampling called telescopic single-stage cluster sampling (TSSCS) was proposed in an earlier paper (Levy et al. 1989) as a cost-effective method of identifying n^* individuals in each subdomain; in this article, we investigate the statistical properties of TSSCS for cross-sectional estimation of the level of a population characteristic. In particular, TSSCS is compared with ordinary single-stage cluster sampling (OSSCS) with respect to the reliability of estimates at fixed cost. Motivation for this investigation comes from problems faced during the statistical design of the Shanghai Survey of Alzheimer’s Disease and Dementia (SSADD), an epidemiological study of the prevalence and incidence of Alzheimer’s disease and dementia. (A fixed-cost design sketch appears after this list.)

    Release date: 1992-06-15
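
For the variable-selection paper (12-001-X201300211871), the central computation is a BIC built on a survey-weighted pseudo-log-likelihood. The following is a minimal sketch, assuming a Gaussian working model and a plain log(n) penalty in place of the penalty the paper actually derives; the function names and the exhaustive-search wrapper are illustrative, not the authors' implementation.

```python
import itertools
import numpy as np

def weighted_bic(y, X, w):
    """BIC based on a weighted Gaussian pseudo-log-likelihood."""
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)   # weighted least squares
    resid = y - X @ beta
    n_hat = w.sum()                              # estimated population size
    sigma2 = (w * resid ** 2).sum() / n_hat      # weighted residual variance
    loglik = -0.5 * n_hat * (np.log(2 * np.pi * sigma2) + 1.0)
    # Placeholder penalty: the paper derives the appropriate one.
    return -2.0 * loglik + X.shape[1] * np.log(n_hat)

def select_model(y, X, w):
    """Exhaustive search over predictor subsets; column 0 is the intercept."""
    best_bic, best_cols = np.inf, None
    for k in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), k):
            if cols[0] != 0:
                continue                         # always keep the intercept
            bic = weighted_bic(y, X[:, list(cols)], w)
            if bic < best_bic:
                best_bic, best_cols = bic, cols
    return best_cols, best_bic
```

The weights enter both the fit and the effective sample size, which is how the selection is steered back toward the target population rather than the sample.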
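For the SIRD imputation paper (12-001-X201200211753), the key operation is imputing missing values one time point at a time from values observed earlier. This is a simplified sketch assuming a linear imputation model fitted to units that responded at two consecutive time points; the paper's models condition more carefully on the full response history.

```python
import numpy as np

def impute_sequential(Y):
    """Y: (n_subjects, T) array with np.nan marking nonresponse.
    Imputes column by column, regressing the value at time t on the
    (possibly already imputed) value at time t - 1."""
    Y = Y.copy()
    _, T = Y.shape
    for t in range(1, T):
        obs = ~np.isnan(Y[:, t]) & ~np.isnan(Y[:, t - 1])
        mis = np.isnan(Y[:, t]) & ~np.isnan(Y[:, t - 1])
        if obs.sum() < 2 or not mis.any():
            continue
        b1, b0 = np.polyfit(Y[obs, t - 1], Y[obs, t], 1)  # slope, intercept
        Y[mis, t] = b0 + b1 * Y[mis, t - 1]
    return Y
```

Because each imputed value feeds the next step, nonmonotone gaps are handled without discarding the data observed after a subject's first missing value.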
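For the GWSM paper (12-001-X200900211038), a bare-bones version of the weight share shows where the overestimation comes from. The inputs and names below are illustrative assumptions.

```python
import numpy as np

def gwsm_weights(d, links_sampled, total_links):
    """d: (m,) design weights of the m sampled frame units.
    links_sampled: (m, n) 0/1 matrix of observed links from sampled
        frame units to the n target-population units.
    total_links: (n,) total number of links each target unit has in
        the whole frame population (the GWSM denominator).
    """
    numer = (d[:, None] * links_sampled).sum(axis=0)
    # If link nonresponse shrinks total_links, the denominator is too
    # small, so the shared weights, and hence the estimates, are inflated.
    return numer / total_links
```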
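For the UES score-function study (11-522-X200800010959), a collection follow-up score function can be viewed as a greedy prioritization rule operating under a budget. The scoring and cost fields below are invented for illustration and are not the UES formulas.

```python
def followup_list(units, budget):
    """units: list of dicts with 'id', 'econ_weight' (e.g., design weight
    times expected economic contribution), and 'cost' (expected follow-up
    cost). Returns the ids to follow up within the budget."""
    ranked = sorted(units, key=lambda u: u["econ_weight"], reverse=True)
    chosen, spent = [], 0.0
    for u in ranked:
        if spent + u["cost"] <= budget:
            chosen.append(u["id"])
            spent += u["cost"]
    return chosen
```

Two score functions built on different methodologies would produce different rankings here, which is exactly why the study measures their downstream effect on the final estimates.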
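For the last-value-dependent imputation paper (12-001-X200800210756), the sketch below pairs a Gaussian-kernel regression imputation of the current value from the last observed value with a naive bootstrap for the variance of the imputed mean. The bandwidth and the resampling scheme are simplifying assumptions, not the paper's procedure.

```python
import numpy as np

def kernel_impute_mean(x, y, x_mis, h=1.0):
    """x, y: last observed value and current value for respondents;
    x_mis: last observed values for nonrespondents."""
    def m_hat(x0):
        k = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
        return (k * y).sum() / k.sum()
    y_imp = np.array([m_hat(x0) for x0 in x_mis])
    return np.concatenate([y, y_imp]).mean()

def bootstrap_var(x, y, x_mis, B=200, seed=0):
    """Naive bootstrap variance of the imputed mean."""
    rng = np.random.default_rng(seed)
    n, m = len(y), len(x_mis)
    stats = []
    for _ in range(B):
        i = rng.integers(0, n, n)    # resample respondents
        j = rng.integers(0, m, m)    # resample nonrespondents
        stats.append(kernel_impute_mean(x[i], y[i], x_mis[j]))
    return np.var(stats, ddof=1)
```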
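For the cluster-sampling paper (12-001-X199200114492), the fixed-cost trade-off can be illustrated with the textbook approximation that the variance of a mean under single-stage cluster sampling is proportional to (1 + (m - 1)ρ) / (nm), where n is the number of clusters, m the cluster take, and ρ the intraclass correlation. The cost figures and ρ below are invented; the paper's TSSCS-versus-OSSCS comparison is more refined.

```python
def clusters_affordable(budget, cost_cluster, cost_subject, m):
    """Clusters affordable when each cluster costs cost_cluster plus
    cost_subject for each of its m sampled subjects."""
    return budget // (cost_cluster + cost_subject * m)

def rel_variance(n, m, rho):
    """Relative variance of a mean: (1 + (m - 1) * rho) / (n * m)."""
    return (1 + (m - 1) * rho) / (n * m)

budget, c1, c2, rho = 10_000, 50, 5, 0.05
for m in (5, 10, 20):
    n = clusters_affordable(budget, c1, c2, m)
    print(f"m={m:2d}  clusters={n:3d}  rel. var={rel_variance(n, m, rho):.5f}")
```

A larger take m buys more subjects per dollar but inflates the variance through ρ, which is the tension any fixed-cost comparison of cluster designs must resolve.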
