Statistics by subject – Statistical methods

All (17 of 17 results)

  • Articles and reports: 12-001-X199300214452
    Description:

    Surveys across time can serve many objectives. The first half of the paper reviews the abilities of alternative survey designs across time (repeated surveys, panel surveys, rotating panel surveys and split panel surveys) to meet these objectives. The second half concentrates on panel surveys. It discusses the decisions that need to be made in designing a panel survey, the problems of wave nonresponse, time-in-sample bias and the seam effect, and some methods for the longitudinal analysis of panel survey data.

    Release date: 1993-12-15

  • Articles and reports: 12-001-X199300214456
    Description:

    This study is based on the use of superpopulation models to anticipate, before data collection, the variance of a measure by ratio sampling. The method, based on models that are both simple and fairly realistic, produces expressions of varying complexity and then optimizes them, in some cases rigorously, in others approximately. The solution to the final problem discussed points up a rarely considered factor in sample design optimization: the cost related to collecting individual information.

    Release date: 1993-12-15

  • Articles and reports: 12-001-X199300214453
    Description:

    A generalized concept is presented for all of the commonly used methods of forest sampling. The concept views the forest as a two-dimensional picture which is cut up into pieces like a jigsaw puzzle, with the pieces defined by the individual selection probabilities of the trees in the forest. This concept results in a finite number of independently selected sample units, in contrast to every other generalized conceptualization of forest sampling presented to date.

    Release date: 1993-12-15

  • Articles and reports: 12-001-X199300214455
    Description:

    Post-stratification is a common technique for improving precision of estimators by using data items not available at the design stage of a survey. In large, complex samples, the vector of Horvitz-Thompson estimators of survey target variables and of post-stratum population sizes will, under appropriate conditions, be approximately multivariate normal. This large sample normality leads to a new post-stratified regression estimator, which is analogous to the linear regression estimator in simple random sampling. We derive the large sample design bias and mean squared errors of this new estimator, the standard post-stratified estimator, the Horvitz-Thompson estimator, and a ratio estimator. We use both real and artificial populations to study empirically the conditional and unconditional properties of the estimators in multistage sampling.

    Release date: 1993-12-15
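The estimators compared in this abstract can be illustrated with a small simulation. This is a hedged sketch under simple random sampling with fully simulated data; the population size, strata and variable names are our own assumptions, not taken from the paper, and the paper's new post-stratified regression estimator is not reproduced.

```python
import numpy as np

# Illustrative comparison of the Horvitz-Thompson estimator of a population
# total with the standard post-stratified estimator under SRS.
rng = np.random.default_rng(1993)

N = 10_000                              # population size (assumed)
strata = rng.integers(0, 4, size=N)     # post-stratum labels, known only after sampling
y = 50 + 10 * strata + rng.normal(0, 5, size=N)

n = 500
sample = rng.choice(N, size=n, replace=False)
pi = n / N                              # inclusion probability under SRS

# Horvitz-Thompson estimator of the total: sum of y_k / pi_k over the sample
t_ht = y[sample].sum() / pi

# Standard post-stratified estimator: N_h times the sample mean within each
# post-stratum, using the known post-stratum population sizes N_h
t_ps = 0.0
for h in range(4):
    N_h = (strata == h).sum()
    in_h = sample[strata[sample] == h]
    t_ps += N_h * y[in_h].mean()
```

Because the post-strata here explain most of the variation in y, t_ps is typically closer to the true total y.sum() than t_ht is.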

  • Articles and reports: 12-001-X199300214457
    Description:

    The maximum likelihood estimation of a non-linear benchmarking model, proposed by Laniel and Fyfe (1989; 1990), is considered. This model takes into account the biases and sampling errors associated with the original series. Since the maximum likelihood estimators of the model parameters are not obtainable in closed form, two iterative procedures for finding the maximum likelihood estimates are discussed. Closed-form expressions for the asymptotic variances and covariances of the benchmarked series and of the fitted values are also provided. The methodology is illustrated using published Canadian retail trade data.

    Release date: 1993-12-15

  • Articles and reports: 12-001-X199300214459
    Description:

    Record linkage is the matching of records containing data on individuals, businesses or dwellings when a unique identifier is not available. Methods used in practice involve classification of record pairs as links and non-links using an automated procedure based on the theoretical framework introduced by Fellegi and Sunter (1969). The estimation of classification error rates is an important issue. Fellegi and Sunter provide a method for calculation of classification error rate estimates as a direct by-product of linkage. These model-based estimates are easier to produce than the estimates based on manual matching of samples that are typically used in practice. Properties of model-based classification error rate estimates obtained using three estimators of model parameters are compared.

    Release date: 1993-12-15
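The Fellegi-Sunter classification framework referenced above can be sketched minimally: each comparison field contributes an agreement weight log2(m/u) or a disagreement weight log2((1-m)/(1-u)), and the total weight is compared against two thresholds. The m- and u-probabilities and thresholds below are made-up illustrative values, not estimates from the paper.

```python
import math

# P(field agrees | records are a true match) and
# P(field agrees | records are a non-match) -- illustrative values only
m = {"surname": 0.95, "birth_year": 0.90, "postal_code": 0.85}
u = {"surname": 0.01, "birth_year": 0.05, "postal_code": 0.02}

def pair_weight(agreement):
    """agreement: dict field -> bool (True if the field agrees on the pair)."""
    w = 0.0
    for field, agrees in agreement.items():
        if agrees:
            w += math.log2(m[field] / u[field])
        else:
            w += math.log2((1 - m[field]) / (1 - u[field]))
    return w

def classify(w, upper=6.0, lower=0.0):
    # Pairs between the thresholds would go to clerical review
    if w >= upper:
        return "link"
    if w <= lower:
        return "non-link"
    return "clerical review"

w_all = pair_weight({"surname": True, "birth_year": True, "postal_code": True})
w_none = pair_weight({"surname": False, "birth_year": False, "postal_code": False})
```

In practice the m- and u-parameters must themselves be estimated, which is exactly where the classification error rate estimators compared in the paper come in.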

  • Articles and reports: 12-001-X199300214460
    Description:

    Methods for estimating response bias in surveys require “unbiased” remeasurements for at least a subsample of observations. The usual estimator of response bias is the difference between the mean of the original observations and the mean of the unbiased observations. In this article, we explore a number of alternative estimators of response bias derived from a model prediction approach. The assumed sampling design is a stratified two-phase design implementing simple random sampling in each phase. We assume that the characteristic, y, is observed for each unit selected in phase 1 while the true value of the characteristic, μ, is obtained for each unit in the subsample selected at phase 2. We further assume that an auxiliary variable x is known for each unit in the phase 1 sample and that the population total of x is known. A number of models relating y, μ and x are assumed which yield alternative estimators of E(y - μ), the response bias. The estimators are evaluated using a bootstrap procedure for estimating variance, bias, and mean squared error. Our bootstrap procedure is an extension of the Bickel-Freedman single phase method to the case of a stratified two-phase design. As an illustration, the methodology is applied to data from the National Agricultural Statistics Service reinterview program. For these data, we show that the usual difference estimator is outperformed by the model-assisted estimator suggested by Särndal, Swensson and Wretman (1991), thus indicating that improvements over the traditional estimator are possible using the model prediction approach.

    Release date: 1993-12-15
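The "usual difference estimator" of response bias described above, together with a naive single-phase bootstrap of its standard error, can be sketched as follows. All data are simulated and the design is unstratified; the paper's model-based alternatives and its two-phase extension of the Bickel-Freedman bootstrap are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Phase 1: reported values y for n1 units; the true value mu is latent
n1 = 2_000
mu = rng.normal(100, 15, size=n1)          # true values (observed only in phase 2)
y = mu + 3 + rng.normal(0, 5, size=n1)     # reported values with a +3 response bias

# Phase 2: "unbiased" remeasurement of a random subsample
n2 = 400
phase2 = rng.choice(n1, size=n2, replace=False)

# Usual difference estimator of response bias: mean of (y - mu) on the subsample
diffs = y[phase2] - mu[phase2]
bias_hat = diffs.mean()

# Naive bootstrap of its standard error (resampling the phase-2 differences)
boot = np.array([rng.choice(diffs, size=n2, replace=True).mean()
                 for _ in range(1000)])
se_hat = boot.std(ddof=1)
```

With measurement noise of standard deviation 5 and n2 = 400, the standard error of bias_hat is about 0.25, which the bootstrap should roughly recover.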

  • Articles and reports: 12-001-X199300214454
    Description:

    This study covers imperfect frames in which no population unit has been excluded from the frame, but an unspecified number of population units may have been included in the list an unspecified number of times, each time with a separate identification. When no auxiliary information on the units in the imperfect frame is assumed to be available, it is established that for estimation of a population ratio or mean, the mean square errors of estimators based on the imperfect frame are less than those based on the perfect frame under simple random sampling when the sampling fractions of the perfect and imperfect frames are the same. For estimation of a population total, however, this is not always true. Also, there are situations in which estimators of a ratio, a mean or a total based on a smaller sampling fraction from the imperfect frame can have a smaller mean square error than those based on a larger sampling fraction from the perfect frame.

    Release date: 1993-12-15

  • Articles and reports: 12-001-X199300114477
    Description:

    A record-linkage process brings together records from two files into pairs of two records, one from each file, for the purpose of comparison. Each record represents an individual. The status of the pair is a “matched pair” status if the two records in the pair represent the same individual. The status is an “unmatched pair” status if the two records do not represent the same individual. The record-linkage process is governed by an underlying probabilistic process. A record-linkage rule infers the status of each pair of records based on the value of the comparison. The pair is declared a “link” if the inferred status is that of a matched pair, and it is declared a “non-link” if the inferred status is that of an unmatched pair. The discrimination power of a record-linkage rule is the capacity of the rule to designate a maximum number of matched pairs as links, while keeping the rate of unmatched pairs designated as links to a minimum. In general, to construct a discriminatory record-linkage rule, some assumptions must be made on the structure of the underlying probabilistic process. In most of the existing literature, it is assumed that the underlying probabilistic process is an instance of the conditional independence latent class model. However, in many situations, this assumption is false. In fact, many underlying probabilistic processes do not exhibit key properties associated with conditional independence latent class models. The paper introduces more general models. In particular, latent class models with dependencies are studied and it is shown how they can improve the discrimination power of particular record-linkage rules.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199300114472
    Description:

    Two-stage random digit dialing procedures as developed by Mitofsky and elaborated by Waksberg are widely used in telephone sampling of the U.S. household population. Current alternative approaches have, relative to this procedure, coverage and cost deficiencies. These deficiencies are addressed through telephone sample designs which use listed number information to improve the cost-efficiency of random digit dialing. The telephone number frame is divided into a stratum in which listed number information is available at the 100-bank level and one for which no such information is available. The efficiencies of various sampling schemes for this stratified design are compared to simple random digit dialing and the Mitofsky-Waksberg technique. Gains in efficiency are demonstrated for nearly all such designs. Simplifying assumptions about the values of population parameters in each stratum are shown to have little overall impact on the estimated efficiency.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199300114471
    Description:

    Binomial-Poisson and Poisson-Poisson sampling are introduced for use in forest sampling. Several estimators of the population total are discussed for these designs. Simulation comparisons of the properties of the estimators were made for three small forestry populations. A modification of the standard estimator used for Poisson sampling and a new estimator, called a modified Srivastava estimator, appear to be most efficient. The latter is unfortunately badly biased for all three populations.

    Release date: 1993-06-15
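Poisson sampling, the design underlying the estimators above, can be sketched simply: each unit enters the sample independently with its own inclusion probability, so the realized sample size is random. The sketch below shows the Horvitz-Thompson estimator and a simple ratio adjustment for the random sample size, on simulated data; it is not the paper's modified Srivastava estimator, and all quantities are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

N = 5_000
x = rng.gamma(2.0, 10.0, size=N)        # auxiliary size measure (e.g. tree basal area)
y = 2.0 * x + rng.normal(0, 5, size=N)  # study variable, roughly proportional to x

n_expected = 250
pi = np.minimum(1.0, n_expected * x / x.sum())   # inclusion prob. proportional to size

# Poisson sampling: independent Bernoulli draw for each unit
in_sample = rng.random(N) < pi

# Horvitz-Thompson estimator of the population total
t_ht = (y[in_sample] / pi[in_sample]).sum()

# Ratio-adjusted estimator: rescale by expected / realized sample size,
# which removes much of the variability due to the random n
t_mod = t_ht * n_expected / in_sample.sum()
```

The adjustment illustrates the general idea behind modifying the standard Poisson-sampling estimator: conditioning on the realized sample size trades a little bias for a substantial variance reduction.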

  • Articles and reports: 12-001-X199300114476
    Description:

    This paper focuses on how to deal with record linkage errors when engaged in regression analysis. Recent work by Rubin and Belin (1991) and by Winkler and Thibaudeau (1991) provides the theory, computational algorithms, and software necessary for estimating matching probabilities. These advances allow us to update the work of Neter, Maynes, and Ramanathan (1965). Adjustment procedures are outlined and some successful simulations are described. Our results are preliminary and intended largely to stimulate further work.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199300114473
    Description:

    Double sampling is a common alternative to simple random sampling when there are expected to be gains from using stratified sampling, but the units cannot be assigned to strata prior to sampling. It is assumed throughout that the survey objective is estimation of the finite population mean. We compare simple random sampling and three allocation methods for double sampling: (a) proportional, (b) Rao’s (Rao 1973a, b) and (c) optimal. There is also an investigation of the effect on sample size selection of misspecification of an important design parameter.

    Release date: 1993-06-15
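Double sampling for stratification with proportional phase-2 allocation, as in option (a) above, can be sketched like this: a large, cheap phase-1 sample estimates the unknown stratum weights, and a smaller phase-2 subsample measures y. The population, stratum means and sample sizes below are our own illustrative assumptions; Rao's and the optimal allocations are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 50_000
stratum = rng.choice([0, 1, 2], size=N, p=[0.5, 0.3, 0.2])   # unknown in advance
y = np.where(stratum == 0, 20, np.where(stratum == 1, 50, 90)) \
    + rng.normal(0, 4, size=N)

# Phase 1: classify a large SRS into strata (cheap measurement)
n1 = 4_000
s1 = rng.choice(N, size=n1, replace=False)
w_hat = np.array([(stratum[s1] == h).mean() for h in range(3)])  # estimated W_h

# Phase 2: proportional allocation of an expensive subsample, where y is measured
n2 = 400
ybar_h = []
for h in range(3):
    units_h = s1[stratum[s1] == h]
    m_h = max(2, int(round(n2 * w_hat[h])))
    s2_h = rng.choice(units_h, size=m_h, replace=False)
    ybar_h.append(y[s2_h].mean())

# Double-sampling estimator of the population mean: sum_h W_hat_h * ybar_h
ybar_ds = float((w_hat * np.array(ybar_h)).sum())
```

The true mean here is 0.5(20) + 0.3(50) + 0.2(90) = 43, and the double-sampling estimator should land close to it despite stratum membership being unknown before phase 1.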

  • Articles and reports: 12-001-X199300114479
    Description:

    Matching records in different administrative data bases is a useful tool for conducting epidemiological studies to study relationships between environmental hazards and health status. With large data bases, sophisticated computerized record linkage algorithms can be used to evaluate the likelihood of a match between two records based on a comparison of one or more identifying variables for those records. Since matching errors are inevitable, consideration needs to be given to the effects of such errors on statistical inferences based on the linked files. This article provides an overview of record linkage methodology, and a discussion of the statistical issues associated with linkage errors.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199300114474
    Description:

    The need for standards for gathering and reporting information on nonresponse across surveys within a statistical agency is discussed. Standards being adopted at Statistics Canada are then described, along with measures to reduce nonresponse undertaken at different stages in the design of Statistics Canada surveys. These points are illustrated by examining the nonresponse experiences of two major surveys at Statistics Canada.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199300114478
    Description:

    Record linkage refers to the use of an algorithmic technique for identifying pairs of records in separate data files that correspond to the same individual. This paper discusses a framework for evaluating sources of variation in record linkage based on viewing the procedure as a “black box” that takes input data and produces output (a set of declared matched pairs) that has certain properties. We illustrate the idea with a factorial experiment using census/post-enumeration survey data to assess the influence of a variety of factors thought to affect the accuracy of the procedure. The evaluation of record linkage becomes a standard statistical problem using this experimental framework. The investigation provides answers to several research questions, and it is argued that taking an experimental approach similar to that offered here is essential if progress is to be made in understanding the factors that contribute to the error properties of record-linkage procedures.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199300114475
    Description:

    In the creation of micro-simulation databases which are frequently used by policy analysts and planners, several datafiles are combined by statistical matching techniques for enriching the host datafile. This process requires the conditional independence assumption (CIA) which could lead to serious bias in the resulting joint relationships among variables. Appropriate auxiliary information could be used to avoid the CIA. In this report, methods of statistical matching corresponding to three methods of imputation, namely, regression, hot deck, and log linear, with and without auxiliary information are considered. The log linear methods consist of adding categorical constraints to either the regression or hot deck methods. Based on an extensive simulation study with synthetic data, sensitivity analyses for departures from the CIA are performed and gains from using auxiliary information are discussed. Different scenarios for the underlying distribution and relationships, such as symmetric versus skewed data and proxy versus nonproxy auxiliary data, are created using synthetic data. Some recommendations on the use of statistical matching methods are also made. Specifically, it was confirmed that the CIA could be a serious limitation which could be overcome by the use of appropriate auxiliary information. Hot deck methods were found to be generally preferable to regression methods. Also, when auxiliary information is available, log linear categorical constraints can improve performance of hot deck methods. This study was motivated by concerns about the use of the CIA in the construction of the Social Policy Simulation Database at Statistics Canada.

    Release date: 1993-06-15
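The hot-deck statistical matching discussed above, and the role of the conditional independence assumption (CIA), can be illustrated on toy data. File A holds (x, y), file B holds (x, z); each B record receives the y of its nearest-x donor from A. The files, variables and distributions are our own assumptions, not the report's synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1_000
x_a = rng.normal(0, 1, size=n)
y_a = 2 * x_a + rng.normal(0, 0.5, size=n)   # host file A: (x, y)
x_b = rng.normal(0, 1, size=n)
z_b = -x_b + rng.normal(0, 0.5, size=n)      # recipient file B: (x, z)

# Hot deck: for each B record, take y from the A record with the closest x
donor = np.abs(x_a[:, None] - x_b[None, :]).argmin(axis=0)
y_imputed = y_a[donor]

# The matched file reproduces the y-x relationship well...
corr_yx = np.corrcoef(y_imputed, x_b)[0, 1]

# ...but the y-z relationship in the matched file is entirely the one
# induced through x: any dependence of y and z beyond x is lost (the CIA)
corr_yz = np.corrcoef(y_imputed, z_b)[0, 1]
```

Here corr_yz is strongly negative only because both y and z depend on x; if y and z were also directly related given x, the hot-deck match could not recover that, which is the bias the report's auxiliary-information methods aim to avoid.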
