Weighting and estimation


Results

All (17) (0 to 10 of 17 results)

  • Articles and reports: 12-001-X20060029547
    Description:

    Calibration weighting can be used to adjust for unit nonresponse and/or coverage errors under appropriate quasi-randomization models. Alternative calibration adjustments that are asymptotically identical in a purely sampling context can diverge when used in this manner. Introducing instrumental variables into calibration weighting makes it possible for nonresponse (say) to be a function of a set of characteristics other than those in the calibration vector. When the calibration adjustment has a nonlinear form, a variant of the jackknife can remove the need for iteration in variance estimation.

    Release date: 2006-12-21
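
A minimal sketch of linear calibration weighting, the device the abstract builds on, may help fix ideas. Everything here (the data, the `linear_calibration_weights` helper, the option of an instrument matrix `Z` distinct from the calibration variables `X`) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def linear_calibration_weights(d, X, totals, Z=None):
    """Calibrated weights w_i = d_i * (1 + z_i' lam), with lam chosen so
    that the weighted sample totals of the columns of X hit `totals`.
    Z defaults to X (classical GREG-type calibration); passing a
    different instrument matrix Z mimics the instrumental-variable idea.
    Z must have as many columns as X."""
    Z = X if Z is None else Z
    A = X.T @ (d[:, None] * Z)      # system matrix  sum_i d_i x_i z_i'
    b = totals - X.T @ d            # calibration gap
    lam = np.linalg.solve(A, b)
    return d * (1.0 + Z @ lam)

rng = np.random.default_rng(1)
n, N = 200, 5000
X = np.column_stack([np.ones(n), rng.gamma(2.0, 3.0, n)])
d = np.full(n, N / n)               # design weights (SRS for simplicity)
totals = np.array([N, 6.0 * N])     # "known" population totals of X
w = linear_calibration_weights(d, X, totals)
print(X.T @ w)                      # reproduces `totals` exactly
```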

  • Articles and reports: 12-001-X20060029549
    Description:

    In this article, we propose a Bernoulli-type bootstrap method that can easily handle multi-stage stratified designs where sampling fractions are large, provided simple random sampling without replacement is used at each stage. The method provides a set of replicate weights which yield consistent variance estimates for both smooth and non-smooth estimators. The method's strength is in its simplicity. It can easily be extended to any number of stages without much complication. The main idea is to either keep or replace a sampling unit at each stage with preassigned probabilities, to construct the bootstrap sample. A limited simulation study is presented to evaluate performance and, as an illustration, we apply the method to the 1997 Japanese National Survey of Prices.

    Release date: 2006-12-21
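
The keep-or-replace idea is easy to sketch for a single stage. This is a hypothetical illustration only: the keep probability below is a placeholder, whereas the paper derives the preassigned probabilities that make the replicate variance estimates consistent under SRSWOR at each stage:

```python
import numpy as np

def bernoulli_bootstrap_replicate(sample, keep_prob, rng):
    """One keep-or-replace replicate for a single-stage sample: each
    unit is kept with probability keep_prob and otherwise replaced by a
    unit redrawn with replacement from the original sample.  keep_prob
    is a placeholder, not the paper's derived value."""
    keep = rng.random(len(sample)) < keep_prob
    out = sample.copy()
    out[~keep] = rng.choice(sample, size=int((~keep).sum()), replace=True)
    return out

rng = np.random.default_rng(7)
y = rng.normal(50.0, 10.0, size=100)     # pretend SRSWOR sample
reps = [bernoulli_bootstrap_replicate(y, 0.8, rng).mean()
        for _ in range(1000)]
print(np.var(reps, ddof=1))              # bootstrap variance of the mean
```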

  • Articles and reports: 12-001-X20060029551
    Description:

    When selecting a survey sample, one sometimes does not have a frame containing the desired collection units, but rather another frame of units linked in some way to those collection units. One can then select a sample from the available frame and use the links between the two to produce estimates for the desired target population. This approach is known as Indirect Sampling.

    Estimation for a target population surveyed through Indirect Sampling can be a considerable challenge, in particular when the links between the units of the two populations are not one-to-one. The difficulty lies chiefly in associating a selection probability, or an estimation weight, with each surveyed unit of the target population. To solve this type of estimation problem, the Generalized Weight Share Method (GWSM) was developed by Lavallée (1995, 2002). The GWSM provides an estimation weight for every surveyed unit of the target population.

    This paper first describes Indirect Sampling, which constitutes the foundation of the GWSM. Second, an overview of the GWSM is given, formulated in a theoretical framework using matrix notation. Third, we present some properties of the GWSM, such as unbiasedness and transitivity. Fourth, we consider the special case where the links between the two populations are expressed by indicator variables. Fifth, some typical linkage structures are studied to assess their impact on the GWSM. Finally, we consider the problem of optimality: we obtain weights that are optimal in a weak sense (for specific values of the variable of interest), and conditions under which these weights are also optimal in a strong sense, that is, independent of the variable of interest.

    Release date: 2006-12-21
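
A minimal sketch of the basic weight share idea with equal link weights (the paper's matrix formulation is more general). The toy frame, links, and design weights are invented for illustration:

```python
import numpy as np

def gwsm_weights(L, in_sample, d):
    """Basic generalized weight share with equal link weights.
    L[j, i] = 1 if frame unit j is linked to target unit i;
    in_sample[j] marks selected frame units and d[j] is their design
    weight.  Each target unit i receives
        w_i = (sum of d_j over sampled frame units linked to i)
              / (total number of frame units linked to i),
    i.e. sampled frame units share their weight across their links."""
    num = (d * in_sample) @ L       # weight flowing in from the sample
    den = L.sum(axis=0)             # links to each target unit
    return num / den

# Toy frame of 6 units linked (not one-to-one) to 3 target units.
L = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
in_sample = np.array([True, False, True, True, False, True])
d = np.full(6, 6 / 4)               # SRS of 4 frame units out of 6
print(gwsm_weights(L, in_sample, d))
```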

  • Articles and reports: 12-001-X20060029553
    Description:

    Félix-Medina and Thompson (2004) proposed a variant of Link-tracing sampling in which it is assumed that a portion of the population, not necessarily the major portion, is covered by a frame of disjoint sites where members of the population can be found with high probabilities. A sample of sites is selected and the people in each of the selected sites are asked to nominate other members of the population. They proposed maximum likelihood estimators of the population sizes which perform acceptably provided that for each site the probability that a member is nominated by that site, called the nomination probability, is not small. In this research we consider Félix-Medina and Thompson's variant and propose three sets of estimators of the population sizes derived under the Bayesian approach. Two of the sets of estimators were obtained using improper prior distributions of the population sizes, and the other using Poisson prior distributions. However, we use the Bayesian approach only to assist us in the construction of estimators, while inferences about the population sizes are made under the frequentist approach. We propose two types of partly design-based variance estimators and confidence intervals. One of them is obtained using a bootstrap and the other using the delta method along with the assumption of asymptotic normality. The results of a simulation study indicate that (i) when the nomination probabilities are not small each of the proposed sets of estimators performs well and very similarly to maximum likelihood estimators; (ii) when the nomination probabilities are small the set of estimators derived using Poisson prior distributions still performs acceptably and does not have the problems of bias that maximum likelihood estimators have, and (iii) the previous results do not depend on the size of the fraction of the population covered by the frame.

    Release date: 2006-12-21
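
The paper's estimators are specific to link-tracing designs, but the two variance devices it mentions, a bootstrap and the delta method with asymptotic normality, can be illustrated generically. The estimator and data below are stand-ins, not the paper's population-size estimators:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.poisson(8.0, size=200)           # stand-in data, not nomination counts
est = np.log(y.mean())                   # some nonlinear estimator, as an example

# Delta method: Var(log ybar) ~ Var(ybar) / ybar^2, with normal quantiles.
se = np.sqrt(y.var(ddof=1) / len(y)) / y.mean()
print("delta:    ", est - 1.96 * se, est + 1.96 * se)

# Bootstrap percentile interval for the same estimator.
boots = [np.log(rng.choice(y, size=len(y), replace=True).mean())
         for _ in range(2000)]
print("bootstrap:", np.percentile(boots, [2.5, 97.5]))
```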

  • Articles and reports: 12-001-X20060019255
    Description:

    In this paper, we consider the estimation of quantiles using the calibration paradigm. The proposed methodology relies on an approach similar to the one leading to the original calibration estimators of Deville and Särndal (1992). An appealing property of the new methodology is that it is not necessary to know the values of the auxiliary variables for all units in the population. It suffices instead to know the corresponding quantiles for the auxiliary variables. When the quadratic metric is adopted, an analytic representation of the calibration weights is obtained. In this situation, the weights are similar to those leading to the generalized regression (GREG) estimator. Variance estimation and construction of confidence intervals are discussed. In a small simulation study, a calibration estimator is compared to other popular estimators for quantiles that also make use of auxiliary information.

    Release date: 2006-07-20
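
A rough sketch of the idea under the quadratic metric: calibrate the design weights so that the weighted sample distribution of the auxiliary variable matches its known population quartiles, then read a quantile of the study variable off the weighted empirical distribution. The data and the choice of quartiles as benchmarks are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def weighted_quantile(y, w, p):
    """p-th quantile of the weighted empirical distribution of y."""
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / w.sum()
    return y[order][np.searchsorted(cum, p)]

rng = np.random.default_rng(5)
N, n = 10000, 300
x_pop = rng.lognormal(3.0, 0.5, N)       # auxiliary variable
idx = rng.choice(N, n, replace=False)
x, d = x_pop[idx], np.full(n, N / n)
y = 2.0 * x + rng.normal(0, 20, n)       # study variable, sample only

# Calibrate on the indicators 1(x <= population quartile of x): the
# benchmark "totals" are N * (0.25, 0.50, 0.75), plus the count N itself.
qx = np.quantile(x_pop, [0.25, 0.5, 0.75])
X = np.column_stack([np.ones(n)] + [(x <= q).astype(float) for q in qx])
totals = np.array([N, 0.25 * N, 0.50 * N, 0.75 * N])
A = X.T @ (d[:, None] * X)
lam = np.linalg.solve(A, totals - X.T @ d)
w = d * (1.0 + X @ lam)                  # quadratic-metric calibration weights

print(weighted_quantile(y, w, 0.5))      # calibrated median estimate
```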

  • Articles and reports: 12-001-X20060019256
    Description:

    In some situations the sample design of a survey is rather complex, consisting of fundamentally different designs in different domains. The design effect for estimates based upon the total sample is a weighted sum of the domain-specific design effects. We derive these weights under an appropriate model and illustrate their use with data from the European Social Survey (ESS).

    Release date: 2006-07-20
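
As a hypothetical illustration of the combination rule, with made-up domain design effects and naive sample-share weights standing in for the model-derived weights of the paper:

```python
# Combining domain-specific design effects into an overall design effect
# as a weighted sum.  The weights W_h used here (relative sample shares)
# are a naive placeholder; the paper derives the appropriate weights
# under a model.
deff = {"domain_A": 1.8, "domain_B": 1.2, "domain_C": 2.5}   # made-up deffs
n_h  = {"domain_A": 900, "domain_B": 600, "domain_C": 500}   # made-up sizes

n = sum(n_h.values())
deff_total = sum((n_h[h] / n) * deff[h] for h in deff)
print(f"overall design effect ~ {deff_total:.2f}")
```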

  • Articles and reports: 12-001-X20060019258
    Description:

    This paper proposes a cost-effective strategy for estimating the intercensal unemployment rate at the provincial level in Iran. Drawing on small area estimation (SAE) methods, the strategy is based on a single sampling at the national level. Three methods, synthetic, composite, and empirical Bayes estimation, are used to obtain the indirect estimates of interest for the year 1996. The findings not only confirm the adequacy of the suggested strategy, but also indicate that the composite and empirical Bayes estimators perform well and similarly.

    Release date: 2006-07-20
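
A composite estimator of the kind mentioned is just a convex combination of a direct and a synthetic estimate. The sketch below uses invented numbers and a placeholder weighting rule, not the paper's specification:

```python
import numpy as np

# Composite small-area estimator: phi * direct + (1 - phi) * synthetic.
# The weights phi shrink toward the synthetic estimate when the direct
# variance is large; all values here are hypothetical.
direct    = np.array([0.11, 0.09, 0.15])     # direct area estimates
var_dir   = np.array([0.0004, 0.0009, 0.0025])
synthetic = np.array([0.10, 0.10, 0.12])     # synthetic (model) estimates
var_syn   = 0.0010                           # assumed common model variance

phi = var_syn / (var_syn + var_dir)          # more direct weight when precise
composite = phi * direct + (1 - phi) * synthetic
print(composite)
```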

  • Articles and reports: 12-001-X20060019260
    Description:

    This paper considers the use of imputation and weighting to correct for measurement error in the estimation of a distribution function. The paper is motivated by the problem of estimating the distribution of hourly pay in the United Kingdom, using data from the Labour Force Survey. Errors in measurement lead to bias and the aim is to use auxiliary data, measured accurately for a subsample, to correct for this bias. Alternative point estimators are considered, based upon a variety of imputation and weighting approaches, including fractional imputation, nearest neighbour imputation, predictive mean matching and propensity score weighting. Properties of these point estimators are then compared both theoretically and by simulation. A fractional predictive mean matching imputation approach is advocated. It performs similarly to propensity score weighting, but displays slight advantages of robustness and efficiency.

    Release date: 2006-07-20
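
A minimal sketch of (non-fractional) predictive mean matching, the building block of the advocated approach. The regression, donor pool, and `k` nearest-donor rule are illustrative assumptions:

```python
import numpy as np

def pmm_impute(y_donor, x_donor, x_recipient, k, rng):
    """Basic predictive mean matching: regress y on x among donors (the
    accurately measured subsample), predict for everyone, and impute
    each recipient with the observed y of a donor whose predicted value
    is among the k nearest.  A fractional version would retain several
    donors with fractional weights instead of drawing one."""
    X = np.column_stack([np.ones(len(x_donor)), x_donor])
    beta = np.linalg.lstsq(X, y_donor, rcond=None)[0]
    pred_d = X @ beta
    pred_r = np.column_stack([np.ones(len(x_recipient)), x_recipient]) @ beta
    out = np.empty(len(x_recipient))
    for i, p in enumerate(pred_r):
        nearest = np.argsort(np.abs(pred_d - p))[:k]   # k closest donors
        out[i] = y_donor[rng.choice(nearest)]          # draw one of them
    return out

rng = np.random.default_rng(11)
x_d = rng.normal(10, 2, 150); y_d = 1.5 * x_d + rng.normal(0, 1, 150)
x_r = rng.normal(10, 2, 400)                 # error-prone cases to impute
print(pmm_impute(y_d, x_d, x_r, k=5, rng=rng)[:5])
```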

  • Articles and reports: 12-001-X20060019263
    Description:

    In small area estimation, area level models such as the Fay-Herriot model (Fay and Herriot 1979) are widely used to obtain efficient model-based estimators for small areas. The sampling error variances are customarily assumed to be known in the model. In this paper we consider the situation where the sampling error variances are estimated individually by direct estimators. A full hierarchical Bayes (HB) model is constructed for the direct survey estimators and the sampling error variance estimators. The Gibbs sampling method is employed to obtain the small area HB estimators. The proposed HB approach automatically takes account of the extra uncertainty from estimating the sampling error variances, especially when the area-specific sample sizes are small. We compare the proposed HB model with the Fay-Herriot model through analysis of two survey data sets. Our results show that the proposed HB estimators perform well compared to the direct estimates. We also discuss the problem of priors on the variance components.

    Release date: 2006-07-20
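
A sketch of a Gibbs sampler for the basic Fay-Herriot model with known sampling variances D_i, to fix notation; the paper's HB model additionally models the estimated D_i, which this sketch does not attempt. The priors and data below are illustrative:

```python
import numpy as np

def fh_gibbs(y, D, X, n_iter=4000, rng=None):
    """Gibbs sampler for the basic Fay-Herriot model
        y_i | theta_i ~ N(theta_i, D_i),  theta_i ~ N(x_i' beta, A),
    with a flat prior on beta and a vague inverse-gamma prior on A.
    The D_i are treated as known, unlike in the paper's extension."""
    rng = rng or np.random.default_rng(0)
    m, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    A = 1.0
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = []
    for it in range(n_iter):
        # theta_i | rest: shrink y_i toward the regression fit
        gam = A / (A + D)
        theta = rng.normal(gam * y + (1 - gam) * (X @ beta),
                           np.sqrt(gam * D))
        # beta | rest: normal around the least-squares fit to theta
        beta = rng.multivariate_normal(XtX_inv @ X.T @ theta, A * XtX_inv)
        # A | rest: inverse gamma under a vague IG(0.001, 0.001) prior
        resid = theta - X @ beta
        A = 1.0 / rng.gamma(0.001 + m / 2, 1.0 / (0.001 + resid @ resid / 2))
        if it >= n_iter // 2:                # keep post-burn-in draws
            draws.append(theta)
    return np.mean(draws, axis=0)            # posterior means = HB estimates

rng = np.random.default_rng(2)
m = 20
X = np.column_stack([np.ones(m), rng.normal(size=m)])
theta_true = X @ np.array([5.0, 2.0]) + rng.normal(0, 1.0, m)
D = rng.uniform(0.5, 3.0, m)                 # "known" sampling variances
y = rng.normal(theta_true, np.sqrt(D))
print(fh_gibbs(y, D, X, rng=rng)[:5])
```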

  • Articles and reports: 12-001-X20060019264
    Description:

    Sampling for nonresponse follow-up (NRFU) was an innovation for U.S. Decennial Census methodology considered for the year 2000. Sampling for NRFU involves sending field enumerators to only a sample of the housing units that did not respond to the initial mailed questionnaire, thereby reducing costs but creating a major small-area estimation problem. We propose a model to impute the characteristics of the housing units that did not respond to the mailed questionnaire, to benefit from the large cost savings of NRFU sampling while still attaining acceptable levels of accuracy for small areas. Our strategy is to model household characteristics using low-dimensional covariates at detailed levels of geography and more detailed covariates at larger levels of geography. To do this, households are first classified into a small number of types. A hierarchical loglinear model then estimates the distribution of household types among the nonsample nonrespondent households in each block. This distribution depends on the characteristics of mailback respondents in the same block and sampled nonrespondents in nearby blocks. Nonsample nonrespondent households can then be imputed according to this estimated household type distribution. We evaluate the performance of our loglinear model through simulation. Results show that, when compared to estimates from alternative models, our loglinear model produces estimates with much smaller MSE in many cases and estimates with approximately the same size MSE in most other cases. Although sampling for NRFU was not used in the 2000 census, our estimation and imputation strategy can be used in any census or survey using sampling for NRFU where units are clustered such that the characteristics of nonrespondents are related to the characteristics of respondents in the same area and also related to the characteristics of sampled nonrespondents in nearby areas.

    Release date: 2006-07-20
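
A crude stand-in for the estimation step: blend the household-type distribution of mailback respondents in a block with that of sampled nonrespondents in nearby blocks, then impute nonsample nonrespondents from the blended distribution. The hierarchical loglinear model in the paper is far richer; everything below, including the type labels and blending weight, is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(9)
types = ["T1", "T2", "T3", "T4"]            # hypothetical household types

# Hypothetical counts by type: mailback respondents in one block, and
# sampled NRFU nonrespondents pooled from nearby blocks.
block_resp     = np.array([30, 10, 5, 5], float)
nearby_nonresp = np.array([8, 12, 6, 4], float)

alpha = 0.5                                  # placeholder blending weight
p = alpha * block_resp / block_resp.sum() \
    + (1 - alpha) * nearby_nonresp / nearby_nonresp.sum()

# Impute types for the block's nonsample nonrespondent households.
n_missing = 25
imputed = rng.choice(types, size=n_missing, p=p)
print(np.unique(imputed, return_counts=True))
```
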
Data (0) (0 results)

No content available at this time.

Analysis (16) (0 to 10 of 16 results)

(Same articles and reports as listed above.)
Reference (1) (1 result)

  • Surveys and statistical programs – Documentation: 71F0031X2006003
    Description:

    This paper introduces and explains modifications made to the Labour Force Survey estimates in January 2006. Some of these modifications include changes to the population estimates, improvements to the public and private sector estimates and historical updates to several small Census Agglomerations (CA).

    Release date: 2006-01-25