Survey design

Results

All (39)

All (39) (0 to 10 of 39 results)

  • Articles and reports: 12-001-X202300200001
    Description: When a Medicare healthcare provider is suspected of billing abuse, a population of payments X made to that provider over a fixed timeframe is isolated. A certified medical reviewer, in a time-consuming process, can determine the overpayment Y = X - (amount justified by the evidence) associated with each payment. Typically, there are too many payments in the population to examine each with care, so a probability sample is selected. The sample overpayments are then used to calculate a 90% lower confidence bound for the total population overpayment. This bound is the amount demanded for recovery from the provider. Unfortunately, classical methods for calculating this bound sometimes fail to provide the 90% confidence level, especially when using a stratified sample.

    In this paper, 166 redacted samples from Medicare integrity investigations are displayed and described, along with 156 associated payment populations. The 7,588 examined (Y, X) sample pairs show (1) Medicare audits have high error rates: more than 76% of these payments were considered to have been paid in error; and (2) the patterns in these samples support an “All-or-Nothing” mixture model for (Y, X) previously defined in the literature. Model-based Monte Carlo testing procedures for Medicare sampling plans are discussed, as well as stratification methods based on anticipated model moments. In terms of viability (achieving the 90% confidence level) a new stratification method defined here is competitive with the best of the many existing methods tested and seems less sensitive to choice of operating parameters. In terms of overpayment recovery (equivalent to precision) the new method is also comparable to the best of the many existing methods tested. Unfortunately, no stratification algorithm tested was ever viable for more than about half of the 104 test populations.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200008
    Description: In this article, we use a slightly simplified version of the method by Fickus, Mixon and Poteet (2013) to define a flexible parameterization of the kernels of determinantal sampling designs with fixed first-order inclusion probabilities. For specific values of the multidimensional parameter, we get back to a matrix from the family PII from Loonis and Mary (2019). We speculate that, among the determinantal designs with fixed inclusion probabilities, the minimum variance of the Horvitz and Thompson estimator (1952) of a variable of interest is expressed relative to PII. We provide experimental R programs that facilitate the appropriation of various concepts presented in the article, some of which are described as non-trivial by Fickus et al. (2013). A longer version of this article, including proofs and a more detailed presentation of the determinantal designs, is also available.
    Release date: 2024-01-03

  • Articles and reports: 75F0002M2023005
    Description: The Canadian Income Survey (CIS) has introduced improvements to the methods and systems used to produce income estimates with the release of its 2021 reference year estimates. This paper describes the changes and presents the approximate net result of these changes on income estimates using data for 2019 and 2020. The changes described in this paper highlight the ways in which data quality has been improved while producing minimal impact on key CIS estimates and trends.
    Release date: 2023-08-29

  • Articles and reports: 12-001-X202100200008
    Description:

    Multiple-frame surveys, in which independent probability samples are selected from each of Q sampling frames, have long been used to improve coverage, to reduce costs, or to increase sample sizes for subpopulations of interest. Much of the theory has been developed assuming that (1) the union of the frames covers the population of interest, (2) a full-response probability sample is selected from each frame, (3) the variables of interest are measured in each sample with no measurement error, and (4) sufficient information exists to account for frame overlap when computing estimates. After reviewing design, estimation, and calibration for traditional multiple-frame surveys, I consider modifications of the assumptions that allow a multiple-frame structure to serve as an organizing principle for other data combination methods such as mass imputation, sample matching, small area estimation, and capture-recapture estimation. Finally, I discuss how results from multiple-frame survey research can be used when designing and evaluating data collection systems that integrate multiple sources of data.

    Release date: 2022-01-06

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows how controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201400214096
    Description:

    To obtain better coverage of the population of interest at lower cost, a number of surveys employ a dual frame structure, in which independent samples are taken from two overlapping sampling frames. This research considers chi-squared tests in dual frame surveys when categorical data are encountered. We extend the generalized Wald’s test (Wald 1943) and the Rao-Scott first-order and second-order corrected tests (Rao and Scott 1981) from a single survey to a dual frame survey and derive the asymptotic distributions. Simulation studies show that both Rao-Scott type corrected tests work well and thus are recommended for use in dual frame surveys. An example is given to illustrate the usage of the developed tests.

    Release date: 2014-12-19

  • Articles and reports: 12-001-X201300111824
    Description:

    In most surveys all sample units receive the same treatment and the same design features apply to all selected people and households. In this paper, it is explained how survey designs may be tailored to optimize quality given constraints on costs. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with various examples.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111829
    Description:

    Indirect Sampling is used when the sampling frame is not the same as the target population, but related to the latter. The estimation process for Indirect Sampling is carried out using the Generalised Weight Share Method (GWSM), which is an unbiased procedure (see Lavallée 2002, 2007). For business surveys, Indirect Sampling is applied as follows: the sampling frame is one of establishments, while the target population is one of enterprises. Enterprises are selected through their establishments. This allows stratifying according to the establishment characteristics, rather than those associated with enterprises. Because the variables of interest of establishments are generally highly skewed (a small portion of the establishments covers the major portion of the economy), the GWSM results in unbiased estimates, but their variance can be large. The purpose of this paper is to suggest some adjustments to the weights to reduce the variance of the estimates in the context of skewed populations, while keeping the method unbiased. After a brief overview of Indirect Sampling and the GWSM, we describe the required adjustments to the GWSM. The estimates produced with these adjustments are compared to those from the original GWSM, via a small numerical example, and using real data originating from Statistics Canada's Business Register.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201100211608
    Description:

    Designs and estimators for the single frame surveys currently used by U.S. government agencies were developed in response to practical problems. Federal household surveys now face challenges of decreasing response rates and frame coverage, higher data collection costs, and increasing demand for small area statistics. Multiple frame surveys, in which independent samples are drawn from separate frames, can be used to help meet some of these challenges. Examples include combining a list frame with an area frame or using two frames to sample landline telephone households and cellular telephone households. We review point estimators and weight adjustments that can be used to analyze multiple frame surveys with standard survey software, and summarize construction of replicate weights for variance estimation. Because of their increased complexity, multiple frame surveys face some challenges not found in single frame surveys. We investigate misclassification bias in multiple frame surveys, and propose a method for correcting for this bias when misclassification probabilities are known. Finally, we discuss research that is needed on nonsampling errors with multiple frame surveys.

    Release date: 2011-12-21

  • Articles and reports: 12-001-X201000111243
    Description:

    The 2003 National Assessment of Adult Literacy (NAAL) and the international Adult Literacy and Lifeskills (ALL) surveys each involved stratified multi-stage area sample designs. During the last stage, a household roster was constructed, the eligibility status of each individual was determined, and the selection procedure was invoked to randomly select one or two eligible persons within the household. The objective of this paper is to evaluate the within-household selection rules under a multi-stage design while improving the procedure in future literacy surveys. The analysis is based on the current US household size distribution and intracluster correlation coefficients using the adult literacy data. In our evaluation, several feasible household selection rules are studied, considering effects from clustering, differential sampling rates, cost per interview, and household burden. In doing so, an evaluation of within-household sampling under a two-stage design is extended to a four-stage design and some generalizations are made to multi-stage samples with different cost ratios.

    Release date: 2010-06-29
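
    The first entry above (12-001-X202300200001) refers to a classical 90% lower confidence bound for a total overpayment computed from a stratified sample. A minimal sketch of one such classical calculation follows, assuming simple random sampling without replacement within each stratum and a normal approximation. The function name, the input layout and the example figures are hypothetical and are not taken from the paper, which notes that classical bounds of this kind sometimes fail to reach the nominal confidence level.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def stratified_lower_bound(strata, confidence=0.90):
    """Return (point_estimate, lower_bound) for a population total.

    strata: list of (N_h, sample_values) pairs, where N_h is the number of
    payments in stratum h and sample_values are the sampled overpayments.
    Each stratum needs at least two sampled values.
    Assumption: simple random sampling without replacement in each stratum.
    """
    total = 0.0
    var_total = 0.0
    for N_h, y in strata:
        n_h = len(y)
        # Expansion estimator of the stratum total.
        total += N_h * mean(y)
        # Estimated variance with the finite population correction.
        var_total += N_h ** 2 * (1 - n_h / N_h) * variance(y) / n_h
    # One-sided normal quantile (assumption: normal approximation).
    z = NormalDist().inv_cdf(confidence)
    return total, total - z * sqrt(var_total)


# Hypothetical two-stratum example.
example = [
    (100, [0.0, 10.0, 20.0, 10.0]),
    (50, [100.0, 100.0, 0.0, 200.0]),
]
estimate, lower = stratified_lower_bound(example)
print(round(estimate, 2), round(lower, 2))
```

    A Student t quantile is sometimes used in place of the normal quantile when stratum samples are small; the sketch keeps the normal quantile only to stay within the Python standard library.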
Reference (3)

Reference (3) (3 results)

  • Surveys and statistical programs – Documentation: 75F0002M1992007
    Description:

    A Preliminary Interview will be conducted on the first panel of SLID, in January 1993, as a supplement to the Labour Force Survey. The first panel is made up of about 20,000 households that are rotating out of the Labour Force Survey in January and February, 1993.

    The purpose of this document is to provide a description of the purpose of the SLID Preliminary Interview and the question wordings to be used.

    Release date: 2008-02-29

  • Surveys and statistical programs – Documentation: 75F0002M2005002
    Description:

    This paper describes the changes made to the structure of geography information on SLID from reference year 1999 onwards. It goes into reasons for changing to the 2001 Census-based geography, shows how the overlap between the 1991 and 2001 Census-based concepts is handled, provides detail on how the geographic concepts are implemented, discusses a new imputation procedure and finishes with an illustration of the impact of these changes on selected tables.

    Release date: 2005-03-31

  • Surveys and statistical programs – Documentation: 75F0002M1993019
    Description:

    This paper examines the issues and the procedures designed to maintain a representative sample of the population for the Survey of Labour and Income Dynamics (SLID).

    Release date: 1995-12-30