Statistics by subject – Survey design

All (34) (25 of 34 results)

  • Articles and reports: 12-001-X201600214662
    Description:

    Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.

    Release date: 2016-12-20

  • Technical products: 11-522-X201700014749
    Description:

    As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data, such as Passport Canada files, Canada Border Services Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.

    Release date: 2016-03-24

  • Articles and reports: 12-001-X201500214229
    Description:

    Self-weighting estimation through equal probability selection methods (epsem) is desirable for variance efficiency. Traditionally, the epsem property for (one phase) two stage designs for estimating population-level parameters is realized by using each primary sampling unit (PSU) population count as the measure of size for PSU selection along with equal sample size allocation per PSU under simple random sampling (SRS) of elementary units. However, when self-weighting estimates are desired for parameters corresponding to multiple domains under a pre-specified sample allocation to domains, Folsom, Potter and Williams (1987) showed that a composite measure of size can be used to select PSUs to obtain epsem designs when besides domain-level PSU counts (i.e., distribution of domain population over PSUs), frame-level domain identifiers for elementary units are also assumed to be available. The term depsem-A will be used to denote such (one phase) two stage designs to obtain domain-level epsem estimation. Folsom et al. also considered two phase two stage designs when domain-level PSU counts are unknown, but whole PSU counts are known. For these designs (to be termed depsem-B) with PSUs selected proportional to the usual size measure (i.e., the total PSU count) at the first stage, all elementary units within each selected PSU are first screened for classification into domains in the first phase of data collection before SRS selection at the second stage. Domain-stratified samples are then selected within PSUs with suitably chosen domain sampling rates such that the desired domain sample sizes are achieved and the resulting design is self-weighting. In this paper, we first present a simple justification of composite measures of size for the depsem-A design and of the domain sampling rates for the depsem-B design. Then, for depsem-A and -B designs, we propose generalizations, first to cases where frame-level domain identifiers for elementary units are not available and domain-level PSU counts are only approximately known from alternative sources, and second to cases where PSU size measures are pre-specified based on other practical and desirable considerations of over- and under-sampling of certain domains. We also present a further generalization in the presence of subsampling of elementary units and nonresponse within selected PSUs at the first phase before selecting phase two elementary units from domains within each selected PSU. This final generalization of depsem-B is illustrated for an area sample of housing units.

    Release date: 2015-12-17
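
    The composite size measure underlying the depsem-A construction can be checked numerically. A minimal sketch follows, with hypothetical PSU-by-domain counts and desired domain sample sizes (none of the figures come from the paper): the PSU size measure is S_i = sum_d f_d N_di with f_d = n_d / N_d, PSUs are drawn with probability proportional to S_i, and within-PSU domain sampling rates are chosen so the two stages multiply out to f_d for every unit, giving a design that is self-weighting within domains with an equal expected workload per selected PSU.

        import numpy as np

        rng = np.random.default_rng(7)
        n_psu, n_dom = 200, 3
        N_di = rng.integers(50, 500, size=(n_psu, n_dom)).astype(float)  # hypothetical domain counts per PSU
        N_d = N_di.sum(axis=0)                      # domain population sizes
        n_d = np.array([400.0, 300.0, 300.0])       # desired domain sample sizes (invented)
        f_d = n_d / N_d                             # target domain sampling fractions
        m = 40                                      # number of PSUs to select

        S_i = N_di @ f_d                            # composite measure of size per PSU
        S = S_i.sum()                               # equals the total desired sample size
        pi_psu = m * S_i / S                        # PPS selection probabilities for PSUs
        r_di = np.outer(S / (m * S_i), f_d)         # within-PSU domain sampling rates

        # The product of the two stages equals f_d for every PSU, so the design
        # is self-weighting within domains, with expected workload S/m per PSU.
        print(np.allclose(pi_psu[:, None] * r_di, f_d))          # True
        print(float((r_di * N_di).sum(axis=1)[0]), S / m)        # equal expected workload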

  • Articles and reports: 12-001-X201400214119
    Description:

    When considering sample stratification by several variables, we often face the case where the expected number of sample units to be selected in each stratum is very small and the total number of units to be selected is smaller than the total number of strata. These stratified sample designs are specifically represented by tabular arrays with real numbers, called controlled selection problems, and are beyond the reach of conventional methods of allocation. Many algorithms for solving these problems have been studied over about 60 years, beginning with Goodman and Kish (1950). Those developed more recently are especially computer intensive and always find solutions. However, there still remains the unanswered question: In what sense are the solutions to a controlled selection problem obtained from those algorithms optimal? We introduce the general concept of optimal solutions, and propose a new controlled selection algorithm based on typical distance functions to achieve solutions. This algorithm can easily be implemented in new SAS-based software. This study focuses on two-way stratification designs. The controlled selection solutions from the new algorithm are compared with those from existing algorithms using several examples. The new algorithm successfully obtains robust solutions to two-way controlled selection problems that meet the optimality criteria.

    Release date: 2014-12-19

  • Technical products: 11-522-X201300014276
    Description:

    In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, and this effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified: If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets. If an area is ‘abandoned’ during collection, we prioritize the remaining surveys. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools that we can use to analyze collection by isolating underrepresented population groups. We can increase collection efforts for groups that have been identified beforehand. In the oral presentation, we covered these two points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.

    Release date: 2014-10-31

  • Articles and reports: 12-001-X201200111682
    Description:

    Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004) based on compromise allocation and Longford (2006) based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains, cutting across strata, are also specified. Performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments.

    Release date: 2012-06-27
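
    The non-linear programming formulation described in this abstract can be prototyped directly. The sketch below uses made-up stratum counts, means and standard deviations (not the MRTS data) and minimizes the total sample size subject to CV tolerances on each stratum mean and on the overall mean under stratified SRSWOR.

        import numpy as np
        from scipy.optimize import minimize

        # Invented stratum sizes, means and standard deviations (not MRTS data)
        N = np.array([2000.0, 5000.0, 800.0])
        ybar = np.array([50.0, 120.0, 400.0])
        S = np.array([30.0, 90.0, 350.0])
        W = N / N.sum()
        cv_stratum_max, cv_overall_max = 0.10, 0.04

        def cv_strata(n):                 # CV of each stratum mean under SRSWOR
            return np.sqrt((1 - n / N) * S**2 / n) / ybar

        def cv_overall(n):                # CV of the overall (stratified) mean
            var = np.sum(W**2 * (1 - n / N) * S**2 / n)
            return np.sqrt(var) / np.sum(W * ybar)

        cons = [{"type": "ineq", "fun": lambda n: cv_stratum_max - cv_strata(n)},
                {"type": "ineq", "fun": lambda n: cv_overall_max - cv_overall(n)}]
        res = minimize(lambda n: n.sum(), x0=N / 10, method="SLSQP",
                       bounds=[(2, Nh) for Nh in N], constraints=cons)
        n_opt = np.ceil(res.x)
        print("allocation:", n_opt, "total n:", int(n_opt.sum()))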

  • Articles and reports: 12-001-X201000211385
    Description:

    In this short note, we show that simple random sampling without replacement and Bernoulli sampling have approximately the same entropy when the population size is large. An empirical example is given as an illustration.

    Release date: 2010-12-21
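
    A quick numerical check of this result (population sizes and sampling fraction chosen arbitrarily, not taken from the note): SRSWOR makes all C(N, n) samples equally likely, so its entropy is log C(N, n), while Bernoulli sampling with p = n/N has entropy -N[p log p + (1 - p) log(1 - p)]; the two converge as N grows.

        import math

        def entropy_srswor(N, n):
            # All C(N, n) samples equally likely: H = log C(N, n)
            return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

        def entropy_bernoulli(N, p):
            # Independent inclusion with probability p: H = -N [p log p + (1-p) log(1-p)]
            return -N * (p * math.log(p) + (1 - p) * math.log(1 - p))

        for N in (100, 1_000, 10_000, 100_000):
            n = N // 10                               # sampling fraction 0.1
            h1, h2 = entropy_srswor(N, n), entropy_bernoulli(N, n / N)
            print(f"N={N:>6}  SRSWOR={h1:11.2f}  Bernoulli={h2:11.2f}  ratio={h1 / h2:.4f}")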

  • Technical products: 11-522-X200800011008
    Description:

    In one sense, a questionnaire is never complete. Test results, paradata and research findings constantly provide reasons to update and improve the questionnaire. In addition, establishments change over time and questions need to be updated accordingly. In reality, it doesn't always work like this. At Statistics Sweden there are several examples of questionnaires that were designed at one point in time and rarely improved later on. However, we are currently trying to shift the perspective on questionnaire design from a linear to a cyclic one. We are developing a cyclic model in which the questionnaire can be improved continuously in multiple rounds. In this presentation, we will discuss this model and how we work with it.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010940
    Description:

    Data Collection Methodology (DCM) enables the collection of good quality data by providing expert advice and assistance on questionnaire design, methods of evaluation and respondent engagement. DCM assists in the development of client skills, undertakes research and leads innovation in data collection methods. This is done in a challenging environment of organisational change and limited resources. This paper will cover 'how DCM do business' with clients and the wider methodological community to achieve our goals.

    Release date: 2009-12-03

  • Articles and reports: 12-001-X200700210498
    Description:

    In this paper we describe a methodology for combining a convenience sample with a probability sample in order to produce an estimator with a smaller mean squared error (MSE) than estimators based on only the probability sample. We then explore the properties of the resulting composite estimator, a linear combination of the convenience and probability sample estimators with weights that are a function of bias. We discuss the estimator's properties in the context of web-based convenience sampling. Our analysis demonstrates that the use of a convenience sample to supplement a probability sample for improvements in the MSE of estimation may be practical only under limited circumstances. First, the remaining bias of the estimator based on the convenience sample must be quite small, equivalent to no more than 0.1 of the outcome's population standard deviation. For a dichotomous outcome, this implies a bias of no more than five percentage points at 50 percent prevalence and no more than three percentage points at 10 percent prevalence. Second, the probability sample should contain at least 1,000-10,000 observations for adequate estimation of the bias of the convenience sample estimator. Third, it must be inexpensive and feasible to collect at least thousands (and probably tens of thousands) of web-based convenience observations. The conclusions about the limited usefulness of convenience samples with estimator bias of more than 0.1 standard deviations also apply to direct use of estimators based on that sample.

    Release date: 2008-01-03
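
    Two of the numbers quoted above follow from simple arithmetic: a bias tolerance of 0.1 population standard deviations for a dichotomous outcome is 0.1·sqrt(p(1 - p)), i.e. 5 percentage points at 50% prevalence and 3 at 10%. The sketch below checks this and also shows a textbook MSE-minimizing weight for combining an unbiased probability-sample estimator with a biased convenience-sample estimator; the illustrative estimates, variances and bias are invented, and the paper's own weighting (which estimates the bias term) may differ in detail.

        import math

        # Bias tolerance of 0.1 population standard deviations for a 0/1 outcome
        for p in (0.5, 0.1):
            tol = 0.1 * math.sqrt(p * (1 - p))
            print(f"prevalence {p:.0%}: tolerable bias = {tol:.3f} ({100 * tol:.0f} percentage points)")

        def composite(est_prob, var_prob, est_conv, var_conv, bias_conv):
            """Textbook MSE-minimizing combination of an unbiased probability-sample
            estimator and a biased convenience-sample estimator (illustration only)."""
            w = var_prob / (var_prob + var_conv + bias_conv**2)   # weight on the convenience estimate
            return w * est_conv + (1 - w) * est_prob, w

        est, w = composite(est_prob=0.52, var_prob=0.02**2,
                           est_conv=0.47, var_conv=0.005**2, bias_conv=0.03)
        print(f"composite estimate = {est:.3f}, weight on convenience sample = {w:.2f}")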

  • Technical products: 11-522-X20050019444
    Description:

    There are several ways to improve data quality. One of them is to re-design and test questionnaires for ongoing surveys. The benefits of questionnaire re-design and testing include improved accuracy, by ensuring the questions collect the required data, as well as decreased response burden.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019493
    Description:

    This article introduces the General Statistics Office in Hanoi, Vietnam, and gives a description of socio-economic surveys conducted since the early nineties in Vietnam, with a discussion of their methods and achievements, as well as remaining challenges.

    Release date: 2007-03-02

  • Articles and reports: 12-001-X20040027749
    Description:

    A simple and practicable algorithm for constructing stratum boundaries in such a way that the coefficients of variation are equal in each stratum is derived for positively skewed populations. The new algorithm is shown to compare favourably with the cumulative root frequency method (Dalenius and Hodges 1957) and the Lavallée and Hidiroglou (1988) approximation method for estimating the optimum stratum boundaries.

    Release date: 2005-02-03
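
    The abstract does not spell out the boundary rule, so the sketch below uses geometric boundaries (break points in geometric progression between the smallest and largest values), a simple construction that tends to give roughly equal within-stratum CVs for positively skewed data; it is an illustration of the idea on a simulated population, not necessarily the paper's algorithm.

        import numpy as np

        def geometric_boundaries(y, n_strata):
            # Break points in geometric progression between min(y) and max(y)
            a, b = float(np.min(y)), float(np.max(y))
            r = (b / a) ** (1.0 / n_strata)
            return [a * r**h for h in range(1, n_strata)]

        rng = np.random.default_rng(1)
        y = rng.lognormal(mean=4.0, sigma=1.0, size=5000)   # simulated positively skewed population
        strata = np.digitize(y, geometric_boundaries(y, n_strata=4))
        for h in range(4):
            yh = y[strata == h]
            print(f"stratum {h + 1}: n={yh.size:5d}  CV={yh.std(ddof=1) / yh.mean():.2f}")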

  • Articles and reports: 12-001-X20040027756
    Description:

    It is usually discovered in the data collection phase of a survey that some units in the sample are ineligible even if the frame information has indicated otherwise. For example, in many business surveys a nonnegligible proportion of the sampled units will have ceased trading since the latest update of the frame. This information may be fed back to the frame and used in subsequent surveys, thereby making forthcoming samples more efficient by avoiding sampling ineligible units. On the first of two survey occasions, we assume that all ineligible units in the sample (or set of samples) are detected and excluded from the frame. On the second occasion, a subsample of the eligible part is observed again. The subsample may be augmented with a fresh sample that will contain both eligible and ineligible units. We investigate what effect on survey estimation the process of feeding back information on ineligibility may have, and derive an expression for the bias that can occur as a result of feeding back. The focus is on estimation of the total using the common expansion estimator. An estimator that is nearly unbiased in the presence of feedback is obtained. This estimator relies on consistent estimates of the number of eligible and ineligible units in the population being available.

    Release date: 2005-02-03

  • Technical products: 11-522-X20030017726
    Description:

    This paper addresses issues of how to use auxiliary information efficiently in sampling from finite populations.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017718
    Description:

    The paper describes the approach, models, issues, problems, lessons learned and experience gained from bringing together statistics at the small area level and making them widely available. It will also outline the work done on psychiatric morbidity and crime statistics and implications for future research.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017699
    Description:

    This paper illustrates the link between the strategic needs of a national statistical office (NSO) and the methodological needs that this generates.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017600
    Description:

    This paper extends the Sen-Yates-Grundy (SYG) variance estimators to two-phase sampling designs with stratification at the second phase or at both phases. It also develops SYG-type variance estimators of the two-phase regression estimators that make use of the first-phase auxiliary data.

    Release date: 2005-01-26

  • Technical products: 11-522-X20020016721
    Description:

    This paper examines the simulation study that was conducted to assess the sampling scheme designed for the World Health Organization (WHO) Injection Safety Assessment Survey. The objective of this assessment survey is to determine whether facilities in which injections are given meet the necessary safety requirements for injection administration, equipment, supplies and waste disposal. The main parameter of interest is the proportion of health care facilities in a country that have safe injection practices.

    The objective of this simulation study was to assess the accuracy and precision of the proposed sampling design. To this end, two artificial populations were created based on the two African countries of Niger and Burkina Faso, in which the pilot survey was tested. To create a wide variety of hypothetical populations, the assignment of whether a health care facility was safe or not was based on the different combinations of the population proportion of safe health care facilities in the country, the homogeneity of the districts in the country with respect to injection safety, and whether the health care facility was located in an urban or rural district.

    Using the results of the simulation, a multi-factor analysis of variance was used to determine which factors affect the outcome measures of absolute bias, standard error and mean-squared error.

    Release date: 2004-09-13

  • Articles and reports: 12-001-X20030026782
    Description:

    This paper discusses both the general question of designing a post-enumeration survey, and how these general questions were addressed in the U.S. Census Bureau's coverage measurement planned as part of Census 2000. It relates the basic concepts of the Dual System Estimator to questions of the definition and measurement of correct enumerations, the measurement of census omissions, operational independence, reporting of residence, and the role of after-matching reinterview. It discusses estimation issues such as the treatment of movers, missing data, and synthetic estimation of local corrected population size. It also discusses where the design failed in Census 2000.

    Release date: 2004-01-27

  • Articles and reports: 12-001-X20030016610
    Description:

    In the presence of item nonresponse, unweighted imputation methods are often used in practice but they generally lead to biased estimators under uniform response within imputation classes. Following Skinner and Rao (2002), we propose a bias-adjusted estimator of a population mean under unweighted ratio imputation and random hot-deck imputation and derive linearization variance estimators. A small simulation study is conducted to study the performance of the methods in terms of bias and mean square error. Relative bias and relative stability of the variance estimators are also studied.

    Release date: 2003-07-31

  • Technical products: 11-522-X20010016247
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    This paper describes joint research by the Office for National Statistics (ONS) and Southampton University regarding the evaluation of several different approaches to the local estimation of International Labour Office (ILO) unemployment. The need to compare estimators with different underlying assumptions has led to a focus on evaluation methods that are (partly at least) model-independent. Model-fit diagnostics that have been considered include: various residual procedures, cross-validation, predictive validation, consistency with marginals, and consistency with direct estimates within single cells. These diagnostics have been used to compare different model-based estimators with each other and with direct estimators.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016258
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    To fill statistical gaps in the areas of health determinants, health status and health system usage by the Canadian population at the health region levels (sub-provincial areas or regions of interest to health authorities), Statistics Canada established a new survey called the Canadian Community Health Survey (CCHS). The CCHS consists of two separate components: a regional survey in the first year and a provincial survey in the second year. The main purpose of the regional survey, for which collection took place between September 2000 and October 2001, was to produce cross-sectional estimates for 136 health regions in Canada, based on a sample of more than 134,000 respondents. This article focuses on the various measures taken at the time of data collection to ensure a high level of quality for this large-scale survey.

    Release date: 2002-09-12

  • Articles and reports: 81-003-X19990045145
    Description:

    This paper examines the characteristics of young people who responded to the 1991 School Leavers Survey (SLS), but who subsequently failed to respond to the 1995 School Leavers Follow-up Survey (SLF).

    Release date: 2000-09-01

  • Technical products: 11-522-X19980015030
    Description:

    Two-phase sampling designs have been conducted in waves to estimate the incidence of a rare disease such as dementia. Estimation of disease incidence from a longitudinal dementia study has to adjust appropriately for data missing by death as well as for the sampling design used at each study wave. In this paper we adopt a selection model approach to model the data missing by death and use a likelihood approach to derive incidence estimates. A modified EM algorithm is used to deal with data missing by sampling selection. The non-parametric jackknife variance estimator is used to derive variance estimates for the model parameters and the incidence estimates. The proposed approaches are applied to data from the Indianapolis-Ibadan Dementia Study.

    Release date: 1999-10-22

Data (0) (0 results)

Analysis (20)

  • Articles and reports: 12-001-X19980013905
    Description:

    Two-phase sampling designs offer a variety of possibilities for use of auxiliary information. We begin by reviewing the different forms that auxiliary information may take in two-phase surveys. We then set up the procedure by which this information is transformed into calibrated weights, which we use to construct efficient estimators of a population total. The calibration is done in two steps: (i) at the population level; (ii) at the level of the first-phase sample. We go on to show that the resulting calibration estimators are also derivable via regression fitting in two steps. We examine these estimators for a special case of interest, namely, when auxiliary information is available for population subgroups called calibration groups. Poststrata are the simplest example of such groups. Estimation for domains of interest and variance estimation are also discussed. These results are illustrated by applying them to two-phase designs at Statistics Canada. The general theory for using auxiliary information in two-phase sampling is being incorporated into Statistics Canada's Generalized Estimation System.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013904
    Description:

    Many economic and agricultural surveys are multi-purpose. It would be convenient if one could stratify the target population of such a survey in a number of different ways, one for each purpose, and then combine the samples for enumeration. We explore four different sampling methods that select similar samples across all stratifications, thereby reducing the overall sample size. Data from an agricultural survey are used to evaluate the effectiveness of these alternative sampling strategies. We then show how a calibration (i.e., reweighted) estimator can increase statistical efficiency by capturing what is known about the original stratum sizes in the estimation. Raking, which has been suggested in the literature for this purpose, is simply one method of calibration.

    Release date: 1998-07-31
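
    Since raking is cited here as one method of calibration, a minimal sketch of the simplest alternative, linear (GREG-type) calibration, may help: design weights are adjusted so that weighted auxiliary totals match known population totals. The sample, design weights, auxiliary variables and control totals below are all invented for illustration.

        import numpy as np

        def linear_calibration(d, X, totals):
            # Adjust design weights d so that weighted auxiliary totals equal known
            # population totals (linear/GREG calibration; raking differs only in the
            # distance function, the calibration constraints are the same).
            lam = np.linalg.solve(X.T @ (d[:, None] * X), totals - X.T @ d)
            return d * (1 + X @ lam)

        rng = np.random.default_rng(3)
        n = 500
        d = np.full(n, 20.0)                                  # invented design weights
        X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.gamma(3.0, 10.0, n)])
        totals = np.array([10_000.0, 5_200.0, 310_000.0])     # invented known control totals
        w = linear_calibration(d, X, totals)
        print(np.allclose(X.T @ w, totals))                   # calibration constraints hold exactly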

  • Articles and reports: 12-001-X19970023618
    Description:

    Statistical agencies often constitute their business panels by Poisson sampling, or by stratified sampling of fixed size and uniform probabilities in each stratum. This sampling corresponds to algorithms which use permanent numbers following a uniform distribution. Since the characteristics of the units change over time, it is necessary to periodically conduct resamplings while endeavouring to conserve the maximum number of units. The solution by Poisson sampling is the simplest and provides the maximum theoretical coverage, but with the disadvantage of a random sample size. On the other hand, in the case of stratified sampling of fixed size, the changes in strata cause difficulties precisely because of these fixed size constraints. An initial difficulty is that the finer the stratification, the more the coverage is decreased. Indeed, this is likely to occur if births constitute separate strata. We show how this effect can be corrected by rendering the numbers equidistant before resampling. The disadvantage, a fairly minor one, is that in each stratum the sampling is no longer simple random sampling, which makes the estimation of the variance less rigorous. Another difficulty is reconciling the resampling with an eventual rotation of the units in the sample. We present a type of algorithm which extends, after resampling, the rotation that was in place before resampling. It is based on transformations of the random numbers used for the sampling, so as to return to resampling without rotation. These transformations are particularly simple when they involve equidistant numbers, but can also be carried out with the numbers following a uniform distribution.

    Release date: 1998-03-12
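
    The Poisson-sampling-with-permanent-random-numbers idea mentioned at the start of this abstract can be shown in a few lines: each unit keeps a permanent uniform number and is selected whenever that number falls below its current inclusion probability, so samples drawn on successive occasions overlap as much as possible. The population size and inclusion probabilities below are arbitrary.

        import numpy as np

        rng = np.random.default_rng(42)
        N = 10_000
        prn = rng.uniform(size=N)            # permanent random numbers, one per unit, fixed over time

        def poisson_sample(prn, pi):
            # Unit i is selected whenever its permanent number falls below pi_i
            return np.flatnonzero(prn < pi)

        s1 = poisson_sample(prn, np.full(N, 0.05))   # first occasion
        s2 = poisson_sample(prn, np.full(N, 0.06))   # later occasion with updated probabilities
        overlap = np.intersect1d(s1, s2).size
        print(f"occasion 1: {s1.size}, occasion 2: {s2.size}, units retained: {overlap}")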

  • Articles and reports: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections is inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them is developed. Not all original samples can be inverted; but many practical special cases are discussed which cover a wide range of practices.

    Release date: 1997-08-18

  • Articles and reports: 12-001-X199100214502
    Description:

    A sample design for the initial selection, sample rotation and updating for sub-annual business surveys is proposed. The sample design is a stratified clustered design, with the stratification being carried out on the basis of industry, geography and size. Sample rotation of the sample units is carried out under time-in and time-out constraints. Updating is with respect to the selection of births (new businesses), removal of deaths (defunct businesses) and implementation of changes in the classification variables used for stratification, i.e. industry, geography and size. A number of alternate estimators, including the simple expansion estimator and Mickey’s (1959) unbiased ratio-type estimator have been evaluated for this design in an empirical study under various survey conditions. The problem of variance estimation has also been considered using the Taylor linearization method and the jackknife technique.

    Release date: 1991-12-16

  • Articles and reports: 12-001-X199000214536
    Description:

    We discuss frame and sample maintenance issues that arise in recurring surveys. A new system is described that meets four objectives. Through time, it maintains (1) the geographical balance of a sample; (2) the sample size; (3) the unbiased character of estimators; and (4) the lack of distortion in estimated trends. The system is based upon the Peano key, which creates a fractal, space-filling curve. An example of the new system is presented using a national survey of establishments in the United States conducted by the A.C. Nielsen Company.

    Release date: 1990-12-14

  • Articles and reports: 12-001-X198800114602
    Description:

    For a given level of precision, Hidiroglou (1986) provided an algorithm for dividing the population into a take-all stratum and a take-some stratum so as to minimize the overall sample size assuming simple random sampling without replacement in the take-some stratum. Sethi (1963) provided an algorithm for optimum stratification of the population into a number of take-some strata. For the stratification of a highly skewed population, this article presents an iterative algorithm which has as objective the determination of stratification boundaries which split the population into a take-all stratum and a number of take-some strata. These boundaries are computed so as to minimize the resulting sample size given a level of relative precision, simple random sampling without replacement from the take-some strata and use of a power allocation among the take-some strata. The resulting algorithm is a combination of the procedures of Hidiroglou (1986) and Sethi (1963).

    Release date: 1988-06-15
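
    A brute-force version of the take-all/take-some idea (not Hidiroglou's or Sethi's iterative algorithms, and run on a simulated skewed population): for each candidate cut-off, take the largest units with certainty, compute the SRSWOR take-some sample size needed to hit a target CV of the estimated total, and keep the cut-off that minimizes the combined sample size.

        import numpy as np

        def take_all_take_some(y, target_cv):
            # Search the cut-off between the take-all and take-some strata that
            # minimizes total sample size for a target CV of the estimated total
            # (single take-some stratum under SRSWOR; a simplified stand-in for
            # the iterative algorithms cited in the abstract).
            y = np.sort(np.asarray(y, dtype=float))[::-1]
            total, best = y.sum(), None
            for k in range(len(y) - 1):               # k largest units taken with certainty
                tail = y[k:]
                N_b, S_b = tail.size, tail.std(ddof=1)
                n_b = N_b**2 * S_b**2 / ((target_cv * total)**2 + N_b * S_b**2)
                n_tot = k + int(np.ceil(n_b))
                if best is None or n_tot < best[0]:
                    best = (n_tot, k)
            return best                               # (total sample size, number of take-all units)

        rng = np.random.default_rng(0)
        y = rng.lognormal(mean=8.0, sigma=1.6, size=2000)     # simulated skewed business population
        n_tot, k = take_all_take_some(y, target_cv=0.02)
        print(f"take-all units: {k}, total sample size: {n_tot}")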

  • Articles and reports: 12-001-X198700114465
    Description:

    The two-stage rejection rule telephone sample design described by Waksberg (1978) is modified to improve the efficiency of telephone surveys of the U.S. Black population. Experimental tests of sample design alternatives demonstrate that: a) use of rough stratification based on telephone exchange names and states; b) use of large cluster definitions (200 and 400 consecutive numbers) at the first stage; and c) rejection rules based on racial status of the household combine to offer improvements in the relative precision of a sample, given fixed resources. Cost and error models are examined to simulate design alternatives.

    Release date: 1987-06-15

  • Articles and reports: 12-001-X198600214450
    Description:

    From an annual sample of U.S. corporate tax returns, the U.S. Internal Revenue Service provides estimates of population and subpopulation totals for several hundred financial items. The basic sample design is highly stratified and fairly complex. Starting with the 1981 and 1982 samples, the design was altered to include a double sampling procedure. This was motivated by the need for better allocation of resources, in an environment of shrinking budgets. Items not observed in the subsample are predicted, using a modified hot deck imputation procedure. The present paper describes the design, estimation, and evaluation of the effects of the new procedure.

    Release date: 1986-12-15
