Survey design


Results

Data (0) (0 results)

No content available at this time.

Analysis (16) (1 to 10 of 16 results)

  • Articles and reports: 11-522-X202200100010
    Description: Growing Up in Québec is a longitudinal population survey that began in the spring of 2021 at the Institut de la statistique du Québec. Among the children targeted by this longitudinal follow-up, some will experience developmental difficulties at some point in their lives. Those same children often have characteristics associated with higher sample attrition (low-income family, parents with a low level of education). This article describes the two main challenges we encountered when trying to ensure sufficient representativeness of these children, in both the overall results and the subpopulation analyses.
    Release date: 2024-03-25

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows that controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900200007
    Description:

    When fitting an ordered categorical variable with L > 2 levels to a set of covariates onto complex survey data, it is common to assume that the elements of the population fit a simple cumulative logistic regression model (proportional-odds logistic-regression model). This means the probability that the categorical variable is at or below some level is a binary logistic function of the model covariates. Moreover, except for the intercept, the values of the logistic-regression parameters are the same at each level. The conventional “design-based” method used for fitting the proportional-odds model is based on pseudo-maximum likelihood. We compare estimates computed using pseudo-maximum likelihood with those computed by assuming an alternative design-sensitive robust model-based framework. We show with a simple numerical example how estimates using the two approaches can differ. The alternative approach is easily extended to fit a general cumulative logistic model, in which the parallel-lines assumption can fail. A test of that assumption easily follows.

    Release date: 2019-06-27

  • Articles and reports: 11-522-X201300014276
    Description:

    In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, and this effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified: If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets. If an area is ‘abandoned’ during collection, we prioritize the remaining surveys. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools that we can use to analyze collection by isolating underrepresented population groups. We can increase collection efforts for groups that have been identified beforehand. In the oral presentation, we covered these two points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.

    Release date: 2014-10-31

  • Articles and reports: 12-001-X201100211606
    Description:

    This paper introduces a U.S. Census Bureau special compilation by presenting four other papers of the current issue: three papers from authors Tillé, Lohr and Thompson as well as a discussion paper from Opsomer.

    Release date: 2011-12-21

  • Articles and reports: 12-001-X200800210763
    Description:

    The present work illustrates a sampling strategy useful for obtaining planned sample sizes for domains belonging to different partitions of the population and for guaranteeing that the sampling errors of domain estimates are lower than given thresholds. The sampling strategy, which covers the multivariate multi-domain case, is useful when the overall sample size is bounded and, consequently, the standard solution of using a stratified sample with the strata given by the cross-classification of the variables defining the different partitions is not feasible because the number of strata is larger than the overall sample size. The proposed sampling strategy is based on a balanced sampling selection technique and on GREG-type estimation. The main advantage of the solution is its computational feasibility, which allows one to easily implement an overall small area strategy that considers the sampling design and the estimator jointly and improves the efficiency of the direct domain estimators. An empirical simulation on real population data and different domain estimators shows the empirical properties of the examined sampling strategy.

    Release date: 2008-12-23

  • Articles and reports: 11-522-X200600110441
    Description:

    How does one efficiently estimate sample size while building consensus among multiple investigators for multi-purpose projects? We present a template using common spreadsheet software to provide estimates of power, precision, and financial costs under varying sampling scenarios, as used in the development of the Ontario Tobacco Survey. In addition to cost estimates, complex sample size formulae were nested within a spreadsheet to determine power and precision, incorporating user-defined design effects and loss to follow-up. Common spreadsheet software can be used in conjunction with complex formulae to enhance knowledge exchange between methodologists and stakeholders, in effect demystifying the "sample size black box".

    Release date: 2008-03-17

  • Articles and reports: 11-522-X200600110444
    Geography: Province or territory
    Description:

    General population health surveys often include small samples of smokers. Few longitudinal studies specific to smoking have been carried out. We discuss the development of the Ontario Tobacco Survey (OTS), which combines rolling longitudinal and repeated cross-sectional components. The OTS began in July 2005 using random selection and data collection by telephone. Every 6 months, new samples of smokers and non-smokers provide data on smoking behaviours and attitudes. Smokers enter a panel study and are followed for changes in smoking influences and behaviour. The design is proving to be cost-effective in meeting sample requirements for multiple research objectives.

    Release date: 2008-03-17

  • Articles and reports: 12-001-X20060029553
    Description:

    Félix-Medina and Thompson (2004) proposed a variant of link-tracing sampling in which it is assumed that a portion of the population, not necessarily the major portion, is covered by a frame of disjoint sites where members of the population can be found with high probabilities. A sample of sites is selected and the people in each of the selected sites are asked to nominate other members of the population. They proposed maximum likelihood estimators of the population sizes which perform acceptably provided that for each site the probability that a member is nominated by that site, called the nomination probability, is not small. In this research we consider Félix-Medina and Thompson's variant and propose three sets of estimators of the population sizes derived under the Bayesian approach. Two of the sets of estimators were obtained using improper prior distributions of the population sizes, and the other using Poisson prior distributions. However, we use the Bayesian approach only to assist us in the construction of estimators, while inferences about the population sizes are made under the frequentist approach. We propose two types of partly design-based variance estimators and confidence intervals. One of them is obtained using a bootstrap and the other using the delta method along with the assumption of asymptotic normality. The results of a simulation study indicate that (i) when the nomination probabilities are not small, each of the proposed sets of estimators performs well and very similarly to maximum likelihood estimators; (ii) when the nomination probabilities are small, the set of estimators derived using Poisson prior distributions still performs acceptably and does not have the problems of bias that maximum likelihood estimators have; and (iii) the previous results do not depend on the size of the fraction of the population covered by the frame.

    Release date: 2006-12-21

  • Articles and reports: 11-522-X20010016228
    Description:

    The Current Population Survey is the primary source of labour force data for the United States. Throughout any survey process, it is critical that data quality be ensured. This paper discusses how quality issues are addressed during all steps of the survey process, including the development of the sample frame, sampling operations, sample control, data collection, editing, imputation, estimation, and questionnaire development. It also reviews the quality evaluations that are built into the survey process. The paper concludes with a discussion of current research and possible future improvements to the survey.

    Release date: 2002-09-12
Reference (4) (4 results)

  • Surveys and statistical programs – Documentation: 16-001-M2007004
    Description:

    Statistics Canada administers a number of environmental surveys that fill important data gaps but are also challenging to administer. This paper focuses on two ongoing environment surveys: one newly initiated and one in the process of a redesign.

    Release date: 2007-11-23

  • Surveys and statistical programs – Documentation: 75F0002M2004006
    Description:

    This document presents information about the entry-exit portion of the annual labour and the income interviews of the Survey of Labour and Income Dynamics (SLID).

    Release date: 2004-06-21

  • Surveys and statistical programs – Documentation: 11-522-X20010016225
    Description:

    The European Union Labour Force Survey (LFS) is based on national surveys that were originally very different. For the past decade, under pressure from increasingly demanding users (particularly with respect to timeliness, comparability and flexibility), the LFS has been subjected to a constant process of quality improvement.

    The following topics are presented in this paper:
    A. the quality improvement process, which comprises screening national survey methods, target structure, legal foundations, quality reports, more accurate and more explicit definitions of components, etc.;
    B. expected or achieved results, which include an ongoing survey producing quarterly results within reasonable time frames, comparable employment and unemployment rates over time and space in more than 25 countries, specific information on current political topics, etc.;
    C. continuing shortcomings, such as implementation delays in certain countries, possibilities of longitudinal analysis, public access to microdata, etc.;
    D. future tasks envisioned, such as adaptation of the list of ISCO and ISCED variables and nomenclatures (to take into account evolution in employment and teaching methods), differential treatment of structural variables and increased recourse to administrative files (to limit respondent burden), harmonization of questionnaires, etc.

    Release date: 2002-09-12

  • Surveys and statistical programs – Documentation: 11-522-X19980015035
    Description:

    In a longitudinal survey conducted for k periods, some units may be observed for fewer than k of the periods. Examples include surveys designed with partially overlapping subsamples, a pure panel survey with nonresponse, and a panel survey supplemented with additional samples for some of the time periods. Estimators of the regression type are exhibited for such surveys. An application to special studies associated with the National Resources Inventory is discussed.

    Release date: 1999-10-22