Statistics by subject – Statistical methods

All (57) (25 of 57 results)

  • Articles and reports: 12-001-X201700254871
    Description:

    This paper addresses the question of how alternative data sources, such as administrative and social media data, can be used in the production of official statistics. Since most surveys at national statistical institutes are conducted repeatedly over time, a multivariate structural time series modelling approach is proposed to model the series observed by a repeated survey together with related series obtained from such alternative data sources. Generally, this improves the precision of the direct survey estimates by using sample information observed in preceding periods and information from related auxiliary series. The model also makes it possible to exploit the higher frequency of the social media data to produce more precise estimates for the sample survey in real time, at the moment that the social media statistics become available but the sample data are not yet available. The concept of cointegration is applied to address the extent to which the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey and a sentiment index derived from social media.

    Release date: 2017-12-21
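
    A minimal sketch of the cointegration check the paper describes, using the Engle-Granger test from statsmodels as a stand-in; the two series here are simulated random walks sharing a common trend, not the actual survey and sentiment data.

        import numpy as np
        from statsmodels.tsa.stattools import coint

        rng = np.random.default_rng(1)
        trend = np.cumsum(rng.normal(size=200))                    # shared random-walk trend
        survey = trend + rng.normal(scale=0.5, size=200)           # repeated-survey series
        sentiment = 0.8 * trend + rng.normal(scale=0.5, size=200)  # auxiliary series

        t_stat, p_value, _ = coint(survey, sentiment)              # H0: no cointegration
        print(f"Engle-Granger t = {t_stat:.2f}, p = {p_value:.3f}")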

  • Articles and reports: 12-001-X201700114819
    Description:

    Structural time series models are a powerful technique for variance reduction in the framework of small area estimation (SAE) based on repeatedly conducted surveys. Statistics Netherlands implemented a structural time series model to produce monthly figures about the labour force with the Dutch Labour Force Survey (DLFS). Such models, however, contain unknown hyperparameters that have to be estimated before the Kalman filter can be launched to estimate state variables of the model. This paper describes a simulation aimed at studying the properties of hyperparameter estimators in the model. Simulating distributions of the hyperparameter estimators under different model specifications complements standard model diagnostics for state space models. Uncertainty around the model hyperparameters is another major issue. To account for hyperparameter uncertainty in the mean squared errors (MSE) estimates of the DLFS, several estimation approaches known in the literature are considered in a simulation. Apart from the MSE bias comparison, this paper also provides insight into the variances and MSEs of the MSE estimators considered.

    Release date: 2017-06-22
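
    A minimal sketch of the hyperparameter-estimation step, assuming a simple local level model: statsmodels' UnobservedComponents estimates the two variance hyperparameters by maximum likelihood before the Kalman filter runs. Repeating this over many simulated series would approximate the estimator distributions studied in the paper.

        import numpy as np
        from statsmodels.tsa.statespace.structural import UnobservedComponents

        rng = np.random.default_rng(0)
        level = np.cumsum(rng.normal(scale=0.5, size=300))   # true latent level
        y = level + rng.normal(scale=1.0, size=300)          # observed series

        res = UnobservedComponents(y, level="local level").fit(disp=False)
        print(res.params)   # ML estimates: sigma2.irregular, sigma2.level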

  • Articles and reports: 12-001-X201700114820
    Description:

    Measurement errors can induce bias in the estimation of transitions, leading to erroneous conclusions about labour market dynamics. The traditional literature on gross flows estimation is based on the assumption that measurement errors are uncorrelated over time. This assumption is not realistic in many contexts, because of survey design and data collection strategies. In this work, we use a model-based approach to correct observed gross flows for classification errors with latent class Markov models. We refer to data collected with the Italian Continuous Labour Force Survey, which is cross-sectional, quarterly, with a 2-2-2 rotating design. The questionnaire allows us to use multiple indicators of labour force conditions for each quarter: two collected in the first interview, and a third collected one year later. Our approach provides a method to estimate labour market mobility, taking into account correlated errors and the rotating design of the survey. The best-fitting model is a mixed latent class Markov model with covariates affecting latent transitions and correlated errors among indicators; the mixture components are of mover-stayer type. The better fit of the mixture specification is due to more accurately estimated latent transitions.

    Release date: 2017-06-22
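
    A minimal sketch of the likelihood machinery behind a latent class Markov model: a two-state hidden chain with a misclassification matrix linking latent labour force states to one observed indicator. The probabilities are illustrative, and the correlated errors and covariates central to the paper are omitted.

        import numpy as np

        pi0 = np.array([0.6, 0.4])          # initial latent distribution
        P = np.array([[0.9, 0.1],           # latent transition matrix
                      [0.2, 0.8]])
        E = np.array([[0.95, 0.05],         # P(observed | latent): rows latent states
                      [0.10, 0.90]])

        def forward_loglik(obs):
            """Log-likelihood of one observed indicator sequence (forward algorithm)."""
            alpha = pi0 * E[:, obs[0]]
            for o in obs[1:]:
                alpha = (alpha @ P) * E[:, o]
            return np.log(alpha.sum())

        print(forward_loglik([0, 0, 1]))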

  • Articles and reports: 12-001-X201600114544
    Description:

    In the Netherlands, statistical information about income and wealth is based on two large-scale household panels that are completely derived from administrative data. A problem with using households as sampling units in the sample design of panels is the instability of these units over time. Changes in household composition affect the inclusion probabilities required for design-based and model-assisted inference procedures. Such problems are circumvented in the two aforementioned household panels by sampling persons, who are followed over time. At each period the household members of these sampled persons are included in the sample. This is equivalent to sampling with probabilities proportional to household size, where households can be selected more than once, up to a maximum equal to the number of household members. In this paper, properties of this sample design are described and contrasted with the Generalized Weight Share method for indirect sampling (Lavallée 1995, 2007). The methods are illustrated with an application to the Dutch Regional Income Survey.

    Release date: 2016-06-22
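
    A minimal sketch of the design's key property, under the simplifying assumption of simple random sampling of m persons from N: a household's expected number of selections is proportional to its size, so an illustrative single-selection design weight is N / (m * size).

        import numpy as np

        rng = np.random.default_rng(2)
        size = rng.integers(1, 6, size=1000)   # persons per household
        N, m = size.sum(), 100                 # population persons, person sample size

        expected_hits = m * size / N           # proportional to household size
        weight = 1 / expected_hits             # illustrative design weight
        print(weight[:5])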

  • Articles and reports: 12-001-X201500214236
    Description:

    We propose a model-assisted extension of weighting design-effect measures. We develop a summary-level statistic for different variables of interest, in single-stage sampling and under calibration weight adjustments. Our proposed design effect measure captures the joint effects of a non-epsem sampling design, unequal weights produced using calibration adjustments, and the strength of the association between an analysis variable and the auxiliaries used in calibration. We compare our proposed measure to existing design effect measures in simulations using variables like those collected in establishment surveys and telephone surveys of households.

    Release date: 2015-12-17
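
    A minimal sketch of the classical building block the paper extends: Kish's weighting design effect, 1 + cv²(w), computed from the weights alone. The proposed measure additionally accounts for calibration and for the association between the analysis variable and the auxiliaries.

        import numpy as np

        def kish_deff(w):
            """Kish's design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
            w = np.asarray(w, dtype=float)
            return w.size * np.sum(w**2) / np.sum(w)**2

        print(kish_deff([1.0, 1.0, 2.0, 4.0]))   # 1.0 would mean no weighting loss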

  • Articles and reports: 12-001-X201500214231
    Description:

    Rotating panels are widely applied by national statistical institutes, for example, to produce official statistics about the labour force. Estimation procedures are generally based on traditional design-based procedures known from classical sampling theory. A major drawback of this class of estimators is that small sample sizes result in large standard errors and that the estimators are not robust to measurement bias. Two examples showing the effects of measurement bias are rotation group bias in rotating panels, and systematic differences in the outcome of a survey due to a major redesign of the underlying process. In this paper we apply a multivariate structural time series model to the Dutch Labour Force Survey to produce model-based figures about the monthly labour force. The model reduces the standard errors of the estimates by taking advantage of sample information collected in previous periods, accounts for rotation group bias and autocorrelation induced by the rotating panel, and models discontinuities due to a survey redesign. Additionally, we discuss the use of correlated auxiliary series in the model to further improve the accuracy of the model estimates. The method is applied by Statistics Netherlands to produce accurate official monthly statistics about the labour force that are consistent over time, despite a redesign of the survey process.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214248
    Description:

    Unit level population models are often used in model-based small area estimation of totals and means, but the models may not hold for the sample if the sampling design is informative for the model. As a result, standard methods, assuming that the model holds for the sample, can lead to biased estimators. We study alternative methods that use a suitable function of the unit selection probability as an additional auxiliary variable in the sample model. We report the results of a simulation study on the bias and mean squared error (MSE) of the proposed estimators of small area means and on the relative bias of the associated MSE estimators, using informative sampling schemes to generate the samples. Alternative methods, based on modeling the conditional expectation of the design weight as a function of the model covariates and the response, are also included in the simulation study.

    Release date: 2015-12-17
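
    A minimal sketch of the augmentation idea, with made-up data: when selection probabilities pi are related to the outcome, adding a function of pi (here log pi) as a covariate in the sample model absorbs the informativeness that would otherwise bias the fit.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        x = rng.normal(size=500)
        pi = 1 / (1 + np.exp(-(0.5 + 0.8 * x)))          # informative selection probability
        y = 1 + 2 * x + 0.5 * np.log(pi) + rng.normal(size=500)

        X = sm.add_constant(np.column_stack([x, np.log(pi)]))
        print(sm.OLS(y, X).fit().params)                 # augmented sample model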

  • Articles and reports: 82-003-X201501114243
    Description:

    A surveillance tool was developed to assess dietary intake collected by surveys in relation to Eating Well with Canada’s Food Guide (CFG). The tool classifies foods in the Canadian Nutrient File (CNF) according to how closely they reflect CFG. This article describes the validation exercise conducted to ensure that CNF foods determined to be “in line with CFG” were appropriately classified.

    Release date: 2015-11-18

  • Articles and reports: 12-001-X201500114151
    Description:

    One of the main variables in the Dutch Labour Force Survey is the variable measuring whether a respondent has a permanent or a temporary job. The aim of our study is to determine the measurement error in this variable by matching the information obtained in the longitudinal part of this survey with unique register data from the Dutch Institute for Employee Insurance. Contrary to previous approaches confronting such datasets, we take into account that register data are also not error-free and that measurement error in these data is likely to be correlated over time. More specifically, we propose estimating the measurement error in these two sources using an extended hidden Markov model with two observed indicators for the type of contract. Our results indicate that neither of the two sources should be considered error-free. For both indicators, we find that workers with temporary contracts are often misclassified as having a permanent contract. Particularly for the register data, we find that measurement errors are strongly autocorrelated: once made, they tend to repeat themselves. In contrast, when the register is correct, the probability of an error at the next time period is almost zero. Finally, we find that temporary contracts are more widespread than the Labour Force Survey suggests, while transitions from temporary to permanent contracts are much less common than both datasets suggest.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114162
    Description:

    The operationalization of the Population and Housing Census in Portugal is managed by a hierarchical structure in which Statistics Portugal is at the top and local government institutions are at the bottom. When the Census takes place, every ten years, local governments are asked to collaborate with Statistics Portugal in the execution and monitoring of the fieldwork operations at the local level. During the Pilot Test stage of the 2011 Census, local governments were asked for additional collaboration: to answer the Perception of Risk survey, whose aim was to gather information for designing a quality assurance instrument that could be used to monitor the Census operations. The desired response rate for the survey was 100%; however, by the data collection deadline, nearly a quarter of local governments had not responded, so a decision was made to send a follow-up mailing. In this paper, we examine whether the same conclusions could have been reached from the survey without the follow-ups as with them, and we evaluate the influence of the follow-ups on the design of the quality assurance instrument. A comparison of responses on a set of perception variables revealed that local governments answering before or after the follow-up did not differ. However, the configuration of the quality assurance instrument changed when follow-up responses were included.

    Release date: 2015-06-29
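
    A minimal sketch of the kind of comparison reported, assuming a cross-tabulation of one perception item by response wave; a chi-square test of independence checks whether pre- and post-follow-up respondents answer differently. The counts are illustrative.

        import numpy as np
        from scipy.stats import chi2_contingency

        # rows: before / after follow-up; columns: answer categories of one item
        table = np.array([[40, 30, 10],
                          [12, 10, 3]])
        chi2, p, dof, _ = chi2_contingency(table)
        print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")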

  • Articles and reports: 82-003-X201500614196
    Description:

    This study investigates the feasibility and validity of using personal health insurance numbers to deterministically link the Canadian Cancer Registry (CCR) and the Discharge Abstract Database to obtain hospitalization information about people with primary cancers.

    Release date: 2015-06-17
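
    A minimal sketch of deterministic linkage on an exact key, using pandas; the `hin` column and both tables are hypothetical stand-ins for the registry and hospital files.

        import pandas as pd

        ccr = pd.DataFrame({"hin": ["A1", "B2", "C3"],
                            "cancer_site": ["lung", "colon", "breast"]})
        dad = pd.DataFrame({"hin": ["B2", "C3", "D4"],
                            "admissions": [2, 1, 3]})

        linked = ccr.merge(dad, on="hin", how="inner")   # keep exact key matches only
        print(linked)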

  • Technical products: 11-522-X201300014285
    Description:

    The 2011 National Household Survey (NHS) is a voluntary survey that replaced the traditional mandatory long-form questionnaire of the Canadian census of population. The NHS sampled about 30% of Canadian households and achieved a design-weighted response rate of 77%. In comparison, the last census long form was sent to 20% of households and achieved a response rate of 94%. Based on the long-form data, Statistics Canada traditionally produces two public use microdata files (PUMFs): the individual PUMF and the hierarchical PUMF. Both give information on individuals, but the hierarchical PUMF provides extra information on the household and family relationships between the individuals. To produce two PUMFs, based on the NHS data, that cover the whole country evenly and that do not overlap, we applied a special sub-sampling strategy. Difficulties in the confidentiality analyses have increased because of the numerous new variables, the more detailed geographic information and the voluntary nature of the NHS. This paper describes the 2011 PUMF methodology and how it balances the requirements for more information and for low risk of disclosure.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014288
    Description:

    Probability-based surveys, those with samples selected through a known randomization mechanism, are considered by many to be the gold standard, in contrast to non-probability samples. Probability sampling theory was first developed in the early 1930s and continues today to justify the estimation of population values from these data. Conversely, studies using non-probability samples have gained attention in recent years, but they are not new. Touted as cheaper and faster (even better) than probability designs, these surveys capture participants through various “on the ground” methods (e.g., opt-in web surveys). But which type of survey is better? This paper is the first in a series on the quest for a quality framework under which all surveys, probability- and non-probability-based, may be measured on a more equal footing. First, we highlight a few frameworks currently in use, noting that “better” is almost always relative to a survey’s fit for purpose. Next, we focus on the question of validity, particularly external validity when population estimates are desired. Estimation techniques used to date for non-probability surveys are reviewed, along with a few comparative studies of these estimates against those from a probability-based sample. Finally, the next research steps in the quest are described, followed by a few parting comments.

    Release date: 2014-10-31

  • Articles and reports: 12-001-X201300211887
    Description:

    Multi-level models are extensively used for analyzing survey data with the design hierarchy matching the model hierarchy. We propose a unified approach, based on a design-weighted log composite likelihood, for two-level models that leads to design-model consistent estimators of the model parameters even when the within-cluster sample sizes are small, provided the number of sample clusters is large. This method can handle both linear and generalized linear two-level models and it requires level 2 and level 1 inclusion probabilities and level 1 joint inclusion probabilities, where level 2 represents a cluster and level 1 an element within a cluster. Results of a simulation study demonstrating superior performance of the proposed method relative to existing methods under informative sampling are also reported.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211870
    Description:

    At national statistical institutes, experiments embedded in ongoing sample surveys are frequently conducted, for example to test the effect of modifications in the survey process on the main parameter estimates of the survey, to quantify the effect of alternative survey implementations on these estimates, or to obtain insight into the various sources of non-sampling errors. A design-based analysis procedure for factorial completely randomized designs and factorial randomized block designs embedded in probability samples is proposed in this paper. Design-based Wald statistics are developed to test whether estimated population parameters, like means, totals and ratios of two population totals, that are observed under the different treatment combinations of the experiment are significantly different. The methods are illustrated with a real-life application of an experiment embedded in the Dutch Labor Force Survey.

    Release date: 2014-01-15
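
    A minimal sketch of a design-based Wald test, assuming a vector of estimated treatment contrasts d and its design-based covariance matrix V are already available: W = d' V^-1 d is referred to a chi-square distribution. The numbers are illustrative.

        import numpy as np
        from scipy.stats import chi2

        d = np.array([0.8, -0.3])          # estimated treatment contrasts
        V = np.array([[0.10, 0.02],        # design-based covariance of d
                      [0.02, 0.08]])

        W = d @ np.linalg.solve(V, d)      # Wald statistic d' V^-1 d
        print(f"W = {W:.2f}, p = {chi2.sf(W, df=d.size):.3f}")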

  • Articles and reports: 12-001-X201200211757
    Description:

    Collinearities among explanatory variables in linear regression models affect estimates from survey data just as they do in non-survey data. Undesirable effects are unnecessarily inflated standard errors, spuriously low or high t-statistics, and parameter estimates with illogical signs. The available collinearity diagnostics are not generally appropriate for survey data because the variance estimators they incorporate do not properly account for stratification, clustering, and survey weights. In this article, we derive condition indexes and variance decompositions to diagnose collinearity problems in complex survey data. The adapted diagnostics are illustrated with data based on a survey of health characteristics.

    Release date: 2012-12-19
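
    A minimal sketch of condition indexes from a column-scaled design matrix with survey weights folded in; the paper's diagnostics also account for stratification and clustering, which this toy version ignores. The near-duplicate column makes the largest index explode.

        import numpy as np

        def condition_indexes(X, w):
            Xw = np.sqrt(w)[:, None] * X                 # fold weights into X
            Xw = Xw / np.linalg.norm(Xw, axis=0)         # scale columns to unit length
            s = np.linalg.svd(Xw, compute_uv=False)
            return s.max() / s                           # one index per singular value

        rng = np.random.default_rng(4)
        x1 = rng.normal(size=100)
        X = np.column_stack([np.ones(100), x1, x1 + 0.01 * rng.normal(size=100)])
        print(condition_indexes(X, rng.uniform(1, 3, size=100)))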

  • Articles and reports: 12-001-X201200111685
    Description:

    Survey data are often used to fit linear regression models. The values of covariates used in modeling are not controlled as they might be in an experiment. Thus, collinearity among the covariates is an inevitable problem in the analysis of survey data. Although many books and articles have described the collinearity problem and proposed strategies to understand, assess and handle its presence, the survey literature has not provided appropriate diagnostic tools to evaluate its impact on regression estimation when the survey complexities are considered. We have developed variance inflation factors (VIFs) that measure the amount by which the variances of parameter estimators are increased by non-orthogonal predictors. The VIFs are appropriate for survey-weighted regression estimators and account for complex design features, e.g., weights, clusters, and strata. Illustrations of these methods are given using a probability sample from a household survey of health and nutrition.

    Release date: 2012-06-27
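
    A minimal sketch of a weighted VIF, assuming the usual definition VIF_j = 1 / (1 - R²_j) with R²_j from a weighted regression of predictor j on the others; the paper's VIFs additionally reflect clustering and stratification, which this simple analogue omits.

        import numpy as np
        import statsmodels.api as sm

        def weighted_vif(X, w, j):
            others = np.delete(X, j, axis=1)
            r2 = sm.WLS(X[:, j], sm.add_constant(others), weights=w).fit().rsquared
            return 1.0 / (1.0 - r2)

        rng = np.random.default_rng(5)
        x1 = rng.normal(size=200)
        X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200)])  # collinear pair
        print(weighted_vif(X, rng.uniform(1, 3, size=200), 0))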

  • Articles and reports: 12-001-X201000111251
    Description:

    Calibration techniques, such as poststratification, use auxiliary information to improve the efficiency of survey estimates. The control totals, to which sample weights are poststratified (or calibrated), are assumed to be population values. Often, however, the controls are estimated from other surveys. Many researchers apply traditional poststratification variance estimators to situations where the control totals are estimated, thus assuming that any additional sampling variance associated with these controls is negligible. The goal of the research presented here is to evaluate variance estimators for stratified, multi-stage designs under estimated-control (EC) poststratification using design-unbiased controls. We compare the theoretical and empirical properties of linearization and jackknife variance estimators for a poststratified estimator of a population total. Illustrations are given of the effects on variances from different levels of precision in the estimated controls. Our research suggests (i) traditional variance estimators can seriously underestimate the theoretical variance, and (ii) two EC poststratification variance estimators can mitigate the negative bias.

    Release date: 2010-06-29
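
    A minimal simulation sketch of the paper's point, with one poststratum and illustrative numbers: when the control total itself carries sampling error, the variance of the poststratified estimate grows, and a formula that treats the control as fixed misses the extra term.

        import numpy as np

        rng = np.random.default_rng(6)
        y_bar = rng.normal(10.0, 0.5, size=5000)    # sampling error in the stratum mean
        n_hat = rng.normal(100.0, 5.0, size=5000)   # sampling error in the control

        print(np.var(100.0 * y_bar))   # variance with a known control total
        print(np.var(n_hat * y_bar))   # larger variance with an estimated control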

  • Articles and reports: 12-001-X200900211040
    Description:

    In this paper a multivariate structural time series model is described that accounts for the panel design of the Dutch Labour Force Survey and is applied to estimate monthly unemployment rates. Compared to the generalized regression estimator, this approach results in a substantial increase in accuracy, due to a reduction of the standard error and the explicit modelling of the bias between the subsequent waves.

    Release date: 2009-12-23

  • Technical products: 11-522-X200800010985
    Description:

    In Canada, although complex businesses represent less than 1% of the total number of businesses, they contribute more than 45% of the total revenue. Statistics Canada recognized that the quality of the data collected from them is of great importance and has adopted several initiatives to improve it. One of these initiatives is the evaluation of the coherence of the data collected from large, complex enterprises. The findings of these recent coherence analyses have been instrumental in identifying areas for improvement which, once addressed, should increase the quality of the data collected from large, complex enterprises while reducing the response burden imposed on them.

    Release date: 2009-12-03

  • Articles and reports: 12-001-X200900110881
    Description:

    Regression diagnostics are geared toward identifying individual points or groups of points that have an important influence on a fitted model. When fitting a model with survey data, the sources of influence are the response variable Y, the predictor variables X, and the survey weights, W. This article discusses the use of the hat matrix and leverages to identify points that may be influential in fitting linear models due to large weights or values of predictors. We also contrast findings that an analyst will obtain if ordinary least squares is used rather than survey weighted least squares to determine which points are influential.

    Release date: 2009-06-22
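
    A minimal sketch of survey-weighted leverages, taking them as the diagonal of H = W^(1/2) X (X'WX)^(-1) X' W^(1/2); points with large weights or extreme predictor values get large leverage.

        import numpy as np

        def survey_leverages(X, w):
            Xw = np.sqrt(w)[:, None] * X
            XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
            return np.einsum("ij,jk,ik->i", Xw, XtWX_inv, Xw)   # diagonal of H

        rng = np.random.default_rng(7)
        X = np.column_stack([np.ones(50), rng.normal(size=50)])
        w = rng.uniform(1, 10, size=50)
        print(survey_leverages(X, w)[:5])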

  • Technical products: 11-522-X200600110444
    Description:

    General population health surveys often include small samples of smokers. Few longitudinal studies specific to smoking have been carried out. We discuss the development of the Ontario Tobacco Survey (OTS), which combines rolling longitudinal and repeated cross-sectional components. The OTS began in July 2005, using random selection and data collection by telephone. Every 6 months, new samples of smokers and non-smokers provide data on smoking behaviours and attitudes. Smokers enter a panel study and are followed for changes in smoking influences and behaviour. The design is proving to be cost-effective in meeting sample requirements for multiple research objectives.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110453
    Description:

    National food and nutrition surveys provide critical information to support understanding of the complex relationship between health and diet in the population. Many of these surveys use 24-hour recall methodology, which collects, at a detailed level, all foods and beverages consumed over a day. Often it is the longer-term intake of foods and nutrients that is of interest, and a number of techniques are available that allow estimation of usual population intakes. These techniques require that at least one repeat 24-hour recall be collected from at least a subset of the population in order to estimate the intra-individual variability of intakes. Deciding on the number of individuals required to provide a repeat is an important step in the survey design: too few repeat individuals compromise the ability to estimate usual intakes, but large numbers of repeats are costly and add burden for the respondents. This paper looks at the statistical issues related to the number of repeat individuals, assesses the impact of the number of repeaters on the stability and uncertainty of the estimate of intra-individual variability, and provides guidance on the required number of repeat responders.

    Release date: 2008-03-17
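
    A minimal sketch of why repeats matter, using method-of-moments variance components from two simulated 24-hour recalls per person: the within-person spread of repeats estimates intra-individual variability, which is then removed from the variance of person means.

        import numpy as np

        rng = np.random.default_rng(8)
        usual = rng.normal(2000, 300, size=200)                 # usual intake (kcal)
        recalls = usual[:, None] + rng.normal(0, 400, size=(200, 2))

        s2_within = recalls.var(axis=1, ddof=1).mean()          # intra-individual
        s2_between = recalls.mean(axis=1).var(ddof=1) - s2_within / 2
        print(s2_within, s2_between)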

  • Technical products: 11-522-X200600110441
    Description:

    How does one efficiently estimate sample size while building consensus among multiple investigators for multi-purpose projects? We present a template using common spreadsheet software to provide estimates of power, precision, and financial costs under varying sampling scenarios, as used in development of the Ontario Tobacco Survey. In addition to cost estimates, complex sample size formulae were nested within a spreadsheet to determine power and precision, incorporating user-defined design effects and loss to follow-up. Common spreadsheet software can be used in conjunction with complex formulae to enhance knowledge exchange between methodologists and stakeholders, in effect demystifying the "sample size black box".

    Release date: 2008-03-17
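
    A minimal sketch of the kind of formula such a spreadsheet nests, assuming a proportion estimate: the simple-random-sampling size is inflated by a design effect and by anticipated loss to follow-up.

        import math

        def sample_size(p, margin, deff=1.0, loss=0.0, z=1.96):
            """n for a proportion, inflated for design effect and follow-up loss."""
            n_srs = z**2 * p * (1 - p) / margin**2
            return math.ceil(n_srs * deff / (1 - loss))

        print(sample_size(p=0.2, margin=0.03, deff=1.5, loss=0.2))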

  • Technical products: 11-522-X200600110446
    Description:

    Immigrants have health advantages over native-born Canadians, but those advantages are threatened by specific risk situations. This study explores cardiovascular health outcomes in districts of Montréal classified by the proportion of immigrants in the population, using a principal component analysis. The first three components are immigration, degree of socio-economic disadvantage and degree of economic disadvantage. The incidence of myocardial infarction is lower in districts with large immigrant populations than in districts dominated by native-born Canadians. Mortality rates are associated with the degree of socio-economic disadvantage, while revascularization is associated with the proportion of seniors in the population.

    Release date: 2008-03-17


Analysis (35)

  • Articles and reports: 12-001-X200700210491
    Description:

    Poststratification is a common method of estimation in household surveys. Cells are formed based on characteristics that are known for all sample respondents and for which external control counts are available from a census or another source. The inverses of the poststratification adjustments are usually referred to as coverage ratios. Coverage of some demographic groups may be substantially below 100 percent, and poststratifying serves to correct for biases due to poor coverage. A standard procedure in poststratification is to collapse or combine cells when the sample sizes fall below some minimum or the weight adjustments are above some maximum. Collapsing can either increase or decrease the variance of an estimate but may simultaneously increase its bias. We study the effects on bias and variance of this type of dynamic cell collapsing theoretically and through simulation using a population based on the 2003 National Health Interview Survey. Two alternative estimators are also proposed that restrict the size of weight adjustments when cells are collapsed.

    Release date: 2008-01-03
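
    A minimal sketch of dynamic cell collapsing, assuming each poststratum carries a sample count and a control count and that an undersized or over-adjusted cell is merged into its neighbour; real collapsing rules (which cell absorbs which, weight caps) vary.

        def collapse_cells(cells, min_n=20, max_adj=2.0):
            """cells: list of (sample_n, control_count) pairs in collapse order."""
            merged = [list(cells[0])]
            for n, control in cells[1:]:
                if n < min_n or control / n > max_adj:
                    merged[-1][0] += n            # fold into the previous cell
                    merged[-1][1] += control
                else:
                    merged.append([n, control])
            return merged

        print(collapse_cells([(50, 80), (8, 30), (40, 60)]))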

  • Articles and reports: 12-001-X20060029550
    Description:

    In this paper, the geometric, optimization-based, and Lavallée and Hidiroglou (LH) approaches to stratification are compared. The geometric stratification method is an approximation, whereas the other two approaches, which employ numerical methods to perform stratification, may be seen as optimal stratification methods. The algorithm of the geometric stratification is very simple compared to the two other approaches, but it does not take into account the construction of a take-all stratum, which is usually constructed when a positively skewed population is stratified. In the optimization-based stratification, one may consider any form of optimization function and its constraints. In a comparative numerical study based on five positively skewed artificial populations, the optimization approach was more efficient in each of the cases studied compared to the geometric stratification. In addition, the geometric and optimization approaches are compared with the LH algorithm. In this comparison, the geometric stratification approach was found to be less efficient than the LH algorithm, whereas efficiency of the optimization approach was similar to the efficiency of the LH algorithm. Nevertheless, strata boundaries evaluated via the geometric stratification may be seen as efficient starting points for the optimization approach.

    Release date: 2006-12-21
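
    A minimal sketch of the geometric rule, which places stratum boundaries in geometric progression between the smallest and largest values: b_h = a * r^h with r = (b/a)^(1/L).

        import numpy as np

        def geometric_boundaries(x_min, x_max, n_strata):
            r = (x_max / x_min) ** (1.0 / n_strata)
            return x_min * r ** np.arange(1, n_strata)   # interior boundaries

        print(geometric_boundaries(10.0, 10000.0, 4))    # three cut points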

  • Articles and reports: 12-001-X20050029046
    Description:

    Nonresponse weighting is a common method for handling unit nonresponse in surveys. The method is aimed at reducing nonresponse bias, and it is often accompanied by an increase in variance. Hence, the efficacy of weighting adjustments is often seen as a bias-variance trade-off. This view is an oversimplification: nonresponse weighting can in fact lead to a reduction in variance as well as bias. A covariate for a weighting adjustment must have two characteristics to reduce nonresponse bias: it needs to be related to the probability of response, and it needs to be related to the survey outcome. If the latter is true, then weighting can reduce, not increase, sampling variance. A detailed analysis of bias and variance is provided in the setting of weighting for an estimate of a survey mean based on adjustment cells. The analysis suggests that the most important feature of variables for inclusion in weighting adjustments is that they are predictive of survey outcomes; prediction of the propensity to respond is a secondary, though useful, goal. Empirical estimates of root mean squared error for assessing when weighting is effective are proposed and evaluated in a simulation study. A simple composite estimator based on the empirical root mean squared error yields some gains over the weighted estimator in the simulations.

    Release date: 2006-02-17
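
    A minimal sketch of adjustment-cell weighting: within each cell, respondent weights are inflated by the inverse of the weighted response rate, so respondents stand in for the nonrespondents in their cell.

        import numpy as np

        base_w = np.array([1.0, 1.0, 2.0, 2.0, 2.0])
        cell = np.array([0, 0, 1, 1, 1])                 # adjustment-cell membership
        resp = np.array([True, False, True, True, False])

        adj_w = base_w.copy()
        for c in np.unique(cell):
            in_c = cell == c
            rate = base_w[in_c & resp].sum() / base_w[in_c].sum()
            adj_w[in_c & resp] /= rate                   # inflate respondent weights
        print(adj_w[resp])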

  • Articles and reports: 12-001-X20050029044
    Description:

    Complete data methods for estimating the variances of survey estimates are biased when some data are imputed. This paper uses simulation to compare the performance of the model-assisted, the adjusted jackknife, and the multiple imputation methods for estimating the variance of a total when missing items have been imputed using hot deck imputation. The simulation studies the properties of the variance estimates for imputed estimates of totals for the full population and for domains from a single-stage disproportionate stratified sample design when underlying assumptions, such as unbiasedness of the point estimate and item responses being randomly missing within hot deck cells, do not hold. The variance estimators for full population estimates produce confidence intervals with coverage rates near the nominal level even under modest departures from the assumptions, but this finding does not apply for the domain estimates. Coverage is most sensitive to bias in the point estimates. As the simulation demonstrates, even if an imputation method gives almost unbiased estimates for the full population, estimates for domains may be very biased.

    Release date: 2006-02-17
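
    A minimal sketch of hot deck imputation within cells, with a random respondent donor filling each missing value; the variance estimators compared in the paper then have to account for this added imputation noise.

        import numpy as np

        rng = np.random.default_rng(9)
        y = np.array([5.0, np.nan, 7.0, np.nan, 6.0, 8.0])
        cell = np.array([0, 0, 0, 1, 1, 1])              # imputation cells

        imputed = y.copy()
        for c in np.unique(cell):
            donors = y[(cell == c) & ~np.isnan(y)]
            holes = np.isnan(y) & (cell == c)
            imputed[holes] = rng.choice(donors, size=holes.sum())
        print(imputed)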

  • Articles and reports: 12-001-X20050018084
    Description:

    At national statistical institutes, experiments embedded in ongoing sample surveys are conducted occasionally to investigate possible effects of alternative survey methodologies on estimates of finite population parameters. To test hypotheses about differences between sample estimates due to alternative survey implementations, a design-based theory is developed for the analysis of completely randomized designs or randomized block designs embedded in general complex sampling designs. For both experimental designs, design-based Wald statistics are derived for the Horvitz-Thompson estimator and the generalized regression estimator. The theory is illustrated with a simulation study.

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050018085
    Description:

    Record linkage is a process of pairing records from two files and trying to select the pairs that belong to the same entity. The basic framework uses a match weight to measure the likelihood of a correct match and a decision rule to assign record pairs as "true" or "false" match pairs. Weight thresholds for selecting a record pair as matched or unmatched depend on the desired control over linkage errors. Current methods to determine the selection thresholds and estimate linkage errors can provide divergent results, depending on the type of linkage error and the approach to linkage. This paper presents a case study that uses existing linkage methods to link record pairs but a new simulation approach (SimRate) to help determine selection thresholds and estimate linkage errors. SimRate uses the observed distribution of data in matched and unmatched pairs to generate a large simulated set of record pairs, assigns a match weight to each pair based on specified match rules, and uses the weight curves of the simulated pairs for error estimation.

    Release date: 2005-07-21
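
    A minimal sketch of the match weight underlying such decision rules, in the Fellegi-Sunter style: each comparison field contributes log2(m/u) on agreement and log2((1-m)/(1-u)) on disagreement, and the total is compared with thresholds. The m and u probabilities here are illustrative.

        import math

        m = {"surname": 0.95, "dob": 0.98}   # P(agree | true match)
        u = {"surname": 0.01, "dob": 0.05}   # P(agree | non-match)

        def match_weight(agreements):
            """Sum of log2 likelihood ratios over comparison fields."""
            w = 0.0
            for field, agrees in agreements.items():
                if agrees:
                    w += math.log2(m[field] / u[field])
                else:
                    w += math.log2((1 - m[field]) / (1 - u[field]))
            return w

        print(match_weight({"surname": True, "dob": False}))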

  • Articles and reports: 12-001-X20030016605
    Description:

    In this paper, we examine the effects of model choice on different types of estimators for totals of domains (including small domains or small areas) for a sampled finite population. The paper asks how different estimator types compare for a common underlying model statement. We argue that estimator type - synthetic, generalized regression (GREG), composite, empirical best linear unbiased prediction (EBLUP), hierarchical Bayes, and so on - is one important aspect of domain estimation, and that the choice of the model, including its parameters and effects, is a second aspect, conceptually different from the first. Earlier work has not always made this distinction clear. For a given estimator type, one can derive different estimators, depending on the choice of model. In recent literature, a number of estimator types have been proposed, but relatively few impartial comparisons have been made among them. In this paper, we discuss three types: synthetic, GREG, and, to a limited extent, composite. We show that model improvement - the transition from a weaker to a stronger model - has very different effects on the different estimator types. We also show that the difference in accuracy between the different estimator types depends on the choice of model. For a well-specified model, the difference in accuracy between synthetic and GREG is negligible, but it can be substantial if the model is mis-specified. The synthetic type then tends to be highly inaccurate. We rely partly on theoretical results (for simple random sampling only) and partly on empirical results. The empirical results are based on simulations with repeated samples drawn from two finite populations, one artificially constructed, the other constructed from the real data of the Finnish Labour Force Survey.

    Release date: 2003-07-31
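
    A minimal sketch of the synthetic/GREG contrast for a domain total, assuming a no-intercept working model, a known domain auxiliary total Tx_dom, and made-up data: the synthetic estimator uses only the model prediction, while GREG adds the weighted residual correction from sampled domain units.

        import numpy as np

        rng = np.random.default_rng(10)
        x = rng.uniform(1, 5, size=200)
        y = 2 * x + rng.normal(size=200)
        w = np.full(200, 50.0)                    # design weights
        dom = rng.random(200) < 0.3               # domain membership in the sample
        Tx_dom = 15000.0                          # assumed known domain total of x

        beta = np.sum(w * x * y) / np.sum(w * x * x)      # weighted working-model fit
        synthetic = Tx_dom * beta
        greg = synthetic + np.sum(w[dom] * (y[dom] - beta * x[dom]))
        print(synthetic, greg)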

Reference (22)

Reference (22) (22 of 22 results)

  • Technical products: 11-522-X201300014285
    Description:

    The 2011 National Household Survey (NHS) is a voluntary survey that replaced the traditional mandatory long-form questionnaire of the Canadian census of population. The NHS sampled about 30% of Canadian households and achieved a design-weighted response rate of 77%. In comparison, the last census long form was sent to 20% of households and achieved a response rate of 94%. Based on the long-form data, Statistics Canada traditionally produces two public use microdata files (PUMFs): the individual PUMF and the hierarchical PUMF. Both give information on individuals, but the hierarchical PUMF provides extra information on the household and family relationships between the individuals. To produce two PUMFs, based on the NHS data, that cover the whole country evenly and that do not overlap, we applied a special sub-sampling strategy. Difficulties in the confidentiality analyses have increased because of the numerous new variables, the more detailed geographic information and the voluntary nature of the NHS. This paper describes the 2011 PUMF methodology and how it balances the requirements for more information and for low risk of disclosure.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014288
    Description:

    Probability-based surveys, those including with samples selected through a known randomization mechanism, are considered by many to be the gold standard in contrast to non-probability samples. Probability sampling theory was first developed in the early 1930’s and continues today to justify the estimation of population values from these data. Conversely, studies using non-probability samples have gained attention in recent years but they are not new. Touted as cheaper, faster (even better) than probability designs, these surveys capture participants through various “on the ground” methods (e.g., opt-in web survey). But, which type of survey is better? This paper is the first in a series on the quest for a quality framework under which all surveys, probability- and non-probability-based, may be measured on a more equal footing. First, we highlight a few frameworks currently in use, noting that “better” is almost always relative to a survey’s fit for purpose. Next, we focus on the question of validity, particularly external validity when population estimates are desired. Estimation techniques used to date for non-probability surveys are reviewed, along with a few comparative studies of these estimates against those from a probability-based sample. Finally, the next research steps in the quest are described, followed by a few parting comments.

    Release date: 2014-10-31

  • Technical products: 11-522-X200800010985
    Description:

    In Canada, although complex businesses represent less than 1% of the total number of businesses, they contribute more than 45% of total revenue. Statistics Canada recognized that the quality of the data collected from them is of great importance and has adopted several initiatives to improve it. One of these initiatives is the evaluation of the coherence of the data collected from large, complex enterprises. The findings of recent coherence analyses have been instrumental in identifying areas for improvement which, once addressed, will increase the quality of the data collected from large, complex enterprises while reducing the response burden imposed on them.

    Release date: 2009-12-03

  • Technical products: 11-522-X200600110444
    Description:

    General population health surveys often include small samples of smokers. Few longitudinal studies specific to smoking have been carried out. We discuss the development of the Ontario Tobacco Survey (OTS), which combines rolling longitudinal and repeated cross-sectional components. The OTS began in July 2005 using random selection and data collection by telephone. Every 6 months, new samples of smokers and non-smokers provide data on smoking behaviours and attitudes. Smokers enter a panel study and are followed for changes in smoking influences and behaviour. The design is proving to be cost-effective in meeting the sample requirements of multiple research objectives.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110453
    Description:

    National Food and Nutrition Surveys provide critical information to support understanding of the complex relationship between diet and health in the population. Many of these surveys use 24-hour recall methodology, which collects, at a detailed level, all foods and beverages consumed over a day. Often it is the longer-term intake of foods and nutrients that is of interest, and a number of techniques are available for estimating population usual intakes. These techniques require that at least one repeat 24-hour recall be collected from at least a subset of the population in order to estimate the intra-individual variability of intakes. Deciding on the number of individuals required to provide a repeat is an important step in the survey design: too few repeaters compromise the ability to estimate usual intakes, while large numbers of repeats are costly and add to respondent burden. This paper looks at the statistical issues related to the number of repeat individuals, assesses the impact of that number on the stability and uncertainty of the estimate of intra-individual variability, and provides guidance on the required number of repeat responders (a toy illustration follows this entry).

    Release date: 2008-03-17
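
    To make the trade-off concrete, here is a toy calculation, with hypothetical intake values rather than the paper's data, of how the uncertainty in the within-person variance estimate shrinks as the number of repeat responders grows.

```python
# A toy calculation (hypothetical intake values, not the paper's data):
# estimating within-person (intra-individual) variance from paired 24-hour
# recalls, and how its uncertainty shrinks as repeat responders increase.
import numpy as np

rng = np.random.default_rng(1)
sigma_between, sigma_within = 30.0, 50.0        # assumed intake sd's

def within_var_estimate(n_repeaters):
    mu = 200 + rng.normal(0, sigma_between, n_repeaters)   # usual intakes
    day1 = mu + rng.normal(0, sigma_within, n_repeaters)
    day2 = mu + rng.normal(0, sigma_within, n_repeaters)
    # For two recalls per person, E[(day1 - day2)^2] / 2 = within variance
    return np.mean((day1 - day2) ** 2) / 2

for n_rep in (30, 100, 300, 1000):
    ests = [within_var_estimate(n_rep) for _ in range(500)]
    print(f"repeaters={n_rep:4d}  mean={np.mean(ests):7.0f}  "
          f"sd={np.std(ests):6.0f}  (true={sigma_within**2:.0f})")
```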

  • Technical products: 11-522-X200600110441
    Description:

    How does one efficiently estimate sample size while building consensus among multiple investigators for multi-purpose projects? We present a template, using common spreadsheet software, that provides estimates of power, precision and financial costs under varying sampling scenarios, as used in the development of the Ontario Tobacco Survey. In addition to cost estimates, complex sample size formulae were nested within a spreadsheet to determine power and precision, incorporating user-defined design effects and loss to follow-up. Common spreadsheet software can be used in conjunction with complex formulae to enhance knowledge exchange between methodologists and stakeholders, in effect demystifying the "sample size black box" (a minimal example of such a calculation follows this entry).

    Release date: 2008-03-17
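
    The OTS template itself is not reproduced here; the following is a toy version, under assumed formulas and made-up costs, of the kind of calculation such a spreadsheet nests.

```python
# A toy version (assumed formulas, not the OTS template) of the calculation
# such a spreadsheet nests: completed-interview targets for a proportion,
# inflated by a user-defined design effect and loss to follow-up.
from math import ceil

def required_recruits(p, half_width, deff=1.5, loss=0.3, z=1.96):
    """Recruits needed for a z-based CI of +/- half_width on proportion p."""
    n_srs = (z / half_width) ** 2 * p * (1 - p)   # simple random sampling
    return ceil(n_srs * deff / (1 - loss))        # design effect + attrition

for hw in (0.05, 0.03, 0.02):
    n = required_recruits(p=0.25, half_width=hw)
    print(f"+/-{hw:.0%}: recruit {n:5d}  (cost at $40 each: ${40 * n:,})")
```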

  • Technical products: 11-522-X200600110446
    Description:

    Immigrants have health advantages over native-born Canadians, but those advantages are threatened by specific risk situations. This study explores cardiovascular health outcomes in districts of Montréal classified by the proportion of immigrants in the population, using a principal component analysis (sketched after this entry). The first three components are immigration, degree of socio-economic disadvantage and degree of economic disadvantage. The incidence of myocardial infarction is lower in districts with large immigrant populations than in districts dominated by native-born Canadians. Mortality rates are associated with the degree of socio-economic disadvantage, while revascularization is associated with the proportion of seniors in the population.

    Release date: 2008-03-17
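
    The study's census variables are not listed here; the sketch below only shows, on synthetic indicators, the mechanics of the kind of principal component analysis used to classify districts.

```python
# Illustrative only (synthetic indicators, not the study's census variables):
# a principal component analysis of district-level indicators via the
# eigendecomposition of their correlation matrix.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))                  # 40 districts, 6 indicators
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize first

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]             # largest variance first
explained = eigvals[order] / eigvals.sum()
scores = X @ eigvecs[:, order]                # district component scores

print("share of variance, first 3 components:", np.round(explained[:3], 2))
print("scores shape (districts x components):", scores.shape)
```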

  • Technical products: 11-522-X200600110394
    Description:

    Statistics Canada conducted the Canadian Community Health Survey - Nutrition in 2004. The survey's main objective was to estimate the distributions of Canadians' usual dietary intake at the provincial level for 15 age-sex groups. Such distributions are generally estimated with the SIDE application, but with the choices that were made concerning sample design and method of estimating sampling variability, obtaining those estimates is not a simple matter. This article describes the methodological challenges in estimating usual intake distributions from the survey data using SIDE.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110390
    Description:

    We propose an aggregate-level generalized linear model with additive random components (GLMARC) for binary count data from surveys. It has both linear (for random effects) and nonlinear (for fixed effects) parts in the mean function and hence belongs to the class of mixed linear-nonlinear models. The model allows for a linear mixed model (LMM)-type approach to small area estimation (SAE), somewhat similar to the well-known Fay-Herriot (1979) method, and thus takes full account of the sampling design. Unlike the alternative hierarchical Bayes (HB) approach of You and Rao (2002), the proposed method gives rise to easily interpretable SAEs and frequentist diagnostics, as well as self-benchmarking to reliable large-area direct estimates. The usual LMM methodology is not appropriate for count data because it places no range restrictions on the mean function and may produce unrealistic estimates of the variance component (e.g., zero in the SAE context), since the model does not allow the random-effect part of the conditional mean function to depend on the marginal mean. The proposed method improves on the earlier method of Vonesh and Carter (1992), which also uses mixed linear-nonlinear models but accounts neither for the variance-mean relationship (typically handled via range restrictions on the random effect) nor for the implications of the survey design, and does not address the estimation of random effects. In our application to SAE, however, it is important to obtain suitable estimates of both fixed and random effects. Note that, unlike the generalized linear mixed model (GLMM), GLMARC, like the LMM, offers considerable simplicity in model fitting. This was made possible by replacing the original fixed and random effects of the GLMM with a new set of GLMARC parameters having quite a different interpretation, as the random effect is no longer inside the nonlinear predictor function. This is of no consequence for SAE, however, because the small area parameters correspond to the overall conditional means and not to individual model parameters. We propose an iterative BLUP method for parameter estimation that allows for self-benchmarking after a suitable model enlargement. The problem of small areas with small or zero sample sizes, or zero direct estimates, is addressed by collapsing domains at the parameter estimation stage only. As an illustration, we present an application to the 2000-01 Canadian Community Health Survey for estimating the proportion of daily smokers in subpopulations defined by provincial health region and age-sex group. (A toy illustration of the related Fay-Herriot approach follows this entry.)

    Release date: 2008-03-17
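
    The sketch below is not GLMARC. It is a minimal, self-contained illustration, on synthetic data with assumed variances, of the Fay-Herriot-style area-level shrinkage the abstract uses as its reference point.

```python
# Not GLMARC: a toy Fay-Herriot (1979) area-level illustration on synthetic
# data, the LMM-type reference point the abstract mentions. Direct estimates
# are shrunk toward a regression synthetic component.
import numpy as np

rng = np.random.default_rng(5)
m = 20                                          # number of small areas
X = np.column_stack([np.ones(m), rng.uniform(0, 1, m)])
beta_true, sigma_v2_true = np.array([0.1, 0.3]), 0.01
psi = rng.uniform(0.005, 0.05, m)               # known sampling variances
theta = X @ beta_true + rng.normal(0, np.sqrt(sigma_v2_true), m)
direct = theta + rng.normal(0, np.sqrt(psi))    # direct survey estimates

s2 = 0.01                                       # crude moment-based fit
for _ in range(50):
    w = 1.0 / (s2 + psi)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * direct))
    s2 = max(0.0, np.mean((direct - X @ beta) ** 2 - psi))

gamma = s2 / (s2 + psi)                         # shrinkage weights
eblup = gamma * direct + (1 - gamma) * (X @ beta)
print("mean abs error, direct:", np.round(np.mean(np.abs(direct - theta)), 4))
print("mean abs error, EBLUP :", np.round(np.mean(np.abs(eblup - theta)), 4))
```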

  • Technical products: 11-522-X20050019457
    Description:

    The administrative data project has helped reduce the response burden of small and medium-sized businesses. We are continuing this work and expanding our objectives to maximize the use of administrative data. In addition, by exploring the single-window reporting method, we plan to decrease the response burden of complex enterprises while ensuring consistent data collection. We will have to overcome some major challenges, some of which may be methodological in nature. Let's see what the future holds!

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019459
    Description:

    The subject of this paper is the use of administrative data, such as tax data and social security data, for structural business statistics. The paper also discusses the newly developed statistics on general practitioners.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019466
    Description:

    A class of estimators based on the dependency structure of a multivariate variable of interest and on the survey design is defined. A Monte Carlo simulation shows that the estimator corresponding to the population structure is more efficient than the others (a generic illustration of such a comparison follows this entry).

    Release date: 2007-03-02
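
    The paper's estimator class is not described in this listing, so the following is only a generic stand-in in the same spirit: a Monte Carlo efficiency comparison of two estimators of a population mean, one of which exploits the dependency between the variable of interest and an auxiliary variable.

```python
# A generic stand-in (the paper's estimator class is not described here):
# Monte Carlo efficiency comparison of two estimators of a population mean,
# one exploiting the dependency on an auxiliary variable x.
import numpy as np

rng = np.random.default_rng(11)
N, n, reps = 5_000, 100, 2_000
x = rng.gamma(3.0, 1.0, N)
y = 2.0 * x + rng.normal(0, 1.0, N)             # y strongly tied to x
Ybar, Xbar = y.mean(), x.mean()

mse_mean = mse_ratio = 0.0
for _ in range(reps):
    s = rng.choice(N, n, replace=False)
    mse_mean += (y[s].mean() - Ybar) ** 2                        # ignores structure
    mse_ratio += (y[s].mean() / x[s].mean() * Xbar - Ybar) ** 2  # uses it

print("MSE, sample mean    :", round(mse_mean / reps, 4))
print("MSE, ratio estimator:", round(mse_ratio / reps, 4))
```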

  • Technical products: 11-522-X20050019452
    Description:

    The redesign of the Dutch Business Register was started for both technical and statistical reasons. The major changes in the new register are the use of the new Dutch Basic Business Register as the source for legal and local units, the inclusion of administrative units in the register and a new automated algorithm to derive the statistical frame from administrative sources.

    Release date: 2007-03-02

  • Technical products: 11-522-X20040018653
    Description:

    This paper discusses the development of the tailored approach strategy, the pre-test, the sample design of the Dutch Family and Fertility Survey, the embedded experiment and its results.

    Release date: 2005-10-27

  • Technical products: 11-522-X20030017603
    Description:

    This paper describes the current status of the adoption of questionnaire development and testing methods for establishment surveys internationally and suggests a program of methodological research and strategies for improving this adoption.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017702
    Description:

    This paper proposes a procedure to test hypotheses about differences between sample estimates observed under alternative survey methodologies (a generic two-estimate test is sketched after this entry).

    Release date: 2005-01-26
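
    The paper's actual procedure is not given in this listing; as a placeholder, here is a standard two-sided z-test for the difference between two design-based estimates, assuming independent samples and known standard errors. All numbers are hypothetical.

```python
# A standard two-sided z-test (an assumption, not necessarily the paper's
# procedure) for the difference between two design-based estimates from
# alternative methodologies, with independent samples and known SEs.
from math import sqrt, erf

def z_test(est_a, se_a, est_b, se_b):
    z = (est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided p-value
    return z, p

z, p = z_test(0.52, 0.015, 0.48, 0.020)   # hypothetical estimates and SEs
print(f"z = {z:.2f}, p = {p:.3f}")
```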

  • Technical products: 11-522-X20030017712
    Description:

    This paper discusses variance estimation in the presence of imputation, with an application to price index estimation, as well as multiphase sampling and the use of graphics in publications.

    Release date: 2005-01-26

  • Technical products: 11-522-X20020016728
    Description:

    Nearly all surveys use complex sampling designs to collect data, and these data are frequently used for statistical analyses beyond the estimation of simple descriptive parameters of the target population. Many procedures available in popular statistical software packages are not appropriate for this purpose because the analyses are based on the assumption that the sample was drawn with simple random sampling. The results of analyses conducted using these packages would therefore not be valid when the sample design incorporates multistage sampling, stratification or clustering. Two commonly used methods for analysing data from complex surveys are replication and Taylor linearization techniques. We discuss the use of WESVAR software to compute estimates and replicate variance estimates that properly reflect complex sampling and estimation procedures. We also illustrate the WESVAR features using data from two Westat surveys that employ complex survey designs: the Third International Mathematics and Science Study (TIMSS) and the National Health and Nutrition Examination Survey (NHANES). (A bare-bones replication example follows this entry.)

    Release date: 2004-09-13
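
    This is not WESVAR syntax; it is just a minimal sketch, on synthetic clustered data, of the delete-one-cluster jackknife, the kind of replication method such packages implement.

```python
# Not WESVAR syntax: a bare-bones delete-one-cluster jackknife on synthetic
# clustered data, the kind of replication method such packages implement.
import numpy as np

rng = np.random.default_rng(9)
G = 30                                          # primary sampling units
sizes = rng.integers(5, 15, G)
means = rng.normal(10, 2.0, G)                  # cluster effects -> correlation
y = np.concatenate([rng.normal(m, 1.0, k) for m, k in zip(means, sizes)])
cid = np.repeat(np.arange(G), sizes)
w = np.ones_like(y)                             # flat weights for simplicity

def wmean(weights):
    return np.sum(weights * y) / np.sum(weights)

full = wmean(w)
reps = []
for g in range(G):
    wg = w.copy()
    wg[cid == g] = 0.0                          # drop one cluster...
    wg[cid != g] *= G / (G - 1.0)               # ...reweight the others
    reps.append(wmean(wg))
var_jk = (G - 1) / G * np.sum((np.array(reps) - full) ** 2)
print(f"estimate = {full:.3f}, jackknife SE = {np.sqrt(var_jk):.3f}")
```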

  • Technical products: 11-522-X20010016234
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    With the goal of obtaining a complete enumeration of the Canadian agricultural sector, the 2001 Census of Agriculture has been conducted using several collection methods. Challenges to the traditional drop-off and mail-back of paper questionnaires in a household-based enumeration have led to the adoption of supplemental methods using newer technologies to maintain the coverage and content of the census. Overall, this mixed-mode data collection process responds to the critical needs of the census programme at various points. This paper examines these data collection methods, several quality assessments, and the future challenges of obtaining a co-ordinated view of the methods' individual approaches to achieving data quality.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016242
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    "Remembering Leslie Kish" provides us with a personal view of his many contributions to the international development of statistics. One of the elements that made his contributions so special and effective was the "Kish approach". The characteristic features of this approach include: identifying what is important; formulating and answering practical questions; seeking patterns and frameworks; and above all, persisting in the promotion of good ideas. Areas in which his technical contributions have made the most impact on practical survey work in developing countries have been identified. A unique aspect of Leslie's contribution is the motivation he created for the development of a world-wide community of survey samplers.

    Release date: 2002-09-12

  • Technical products: 11-522-X19990015650
    Description:

    The U.S. Manufacturing Plant Ownership Change Database (OCD) was constructed using plant-level data taken from the Census Bureau's Longitudinal Research Database (LRD). It contains data on all manufacturing plants that experienced at least one ownership change during the period 1963-92. This paper reports on the status of the OCD and discusses its research possibilities. As an empirical demonstration, data from the database are used to study the effects of ownership changes on plant closure.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015654
    Description:

    A meta-analysis was performed to estimate the proportion of liver carcinogens, the proportion of chemicals carcinogenic at any site, and the corresponding proportion of anticarcinogens among chemicals tested in 397 long-term cancer bioassays conducted by the U.S. National Toxicology Program. Although the estimator used was negatively biased, the study provided persuasive evidence for a larger proportion of liver carcinogens (0.43; 90% CI: 0.35 to 0.51) than was identified by the NTP (0.28). A larger proportion of chemicals carcinogenic at any site was also estimated (0.59; 90% CI: 0.49 to 0.69) than was identified by the NTP (0.51), although this excess was not statistically significant. A larger proportion of anticarcinogens (0.66) than of carcinogens (0.59) was estimated. Despite the negative bias, it was estimated that 85% of the chemicals were either carcinogenic or anticarcinogenic at some site in some sex-species group. This suggests that most chemicals tested at high enough doses will cause some sort of perturbation in tumor rates.

    Release date: 2000-03-02
