Statistics by subject – Statistical methods

All (101) (25 of 101 results)

  • Articles and reports: 12-001-X201700254894
    Description:

    This note by Danny Pfeffermann discusses the paper “Sample survey theory and methods: Past, present, and future directions,” in which J.N.K. Rao and Wayne A. Fuller share their views on developments in sample survey theory and methods over the past 100 years.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201700114817
    Description:

    We present research results on sample allocations for efficient model-based small area estimation in cases where the areas of interest coincide with the strata. Although model-assisted and model-based estimation methods are common in the production of small area statistics, the underlying model and estimation method are rarely incorporated into the scheme for allocating the sample across areas. We have therefore developed a new model-based allocation, named g1-allocation. For comparison, one recently developed model-assisted allocation is presented. These two allocations are based on an adjusted measure of homogeneity which is computed using an auxiliary variable and is an approximation of the intra-class correlation within areas. Five model-free area allocation solutions presented in the past are selected from the literature as reference allocations. Equal and proportional allocations need the number of areas and area-specific numbers of basic statistical units. The Neyman, Bankier and NLP (Non-Linear Programming) allocations need values of area-level parameters of the study variable, such as the standard deviation, coefficient of variation or totals (a minimal sketch of the classical Neyman allocation follows this list). In general, allocation methods can be classified according to their optimization criteria and use of auxiliary data. Statistical properties of the various methods are assessed through sample simulation experiments using real population register data. The simulation results indicate that incorporating the model and estimation method into the allocation improves estimation results.

    Release date: 2017-06-22

  • Articles and reports: 12-001-X201700114819
    Description:

    Structural time series models are a powerful technique for variance reduction in the framework of small area estimation (SAE) based on repeatedly conducted surveys. Statistics Netherlands implemented a structural time series model to produce monthly figures about the labour force with the Dutch Labour Force Survey (DLFS). Such models, however, contain unknown hyperparameters that have to be estimated before the Kalman filter can be launched to estimate state variables of the model. This paper describes a simulation aimed at studying the properties of hyperparameter estimators in the model. Simulating distributions of the hyperparameter estimators under different model specifications complements standard model diagnostics for state space models. Uncertainty around the model hyperparameters is another major issue. To account for hyperparameter uncertainty in the mean squared errors (MSE) estimates of the DLFS, several estimation approaches known in the literature are considered in a simulation. Apart from the MSE bias comparison, this paper also provides insight into the variances and MSEs of the MSE estimators considered.

    Release date: 2017-06-22

  • Articles and reports: 12-001-X201600214684
    Description:

    This paper introduces an incomplete adaptive cluster sampling design that is easy to implement, controls the sample size well, and does not need to follow the neighbourhood. In this design, an initial sample is first selected using one of the conventional designs. If a cell satisfies a prespecified condition, a specified radius around the cell is sampled completely. The population mean is estimated using the π-estimator (a minimal sketch of this estimator follows this list). If all the inclusion probabilities are known, an unbiased π-estimator is available; if, depending on the situation, the inclusion probabilities are not known for some of the final sample units, they are estimated. To estimate the inclusion probabilities, a biased estimator is constructed. However, the simulations show that if the sample size is large enough, the error in the inclusion probabilities is negligible and the relative π-estimator is almost unbiased. This design rivals adaptive cluster sampling because it controls the final sample size and is easy to manage. It rivals adaptive two-stage sequential sampling because it considers the cluster form of the population and reduces the cost of moving across the area. Using real data on a bird population and simulations, the paper compares the design with adaptive two-stage sequential sampling. The simulations show that the design is markedly more efficient than its rival.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600114539
    Description:

    Statistical matching is a technique for integrating two or more data sets when information available for matching records for individual participants across data sets is incomplete. Statistical matching can be viewed as a missing data problem where a researcher wants to perform a joint analysis of variables that are never jointly observed. A conditional independence assumption is often used to create imputed data for statistical matching. We consider a general approach to statistical matching using parametric fractional imputation of Kim (2011) to create imputed data under the assumption that the specified model is fully identified. The proposed method does not have a convergent EM sequence if the model is not identified. We also present variance estimators appropriate for the imputation procedure. We explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models.

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201600114541
    Description:

    In this work we compare nonparametric estimators for finite population distribution functions based on two types of fitted values: the fitted values from the well-known Kuo estimator and a modified version of them, which incorporates a nonparametric estimate for the mean regression function. For each type of fitted values we consider the corresponding model-based estimator and, after incorporating design weights, the corresponding generalized difference estimator. We show under fairly general conditions that the leading term in the model mean square error is not affected by the modification of the fitted values, even though it slows down the convergence rate for the model bias. Second order terms of the model mean square errors are difficult to obtain and will not be derived in the present paper. It remains thus an open question whether the modified fitted values bring about some benefit from the model-based perspective. We discuss also design-based properties of the estimators and propose a variance estimator for the generalized difference estimator based on the modified fitted values. Finally, we perform a simulation study. The simulation results suggest that the modified fitted values lead to a considerable reduction of the design mean square error if the sample size is small.

    Release date: 2016-06-22

  • Technical products: 11-522-X201700014755
    Description:

    The National Children’s Study Vanguard Study was a pilot epidemiological cohort study of children and their parents. Measures were to be taken from pre-pregnancy until adulthood. The use of extant data was planned to supplement direct data collection from the respondents. Our paper outlines a strategy for cataloging and evaluating extant data sources for use with large-scale longitudinal studies. Through our review we selected five evaluation factors to guide researchers through available data sources: 1) relevance, 2) timeliness, 3) spatiality, 4) accessibility, and 5) accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014750
    Description:

    The Educational Master File (EMF) system was built to allow the analysis of educational programs in Canada. At the core of the system are administrative files that record all registrations in post-secondary and apprenticeship programs in Canada. New administrative files become available on an annual basis. Once a new file becomes available, a first round of processing is performed, which includes linkage to other administrative records. This linkage yields information that can improve the quality of the file, allows further linkages to other data describing labour market outcomes, and is the first step in adding the file to the EMF. Once part of the EMF, information from the file can be included in cross-sectional and longitudinal projects to study academic pathways and labour market outcomes after graduation. The EMF currently consists of data from 2005 to 2013, but it evolves as new data become available. This paper gives an overview of the mechanisms used to build the EMF, with a focus on the structure of the final system and some of its analytical potential.

    Release date: 2016-03-24

  • Articles and reports: 12-001-X201500114151
    Description:

    One of the main variables in the Dutch Labour Force Survey measures whether a respondent has a permanent or a temporary job. The aim of our study is to determine the measurement error in this variable by matching the information obtained in the longitudinal part of this survey with unique register data from the Dutch Institute for Employee Insurance. Unlike previous approaches that confront such datasets, we take into account that the register data are also not error-free and that measurement error in these data is likely to be correlated over time. More specifically, we propose estimating the measurement error in these two sources using an extended hidden Markov model with two observed indicators for the type of contract. Our results indicate that neither of the two sources should be considered error-free. For both indicators, we find that workers on temporary contracts are often misclassified as having a permanent contract. Particularly for the register data, we find that measurement errors are strongly autocorrelated: once made, they tend to repeat themselves. In contrast, when the register is correct, the probability of an error at the next time period is almost zero. Finally, we find that temporary contracts are more widespread than the Labour Force Survey suggests, while transitions from temporary to permanent contracts are much less common than both datasets suggest.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114160
    Description:

    Composite estimation is a technique applicable to repeated surveys with controlled overlap between successive surveys. This paper examines modified regression estimators that incorporate information from previous time periods into estimates for the current time period. The range of modified regression estimators is extended to business surveys whose frames change over time because of the addition of “births” and the deletion of “deaths”. Since the modified regression estimators can deviate from the generalized regression estimator over time, it is proposed to use a compromise modified regression estimator: a weighted average of the modified regression estimator and the generalized regression estimator. A Monte Carlo simulation study shows that the proposed compromise modified regression estimator leads to significant efficiency gains in both point-in-time and movement estimates.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114193
    Description:

    Imputed micro data often contain conflicting information. The situation may arise, for example, from partial imputation, where one part of the imputed record consists of the observed values of the original record and the other part of imputed values. Edit rules that involve variables from both parts of the record will often be violated. Inconsistency may also be caused by adjustment for errors in the observed data, also referred to as imputation in editing. Under the assumption that the remaining inconsistency is not due to systematic errors, we propose adjusting the micro data such that all constraints are simultaneously satisfied and the adjustments are minimal according to a chosen distance metric (a minimal sketch of such an adjustment follows this list). Different approaches to the distance metric are considered, as well as several extensions of the basic situation, including the treatment of categorical data, unit imputation and macro-level benchmarking. The properties and interpretations of the proposed methods are illustrated using business-economic data.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114150
    Description:

    An area-level model approach to combining information from several sources is considered in the context of small area estimation. At each small area, several estimates are computed and linked through a system of structural error models. The best linear unbiased predictor of the small area parameter can be computed by the general least squares method. Parameters in the structural error models are estimated using the theory of measurement error models. Estimation of mean squared errors is also discussed. The proposed method is applied to the real problem of labor force surveys in Korea.

    Release date: 2015-06-29

  • Technical products: 11-522-X201300014276
    Description:

    In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, and this effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified: If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets. If an area is ‘abandoned’ during collection, we prioritize the remaining surveys. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools that we can use to analyze collection by isolating underrepresented population groups. We can increase collection efforts for groups that have been identified beforehand. In the oral presentation, we covered these two points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014279
    Description:

    As part of the European SustainCity project, a microsimulation model of individuals and households was created to simulate the population of various European cities. The aim of the project was to combine several transportation and land-use microsimulation models (land-use modelling), add a dynamic population module, and apply these microsimulation approaches to three geographic areas of Europe (the Île-de-France region and the Brussels and Zurich agglomerations).

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014273
    Description:

    More and more data are being produced by an increasing number of electronic devices that physically surround us and on the internet. The large amount of data and the high frequency at which they are produced have led to the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives, and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. However, first experiences with analyses of large amounts of Dutch traffic loop detection records, call detail records of mobile phones and Dutch social media messages reveal that a number of challenges need to be addressed before these data sources can be applied to official statistics. These challenges, and the lessons learned during these initial studies, are addressed and illustrated by examples. More specifically, the following topics are discussed: the three general types of Big Data discerned, the need to access and analyse large amounts of data, how we deal with noisy data and look at selectivity (and our own bias towards this topic), how to go beyond correlation, how we found people with the right skills and mind-set to perform the work, and how we have dealt with privacy and security issues.

    Release date: 2014-10-31

  • Articles and reports: 82-003-X201300611796
    Description:

    The study assesses the feasibility of using statistical modelling techniques to fill information gaps related to risk factors, specifically, smoking status, in linked long-form census data.

    Release date: 2013-06-19

  • Articles and reports: 12-001-X201200211755
    Description:

    Non-response in longitudinal studies is addressed by assessing the accuracy of response propensity models constructed to discriminate between and predict different types of non-response. Particular attention is paid to summary measures derived from receiver operating characteristic (ROC) curves and logit rank plots. The ideas are applied to data from the UK Millennium Cohort Study. The results suggest that the ability to discriminate between and predict non-respondents is not high. Weights generated from the response propensity models lead to only small adjustments in employment transitions. Conclusions are drawn in terms of the potential of interventions to prevent non-response.

    Release date: 2012-12-19

  • Articles and reports: 82-003-X201200111633
    Description:

    This paper explains the methodology for creating Geozones, which are area-based thresholds of population characteristics derived from census data that can be used in the analysis of social or economic differences in health and health service utilization.

    Release date: 2012-03-21

  • Articles and reports: 12-001-X201100211602
    Description:

    This article attempts to answer the three questions appearing in the title. It starts by discussing unique features of complex survey data not shared by other data sets, which require special attention but suggest a large variety of diverse inference procedures. Next a large number of different approaches proposed in the literature for handling these features are reviewed with discussion on their merits and limitations. The approaches differ in the conditions underlying their use, additional data required for their application, goodness of fit testing, the inference objectives that they accommodate, statistical efficiency, computational demands, and the skills required from analysts fitting the model. The last part of the paper presents simulation results, which compare the approaches when estimating linear regression coefficients from a stratified sample in terms of bias, variance, and coverage rates. It concludes with a short discussion of pending issues.

    Release date: 2011-12-21

  • Articles and reports: 12-001-X201000211383
    Description:

    Data collection for poverty assessments in Africa is time consuming, expensive and can be subject to numerous constraints. In this paper we present a procedure to collect data from poor households involved in small-scale inland fisheries as well as agricultural activities. A sampling scheme was developed that captures the heterogeneity in ecological conditions and the seasonality of livelihood options. Sampling includes a three-point panel survey of 300 households. The respondents belong to four different ethnic groups randomly chosen from three strata, each representing a different ecological zone. The first part of the paper gives some background information on the objectives of the research, the study site and the survey design, which guided the data collection process. The second part discusses the typical constraints that hamper empirical work in Sub-Saharan Africa and shows how the different challenges were resolved. These lessons could guide researchers in designing appropriate socio-economic surveys in comparable settings.

    Release date: 2010-12-21

  • Articles and reports: 12-001-X200900211044
    Description:

    In large-scale sample surveys it is common practice to employ stratified multistage designs in which units are selected using simple random sampling without replacement at each stage. Variance estimation for these types of designs can be quite cumbersome to implement, particularly for non-linear estimators. Various bootstrap methods for variance estimation have been proposed, but most of these are restricted to single-stage designs or two-stage cluster designs. An extension of the rescaled bootstrap method (Rao and Wu 1988) to stratified multistage designs is proposed which can easily be extended to any number of stages (a minimal sketch of a rescaled bootstrap follows this list). The proposed method is suitable for a wide range of reweighting techniques, including the general class of calibration estimators. A Monte Carlo simulation study was conducted to examine the performance of the proposed multistage rescaled bootstrap variance estimator.

    Release date: 2009-12-23

  • Technical products: 11-522-X200800010979
    Description:

    Prior to 2006, the Canadian Census of Population relied on field staff to deliver questionnaires to all dwellings in Canada. For the 2006 Census, an address frame was created to cover almost 70% of dwellings in Canada, and these questionnaires were delivered by Canada Post. For the 2011 Census, Statistics Canada aims to expand this frame further, with a target of delivering questionnaires by mail to between 80% and 85% of dwellings. Mailing questionnaires for the Census raises a number of issues, among them ensuring that returned questionnaires are counted in the right area, creating an up-to-date address frame that includes all new growth, and determining which areas are unsuitable for questionnaire delivery by mail. Changes to the address frame update procedures for 2011, most notably the decision to use purely administrative data as the frame wherever possible and to conduct field update exercises only where deemed necessary, provide a new set of challenges for the 2011 Census.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010958
    Description:

    Telephone Data Entry (TDE) is a system by which survey respondents can return their data to the Office for National Statistics (ONS) using the keypad on their telephone; it currently accounts for approximately 12% of total responses to ONS business surveys. ONS is currently increasing the number of surveys that use TDE as the primary mode of response. This paper gives an overview of the redevelopment project, covering the redevelopment of the paper questionnaire, enhancements made to the TDE system, and the results of piloting these changes. Improvements in the quality of the data received and increased response via TDE as a result of these developments suggest that data quality improvements and cost savings are possible by promoting TDE as the primary mode of response to short-term surveys.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011005
    Description:

    In 2006 Statistics New Zealand started developing a strategy aimed at coordinating new and existing initiatives focused on respondent load. The development of the strategy lasted more than a year and the resulting commitment to reduce respondent load has meant that the organisation has had to confront a number of issues that impact on the way we conduct our surveys.

    The next challenge for Statistics NZ is the transition from the project based initiatives outlined in the strategy to managing load on an ongoing basis.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010954
    Description:

    Over the past year, Statistics Canada has been developing and testing a new way to monitor the performance of interviewers conducting computer-assisted personal interviews (CAPI). A formal process already exists for monitoring centralized telephone interviews. Monitors listen to telephone interviews as they take place to assess the interviewer's performance using pre-defined criteria and provide feedback to the interviewer on what was well done and what needs improvement. For the CAPI program, we have developed and are testing a pilot approach whereby interviews are digitally recorded and later a monitor listens to these recordings to assess the field interviewer's performance and provide feedback in order to help improve the quality of the data. In this paper, we will present an overview of the CAPI monitoring project at Statistics Canada by describing the CAPI monitoring methodology and the plans for implementation.

    Release date: 2009-12-03
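
The allocation entry 12-001-X201700114817 above cites the classical Neyman allocation among its model-free reference methods. The sketch below is only a generic, hedged illustration of that textbook rule, n_h = n · N_h · S_h / Σ_k N_k · S_k, with made-up stratum sizes and standard deviations; it is not the paper's g1-allocation.

```python
# Hypothetical illustration of classical Neyman allocation:
#   n_h = n * N_h * S_h / sum_k(N_k * S_k), rounded to integers.
# Not the g1-allocation of the cited paper; purely a textbook sketch.

def neyman_allocation(n_total, stratum_sizes, stratum_sds):
    """Allocate n_total sample units across strata proportionally to N_h * S_h."""
    weights = [N * S for N, S in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    # Proportional shares; a production system would also enforce n_h >= 1
    # and n_h <= N_h, which this sketch omits.
    return [round(n_total * w / total) for w in weights]

if __name__ == "__main__":
    # Made-up strata: population sizes and study-variable standard deviations.
    sizes = [5000, 2000, 800]
    sds = [1.2, 3.5, 0.7]
    print(neyman_allocation(300, sizes, sds))  # roughly [133, 155, 12]
```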
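
The adaptive design in 12-001-X201600214684 above estimates the population mean with the π-estimator (Horvitz-Thompson). The sketch below shows only that general estimator with made-up values; the paper's adaptive design and its estimation of unknown inclusion probabilities are not reproduced.

```python
# Minimal sketch of the pi-estimator (Horvitz-Thompson) for a population total
# and mean. The y-values and inclusion probabilities below are made up; the
# cited paper's adaptive design is not reproduced here.

def pi_estimator_total(y_sample, inclusion_probs):
    """Horvitz-Thompson estimate of the population total: sum of y_i / pi_i."""
    return sum(y / p for y, p in zip(y_sample, inclusion_probs))

def pi_estimator_mean(y_sample, inclusion_probs, population_size):
    """Estimate of the population mean, assuming the population size N is known."""
    return pi_estimator_total(y_sample, inclusion_probs) / population_size

if __name__ == "__main__":
    y = [4.0, 0.0, 7.0, 2.0]        # observed study values (hypothetical)
    pi = [0.05, 0.10, 0.05, 0.20]   # first-order inclusion probabilities (hypothetical)
    print(pi_estimator_mean(y, pi, population_size=100))  # 2.3
```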
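
The adjustment problem in 12-001-X201500114193 above (change an imputed record as little as possible so that all edit constraints hold) has a simple closed form when the distance metric is unweighted least squares and the edits are linear equalities: orthogonally project the record onto the constraint set. The sketch below covers only that special case with made-up constraints; the paper's other metrics and extensions are not reproduced.

```python
import numpy as np

# Minimal sketch: adjust a record x0 so that A @ x = b exactly, moving as
# little as possible in the Euclidean sense. Closed-form orthogonal projection:
#   x* = x0 - A.T @ (A @ A.T)^{-1} @ (A @ x0 - b)
# Only the unweighted least-squares / linear-equality special case is shown.

def minimally_adjust(x0, A, b):
    """Return the record closest to x0 (in L2 distance) satisfying A @ x = b."""
    residual = A @ x0 - b
    correction = A.T @ np.linalg.solve(A @ A.T, residual)
    return x0 - correction

if __name__ == "__main__":
    # Hypothetical edit rules: components 1 and 2 must sum to component 3,
    # and the total of all three components must equal 100.
    A = np.array([[1.0, 1.0, -1.0],
                  [1.0, 1.0, 1.0]])
    b = np.array([0.0, 100.0])
    x0 = np.array([30.0, 25.0, 48.0])  # imputed record violating both edits
    x = minimally_adjust(x0, A, b)
    print(x, A @ x)                    # adjusted record now satisfies A @ x = b
```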
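
Entry 12-001-X200900211044 above extends the rescaled bootstrap of Rao and Wu (1988) to multistage designs. The sketch below is a hedged illustration of only the familiar single-stage stratified variant often attributed to Rao, Wu and Yue (resample n_h − 1 PSUs with replacement per stratum and rescale the weights), with made-up data; the paper's multistage extension is not reproduced.

```python
import random

# Hedged sketch of a commonly used rescaled bootstrap for a stratified design:
# in each stratum h with n_h sampled PSUs, draw n_h - 1 PSUs with replacement
# and multiply each PSU's weight by (n_h / (n_h - 1)) * (times selected).
# The spread of the replicate estimates approximates the design variance
# (finite population corrections ignored; single-PSU strata not handled).
# This is an assumption-laden illustration, not the cited paper's method.

def replicate_weights(stratum_psu_weights):
    """One bootstrap replicate of weights; input is {stratum: [w_1, ..., w_nh]}."""
    new = {}
    for h, weights in stratum_psu_weights.items():
        n_h = len(weights)
        counts = [0] * n_h
        for _ in range(n_h - 1):
            counts[random.randrange(n_h)] += 1
        new[h] = [w * (n_h / (n_h - 1)) * c for w, c in zip(weights, counts)]
    return new

def bootstrap_variance(stratum_psu_weights, stratum_psu_totals, n_reps=500):
    """Bootstrap variance of the weighted total of PSU-level y-totals."""
    def total(wts):
        return sum(w * y for h in wts for w, y in zip(wts[h], stratum_psu_totals[h]))
    estimates = [total(replicate_weights(stratum_psu_weights)) for _ in range(n_reps)]
    mean_est = sum(estimates) / n_reps
    return sum((e - mean_est) ** 2 for e in estimates) / n_reps

if __name__ == "__main__":
    weights = {"h1": [10.0, 10.0, 10.0], "h2": [25.0, 25.0]}  # made-up design weights
    y_totals = {"h1": [4.0, 6.0, 5.0], "h2": [12.0, 9.0]}     # made-up PSU totals
    print(bootstrap_variance(weights, y_totals))
```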

Data (1) (1 result)

  • Table: 62-010-X19970023422
    Description:

    The current official time base of the Consumer Price Index (CPI) is 1986=100. This time base was first used when the CPI for June 1990 was released. Statistics Canada is about to convert all price index series to the time base 1992=100. As a result, all constant dollar series will be converted to 1992 dollars. The CPI will shift to the new time base when the CPI for January 1998 is released on February 27th, 1998.

    Release date: 1997-11-17

Analysis (56) (25 of 56 results)

  • Articles and reports: 12-001-X200900110884
    Description:

    The paper considers small domain estimation of the proportion of persons without health insurance for different minority groups. The small domains are cross-classified by age, sex and other demographic characteristics. Both hierarchical and empirical Bayes estimation methods are used (a minimal sketch of the generic shrinkage idea follows this list). Second-order accurate approximations of the mean squared errors of the empirical Bayes estimators, and bias-corrected estimators of these mean squared errors, are also provided. The general methodology is illustrated with estimates of the proportion of uninsured persons for several cross-sections of the Asian subpopulation.

    Release date: 2009-06-22

  • Articles and reports: 12-001-X200900110883
    Description:

    We use a Bayesian method to resolve the boundary solution problem of the maximum likelihood (ML) estimate in an incomplete two-way contingency table, using a loglinear model and Dirichlet priors. We compare five Dirichlet priors in estimating multinomial cell probabilities under nonignorable nonresponse. Three of these priors have been used for an incomplete one-way table, while the remaining two are newly proposed to reflect the difference in response patterns between respondents and the undecided. Unlike in previous studies, the Bayesian estimates based on the first three priors do not always perform better than the ML estimates, whereas the two new priors perform better than both the first three priors and the ML estimates whenever a boundary solution occurs. We use four sets of data from the 1998 Ohio state polls to illustrate how to use and interpret estimation results for the elections. We use simulation studies to compare the performance of the five Bayesian estimates under nonignorable nonresponse.

    Release date: 2009-06-22

  • Articles and reports: 12-001-X200800210755
    Description:

    Dependent interviewing (DI) is used in many longitudinal surveys to "feed forward" data from one wave to the next. Though it is a promising technique which has been demonstrated to enhance data quality in certain respects, relatively little is known about how it is actually administered in the field. This research seeks to address this issue through behavior coding. Various styles of DI were employed in the English Longitudinal Study of Ageing (ELSA) in January, 2006, and recordings were made of pilot field interviews. These recordings were analysed to determine whether the questions (particularly the DI aspects) were administered appropriately and to explore the respondent's reaction to the fed-forward data. Of particular interest was whether respondents confirmed or challenged the previously-reported information, whether the prior wave data came into play when respondents were providing their current-wave answers, and how any discrepancies were negotiated by the interviewer and respondent. Also of interest was to examine the effectiveness of various styles of DI. For example, in some cases the prior wave data was brought forward and respondents were asked to explicitly confirm it; in other cases the previous data was read and respondents were asked if the situation was still the same. Results indicate varying levels of compliance in terms of initial question-reading, and suggest that some styles of DI may be more effective than others.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210764
    Description:

    This paper considers situations where the target response value is either zero or an observation from a continuous distribution. A typical example analyzed in the paper is the assessment of literacy proficiency with the possible outcome being either zero, indicating illiteracy, or a positive score measuring the level of literacy. Our interest is in how to obtain valid estimates of the average response, or the proportion of positive responses in small areas, for which only small samples or no samples are available. As in other small area estimation problems, the small sample sizes in at least some of the sampled areas and/or the existence of nonsampled areas requires the use of model based methods. Available methods, however, are not suitable for this kind of data because of the mixed distribution of the responses, having a large peak at zero, juxtaposed to a continuous distribution for the rest of the responses. We develop, therefore, a suitable two-part random effects model and show how to fit the model and assess its goodness of fit, and how to compute the small area estimators of interest and measure their precision. The proposed method is illustrated using simulated data and data obtained from a literacy survey conducted in Cambodia.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210759
    Description:

    The analysis of stratified multistage sample data requires the use of design information, such as stratum and primary sampling unit (PSU) identifiers or associated replicate weights, in variance estimation. In some public release data files, such design information is masked in an effort to limit disclosure risk while still allowing the user to obtain valid variance estimates. For example, in area surveys with a limited number of PSUs, the original PSUs are split and/or recombined to construct pseudo-PSUs with swapped second- or subsequent-stage sampling units. Such PSU masking methods, however, obviously distort the clustering structure of the sample design, yielding biased variance estimates, possibly with systematic patterns between the variance estimates obtained from the unmasked and masked PSU identifiers. Some previous work observed patterns in the ratio of the masked and unmasked variance estimates when plotted against the unmasked design effect. This paper investigates the effect of PSU masking on variance estimates under cluster sampling with respect to various aspects, including the clustering structure and the degree of masking. We also seek a PSU masking strategy, based on swapping of subsequent-stage sampling units, that helps reduce the resulting biases of the variance estimates. For illustration, we used data from the National Health Interview Survey (NHIS) with some artificial modification. The proposed strategy performs very well in reducing the biases of variance estimates. Both theory and empirical results indicate that the effect of PSU masking on variance estimates is modest with minimal swapping of subsequent-stage sampling units. The proposed masking strategy has been applied to the 2003-2004 National Health and Nutrition Examination Survey (NHANES) data release.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210756
    Description:

    In longitudinal surveys nonresponse often occurs in a pattern that is not monotone. We consider estimation of time-dependent means under the assumption that the nonresponse mechanism is last-value-dependent. Since the last value itself may be missing when nonresponse is nonmonotone, the nonresponse mechanism under consideration is nonignorable. We propose an imputation method by first deriving some regression imputation models according to the nonresponse mechanism and then applying nonparametric regression imputation. We assume that the longitudinal data follow a Markov chain with finite second-order moments. No other assumption is imposed on the joint distribution of longitudinal data and their nonresponse indicators. A bootstrap method is applied for variance estimation. Some simulation results and an example concerning the Current Employment Survey are presented.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200700210496
    Description:

    The European Community Household Panel (ECHP) is a panel survey covering a wide range of topics regarding economic, social and living conditions. In particular, it makes it possible to calculate disposable equivalized household income, which is a key variable in the study of economic inequity and poverty. To obtain reliable estimates of the average of this variable for regions within countries it is necessary to have recourse to small area estimation methods. In this paper, we focus on empirical best linear predictors of the average equivalized income based on "unit level models" borrowing strength across both areas and times. Using a simulation study based on ECHP data, we compare the suggested estimators with cross-sectional model-based and design-based estimators. In the case of these empirical predictors, we also compare three different MSE estimators. Results show that those estimators connected to models that take units' autocorrelation into account lead to a significant gain in efficiency, even when there are no covariates available whose population mean is known.

    Release date: 2008-01-03

  • Articles and reports: 12-001-X200700210494
    Description:

    The Australian Bureau of Statistics has recently developed a generalized estimation system for processing its large-scale annual and sub-annual business surveys. Designs for these surveys have a large number of strata, use simple random sampling within strata, have non-negligible sampling fractions, overlap in consecutive periods, and are subject to frame changes. A significant challenge was to choose a variance estimation method that would best meet the following requirements: valid for a wide range of estimators (e.g., ratio and generalized regression), requires limited computation time, can be easily adapted to different designs and estimators, and has good theoretical properties measured in terms of bias and variance. This paper describes the Without Replacement Scaled Bootstrap (WOSB) that was implemented at the ABS and shows that it is appreciably more efficient than the With Replacement Scaled Bootstrap (WSB) of Rao and Wu (1988). The main advantages of the Bootstrap over alternative replicate variance estimators are its efficiency (i.e., accuracy per unit of storage space) and the relative simplicity with which it can be specified in a system. This paper describes the WOSB variance estimator for point-in-time and movement estimates that can be expressed as a function of finite population means. Simulation results obtained as part of the evaluation process show that the WOSB was more efficient than the WSB, especially when the stratum sample sizes are sometimes as small as 5.

    Release date: 2008-01-03

  • Articles and reports: 82-003-S200700010362
    Description:

    This article summarizes the design, methods and results emerging from the Canadian Health Measures Survey pre-test, which took place from October through December 2004 in Calgary, Alberta.

    Release date: 2007-12-05
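
The small domain entry 12-001-X200900110884 above uses hierarchical and empirical Bayes estimation. The sketch below is only a hedged illustration of the generic area-level empirical Bayes shrinkage idea (a Fay-Herriot-type composite of a direct estimate and a synthetic regression prediction) with made-up inputs; the cited paper's models for proportions and its MSE approximations are not reproduced.

```python
# Hedged sketch of area-level empirical Bayes shrinkage (Fay-Herriot form):
#   theta_hat_i = gamma_i * y_i + (1 - gamma_i) * x_i' beta,
#   gamma_i     = sigma2_v / (sigma2_v + D_i),
# where y_i is the direct estimate, D_i its known sampling variance,
# x_i' beta a synthetic (regression) prediction, and sigma2_v the model variance.
# All numbers below are made up; this is not the cited paper's method.

def eb_shrinkage(direct, sampling_var, synthetic, model_var):
    """Composite small-area estimate shrinking the direct estimate toward the synthetic one."""
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic

if __name__ == "__main__":
    # One small domain: a noisy direct proportion and a regression-based synthetic value.
    print(eb_shrinkage(direct=0.32, sampling_var=0.004,
                       synthetic=0.25, model_var=0.002))  # about 0.273
```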

Reference (44) (25 of 44 results)

  • Technical products: 11-522-X201700014755
    Description:

    The National Children’s Study Vanguard Study was a pilot epidemiological cohort study of children and their parents. Measures were to be taken from pre-pregnancy until adulthood. The use of extant data was planned to supplement direct data collection from the respondents. Our paper outlines a strategy for cataloging and evaluating extant data sources for use with large scale longitudinal. Through our review we selected five evaluation factors to guide a researcher through available data sources including 1) relevance, 2) timeliness, 3) spatiality, 4) accessibility, and 5) accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014750
    Description:

    The Educational Master File (EMF) system was built to allow the analysis of educational programs in Canada. At the core of the system are administrative files that record all of the registrations to post-secondary and apprenticeship programs in Canada. New administrative files become available on an annual basis. Once a new file becomes available, a first round of processing is performed, which includes linkage to other administrative records. This linkage yields information that can improve the quality of the file, it allows further linkages to other data describing labour market outcomes, and it’s the first step in adding the file to the EMF. Once part of the EMF, information from the file can be included in cross-sectional and longitudinal projects, to study academic pathways and labour market outcomes after graduation. The EMF currently consists of data from 2005 to 2013, but it evolves as new data become available. This paper gives an overview of the mechanisms used to build the EMF, with focus on the structure of the final system and some of its analytical potential.

    Release date: 2016-03-24

  • Technical products: 11-522-X201300014276
    Description:

    In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, and this effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified: If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets. If an area is ‘abandoned’ during collection, we prioritize the remaining surveys. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools that we can use to analyze collection by isolating underrepresented population groups. We can increase collection efforts for groups that have been identified beforehand. In the oral presentation, we covered these two points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014279
    Description:

    As part of the European SustainCity project, a microsimulation model of individuals and households was created to simulate the population of various European cities. The aim of the project was to combine several transportation and land-use microsimulation models (land-use modelling), add a dynamic population module, and apply these microsimulation approaches to three geographic areas of Europe (the Île-de-France region and the Brussels and Zurich agglomerations).

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014273
    Description:

    More and more data are being produced by the growing number of electronic devices physically surrounding us and on the internet. The sheer volume of data and the high frequency at which they are produced have led to the term ‘Big Data’. Because these data reflect many different aspects of our daily lives, and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. However, first experiences with analyses of large amounts of Dutch traffic loop detection records, call detail records of mobile phones and Dutch social media messages reveal that a number of challenges must be addressed before these data sources can be used for official statistics. These challenges, and the lessons learned during these initial studies, are addressed and illustrated with examples. More specifically, the following topics are discussed: the three general types of Big Data we discern, the need to access and analyse large amounts of data, how we deal with noisy data and with selectivity (and our own bias towards this topic), how to go beyond correlation, how we found people with the right skills and mindset to perform the work, and how we have dealt with privacy and security issues.

    Release date: 2014-10-31

  • Technical products: 11-522-X200800010979
    Description:

    Prior to 2006, the Canadian Census of Population relied on field staff to deliver questionnaires to all dwellings in Canada. For the 2006 Census, an address frame was created to cover almost 70% of dwellings in Canada, and these questionnaires were delivered by Canada Post. For the 2011 Census, Statistics Canada aims to expand this frame further, with a target of delivering questionnaires by mail to between 80% and 85% of dwellings. Mailing questionnaires for the Census raises a number of issues, among them: ensuring returned questionnaires are counted in the right area, creating an up-to-date address frame that includes all new growth, and determining which areas are unsuitable for having questionnaires delivered by mail. Changes to the address frame update procedures for 2011, most notably the decision to use purely administrative data as the frame wherever possible and to conduct field update exercises only where deemed necessary, provide a new set of challenges for the 2011 Census.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010958
    Description:

    Telephone Data Entry (TDE) is a system by which survey respondents can return their data to the Office for National Statistics (ONS) using the keypad on their telephone; it currently accounts for approximately 12% of total responses to ONS business surveys. ONS is increasing the number of surveys that use TDE as the primary mode of response, and this paper gives an overview of the redevelopment project, covering the redevelopment of the paper questionnaire, the enhancements made to the TDE system, and the results from piloting these changes. The improvements in data quality and the increased response via TDE observed in these pilots suggest that further quality gains and cost savings are possible by promoting TDE as the primary mode of response to short-term surveys.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011005
    Description:

    In 2006 Statistics New Zealand started developing a strategy aimed at coordinating new and existing initiatives focused on respondent load. The development of the strategy lasted more than a year and the resulting commitment to reduce respondent load has meant that the organisation has had to confront a number of issues that impact on the way we conduct our surveys.

    The next challenge for Statistics NZ is the transition from the project based initiatives outlined in the strategy to managing load on an ongoing basis.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010954
    Description:

    Over the past year, Statistics Canada has been developing and testing a new way to monitor the performance of interviewers conducting computer-assisted personal interviews (CAPI). A formal process already exists for monitoring centralized telephone interviews. Monitors listen to telephone interviews as they take place to assess the interviewer's performance using pre-defined criteria and provide feedback to the interviewer on what was well done and what needs improvement. For the CAPI program, we have developed and are testing a pilot approach whereby interviews are digitally recorded and later a monitor listens to these recordings to assess the field interviewer's performance and provide feedback in order to help improve the quality of the data. In this paper, we will present an overview of the CAPI monitoring project at Statistics Canada by describing the CAPI monitoring methodology and the plans for implementation.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010962
    Description:

    The ÉLDEQ initiated a special data gathering project in March 2008 with the collection of biological materials from 1,973 families. During a typical visit, a nurse collects a blood or saliva sample from the selected child, makes a series of measurements (anthropometry, pulse rate and blood pressure) and administers questionnaires. Planned and supervised by the Institut de la Statistique du Québec (ISQ) and the Université de Montréal, the study is being conducted in cooperation with two private firms and a number of hospitals. This article examines the choice of collection methods, the division of effort among the various players, the sequence of communications and contacts with respondents, the tracing of families who are not contacted, and follow-up on the biological samples. Preliminary field results are also presented.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010970
    Description:

    RTI International is currently conducting a longitudinal education study. One component of the study involved collecting transcripts and course catalogs from high schools that the sample members attended. Information from the transcripts and course catalogs also needed to be keyed and coded. This presented a challenge because the transcripts and course catalogs were collected from different types of schools, including public, private, and religious schools, from across the nation, and they varied widely in both content and format. The challenge called for a sophisticated system that could be used by multiple users simultaneously. RTI developed such a system: a web-based, high-end, multi-user, multitask, user-friendly and low-maintenance application for keying and coding high school transcripts and course catalogs. The system has three major functions: transcript and catalog keying and coding, transcript and catalog keying quality control (keyer-coder end), and transcript and catalog coding quality control (management end). Given the complex nature of transcript and catalog keying and coding, the system was designed to be flexible: it can move keyed and coded data throughout the system to reduce keying time, logically guide users through all the pages required for a given activity, display information that supports keying performance, and track all keying, coding, and QC activities. Hundreds of catalogs and thousands of transcripts were successfully keyed, coded, and verified using the system. This paper reports on the system requirements and design, implementation tips, problems faced and their solutions, and lessons learned.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011015
    Description:

    Statistics South Africa (StatsSA) prides itself in the accuracy and validity of data collected, processed and disseminated. The introduction of a Real Time Management System (RTMS) and the Global Positioning System (GPS) into field operations is aimed at enhancing the process of data collection and minimising errors with regard to locating sampled dwelling units and tracking material from one point in the survey chain to another.

    The Quarterly Labour Force Survey (QLFS) is a pioneering project at Stats SA in which the Master Sample (MS) is linked to a GPS database: every record in the MS listing book has a corresponding GPS coordinate captured for it. These GPS points allow the Survey Officer to record spatially where the listed structures (e.g., shops, houses, schools, churches) are located on the ground. The captured information is then linked to a shape file that shows where the structures are on the ground in relation to the manual listing records.
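
    A rough sketch, using geopandas, of the kind of spatial link described above: turning listed records with captured GPS coordinates into point geometries that can be saved and overlaid with other shape files. The field names, coordinates, and file path are assumptions for illustration, not Stats SA's actual system.

        # Hypothetical sketch: build point geometries from captured GPS coordinates for
        # each listed structure and write them to a shapefile for use with other layers.
        import pandas as pd
        import geopandas as gpd

        listing = pd.DataFrame({
            "record_id": [101, 102, 103],          # invented listing records
            "structure": ["house", "shop", "school"],
            "lon": [28.05, 28.06, 28.07],
            "lat": [-26.20, -26.21, -26.19],
        })

        points = gpd.GeoDataFrame(
            listing,
            geometry=gpd.points_from_xy(listing["lon"], listing["lat"]),
            crs="EPSG:4326",
        )
        points.to_file("ms_listing_points.shp")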

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010965
    Description:

    Surveys that employ simultaneous web, CATI, and paper modes (or any two-way subset) are increasingly common. Mathematica Policy Research, Inc. (MPR) has deployed several surveys of this type in Blaise. This paper reviews MPR's experiences and issues with these efforts by addressing instrumentation, survey management, and other considerations. The paper emphasizes the electronic implementation of these surveys and covers topics that emerge solely from the surveys' multimode nature; that is, material that goes beyond the implementation of a single-mode survey.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010950
    Description:

    The next census will be conducted in May 2011. Being a major survey, it presents a formidable challenge for Statistics Canada and requires a great deal of time and resources. Careful planning has been done to ensure that all deadlines are met. A number of steps have been planned in the questionnaire testing process. These tests apply to both census content and the proposed communications strategy. This paper presents an overview of the strategy, with a focus on combining qualitative studies with the 2008 quantitative study so that the results can be analyzed and the proposals properly evaluated.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010996
    Description:

    In recent years, the use of paradata has become increasingly important to the management of collection activities at Statistics Canada. Particular attention has been paid to social surveys conducted by telephone, such as the Survey of Labour and Income Dynamics (SLID). For recent SLID data collections, the number of call attempts was capped at 40. Investigations of the SLID Blaise Transaction History (BTH) files were undertaken to assess the impact of this cap. The first study was intended to inform decisions on capping call attempts; the second focused on the nature of nonresponse given the limit of 40 attempts.

    The use of paradata as auxiliary information for studying and accounting for survey nonresponse was also examined. Nonresponse adjustment models using different paradata variables gathered at the collection stage were compared to the current models based on available auxiliary information from the Labour Force Survey.
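
    A minimal sketch of the kind of nonresponse adjustment model described: a logistic response-propensity model with a paradata covariate (here, the number of call attempts), whose inverse predicted probabilities inflate respondents' design weights. All variables and values are invented; this is not the SLID production model.

        # Hypothetical sketch: response-propensity weighting using a paradata covariate.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 1000
        call_attempts = rng.integers(1, 41, size=n)        # paradata: 1 to 40 call attempts
        true_prob = 1 / (1 + np.exp(-(1.5 - 0.05 * call_attempts)))
        responded = (rng.random(n) < true_prob).astype(int)

        # Fit the propensity model on the full sample (respondents and nonrespondents).
        model = LogisticRegression().fit(call_attempts.reshape(-1, 1), responded)
        propensity = model.predict_proba(call_attempts.reshape(-1, 1))[:, 1]

        # Respondents' design weights are inflated by the inverse estimated propensity.
        design_weight = np.full(n, 25.0)                   # invented constant design weight
        adjusted_weight = np.where(responded == 1, design_weight / propensity, 0.0)
        print(adjusted_weight[responded == 1][:5])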

    Release date: 2009-12-03

  • Technical products: 11-536-X200900110812
    Description:

    Variance estimation in the presence of imputed data has been widely studied in the literature. It is well known that treating the imputed values as if they were observed could lead to serious underestimation of the variance of the imputed estimator. Several approaches/techniques have been developed in recent years. In particular, Rao and Shao (1992) have proposed an adjusted jackknife that works well when the sampling fraction is small. However, in many situations, this condition is not satisfied. As a result, the Rao-Shao adjusted jackknife may lead to invalid variance estimators. To overcome this problem, Lee, Rancourt and Särndal (1995) have proposed a simple correction to the Rao-Shao adjusted jackknife. In this presentation, we discuss the properties of the resulting variance estimator under stratified simple random sampling without replacement. Also, using the reverse approach developed by Shao and Steel (1999), we consider another variance estimator that works well when the sampling fractions are not negligible. The case of unequal probability sampling designs, such as probability-proportional-to-size designs, will be briefly discussed.
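
    As a rough illustration of the adjusted-jackknife idea (not the authors' code): under mean imputation and simple random sampling with a small sampling fraction, deleting a responding unit also changes the donor mean used to re-impute the nonrespondents, which is the Rao-Shao adjustment, before the usual delete-one jackknife variance formula is applied.

        # Hypothetical sketch of a Rao-Shao style adjusted jackknife for the mean under
        # mean imputation and simple random sampling (small sampling fraction assumed).
        import numpy as np

        y = np.array([4.0, 7.0, np.nan, 5.0, np.nan, 6.0])   # invented data; NaN = nonresponse
        resp = ~np.isnan(y)
        n = len(y)

        def imputed_mean(values, respondents):
            filled = values.copy()
            filled[~respondents] = values[respondents].mean()  # mean imputation from donors
            return filled.mean()

        theta_hat = imputed_mean(y, resp)

        pseudo = []
        for j in range(n):
            keep = np.ones(n, dtype=bool)
            keep[j] = False
            # Re-imputing with unit j deleted adjusts the imputed values whenever j
            # was a respondent, because the donor mean changes.
            pseudo.append(imputed_mean(y[keep], resp[keep]))

        v_jack = (n - 1) / n * np.sum((np.array(pseudo) - theta_hat) ** 2)
        print(theta_hat, v_jack)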

    Release date: 2009-08-11

  • Technical products: 11-522-X200600110452
    Description:

    Accurate information about the timing of access to primary mental health care is critically important in order to identify potentially modifiable factors which could facilitate timely and ongoing management of care. No "gold standard" measure of mental health care utilization exists, so it is useful to know how strengths, gaps, and limitations in different data sources influence study results. This study compares two population-wide measures of primary mental health care utilization: the Canadian Community Health Survey of Mental Health and Well-being (CCHS, cycle 1.2) and provincial health insurance records in the province of British Columbia. It explores four questions: (1) Is the 12-month prevalence of contacts with general practitioners for mental health issues the same regardless of whether survey data or administrative data are used? (2) What is the level of agreement between the survey data and administrative data for having had any contact with a general practitioner for mental health issues during the 12-month period before the survey interview? (3) Is the level of agreement constant throughout the 12-month period, or does it decline over more distant sub-timeframes within the 12-month period? (4) What kinds of respondent characteristics, including mental disorders, are associated with agreement or lack of agreement? The results of this study will provide useful information about how to use and interpret each measure of health care utilization. In addition, it will contribute to survey design research and to research aimed at improving methods for using administrative data in mental health services research.
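
    A small sketch of the type of agreement calculation implied by question (2): percent agreement and Cohen's kappa between a survey-reported indicator of any GP contact for mental health reasons and the corresponding administrative indicator. The data are invented, and the study's actual estimands and weighting are more involved.

        # Hypothetical sketch: agreement between survey-reported and administrative
        # indicators of any GP contact for mental health issues in the past 12 months.
        import numpy as np

        survey = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # invented 0/1 indicators
        admin = np.array([1, 0, 0, 1, 0, 1, 1, 0])

        p_obs = np.mean(survey == admin)               # simple percent agreement

        # Chance-expected agreement for Cohen's kappa.
        p_exp = survey.mean() * admin.mean() + (1 - survey.mean()) * (1 - admin.mean())
        kappa = (p_obs - p_exp) / (1 - p_exp)
        print(p_obs, kappa)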

    Release date: 2008-03-17

  • Technical products: 11-522-X20050019454
    Description:

    The goal of the BR Redesign Project is to simplify, optimize, and harmonize the Business Register's processes and methods. This paper provides an overview of the BR Redesign with emphasis on the issues that affect the methodology of business surveys.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019453
    Description:

    The UK's Office for National Statistics (ONS) is starting a development programme for business surveys to meet the recommendations of a recent government report calling for improvements to economic statistics, in particular regional economic statistics.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019493
    Description:

    This article introduces the General Statistics Office in Hanoi, Vietnam, and gives a description of socio-economic surveys conducted since the early nineties in Vietnam, with a discussion of their methods and achievements, as well as remaining challenges.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019469
    Description:

    The 1990s was the decade of longitudinal surveys in Canada. The focus was squarely on the benefits that could be derived from the increased analytical power of longitudinal surveys. This presentation explores issues of insights gained, timeliness, data access, survey design, complexity, research capacity, survey governance and knowledge mobilisation, and outlines some of the issues that are likely to be raised in any debate regarding longitudinal surveys.

    Release date: 2007-03-02

  • Technical products: 11-522-X20040018757
    Description:

    In its attempt to deal with bad frames and measure characteristics that are rare, ONS builds satellite registers from an administrative source, builds up historic information into a panel, or uses filter questions from more general surveys.

    Release date: 2005-10-27

  • Technical products: 11-522-X20040018749
    Description:

    In its attempt to measure the mental health of Cambodian refugees in the U.S., the RAND Corporation introduces a novel methodology for efficiently listing, screening, and identifying households to ultimately yield a random sample of eligible participants.

    Release date: 2005-10-27

  • Technical products: 11-522-X20030017708
    Description:

    This article provides an overview of the work to date using GST data at Statistics Canada, either as a direct replacement in imputation or estimation or as a data certification tool.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017714
    Description:

    This paper looks at new conceptual, organizational, statistical and computational ways of improving surveys.

    Release date: 2005-01-26
