• Articles and reports: 12-001-X201600114540
Description:

In this paper, we compare the EBLUP and pseudo-EBLUP estimators for small area estimation under the nested error regression model with three area level model-based estimators based on the Fay-Herriot model. We conduct a design-based simulation study to compare the model-based estimators for unit level and area level models under informative and non-informative sampling. In particular, we are interested in the confidence interval coverage rates of the unit level and area level estimators. We also compare the estimators when the model is misspecified. Our simulation results show that estimators based on the unit level model perform better than those based on the area level model. The pseudo-EBLUP estimator is the best among the unit level and area level estimators.

Release date: 2016-06-22

• Articles and reports: 12-001-X201600114542
Description:

The restricted maximum likelihood (REML) method is generally used to estimate the variance of the random area effect under the Fay-Herriot model (Fay and Herriot 1979) to obtain the empirical best linear unbiased predictor (EBLUP) of a small area mean. When the REML estimate is zero, the weight of the direct sample estimator is zero and the EBLUP becomes a synthetic estimator. This is often undesirable. As a solution to this problem, Li and Lahiri (2011) and Yoshimori and Lahiri (2014) developed adjusted maximum likelihood (ADM) consistent variance estimators, which always yield positive variance estimates. However, some of the ADM estimators have a large bias, which affects the estimation of the mean squared error (MSE) of the EBLUP. We propose a MIX variance estimator, defined as a combination of the REML and ADM methods. We show that it is unbiased up to the second order and that it always yields a positive variance estimate. Furthermore, we propose an MSE estimator under the MIX method and show via a model-based simulation that, in many situations, it performs better than other recently proposed 'Taylor linearization' MSE estimators.
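Why a zero variance estimate turns the EBLUP into a synthetic estimator can be seen from the shrinkage weight gamma_i = sigma2_v / (sigma2_v + psi_i). The following is a minimal Python sketch, assuming the random-effect variance sigma2_v has already been estimated (by REML, ADM or some other method; the MIX estimator itself is not implemented here) and the sampling variances psi_i are known. The function name is hypothetical.

```python
import numpy as np

def fay_herriot_eblup(y, X, psi, sigma2_v):
    """EBLUP of small area means under the Fay-Herriot model for a given,
    already estimated random-effect variance sigma2_v (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    # Model variance of each direct estimator: random effect + sampling error
    v = sigma2_v + psi
    # Weighted least squares estimate of the regression coefficients
    W = 1.0 / v
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
    synthetic = X @ beta
    # Weight on the direct estimator; it is exactly zero when sigma2_v == 0,
    # so the EBLUP collapses to the synthetic estimator X @ beta
    gamma = sigma2_v / v
    return gamma * y + (1.0 - gamma) * synthetic, gamma
```

With sigma2_v = 0 every gamma is zero and the direct estimates receive no weight; as sigma2_v grows, the EBLUP moves toward the direct estimates.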

Release date: 2016-06-22

• Articles and reports: 82-003-X201600114306
Description:

Release date: 2016-01-20

• Articles and reports: 82-003-X201501214295
Description:

Using the Wisconsin Cancer Intervention and Surveillance Monitoring Network breast cancer simulation model adapted to the Canadian context, costs and quality-adjusted life years were evaluated for 11 mammography screening strategies that varied by start/stop age and screening frequency for the general population. Incremental cost-effectiveness ratios are presented, and sensitivity analyses are used to assess the robustness of model conclusions.

Release date: 2015-12-16

• Articles and reports: 12-001-X201400214091
Description:

Parametric fractional imputation (PFI), proposed by Kim (2011), is a tool for general-purpose parameter estimation under missing data. We propose a fractional hot deck imputation (FHDI) method, which is more robust than PFI or multiple imputation. In the proposed method, the imputed values are chosen from the set of respondents and assigned proper fractional weights. The weights are then adjusted to meet certain calibration conditions, which makes the resulting FHDI estimator efficient. Two simulation studies are presented to compare the proposed method with existing methods.
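The basic idea of donating several respondent values to each nonrespondent, each with a fractional weight, can be sketched in Python as follows. This is a simplified illustration, assuming imputation cells are given, m donors are drawn at random without replacement within the recipient's cell, and each donor receives an equal fractional weight; the calibration adjustment of the fractional weights described above is not implemented. The function name is hypothetical.

```python
import numpy as np

def fractional_hot_deck_mean(y, weights, cells, m=3, seed=0):
    """Weighted mean after a simple fractional hot deck imputation
    (hypothetical helper, no calibration step).

    y: responses, with np.nan marking missing values
    cells: imputation cell label for every unit; donors come from the same cell
    m: number of donors per recipient, each receiving an equal share of its weight
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cells = np.asarray(cells)
    values, wts = [], []
    for i in range(len(y)):
        if not np.isnan(y[i]):
            values.append(y[i])
            wts.append(weights[i])
            continue
        donors = np.flatnonzero((cells == cells[i]) & ~np.isnan(y))
        if donors.size == 0:
            raise ValueError(f"no respondents available in cell {cells[i]!r}")
        chosen = rng.choice(donors, size=min(m, donors.size), replace=False)
        for j in chosen:
            # The recipient's weight is split evenly across its donors
            values.append(y[j])
            wts.append(weights[i] / len(chosen))
    values, wts = np.array(values), np.array(wts)
    return np.sum(wts * values) / np.sum(wts)
```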

Release date: 2014-12-19

• Technical products: 11-522-X201300014271
Description:

The purpose of this paper is to present the use of administrative records in the U.S. Census for Group Quarters, known elsewhere as collective dwellings. Group Quarters enumeration involves collecting data from such hard-to-access places as correctional facilities, skilled nursing facilities and military barracks. We discuss the benefits and constraints of using various sources of administrative records in constructing the Group Quarters frame for coverage improvement. This paper is a companion to the paper by Chun and Gan (2014), which discusses the potential uses of administrative records in the Group Quarters enumeration.

Release date: 2014-10-31

• Articles and reports: 12-001-X201300111830
Description:

We consider two different self-benchmarking methods for the estimation of small area means based on the Fay-Herriot (FH) area level model: the method of You and Rao (2002) applied to the FH model and the method of Wang, Fuller and Qu (2008) based on augmented models. We derive an estimator of the mean squared prediction error (MSPE) of the You-Rao (YR) estimator of a small area mean that, under the true model, is correct to second-order terms. We report the results of a simulation study on the relative bias of the MSPE estimator of the YR estimator and the MSPE estimator of the Wang, Fuller and Qu (WFQ) estimator obtained under an augmented model. We also study the MSPE and the estimators of MSPE for the YR and WFQ estimators obtained under a misspecified model.

Release date: 2013-06-28

• Articles and reports: 12-001-X201100111448
Description:

In two-phase sampling for stratification, the second-phase sample is selected as a stratified sample based on the information observed in the first-phase sample. We develop a replication-based bias-adjusted variance estimator that extends the method of Kim, Navarro and Fuller (2006). The proposed method is also applicable when the first-phase sampling rate is not negligible and when second-phase sample selection is unequal probability Poisson sampling within each stratum. The proposed method can be extended to variance estimation for two-phase regression estimators. Results from a limited simulation study are presented.
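For context, the point estimator whose variance is being estimated can be sketched in Python. This assumes the simplest textbook case: an equal-probability first-phase sample used only to classify units into strata, and a simple random sample drawn within each stratum in the second phase, with every stratum receiving at least one second-phase unit. The replication-based variance estimator of the paper is not implemented, and the function name is hypothetical.

```python
import numpy as np

def two_phase_stratified_mean(first_phase_strata, second_phase_strata, second_phase_y):
    """Two-phase estimator of a population mean: first-phase stratum shares
    times second-phase stratum means (hypothetical helper)."""
    first = np.asarray(first_phase_strata)
    strata2 = np.asarray(second_phase_strata)
    y2 = np.asarray(second_phase_y, dtype=float)
    n1 = len(first)
    estimate = 0.0
    for h in np.unique(first):
        # Estimated stratum share from the first phase
        w_h = np.sum(first == h) / n1
        # Second-phase sample mean within stratum h
        estimate += w_h * y2[strata2 == h].mean()
    return estimate
```

For example, with 6 of 10 first-phase units in stratum a and 4 in stratum b, and second-phase stratum means of 2 and 15, the estimate is 0.6 × 2 + 0.4 × 15 = 7.2.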

Release date: 2011-06-29

• Articles and reports: 12-001-X201100111445
Description:

In this paper we study small area estimation using area level models. We first consider the Fay-Herriot model (Fay and Herriot 1979) for the case of smoothed known sampling variances and the You-Chapman model (You and Chapman 2006) for the case of sampling variance modeling. Then we consider hierarchical Bayes (HB) spatial models that extend the Fay-Herriot and You-Chapman models by capturing both the geographically unstructured heterogeneity and spatial correlation effects among areas for local smoothing. The proposed models are implemented using the Gibbs sampling method for fully Bayesian inference. We apply the proposed models to the analysis of health survey data and make comparisons among the HB model-based estimates and direct design-based estimates. Our results have shown that the HB model-based estimates perform much better than the direct estimates. In addition, the proposed area level spatial models achieve smaller CVs than the Fay-Herriot and You-Chapman models, particularly for the areas with three or more neighbouring areas. Bayesian model comparison and model fit analysis are also presented.

Release date: 2011-06-29

• Articles and reports: 12-001-X201000111246
Description:

Many surveys employ weight adjustment procedures to reduce nonresponse bias. These adjustments make use of available auxiliary data. This paper addresses the issue of jackknife variance estimation for estimators that have been adjusted for nonresponse. Using the reverse approach for variance estimation proposed by Fay (1991) and Shao and Steel (1999), we study the effect of not re-calculating the nonresponse weight adjustment within each jackknife replicate. We show that the resulting 'shortcut' jackknife variance estimator tends to overestimate the true variance of point estimators in the case of several weight adjustment procedures used in practice. These theoretical results are confirmed through a simulation study where we compare the shortcut jackknife variance estimator with the full jackknife variance estimator obtained by re-calculating the nonresponse weight adjustment within each jackknife replicate.
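The difference between the full and the shortcut jackknife can be made concrete with a small Python sketch. It assumes simple random sampling with equal design weights, a delete-one jackknife, a weighting-class nonresponse adjustment and a weighted respondent mean as the point estimator, with at least two respondents in every class so that no replicate empties a class. The adjustment method is an illustrative choice, not necessarily one studied in the paper, and the function names are hypothetical.

```python
import numpy as np

def adjusted_mean(y, d, resp, cls):
    """Weighted respondent mean after a weighting-class nonresponse adjustment."""
    w = np.zeros_like(d)
    for c in np.unique(cls):
        in_c = cls == c
        # Inflate respondent weights so they carry the class's full design weight
        factor = d[in_c].sum() / d[in_c & resp].sum()
        w[in_c & resp] = d[in_c & resp] * factor
    return np.sum(w[resp] * y[resp]) / np.sum(w[resp]), w

def jackknife_variances(y, resp, cls):
    """Delete-one jackknife variances (full, shortcut) under SRS."""
    y = np.asarray(y, dtype=float)
    resp = np.asarray(resp, dtype=bool)
    cls = np.asarray(cls)
    n = len(y)
    d = np.ones(n)
    _, w_final = adjusted_mean(y, d, resp, cls)
    full, short = [], []
    for j in range(n):
        keep = np.arange(n) != j
        # Full jackknife: redo the nonresponse adjustment in each replicate
        d_rep = np.where(keep, d * n / (n - 1), 0.0)
        full.append(adjusted_mean(y, d_rep, resp, cls)[0])
        # Shortcut: rescale the final adjusted weights without re-adjusting
        w_rep = np.where(keep, w_final * n / (n - 1), 0.0)
        short.append(np.sum(w_rep[resp] * y[resp]) / np.sum(w_rep[resp]))
    full, short = np.array(full), np.array(short)
    factor = (n - 1) / n
    return (factor * np.sum((full - full.mean()) ** 2),
            factor * np.sum((short - short.mean()) ** 2))
```

In the shortcut version, deleting a nonrespondent leaves the estimate unchanged because its final weight is already zero; in the full version it can change the adjustment factors of its class.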

Release date: 2010-06-29

• Articles and reports: 12-001-X201000111249
Description:

For many designs, there is a nonzero probability of selecting a sample that provides poor estimates for known quantities. Stratified random sampling reduces the set of such possible samples by fixing the sample size within each stratum. However, undesirable samples are still possible with stratification. Rejective sampling removes poorly performing samples by retaining a sample only if specified functions of sample estimates are within a tolerance of known values. The resulting samples are often said to be balanced on the functions of the variables used in the rejection procedure. We provide modifications to the rejection procedure of Fuller (2009a) that allow more flexibility in the rejection rules. Through simulation, we compare the estimation properties of a rejective sampling procedure with those of cube sampling.
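The accept/reject idea can be sketched in Python with the simplest possible rule. This assumes simple random sampling without replacement and a single rejection criterion (the sample mean of an auxiliary variable must lie within a fixed tolerance of its known population mean); it is not Fuller's procedure or the modified rules proposed in the paper, and the function name is hypothetical.

```python
import numpy as np

def rejective_srs(x, n, tolerance, max_draws=10_000, seed=0):
    """Draw SRS samples until the sample mean of x is within `tolerance`
    of the known population mean; return the accepted sample's indices
    (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    target = x.mean()  # treated as the known population mean
    for _ in range(max_draws):
        idx = rng.choice(len(x), size=n, replace=False)
        if abs(x[idx].mean() - target) <= tolerance:
            return idx
    raise RuntimeError("no sample met the tolerance; relax the rejection rule")
```

Because some samples are rejected, the inclusion probabilities of the accepted design can differ from those of the underlying simple random sample, which is one reason the estimation properties of such procedures are worth studying by simulation.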

Release date: 2010-06-29

• Technical products: 11-522-X200800011010
Description:

The Survey of Employment, Payrolls and Hours (SEPH) is a monthly survey using two sources of data: a census of payroll deduction (PD7) forms (administrative data) and a survey of business establishments. This paper focuses on the processing of the administrative data, from the weekly receipt of data from the Canada Revenue Agency to the production of monthly estimates produced by SEPH.

The edit and imputation methods used to process the administrative data have been revised in the last several years. The goals of this redesign were primarily to improve the data quality and to increase consistency with another administrative data source (T4), which is a benchmark measure used by Statistics Canada's System of National Accounts. An additional goal was to ensure that the new process would be easier to understand and to modify, if needed. As a result, a new processing module was developed to edit and impute PD7 forms before their data are aggregated to the monthly level.

This paper presents an overview of both the current and new processes, including a description of challenges that we faced during development. Improved quality is demonstrated both conceptually (by presenting examples of PD7 forms and their treatment under the old and new systems) and quantitatively (by comparison to T4 data).

Release date: 2009-12-03

• Articles and reports: 12-001-X200800110614
Description:

The Canadian Labour Force Survey (LFS) produces monthly estimates of the unemployment rate at national and provincial levels. The LFS also releases unemployment estimates for sub-provincial areas such as Census Metropolitan Areas (CMAs) and Urban Centers (UCs). However, for some sub-provincial areas, the direct estimates are not reliable because the sample sizes are quite small. Small area estimation in the LFS concerns the estimation of unemployment rates for local sub-provincial areas such as CMA/UCs using small area models. In this paper, we discuss various models, including the Fay-Herriot model and cross-sectional and time series models. In particular, an integrated non-linear mixed effects model is proposed under the hierarchical Bayes (HB) framework for LFS unemployment rate estimation. Monthly Employment Insurance (EI) beneficiary data at the CMA/UC level are used as auxiliary covariates in the model. An HB approach with the Gibbs sampling method is used to obtain the estimates of posterior means and posterior variances of the CMA/UC level unemployment rates. The proposed HB model leads to reliable model-based estimates in terms of CV reduction. Model fit analysis and a comparison of the model-based estimates with the direct estimates are presented in the paper.

Release date: 2008-06-26

• Technical products: 11-522-X200600110397
Description:

In practice it often happens that some collected data are subject to measurement error. Sometimes covariates (or risk factors) of interest may be difficult to observe precisely due to physical location or cost. Sometimes it is impossible to measure covariates accurately due to the nature of the covariates. In other situations, a covariate may represent an average of a certain quantity over time, and any practical way of measuring such a quantity necessarily features measurement error. When carrying out statistical inference in such settings, it is important to account for the effects of mismeasured covariates; otherwise, erroneous or even misleading results may be produced. In this paper, we discuss several measurement error examples arising in distinct contexts. Specific attention is focused on survival data with covariates subject to measurement error. We discuss a simulation-extrapolation method for adjusting for measurement error effects. A simulation study is reported.
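The simulation-extrapolation (SIMEX) idea can be illustrated in Python in a much simpler setting than the survival models discussed above. The sketch assumes a simple linear regression with additive, normally distributed measurement error of known standard deviation sigma_u; it adds extra error at levels lambda, averages the naive slope over replicates, fits a quadratic in lambda and extrapolates to lambda = -1 (the error-free case). The quadratic extrapolant and the lambda grid are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def naive_slope(w, y):
    """Ordinary least squares slope of y on the error-prone covariate w."""
    return np.polyfit(w, y, 1)[0]

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=200, seed=0):
    """SIMEX estimate of a regression slope under additive measurement
    error with known standard deviation sigma_u (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = [0.0]
    means = [naive_slope(w, y)]
    for lam in lambdas:
        # Simulation step: add extra error with variance lam * sigma_u**2
        reps = [naive_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(len(w)), y)
                for _ in range(n_rep)]
        grid.append(lam)
        means.append(np.mean(reps))
    # Extrapolation step: quadratic trend in lambda, evaluated at lambda = -1
    coef = np.polyfit(grid, means, 2)
    return np.polyval(coef, -1.0)
```

The quadratic extrapolant typically removes much, but not all, of the attenuation of the naive slope; other extrapolation functions are possible.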

Release date: 2008-03-17

• Technical products: 11-522-X200600110433
Description:

The process of creating public-use microdata files involves a number of components. One of its key elements is RTI International's innovative MASSC methodology. However, there are other major components in this process, such as the treatment of non-core identifying variables and of extreme outcomes for extra protection. The statistical disclosure limitation is designed to counter both inside and outside intrusion, and the components of the process are designed accordingly.

Release date: 2008-03-17

• Technical products: 11-522-X20050019474
Description:

Missingness is a common feature of longitudinal studies. In recent years there has been considerable research devoted to the development of methods for the analysis of incomplete longitudinal data. One common practice is imputation by the "last observation carried forward" (LOCF) approach, in which values for missing responses are imputed using observations from the most recently completed assessment. In this talk I will first examine the performance of the LOCF approach when generalized estimating equations (GEE) are employed as the inferential procedure.
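The LOCF rule itself is simple; a minimal Python sketch for one subject's sequence of assessments is shown below, assuming missing responses are coded as None. The function name is hypothetical.

```python
def locf(values):
    """Last observation carried forward for one subject's assessments:
    each missing value (None) is replaced by the most recent observed value
    (hypothetical helper)."""
    filled, last = [], None
    for v in values:
        if v is None:
            # Stays None if nothing has been observed yet for this subject
            filled.append(last)
        else:
            last = v
            filled.append(v)
    return filled

print(locf([3.0, None, 4.5, None, None]))  # [3.0, 3.0, 4.5, 4.5, 4.5]
```

Note that LOCF treats the carried-forward values as if they had been observed, which is one reason its performance with GEE needs to be examined.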

Release date: 2007-03-02

• Articles and reports: 12-001-X20060019263
Description:

In small area estimation, area level models such as the Fay-Herriot model (Fay and Herriot 1979) are widely used to obtain efficient model-based estimators for small areas. The sampling error variances are customarily assumed to be known in the model. In this paper we consider the situation where the sampling error variances are estimated individually by direct estimators. A full hierarchical Bayes (HB) model is constructed for the direct survey estimators and the sampling error variance estimators. The Gibbs sampling method is employed to obtain the small area HB estimators. The proposed HB approach automatically takes account of the extra uncertainty of estimating the sampling error variances, especially when the area-specific sample sizes are small. We compare the proposed HB model with the Fay-Herriot model through the analysis of two survey data sets. Our results show that the proposed HB estimators perform quite well compared to the direct estimates. We also discuss the problem of priors on the variance components.

Release date: 2006-07-20

• Technical products: 12-002-X20060019253
Description:

Before any analytical results are released from the Research Data Centres (RDCs), RDC analysts must conduct disclosure risk analysis (or vetting). When reviewing all analytical output, RDC analysts apply Statistics Canada's disclosure control guidelines as a means of ensuring the protection of survey respondents' confidentiality. For some data sets, such as the Aboriginal Peoples Survey (APS), the Ethnic Diversity Survey (EDS), the Participation and Activity Limitation Survey (PALS) and the Longitudinal Survey of Immigrants to Canada (LSIC), Statistics Canada has developed an additional set of guidelines that involve rounding analytical results in order to ensure further confidentiality protection. This article discusses the rationale for the additional rounding procedures used for these data sets and describes the specifics of the rounding guidelines. More importantly, it suggests several approaches to help researchers follow these protocols more effectively and efficiently.

Release date: 2006-07-18

• Technical products: 11-522-X20040018733
Description:

A survey on injecting drug users is designed to use the information collected from needle exchange centres and from sampled injecting drug users. A methodology is developed to produce various estimates.

Release date: 2005-10-27

• Articles and reports: 12-001-X20050018088
Description:

When administrative records are geographically linked to census block groups, local-area characteristics from the census can be used as contextual variables, which may be useful supplements to variables that are not directly observable from the administrative records. Often databases contain records that have insufficient address information to permit geographical links with census block groups; the contextual variables for these records are therefore unobserved. We propose a new method that uses information from "matched cases" and multivariate regression models to create multiple imputations for the unobserved variables. Our method outperformed alternative methods in simulation evaluations using census data, and was applied to the dataset for a study on treatment patterns for colorectal cancer patients.
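Once several completed data sets have been created, results are usually combined across them. A minimal Python sketch of the standard combining rules for multiple imputation (Rubin's rules) is shown below, assuming each completed data set yields a point estimate and its variance; the description above does not state which combining procedure was used, and the function name is hypothetical.

```python
import numpy as np

def combine_multiple_imputations(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules
    (hypothetical helper); returns (point estimate, total variance)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                 # combined point estimate
    u_bar = u.mean()                 # average within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    total = u_bar + (1.0 + 1.0 / m) * b
    return q_bar, total
```

For example, estimates 10, 12 and 11, each with variance 1, give a combined estimate of 11 and a total variance of 1 + (4/3) × 1 ≈ 2.33; the between-imputation term reflects the extra uncertainty from the unobserved contextual variables.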

Release date: 2005-07-21

• Articles and reports: 12-001-X20040027758
Description:

In this article, we study the use of Bayesian neural networks in finite population estimation. We propose estimators for the finite population mean and the associated mean squared error. We also propose using the Student t-distribution to model the disturbances in order to accommodate extreme observations that are often present in data from social sample surveys. Numerical results show that Bayesian neural networks yield a significant improvement in finite population estimation over methods based on linear regression.

Release date: 2005-02-03

• Technical products: 11-522-X20030017715
Description:

This paper reports on a program of the Australian Bureau of Statistics (ABS) designed to increase awareness of the quality of ABS data, and to educate users about the importance of knowledge of data quality.

Release date: 2005-01-26

• Technical products: 11-522-X20030017705
Description:

This paper develops an iterative weighted estimating equations (IWEE) method to estimate the fixed effects and the variance components in the random intercept model using sampling weights.

Release date: 2005-01-26

• Technical products: 11-522-X20020016719
Description:

This study takes a look at the modelling methods used for public health data. Public health has a renewed interest in the impact of the environment on health. Ecological or contextual studies ideally investigate these relationships using public health data augmented with environmental characteristics in multilevel or hierarchical models. In these models, individual respondents in health data are the first level and community data are the second level. Most public health data use complex sample survey designs, which require analyses accounting for the clustering, nonresponse, and poststratification to obtain representative estimates of prevalence of health risk behaviours.

This study uses the Behavioral Risk Factor Surveillance System (BRFSS), a state-specific US health risk factor surveillance system conducted by the Centers for Disease Control and Prevention, which assesses health risk factors in over 200,000 adults annually. BRFSS data are now available at the metropolitan statistical area (MSA) level and provide quality health information for studies of environmental effects. MSA-level analyses combining health and environmental data are further complicated by the joint requirements of the survey sample design and the multilevel analyses.

We compare three modelling methods in a study of physical activity and selected environmental factors using BRFSS 2000 data. Each of the methods described here is a valid way to analyse complex sample survey data augmented with environmental information, although each accounts for the survey design and multilevel data structure in a different manner and is thus appropriate for slightly different research questions.

Release date: 2004-09-13

• Technical products: 11-522-X20020016727
Description:

Census data are widely used in the distribution and targeting of resources at national, regional and local levels. In the United Kingdom (UK), a population census is conducted every 10 years. As time elapses, the census data become outdated and less relevant, making the distribution of resources less equitable. This paper examines alternative methods for rectifying this.

A number of small area methods have been developed for producing postcensal estimates, including the Structural Preserving Estimation technique of Purcell and Kish (1980). This paper develops an alternative approach, based on linear mixed models, to producing postcensal estimates. The validity of the methodology is tested on simulated data from the Finnish population register, and the technique is applied to produce updated estimates for a number of the 1991 UK census variables.
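In its basic form, structure preserving estimation keeps the association structure of a census cross-classification and adjusts it to more recent marginal totals by iterative proportional fitting (raking). A minimal two-way Python sketch is shown below, assuming consistent row and column totals (both summing to the same grand total) and a strictly positive seed table; the mixed model alternative developed in the paper is not implemented, and the function name is hypothetical.

```python
import numpy as np

def ipf(seed_table, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Rake a census cross-classification to updated margins by iterative
    proportional fitting (hypothetical helper)."""
    table = np.asarray(seed_table, dtype=float).copy()
    rows = np.asarray(row_totals, dtype=float)
    cols = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        # Scale each row to its target total, then each column
        table *= (rows / table.sum(axis=1))[:, None]
        table *= cols / table.sum(axis=0)
        # Columns now match exactly; stop once the rows also match
        if np.allclose(table.sum(axis=1), rows, rtol=0.0, atol=tol):
            break
    return table
```

Row and column scaling leave the cross-product (odds) ratios of the seed table unchanged, which is the sense in which the census "structure" is preserved.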

Release date: 2004-09-13

• Technical products: 11-522-X20020016717
Description:

In the United States, the National Health and Nutrition Examination Survey (NHANES) is linked to the National Health Interview Survey (NHIS) at the primary sampling unit level (the same counties, but not necessarily the same persons, are in both surveys). The NHANES examines about 5,000 persons per year, while the NHIS samples about 100,000 persons per year. In this paper, we present models that allow NHIS and administrative data to be used as auxiliary information for estimating quantities of interest in the NHANES, and we develop their properties. The methodology, related to Fay-Herriot (1979) small-area models and to the calibration estimators of Deville and Särndal (1992), accounts for the survey designs in the error structure.
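As background on the calibration side, the simplest calibration estimator in the Deville and Särndal family uses the chi-square distance, for which the calibrated weights have the closed form w_i = d_i (1 + x_i' lambda). The Python sketch below assumes known auxiliary totals and a nonsingular weighted cross-product matrix; it is not the model-based methodology of the paper, and the function name is hypothetical.

```python
import numpy as np

def linear_calibration(d, X, totals):
    """Chi-square distance (linear) calibration weights: adjust design
    weights d so that X.T @ w equals the known totals (hypothetical helper)."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Weighted cross-product matrix sum_i d_i x_i x_i'
    M = X.T @ (d[:, None] * X)
    # Lagrange multipliers that close the gap between estimated and known totals
    lam = np.linalg.solve(M, totals - X.T @ d)
    return d * (1.0 + X @ lam)
```

For example, design weights of 10 for four units with x = 1, 2, 3, 4, calibrated to a population size of 45 and an x total of 110, give weights 12, 11.5, 11 and 10.5.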

Release date: 2004-09-13

• Technical products: 11-522-X19990015640
Description:

This paper describes how SN is preparing for a new era in the production of statistics, triggered by technological and methodological developments. An essential feature of the turn to the new era is the farewell to the stovepipe approach to data processing. The paper discusses how new technological and methodological tools will affect processes and their organization. Special emphasis is placed on one of the major opportunities and challenges the new tools offer: establishing coherence in the content of statistics and in their presentation to users.

Release date: 2000-03-02

• Technical products: 11-522-X19980015016
Description:

Models for fitting longitudinal binary responses are explored using a panel study of voting intentions. A standard repeated measures multilevel logistic model is shown to be inadequate because a substantial proportion of respondents maintain a constant response over time. A multivariate binary response model is shown to fit the data better.

Release date: 1999-10-22

