Statistics by subject – Statistical methods

All (161): showing 25 of 161 results

  • Articles and reports: 12-001-X201700254888
    Description:

    We discuss developments in sample survey theory and methods covering the past 100 years. Neyman’s 1934 landmark paper laid the theoretical foundations for the probability sampling approach to inference from survey samples. Classical sampling books by Cochran, Deming, Hansen, Hurwitz and Madow, Sukhatme, and Yates, which appeared in the early 1950s, expanded and elaborated the theory of probability sampling, emphasizing unbiasedness, model-free features, and designs that minimize variance for a fixed cost. During the period 1960-1970, theoretical foundations of inference from survey data received attention, with the model-dependent approach generating considerable discussion. The introduction of general-purpose statistical software led to its use with survey data, which in turn prompted the design of methods specifically for complex survey data. At the same time, weighting methods such as regression estimation and calibration became practical, and design consistency replaced unbiasedness as the requirement for standard estimators. A bit later, computer-intensive resampling methods also became practical for large-scale survey samples. Improved computer power led to more sophisticated imputation for missing data, use of more auxiliary data, some treatment of measurement errors in estimation, and more complex estimation procedures. A notable use of models was in the expanded use of small area estimation. Future directions in research and methods will be influenced by budgets, response rates, timeliness, improved data collection devices, and the availability of auxiliary data, some of which will come from “Big Data”. Survey taking will be impacted by changing cultural behavior and by a changing physical-technical environment.

    Release date: 2017-12-21

  • Articles and reports: 82-003-X201601214687
    Description:

    This study describes record linkage of the Canadian Community Health Survey and the Canadian Mortality Database. The article explains the record linkage process and presents results about associations between health behaviours and mortality among a representative sample of Canadians.

    Release date: 2016-12-21

  • Articles and reports: 12-001-X201600214677
    Description:

    How do we tell whether weighting adjustments reduce nonresponse bias? If a variable is measured for everyone in the selected sample, then the design weights can be used to calculate an approximately unbiased estimate of the population mean or total for that variable. A second estimate of the population mean or total can be calculated using the survey respondents only, with weights that have been adjusted for nonresponse. If the two estimates disagree, then there is evidence that the weight adjustments may not have removed the nonresponse bias for that variable. In this paper we develop the theoretical properties of linearization and jackknife variance estimators for evaluating the bias of an estimated population mean or total by comparing estimates calculated from overlapping subsets of the same data with different sets of weights, when poststratification or inverse propensity weighting is used for the nonresponse adjustments to the weights. We provide sufficient conditions on the population, sample, and response mechanism for the variance estimators to be consistent, and demonstrate their small-sample properties through a simulation study.

    Release date: 2016-12-20
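
    A minimal sketch of the comparison described above, in generic notation (the symbols are illustrative, not the paper's):

        \hat{T}_d = \sum_{i \in s} d_i x_i        (full sample, design weights)
        \hat{T}_w = \sum_{i \in s_r} w_i x_i      (respondents only, nonresponse-adjusted weights)
        \hat{B} = \hat{T}_w - \hat{T}_d

    Here s is the selected sample and s_r the respondent subset; a value of \hat{B} that is large relative to its linearization or jackknife standard error suggests the weight adjustments have not removed the nonresponse bias for x.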

  • Articles and reports: 12-001-X201600114542
    Description:

    The restricted maximum likelihood (REML) method is generally used to estimate the variance of the random area effect under the Fay-Herriot model (Fay and Herriot 1979) in order to obtain the empirical best linear unbiased predictor (EBLUP) of a small area mean. When the REML estimate is zero, the weight of the direct sample estimator is zero and the EBLUP becomes a synthetic estimator, which is often undesirable. As a solution to this problem, Li and Lahiri (2011) and Yoshimori and Lahiri (2014) developed adjusted maximum likelihood (ADM) consistent variance estimators that always yield positive variance estimates. However, some ADM estimators have a large bias, and this affects the estimation of the mean squared error (MSE) of the EBLUP. We propose a MIX variance estimator, defined as a combination of the REML and ADM methods. We show that it is unbiased up to second order and always yields a positive variance estimate. Furthermore, we propose an MSE estimator under the MIX method and show via a model-based simulation that, in many situations, it performs better than other ‘Taylor linearization’ MSE estimators proposed recently.

    Release date: 2016-06-22
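
    For context, the standard Fay-Herriot setup assumed above, in generic notation:

        \hat{\theta}_i = \theta_i + e_i,   \theta_i = x_i^\top \beta + v_i,   v_i \sim N(0, \sigma_v^2),   e_i \sim N(0, \psi_i)
        \hat{\theta}_i^{\mathrm{EBLUP}} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i) x_i^\top \hat{\beta},   \hat{\gamma}_i = \hat{\sigma}_v^2 / (\hat{\sigma}_v^2 + \psi_i)

    When \hat{\sigma}_v^2 = 0 (as REML can produce), \hat{\gamma}_i = 0 and the EBLUP collapses to the synthetic estimator x_i^\top \hat{\beta}; the MIX estimator is designed to avoid this while controlling bias.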

  • Technical products: 11-522-X201700014722
    Description:

    The U.S. Census Bureau is researching ways to incorporate administrative data in decennial census and survey operations. Critical to this work is an understanding of the coverage of the population by administrative records. Using federal and third-party administrative data linked to the American Community Survey (ACS), we evaluate the extent to which administrative records provide data on foreign-born individuals in the ACS and employ multinomial logistic regression techniques to evaluate characteristics of those who are in administrative records relative to those who are not. We find that overall, administrative records provide high coverage of foreign-born individuals in our sample for whom a match can be determined. The odds of being in administrative records are found to be tied to the processes of immigrant assimilation: naturalization, higher English proficiency, educational attainment, and full-time employment are associated with greater odds of being in administrative records. These findings suggest that as immigrants adapt and integrate into U.S. society, they are more likely to be involved in the government and commercial processes and programs whose data we include. We further explore administrative records coverage for the two largest race/ethnic groups in our sample, Hispanic and non-Hispanic single-race Asian foreign born, finding again that characteristics related to assimilation are associated with administrative records coverage for both groups. However, we observe that neighborhood context affects Hispanics and Asians differently.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014714
    Description:

    The Labour Market Development Agreements (LMDAs) between Canada and the provinces and territories fund labour market training and support services to Employment Insurance claimants. The objective of this paper is to discuss the improvements made over the years in the impact assessment methodology. The paper describes the LMDAs and past evaluation work, and discusses the drivers for making better use of large administrative data holdings. It then explains how the new approach made the evaluation less resource-intensive while making the results more relevant to policy development. The paper outlines the lessons learned from a methodological perspective and provides insight into ways of making this type of use of administrative data effective, especially in the context of large programs.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014723
    Description:

    The U.S. Census Bureau is researching uses of administrative records in survey and decennial operations in order to reduce costs and respondent burden while preserving data quality. One potential use of administrative records is to fill in race and Hispanic origin responses when they are missing. When federal and third-party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across different administrative records sources. We explore different sets of business rules used to assign one race and one Hispanic origin response when these responses are discrepant across sources. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data across several demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records than in the 2010 Census. Hispanics are less likely to have missing Hispanic origin data but more likely to have missing race data in administrative records. Non-Hispanic Asians and non-Hispanic Pacific Islanders are more likely to have missing race and Hispanic origin data in administrative records. Younger individuals, renters, individuals living in households with two or more people, individuals who responded to the census in the nonresponse follow-up operation, and individuals residing in urban areas are more likely to have non-matching race and Hispanic origin responses.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014715
    Description:

    In preparation for the 2021 UK Census, the ONS has committed to an extensive research programme exploring how linked administrative data can be used to support conventional statistical processes. Item-level edit and imputation (E&I) will play an important role in adjusting the 2021 Census database. However, uncertainty associated with the accuracy and quality of available administrative data renders the efficacy of an integrated census-administrative data approach to E&I unclear. Current constraints, which dictate an anonymised ‘hash-key’ approach to record linkage to ensure confidentiality, add to that uncertainty. Here, we provide preliminary results from a simulation study comparing the predictive and distributional accuracy of the conventional E&I strategy implemented in CANCEIS for the 2011 UK Census with that of an integrated approach using synthetic administrative data with systematically increasing error as auxiliary information. In this initial phase of research we focus on imputing single year of age. The aim of the study is to gain insight into whether auxiliary information from administrative data can improve imputation estimates and where the different strategies fall on a continuum of accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment until the program starts. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim experienced the largest impacts on earnings and incidence of employment, while also making reduced use of EI starting in the second year post-program.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014757
    Description:

    The Unified Brazilian Health System (SUS) was created in 1988 and, with the aim of organizing the health information systems and databases already in use, a unified databank (DataSUS) was created in 1991. DataSUS files are freely available via the Internet. Access to and visualization of such data are provided through a limited number of customized tables and simple diagrams, which do not entirely meet the needs of health managers and other users for a flexible and easy-to-use tool that can tackle different aspects of health relevant to their purposes of knowledge-seeking and decision-making. We propose the interactive monthly generation of synthetic epidemiological reports, which are not only easily accessible but also easy to interpret and understand. Emphasis is put on data visualization through more informative diagrams and maps.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of the Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements. We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We apply propensity score matching, as in Blundell et al. (2002), Gerfin and Lechner (2002), and Sianesi (2004), and produce national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.

    Release date: 2016-03-24
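
    To illustrate the estimator family named in this abstract, here is a minimal kernel-matching difference-in-differences sketch in Python. The interface and data are hypothetical; this is not the paper's implementation, which runs on the LMDA administrative files.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def kernel_did(X, treated, y_pre, y_post, bandwidth=0.06):
            # Propensity scores from observed covariates (toy specification).
            ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
            dy = y_post - y_pre                     # within-unit change (difference-in-differences)
            ps_t, dy_t = ps[treated == 1], dy[treated == 1]
            ps_c, dy_c = ps[treated == 0], dy[treated == 0]
            effects = []
            for p, d in zip(ps_t, dy_t):
                u = (ps_c - p) / bandwidth
                k = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)  # Epanechnikov kernel
                if k.sum() > 0:                     # skip treated units off common support
                    effects.append(d - np.dot(k, dy_c) / k.sum())
            return float(np.mean(effects))          # average impact on the treated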

  • Articles and reports: 12-001-X201500214248
    Description:

    Unit level population models are often used in model-based small area estimation of totals and means, but the models may not hold for the sample if the sampling design is informative for the model. As a result, standard methods, assuming that the model holds for the sample, can lead to biased estimators. We study alternative methods that use a suitable function of the unit selection probability as an additional auxiliary variable in the sample model. We report the results of a simulation study on the bias and mean squared error (MSE) of the proposed estimators of small area means and on the relative bias of the associated MSE estimators, using informative sampling schemes to generate the samples. Alternative methods, based on modeling the conditional expectation of the design weight as a function of the model covariates and the response, are also included in the simulation study.

    Release date: 2015-12-17
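
    A sketch of the kind of augmented sample model the abstract refers to (the choice of g is illustrative):

        y_{ij} = x_{ij}^\top \beta + \lambda\, g(\pi_{ij}) + v_i + e_{ij}

    where \pi_{ij} is the selection probability of unit j in area i and g(\cdot) could be, for example, \log \pi or 1/\pi; adding g(\pi) as an auxiliary variable is intended to make the model hold for the sample under informative sampling.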

  • Articles and reports: 82-003-X201501014228
    Description:

    This study presents the results of a hierarchical exact matching approach to link the 2006 Census of Population with hospital data for all provinces and territories (excluding Quebec) to the 2006/2007-to-2008/2009 Discharge Abstract Database. The purpose is to determine if the Census–DAD linkage performed similarly in different jurisdictions, and if linkage and coverage rates declined as time passed since the census.

    Release date: 2015-10-21

  • Articles and reports: 12-001-X201500114161
    Description:

    A popular area level model used for the estimation of small area means is the Fay-Herriot model. This model involves unobservable random effects for the areas apart from the (fixed) linear regression based on area level covariates. Empirical best linear unbiased predictors of small area means are obtained by estimating the area random effects, and they can be expressed as a weighted average of area-specific direct estimators and regression-synthetic estimators. In some cases the observed data do not support the inclusion of the area random effects in the model. Excluding these area effects leads to the regression-synthetic estimator, that is, a zero weight is attached to the direct estimator. We study a preliminary test estimator of a small area mean obtained after testing for the presence of area random effects. We also study empirical best linear unbiased predictors of small area means that always give non-zero weight to the direct estimators in all areas, together with alternative estimators based on the preliminary test. The preliminary testing procedure is also used to define new mean squared error estimators of the point estimators of small area means. Results of a limited simulation study show that, for a small number of areas, the preliminary testing procedure leads to mean squared error estimators with considerably smaller average absolute relative bias than the usual mean squared error estimators, especially when the variance of the area effects is small relative to the sampling variances.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114200
    Description:

    We consider the observed best prediction (OBP; Jiang, Nguyen and Rao 2011) for small area estimation under the nested-error regression model, where both the mean and variance functions may be misspecified. We show via a simulation study that the OBP may significantly outperform the empirical best linear unbiased prediction (EBLUP) method not just in the overall mean squared prediction error (MSPE) but also in the area-specific MSPE for every one of the small areas. A bootstrap method is proposed for estimating the design-based area-specific MSPE, which is simple and always produces positive MSPE estimates. The performance of the proposed MSPE estimator is evaluated through a simulation study. An application to the Television School and Family Smoking Prevention and Cessation study is considered.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114173
    Description:

    Nonresponse is present in almost all surveys and can severely bias estimates. A distinction is usually made between unit and item nonresponse. By noting that for a particular survey variable we simply have observed and unobserved values, in this work we exploit the connection between unit and item nonresponse. In particular, we assume that the factors that drive unit response are the same as those that drive item response on selected variables of interest. Response probabilities are then estimated using a latent covariate that measures the willingness to respond to the survey and that can explain part of the unknown behavior of a unit in participating in the survey. This latent covariate is estimated using latent trait models. This approach is particularly relevant for sensitive items and, therefore, can handle non-ignorable nonresponse. Auxiliary information known for both respondents and nonrespondents can be included either in the latent variable model or in the response probability estimation process. The approach can also be used when auxiliary information is not available, and we focus here on this case. We propose an estimator using a reweighting system based on the latent covariate when no other observed auxiliary information is available. Results on its performance from simulation studies on both real and simulated data are encouraging.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114162
    Description:

    The operationalization of the Population and Housing Census in Portugal is managed by a hierarchical structure in which Statistics Portugal is at the top and local government institutions at the bottom. When the Census takes place every ten years, local governments are asked to collaborate with Statistics Portugal in the execution and monitoring of the fieldwork operations at the local level. During the Pilot Test stage of the 2011 Census, local governments were asked for additional collaboration: to answer the Perception of Risk survey, whose aim was to gather information to design a quality assurance instrument that could be used to monitor the Census operations. The desired response rate for the survey was 100%; however, by the data collection deadline nearly a quarter of local governments had not responded, and a decision was made to send a follow-up mailing. In this paper, we examine whether the same conclusions could have been reached from the survey without the follow-ups as with them, and we evaluate the influence of the follow-ups on the design of the quality assurance instrument. Comparison of responses on a set of perception variables revealed that local governments answering before and after the follow-up did not differ. However, the configuration of the quality assurance instrument changed when follow-up responses were included.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201500114149
    Description:

    This paper introduces a general framework for deriving the optimal inclusion probabilities in a variety of survey contexts in which disseminating survey estimates of pre-established accuracy for a multiplicity of both variables and domains of interest is required. The framework can define either standard stratified or incomplete stratified sampling designs. The optimal inclusion probabilities are obtained by minimizing costs through an algorithm that guarantees the bounding of sampling errors at the domain level, assuming that the domain membership variables are available in the sampling frame. The target variables are unknown, but can be predicted with suitable super-population models. The algorithm properly takes this model uncertainty into account. Some experiments based on real data show the empirical properties of the algorithm.

    Release date: 2015-06-29
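
    The underlying optimization can be sketched as follows (generic notation, not the paper's exact formulation):

        minimize    \sum_{i \in U} c_i \pi_i
        subject to  V(\hat{t}_{v,d}) \le V^*_{v,d}  for every variable v and domain d,
                    0 < \pi_i \le 1,

    where the unknown target variables entering the variance bounds are replaced by predictions from the assumed super-population models.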

  • Technical products: 12-002-X201500114147
    Description:

    Influential observations in logistic regression are those that have a notable effect on certain aspects of the model fit. Large sample size alone does not eliminate this concern; it is still important to examine potentially influential observations, especially in complex survey data. This paper describes a straightforward algorithm for examining potentially influential observations in complex survey data using SAS software. This algorithm was applied in a study using the 2005 Canadian Community Health Survey that examined factors associated with family physician utilization for adolescents.

    Release date: 2015-03-25
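
    The paper's algorithm is implemented in SAS. As a rough Python analogue (a sketch under the simplifying assumption that survey weights can be treated as frequency weights), influence can be brute-forced by leave-one-out refits:

        import numpy as np
        import statsmodels.api as sm

        def loo_dfbetas(y, X, w):
            # Full-sample weighted logistic fit.
            full = sm.GLM(y, X, family=sm.families.Binomial(), freq_weights=w).fit()
            out = np.empty((len(y), X.shape[1]))
            for i in range(len(y)):
                keep = np.arange(len(y)) != i       # drop observation i and refit
                fit_i = sm.GLM(y[keep], X[keep], family=sm.families.Binomial(),
                               freq_weights=w[keep]).fit()
                out[i] = (full.params - fit_i.params) / full.bse
            return out                              # large |entries| flag influential records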

  • Articles and reports: 12-001-X201400214089
    Description:

    This manuscript describes the use of multiple imputation to combine information from multiple surveys of the same underlying population. We use a newly developed method to generate synthetic populations nonparametrically using a finite population Bayesian bootstrap that automatically accounts for complex sample designs. We then analyze each synthetic population with standard complete-data software for simple random samples and obtain valid inference by combining the point and variance estimates using extensions of existing combining rules for synthetic data. We illustrate the approach by combining data from the 2006 National Health Interview Survey (NHIS) and the 2006 Medical Expenditure Panel Survey (MEPS).

    Release date: 2014-12-19
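
    For reference, the classical multiple-imputation combining rule that such approaches start from (the paper uses extensions adapted to synthetic data):

        \bar{q}_M = \frac{1}{M} \sum_{m=1}^{M} q_m,
        b_M = \frac{1}{M-1} \sum_{m=1}^{M} (q_m - \bar{q}_M)^2,
        T_M = \bar{u}_M + (1 + 1/M)\, b_M,

    where q_m and u_m are the point and variance estimates from the m-th synthetic population and \bar{u}_M is the average of the u_m.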

  • Technical products: 11-522-X201300014268
    Description:

    Information collection is critical for chronic-disease surveillance to measure the scope of diseases, assess the use of services, identify at-risk groups and track the course of diseases and risk factors over time with the goal of planning and implementing public-health programs for disease prevention. It is in this context that the Quebec Integrated Chronic Disease Surveillance System (QICDSS) was established. The QICDSS is a database created by linking administrative files covering the period from 1996 to 2013. It is an attractive alternative to survey data, since it covers the entire population, is not affected by recall bias and can track the population over time and space. In this presentation, we describe the relevance of using administrative data as an alternative to survey data, the methods selected to build the population cohort by linking various sources of raw data, and the processing applied to minimize bias. We will also discuss the advantages and limitations associated with the analysis of administrative files.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014286
    Description:

    The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate when they were in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspective of social science, health and environmental health. In randomly selected maternity hospitals, all infants in the target population, who were born on one of 25 days distributed across the four seasons, were chosen. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of the cross-tabulation between two independent samples: a sampling of maternity hospitals and a sampling of days. While it is easy to imagine a cluster effect due to the sampling of maternity hospitals, one can also imagine a cluster effect due to the sampling of days. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014261
    Description:

    National statistical offices are subject to two requirements that are difficult to reconcile. On the one hand, they must provide increasingly precise information on specific subjects and hard-to-reach or minority populations, using innovative methods that make the measurement more objective or ensure its confidentiality, and so on. On the other hand, they must deal with budget restrictions in a context where households are increasingly difficult to contact. This twofold demand has an impact on survey quality in the broad sense, that is, not only in terms of precision, but also in terms of relevance, comparability, coherence, clarity and timeliness. Because the cost of Internet collection is low and a large proportion of the population has an Internet connection, statistical offices see this modern collection mode as a solution to their problems. Consequently, the development of Internet collection and, more generally, of multimode collection is supposedly the solution for maximizing survey quality, particularly in terms of total survey error, because it addresses the problems of coverage, sampling, non-response or measurement while respecting budget constraints. However, while Internet collection is an inexpensive mode, it presents serious methodological problems: coverage, self-selection or selection bias, non-response and non-response adjustment difficulties, ‘satisficing,’ and so on. As a result, before developing or generalizing the use of multimode collection, the National Institute of Statistics and Economic Studies (INSEE) launched a wide-ranging set of experiments to study the various methodological issues, and the initial results show that multimode collection is a source of both solutions and new methodological problems.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014275
    Description:

    Since July 2014, the Office for National Statistics has been committed to a predominantly online 2021 UK Census. Item-level imputation will play an important role in adjusting the 2021 Census database. Research indicates that the internet may yield cleaner data than paper-based capture and may attract people with particular characteristics. Here, we provide preliminary results from research directed at understanding how we might manage these features in a 2021 UK Census imputation strategy. Our findings suggest that a donor-based imputation method may need to include response mode as a matching variable in the underlying imputation model.

    Release date: 2014-10-31

  • Articles and reports: 82-003-X201401014098
    Description:

    This study compares registry and non-registry approaches to linking 2006 Census of Population data for Manitoba and Ontario to Hospital data from the Discharge Abstract Database.

    Release date: 2014-10-15

Data (0): 0 results in this section

Analysis (88): showing 25 of 88 results; the first 13 results duplicate articles already listed under All above

  • Articles and reports: 12-001-X201400114003
    Description:

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Applying these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered, unequal-probability-of-selection sample designs.

    Release date: 2014-06-27
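
    A minimal sketch of the unweighted Polya-urn version of the finite population Bayesian bootstrap; the paper's contribution is a design-adjusted, weighted variant that also undoes unequal selection probabilities, which this toy omits:

        import numpy as np

        def fpbb_population(sample_values, N, rng=None):
            # Grow an observed sample of size n into a synthetic population of
            # size N; each draw is proportional to current counts in the urn.
            rng = rng or np.random.default_rng()
            urn = list(sample_values)
            while len(urn) < N:
                urn.append(urn[rng.integers(len(urn))])
            return np.array(urn)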

  • Articles and reports: 12-001-X201400114002
    Description:

    We propose an approach for multiple imputation of items missing at random in large-scale surveys with exclusively categorical variables that have structural zeros. Our approach is to use mixtures of multinomial distributions as imputation engines, accounting for structural zeros by conceiving of the observed data as a truncated sample from a hypothetical population without structural zeros. This approach has several appealing features: imputations are generated from coherent, Bayesian joint models that automatically capture complex dependencies and readily scale to large numbers of variables. We outline a Gibbs sampling algorithm for implementing the approach, and we illustrate its potential with a repeated sampling study using public use census microdata from the state of New York, U.S.A.

    Release date: 2014-06-27
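
    The imputation engine described is a latent-class model; in generic notation,

        p(x_{i1}, \ldots, x_{ip}) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{p} \psi^{(j)}_{k, x_{ij}},

    truncated so that combinations corresponding to structural zeros receive probability zero; the Gibbs sampler alternates between the latent class indicators, the mixture parameters, and draws of the missing items from this model.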

  • Articles and reports: 12-001-X201300211887
    Description:

    Multi-level models are extensively used for analyzing survey data with the design hierarchy matching the model hierarchy. We propose a unified approach, based on a design-weighted log composite likelihood, for two-level models that leads to design-model consistent estimators of the model parameters even when the within cluster sample sizes are small provided the number of sample clusters is large. This method can handle both linear and generalized linear two-level models and it requires level 2 and level 1 inclusion probabilities and level 1 joint inclusion probabilities, where level 2 represents a cluster and level 1 an element within a cluster. Results of a simulation study demonstrating superior performance of the proposed method relative to existing methods under informative sampling are also reported.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300111830
    Description:

    We consider two different self-benchmarking methods for the estimation of small area means based on the Fay-Herriot (FH) area level model: the method of You and Rao (2002) applied to the FH model and the method of Wang, Fuller and Qu (2008) based on augmented models. We derive an estimator of the mean squared prediction error (MSPE) of the You-Rao (YR) estimator of a small area mean that, under the true model, is correct to second-order terms. We report the results of a simulation study on the relative bias of the MSPE estimator of the YR estimator and the MSPE estimator of the Wang, Fuller and Qu (WFQ) estimator obtained under an augmented model. We also study the MSPE and the estimators of MSPE for the YR and WFQ estimators obtained under a misspecified model.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201200211756
    Description:

    We propose a new approach to small area estimation based on joint modelling of means and variances. The proposed model and methodology not only improve small area estimators but also yield "smoothed" estimators of the true sampling variances. Maximum likelihood estimation of the model parameters is carried out using the EM algorithm, owing to the non-standard form of the likelihood function. Confidence intervals for small area parameters are derived using a more general decision-theory approach, unlike the traditional approach based on minimizing squared error loss. Numerical properties of the proposed method are investigated via simulation studies and compared with other competitive methods in the literature. Theoretical justification for the effective performance of the resulting estimators and confidence intervals is also provided.

    Release date: 2012-12-19

  • Articles and reports: 12-001-X201200211754
    Description:

    The propensity-score-adjustment approach is commonly used to handle selection bias in survey sampling applications, including unit nonresponse and undercoverage. The propensity score is computed using auxiliary variables observed throughout the sample. We discuss some asymptotic properties of propensity-score-adjusted estimators and derive optimal estimators based on a regression model for the finite population. An optimal propensity-score-adjusted estimator can be implemented using an augmented propensity model. Variance estimation is discussed and the results from two simulation studies are presented.

    Release date: 2012-12-19
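
    In generic notation, the estimator class under discussion is

        \hat{Y}_{\mathrm{PSA}} = \sum_{i \in R} \frac{d_i}{\hat{p}_i}\, y_i,

    where R is the respondent set, d_i the design weight, and \hat{p}_i the response probability estimated from auxiliary variables observed throughout the sample (for example, via a logistic model).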

  • Articles and reports: 12-001-X201200111682
    Description:

    Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004) based on compromise allocation and Longford (2006) based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains, cutting across strata, are also specified. Performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments.

    Release date: 2012-06-27
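
    The allocation problem can be sketched as follows (standard stratified simple random sampling notation, not the paper's exact formulation):

        minimize    n = \sum_h n_h
        subject to  cv(\hat{\bar{Y}}_h) \le c_h  for each stratum h,  and  cv(\hat{\bar{Y}}) \le c_0,

    with cv^2(\hat{\bar{Y}}_h) = (1/n_h - 1/N_h) S_h^2 / \bar{Y}_h^2, so the tolerances on the coefficients of variation translate into non-linear constraints on the n_h.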

  • Articles and reports: 12-001-X201200111687
    Description:

    To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants' confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemination. We propose to create disclosure-protected subsamples from large scale surveys based on multiple imputation. The idea is to replace identifying or sensitive values in the original sample with draws from statistical models, and release subsamples of the disclosure-protected data. We present methods for making inferences with the multiple synthetic subsamples.

    Release date: 2012-06-27

  • Articles and reports: 82-003-X201100411589
    Description:

    The objective of this article is to illustrate how combining data from several cycles of the Canadian Community Health Survey increases analytical power and yields a clearer picture of immigrant health by identifying more precise subgroups. Examples are presented to demonstrate how indicators of health status vary by birthplace and period of immigration.

    Release date: 2011-11-16

  • Articles and reports: 12-001-X201100111447
    Description:

    This paper introduces an R package for the stratification of a survey population using a univariate stratification variable X and for the calculation of stratum sample sizes. Non-iterative methods, such as the cumulative root frequency method and geometric stratum boundaries, are implemented. Optimal designs, with stratum boundaries that minimize either the CV of the simple expansion estimator for a fixed sample size n or the n value for a fixed CV, can be constructed. Two iterative algorithms are available to find the optimal stratum boundaries. The design can feature a user-defined certainty stratum where all the units are sampled. Take-all and take-none strata can be included in the stratified design, as they might lead to smaller sample sizes. The sample size calculations are based on the anticipated moments of the survey variable Y, given the stratification variable X. The package handles conditional distributions of Y given X that follow either a heteroscedastic linear model or a log-linear model. Stratum-specific non-response can be accounted for in the design construction and in the sample size calculations.

    Release date: 2011-06-29
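
    The package itself is written in R; as a quick illustration of one non-iterative method it implements, here is the cumulative root frequency (Dalenius-Hodges) rule in Python (a sketch with a hypothetical interface):

        import numpy as np

        def cum_root_freq_boundaries(x, n_strata, bins=100):
            # Cut the cumulated sqrt(frequency) of X into n_strata equal
            # intervals and read off the corresponding boundaries of X.
            freq, edges = np.histogram(x, bins=bins)
            csf = np.cumsum(np.sqrt(freq))
            targets = csf[-1] * np.arange(1, n_strata) / n_strata
            idx = np.searchsorted(csf, targets)
            return edges[idx + 1]                   # n_strata - 1 boundaries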

  • Articles and reports: 82-003-X201100211437
    Description:

    This article examines the internal consistency of the English and French versions of the Medical Outcomes Study social support scale for a sample of older adults. The second objective is to conduct a confirmatory factor analysis to assess the factor structure of the English and French versions of the scale. A third purpose is to determine if the items comprising the scale operate in the same way for English- and French-speaking respondents.

    Release date: 2011-05-18
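
    Internal consistency in such studies is conventionally summarized by Cronbach's alpha (a standard formula, not specific to this article):

        \alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{j=1}^{k} \sigma_j^2}{\sigma_T^2} \right),

    where k is the number of items, \sigma_j^2 the variance of item j, and \sigma_T^2 the variance of the total score.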

  • Articles and reports: 12-001-X201000211381
    Description:

    Taylor linearization methods are often used to obtain variance estimators for calibration estimators of totals and nonlinear finite population (or census) parameters, such as ratios, regression and correlation coefficients, which can be expressed as smooth functions of totals. Taylor linearization is generally applicable to any sampling design, but it can lead to multiple variance estimators that are asymptotically design unbiased under repeated sampling. The choice among the variance estimators requires other considerations such as (i) approximate unbiasedness for the model variance of the estimator under an assumed model, and (ii) validity under a conditional repeated sampling framework. Demnati and Rao (2004) proposed a unified approach to deriving Taylor linearization variance estimators that leads directly to a unique variance estimator that satisfies the above considerations for general designs. When analyzing survey data, finite populations are often assumed to be generated from super-population models, and analytical inferences on model parameters are of interest. If the sampling fractions are small, then the sampling variance captures almost the entire variation generated by the design and model random processes. However, when the sampling fractions are not negligible, the model variance should be taken into account in order to construct valid inferences on model parameters under the combined process of generating the finite population from the assumed super-population model and the selection of the sample according to the specified sampling design. In this paper, we obtain an estimator of the total variance, using the Demnati-Rao approach, when the characteristics of interest are assumed to be random variables generated from a super-population model. We illustrate the method using ratio estimators and estimators defined as solutions to calibration weighted estimating equations. Simulation results on the performance of the proposed variance estimator for model parameters are also presented.

    Release date: 2010-12-21
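
    A textbook instance of the Taylor linearization discussed, for the ratio case (generic notation):

        \hat{R} = \hat{Y} / \hat{X},   z_i = (y_i - \hat{R} x_i) / \hat{X},
        v(\hat{R}) \approx v\left( \sum_{i \in s} d_i z_i \right),

    so the variance of the nonlinear statistic is approximated by the design-based variance of a weighted total of the linearized variable z_i.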

Reference (73): showing 25 of 73 results; all entries shown duplicate technical products already listed under All above

  • Technical products: 11-522-X201700014722
    Description:

    The U.S. Census Bureau is researching ways to incorporate administrative data in decennial census and survey operations. Critical to this work is an understanding of the coverage of the population by administrative records. Using federal and third party administrative data linked to the American Community Survey (ACS), we evaluate the extent to which administrative records provide data on foreign-born individuals in the ACS and employ multinomial logistic regression techniques to evaluate characteristics of those who are in administrative records relative to those who are not. We find that overall, administrative records provide high coverage of foreign-born individuals in our sample for whom a match can be determined. The odds of being in administrative records are found to be tied to the processes of immigrant assimilation – naturalization, higher English proficiency, educational attainment, and full-time employment are associated with greater odds of being in administrative records. These findings suggest that as immigrants adapt and integrate into U.S. society, they are more likely to be involved in government and commercial processes and programs for which we are including data. We further explore administrative records coverage for the two largest race/ethnic groups in our sample – Hispanic and non-Hispanic single-race Asian foreign born, finding again that characteristics related to assimilation are associated with administrative records coverage for both groups. However, we observe that neighborhood context impacts Hispanics and Asians differently.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014714
    Description:

    The Labour Market Development Agreements (LMDAs) between Canada and the provinces and territories fund labour market training and support services to Employment Insurance claimants. The objective of this paper is to discuss the improvements made over the years to the impact assessment methodology. The paper describes the LMDAs and past evaluation work and discusses the drivers for making better use of large administrative data holdings. It then explains how the new approach made the evaluation less resource-intensive while making the results more relevant to policy development. The paper outlines the lessons learned from a methodological perspective and provides insight into ways of making this type of use of administrative data effective, especially in the context of large programs.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014723
    Description:

    The U.S. Census Bureau is researching uses of administrative records in survey and decennial operations in order to reduce costs and respondent burden while preserving data quality. One potential use of administrative records is to fill in race and Hispanic origin responses when they are missing. When federal and third party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across different administrative records sources. We explore different sets of business rules used to assign one race and one Hispanic origin response when these responses are discrepant across sources. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data across several demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records than in the 2010 Census. Hispanics are less likely to have missing Hispanic origin data but more likely to have missing race data in administrative records. Non-Hispanic Asians and non-Hispanic Pacific Islanders are more likely to have missing race and Hispanic origin data in administrative records. Younger individuals, renters, individuals living in households with two or more people, individuals who responded to the census in the nonresponse follow-up operation, and individuals residing in urban areas are more likely to have non-matching race and Hispanic origin responses.
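
    The business rules themselves are not spelled out in the abstract; the toy resolver below (rule set, source names and priority ordering are all hypothetical) only illustrates the general shape of such rules:

        import pandas as pd

        SOURCE_PRIORITY = ["tax", "medicare", "third_party"]   # hypothetical ordering

        def resolve(responses):
            """Pick one response from possibly discrepant per-source values:
            modal non-missing response, ties broken by source priority."""
            non_missing = {s: v for s, v in responses.items() if pd.notna(v)}
            if not non_missing:
                return None
            counts = pd.Series(list(non_missing.values())).value_counts()
            top = set(counts[counts == counts.max()].index)
            for s in SOURCE_PRIORITY:
                if non_missing.get(s) in top:
                    return non_missing[s]

        print(resolve({"tax": "Hispanic", "medicare": "Not Hispanic",
                       "third_party": "Hispanic"}))   # -> Hispanic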

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014715
    Description:

    In preparation for the 2021 UK Census, the ONS has committed to an extensive research programme exploring how linked administrative data can be used to support conventional statistical processes. Item-level edit and imputation (E&I) will play an important role in adjusting the 2021 Census database. However, uncertainty associated with the accuracy and quality of available administrative data renders the efficacy of an integrated census-administrative data approach to E&I unclear. Current constraints, which dictate an anonymised ‘hash-key’ approach to record linkage to ensure confidentiality, add to that uncertainty. Here, we provide preliminary results from a simulation study comparing the predictive and distributional accuracy of the conventional E&I strategy implemented in CANCEIS for the 2011 UK Census with that of an integrated approach using synthetic administrative data with systematically increasing error as auxiliary information. In this initial phase of research we focus on imputing single year of age. The aim of the study is to gain insight into whether auxiliary information from administrative data can improve imputation estimates and where the different strategies fall on a continuum of accuracy.
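
    A toy version of the accuracy-vs-error continuum examined in the study (this is neither CANCEIS nor the ONS simulation; the error model and missingness rate are assumptions):

        import numpy as np

        rng = np.random.default_rng(42)
        n = 5000
        true_age = rng.integers(0, 100, n)
        missing = rng.random(n) < 0.10          # 10% of ages need imputation

        for err_sd in [0, 2, 5, 10]:            # increasing administrative-data error
            admin_age = true_age + rng.normal(0, err_sd, n).round().astype(int)
            # naive auxiliary-based imputation: copy the (noisy) admin value
            imputed = np.where(missing, admin_age, true_age)
            acc = np.mean(imputed[missing] == true_age[missing])
            print(f"admin error sd={err_sd:2d}: exact-age accuracy {acc:.2f}")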

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment before the program starts. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim had the best impacts on earnings and incidence of employment, while also experiencing reduced use of EI starting in the second year post-program.
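
    The following sketch shows the general mechanics of propensity score matching stratified by discretized duration (data, covariates and strata cut-offs are invented; the actual study design is considerably richer):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(7)
        n = 4000
        weeks = rng.integers(1, 53, n)              # weeks until program start
        stratum = np.digitize(weeks, [5, 13, 26])   # discretized duration strata
        x = rng.normal(size=(n, 3))                 # hypothetical covariates
        treated = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
        earnings = 1000 + 500 * treated + 300 * x[:, 0] + rng.normal(0, 200, n)

        for s in np.unique(stratum):
            m = stratum == s
            ps = LogisticRegression().fit(x[m], treated[m]).predict_proba(x[m])[:, 1]
            t_idx = np.where(treated[m] == 1)[0]
            c_idx = np.where(treated[m] == 0)[0]
            # nearest-neighbour match on the propensity score, with replacement
            nn = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]
            att = earnings[m][t_idx].mean() - earnings[m][nn].mean()
            print(f"stratum {s}: ATT estimate {att:.1f}")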

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014757
    Description:

    The Unified Brazilian Health System (SUS) was created in 1988 and, with the aim of organizing the health information systems and databases already in use, a unified databank (DataSUS) was created in 1991. DataSUS files are freely available via the Internet. However, access and visualization of the data are possible only through a limited number of customized tables and simple diagrams, which do not fully meet the needs of health managers and other users for a flexible, easy-to-use tool that can address the aspects of health relevant to their knowledge-seeking and decision-making. We propose the interactive monthly generation of synthetic epidemiological reports that are not only easily accessible but also easy to interpret and understand. Emphasis is put on data visualization through more informative diagrams and maps.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of the Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements. We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We apply propensity score matching, as in Blundell et al. (2002), Gerfin and Lechner (2002), and Sianesi (2004), and produce national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.
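
    A compact sketch of the difference-in-differences with kernel matching combination (everything here, including bandwidth, kernel choice and data, is assumed for illustration):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(3)
        n = 3000
        x = rng.normal(size=(n, 2))
        treated = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
        pre = 900 + 250 * x[:, 0] + rng.normal(0, 150, n)         # pre-program earnings
        post = pre + 200 + 400 * treated + rng.normal(0, 150, n)  # post-program earnings

        ps = LogisticRegression().fit(x, treated).predict_proba(x)[:, 1]
        t, c = treated == 1, treated == 0
        h = 0.06                                                  # kernel bandwidth
        u = (ps[t][:, None] - ps[c][None, :]) / h
        k = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)    # Epanechnikov kernel
        w = k / np.maximum(k.sum(axis=1, keepdims=True), 1e-12)   # per-treated weights
        att = ((post - pre)[t] - w @ (post - pre)[c]).mean()      # DiD on matched units
        print(f"kernel-matching DiD ATT: {att:.1f}")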

    Release date: 2016-03-24

  • Technical products: 12-002-X201500114147
    Description:

    Influential observations in logistic regression are those that have a notable effect on certain aspects of the model fit. Large sample size alone does not eliminate this concern; it is still important to examine potentially influential observations, especially in complex survey data. This paper describes a straightforward algorithm for examining potentially influential observations in complex survey data using SAS software. This algorithm was applied in a study using the 2005 Canadian Community Health Survey that examined factors associated with family physician utilization for adolescents.
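
    The paper's algorithm is implemented in SAS; as a loose Python analogue only (the weights are toy values, and survey design features beyond weighting are ignored here), influence statistics for a weighted logistic fit can be inspected like this:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(5)
        n = 500
        x = rng.normal(size=(n, 2))
        X = sm.add_constant(x)
        y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + x[:, 0] - 0.8 * x[:, 1]))))
        w = rng.uniform(0.5, 3.0, n)          # toy survey weights

        res = sm.GLM(y, X, family=sm.families.Binomial(), freq_weights=w).fit()
        infl = res.get_influence()            # influence diagnostics for the GLM fit
        # flag observations whose dfbetas exceed the usual 2/sqrt(n) cut-off
        flag = np.abs(infl.dfbetas).max(axis=1) > 2 / np.sqrt(n)
        print(f"{flag.sum()} potentially influential observations")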

    Release date: 2015-03-25

  • Technical products: 11-522-X201300014268
    Description:

    Information collection is critical for chronic-disease surveillance to measure the scope of diseases, assess the use of services, identify at-risk groups and track the course of diseases and risk factors over time with the goal of planning and implementing public-health programs for disease prevention. It is in this context that the Quebec Integrated Chronic Disease Surveillance System (QICDSS) was established. The QICDSS is a database created by linking administrative files covering the period from 1996 to 2013. It is an attractive alternative to survey data, since it covers the entire population, is not affected by recall bias and can track the population over time and space. In this presentation, we describe the relevance of using administrative data as an alternative to survey data, the methods selected to build the population cohort by linking various sources of raw data, and the processing applied to minimize bias. We will also discuss the advantages and limitations associated with the analysis of administrative files.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014286
    Description:

    The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate when they were in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspectives of social science, health and environmental health. In randomly selected maternity hospitals, all infants in the target population who were born on one of 25 days distributed across the four seasons were chosen. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of the cross-tabulation of two independent samples: a sampling of maternity hospitals and a sampling of days. While it is easy to imagine a cluster effect due to the sampling of maternity hospitals, one can also imagine a cluster effect due to the sampling of days. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.
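
    To make the "product" structure concrete, the simulation below (population model and sample sizes are invented) crosses an independent hospital sample with an independent day sample and checks the expansion estimator by Monte Carlo:

        import numpy as np

        rng = np.random.default_rng(11)
        H, D = 200, 365                       # population: hospitals x days
        hosp_eff = rng.gamma(4, 2, H)
        day_eff = 1 + 0.3 * np.sin(2 * np.pi * np.arange(D) / 365)  # seasonality
        pop = rng.poisson(hosp_eff[:, None] * day_eff[None, :])     # daily birth counts
        total = pop.sum()

        def product_sample_estimate(nh=30, nd=25):
            hs = rng.choice(H, nh, replace=False)   # independent hospital sample
            ds = rng.choice(D, nd, replace=False)   # independent day sample
            return pop[np.ix_(hs, ds)].sum() * (H / nh) * (D / nd)

        est = [product_sample_estimate() for _ in range(2000)]
        print(f"true total {total}, mean estimate {np.mean(est):.0f}, "
              f"relative SE {np.std(est) / total:.3f}")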

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014261
    Description:

    National statistical offices are subject to two requirements that are difficult to reconcile. On the one hand, they must provide increasingly precise information on specific subjects and hard-to-reach or minority populations, using innovative methods that make the measurement more objective or ensure its confidentiality, and so on. On the other hand, they must deal with budget restrictions in a context where households are increasingly difficult to contact. This twofold demand has an impact on survey quality in the broad sense, that is, not only in terms of precision, but also in terms of relevance, comparability, coherence, clarity and timeliness. Because the cost of Internet collection is low and a large proportion of the population has an Internet connection, statistical offices see this modern collection mode as a solution to their problems. Consequently, the development of Internet collection and, more generally, of multimode collection is supposedly the solution for maximizing survey quality, particularly in terms of total survey error, because it addresses the problems of coverage, sampling, non-response or measurement while respecting budget constraints. However, while Internet collection is an inexpensive mode, it presents serious methodological problems: coverage, self-selection or selection bias, non-response and non-response adjustment difficulties, ‘satisficing,’ and so on. As a result, before developing or generalizing the use of multimode collection, the National Institute of Statistics and Economic Studies (INSEE) launched a wide-ranging set of experiments to study the various methodological issues, and the initial results show that multimode collection is a source of both solutions and new methodological problems.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014275
    Description:

    In July 2014, the Office for National Statistics committed to a predominantly online 2021 UK Census. Item-level imputation will play an important role in adjusting the 2021 Census database. Research indicates that the internet may yield cleaner data than paper-based capture and may attract people with particular characteristics. Here, we provide preliminary results from research directed at understanding how we might manage these features in a 2021 UK Census imputation strategy. Our findings suggest that a donor-based imputation method may need to include response mode as a matching variable in the underlying imputation model.
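
    A bare-bones sketch of donor imputation with response mode in the match key (variables, missingness rate and fallback behaviour are all assumptions; production systems such as CANCEIS are far more sophisticated):

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(2)
        n = 2000
        df = pd.DataFrame({
            "mode": rng.choice(["online", "paper"], n, p=[0.7, 0.3]),
            "region": rng.choice(["A", "B", "C"], n),
            "age": rng.integers(16, 90, n).astype(float),
        })
        df.loc[rng.random(n) < 0.05, "age"] = np.nan   # item non-response

        def donor_impute(df, match_vars):
            out = df.copy()
            for _, g in df.groupby(match_vars):
                donors = g["age"].dropna().to_numpy()
                need = g.index[g["age"].isna()]
                if len(donors) and len(need):
                    out.loc[need, "age"] = rng.choice(donors, len(need))
                # a real strategy would collapse sparse cells or drop match vars
            return out

        imputed = donor_impute(df, ["mode", "region"])  # mode in the match key
        print(imputed["age"].isna().sum(), "values left unimputed")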

    Release date: 2014-10-31

  • Technical products: 12-002-X201400111901
    Description:

    This document is for analysts and researchers who are considering doing research with data from a survey where both survey weights and bootstrap weights are provided in the data files. For selected software packages, it gives directions on how to get started using survey weights and bootstrap weights in an analysis of survey data. We give brief directions for obtaining survey-weighted estimates, bootstrap variance estimates (and other desired error quantities) and some typical test statistics in each software package in turn. While these directions cover only the chosen examples, the document also describes the range of weighted and bootstrap analyses that each software package can carry out.
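
    In outline, the computation those packages perform is simple; a minimal sketch (the bootstrap weights below are fabricated stand-ins, whereas in practice they are read from the supplied files):

        import numpy as np

        rng = np.random.default_rng(9)
        n, B = 1000, 500
        y = rng.lognormal(3, 1, n)
        w = rng.uniform(20, 80, n)                       # full-sample survey weights
        bw = w[:, None] * rng.exponential(1.0, (n, B))   # stand-in bootstrap weights

        theta_hat = np.average(y, weights=w)             # survey-weighted mean
        theta_b = (bw * y[:, None]).sum(axis=0) / bw.sum(axis=0)
        v_boot = np.mean((theta_b - theta_hat) ** 2)     # bootstrap variance estimate
        print(f"estimate {theta_hat:.2f}, bootstrap SE {np.sqrt(v_boot):.2f}")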

    Release date: 2014-08-07

  • Technical products: 12-002-X201200111642
    Description:

    It is generally recommended that weighted estimation approaches be used when analyzing data from a long-form census microdata file. Since such data files are now available in the Research Data Centres (RDCs), researchers there need more information about doing weighted estimation with these files. The purpose of this paper is to provide some of this information, in particular, how the weight variables were derived for the census microdata files and which weight should be used for different units of analysis. For the 1996, 2001 and 2006 censuses, the same weight variable is appropriate regardless of whether people, families or households are being studied. For the 1991 census, the recommendations are more complex: a different weight variable is required for households than for people and families, and additional restrictions apply to obtain the correct weight value for families.

    Release date: 2012-10-25

  • Technical products: 11-522-X200800010972
    Description:

    Background: Evaluation of the coverage that results from linking routinely collected administrative hospital data with survey data is an important preliminary step to undertaking analyses based on the linked file. Data and methods: To evaluate the coverage of the linkage between data from cycle 1.1 of the Canadian Community Health Survey (CCHS) and in-patient hospital data (Health Person-Oriented Information or HPOI), the number of people admitted to hospital according to HPOI was compared with the weighted estimate for CCHS respondents who were successfully linked to HPOI. Differences between HPOI and the linked and weighted CCHS estimate indicated linkage failure and/or undercoverage. Results: According to HPOI, from September 2000 through November 2001, 1,572,343 people (outside Quebec) aged 12 or older were hospitalized. Weighted estimates from the linked CCHS, adjusted for agreement to link and plausible health number, were 7.7% lower. Coverage rates were similar for males and females. Provincial rates did not differ from those for the rest of Canada, although differences were apparent for the territories. Coverage rates were significantly lower among people aged 75 or older than among those aged 12 to 74.
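
    The implied magnitudes are easy to reproduce from the figures quoted above:

        hpoi_total = 1_572_343                   # persons hospitalized per HPOI
        cchs_linked = hpoi_total * (1 - 0.077)   # weighted estimate was 7.7% lower
        print(f"linked, weighted CCHS estimate ~ {cchs_linked:,.0f}")
        print(f"implied coverage ~ {cchs_linked / hpoi_total:.1%}")   # ~92.3%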

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011016
    Description:

    Now that we have come to the end of a day of workshops plus three very full days of sessions, I have the very pleasant task of offering a few closing remarks and, more importantly, of recognizing the efforts of those who have contributed to the success of this year's symposium. And it has clearly been a success.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010967
    Description:

    In this paper the background of the eXtensible Business Reporting Language and the involvement of Statistics Netherlands in the Dutch Taxonomy Project are discussed. The discussion predominantly focuses on the statistical context of using XBRL and the Dutch Taxonomy for expressing data terms to companies.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011011
    Description:

    The Federation of Canadian Municipalities' (FCM) Quality of Life Reporting System (QOLRS) is a means by which to measure, monitor, and report on the quality of life in Canadian municipalities. To address the challenge of collecting administrative data across member municipalities, the QOLRS technical team collaborated on the development of the Municipal Data Collection Tool (MDCT), which has become a key component of the QOLRS data acquisition methodology. Offered as a case study on administrative data collection, this paper argues that the recent launch of the MDCT has enabled the FCM to access reliable pan-Canadian municipal administrative data for the QOLRS.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010960
    Description:

    Non-response is inevitable in any survey, despite all the effort put into reducing it at the various stages of the survey. In particular, non-response can cause bias in the estimates. In addition, non-response is an especially serious problem in longitudinal studies because the sample shrinks over time. France's ELFE (Étude Longitudinale Française depuis l'Enfance) is a project that aims to track 20,000 children from birth to adulthood using a multidisciplinary approach. This paper is based on the results of the initial pilot studies conducted in 2007 to test the survey's feasibility and acceptance. The participation rates are presented (response rate, non-response factors) along with a preliminary description of the non-response treatment methods being considered.

    Release date: 2009-12-03

  • Technical products: 11-536-X200900110806
    Description:

    Recent work using a pseudo empirical likelihood (EL) method for finite population inferences with complex survey data has focused primarily on a single survey sample, non-stratified or stratified, with considerable effort devoted to computational procedures. In this talk we present a pseudo empirical likelihood approach to inference from multiple surveys and multiple-frame surveys, two commonly encountered problems in survey practice. We show that inferences about the common parameter of interest, and the effective use of various types of auxiliary information, can be conveniently carried out through constrained maximization of the joint pseudo EL function. We obtain asymptotic results which are used for constructing pseudo EL ratio confidence intervals, using either a chi-square approximation or a bootstrap calibration. All related computational problems can be handled using existing algorithms on stratified sampling after suitable re-formulation.
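
    For a single scalar benchmark constraint, the constrained maximization reduces to one-dimensional root finding; a minimal sketch without the step-halving safeguards a careful implementation needs (data and benchmark value are invented):

        import numpy as np

        def pseudo_el_weights(x, d, x0, tol=1e-10, max_iter=50):
            """Maximize sum(d_i log p_i) s.t. sum(p_i) = 1, sum(p_i x_i) = x0."""
            dt = d / d.sum()
            u = x - x0
            lam = 0.0
            for _ in range(max_iter):           # Newton iterations on lambda
                denom = 1.0 + lam * u
                g = np.sum(dt * u / denom)      # constraint residual
                dg = -np.sum(dt * u ** 2 / denom ** 2)
                lam -= g / dg
                if abs(g) < tol:
                    break
            return dt / (1.0 + lam * u)

        rng = np.random.default_rng(4)
        x = rng.normal(10, 2, 300)
        d = rng.uniform(1, 5, 300)              # design weights
        p = pseudo_el_weights(x, d, x0=10.2)
        print(p.sum(), np.sum(p * x))           # ~1 and ~10.2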

    Release date: 2009-08-11

  • Technical products: 11-536-X200900110807
    Description:

    Model calibration (Wu and Sitter, JASA, 2001) has been shown to provide more efficient estimates than classical calibration when the values of one or more auxiliary variables are available for each unit in the population and the relationship between such variables and the variable of interest is more complex than a linear one. Model calibration, though, provides a different set of weights for each variable of interest. To overcome this problem, an estimator is proposed: calibration is pursued with respect to both the values of the auxiliary variables and the fitted values of the variables of interest obtained with parametric and/or nonparametric models. This allows for coherence among estimates and greater efficiency if the model is well specified. The asymptotic properties of the resulting estimator are studied with respect to the sampling design. The issue of highly variable weights is addressed by relaxing binding constraints on the variables included for efficiency purposes in the calibration equations. A simulation study is also presented to better understand the finite-sample behavior of the proposed estimator.
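
    The classical linear-calibration step that model calibration generalizes can be written in a few lines; in the sketch below the "fitted values" and benchmark totals are fabricated, and a real application would fit the working model and guard against extreme weights:

        import numpy as np

        def linear_calibration(d, Z, totals):
            """Chi-square-distance calibration: w_i = d_i (1 + z_i' lam),
            with lam solving sum_i w_i z_i = totals."""
            Zd = Z * d[:, None]
            lam = np.linalg.solve(Z.T @ Zd, totals - Zd.sum(axis=0))
            return d * (1.0 + Z @ lam)

        rng = np.random.default_rng(8)
        n = 400
        x = rng.uniform(0, 10, n)
        d = rng.uniform(5, 15, n)                          # design weights
        fitted = 3.1 * np.sqrt(x)                          # hypothetical fitted values
        Z = np.column_stack([np.ones(n), fitted])
        totals = np.array([n * 10.0, fitted.sum() * 10])   # hypothetical benchmarks
        w = linear_calibration(d, Z, totals)
        print(w.sum(), w @ fitted)                         # reproduces the benchmarks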

    Release date: 2009-08-11

  • Technical products: 11-522-X200600110443
    Description:

    The Brazilian population has undergone an ageing process, characterized by an increase in the number of elderly people. Instruments have been developed to measure the quality of life of elderly individuals. Accordingly, a questionnaire consisting of various validated instruments and an open question was administered to a group of elderly citizens in the city of Botucatu, SP, Brazil. The analysis of the open question, assessed by qualitative methods, generated eleven categories concerning the elderly people's opinions about quality of life, and a cluster analysis of the answers produced three groups of elderly individuals. This work aimed to validate the categories obtained from the open question against the closed questions of the instrument by means of associations and chi-square tests at a 5% significance level. It was observed that qualitative analysis identifies phenomena regardless of category saturation. The quantitative method, on the other hand, shows the strength of each category within the set as a whole.
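
    The association step amounts to chi-square tests on cross-tabulations of the open-question categories against the closed questions; a generic example with made-up counts:

        import numpy as np
        from scipy.stats import chi2_contingency

        # Hypothetical cross-tabulation: open-question category (rows)
        # by closed-question response (columns)
        table = np.array([[30, 12],
                          [18, 25],
                          [10, 28]])
        chi2, p, dof, expected = chi2_contingency(table)
        print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
        print("association at the 5% level:", p < 0.05)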

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110440
    Description:

    Now that we have come to the end of a day of workshops plus two very full days of sessions, I have the very pleasant task of offering a few closing remarks and, more importantly, of recognizing the efforts of those who have contributed to the success of this year's symposium. And it has clearly been a success.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110416
    Description:

    Application of standard methods to survey data without accounting for the design features and weight adjustments can lead to erroneous inferences. Bootstrap methods offer an attractive option to the analyst for taking account of the design features and weight adjustments. The data file consists of the full-sample final weights and associated bootstrap final weights for a large number of bootstrap replicates as well as the observed data on the sample elements. We show how such data files can be used to analyze survey data in a straightforward manner using weighted estimating equations. A one-step estimating function bootstrap method that avoids some difficulties with the bootstrap is also discussed.
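
    For a weighted estimating equation that is linear in the parameter, such as the ratio equation U(R) = sum_i w_i (y_i - R x_i), the one-step idea can be sketched as follows (the bootstrap weights are fabricated stand-ins; for this linear case the single Newton step coincides with a full re-solve):

        import numpy as np

        rng = np.random.default_rng(6)
        n, B = 800, 400
        x = rng.uniform(1, 10, n)
        y = 2.5 * x + rng.normal(0, 1, n)
        w = rng.uniform(10, 30, n)                       # full-sample weights
        bw = w[:, None] * rng.exponential(1.0, (n, B))   # stand-in bootstrap weights

        R_hat = np.sum(w * y) / np.sum(w * x)            # solves U(R) = 0
        # one Newton step from R_hat using each set of bootstrap weights
        U_b = (bw * (y - R_hat * x)[:, None]).sum(axis=0)
        J_b = (bw * x[:, None]).sum(axis=0)              # -dU/dR under bootstrap weights
        R_b = R_hat + U_b / J_b
        print(f"R_hat {R_hat:.4f}, bootstrap SE {np.std(R_b):.5f}")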

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110447
    Description:

    The classification and identification of locations where persons report being more or less healthy, or having more or less social capital, within a specific area such as a health region is tremendously helpful for understanding associations between place and health. The objective of the proposed study is to classify and map areas within the Zone 6 Health Region of Nova Scotia (Halifax Regional Municipality and Annapolis Valley regions) according to health status (Dimension 1) and social capital (Dimension 2). We abstracted responses to questions about self-reported health status, mental health, and social capital from the master files of the Canadian Community Health Survey (Cycles 1.1, 1.2 and 2.1), the National Population Health Survey (Cycle 5), and the General Social Survey (Cycles 13, 14, 17, and 18). Responses were geocoded using the Statistics Canada Postal Code Conversion File (PCCF+) and imported into a geographical information system (GIS) so that the postal code associated with each response was assigned a latitude and longitude within the Nova Scotia Zone 6 health region. Kernel density estimators and additional spatial interpolators were used to develop statistically smoothed surfaces of the distribution of respondent values for each question. The smoothing process eliminates the possibility of revealing individual respondent locations and confidential Statistics Canada sampling frame information. Using responses from similar questions across multiple surveys improves the likelihood of detecting heterogeneity among the responses within the health region, as well as the accuracy of the smoothed map classification.
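
    A minimal kernel-density sketch of the smoothing step (coordinates, responses and grid are simulated; the study's interpolators and disclosure safeguards are considerably more involved):

        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(12)
        pts = rng.normal([-63.6, 44.65], [0.15, 0.08], size=(500, 2))  # lon, lat
        good = rng.binomial(1, 0.7, 500).astype(bool)   # self-reported good health

        kde_good = gaussian_kde(pts[good].T)            # density of "good" responses
        kde_poor = gaussian_kde(pts[~good].T)
        lon, lat = np.meshgrid(np.linspace(-64.1, -63.1, 50),
                               np.linspace(44.4, 44.9, 50))
        grid = np.vstack([lon.ravel(), lat.ravel()])
        ratio = kde_good(grid) / (kde_poor(grid) + 1e-12)  # smoothed relative surface
        print("relative-density surface shape:", ratio.reshape(50, 50).shape)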

    Release date: 2008-03-17
