Statistics by subject – Statistical methods

All (106)

  • Articles and reports: 12-001-X200800210755
    Description:

    Dependent interviewing (DI) is used in many longitudinal surveys to "feed forward" data from one wave to the next. Though it is a promising technique which has been demonstrated to enhance data quality in certain respects, relatively little is known about how it is actually administered in the field. This research seeks to address this issue through behavior coding. Various styles of DI were employed in the English Longitudinal Study of Ageing (ELSA) in January 2006, and recordings were made of pilot field interviews. These recordings were analysed to determine whether the questions (particularly the DI aspects) were administered appropriately and to explore the respondents' reactions to the fed-forward data. Of particular interest was whether respondents confirmed or challenged the previously reported information, whether the prior wave data came into play when respondents were providing their current-wave answers, and how any discrepancies were negotiated by the interviewer and respondent. Another aim was to examine the effectiveness of the various styles of DI: for example, in some cases the prior wave data were brought forward and respondents were asked to confirm them explicitly; in other cases the previous data were read out and respondents were asked whether the situation was still the same. Results indicate varying levels of compliance in terms of initial question-reading, and suggest that some styles of DI may be more effective than others.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210757
    Description:

    Sample weights can be calibrated to reflect the known population totals of a set of auxiliary variables. Predictors of finite population totals calculated using these weights have low bias if these variables are related to the variable of interest, but can have high variance if too many auxiliary variables are used. This article develops an "adaptive calibration" approach, where the auxiliary variables to be used in weighting are selected using sample data. Adaptively calibrated estimators are shown to have lower mean squared error and better coverage properties than non-adaptive estimators in many cases.

    Release date: 2008-12-23
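
The "adaptive calibration" entry above builds on standard calibration weighting. As a point of reference, here is a minimal linear (GREG-type) calibration sketch in Python; the data, variable names and equal design weights are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def calibrate_linear(d, X, totals):
    """Linear (GREG-type) calibration: adjust design weights d so the
    weighted totals of the auxiliary variables X match known population
    totals. Returns calibrated weights w = d * (1 + X @ lam)."""
    # Solve (X' diag(d) X) lam = totals - X' d for the Lagrange multipliers.
    T = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(T, totals - X.T @ d)
    return d * (1.0 + X @ lam)

# Toy illustration (all numbers artificial): n = 200 sampled units from a
# population of N = 5000, with two auxiliary variables of known totals.
rng = np.random.default_rng(1)
n, N = 200, 5000
X = np.column_stack([np.ones(n), rng.uniform(1, 5, n)])
d = np.full(n, N / n)                  # equal design weights
totals = np.array([N, 3.0 * N])        # assumed known totals of (1, x)
w = calibrate_linear(d, X, totals)
print(np.allclose(X.T @ w, totals))    # weighted totals now match: True
```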

  • Articles and reports: 12-001-X200800210762
    Description:

    This paper considers the optimum allocation in multivariate stratified sampling as a nonlinear matrix optimisation of integers. As a particular case, a nonlinear problem of the multi-objective optimisation of integers is studied. A fully detailed example illustrating some of the proposed techniques is provided at the end of the paper.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210761
    Description:

    Optimum stratification is the method of choosing the best boundaries that make strata internally homogeneous, given some sample allocation. To make the strata internally homogeneous, they should be constructed so that the strata variances for the characteristic under study are as small as possible. This can be achieved effectively when the distribution of the main study variable is known, by creating strata that cut the range of the distribution at suitable points. If the frequency distribution of the study variable is unknown, it may be approximated from past experience or from prior knowledge obtained in a recent study. In this paper the problem of finding Optimum Strata Boundaries (OSB) is considered as the problem of determining Optimum Strata Widths (OSW). The problem is formulated as a Mathematical Programming Problem (MPP) that minimizes the variance of the estimated population parameter under Neyman allocation, subject to the restriction that the sum of the widths of all the strata equals the total range of the distribution. The distributions of the study variable are considered as continuous, with Triangular and Standard Normal density functions. The formulated MPPs, which turn out to be multistage decision problems, can then be solved using the dynamic programming technique proposed by Bühler and Deutler (1975). Numerical examples are presented to illustrate the computational details. The results are also compared with the method of Dalenius and Hodges (1959) using an example with a normal distribution.

    Release date: 2008-12-23
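
The Dalenius and Hodges (1959) method used as a benchmark above is the classical cumulative square root frequency (cum sqrt(f)) rule. A minimal sketch, assuming artificial data and a fixed number of strata:

```python
import numpy as np

def cum_sqrt_f_boundaries(y, n_strata, n_bins=50):
    """Dalenius-Hodges cum-sqrt(f) rule: histogram the study variable,
    accumulate sqrt(frequency), and cut the cumulative scale into
    n_strata equal intervals to obtain stratum boundaries."""
    freq, edges = np.histogram(y, bins=n_bins)
    csf = np.cumsum(np.sqrt(freq))
    cuts = csf[-1] * np.arange(1, n_strata) / n_strata
    idx = np.searchsorted(csf, cuts)
    return edges[idx + 1]              # boundary = upper edge of the bin

rng = np.random.default_rng(0)
y = rng.normal(50, 10, 10_000)         # artificial study variable
print(cum_sqrt_f_boundaries(y, n_strata=4))
```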

  • Articles and reports: 12-001-X200800210760
    Description:

    The design of a stratified simple random sample without replacement from a finite population deals with two main issues: the definition of a rule to partition the population into strata, and the allocation of sampling units to the selected strata. This article examines a tree-based strategy that addresses these issues jointly when the survey is multipurpose and multivariate information, quantitative or qualitative, is available. Strata are formed through a hierarchical divisive algorithm that selects finer and finer partitions by minimizing, at each step, the sample allocation required to achieve the precision levels set for each surveyed variable. In this way, large numbers of constraints can be satisfied without drastically increasing the sample size, without discarding variables selected for stratification, and without diminishing the number of their class intervals. Furthermore, the algorithm tends not to define empty or almost empty strata, thus avoiding the need to collapse strata. The procedure was applied to redesign the Italian Farm Structure Survey. The results indicate that the gain in efficiency achieved with our strategy is nontrivial: for a given sample size, the procedure reaches the required precision using a number of strata that is usually a very small fraction of the number obtained by combining all possible classes of the covariates.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210763
    Description:

    The present work illustrates a sampling strategy useful for obtaining a planned sample size for domains belonging to different partitions of the population, while guaranteeing that the sampling errors of the domain estimates are lower than given thresholds. The sampling strategy, which covers the multivariate multi-domain case, is useful when the overall sample size is bounded and the standard solution of using a stratified sample with strata given by the cross-classification of the variables defining the different partitions is consequently not feasible, since the number of strata exceeds the overall sample size. The proposed sampling strategy is based on a balanced sampling selection technique and on GREG-type estimation. Its main advantage is computational feasibility, which allows one to easily implement an overall small area strategy that considers the sampling design and the estimator jointly and improves the efficiency of the direct domain estimators. An empirical simulation on real population data with different domain estimators shows the empirical properties of the examined sampling strategy.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210758
    Description:

    We propose a method for estimating the variance of estimators of changes over time, a method that takes account of all the components of these estimators: the sampling design, treatment of non-response, treatment of large companies, correlation of non-response from one wave to another, the effect of using a panel, robustification, and calibration using a ratio estimator. This method, which serves to determine the confidence intervals of changes over time, is then applied to the Swiss survey of value added.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210764
    Description:

    This paper considers situations where the target response value is either zero or an observation from a continuous distribution. A typical example analyzed in the paper is the assessment of literacy proficiency with the possible outcome being either zero, indicating illiteracy, or a positive score measuring the level of literacy. Our interest is in how to obtain valid estimates of the average response, or the proportion of positive responses in small areas, for which only small samples or no samples are available. As in other small area estimation problems, the small sample sizes in at least some of the sampled areas and/or the existence of nonsampled areas requires the use of model based methods. Available methods, however, are not suitable for this kind of data because of the mixed distribution of the responses, having a large peak at zero, juxtaposed to a continuous distribution for the rest of the responses. We develop, therefore, a suitable two-part random effects model and show how to fit the model and assess its goodness of fit, and how to compute the small area estimators of interest and measure their precision. The proposed method is illustrated using simulated data and data obtained from a literacy survey conducted in Cambodia.

    Release date: 2008-12-23
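
The two-part structure described above (a zero/positive mixture with area-level random effects) can be made concrete by simulating from such a model. This is only a data-generating sketch with assumed parameter values, not the authors' fitting or estimation procedure:

```python
import numpy as np

rng = np.random.default_rng(42)

# Part 1: a logistic model for P(response > 0) with an area random effect u.
# Part 2: a linear mixed model for the positive scores with area effect v.
# All parameter values below are illustrative assumptions.
n_areas, n_per_area = 30, 25
u = rng.normal(0, 0.5, n_areas)        # area effects, positivity part
v = rng.normal(0, 2.0, n_areas)        # area effects, score part

rows = []
for a in range(n_areas):
    p_pos = 1 / (1 + np.exp(-(0.8 + u[a])))        # P(positive score)
    positive = rng.random(n_per_area) < p_pos
    score = np.where(positive,
                     rng.normal(60 + v[a], 8, n_per_area), 0.0)
    rows.append(score)

scores = np.array(rows)
# Small-area quantities of interest: mean response and proportion positive.
print(scores.mean(axis=1)[:5])
print((scores > 0).mean(axis=1)[:5])
```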

  • Articles and reports: 12-001-X200800210759
    Description:

    The analysis of stratified multistage sample data requires the use of design information, such as stratum and primary sampling unit (PSU) identifiers or associated replicate weights, in variance estimation. In some public release data files, such design information is masked in an effort to reduce disclosure risk while still allowing the user to obtain valid variance estimates. For example, in area surveys with a limited number of PSUs, the original PSUs are split and/or recombined to construct pseudo-PSUs with swapped second- or subsequent-stage sampling units. Such PSU masking methods, however, distort the clustering structure of the sample design, yielding biased variance estimates, possibly with systematic patterns between the variance estimates obtained from the unmasked and masked PSU identifiers. Some previous work observed patterns in the ratio of the masked and unmasked variance estimates when plotted against the unmasked design effect. This paper investigates the effect of PSU masking on variance estimates under cluster sampling with respect to various aspects, including the clustering structure and the degree of masking. We also seek a PSU masking strategy, based on swapping subsequent-stage sampling units, that helps reduce the resulting biases of the variance estimates. For illustration, we used data from the National Health Interview Survey (NHIS) with some artificial modification. The proposed strategy performs very well in reducing the biases of the variance estimates. Both theory and empirical results indicate that the effect of PSU masking on variance estimates is modest when the swapping of subsequent-stage sampling units is minimal. The proposed masking strategy has been applied to the 2003-2004 National Health and Nutrition Examination Survey (NHANES) data release.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210756
    Description:

    In longitudinal surveys nonresponse often occurs in a pattern that is not monotone. We consider estimation of time-dependent means under the assumption that the nonresponse mechanism is last-value-dependent. Since the last value itself may be missing when nonresponse is nonmonotone, the nonresponse mechanism under consideration is nonignorable. We propose an imputation method by first deriving some regression imputation models according to the nonresponse mechanism and then applying nonparametric regression imputation. We assume that the longitudinal data follow a Markov chain with finite second-order moments. No other assumption is imposed on the joint distribution of longitudinal data and their nonresponse indicators. A bootstrap method is applied for variance estimation. Some simulation results and an example concerning the Current Employment Survey are presented.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210754
    Description:

    The context of the discussion is the increasing incidence of international surveys, of which one is the International Tobacco Control (ITC) Policy Evaluation Project, which began in 2002. The ITC country surveys are longitudinal, and their aim is to evaluate the effects of policy measures being introduced in various countries under the WHO Framework Convention on Tobacco Control. The challenges of organization, data collection and analysis in international surveys are reviewed and illustrated. Analysis is an increasingly important part of the motivation for large scale cross-cultural surveys. The fundamental challenge for analysis is to discern the real response (or lack of response) to policy change, separating it from the effects of data collection mode, differential non-response, external events, time-in-sample, culture, and language. Two problems relevant to statistical analysis are discussed. The first problem is the question of when and how to analyze pooled data from several countries, in order to strengthen conclusions which might be generally valid. While in some cases this seems to be straightforward, there are differing opinions on the extent to which pooling is possible and reasonable. It is suggested that for formal comparisons, random effects models are of conceptual use. The second problem is to find models of measurement across cultures and data collection modes which will enable calibration of continuous, binary and ordinal responses, and produce comparisons from which extraneous effects have been removed. It is noted that hierarchical models provide a natural way of relaxing requirements of model invariance across groups.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210768
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. It also sometimes contains information on structural or management changes in the journal.

    Release date: 2008-12-23

  • Surveys and statistical programs – Documentation: 62F0026M2009001
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending, which gathers information on the spending habits, dwelling characteristics and household equipment of Canadian households. The survey covers private households in the 10 provinces. (The territories are surveyed every second year, starting in 1999.)

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. One section describes the various statistics that can be created using expenditure data (e.g., budget share, market share, aggregates and medians).

    Release date: 2008-12-22

  • Technical products: 75F0002M2008005
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes. Sample surveys are subject to sampling errors, and to account for them, each estimate presented in the "Income Trends in Canada" series comes with a quality indicator based on the coefficient of variation. However, other factors must also be considered to ensure the data are used properly. Statistics Canada devotes considerable time and effort to controlling errors at every stage of the survey and to maximizing fitness for use. Nevertheless, the survey design and data processing may restrict that fitness. It is Statistics Canada's policy to furnish users with measures of data quality so that they can interpret the data properly. This report summarizes the quality measures for SLID data, including sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.

    Release date: 2008-08-20

  • Articles and reports: 82-622-X2008001
    Description:

    In this study, I examine the factorial validity of selected modules from the Canadian Survey of Experiences with Primary Health Care (CSE-PHC), in order to determine the potential for combining the items within each module into summary indices representing global primary health care concepts. The modules examined were: Patient Assessment of Chronic Illness Care (PACIC), Patient Activation (PA), Managing Own Health Care (MOHC), and Confidence in the Health Care System (CHCS). Confirmatory factor analyses were conducted on each module to assess the degree to which multiple observed items reflected the presence of common latent factors. While a four-factor model was initially specified for the PACIC instrument on the basis of prior theory and research, it did not fit the data well; rather, a revised two-factor model was found to be most appropriate. These two factors were labelled: "Whole Person Care" and "Coordination of Care". The remaining modules studied here (i.e., PA, MOHC, and CHCS) were all well-represented by single-factor models. The results suggest that the original factor structure of the PACIC developed within studies using clinical samples does not hold in general populations, although the precise reasons for this are not clear. Further empirical investigation will be required to shed more light on this discrepancy. The two factors identified here for the PACIC, as well as the single factors produced for the PA, MOHC, and CHCS could be used as the basis of summary indices for use in further analyses with the CSE-PHC.

    Release date: 2008-07-08

  • Technical products: 11-522-X200600110424
    Description:

    The International Tobacco Control (ITC) Policy Evaluation China Survey uses a multi-stage unequal probability sampling design with upper level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper level clusters refuse to participate in the survey and have to be replaced by substitute units, selected from units not included in the initial sample and once again using the randomized systematic PPS sampling method. Under such a scenario the first order inclusion probabilities of the final selected units are very difficult to calculate and the second order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first and the second order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and code for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.

    Release date: 2008-06-26
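
The simulation-based idea above is easy to illustrate for first-order inclusion probabilities under randomized systematic PPS sampling, leaving out the refusal/substitution step that motivates the paper. The sketch below is in Python rather than the R/S-PLUS of the paper, with artificial size measures:

```python
import numpy as np

def randomized_systematic_pps(sizes, n, rng):
    """Randomized systematic PPS: randomly order the units, then take a
    systematic sample of n points on the cumulated scale of the size
    measures."""
    order = rng.permutation(len(sizes))
    cum = np.cumsum(sizes[order])
    step = cum[-1] / n
    points = rng.uniform(0, step) + step * np.arange(n)
    return order[np.searchsorted(cum, points)]

rng = np.random.default_rng(7)
sizes = rng.uniform(10, 100, 40)          # artificial sizes for 40 clusters
n, reps = 8, 100_000
counts = np.zeros(len(sizes))
for _ in range(reps):
    counts[randomized_systematic_pps(sizes, n, rng)] += 1

pi_hat = counts / reps                    # simulated inclusion probabilities
pi_exact = n * sizes / sizes.sum()        # known first-order PPS values
print(np.abs(pi_hat - pi_exact).max())    # small Monte Carlo error
```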

  • Articles and reports: 12-001-X200800110618
    Description:

    The National Health and Nutrition Examination Survey (NHANES) is one of a series of health-related programs sponsored by the United States National Center for Health Statistics. A unique feature of NHANES is the administration of a complete medical examination for each respondent in the sample. To standardize administration, these examinations are carried out in mobile examination centers. The examination includes physical measurements, tests such as eye and dental examinations, and the collection of blood and urine specimens for laboratory testing. NHANES is an ongoing annual health survey of the noninstitutionalized civilian population of the United States. The major analytic goals of NHANES include estimating the number and percentage of persons in the U.S. population and in designated subgroups with selected diseases and risk factors. The sample design for NHANES must create a balance between the requirements for efficient annual and multiyear samples and the flexibility that allows changes in key design parameters to make the survey more responsive to the needs of the research and health policy communities. This paper discusses the challenges involved in designing and implementing a sample selection process that satisfies the goals of NHANES.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110642
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. It also sometimes contains information on structural or management changes in the journal.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110616
    Description:

    With complete multivariate data the BACON algorithm (Billor, Hadi and Velleman 2000) yields a robust estimate of the covariance matrix. The corresponding Mahalanobis distance may be used for multivariate outlier detection. When items are missing, the EM algorithm is a convenient way to estimate the covariance matrix at each iteration step of the BACON algorithm. In finite population sampling, the EM algorithm must be enhanced to estimate the covariance matrix of the population rather than of the sample. A version of the EM algorithm for survey data following a multivariate normal model, the EEM algorithm (Estimated Expectation Maximization), is proposed. The combination of the two algorithms, the BACON-EEM algorithm, is applied to two datasets and compared with alternative methods.

    Release date: 2008-06-26
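
A much-simplified sketch of the BACON step on complete data (no EM/EEM extension, no survey weights) conveys the idea: grow a "basic subset" and flag points whose Mahalanobis distance to it stays large. The initialization and cutoff below are simplified assumptions, not the published algorithm's exact corrections:

```python
import numpy as np
from scipy import stats

def bacon_outliers(X, alpha=0.05, m_factor=4):
    """Simplified BACON-style outlier detection on complete data: start
    from a small basic subset of points closest to the coordinate-wise
    median, then grow it by admitting points whose Mahalanobis distance
    to the subset's mean/covariance falls below a chi-square cutoff."""
    n, p = X.shape
    d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    subset = np.argsort(d0)[: m_factor * p]        # initial basic subset
    cutoff = np.sqrt(stats.chi2.ppf(1 - alpha / n, p))
    while True:
        mu = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        diff = X - mu
        md = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff))
        grown = np.flatnonzero(md < cutoff)
        if np.array_equal(np.sort(grown), np.sort(subset)):
            return np.flatnonzero(md >= cutoff)    # declared outliers
        subset = grown

rng = np.random.default_rng(3)
X = rng.normal(0, 1, (500, 3))
X[:10] += 8                                        # plant 10 gross outliers
print(bacon_outliers(X))                           # indices 0..9
```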

  • Articles and reports: 12-001-X200800110606
    Description:

    Data from election polls in the US are typically presented in two-way categorical tables, and there are many polls before the actual election in November. For example, in the Buckeye State Poll in 1998 for governor there are three polls, January, April and October; the first category represents the candidates (e.g., Fisher, Taft and other) and the second category represents the current status of the voters (likely to vote and not likely to vote for governor of Ohio). There is a substantial number of undecided voters for one or both categories in all three polls, and we use a Bayesian method to allocate the undecided voters to the three candidates. This method permits modeling different patterns of missingness under ignorable and nonignorable assumptions, and a multinomial-Dirichlet model is used to estimate the cell probabilities which can help to predict the winner. We propose a time-dependent nonignorable nonresponse model for the three tables. Here, a nonignorable nonresponse model is centered on an ignorable nonresponse model to induce some flexibility and uncertainty about ignorability or nonignorability. As competitors we also consider two other models, an ignorable and a nonignorable nonresponse model. These latter two models assume a common stochastic process to borrow strength over time. Markov chain Monte Carlo methods are used to fit the models. We also construct a parameter that can potentially be used to predict the winner among the candidates in the November election.

    Release date: 2008-06-26
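
The multinomial-Dirichlet machinery can be sketched for the simplest, ignorable case: draw cell probabilities from the Dirichlet posterior given the decided voters and allocate the undecided accordingly. The counts below are artificial, and the sketch deliberately drops the paper's time dependence and nonignorable components:

```python
import numpy as np

rng = np.random.default_rng(11)

# Artificial poll counts among decided voters for three candidates,
# plus a pool of undecided respondents to be allocated.
decided = np.array([310, 290, 55])     # e.g., Fisher, Taft, other
undecided = 145
prior = np.ones(3)                     # uniform Dirichlet prior

# Under an ignorable model, the cell probabilities have a Dirichlet
# posterior; each draw implies an expected allocation of the undecided.
draws = 10_000
theta = rng.dirichlet(prior + decided, size=draws)
shares = (decided + theta * undecided) / (decided.sum() + undecided)
print(shares.mean(axis=0))                   # posterior mean vote shares
print((shares[:, 0] > shares[:, 1]).mean())  # P(candidate 1 beats candidate 2)
```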

  • Articles and reports: 12-001-X200800110610
    Description:

    A new generalized regression estimator of a finite population total based on the Box-Cox transformation technique and its variance estimator are proposed under a general unequal probability sampling design. By being design consistent, the proposed estimator maintains the robustness property of the GREG estimator even if the underlying model fails. Furthermore, the Box-Cox technique automatically finds a reasonable transformation for the dependent variable using the data. The robustness and efficiency of the new estimator are evaluated analytically and via Monte Carlo simulation studies.

    Release date: 2008-06-26
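
The Box-Cox step can be illustrated with scipy, which selects the transformation parameter by maximum likelihood; the unweighted regression wrap-around below is an illustrative assumption, not the proposed design-consistent GREG estimator:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Artificial positive, right-skewed survey variable and one auxiliary x.
x = rng.uniform(1, 10, 400)
y = np.exp(0.4 * x + rng.normal(0, 0.3, 400))

# Box-Cox chooses a power transform of y by maximum likelihood ...
y_t, lam = stats.boxcox(y)
print(f"selected lambda: {lam:.3f}")    # near 0 => log-like transform

# ... after which an ordinary (here unweighted) regression on the
# transformed scale is well behaved; a GREG estimator would combine
# such a model fit with the design weights.
print(np.polyfit(x, y_t, 1))            # slope, intercept
```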

  • Articles and reports: 12-001-X200800110613
    Description:

    The International Tobacco Control (ITC) Policy Evaluation Survey of China uses a multi-stage unequal probability sampling design with upper level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper level clusters refuse to participate in the survey and have to be replaced by substitute units, selected from units not included in the initial sample and once again using the randomized systematic PPS sampling method. Under such a scenario the first order inclusion probabilities of the final selected units are very difficult to calculate and the second order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first and the second order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and code for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110619
    Description:

    Small area prediction based on random effects, called EBLUP, is a procedure for constructing estimates for small geographical areas or small subpopulations using existing survey data. The total of the small area predictors is often forced to equal the direct survey estimate and such predictors are said to be calibrated. Several calibrated predictors are reviewed and a criterion that unifies the derivation of these calibrated predictors is presented. The predictor that is the unique best linear unbiased predictor under the criterion is derived and the mean square error of the calibrated predictors is discussed. Implicit in the imposition of the restriction is the possibility that the small area model is misspecified and the predictors are biased. Augmented models with one additional explanatory variable for which the usual small area predictors achieve the self-calibrated property are considered. Simulations demonstrate that calibrated predictors have slightly smaller bias compared to those of the usual EBLUP predictor. However, if the bias is a concern, a better approach is to use an augmented model with an added auxiliary variable that is a function of area size. In the simulation, the predictors based on the augmented model had smaller MSE than EBLUP when the incorrect model was used for prediction. Furthermore, there was a very small increase in MSE relative to EBLUP if the auxiliary variable was added to the correct model.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110611
    Description:

    In finite population sampling prior information is often available in the form of partial knowledge about an auxiliary variable, for example its mean may be known. In such cases, the ratio estimator and the regression estimator are often used for estimating the population mean of the characteristic of interest. The Polya posterior has been developed as a noninformative Bayesian approach to survey sampling. It is appropriate when little or no prior information about the population is available. Here we show that it can be extended to incorporate types of partial prior information about auxiliary variables. We will see that it typically yields procedures with good frequentist properties even in some problems where standard frequentist methods are difficult to apply.

    Release date: 2008-06-26
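
The Polya posterior can be simulated by Polya-urn resampling: complete the unseen part of the population by repeated draws from an urn initialized with the sample, returning each draw with an extra copy. A minimal sketch, assuming artificial data and no auxiliary information:

```python
import numpy as np

def polya_posterior_mean(sample, N, draws, rng):
    """Simulate the Polya posterior: for each draw, fill in the N - n
    unseen units by Polya-urn sampling (each selected value is added
    back to the urn), and record the full-population mean."""
    n = len(sample)
    means = np.empty(draws)
    for k in range(draws):
        urn = list(sample)
        for _ in range(N - n):
            urn.append(urn[rng.integers(len(urn))])
        means[k] = np.mean(urn)
    return means

rng = np.random.default_rng(9)
sample = rng.normal(100, 15, 30)       # artificial sample, n = 30
means = polya_posterior_mean(sample, N=300, draws=500, rng=rng)
print(means.mean(), np.quantile(means, [0.025, 0.975]))
```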

  • Articles and reports: 12-001-X200800110614
    Description:

    The Canadian Labour Force Survey (LFS) produces monthly estimates of the unemployment rate at the national and provincial levels. The LFS also releases unemployment estimates for sub-provincial areas such as Census Metropolitan Areas (CMAs) and Urban Centers (UCs). However, for some sub-provincial areas the direct estimates are not reliable, since the sample size there is quite small. Small area estimation in the LFS concerns the estimation of unemployment rates for local sub-provincial areas such as CMAs/UCs using small area models. In this paper, we discuss various models, including the Fay-Herriot model and cross-sectional and time series models. In particular, an integrated non-linear mixed effects model is proposed under the hierarchical Bayes (HB) framework for LFS unemployment rate estimation. Monthly Employment Insurance (EI) beneficiary data at the CMA/UC level are used as auxiliary covariates in the model. An HB approach with the Gibbs sampling method is used to obtain estimates of the posterior means and posterior variances of the CMA/UC-level unemployment rates. The proposed HB model leads to reliable model-based estimates in terms of CV reduction. A model fit analysis and a comparison of the model-based estimates with the direct estimates are presented in the paper.

    Release date: 2008-06-26

Data (0)

Analysis (37)

  • Articles and reports: 12-001-X200800110607
    Description:

    Respondent incentives are increasingly used as a measure to combat falling response rates and the resulting risks of nonresponse bias. Nonresponse in panel surveys is particularly problematic, since even low wave-on-wave nonresponse rates can lead to substantial cumulative losses; if nonresponse is differential, this may lead to increasing bias across waves. Although the effects of incentives have been studied extensively in cross-sectional contexts, little is known about their cumulative effects across the waves of a panel. We provide new evidence about the effects of continued incentive payments on attrition, bias and item nonresponse, using data from a large-scale, multi-wave, mixed-mode incentive experiment on a UK government panel survey of young people. In this study, incentives significantly reduced attrition, far outweighing negative effects on item response rates in terms of the amount of information collected by the survey per issued case. Incentives had proportionate effects on retention rates across a range of respondent characteristics and, as a result, did not reduce attrition bias in terms of those characteristics. The effects of incentives on retention rates were larger for unconditional than for conditional incentives, and larger in postal than in telephone mode. Across waves, the effects on attrition decreased somewhat, although the effects on item nonresponse and the lack of effect on bias remained constant. The effects of incentives at later waves appeared to be independent of incentive treatments and mode of data collection at earlier waves.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110615
    Description:

    We consider optimal sampling rates in element-sampling designs when the anticipated analysis is survey-weighted linear regression and the estimands of interest are linear combinations of regression coefficients from one or more models. Methods are first developed assuming that exact design information is available in the sampling frame, and then generalized to situations in which some design variables are available only as aggregates for groups of potential subjects, or from inaccurate or old data. We also consider design for estimation of combinations of coefficients from more than one model. A further generalization allows for flexible combinations of coefficients chosen to improve estimation of one effect while controlling for another. Potential applications include estimation of means for several sets of overlapping domains, or improving estimates for subpopulations such as minority races by disproportionate sampling of geographic areas. In the motivating problem of designing a survey on care received by cancer patients (the CanCORS study), potential design information included block-level census data on race/ethnicity and poverty as well as individual-level data. In one study site, an unequal-probability sampling design using the subjects' residential addresses and census data would have reduced the variance of the estimator of an income effect by 25%, or by 38% if the subjects' races were also known. With flexible weighting of the income contrasts by race, the variance of the estimator would be reduced by 26% using residential addresses alone and by 52% using addresses and races. Our methods would be useful in studies in which geographic oversampling by race-ethnicity or socioeconomic characteristics is considered, or in any study in which characteristics available in sampling frames are measured with error.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110612
    Description:

    Lehtonen and Veijanen (1999) proposed a new model-assisted generalized regression (GREG) estimator of a small area mean under a two-level model. They have shown that the proposed estimator performs better than the customary GREG estimator in terms of average absolute relative bias and average median absolute relative error. We derive the mean squared error (MSE) of the new GREG estimator under the two-level model and compare it to the MSE of the best linear unbiased prediction (BLUP) estimator. We also provide empirical results on the relative efficiency of the estimators. We show that the new GREG estimator exhibits better performance relative to the customary GREG estimator in terms of average MSE and average absolute relative error. We also show that, due to borrowing strength from related small areas, the EBLUP estimator exhibits significantly better performance relative to the customary GREG and the new GREG estimators. We provide simulation results under a model-based set-up as well as under a real finite population.
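
    For reference, the customary population-level GREG estimator that the comparison starts from can be sketched in a few lines; the small-area, two-level-model variants studied in the paper differ in how the assisting model is specified. A minimal sketch with illustrative names, assuming known population totals of the auxiliary variables:

    ```python
    import numpy as np

    def greg_mean(y, X, w, X_pop_totals, N):
        """Customary GREG estimator of a population mean.

        y, X, w: sample responses, auxiliary covariates, design weights.
        X_pop_totals: known population totals of the columns of X.
        N: population size.
        """
        y, X, w = (np.asarray(a, float) for a in (y, X, w))
        # survey-weighted regression coefficients
        B = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        # Horvitz-Thompson estimates of the y-total and x-totals,
        # then the calibration adjustment toward the known x-totals
        t_y_ht = np.sum(w * y)
        t_x_ht = X.T @ w
        t_y_greg = t_y_ht + (np.asarray(X_pop_totals, float) - t_x_ht) @ B
        return t_y_greg / N
    ```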

    Release date: 2008-06-26

Reference (69)

Reference (69) (25 of 69 results)

  • Surveys and statistical programs – Documentation: 62F0026M2009001
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending, which gathers information on the spending habits, dwelling characteristics and household equipment of Canadian households. The survey covers private households in the 10 provinces. (The territories are surveyed every second year, starting in 1999.)

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. One section describes the various statistics that can be created using expenditure data (e.g., budget share, market share, aggregates and medians).

    Release date: 2008-12-22

  • Technical products: 75F0002M2008005
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes. Sample surveys are subject to sampling errors. To account for these errors, each estimate presented in the "Income Trends in Canada" series comes with a quality indicator based on the coefficient of variation. However, other factors must also be considered to make sure data are properly used. Statistics Canada devotes considerable time and effort to controlling errors at every stage of the survey and to maximizing fitness for use. Nevertheless, the survey design and data processing could restrict fitness for use. It is the policy at Statistics Canada to furnish users with measures of data quality so that they are able to interpret the data properly. This report summarizes the set of quality measures of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.
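
    A quality indicator of this kind is driven by the estimated coefficient of variation. Assuming bootstrap replicate weights of the kind Statistics Canada typically supplies with SLID files, the CV of a weighted mean can be sketched as follows; the mapping from CV to a quality rating is survey-specific and not shown, and all names are illustrative.

    ```python
    import numpy as np

    def cv_from_replicates(y, w, rep_weights):
        """Coefficient of variation of a weighted mean using bootstrap
        replicate weights (rep_weights: one column per replicate)."""
        y, w = np.asarray(y, float), np.asarray(w, float)
        est = np.sum(w * y) / np.sum(w)
        # recompute the estimate under each replicate weight set
        reps = np.array([np.sum(rw * y) / np.sum(rw)
                         for rw in np.asarray(rep_weights, float).T])
        # bootstrap variance: mean squared deviation around the full-sample estimate
        se = np.sqrt(np.mean((reps - est) ** 2))
        return se / est
    ```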

    Release date: 2008-08-20

  • Technical products: 11-522-X200600110424
    Description:

    The International Tobacco Control (ITC) Policy Evaluation China Survey uses a multi-stage unequal-probability sampling design, with upper-level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper-level clusters refuse to participate and have to be replaced by substitute units, selected from units not included in the initial sample, once again using the randomized systematic PPS sampling method. Under such a scenario the first-order inclusion probabilities of the finally selected units are very difficult to calculate, and the second-order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first- and second-order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions implementing the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.
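
    The idea of the simulation-based approach is to replay the selection-plus-substitution process many times and estimate inclusion probabilities by relative frequencies. The sketch below is a toy Python version rather than the paper's R/S-PLUS code: it assumes a single round of substitution, a fixed set of refusing clusters that are excluded from the substitute pool, and size measures small enough that no unit's expected inclusion count exceeds one.

    ```python
    import numpy as np

    def randomized_systematic_pps(sizes, n, rng):
        """Randomized systematic PPS: randomly order the units, then apply
        systematic sampling on the cumulated size measures."""
        order = rng.permutation(len(sizes))
        p = np.asarray(sizes, float)[order]
        p = n * p / p.sum()              # expected inclusion counts, sum = n
        cum = np.cumsum(p)
        hits = rng.uniform(0, 1) + np.arange(n)
        return order[np.searchsorted(cum, hits)]

    def mc_inclusion_probs(sizes, n, refusers, R=20000, seed=1):
        """Estimate first- and second-order inclusion probabilities by
        repeating the selection-plus-substitution process R times."""
        rng = np.random.default_rng(seed)
        N = len(sizes)
        first, second = np.zeros(N), np.zeros((N, N))
        refusers = set(refusers)
        for _ in range(R):
            s = set(randomized_systematic_pps(sizes, n, rng))
            kept = {u for u in s if u not in refusers}
            need = n - len(kept)
            if need > 0:
                # substitutes: a fresh PPS draw from unsampled, non-refusing units
                pool = [u for u in range(N) if u not in s and u not in refusers]
                subs = randomized_systematic_pps([sizes[u] for u in pool], need, rng)
                kept |= {pool[j] for j in subs}
            final = np.fromiter(kept, int)
            first[final] += 1
            second[np.ix_(final, final)] += 1
        return first / R, second / R

    # Hypothetical example: 8 clusters, sample 3, cluster 3 always refuses
    pi1, pi2 = mc_inclusion_probs([50, 30, 20, 80, 40, 60, 10, 70],
                                  n=3, refusers=[3], R=5000)
    ```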

    Release date: 2008-06-26

  • Technical products: 75F0002M2008002
    Description:

    The Survey of Labour and Income Dynamics (SLID) conducts an annual labour and income interview in January. The data are collected using computer-assisted interviewing; thus no paper questionnaire is required for data collection. The questions, responses and interview flow for labour and income are documented in another SLID research paper. This document presents the information for the 2007 entry and exit portions of the labour and income interview (reference year 2006).

    The entry-exit component consists of five separate modules. The entry module is the first set of data collected; it updates the place of residence, housing conditions and expenses, as well as the household composition. For each person identified in entry, the demographics module collects (or updates) the person's name, date of birth, sex and marital status. Then the relationships module identifies (or updates) the relationship between each respondent and every other household member. The exit module includes questions on whom to contact for the next interview and the names, phone numbers and addresses of two contacts, to be used only if future tracing of respondents is required. An overview of the tracing component is also included in this document.

    Release date: 2008-05-30

  • Technical products: 75F0002M2008003
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey which collects information related to the standard of living of individuals and their families. By interviewing the same people over a period of six years, changes and the causes of these changes can be monitored.

    A preliminary interview collecting background information is conducted with all respondents aged 16 and over who enter the SLID sample. Preliminary interviews are conducted with new household members during their first labour and income interview after they join the household. A labour and income interview is conducted each year with all respondents 16 years of age and over.

    The purpose of this document is to present the questions, possible responses and question flows for the 2007 preliminary, labour and income questionnaire (for the 2006 reference year).

    Release date: 2008-05-30

  • Technical products: 11-522-X2006001
    Description:

    Since 1984, an annual international symposium on methodological issues has been sponsored by Statistics Canada. Proceedings have been available since 1987. Symposium 2006 was the twenty-third in Statistics Canada's series of international symposia on methodological issues. Each year the symposium focuses on a particular theme. In 2006 the theme was "Methodological Issues in Measuring Population Health".

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110437
    Description:

    The New Zealand Ministry of Health has expanded its population health survey, the New Zealand Health Survey (NZHS), to include a questionnaire specifically on child health. The principal aim of the NZHS child questionnaire is to collect health data from parents or caregivers that can be used for monitoring population-level child health status, health service utilisation, and the health risk and protective behaviours that have their origins in childhood. Previously, only data collected through child contact with the health system, for example hospital administration records and disease/injury databases, have been available for monitoring child health in New Zealand. This paper reviews the questionnaire development for the child health component of the 2006/2007 New Zealand Health Survey, including topic selection, question development, cognitive testing, preliminary sample design, final questionnaire drafting, and dress-rehearsal testing.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110404
    Description:

    The pursuit of reductions in cost and response burden in survey programs has led to increased use of information available in administrative databases. Linkage between these two data sources is a way to exploit their complementary nature and maximize their respective usefulness. This paper discusses the various ways we have performed record linkage between the Canadian Community Health Survey (CCHS) and the Health Person-Oriented Information (HPOI) databases. The files resulting from selected linkage methods are used in an analysis of risk factors for having been hospitalized for heart disease. The sensitivity of the analysis with respect to the various linkage approaches is investigated.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110438
    Description:

    As part of an effort to design a set of questions for the Current Population Survey (CPS) to measure disability, potential questions were drawn from existing surveys, then cognitively and field tested. Based on an analysis of the test data, a set of seven questions was identified, cognitively tested, and placed in the February 2006 CPS for testing. Analysis of the data revealed a lower overall disability rate as measured in the CPS than in the field test, with lower positive response rates for each question. The data did not indicate an adverse effect on survey response rates.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110443
    Description:

    The Brazilian population is ageing, with a growing number of elderly people. Instruments have been developed to measure the quality of life of elderly individuals. Accordingly, a questionnaire consisting of various validated instruments and one open question was administered to a group of elderly citizens in the city of Botucatu, SP, Brazil. The answers to the open question, assessed by qualitative methods, generated eleven categories reflecting the elderly people's opinions on quality of life, and a cluster analysis of these answers produced three groups of elderly individuals. This work aimed to validate the categories obtained from the open question against the closed questions of the instrument, by means of association analyses and chi-square tests at the 5% level of significance. It was observed that qualitative analysis identifies phenomena regardless of category saturation, whereas the quantitative method shows the strength of each category within the set as a whole.
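
    The validation step rests on ordinary chi-square tests of independence between the qualitative categories and the closed-question responses. A minimal sketch with a hypothetical cross-tabulation; the counts below are invented for illustration only.

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical 3x2 table: qualitative cluster (rows) by a
    # dichotomized closed-question response (columns).
    table = np.array([[34, 12],
                      [20, 25],
                      [ 9, 30]])
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
    if p < 0.05:
        print("Association significant at the 5% level")
    ```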

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110397
    Description:

    In practice it often happens that some collected data are subject to measurement error. Sometimes covariates (or risk factors) of interest may be difficult to observe precisely due to physical location or cost. Sometimes it is impossible to measure covariates accurately due to the nature of the covariates. In other situations, a covariate may represent an average of a certain quantity over time, and any practical way of measuring such a quantity necessarily features measurement error. When carrying out statistical inference in such settings, it is important to account for the effects of mismeasured covariates; otherwise, erroneous or even misleading results may be produced. In this paper, we discuss several measurement error examples arising in distinct contexts. Specific attention is focused on survival data with covariates subject to measurement error. We discuss a simulation-extrapolation method for adjusting for measurement error effects. A simulation study is reported.
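
    The simulation-extrapolation (SIMEX) idea can be shown in a few lines: deliberately add extra measurement error at several multiples lambda of the assumed error variance, track how the estimate degrades, and extrapolate the trend back to lambda = -1, the error-free case. The paper applies this to survival models; the sketch below uses a simple linear regression slope with a quadratic extrapolant, an assumed known error standard deviation sigma_u, and illustrative names.

    ```python
    import numpy as np

    def simex_slope(x_obs, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200, rng=None):
        """SIMEX estimate of a regression slope when x is observed
        with additive N(0, sigma_u^2) measurement error."""
        rng = np.random.default_rng(rng)
        x_obs, y = np.asarray(x_obs, float), np.asarray(y, float)
        lams = [0.0] + list(lambdas)
        slopes = []
        for lam in lams:
            fits = []
            # average over B noise injections (none needed at lambda = 0)
            for _ in range(B if lam > 0 else 1):
                x = x_obs + rng.normal(0, np.sqrt(lam) * sigma_u, len(x_obs))
                fits.append(np.polyfit(x, y, 1)[0])
            slopes.append(np.mean(fits))
        coef = np.polyfit(lams, slopes, 2)   # quadratic trend in lambda
        return np.polyval(coef, -1.0)        # extrapolate to no measurement error
    ```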

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110395
    Description:

    This study investigates factors associated with obesity in Canada and the U.S., using data from the 2002-03 Joint Canada/United States Survey of Health, a telephone survey conducted jointly by Statistics Canada and the U.S. National Center for Health Statistics. Essentially the same questionnaire was administered in both countries at the same time, yielding a data set that provided unprecedented comparability of national estimates from the two countries. Analysis of the empirical distributions of body mass index (BMI) shows that American women are appreciably heavier than Canadian women, whereas the distributions of BMI are almost identical for American men and Canadian men. Factors that may account for the differences between women are investigated.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110414
    Description:

    In Finland, the first national health examination surveys were carried out in the 1960s. Comprehensive surveys of nationally representative population samples were carried out from 1978 to 1980 (the Mini-Finland Health Survey) and from 2000 to 2001 (Health 2000). Surveys of cardiovascular risk factors, the so-called FinRisk surveys, have assessed trends in these risk factors every five years. The health examination surveys are an important tool of health monitoring and, linked with registers, also a rich source of data for epidemiological research. The paper also gives examples of reports published from several of these studies.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110440
    Description:

    Now that we have come to the end of a day of workshops plus two very full days of sessions, I have the very pleasant task of offering a few closing remarks and, more importantly, of recognizing the efforts of those who have contributed to the success of this year's symposium. And it has clearly been a success.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110392
    Description:

    We use a robust Bayesian method to analyze data with possibly nonignorable nonresponse and selection bias. A robust logistic regression model is used to relate the response indicators (Bernoulli random variables) to the covariates, which are available for everyone in the finite population. This relationship can adequately explain the difference between respondents and nonrespondents for the sample. The robust model is obtained by expanding the standard logistic regression model to a mixture of Student's t-distributions, thereby providing propensity scores (selection probabilities) which are used to construct adjustment cells. The nonrespondents' values are filled in by drawing a random sample from a kernel density estimator formed from the respondents' values within the adjustment cells. Prediction uses a linear spline rank-based regression of the response variable on the covariates by area, sampling the errors from another kernel density estimator, thereby further robustifying our method. We use Markov chain Monte Carlo (MCMC) methods to fit our model. The posterior distribution of a quantile of the response variable is obtained within each sub-area using the order statistic over all the individuals (sampled and nonsampled). We compare our robust method with recent parametric methods.
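
    One building block, filling in nonrespondents by sampling from a kernel density estimator of the respondents' values within an adjustment cell, is easy to sketch: a draw from a Gaussian KDE is a randomly chosen donor value plus kernel noise. This shows only that single step, under an assumed rule-of-thumb bandwidth; the propensity modelling, spline prediction and MCMC fitting of the paper are not shown.

    ```python
    import numpy as np

    def kde_impute(donors, n_missing, rng=None):
        """Fill in nonrespondents within an adjustment cell by sampling
        from a Gaussian KDE of the respondents' values."""
        rng = np.random.default_rng(rng)
        donors = np.asarray(donors, float)
        n = len(donors)
        # Silverman's rule-of-thumb bandwidth (an assumed choice)
        h = 1.06 * donors.std(ddof=1) * n ** (-0.2)
        picks = rng.choice(donors, size=n_missing, replace=True)
        return picks + rng.normal(0, h, n_missing)
    ```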

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110452
    Description:

    Accurate information about the timing of access to primary mental health care is critically important in order to identify potentially modifiable factors which could facilitate timely and ongoing management of care. No "gold standard" measure of mental health care utilization exists, so it is useful to know how the strengths, gaps, and limitations of different data sources influence study results. This study compares two population-wide measures of primary mental health care utilization: the Canadian Community Health Survey of Mental Health and Well-being (CCHS, cycle 1.2) and provincial health insurance records in the province of British Columbia. It explores four questions: (1) Is the 12-month prevalence of contacts with general practitioners for mental health issues the same regardless of whether survey data or administrative data are used? (2) What is the level of agreement between the survey data and administrative data for having had any contact with a general practitioner for mental health issues during the 12-month period before the survey interview? (3) Is the level of agreement constant throughout the 12-month period, or does it decline over more distant sub-timeframes within the 12-month period? (4) What kinds of respondent characteristics, including mental disorders, are associated with agreement or lack of agreement? The results of this study will provide useful information about how to use and interpret each measure of health care utilization. In addition, it will contribute to survey design research and to research which aims to improve the methods for using administrative data for mental health services research.
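
    The abstract does not say which agreement statistic is used; Cohen's kappa is the conventional chance-corrected choice for paired binary indicators such as survey-reported versus record-based GP contact, and is sketched below with illustrative names.

    ```python
    import numpy as np

    def cohens_kappa(a, b):
        """Chance-corrected agreement between two binary indicators,
        e.g., survey-reported vs. administrative-record GP contact."""
        a, b = np.asarray(a, bool), np.asarray(b, bool)
        po = np.mean(a == b)                        # observed agreement
        p1, p2 = a.mean(), b.mean()
        pe = p1 * p2 + (1 - p1) * (1 - p2)          # agreement expected by chance
        return (po - pe) / (1 - pe)
    ```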

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110422
    Description:

    Many population surveys collecting food consumption data use 24-hour recall methodology to capture detailed one-day intakes. In order to estimate longer-term intakes of foods and nutrients from these data, methods have been developed that require a repeat recall to be collected from at least a subset of respondents in order to estimate day-to-day variability. During the Canadian Community Health Survey Cycle 2.2 Nutrition Focus Survey, most first interviews were collected in person and most repeat interviews were conducted by telephone. This paper looks at the impact of the mode of interview on the reported foods and nutrients on both the first day and the repeat day, and on the estimation of intra-individual variability between the first and second interviews.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110391
    Description:

    Small area estimation using linear area-level models typically assumes normality of the area-level random effects (model errors) and of the survey errors of the direct survey estimates. Outlying observations can be a concern, and can arise from outliers in either the model errors or the survey errors, two possibilities with very different implications. We consider both possibilities here and investigate empirically how use of a Bayesian approach with a t-distribution assumed for one of the error components can address potential outliers. The empirical examples use models for U.S. state poverty ratios from the U.S. Census Bureau's Small Area Income and Poverty Estimates program, extending the usual Gaussian models to assume a t-distribution for the model error or the survey error. Results are examined to see how they are affected by varying the number of degrees of freedom (assumed known) of the t-distribution. We find that using a t-distribution with low degrees of freedom can diminish the effects of outliers, but in the examples discussed the results stop short of outright rejection of observations.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110444
    Description:

    General population health surveys often include small samples of smokers. Few longitudinal studies specific to smoking have been carried out. We discuss the development of the Ontario Tobacco Survey (OTS), which combines rolling longitudinal and repeated cross-sectional components. The OTS began in July 2005, using random selection and data collection by telephone. Every six months, new samples of smokers and non-smokers provide data on smoking behaviours and attitudes. Smokers enter a panel study and are followed for changes in smoking influences and behaviour. The design is proving to be cost-effective in meeting the sample requirements for multiple research objectives.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110370
    Description:

    Many countries conduct surveys that focus specifically on their population's health. Because health plays a key role in most aspects of life, health data are also often collected in population surveys on other topics. The subject matter of population health surveys broadly encompasses physical and mental health, dental health, disabilities, substance abuse, health risk factors, nutrition, health promotion, health care utilization and quality, health coverage, and costs. Some surveys focus on specific health conditions, whereas others aim to obtain an overall health assessment. Health is often an important component in longitudinal studies, particularly in birth and aging cohorts. Information about health can be collected through respondents' reports (for themselves and sometimes for others), by medical examinations, and by collecting biological measures. There is serious concern about the accuracy of health information collected through respondents' reports. Logistical issues, cost considerations, and respondent cooperation feature prominently when the information is collected by medical examinations. Ethical and privacy issues are often important, particularly when DNA and biomarkers are involved. International comparability of health measures is of growing importance. This paper reviews the methodology for a range of health surveys and discusses the challenges in obtaining accurate data in this field.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110410
    Description:

    The U.S. Survey of Occupational Illnesses and Injuries (SOII) is a large-scale establishment survey conducted by the Bureau of Labor Statistics to measure incidence rates and impact of occupational illnesses and injuries within specified industries at the national and state levels. This survey currently uses relatively simple procedures for detection and treatment of outliers. The outlier-detection methods center on comparison of reported establishment-level incidence rates to the corresponding distribution of reports within specified cells defined by the intersection of state and industry classifications. The treatment methods involve replacement of standard probability weights with a weight set equal to one, followed by a benchmark adjustment.

    One could use more complex methods for detection and treatment of outliers for the SOII, e.g., detection methods that use influence functions, probability weights and multivariate observations, or treatment methods based on Winsorization or M-estimation. Evaluation of the practical benefits of these more complex methods requires one to consider three important factors. First, severe outliers are relatively rare, but when they occur, they may have a severe impact on SOII estimators in cells defined by the intersection of states and industries. Consequently, practical evaluation of the impact of outlier methods focuses primarily on the tails of the distributions of estimators, rather than on standard aggregate performance measures like variance or mean squared error. Second, the analytic and data-based evaluations focus on the incremental improvement obtained through use of the more complex methods, relative to the performance of the simple methods currently in place. Third, development of the abovementioned tools requires somewhat nonstandard asymptotics that reflect trade-offs in the effects associated with, respectively, increasing sample sizes, increasing numbers of publication cells, and changing tails of the underlying distributions of observations.
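
    Of the treatment methods named above, Winsorization is the simplest to illustrate: extreme incidence rates are pulled in to percentile cutpoints computed within the state-by-industry cell. A minimal sketch, with assumed 1st/99th percentile cutpoints rather than anything the SOII actually uses:

    ```python
    import numpy as np

    def winsorize(rates, lower=1.0, upper=99.0):
        """Winsorize establishment-level incidence rates at percentile
        cutpoints computed within a state-by-industry cell."""
        rates = np.asarray(rates, float)
        lo, hi = np.percentile(rates, [lower, upper])
        # values beyond the cutpoints are replaced by the cutpoints themselves
        return np.clip(rates, lo, hi)
    ```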

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110401
    Description:

    The Australian Bureau of Statistics (ABS) will begin the formation of a Statistical Longitudinal Census Data Set (SLCD) by choosing a 5% sample of people from the 2006 population census to be linked probabilistically with subsequent censuses. A long-term aim is to use the power of the rich longitudinal demographic data provided by the SLCD to shed light on a variety of issues which cannot be addressed using cross-sectional data. The SLCD may be further enhanced by probabilistically linking it with births, deaths, immigration settlements or disease registers. This paper gives a brief description of recent developments in data linking at the ABS, outlines the data linking methodology and quality measures we have considered and summarises preliminary results using Census Dress Rehearsal data.
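
    Probabilistic linkage of this kind is conventionally based on Fellegi-Sunter log-likelihood-ratio weights summed over comparison fields. The ABS paper does not publish its comparison fields or m/u parameters, so everything below illustrates only the general technique, with invented values.

    ```python
    import numpy as np

    def match_weight(agreements, m_probs, u_probs):
        """Fellegi-Sunter weight for one record pair.

        agreements: booleans, one per comparison field (e.g., birth year,
        sex, surname code). m_probs/u_probs: P(agree | true match) and
        P(agree | non-match) for each field.
        """
        w = 0.0
        for agree, m, u in zip(agreements, m_probs, u_probs):
            w += np.log2(m / u) if agree else np.log2((1 - m) / (1 - u))
        return w

    # Pair agreeing on sex and birth year but not surname code
    print(match_weight([True, True, False],
                       m_probs=[0.99, 0.95, 0.90],
                       u_probs=[0.50, 0.01, 0.001]))
    ```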

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110453
    Description:

    National food and nutrition surveys provide critical information to support understanding of the complex relationship between health and diet in the population. Many of these surveys use 24-hour recall methodology, which collects at a detailed level all foods and beverages consumed over a day. Often it is the longer-term intake of foods and nutrients that is of interest, and a number of techniques are available that allow estimation of population usual intakes. These techniques require that at least one repeat 24-hour recall be collected from at least a subset of the population in order to estimate the intra-individual variability of intakes. Deciding on the number of individuals required to provide a repeat is an important step in the survey design: too few repeat individuals compromise the ability to estimate usual intakes, but large numbers of repeats are costly and impose added burden on respondents. This paper looks at the statistical issues related to the number of repeat individuals, assessing the impact of the number of repeaters on the stability of and uncertainty in the estimate of intra-individual variability, and provides guidance on the required number of repeat respondents.
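
    The quantity at stake is the intra-individual (day-to-day) variance, which only the repeat responders identify: under a simple additive model, half the squared difference between a person's two recall days is an unbiased estimate of it, and the standard error of the pooled estimate shows directly how precision depends on the number of repeaters. A minimal sketch under those assumptions, with illustrative names:

    ```python
    import numpy as np

    def within_person_variance(day1, day2):
        """Estimate intra-individual variance from paired 24-hour recalls.

        For each repeat responder, (d1 - d2)^2 / 2 is an unbiased estimate
        of the within-person variance; the standard error of the pooled
        mean shows how stability improves with more repeat responders.
        """
        d1, d2 = np.asarray(day1, float), np.asarray(day2, float)
        contrib = (d1 - d2) ** 2 / 2.0
        est = contrib.mean()
        se = contrib.std(ddof=1) / np.sqrt(len(contrib))
        return est, se
    ```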

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110426
    Description:

    This paper describes the sample design used to satisfy the objectives and logistics of the Canadian Health Measures Survey. Among the challenges in developing the design were the need to select respondents close to clinics, the difficulty of achieving the desired sample size for young people, and subsampling for measures associated with exposure to environmental contaminants. The sample design contains solutions to those challenges: the establishment of collection sites, the use of more than one sample frame, and a respondent selection strategy.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110427
    Description:

    The National Health and Nutrition Examination Survey (NHANES) is one of a series of health-related programs sponsored by the United States National Center for Health Statistics. A unique feature of NHANES is the administration of a complete medical examination for each respondent in the sample. To standardize administration, these examinations are carried out in mobile examination centers (MECs). The examination includes physical measurements, tests such as eye and dental examinations, and the collection of blood and urine specimens for laboratory testing. NHANES is an ongoing annual health survey of the noninstitutionalized civilian population of the United States. The major analytic goals of NHANES include estimating the number and percentage of persons in the U.S. population and in designated subgroups with selected diseases and risk factors. The sample design for NHANES must balance the requirements for efficient annual and multiyear samples against the flexibility to change key design parameters, so that the survey remains responsive to the needs of the research and health policy communities. This paper discusses the challenges involved in designing and implementing a sample selection process that satisfies the goals of NHANES.

    Release date: 2008-03-17
