Statistics by subject – Statistical methods

All (106): showing 25 of 106 results

  • Articles and reports: 12-001-X200800210764
    Description:

    This paper considers situations where the target response value is either zero or an observation from a continuous distribution. A typical example analyzed in the paper is the assessment of literacy proficiency, with the possible outcome being either zero, indicating illiteracy, or a positive score measuring the level of literacy. Our interest is in how to obtain valid estimates of the average response, or of the proportion of positive responses, in small areas for which only small samples or no samples are available. As in other small area estimation problems, the small sample sizes in at least some of the sampled areas and/or the existence of nonsampled areas require the use of model-based methods. Available methods, however, are not suitable for this kind of data because of the mixed distribution of the responses, which has a large peak at zero juxtaposed with a continuous distribution for the rest of the responses. We therefore develop a suitable two-part random effects model and show how to fit the model, assess its goodness of fit, compute the small area estimators of interest and measure their precision. The proposed method is illustrated using simulated data and data obtained from a literacy survey conducted in Cambodia.

    Release date: 2008-12-23
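
The zero-plus-continuous structure described above can be illustrated with a minimal two-part estimate. This is only a sketch under hypothetical data, not the paper's random effects model: the probability of a positive response and the mean of the positive responses are estimated separately and then combined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical area sample: many exact zeros plus a continuous positive part.
y = np.concatenate([np.zeros(40), rng.lognormal(mean=1.0, sigma=0.5, size=60)])

# Part 1: probability of a positive response.
p_hat = np.mean(y > 0)

# Part 2: mean of the positive responses only.
mu_hat = y[y > 0].mean()

# Two-part estimate of the overall mean: E[Y] = P(Y > 0) * E[Y | Y > 0].
mean_hat = p_hat * mu_hat
print(p_hat, round(mean_hat, 3))
```

The paper's contribution is to put random effects on both parts so that the two components can be predicted for areas with little or no sample; the decomposition itself is the part shown here.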

  • Articles and reports: 12-001-X200800210759
    Description:

    The analysis of stratified multistage sample data requires the use of design information, such as stratum and primary sampling unit (PSU) identifiers or associated replicate weights, in variance estimation. In some public release data files, such design information is masked in an effort to limit disclosure risk while still allowing the user to obtain valid variance estimates. For example, in area surveys with a limited number of PSUs, the original PSUs are split and/or recombined to construct pseudo-PSUs with swapped second- or subsequent-stage sampling units. Such PSU masking methods, however, clearly distort the clustering structure of the sample design, yielding biased variance estimates, possibly with systematic patterns between the variance estimates obtained from the unmasked and masked PSU identifiers. Some previous work observed patterns in the ratio of the masked and unmasked variance estimates when plotted against the unmasked design effect. This paper investigates the effect of PSU masking on variance estimates under cluster sampling with respect to various aspects, including the clustering structure and the degree of masking. We also seek a PSU masking strategy, based on swapping subsequent-stage sampling units, that helps reduce the resulting biases of the variance estimates. For illustration, we used data from the National Health Interview Survey (NHIS) with some artificial modification. The proposed strategy performs very well in reducing the biases of variance estimates. Both theory and empirical results indicate that the effect of PSU masking on variance estimates is modest with minimal swapping of subsequent-stage sampling units. The proposed masking strategy has been applied to the 2003-2004 National Health and Nutrition Examination Survey (NHANES) data release.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210758
    Description:

    We propose a method for estimating the variance of estimators of changes over time, a method that takes account of all the components of these estimators: the sampling design, treatment of non-response, treatment of large companies, correlation of non-response from one wave to another, the effect of using a panel, robustification, and calibration using a ratio estimator. This method, which serves to determine the confidence intervals of changes over time, is then applied to the Swiss survey of value added.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210768
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. As well, it sometimes contains information on structural or management changes in the journal.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210762
    Description:

    This paper considers the optimum allocation in multivariate stratified sampling as a nonlinear matrix optimisation of integers. As a particular case, a nonlinear problem of the multi-objective optimisation of integers is studied. A fully detailed example illustrating some of the proposed techniques is provided at the end of the work.

    Release date: 2008-12-23
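
For background, the single-variable building block of this allocation problem is Neyman allocation, which distributes a sample of size n across strata in proportion to N_h * S_h. The sketch below (hypothetical stratum sizes and standard deviations, not the paper's multivariate integer formulation) includes the integer-rounding step that motivates treating the problem as integer optimisation.

```python
import numpy as np

def neyman_allocation(N_h, S_h, n):
    """Neyman allocation n_h proportional to N_h * S_h, rounded to integers summing to n."""
    N_h, S_h = np.asarray(N_h, float), np.asarray(S_h, float)
    weights = N_h * S_h / (N_h * S_h).sum()
    exact = n * weights
    n_h = np.floor(exact).astype(int)
    # Distribute the remaining units to the strata with the largest remainders.
    remainder = exact - n_h
    for idx in np.argsort(-remainder)[: n - n_h.sum()]:
        n_h[idx] += 1
    return n_h

print(neyman_allocation([500, 300, 200], [10.0, 20.0, 5.0], 100))
```

With several survey variables, each variable implies a different optimal allocation, which is what turns the problem into the multi-objective integer optimisation the paper studies.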

  • Articles and reports: 12-001-X200800210761
    Description:

    Optimum stratification is the method of choosing the best boundaries that make strata internally homogeneous, given some sample allocation. In order to make the strata internally homogeneous, the strata should be constructed in such a way that the strata variances for the characteristic under study are as small as possible. This can be achieved effectively if the distribution of the main study variable is known, by creating strata that cut the range of the distribution at suitable points. If the frequency distribution of the study variable is unknown, it may be approximated from past experience or from prior knowledge obtained in a recent study. In this paper the problem of finding Optimum Strata Boundaries (OSB) is considered as the problem of determining Optimum Strata Widths (OSW). The problem is formulated as a Mathematical Programming Problem (MPP) that minimizes the variance of the estimated population parameter under Neyman allocation, subject to the restriction that the sum of the widths of all the strata equals the total range of the distribution. The distributions of the study variable are considered as continuous, with Triangular and Standard Normal density functions. The formulated MPPs, which turn out to be multistage decision problems, can then be solved using the dynamic programming technique proposed by Bühler and Deutler (1975). Numerical examples are presented to illustrate the computational details. The results obtained are also compared with the method of Dalenius and Hodges (1959) using an example of the normal distribution.

    Release date: 2008-12-23
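
The Dalenius and Hodges (1959) comparison method mentioned in the abstract is the classical cumulative square-root-of-frequency rule. A minimal sketch, on hypothetical data: bin the study variable, cumulate the square roots of the bin frequencies, and cut that cumulative scale into equal parts.

```python
import numpy as np

def cum_sqrt_f_boundaries(x, n_strata, n_bins=50):
    """Dalenius-Hodges cum-sqrt(f) rule for approximate optimum stratum boundaries."""
    freq, edges = np.histogram(x, bins=n_bins)
    cum = np.cumsum(np.sqrt(freq))
    # Cut the cumulative sqrt-frequency scale into n_strata equal parts.
    targets = cum[-1] * np.arange(1, n_strata) / n_strata
    cut_bins = np.searchsorted(cum, targets)
    # Return the right edge of each bin where a target level is reached.
    return edges[cut_bins + 1]

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
print(cum_sqrt_f_boundaries(x, n_strata=4))
```

The paper's dynamic-programming formulation solves the exact optimum-width problem; the cum-sqrt(f) rule above is the approximate benchmark it is compared against.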

  • Articles and reports: 12-001-X200800210755
    Description:

    Dependent interviewing (DI) is used in many longitudinal surveys to "feed forward" data from one wave to the next. Though it is a promising technique that has been demonstrated to enhance data quality in certain respects, relatively little is known about how it is actually administered in the field. This research seeks to address this issue through behavior coding. Various styles of DI were employed in the English Longitudinal Study of Ageing (ELSA) in January 2006, and recordings were made of pilot field interviews. These recordings were analysed to determine whether the questions (particularly the DI aspects) were administered appropriately and to explore respondents' reactions to the fed-forward data. Of particular interest was whether respondents confirmed or challenged the previously reported information, whether the prior wave data came into play when respondents were providing their current-wave answers, and how any discrepancies were negotiated by the interviewer and respondent. Also of interest was the effectiveness of various styles of DI. For example, in some cases the prior wave data was brought forward and respondents were asked to confirm it explicitly; in other cases the previous data was read and respondents were asked if the situation was still the same. Results indicate varying levels of compliance in terms of initial question-reading, and suggest that some styles of DI may be more effective than others.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210757
    Description:

    Sample weights can be calibrated to reflect the known population totals of a set of auxiliary variables. Predictors of finite population totals calculated using these weights have low bias if these variables are related to the variable of interest, but can have high variance if too many auxiliary variables are used. This article develops an "adaptive calibration" approach, where the auxiliary variables to be used in weighting are selected using sample data. Adaptively calibrated estimators are shown to have lower mean squared error and better coverage properties than non-adaptive estimators in many cases.

    Release date: 2008-12-23
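
The underlying calibration step can be sketched as follows. This is only the basic ratio-calibration adjustment, not the paper's adaptive variable-selection rule, and all data and totals below are hypothetical: design weights are rescaled so the weighted auxiliary total matches the known population total.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 5, n)             # auxiliary variable with a known population total
y = 2.0 * x + rng.normal(0, 0.5, n)  # variable of interest, related to x
w = np.full(n, 10.0)                 # design weights (self-weighting sample of N = 2000)
X_total = 6000.0                     # hypothetical known population total of x

# Ratio calibration: scale weights so the weighted x-total matches the known total.
w_cal = w * X_total / np.sum(w * x)

print(round(np.sum(w_cal * x), 1))   # reproduces the auxiliary total by construction
```

The "adaptive" part of the paper is deciding, from the sample itself, which auxiliary variables are worth including in such an adjustment, trading lower bias against the variance inflation of over-calibration.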

  • Articles and reports: 12-001-X200800210756
    Description:

    In longitudinal surveys nonresponse often occurs in a pattern that is not monotone. We consider estimation of time-dependent means under the assumption that the nonresponse mechanism is last-value-dependent. Since the last value itself may be missing when nonresponse is nonmonotone, the nonresponse mechanism under consideration is nonignorable. We propose an imputation method by first deriving some regression imputation models according to the nonresponse mechanism and then applying nonparametric regression imputation. We assume that the longitudinal data follow a Markov chain with finite second-order moments. No other assumption is imposed on the joint distribution of longitudinal data and their nonresponse indicators. A bootstrap method is applied for variance estimation. Some simulation results and an example concerning the Current Employment Survey are presented.

    Release date: 2008-12-23
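
The bootstrap variance step can be sketched in its simplest i.i.d. form; the paper's version is adapted to the imputation scheme and the longitudinal design, and the data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=5.0, scale=2.0, size=150)  # hypothetical completed data

B = 500
boot_means = np.empty(B)
for b in range(B):
    # Resample the sample with replacement and recompute the estimator.
    idx = rng.integers(0, len(y), len(y))
    boot_means[b] = y[idx].mean()

var_boot = boot_means.var(ddof=1)  # bootstrap variance estimate of the sample mean
print(round(var_boot, 4))
```

In the paper, each bootstrap replicate would also repeat the nonparametric regression imputation, so that the variance estimate reflects the imputation step as well as the sampling.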

  • Articles and reports: 12-001-X200800210763
    Description:

    The present work illustrates a sampling strategy useful for obtaining a planned sample size for domains belonging to different partitions of the population, while guaranteeing that the sampling errors of the domain estimates are lower than given thresholds. The sampling strategy, which covers the multivariate multi-domain case, is useful when the overall sample size is bounded and, consequently, the standard solution of a stratified sample with strata given by the cross-classification of the variables defining the different partitions is not feasible, since the number of strata exceeds the overall sample size. The proposed sampling strategy is based on the balanced sampling selection technique and on GREG-type estimation. Its main advantage is computational feasibility, which allows one to easily implement an overall small area strategy that considers the sampling design and the estimator jointly and improves the efficiency of the direct domain estimators. An empirical simulation on real population data with different domain estimators shows the empirical properties of the examined sampling strategy.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210754
    Description:

    The context of the discussion is the increasing incidence of international surveys, of which one is the International Tobacco Control (ITC) Policy Evaluation Project, which began in 2002. The ITC country surveys are longitudinal, and their aim is to evaluate the effects of policy measures being introduced in various countries under the WHO Framework Convention on Tobacco Control. The challenges of organization, data collection and analysis in international surveys are reviewed and illustrated. Analysis is an increasingly important part of the motivation for large scale cross-cultural surveys. The fundamental challenge for analysis is to discern the real response (or lack of response) to policy change, separating it from the effects of data collection mode, differential non-response, external events, time-in-sample, culture, and language. Two problems relevant to statistical analysis are discussed. The first problem is the question of when and how to analyze pooled data from several countries, in order to strengthen conclusions which might be generally valid. While in some cases this seems to be straightforward, there are differing opinions on the extent to which pooling is possible and reasonable. It is suggested that for formal comparisons, random effects models are of conceptual use. The second problem is to find models of measurement across cultures and data collection modes which will enable calibration of continuous, binary and ordinal responses, and produce comparisons from which extraneous effects have been removed. It is noted that hierarchical models provide a natural way of relaxing requirements of model invariance across groups.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210760
    Description:

    The design of a stratified simple random sample without replacement from a finite population deals with two main issues: the definition of a rule to partition the population into strata, and the allocation of sampling units among the selected strata. This article examines a tree-based strategy which addresses these issues jointly when the survey is multipurpose and multivariate information, quantitative or qualitative, is available. Strata are formed through a hierarchical divisive algorithm that selects finer and finer partitions by minimizing, at each step, the sample allocation required to achieve the precision levels set for each surveyed variable. In this way, large numbers of constraints can be satisfied without drastically increasing the sample size, and without discarding variables selected for stratification or diminishing the number of their class intervals. Furthermore, the algorithm tends not to define empty or almost empty strata, thus avoiding the need to collapse strata. The procedure was applied to redesign the Italian Farm Structure Survey. The results indicate that the gain in efficiency achieved using our strategy is nontrivial. For a given sample size, the procedure achieves the required precision by exploiting a number of strata which is usually a very small fraction of the number of strata available when combining all possible classes from any of the covariates.

    Release date: 2008-12-23

  • Surveys and statistical programs – Documentation: 62F0026M2009001
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending, which gathers information on the spending habits, dwelling characteristics and household equipment of Canadian households. The survey covers private households in the 10 provinces. (The territories are surveyed every second year, starting in 1999.)

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. One section describes the various statistics that can be created using expenditure data (e.g., budget share, market share, aggregates and medians).

    Release date: 2008-12-22

  • Technical products: 75F0002M2008005
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes. Sample surveys are subject to sampling errors. To account for these errors, each estimate presented in the "Income Trends in Canada" series comes with a quality indicator based on the coefficient of variation. However, other factors must also be considered to ensure that the data are used properly. Statistics Canada devotes considerable time and effort to controlling errors at every stage of the survey and to maximizing fitness for use. Nevertheless, the survey design and the data processing can restrict fitness for use. It is the policy of Statistics Canada to furnish users with measures of data quality so that they can interpret the data properly. This report summarizes the quality measures of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.

    Release date: 2008-08-20

  • Articles and reports: 82-622-X2008001
    Description:

    In this study, I examine the factorial validity of selected modules from the Canadian Survey of Experiences with Primary Health Care (CSE-PHC), in order to determine the potential for combining the items within each module into summary indices representing global primary health care concepts. The modules examined were: Patient Assessment of Chronic Illness Care (PACIC), Patient Activation (PA), Managing Own Health Care (MOHC), and Confidence in the Health Care System (CHCS). Confirmatory factor analyses were conducted on each module to assess the degree to which multiple observed items reflected the presence of common latent factors. While a four-factor model was initially specified for the PACIC instrument on the basis of prior theory and research, it did not fit the data well; rather, a revised two-factor model was found to be most appropriate. These two factors were labelled: "Whole Person Care" and "Coordination of Care". The remaining modules studied here (i.e., PA, MOHC, and CHCS) were all well-represented by single-factor models. The results suggest that the original factor structure of the PACIC developed within studies using clinical samples does not hold in general populations, although the precise reasons for this are not clear. Further empirical investigation will be required to shed more light on this discrepancy. The two factors identified here for the PACIC, as well as the single factors produced for the PA, MOHC, and CHCS could be used as the basis of summary indices for use in further analyses with the CSE-PHC.

    Release date: 2008-07-08

  • Articles and reports: 12-001-X200800110607
    Description:

    Respondent incentives are increasingly used as a means of combating falling response rates and resulting risks of nonresponse bias. Nonresponse in panel surveys is particularly problematic, since even low wave-on-wave nonresponse rates can lead to substantial cumulative losses; if nonresponse is differential, this may lead to increasing bias across waves. Although the effects of incentives have been studied extensively in cross-sectional contexts, little is known about cumulative effects across waves of a panel. We provide new evidence about the effects of continued incentive payments on attrition, bias and item nonresponse, using data from a large-scale, multi-wave, mixed mode incentive experiment on a UK government panel survey of young people. In this study, incentives significantly reduced attrition, far outweighing negative effects on item response rates in terms of the amount of information collected by the survey per issued case. Incentives had proportionate effects on retention rates across a range of respondent characteristics and as a result did not reduce attrition bias in terms of those characteristics. The effects of incentives on retention rates were larger for unconditional than conditional incentives and larger in postal than telephone mode. Across waves, the effects on attrition decreased somewhat, although the effects on item nonresponse and the lack of effect on bias remained constant. The effects of incentives at later waves appeared to be independent of incentive treatments and mode of data collection at earlier waves.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110606
    Description:

    Data from election polls in the US are typically presented in two-way categorical tables, and there are many polls before the actual election in November. For example, in the 1998 Buckeye State Poll for governor there are three polls, January, April and October; the first category represents the candidates (e.g., Fisher, Taft and other) and the second category represents the current status of the voters (likely or not likely to vote for governor of Ohio). There is a substantial number of undecided voters for one or both categories in all three polls, and we use a Bayesian method to allocate the undecided voters to the three candidates. This method permits modeling different patterns of missingness under ignorable and nonignorable assumptions, and a multinomial-Dirichlet model is used to estimate the cell probabilities, which can help to predict the winner. We propose a time-dependent nonignorable nonresponse model for the three tables. Here, a nonignorable nonresponse model is centered on an ignorable nonresponse model to induce some flexibility and uncertainty about ignorability or nonignorability. As competitors we also consider two other models, an ignorable and a nonignorable nonresponse model. These latter two models assume a common stochastic process to borrow strength over time. Markov chain Monte Carlo methods are used to fit the models. We also construct a parameter that can potentially be used to predict the winner among the candidates in the November election.

    Release date: 2008-06-26
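
The multinomial-Dirichlet step can be sketched in its simplest ignorable-nonresponse form; the counts below are hypothetical, and the paper's actual models are time-dependent and allow nonignorability. Cell probabilities are drawn from the Dirichlet posterior given the decided counts, and the undecided voters are allocated in proportion to them.

```python
import numpy as np

rng = np.random.default_rng(4)

decided = np.array([420, 380, 50])  # hypothetical counts: Fisher, Taft, other
undecided = 150
alpha_prior = np.ones(3)            # uniform Dirichlet prior

# Posterior over cell probabilities given the decided counts.
draws = rng.dirichlet(alpha_prior + decided, size=5000)
p_mean = draws.mean(axis=0)

# Allocate the undecided voters in proportion to the posterior mean probabilities.
allocation = undecided * p_mean
print(np.round(allocation, 1))
```

A nonignorable version would let the allocation probabilities for the undecided differ from those of the decided, which is the flexibility the paper's centered model introduces.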

  • Articles and reports: 12-001-X200800110615
    Description:

    We consider optimal sampling rates in element-sampling designs when the anticipated analysis is survey-weighted linear regression and the estimands of interest are linear combinations of regression coefficients from one or more models. Methods are first developed assuming that exact design information is available in the sampling frame and then generalized to situations in which some design variables are available only as aggregates for groups of potential subjects, or from inaccurate or old data. We also consider design for estimation of combinations of coefficients from more than one model. A further generalization allows for flexible combinations of coefficients chosen to improve estimation of one effect while controlling for another. Potential applications include estimation of means for several sets of overlapping domains, or improving estimates for subpopulations such as minority races by disproportionate sampling of geographic areas. In the motivating problem of designing a survey on care received by cancer patients (the CanCORS study), potential design information included block-level census data on race/ethnicity and poverty as well as individual-level data. In one study site, an unequal-probability sampling design using the subjectss residential addresses and census data would have reduced the variance of the estimator of an income effect by 25%, or by 38% if the subjects' races were also known. With flexible weighting of the income contrasts by race, the variance of the estimator would be reduced by 26% using residential addresses alone and by 52% using addresses and races. Our methods would be useful in studies in which geographic oversampling by race-ethnicity or socioeconomic characteristics is considered, or in any study in which characteristics available in sampling frames are measured with error.

    Release date: 2008-06-26

  • Technical products: 11-522-X200600110424
    Description:

    The International Tobacco Control (ITC) Policy Evaluation China Survey uses a multi-stage unequal probability sampling design with upper level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper level clusters refuse to participate in the survey and have to be replaced by substitute units, selected from units not included in the initial sample and once again using the randomized systematic PPS sampling method. Under such a scenario the first order inclusion probabilities of the final selected units are very difficult to calculate and the second order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first and the second order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and codes for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.

    Release date: 2008-06-26
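
The abstract's simulation-based idea can be sketched as follows (the paper provides R/S-PLUS code; this is a hypothetical Python analogue without the refusal/substitution step). For plain randomized systematic PPS the first-order inclusion probabilities are known to equal n times the size measure, which lets the Monte Carlo estimates be checked; in the refusal/substitution scenario no such closed form exists, and only the simulation remains.

```python
import numpy as np

rng = np.random.default_rng(5)

def randomized_systematic_pps(p, n, rng):
    """Randomized systematic PPS: random order, then systematic selection."""
    order = rng.permutation(len(p))
    cum = np.cumsum(n * np.asarray(p)[order])   # cumulated scaled sizes, total = n
    u = rng.uniform(0, 1)
    hits = np.searchsorted(cum, u + np.arange(n))
    return order[hits]

sizes = np.array([5, 10, 15, 20, 25, 25], dtype=float)
p = sizes / sizes.sum()
n = 2

# Monte Carlo estimate of the first-order inclusion probabilities.
R = 20000
counts = np.zeros(len(p))
for _ in range(R):
    counts[randomized_systematic_pps(p, n, rng)] += 1
pi_hat = counts / R

print(np.round(pi_hat, 3))  # should be close to n * p
```

Second-order probabilities can be estimated the same way by counting how often each pair of units appears together in the simulated samples.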

  • Articles and reports: 12-001-X200800110616
    Description:

    With complete multivariate data the BACON algorithm (Billor, Hadi and Velleman 2000) yields a robust estimate of the covariance matrix. The corresponding Mahalanobis distance may be used for multivariate outlier detection. When items are missing the EM algorithm is a convenient way to estimate the covariance matrix at each iteration step of the BACON algorithm. In finite population sampling the EM algorithm must be enhanced to estimate the covariance matrix of the population rather than of the sample. A version of the EM algorithm for survey data following a multivariate normal model, the EEM algorithm (Estimated Expectation Maximization), is proposed. The combination of the two algorithms, the BACON-EEM algorithm, is applied to two datasets and compared with alternative methods.

    Release date: 2008-06-26
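
A hedged sketch of the complete-data BACON iteration only (not the EEM extension, and without the algorithm's small-subset correction factors; data are hypothetical): grow a basic subset of non-outlying points, recomputing Mahalanobis distances from the subset's own mean and covariance until the subset stabilizes.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical bivariate data with a few gross outliers appended.
clean = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8.0, 8.0], [9.0, -7.0], [-8.0, 9.0]])
X = np.vstack([clean, outliers])

CUTOFF = 7.378  # chi-square(2 df) 0.975 quantile, hard-coded to avoid SciPy

def mahalanobis_sq(X, subset):
    """Squared Mahalanobis distances of all rows from the subset's mean/covariance."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

# Start from the half of the points closest to the coordinate-wise median.
d0 = np.abs(X - np.median(X, axis=0)).sum(axis=1)
subset = np.argsort(d0)[: len(X) // 2]
for _ in range(20):
    d2 = mahalanobis_sq(X, subset)
    new_subset = np.where(d2 <= CUTOFF)[0]
    if len(new_subset) == len(subset) and np.array_equal(new_subset, np.sort(subset)):
        break
    subset = new_subset

outlier_idx = np.where(mahalanobis_sq(X, subset) > CUTOFF)[0]
print(outlier_idx)  # the three appended outliers should be among those flagged
```

In the paper's setting, the covariance estimate inside each iteration is replaced by the EEM estimate, which handles item nonresponse and targets the population covariance rather than the sample's.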

  • Articles and reports: 12-001-X200800110619
    Description:

    Small area prediction based on random effects, called EBLUP, is a procedure for constructing estimates for small geographical areas or small subpopulations using existing survey data. The total of the small area predictors is often forced to equal the direct survey estimate and such predictors are said to be calibrated. Several calibrated predictors are reviewed and a criterion that unifies the derivation of these calibrated predictors is presented. The predictor that is the unique best linear unbiased predictor under the criterion is derived and the mean square error of the calibrated predictors is discussed. Implicit in the imposition of the restriction is the possibility that the small area model is misspecified and the predictors are biased. Augmented models with one additional explanatory variable for which the usual small area predictors achieve the self-calibrated property are considered. Simulations demonstrate that calibrated predictors have slightly smaller bias compared to those of the usual EBLUP predictor. However, if the bias is a concern, a better approach is to use an augmented model with an added auxiliary variable that is a function of area size. In the simulation, the predictors based on the augmented model had smaller MSE than EBLUP when the incorrect model was used for prediction. Furthermore, there was a very small increase in MSE relative to EBLUP if the auxiliary variable was added to the correct model.

    Release date: 2008-06-26
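
The benchmarking restriction described above can be sketched as a simple ratio adjustment, which is one of the calibrated predictors the paper reviews (all numbers below are hypothetical): the model-based predictors are scaled so their total equals the direct survey estimate.

```python
import numpy as np

# Hypothetical small-area predictions (e.g., EBLUP totals for 5 areas).
pred = np.array([120.0, 80.0, 200.0, 60.0, 140.0])
direct_total = 630.0  # direct design-based estimate of the overall total

# Ratio benchmarking: calibrated predictors are forced to sum to the direct estimate.
pred_cal = pred * direct_total / pred.sum()
print(pred_cal)
```

The paper's unifying criterion leads to a best linear unbiased predictor among such calibrated predictors, and its simulations compare this route with augmenting the model by an area-size auxiliary variable instead.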

  • Articles and reports: 12-001-X200800110611
    Description:

    In finite population sampling prior information is often available in the form of partial knowledge about an auxiliary variable, for example its mean may be known. In such cases, the ratio estimator and the regression estimator are often used for estimating the population mean of the characteristic of interest. The Polya posterior has been developed as a noninformative Bayesian approach to survey sampling. It is appropriate when little or no prior information about the population is available. Here we show that it can be extended to incorporate types of partial prior information about auxiliary variables. We will see that it typically yields procedures with good frequentist properties even in some problems where standard frequentist methods are difficult to apply.

    Release date: 2008-06-26
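
A hedged sketch of the basic Polya posterior simulation (no auxiliary information, hypothetical data): each observed sample is grown into a completed population by Polya urn draws, and repeated completions approximate the posterior distribution of a population quantity.

```python
import numpy as np

rng = np.random.default_rng(7)

def polya_urn_population(sample, N, rng):
    """Simulate one completed population of size N from the Polya posterior."""
    urn = list(sample)
    while len(urn) < N:
        # Each new unit is equally likely to be any value already in the urn.
        urn.append(urn[rng.integers(0, len(urn))])
    return np.array(urn)

sample = np.array([2.0, 3.0, 5.0, 7.0, 11.0])
pop_means = [polya_urn_population(sample, N=100, rng=rng).mean() for _ in range(2000)]
print(round(float(np.mean(pop_means)), 2))  # centered near the sample mean of 5.6
```

The paper's extension constrains these simulated populations to respect partial prior information about auxiliary variables, such as a known auxiliary mean.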

  • Articles and reports: 12-001-X200800110618
    Description:

    The National Health and Nutrition Examination Survey (NHANES) is one of a series of health-related programs sponsored by the United States National Center for Health Statistics. A unique feature of NHANES is the administration of a complete medical examination for each respondent in the sample. To standardize administration, these examinations are carried out in mobile examination centers. The examination includes physical measurements, tests such as eye and dental examinations, and the collection of blood and urine specimens for laboratory testing. NHANES is an ongoing annual health survey of the noninstitutionalized civilian population of the United States. The major analytic goals of NHANES include estimating the number and percentage of persons in the U.S. population and in designated subgroups with selected diseases and risk factors. The sample design for NHANES must create a balance between the requirements for efficient annual and multiyear samples and the flexibility that allows changes in key design parameters to make the survey more responsive to the needs of the research and health policy communities. This paper discusses the challenges involved in designing and implementing a sample selection process that satisfies the goals of NHANES.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110642
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. As well, it sometimes contains information on structural or management changes in the journal.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110613
    Description:

    The International Tobacco Control (ITC) Policy Evaluation Survey of China uses a multi-stage unequal probability sampling design with upper level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper level clusters refuse to participate in the survey and have to be replaced by substitute units, selected from units not included in the initial sample and once again using the randomized systematic PPS sampling method. Under such a scenario the first order inclusion probabilities of the final selected units are very difficult to calculate and the second order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first and the second order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and codes for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.

    Release date: 2008-06-26

Data (0)

Data (0) (0 results)

Your search for "" found no results in this section of the site.

You may try:

Analysis (37)

Analysis (37) (25 of 37 results)

  • Articles and reports: 12-001-X200800210764
    Description:

    This paper considers situations where the target response value is either zero or an observation from a continuous distribution. A typical example analyzed in the paper is the assessment of literacy proficiency with the possible outcome being either zero, indicating illiteracy, or a positive score measuring the level of literacy. Our interest is in how to obtain valid estimates of the average response, or the proportion of positive responses in small areas, for which only small samples or no samples are available. As in other small area estimation problems, the small sample sizes in at least some of the sampled areas and/or the existence of nonsampled areas requires the use of model based methods. Available methods, however, are not suitable for this kind of data because of the mixed distribution of the responses, having a large peak at zero, juxtaposed to a continuous distribution for the rest of the responses. We develop, therefore, a suitable two-part random effects model and show how to fit the model and assess its goodness of fit, and how to compute the small area estimators of interest and measure their precision. The proposed method is illustrated using simulated data and data obtained from a literacy survey conducted in Cambodia.

    Release date: 2008-12-23
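The two-part structure described above can be made concrete with a toy calculation (a hypothetical sketch, not the authors' random effects model): the mean of a zero-inflated variable decomposes as the proportion of positive responses times the mean level among the positives:

```python
def two_part_mean(scores):
    """Decompose the mean of a zero-inflated variable as p * E[Y | Y > 0]."""
    n = len(scores)
    positives = [y for y in scores if y > 0]
    p = len(positives) / n                      # estimated proportion of positive responses
    mu_pos = sum(positives) / len(positives)    # mean score among the positives
    return p, mu_pos, p * mu_pos                # the product equals the overall mean
```

In the paper, these two direct components are replaced by predictions from a two-part random effects model, which is what makes estimation possible in areas with small or no samples.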

  • Articles and reports: 12-001-X200800210759
    Description:

    The analysis of stratified multistage sample data requires the use of design information such as stratum and primary sampling unit (PSU) identifiers, or associated replicate weights, in variance estimation. In some public release data files, such design information is masked as an effort to avoid their disclosure risk and yet to allow the user to obtain valid variance estimation. For example, in area surveys with a limited number of PSUs, the original PSUs are split or/and recombined to construct pseudo-PSUs with swapped second or subsequent stage sampling units. Such PSU masking methods, however, obviously distort the clustering structure of the sample design, yielding biased variance estimates possibly with certain systematic patterns between two variance estimates from the unmasked and masked PSU identifiers. Some of the previous work observed patterns in the ratio of the masked and unmasked variance estimates when plotted against the unmasked design effect. This paper investigates the effect of PSU masking on variance estimates under cluster sampling regarding various aspects including the clustering structure and the degree of masking. Also, we seek a PSU masking strategy through swapping of subsequent stage sampling units that helps reduce the resulting biases of the variance estimates. For illustration, we used data from the National Health Interview Survey (NHIS) with some artificial modification. The proposed strategy performs very well in reducing the biases of variance estimates. Both theory and empirical results indicate that the effect of PSU masking on variance estimates is modest with minimal swapping of subsequent stage sampling units. The proposed masking strategy has been applied to the 2003-2004 National Health and Nutrition Examination Survey (NHANES) data release.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210758
    Description:

    We propose a method for estimating the variance of estimators of changes over time, a method that takes account of all the components of these estimators: the sampling design, treatment of non-response, treatment of large companies, correlation of non-response from one wave to another, the effect of using a panel, robustification, and calibration using a ratio estimator. This method, which serves to determine the confidence intervals of changes over time, is then applied to the Swiss survey of value added.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210768
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. It sometimes also contains information on structural or management changes in the journal.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210762
    Description:

    This paper treats the optimum allocation in multivariate stratified sampling as a nonlinear integer matrix optimisation problem. As a particular case, a nonlinear multi-objective integer optimisation problem is studied. A fully detailed example applying some of the proposed techniques is provided at the end of the work.

    Release date: 2008-12-23
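The single-variable Neyman allocation is the classical building block that the multivariate problem above generalizes. A minimal sketch (continuous, not integer-constrained, so it sidesteps the paper's actual difficulty):

```python
def neyman_allocation(N_h, S_h, n):
    """Neyman allocation: n_h proportional to N_h * S_h (stratum size times stratum SD)."""
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]
```

With several study variables, each variable implies a different optimal allocation, which is why the multivariate case becomes a multi-objective optimisation problem.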

  • Articles and reports: 12-001-X200800210761
    Description:

    Optimum stratification is the method of choosing the best boundaries that make strata internally homogeneous, given some sample allocation. To make the strata internally homogeneous, they should be constructed so that the strata variances for the characteristic under study are as small as possible. This can be achieved effectively if the distribution of the main study variable is known, by creating strata by cutting the range of the distribution at suitable points. If the frequency distribution of the study variable is unknown, it may be approximated from past experience or from prior knowledge obtained in a recent study. In this paper the problem of finding Optimum Strata Boundaries (OSB) is considered as the problem of determining Optimum Strata Widths (OSW). The problem is formulated as a Mathematical Programming Problem (MPP), which minimizes the variance of the estimated population parameter under Neyman allocation subject to the restriction that the sum of the widths of all the strata equals the total range of the distribution. The distributions of the study variable are considered as continuous with Triangular and Standard Normal density functions. The formulated MPPs, which turn out to be multistage decision problems, can then be solved using the dynamic programming technique proposed by Bühler and Deutler (1975). Numerical examples are presented to illustrate the computational details. The results obtained are also compared with the method of Dalenius and Hodges (1959) with an example of normal distribution.

    Release date: 2008-12-23
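The Dalenius and Hodges (1959) method used above as a benchmark is the classical cum √f rule: cut the cumulative square root of the frequency distribution into equal pieces. A minimal sketch (the frequency table and bin edges are hypothetical inputs, and this is the benchmark, not the paper's dynamic programming approach):

```python
import math

def dalenius_hodges_boundaries(freqs, bin_edges, L):
    """Approximate optimum strata boundaries via the cum sqrt(f) rule.

    freqs: frequency count per histogram bin; bin_edges: the len(freqs)+1 bin
    edges; L: desired number of strata. Returns the L-1 interior boundaries.
    """
    cum, t = [], 0.0
    for f in freqs:
        t += math.sqrt(f)
        cum.append(t)                    # cumulative sqrt(f) scale
    step = t / L
    targets = [step * k for k in range(1, L)]
    bounds, i = [], 0
    for tg in targets:                   # find the bin where each cut point falls
        while cum[i] < tg:
            i += 1
        bounds.append(bin_edges[i + 1])
    return bounds
```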

  • Articles and reports: 12-001-X200800210755
    Description:

    Dependent interviewing (DI) is used in many longitudinal surveys to "feed forward" data from one wave to the next. Though it is a promising technique which has been demonstrated to enhance data quality in certain respects, relatively little is known about how it is actually administered in the field. This research seeks to address this issue through behavior coding. Various styles of DI were employed in the English Longitudinal Study of Ageing (ELSA) in January, 2006, and recordings were made of pilot field interviews. These recordings were analysed to determine whether the questions (particularly the DI aspects) were administered appropriately and to explore the respondent's reaction to the fed-forward data. Of particular interest was whether respondents confirmed or challenged the previously-reported information, whether the prior wave data came into play when respondents were providing their current-wave answers, and how any discrepancies were negotiated by the interviewer and respondent. Also of interest was to examine the effectiveness of various styles of DI. For example, in some cases the prior wave data was brought forward and respondents were asked to explicitly confirm it; in other cases the previous data was read and respondents were asked if the situation was still the same. Results indicate varying levels of compliance in terms of initial question-reading, and suggest that some styles of DI may be more effective than others.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210757
    Description:

    Sample weights can be calibrated to reflect the known population totals of a set of auxiliary variables. Predictors of finite population totals calculated using these weights have low bias if these variables are related to the variable of interest, but can have high variance if too many auxiliary variables are used. This article develops an "adaptive calibration" approach, where the auxiliary variables to be used in weighting are selected using sample data. Adaptively calibrated estimators are shown to have lower mean squared error and better coverage properties than non-adaptive estimators in many cases.

    Release date: 2008-12-23
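Calibration in its simplest form (one auxiliary variable with a common ratio adjustment; a sketch, not the paper's adaptive procedure) rescales the design weights so that the weighted auxiliary total matches the known population total:

```python
def ratio_calibrate(design_weights, x, X_total):
    """Rescale weights so that sum(w_i * x_i) equals the known total X_total."""
    hat_X = sum(w * xi for w, xi in zip(design_weights, x))
    g = X_total / hat_X                      # common calibration factor
    return [w * g for w in design_weights]
```

The adaptive approach described above goes further: it uses the sample data itself to decide which auxiliary variables enter the calibration, trading off the bias reduction from each variable against the variance cost of weighting on too many.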

  • Articles and reports: 12-001-X200800210756
    Description:

    In longitudinal surveys nonresponse often occurs in a pattern that is not monotone. We consider estimation of time-dependent means under the assumption that the nonresponse mechanism is last-value-dependent. Since the last value itself may be missing when nonresponse is nonmonotone, the nonresponse mechanism under consideration is nonignorable. We propose an imputation method by first deriving some regression imputation models according to the nonresponse mechanism and then applying nonparametric regression imputation. We assume that the longitudinal data follow a Markov chain with finite second-order moments. No other assumption is imposed on the joint distribution of longitudinal data and their nonresponse indicators. A bootstrap method is applied for variance estimation. Some simulation results and an example concerning the Current Employment Survey are presented.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210763
    Description:

    The present work illustrates a sampling strategy useful for obtaining a planned sample size for domains belonging to different partitions of the population, while guaranteeing that the sampling errors of the domain estimates are lower than given thresholds. The sampling strategy, which covers the multivariate multi-domain case, is useful when the overall sample size is bounded and, consequently, the standard solution of using a stratified sample with strata given by the cross-classification of the variables defining the different partitions is not feasible, since the number of strata exceeds the overall sample size. The proposed sampling strategy is based on the balanced sampling selection technique and on GREG-type estimation. Its main advantage is computational feasibility, which allows one to easily implement an overall small area strategy considering jointly the sampling design and the estimator, improving the efficiency of the direct domain estimators. An empirical simulation on real population data with different domain estimators shows the empirical properties of the examined sample strategy.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210754
    Description:

    The context of the discussion is the increasing incidence of international surveys, of which one is the International Tobacco Control (ITC) Policy Evaluation Project, which began in 2002. The ITC country surveys are longitudinal, and their aim is to evaluate the effects of policy measures being introduced in various countries under the WHO Framework Convention on Tobacco Control. The challenges of organization, data collection and analysis in international surveys are reviewed and illustrated. Analysis is an increasingly important part of the motivation for large scale cross-cultural surveys. The fundamental challenge for analysis is to discern the real response (or lack of response) to policy change, separating it from the effects of data collection mode, differential non-response, external events, time-in-sample, culture, and language. Two problems relevant to statistical analysis are discussed. The first problem is the question of when and how to analyze pooled data from several countries, in order to strengthen conclusions which might be generally valid. While in some cases this seems to be straightforward, there are differing opinions on the extent to which pooling is possible and reasonable. It is suggested that for formal comparisons, random effects models are of conceptual use. The second problem is to find models of measurement across cultures and data collection modes which will enable calibration of continuous, binary and ordinal responses, and produce comparisons from which extraneous effects have been removed. It is noted that hierarchical models provide a natural way of relaxing requirements of model invariance across groups.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800210760
    Description:

    The design of a stratified simple random sample without replacement from a finite population deals with two main issues: the definition of a rule to partition the population into strata, and the allocation of sampling units in the selected strata. This article examines a tree-based strategy which plans to approach jointly these issues when the survey is multipurpose and multivariate information, quantitative or qualitative, is available. Strata are formed through a hierarchical divisive algorithm that selects finer and finer partitions by minimizing, at each step, the sample allocation required to achieve the precision levels set for each surveyed variable. In this way, large numbers of constraints can be satisfied without drastically increasing the sample size, and also without discarding variables selected for stratification or diminishing the number of their class intervals. Furthermore, the algorithm tends not to define empty or almost empty strata, thus avoiding the need to collapse strata. The procedure was applied to redesign the Italian Farm Structure Survey. The results indicate that the gain in efficiency achieved using our strategy is nontrivial. For a given sample size, this procedure achieves the required precision by exploiting a number of strata which is usually a very small fraction of the number of strata available when combining all possible classes from any of the covariates.

    Release date: 2008-12-23

  • Articles and reports: 82-622-X2008001
    Description:

    In this study, I examine the factorial validity of selected modules from the Canadian Survey of Experiences with Primary Health Care (CSE-PHC), in order to determine the potential for combining the items within each module into summary indices representing global primary health care concepts. The modules examined were: Patient Assessment of Chronic Illness Care (PACIC), Patient Activation (PA), Managing Own Health Care (MOHC), and Confidence in the Health Care System (CHCS). Confirmatory factor analyses were conducted on each module to assess the degree to which multiple observed items reflected the presence of common latent factors. While a four-factor model was initially specified for the PACIC instrument on the basis of prior theory and research, it did not fit the data well; rather, a revised two-factor model was found to be most appropriate. These two factors were labelled: "Whole Person Care" and "Coordination of Care". The remaining modules studied here (i.e., PA, MOHC, and CHCS) were all well-represented by single-factor models. The results suggest that the original factor structure of the PACIC developed within studies using clinical samples does not hold in general populations, although the precise reasons for this are not clear. Further empirical investigation will be required to shed more light on this discrepancy. The two factors identified here for the PACIC, as well as the single factors produced for the PA, MOHC, and CHCS could be used as the basis of summary indices for use in further analyses with the CSE-PHC.

    Release date: 2008-07-08

  • Articles and reports: 12-001-X200800110607
    Description:

    Respondent incentives are increasingly used as a measure of combating falling response rates and resulting risks of nonresponse bias. Nonresponse in panel surveys is particularly problematic, since even low wave-on-wave nonresponse rates can lead to substantial cumulative losses; if nonresponse is differential, this may lead to increasing bias across waves. Although the effects of incentives have been studied extensively in cross-sectional contexts, little is known about cumulative effects across waves of a panel. We provide new evidence about the effects of continued incentive payments on attrition, bias and item nonresponse, using data from a large scale, multi-wave, mixed mode incentive experiment on a UK government panel survey of young people. In this study, incentives significantly reduced attrition, far outweighing negative effects on item response rates in terms of the amount of information collected by the survey per issued case. Incentives had proportionate effects on retention rates across a range of respondent characteristics and as a result did not reduce attrition bias in terms of those characteristics. The effects of incentives on retention rates were larger for unconditional than conditional incentives and larger in postal than telephone mode. Across waves, the effects on attrition decreased somewhat, although the effects on item nonresponse and the lack of effect on bias remained constant. The effects of incentives at later waves appeared to be independent of incentive treatments and mode of data collection at earlier waves.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110606
    Description:

    Data from election polls in the US are typically presented in two-way categorical tables, and there are many polls before the actual election in November. For example, in the Buckeye State Poll in 1998 for governor there are three polls, January, April and October; the first category represents the candidates (e.g., Fisher, Taft and other) and the second category represents the current status of the voters (likely to vote and not likely to vote for governor of Ohio). There is a substantial number of undecided voters for one or both categories in all three polls, and we use a Bayesian method to allocate the undecided voters to the three candidates. This method permits modeling different patterns of missingness under ignorable and nonignorable assumptions, and a multinomial-Dirichlet model is used to estimate the cell probabilities which can help to predict the winner. We propose a time-dependent nonignorable nonresponse model for the three tables. Here, a nonignorable nonresponse model is centered on an ignorable nonresponse model to induce some flexibility and uncertainty about ignorability or nonignorability. As competitors we also consider two other models, an ignorable and a nonignorable nonresponse model. These latter two models assume a common stochastic process to borrow strength over time. Markov chain Monte Carlo methods are used to fit the models. We also construct a parameter that can potentially be used to predict the winner among the candidates in the November election.

    Release date: 2008-06-26
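Under a multinomial-Dirichlet model, the posterior mean of each cell probability has a simple closed form. As a sketch (ignoring the nonresponse modelling, which is the paper's actual contribution), undecided voters could be allocated proportionally to these posterior means:

```python
def dirichlet_posterior_means(counts, alpha):
    """Posterior mean of multinomial cell probabilities under a Dirichlet prior:
    (count_k + alpha_k) / (n + sum(alpha))."""
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]

def allocate_undecided(counts, undecided, alpha):
    """Split the undecided voters proportionally to the posterior cell means."""
    probs = dirichlet_posterior_means(counts, alpha)
    return [c + undecided * p for c, p in zip(counts, probs)]
```

The paper's models go well beyond this: allocation probabilities there depend on a time-dependent, possibly nonignorable nonresponse mechanism fitted by MCMC.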

  • Articles and reports: 12-001-X200800110615
    Description:

    We consider optimal sampling rates in element-sampling designs when the anticipated analysis is survey-weighted linear regression and the estimands of interest are linear combinations of regression coefficients from one or more models. Methods are first developed assuming that exact design information is available in the sampling frame and then generalized to situations in which some design variables are available only as aggregates for groups of potential subjects, or from inaccurate or old data. We also consider design for estimation of combinations of coefficients from more than one model. A further generalization allows for flexible combinations of coefficients chosen to improve estimation of one effect while controlling for another. Potential applications include estimation of means for several sets of overlapping domains, or improving estimates for subpopulations such as minority races by disproportionate sampling of geographic areas. In the motivating problem of designing a survey on care received by cancer patients (the CanCORS study), potential design information included block-level census data on race/ethnicity and poverty as well as individual-level data. In one study site, an unequal-probability sampling design using the subjects' residential addresses and census data would have reduced the variance of the estimator of an income effect by 25%, or by 38% if the subjects' races were also known. With flexible weighting of the income contrasts by race, the variance of the estimator would be reduced by 26% using residential addresses alone and by 52% using addresses and races. Our methods would be useful in studies in which geographic oversampling by race-ethnicity or socioeconomic characteristics is considered, or in any study in which characteristics available in sampling frames are measured with error.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110616
    Description:

    With complete multivariate data the BACON algorithm (Billor, Hadi and Vellemann 2000) yields a robust estimate of the covariance matrix. The corresponding Mahalanobis distance may be used for multivariate outlier detection. When items are missing the EM algorithm is a convenient way to estimate the covariance matrix at each iteration step of the BACON algorithm. In finite population sampling the EM algorithm must be enhanced to estimate the covariance matrix of the population rather than of the sample. A version of the EM algorithm for survey data following a multivariate normal model, the EEM algorithm (Estimated Expectation Maximization), is proposed. The combination of the two algorithms, the BACON-EEM algorithm, is applied to two datasets and compared with alternative methods.

    Release date: 2008-06-26
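The outlier-detection step above rests on the Mahalanobis distance computed from the (robustly estimated) covariance matrix. A bivariate distance computation, using the closed-form 2x2 inverse, can be sketched as follows (this is only the distance itself, not the BACON or EEM iterations):

```python
def mahalanobis2(point, mean, cov):
    """Mahalanobis distance of a bivariate observation from a mean, given a
    2x2 covariance matrix cov = ((a, b), (c, d))."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det, so the
    # quadratic form [dx, dy] Sigma^{-1} [dx, dy]^T expands to:
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return q ** 0.5
```

In the BACON iteration, observations whose distance exceeds a chi-square-based cutoff are excluded and the covariance matrix is re-estimated from the remaining "basic subset"; the EEM extension supplies that covariance estimate when items are missing.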

  • Articles and reports: 12-001-X200800110619
    Description:

    Small area prediction based on random effects, called EBLUP, is a procedure for constructing estimates for small geographical areas or small subpopulations using existing survey data. The total of the small area predictors is often forced to equal the direct survey estimate and such predictors are said to be calibrated. Several calibrated predictors are reviewed and a criterion that unifies the derivation of these calibrated predictors is presented. The predictor that is the unique best linear unbiased predictor under the criterion is derived and the mean square error of the calibrated predictors is discussed. Implicit in the imposition of the restriction is the possibility that the small area model is misspecified and the predictors are biased. Augmented models with one additional explanatory variable for which the usual small area predictors achieve the self-calibrated property are considered. Simulations demonstrate that calibrated predictors have slightly smaller bias compared to those of the usual EBLUP predictor. However, if the bias is a concern, a better approach is to use an augmented model with an added auxiliary variable that is a function of area size. In the simulation, the predictors based on the augmented model had smaller MSE than EBLUP when the incorrect model was used for prediction. Furthermore, there was a very small increase in MSE relative to EBLUP if the auxiliary variable was added to the correct model.

    Release date: 2008-06-26
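One common calibrated-predictor recipe can be sketched in a few lines: form EBLUP-style composite predictors under a Fay-Herriot-type model, then ratio-adjust them so their total matches the direct survey estimate. This is a hedged illustration of the general idea, not the paper's unified criterion or its best predictor:

```python
def shrinkage_predictors(direct, synthetic, A, D):
    """EBLUP-style composites: gamma_i * direct_i + (1 - gamma_i) * synthetic_i,
    with gamma_i = A / (A + D_i); A is the model variance, D_i the sampling
    variance of the direct estimate for area i."""
    preds = []
    for y, s, d in zip(direct, synthetic, D):
        gamma = A / (A + d)     # more weight on the direct estimate when precise
        preds.append(gamma * y + (1 - gamma) * s)
    return preds

def ratio_benchmark(preds, direct_total):
    """Calibrate the predictors so that their total equals the direct total."""
    g = direct_total / sum(preds)
    return [p * g for p in preds]
```

Ratio benchmarking is only one of the calibrated predictors reviewed in the paper; the abstract's point is that they can all be derived from a single criterion, and that augmenting the model with an area-size covariate can be preferable when bias is the concern.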

  • Articles and reports: 12-001-X200800110611
    Description:

    In finite population sampling prior information is often available in the form of partial knowledge about an auxiliary variable, for example its mean may be known. In such cases, the ratio estimator and the regression estimator are often used for estimating the population mean of the characteristic of interest. The Polya posterior has been developed as a noninformative Bayesian approach to survey sampling. It is appropriate when little or no prior information about the population is available. Here we show that it can be extended to incorporate types of partial prior information about auxiliary variables. We will see that it typically yields procedures with good frequentist properties even in some problems where standard frequentist methods are difficult to apply.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110618
    Description:

    The National Health and Nutrition Examination Survey (NHANES) is one of a series of health-related programs sponsored by the United States National Center for Health Statistics. A unique feature of NHANES is the administration of a complete medical examination for each respondent in the sample. To standardize administration, these examinations are carried out in mobile examination centers. The examination includes physical measurements, tests such as eye and dental examinations, and the collection of blood and urine specimens for laboratory testing. NHANES is an ongoing annual health survey of the noninstitutionalized civilian population of the United States. The major analytic goals of NHANES include estimating the number and percentage of persons in the U.S. population and in designated subgroups with selected diseases and risk factors. The sample design for NHANES must create a balance between the requirements for efficient annual and multiyear samples and the flexibility that allows changes in key design parameters to make the survey more responsive to the needs of the research and health policy communities. This paper discusses the challenges involved in designing and implementing a sample selection process that satisfies the goals of NHANES.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110642
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. It sometimes also contains information on structural or management changes in the journal.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110613
    Description:

    The International Tobacco Control (ITC) Policy Evaluation Survey of China uses a multi-stage unequal probability sampling design with upper level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper level clusters refuse to participate in the survey and have to be replaced by substitute units, selected from units not included in the initial sample and once again using the randomized systematic PPS sampling method. Under such a scenario the first order inclusion probabilities of the final selected units are very difficult to calculate and the second order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first and the second order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and codes for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110610
    Description:

    A new generalized regression estimator of a finite population total based on the Box-Cox transformation technique and its variance estimator are proposed under a general unequal probability sampling design. By being design consistent, the proposed estimator maintains the robustness property of the GREG estimator even if the underlying model fails. Furthermore, the Box-Cox technique automatically finds a reasonable transformation for the dependent variable using the data. The robustness and efficiency of the new estimator are evaluated analytically and via Monte Carlo simulation studies.

    Release date: 2008-06-26
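The Box-Cox family underlying the estimator above is the standard power transform. A minimal sketch of the transform and its inverse (the paper's contribution is embedding a data-driven choice of lambda inside a design-consistent GREG estimator, which this does not show):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y: (y^lam - 1)/lam, or log(y) at lam = 0."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def box_cox_inverse(z, lam):
    """Back-transform to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)
```

In practice lambda is chosen from the data (e.g., by maximizing a profile log-likelihood over a grid), after which the working model is fitted on the transformed scale and predictions are back-transformed.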

  • Articles and reports: 12-001-X200800110614
    Description:

    The Canadian Labour Force Survey (LFS) produces monthly estimates of the unemployment rate at national and provincial levels. The LFS also releases unemployment estimates for sub-provincial areas such as Census Metropolitan Areas (CMAs) and Urban Centers (UCs). However, for some sub-provincial areas, the direct estimates are not reliable since the sample size in some areas is quite small. The small area estimation in LFS concerns estimation of unemployment rates for local sub-provincial areas such as CMA/UCs using small area models. In this paper, we will discuss various models including the Fay-Herriot model and cross-sectional and time series models. In particular, an integrated non-linear mixed effects model will be proposed under the hierarchical Bayes (HB) framework for the LFS unemployment rate estimation. Monthly Employment Insurance (EI) beneficiary data at the CMA/UC level are used as auxiliary covariates in the model. A HB approach with the Gibbs sampling method is used to obtain the estimates of posterior means and posterior variances of the CMA/UC level unemployment rates. The proposed HB model leads to reliable model-based estimates in terms of CV reduction. Model fit analysis and comparison of the model-based estimates with the direct estimates are presented in the paper.

    Release date: 2008-06-26

  • Articles and reports: 12-001-X200800110612
    Description:

    Lehtonen and Veijanen (1999) proposed a new model-assisted generalized regression (GREG) estimator of a small area mean under a two-level model. They have shown that the proposed estimator performs better than the customary GREG estimator in terms of average absolute relative bias and average median absolute relative error. We derive the mean squared error (MSE) of the new GREG estimator under the two-level model and compare it to the MSE of the best linear unbiased prediction (BLUP) estimator. We also provide empirical results on the relative efficiency of the estimators. We show that the new GREG estimator exhibits better performance relative to the customary GREG estimator in terms of average MSE and average absolute relative error. We also show that, due to borrowing strength from related small areas, the EBLUP estimator exhibits significantly better performance relative to the customary GREG and the new GREG estimators. We provide simulation results under a model-based set-up as well as under a real finite population.

    Release date: 2008-06-26

Reference (69)

Reference (69) (25 of 69 results)

  • Surveys and statistical programs – Documentation: 62F0026M2009001
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending, which gathers information on the spending habits, dwelling characteristics and household equipment of Canadian households. The survey covers private households in the 10 provinces. (The territories are surveyed every second year, starting in 1999.)

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. One section describes the various statistics that can be created using expenditure data (e.g., budget share, market share, aggregates and medians).

    Release date: 2008-12-22

  • Technical products: 75F0002M2008005
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes. Sample surveys are subject to sampling errors. In order to account for these errors, each estimate presented in the "Income Trends in Canada" series comes with a quality indicator based on the coefficient of variation. However, other factors must also be considered to make sure data are properly used. Statistics Canada devotes considerable time and effort to controlling errors at every stage of the survey and to maximising fitness for use. Nevertheless, the survey design and the data processing could restrict fitness for use. It is the policy at Statistics Canada to furnish users with measures of data quality so that the user is able to interpret the data properly. This report summarizes the set of quality measures of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.

    Release date: 2008-08-20

  • Technical products: 11-522-X200600110424
    Description:

    The International Tobacco Control (ITC) Policy Evaluation China Survey uses a multi-stage unequal probability sampling design with upper level clusters selected by the randomized systematic PPS sampling method. A difficulty arises in the execution of the survey: several selected upper level clusters refuse to participate in the survey and have to be replaced by substitute units, selected from units not included in the initial sample and once again using the randomized systematic PPS sampling method. Under such a scenario the first order inclusion probabilities of the final selected units are very difficult to calculate and the second order inclusion probabilities become virtually intractable. In this paper we develop a simulation-based approach for computing the first and the second order inclusion probabilities when direct calculation is prohibitive or impossible. The efficiency and feasibility of the proposed approach are demonstrated through both theoretical considerations and numerical examples. Several R/S-PLUS functions and codes for the proposed procedure are included. The approach can be extended to handle more complex refusal/substitution scenarios one may encounter in practice.

    Release date: 2008-06-26

  • Technical products: 75F0002M2008002
    Description:

    The Survey of Labour and Income Dynamics (SLID) conducts an annual labour and income interview in January. The data are collected using computer-assisted interviewing; thus no paper questionnaire is required for data collection. The questions, responses and interview flow for labour and income are documented in another SLID research paper. This document presents the information for the 2007 entry and exit portions of the labour and income interview (reference year 2006).

    The entry-exit component consists of five separate modules. The entry module is the first set of data collected; it updates the place of residence, housing conditions and expenses, as well as the household composition. For each person identified in entry, the demographics module collects (or updates) the person's name, date of birth, sex and marital status. Then the relationships module identifies (or updates) the relationship between each respondent and every other household member. The exit module includes questions on whom to contact for the next interview and the names, phone numbers and addresses of two contacts, to be used only if future tracing of respondents is required. An overview of the tracing component is also included in this document.

    Release date: 2008-05-30

  • Technical products: 75F0002M2008003
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey which collects information related to the standard of living of individuals and their families. By interviewing the same people over a period of six years, changes and the causes of these changes can be monitored.

    A preliminary interview collecting background information is conducted with all respondents aged 16 and over who enter the SLID sample. Preliminary interviews are conducted for new household members during their first labour and income interview after they join the household. A labour and income interview is conducted each year with all respondents 16 years of age and over.

    The purpose of this document is to present the questions, possible responses and question flows for the 2007 preliminary, labour and income questionnaire (for the 2006 reference year).

    Release date: 2008-05-30

  • Technical products: 11-522-X2006001
    Description:

    Since 1984, an annual international symposium on methodological issues has been sponsored by Statistics Canada. Proceedings have been available since 1987. Symposium 2006 was the twenty-third in Statistics Canada's series of international symposia on methodological issues. Each year the symposium focuses on a particular theme; in 2006 the theme was "Methodological Issues In Measuring Population Health".

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110413
    Description:

    The National Health and Nutrition Examination Survey (NHANES) has been conducted by the National Center for Health Statistics for over forty years. The survey collects information on the health and nutritional status of the United States population using in-person interviews and standardized physical examinations conducted in mobile examination centers. During the course of these forty years, numerous lessons have been learned about the conduct of a survey using direct physical measures. Examples of these "lessons learned" are described and provide a guide for other organizations and countries as they plan similar surveys.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110450
    Description:

    Using survey and contact attempt history data collected with the 2005 National Health Interview Survey (NHIS), a multi-purpose health survey conducted by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC), we set out to explore the impact of participant concerns/reluctance on data quality, as measured by rates of partially complete interviews and item nonresponse. Overall, results show that respondents from households where some type of concern or reluctance (e.g., "too busy," "not interested") was expressed produced higher rates of partially complete interviews and item nonresponse than respondents from households where concern/reluctance was not expressed. Differences by type of concern were also identified.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110436
    Description:

    The 2006/2007 New Zealand Health Survey sample was designed to meet a range of objectives, the most challenging of which was to achieve sufficient precision for subpopulations of interest, particularly the indigenous Maori population. About 14% of New Zealand's population are Maori. This group is geographically clustered to some extent, but even so most Maori live in areas which have relatively low proportions of Maori, making it difficult to sample this population efficiently. Disproportionate sampling and screening were used to achieve sufficient sample size while maintaining low design effects.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110416
    Description:

    Application of standard methods to survey data without accounting for the design features and weight adjustments can lead to erroneous inferences. Bootstrap methods offer an attractive option to the analyst for taking account of the design features and weight adjustments. The data file consists of the full-sample final weights and associated bootstrap final weights for a large number of bootstrap replicates as well as the observed data on the sample elements. We show how such data files can be used to analyze survey data in a straightforward manner using weighted estimating equations. A one-step estimating function bootstrap method that avoids some difficulties with the bootstrap is also discussed.
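    A data file of the kind described — full-sample final weights plus many sets of bootstrap final weights — is typically used by recomputing the weighted estimate once per replicate and taking the variability of the replicate estimates as the variance estimate. A minimal sketch for a weighted mean (the same pattern applies to any weighted estimating equation):

```python
def weighted_mean(y, w):
    """Weighted point estimate of a mean."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def bootstrap_variance(y, full_weights, bootstrap_weights):
    """Survey bootstrap variance estimate.

    bootstrap_weights is a list of B replicate weight vectors; the
    variance estimator is the mean squared deviation of the replicate
    estimates around the full-sample estimate.
    """
    theta_hat = weighted_mean(y, full_weights)
    reps = [weighted_mean(y, wb) for wb in bootstrap_weights]
    return theta_hat, sum((t - theta_hat) ** 2 for t in reps) / len(reps)

# Tiny illustrative data: 3 sample elements, 2 bootstrap replicates
y = [2.0, 4.0, 6.0]
w = [1.0, 1.0, 2.0]
boot_w = [[1.2, 0.8, 2.0], [0.9, 1.1, 2.0]]
theta, var = bootstrap_variance(y, w, boot_w)
# theta = 4.5; var = ((4.4 - 4.5)**2 + (4.55 - 4.5)**2) / 2 = 0.00625
```

In practice B is large (hundreds of replicates) and the divisor convention may differ by survey; both are details the analyst takes from the survey's documentation.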

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110428
    Description:

    In the last two decades, considerable international effort has gone into the development of summary measures of population health that integrate information on mortality and non-fatal health outcomes, and international policy interest in such indicators is increasing. There are two main classes of summary measures of population health: health gaps and health expectancies. The Disability-Adjusted Life Year (DALY) is the best-known health gap measure; it quantifies the gap between a population's actual health and a normative health goal, defined in terms of a global standard life table specifying the healthy years of life lost due to a death at any given age.

    This paper gives an overview of the Global Burden of Disease (GBD) conceptual framework, the relationship of the DALY to other measures of population health, and the GBD analytical approach, with particular attention to issues in (1) dealing with biased and missing data, (2) dealing with uncertainty and (3) specific technical issues in ensuring cross-population comparability. The latter include dealing with variations in quality and completeness of cause of death information, explicit use of a comprehensive framework and internal consistency checks for improving comparability of estimates of incidence, prevalence and mortality for causes, the assessment of disability weights, and techniques for improving the comparability of the assessment of the disease burden attributable to risk factors.
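    The DALY combines years of life lost to premature mortality (YLL) and years lived with disability (YLD). A worked sketch of the basic arithmetic, without the age weighting and discounting used in some GBD variants, and with illustrative numbers rather than actual GBD inputs:

```python
def daly(deaths, life_expectancy_at_death, incident_cases,
         disability_weight, avg_duration_years):
    """Basic DALY = YLL + YLD, without age weighting or discounting.

    YLL = deaths x standard life expectancy at the age of death
    YLD = incident cases x disability weight x average duration
    """
    yll = deaths * life_expectancy_at_death
    yld = incident_cases * disability_weight * avg_duration_years
    return yll + yld

# Illustrative: 100 deaths losing 30 years each, plus 1,000 incident
# cases with disability weight 0.2 lasting 5 years on average.
total = daly(100, 30, 1_000, 0.2, 5)
# 100*30 + 1000*0.2*5 = 3000 + 1000 = 4000 DALYs
```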

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110429
    Description:

    During the last three decades, there has been general acceptance of an approach to describing health states of individuals in terms of multiple domains of health, and in developing self-report instruments that seek information on each of these domains. A health state is thus a multi-dimensional attribute of an individual that reflects his or her levels on the various components or domains of health. Thus, a health state differs from pathology, risk factors or etiology, and from health service encounters or interventions.

    How to describe health states is a central challenge in measuring health. The relationship of health states to other aspects of health, such as future non-fatal health outcomes or risk of mortality, needs to be examined. The way people report their own health varies consistently with factors such as education, sex, age and other cultural factors. Different people use different response category cut-points across cultures or population sub-groups, and this 'response shift' implies that self-reported categorical data are not comparable across individuals. The responses cannot be directly used to measure health without adjustment.

    In recognition of this, the WHO World Health Surveys (WHS) used a set of questions across a core set of domains to measure health states and employed vignettes to detect and correct for biases in self-report, in order to adjust for response category cut-point shifts. This paper will describe the instrument used in the WHS and the methods used to provide cross-population comparable data. It will present results from the WHS demonstrating the existence of systematic reporting biases, the ability of respondents to rate vignettes, and their use to adjust for biases in order to make data more comparable. Future strategies to address these problems will be discussed.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110419
    Description:

    Health services research generally relies on observational data to compare outcomes of patients receiving different therapies. Comparisons of patient groups in observational studies may be biased, in that outcomes differ due to both the effects of treatment and the effects of patient prognosis. In some cases, especially when data are collected on detailed clinical risk factors, these differences can be controlled for using statistical or epidemiological methods. In other cases, when unmeasured characteristics of the patient population affect both the decision to provide therapy and the outcome, these differences cannot be removed using standard techniques. Use of health administrative data requires particular cautions in undertaking observational studies since important clinical information does not exist. We discuss several statistical and epidemiological approaches to remove overt (measurable) and hidden (unmeasurable) bias in observational studies. These include regression model-based case-mix adjustment, propensity-based matching, redefining the exposure variable of interest, and the econometric technique of instrumental variable (IV) analysis. These methods are illustrated using examples from the medical literature including prediction of one-year mortality following heart attack; the return to health care spending in higher spending U.S. regions in terms of clinical and financial benefits; and the long-term survival benefits of invasive cardiac management of heart attack patients. It is possible to use health administrative data for observational studies provided careful attention is paid to addressing issues of reverse causation and unmeasured confounding.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110399
    Description:

    In health studies, it is quite common to collect binary or count repeated responses along with a set of multi-dimensional covariates over a small period of time from a large number of independent families, where the families are selected from a finite population by using certain complex sampling designs. It is of interest to examine the effects of the covariates on the familial longitudinal responses after taking the variation in the family effects as well as the longitudinal correlations of the repeated responses into account. In this paper, we review the advantages and drawbacks of the existing methodologies for the estimation of the regression effects, the variance of the family effects and the longitudinal correlations. We then outline the advantages of a new unified generalized quasilikelihood approach in analyzing the complex design based familial longitudinal data. Some existing numerical studies are discussed as illustrations of the methodologies considered in the paper.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110412
    Description:

    The Canadian Health Measures Survey (CHMS) represents Statistics Canada's first health survey employing a comprehensive battery of direct physical measurements of health. The CHMS will collect directly measured health data on a representative sample of 5,000 Canadians aged 6 to 79 in 2007 to 2009. After a comprehensive in-home health interview, respondents report to a mobile examination centre where direct health measures are performed. Measures include fitness tests, anthropometry, objective physical activity monitoring, spirometry, blood pressure measurements, oral health measures, and blood and urine sampling. Blood and urine are analyzed for measures of chronic disease, infectious disease, nutritional indicators and environmental biomarkers. This survey presents many challenges rarely encountered by other Statistics Canada surveys; some of these challenges are described in this paper. The data collected through the CHMS are unique and represent a valuable health surveillance and research resource for Canada.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110420
    Description:

    Most major survey research organizations in the United States and Canada do not include wireless telephone numbers when conducting random-digit-dialed (RDD) household telephone surveys. In this paper, we offer the most up-to-date estimates available from the U.S. National Center for Health Statistics and Statistics Canada concerning the prevalence and demographic characteristics of the wireless-only population. We then present data from the U.S. National Health Interview Survey on the health and health care access of wireless-only adults, and we examine the potential for coverage bias when health research is conducted using RDD surveys that exclude wireless telephone numbers.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110404
    Description:

    Pursuing reductions in cost and response burden in survey programs has led to increased use of information available in administrative databases. Linkage between these two data sources is a way to exploit their complementary nature and maximize their respective usefulness. This paper discusses the various ways we have performed record linkage between the Canadian Community Health Survey (CCHS) and the Health Person-Oriented Information (HPOI) databases. The files resulting from selected linkage methods are used in an analysis of risk factors for having been hospitalized for heart disease. The sensitivity of the analysis with respect to the various linkage approaches is investigated.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110437
    Description:

    The New Zealand Ministry of Health has expanded its population health survey, the New Zealand Health Survey (NZHS), to include a questionnaire specifically on child health. The principal aim of the NZHS child questionnaire is to collect health data from parents or caregivers that can be used for monitoring population-level child health status, health service utilisation, and the health risk and protective behaviours that have their origins in childhood. Previously, only data collected through child contact with the health system, for example hospital administration records and disease/injury databases, have been available for monitoring child health in New Zealand. This paper reviews the questionnaire development for the child health component of the 2006/2007 New Zealand Health Survey, including topic selection, question development, cognitive-testing, preliminary sample design, final questionnaire drafting, and dress rehearsal testing.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110422
    Description:

    Many population surveys collecting food consumption data use 24 hour recall methodology to capture detailed one day intakes. In order to estimate longer term intakes of foods and nutrients from these data, methods have been developed that required a repeat recall to be collected from at least a subset of responders in order to estimate day to day variability. During the Canadian Community Health Survey Cycle 2.2 Nutrition Focus Survey, most first interviews were collected in person and most repeat interviews were conducted by telephone. This paper looks at the impact of the mode of interview on the reported foods and nutrients on both the first day and the repeat day and on the estimation of intra individual variability between the first and the second interviews.
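    With two recall days per respondent, one common way to estimate the intra-individual (day-to-day) variance component is from the paired differences, under the assumption of no systematic difference between the two days — the very assumption a mode comparison like this one puts to the test. A minimal sketch:

```python
def within_person_variance(day1, day2):
    """Estimate intra-individual (day-to-day) variance from paired
    24-hour recalls: half the mean squared within-person difference.

    Assumes the two days are interchangeable, i.e., no systematic
    day or interview-mode effect.
    """
    n = len(day1)
    return sum((a - b) ** 2 for a, b in zip(day1, day2)) / (2 * n)

# Illustrative intakes (e.g., grams of some nutrient) for 2 respondents
sigma2_w = within_person_variance([200.0, 310.0], [240.0, 290.0])
# ((-40)**2 + 20**2) / (2 * 2) = 500.0
```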

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110430
    Description:

    In this presentation, Mr. Murray discusses the notion of functional health status and proposes an agenda for developing comparable methods of measuring this concept.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110448
    Description:

    Approaches used to select records for tabulation of national injury hospitalization data were identified. Three of the approaches were based on the principal diagnosis in the hospital separation record; the other three required that the record contain a code for an external cause of injury. Differences within these two main groups resulted in identification of six distinct approaches. Each approach was applied to the same set of hospital separation data. The numbers and types of injury records retrieved with the six approaches are compared and implications of the findings for injury surveillance are discussed.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110438
    Description:

    As part of an effort to design a set of questions for the Current Population Survey (CPS) to measure disability, potential questions were drawn from existing surveys and were cognitively and field tested. Based on an analysis of the test data, a set of seven questions was identified, cognitively tested, and placed in the February 2006 CPS for testing. Analysis of the data revealed a lower overall disability rate as measured in the CPS than in the field test, with lower positive response rates for each question. The data did not indicate an adverse effect on the response rates.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110409
    Description:

    In unequal-probability-of-selection samples, correlations between the probability of selection and the sampled data can induce bias. Weights equal to the inverse of the probability of selection are often used to counteract this bias. Highly disproportional sample designs have large weights, which can introduce unnecessary variability in statistics such as the population mean estimate. Weight trimming reduces large weights to a fixed cutpoint value and adjusts weights below this value to maintain the untrimmed weight sum. This reduces variability at the cost of introducing some bias. Standard approaches are not "data-driven": they do not use the data to make the appropriate bias-variance tradeoff, or else do so in a highly inefficient fashion. This presentation develops Bayesian variable selection methods for weight trimming to supplement standard, ad hoc design-based methods in disproportional probability-of-inclusion designs where variance due to sample weights exceeds bias correction. These methods are used to estimate linear and generalized linear regression model population parameters in the context of stratified and poststratified known-probability sample designs. Applications are considered in the context of traffic injury survey data, in which highly disproportional sample designs are often used.
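    The standard, non-Bayesian trimming step described above — cap weights at a cutpoint and adjust the remaining weights so the total weight sum is preserved — can be sketched as follows (iterating because redistribution can push weights back over the cutpoint):

```python
def trim_weights(weights, cutpoint):
    """Trim weights at a cutpoint and redistribute the excess
    proportionally to the untrimmed weights so the total weight
    sum is unchanged."""
    w = list(weights)
    while True:
        over = [i for i, wi in enumerate(w) if wi > cutpoint]
        if not over:
            break
        excess = sum(w[i] - cutpoint for i in over)
        for i in over:
            w[i] = cutpoint
        under = [i for i, wi in enumerate(w) if wi < cutpoint]
        if not under:
            break  # everything at the cutpoint; sum cannot be preserved
        scale = 1 + excess / sum(w[i] for i in under)
        for i in under:
            w[i] *= scale
    return w

w = trim_weights([1.0, 2.0, 10.0], cutpoint=5.0)
# converges to [3.0, 5.0, 5.0]: the sum of 13 is preserved
# and no weight exceeds the cutpoint
```

This reduces the variance contribution of extreme weights at the cost of some bias; the Bayesian methods of the presentation aim to make that tradeoff data-driven rather than fixed by an ad hoc cutpoint.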

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110417
    Description:

    The coefficients of regression equations are often parameters of interest for health surveys and such surveys are usually of complex design with differential sampling rates. We give estimators for the regression coefficients for complex surveys that are superior to ordinary expansion estimators under the subject matter model, but also retain desirable design properties. Theoretical and Monte Carlo properties are presented.
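    The ordinary expansion estimator that the proposed estimators improve upon solves the design-weighted normal equations. A minimal sketch for simple linear regression with design weights — this is the baseline estimator only, not the superior model-assisted estimators of the paper:

```python
def weighted_slr(x, y, w):
    """Design-weighted simple linear regression (intercept, slope),
    solving the weighted normal equations in closed form."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

b0, b1 = weighted_slr([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [1.0, 2.0, 3.0])
# the data lie exactly on y = 1 + 2x, so any positive weights
# recover intercept 1 and slope 2
```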

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110444
    Description:

    General population health surveys often include small samples of smokers. Few longitudinal studies specific to smoking have been carried out. We discuss the development of the Ontario Tobacco Survey (OTS), which combines rolling longitudinal and repeated cross-sectional components. The OTS began in July 2005, using random selection and data collection by telephone. Every 6 months, new samples of smokers and non-smokers provide data on smoking behaviours and attitudes. Smokers enter a panel study and are followed for changes in smoking influences and behaviour. The design is proving to be cost-effective in meeting sample requirements for multiple research objectives.

    Release date: 2008-03-17
