Statistics by subject – Statistical methods


Analysis (27) (25 of 27 results)

  • Articles and reports: 75-001-X200510613145
    Description:

    Changes in hours worked normally track employment changes very closely. Recently, however, employment has increased more than hours, resulting in an unprecedented gap. In effect, the average annual hours worked have decreased by the equivalent of two weeks. Many factors can affect the hours worked. Some are structural or cyclical - population aging, industrial shifts, the business cycle, natural disasters, legislative changes or personal preferences. Others are a result of the survey methodology. How have the various factors contributed to the recent drop in hours of work?

    Release date: 2005-09-21

  • Articles and reports: 11F0019M2005261
    Description:

    The upbringing of children is modelled as a modified principal-agent problem in which children attempt to maximize their own well-being when faced with a parenting strategy chosen by the parent to maximize the parent's perception of family well-being. Thus children, as well as parents, are players, but children have higher discount rates than parents. The simultaneity of parenting and child behaviour is confirmed using the 1994 Canadian National Longitudinal Survey of Children.

    Release date: 2005-08-02

  • Articles and reports: 12-001-X20050018083
    Description:

    The advent of computerized record linkage methodology has facilitated the conduct of cohort mortality studies in which exposure data in one database are electronically linked with mortality data from another database. This, however, introduces linkage errors due to mismatching an individual from one database with a different individual from the other database. In this article, the impact of linkage errors on estimates of epidemiological indicators of risk such as standardized mortality ratios and relative risk regression model parameters is explored. It is shown that the observed and expected numbers of deaths are affected in opposite directions and, as a result, these indicators can be subject to bias and additional variability in the presence of linkage errors.

    Release date: 2005-07-21
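
As a hedged illustration of the opposite-direction effect described in the entry above, a minimal simulation sketch (not the paper's model): missed links reduce the observed death count, while unlinked decedents are carried as alive, inflating person-years and hence the expected count. The cohort size and all rates are made up.

```python
import numpy as np

rng = np.random.default_rng(12)
cohort = 50000
follow_up = 10.0                     # years of follow-up
rate_ref = 0.004                     # reference population death rate per year
rate_cohort = 0.006                  # true cohort death rate (true SMR = 1.5)

death_time = rng.exponential(1.0 / rate_cohort, size=cohort)
died = death_time < follow_up

def smr(link_sensitivity):
    """Observed/expected deaths when only a fraction of true deaths are linked."""
    linked = died & (rng.random(cohort) < link_sensitivity)
    observed = linked.sum()
    # unlinked decedents are (wrongly) treated as alive to the end of follow-up
    person_years = np.where(linked, death_time, follow_up).sum()
    expected = rate_ref * person_years
    return observed / expected

print("SMR with perfect linkage:     ", round(smr(1.00), 3))
print("SMR with 90% link sensitivity:", round(smr(0.90), 3))
```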

  • Articles and reports: 12-001-X20050018089
    Description:

    We use hierarchical Bayesian models to analyze body mass index (BMI) data of children and adolescents with nonignorable nonresponse from the Third National Health and Nutrition Examination Survey (NHANES III). Our objective is to predict the finite population mean BMI and the proportion of respondents for domains formed by age, race and sex (covariates in the regression models) in each of thirty-five large counties, accounting for the nonrespondents. Markov chain Monte Carlo methods are used to fit the models (two selection and two pattern mixture) to the NHANES III BMI data. Using a deviance measure and a cross-validation study, we show that the nonignorable selection model is the best among the four models. We also show that inference about BMI is not too sensitive to the model choice. An improvement is obtained by including a spline regression into the selection model to reflect changes in the relationship between BMI and age.

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050019190
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. As well, it sometimes contains information on changes in the structure or management of the journal.

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050018084
    Description:

    At national statistical institutes, experiments embedded in ongoing sample surveys are conducted occasionally to investigate possible effects of alternative survey methodologies on estimates of finite population parameters. To test hypotheses about differences between sample estimates due to alternative survey implementations, a design-based theory is developed for the analysis of completely randomized designs or randomized block designs embedded in general complex sampling designs. For both experimental designs, design-based Wald statistics are derived for the Horvitz-Thompson estimator and the generalized regression estimator. The theory is illustrated with a simulation study.

    Release date: 2005-07-21
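
A minimal sketch of the basic ingredients discussed in the entry above, under an assumed single-stage design with known inclusion probabilities: Horvitz-Thompson totals for the two arms of an embedded completely randomized experiment, and a simple Wald-type statistic for their difference. The variance uses a with-replacement approximation, and all data and names are illustrative; this is not the paper's derivation.

```python
import numpy as np

def ht_total(y, pi_eff):
    """Horvitz-Thompson estimate of a population total with a with-replacement
    variance approximation; pi_eff are the effective inclusion probabilities
    of the units in this arm."""
    w = 1.0 / pi_eff
    m = len(y)
    e = m * w * y                      # single-draw estimates of the total
    return e.mean(), e.var(ddof=1) / m

rng = np.random.default_rng(7)
n = 400
pi = rng.uniform(0.05, 0.20, size=n)   # design inclusion probabilities
y = rng.gamma(2.0, 10.0, size=n)       # observed outcomes

# completely randomized experiment embedded in the sample: two survey versions
arm = rng.integers(0, 2, size=n)
results = []
for k in (0, 1):
    sel = arm == k
    # effective inclusion probability = design prob * share assigned to arm k
    pi_eff = pi[sel] * sel.sum() / n
    results.append(ht_total(y[sel], pi_eff))

(t0, v0), (t1, v1) = results
wald = (t1 - t0) ** 2 / (v0 + v1)   # compare against a chi-square(1) quantile
print(f"HT totals: {t0:,.0f} vs {t1:,.0f}; Wald statistic: {wald:.2f}")
```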

  • Articles and reports: 12-001-X20050018088
    Description:

    When administrative records are geographically linked to census block groups, local-area characteristics from the census can be used as contextual variables, which may be useful supplements to variables that are not directly observable from the administrative records. Often databases contain records that have insufficient address information to permit geographical links with census block groups; the contextual variables for these records are therefore unobserved. We propose a new method that uses information from "matched cases" and multivariate regression models to create multiple imputations for the unobserved variables. Our method outperformed alternative methods in simulation evaluations using census data, and was applied to the dataset for a study on treatment patterns for colorectal cancer patients.

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050018094
    Description:

    Nested error regression models are frequently used in small-area estimation and related problems. Standard regression model selection criteria, when applied to nested error regression models, may result in inefficient model selection methods. We illustrate this point by examining the performance of the C_P statistic through a Monte Carlo simulation study. The inefficiency of the C_P statistic may, however, be rectified by a suitable transformation of the data.

    Release date: 2005-07-21
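
For background on the criterion examined in the entry above, a minimal sketch of Mallows' C_P for an ordinary linear regression (not the nested error setting studied in the paper); the data and candidate models are made up.

```python
import numpy as np

def mallows_cp(X_full, X_sub, y):
    """Mallows' C_P for a candidate submodel: Cp = SSE_p / s2_full - n + 2p,
    where s2_full is the residual variance of the full model."""
    n = len(y)

    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    s2_full = sse(X_full) / (n - X_full.shape[1])
    p = X_sub.shape[1]
    return sse(X_sub) / s2_full - n + 2 * p

# illustrative data: y depends on x1 and x2 but not on x3
rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=1.0, size=n)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3])
print("Cp (x1, x2):", round(mallows_cp(X_full, np.column_stack([ones, x1, x2]), y), 2))
print("Cp (x1 only):", round(mallows_cp(X_full, np.column_stack([ones, x1]), y), 2))
```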

  • Articles and reports: 12-001-X20050018085
    Description:

    Record linkage is a process of pairing records from two files and trying to select the pairs that belong to the same entity. The basic framework uses a match weight to measure the likelihood of a correct match and a decision rule to assign record pairs as "true" or "false" match pairs. Weight thresholds for selecting a record pair as matched or unmatched depend on the desired control over linkage errors. Current methods to determine the selection thresholds and estimate linkage errors can provide divergent results, depending on the type of linkage error and the approach to linkage. This paper presents a case study that uses existing linkage methods to link record pairs but a new simulation approach (SimRate) to help determine selection thresholds and estimate linkage errors. SimRate uses the observed distribution of data in matched and unmatched pairs to generate a large simulated set of record pairs, assigns a match weight to each pair based on specified match rules, and uses the weight curves of the simulated pairs for error estimation.

    Release date: 2005-07-21
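
A minimal Fellegi-Sunter-style sketch of the match-weight-plus-thresholds framework referred to in the entry above; SimRate itself is not reproduced, and the m/u probabilities and thresholds are illustrative assumptions.

```python
import numpy as np

# agreement probabilities for three comparison fields (illustrative values):
# m = P(field agrees | true match), u = P(field agrees | non-match)
m = np.array([0.95, 0.90, 0.80])
u = np.array([0.05, 0.10, 0.02])

agree_wt = np.log2(m / u)                 # weight when a field agrees
disagree_wt = np.log2((1 - m) / (1 - u))  # weight (negative) when it disagrees

def match_weight(agreement_pattern):
    """Total match weight for a record pair given its 0/1 agreement pattern."""
    a = np.asarray(agreement_pattern)
    return float(np.sum(np.where(a == 1, agree_wt, disagree_wt)))

UPPER, LOWER = 6.0, 0.0   # selection thresholds chosen to control linkage error

def decide(pattern):
    w = match_weight(pattern)
    if w >= UPPER:
        return w, "link"
    if w <= LOWER:
        return w, "non-link"
    return w, "clerical review"

for pattern in ([1, 1, 1], [1, 0, 1], [0, 0, 1]):
    print(pattern, decide(pattern))
```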

  • Articles and reports: 12-001-X20050018091
    Description:

    Procedures for constructing vectors of nonnegative regression weights are considered. A vector of regression weights in which the initial weights are the inverses of the approximate conditional inclusion probabilities is introduced. Through a simulation study, weighted regression weights, quadratic programming weights, raking ratio weights, weights from the logit procedure, and likelihood-type weights are compared.

    Release date: 2005-07-21
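
As background for one of the procedures compared in the entry above, a minimal raking ratio (iterative proportional fitting) sketch that scales design weights to match known marginal totals; the margins and data are invented, and the other weighting methods in the comparison are not shown.

```python
import numpy as np

def rake(weights, groups_a, totals_a, groups_b, totals_b, iters=50):
    """Raking ratio adjustment: alternately scale the weights so that the
    weighted counts match known totals for margin A, then for margin B."""
    w = weights.astype(float).copy()
    for _ in range(iters):
        for groups, totals in ((groups_a, totals_a), (groups_b, totals_b)):
            for g, target in totals.items():
                mask = groups == g
                w[mask] *= target / w[mask].sum()
    return w

rng = np.random.default_rng(3)
n = 1000
d = np.full(n, 50.0)                       # initial design weights
sex = rng.integers(0, 2, size=n)           # margin A: two groups
age = rng.integers(0, 3, size=n)           # margin B: three groups

w = rake(d, sex, {0: 26000, 1: 24000}, age, {0: 15000, 1: 20000, 2: 15000})
print("margin A fit:", [round(w[sex == g].sum()) for g in (0, 1)])
print("margin B fit:", [round(w[age == g].sum()) for g in (0, 1, 2)])
```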

  • Articles and reports: 12-001-X20050018087
    Description:

    In official statistics, the data editing process plays an important role in terms of timeliness, data accuracy and survey costs. Techniques introduced to identify and eliminate errors from the data must therefore consider all of these aspects simultaneously. A frequent and pervasive systematic error in surveys collecting numerical data is the unity measure error, which strongly affects the timeliness, accuracy and cost of the editing and imputation phase. In this paper we propose a probabilistic formalisation of the problem based on finite mixture models. This setting allows us to deal with the problem in a multivariate context and also provides a number of useful diagnostics for prioritising cases to be investigated more deeply through clerical review. Prioritising units is important for increasing data accuracy while avoiding time wasted following up units that are not truly critical.

    Release date: 2005-07-21
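
A hedged sketch of the kind of unity measure error discussed in the entry above (values reported in the wrong unit, off by a factor of 1,000), flagged here with a simple log-ratio rule against a reference value rather than the finite mixture model proposed in the paper; all figures are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
true_revenue = rng.lognormal(mean=7.0, sigma=0.8, size=n)   # in $ thousands
reported = true_revenue.copy()
bad = rng.random(n) < 0.05
reported[bad] *= 1000                       # unity measure error: reported in $

# reference value for each unit, e.g. last year's (correct) report
reference = true_revenue * rng.lognormal(0.0, 0.1, size=n)

log_ratio = np.log10(reported / reference)
# values clustered near 3 (a factor of 1,000) are flagged for clerical review,
# ordered by how clearly they sit in the error cluster
suspect = np.where(np.abs(log_ratio - 3.0) < 0.5)[0]
priority = suspect[np.argsort(np.abs(log_ratio[suspect] - 3.0))]

print(f"flagged {len(suspect)} units; {bad.sum()} errors were injected")
print("first units to review:", priority[:5])
```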

  • Articles and reports: 12-001-X20050018093
    Description:

    Kish's well-known expression for the design effect due to clustering is often used to inform sample design, using an approximation such as b_bar in place of b. If the design involves either weighting or variation in cluster sample sizes, this can be a poor approximation. In this article we discuss the sensitivity of the approximation to departures from the implicit assumptions and propose an alternative approximation.

    Release date: 2005-07-21
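
For context on the entry above, a small numeric sketch of Kish's clustering design-effect approximation, deff = 1 + (b - 1) * rho, showing how a plain average cluster size can understate the effect when cluster sizes vary; the sizes and intra-cluster correlation are illustrative, and the paper's alternative approximation is not reproduced.

```python
import numpy as np

def kish_deff(cluster_sizes, rho, use_weighted_b=False):
    """Kish clustering design effect approximation: 1 + (b - 1) * rho.
    'b' is either the plain average cluster size or a size-weighted average
    (sum b_i^2 / sum b_i), which is larger when cluster sizes vary."""
    b = np.asarray(cluster_sizes, dtype=float)
    b_summary = (b ** 2).sum() / b.sum() if use_weighted_b else b.mean()
    return 1 + (b_summary - 1) * rho

sizes_equal = [10] * 50
sizes_varied = [2] * 25 + [18] * 25     # same total sample, unequal clusters
rho = 0.05                              # intra-cluster correlation

print("equal clusters:    ", round(kish_deff(sizes_equal, rho), 3))
print("varied, plain b:   ", round(kish_deff(sizes_varied, rho), 3))
print("varied, weighted b:", round(kish_deff(sizes_varied, rho, True), 3))
```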

  • Articles and reports: 12-001-X20050018086
    Description:

    The item count technique, an indirect questioning technique, was devised to estimate the proportion of people for whom a sensitive key item holds true. This is achieved by having respondents report how many phrases, from a list of several descriptive phrases, they believe apply to themselves. The list for half the sample includes the key item, and the list for the other half does not. The difference in the mean number of selected phrases is an estimator of the proportion. In this article, we propose two new methods, referred to as the cross-based method and the double cross-based method, by which proportions in subgroups or domains are estimated from data obtained via the item count technique. To assess the precision of the proposed methods, we conducted simulation experiments using data from a survey of the Japanese national character. The results illustrate that the double cross-based method is much more accurate than the traditional stratified method and is less likely to produce illogical estimates.

    Release date: 2005-07-21
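
A minimal sketch of the basic item count estimator described in the entry above, the difference in mean counts between the half-sample that sees the key item and the half-sample that does not; the endorsement probabilities are invented and the proposed cross-based methods are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(11)
n_half = 2000
true_prevalence = 0.15         # proportion for whom the sensitive item holds
p_innocuous = [0.3, 0.5, 0.7]  # probabilities of endorsing the 3 neutral items

def counts(include_key_item, size):
    c = sum(rng.random(size) < p for p in p_innocuous)
    if include_key_item:
        c = c + (rng.random(size) < true_prevalence)
    return c

long_list = counts(True, n_half)    # half-sample that sees the key item
short_list = counts(False, n_half)  # half-sample that does not

estimate = long_list.mean() - short_list.mean()
se = np.sqrt(long_list.var(ddof=1) / n_half + short_list.var(ddof=1) / n_half)
print(f"estimated proportion: {estimate:.3f} (SE {se:.3f})")
```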

  • Articles and reports: 12-001-X20050018092
    Description:

    When there is auxiliary information in survey sampling, the design-based "optimal (regression) estimator" of a finite population total/mean is known to be (at least asymptotically) more efficient than the corresponding GREG estimator. We illustrate this by some simulations with stratified sampling from skewed populations. The GREG estimator was originally constructed using an assisting linear superpopulation model. It may also be seen as a calibration estimator; that is, as a weighted linear estimator whose weights obey the calibration equation and, with that restriction, are as close as possible to the original "Horvitz-Thompson weights" (according to a suitable distance). We show that the optimal estimator can also be seen as a calibration estimator in this respect, with a quadratic distance measure closely related to the one generating the GREG estimator. Simple examples are also given, revealing that this new measure is not always easily obtained.

    Release date: 2005-07-21
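
As background for the comparison in the entry above, a minimal GREG sketch under an assumed simple random sample with one auxiliary variable whose population total is known; the optimal estimator and the stratified, skewed setting of the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 20000, 500
x_pop = rng.gamma(2.0, 5.0, size=N)                 # auxiliary variable, known for all units
y_pop = 3.0 + 2.0 * x_pop + rng.normal(0, 4, size=N)

sample = rng.choice(N, size=n, replace=False)       # SRS for simplicity
d = np.full(n, N / n)                               # design weights
x, y = x_pop[sample], y_pop[sample]

# HT (expansion) estimator of the total
t_ht = np.sum(d * y)

# GREG: fit a design-weighted regression of y on (1, x), then adjust the HT
# estimate by the gap between known and estimated auxiliary totals
X = np.column_stack([np.ones(n), x])
B = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y))
T_x = np.array([N, x_pop.sum()])                    # known auxiliary totals
t_x_ht = X.T @ d                                    # HT estimates of those totals
t_greg = t_ht + (T_x - t_x_ht) @ B

print(f"true total {y_pop.sum():,.0f}  HT {t_ht:,.0f}  GREG {t_greg:,.0f}")
```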

  • Articles and reports: 91F0015M2005007
    Description:

    The Population Estimates Program at Statistics Canada uses internal migration estimates derived from administrative data sources. Two versions of the migration estimates are currently available: preliminary (P), based on Child Tax Benefit (CTB) information, and final (F), produced using information from income tax returns. For some reference dates they can differ significantly. This paper summarises research undertaken in Demography Division to modify the current method for preliminary estimates in order to reduce those differences. After a brief analysis of the differences, six methods are tested: 1) regression of out-migration; 2) regression of in- and out-migration separately; 3) regression of net migration; 4) the exponentially weighted moving average; 5) the U.S. Bureau of the Census approach; and 6) first-difference regression. The method in which final and preliminary migration data are combined to estimate preliminary net migration (Method 3) appears to be the best approach for improving convergence between preliminary and final estimates of internal migration for the Population Estimates Program. This approach "smooths" some of the erratic patterns displayed by the former method while preserving the CTB data's ability to capture current shifts in migration patterns.

    Release date: 2005-06-20

  • Articles and reports: 89-552-M2005013
    Description:

    This report documents key aspects of the development of the International Adult Literacy and Life Skills Survey (ALL) - its theoretical roots, the domains selected for possible assessment, the approaches taken to assessment in each domain and the criteria that were employed to decide which domains were to be carried in the final design. As conceived, the ALL survey was meant to build on the success of the International Adult Literacy Survey (IALS) assessments by extending the range of skills assessed and by improving the quality of the assessment methods employed. This report documents several successes including:

    · the development of a new framework and associated robust measures for problem solving
    · the development of a powerful numeracy framework and associated robust measures
    · the specification of frameworks for practical cognition, teamwork and information and communication technology literacy

    The report also provides insight into those domains where development failed to yield approaches to assessment of sufficient quality, insight that reminds us that scientific advance in this domain is hard won.

    Release date: 2005-03-24

  • Articles and reports: 12-001-X20040027758
    Description:

    In this article, we study the use of Bayesian neural networks in finite population estimation. We propose estimators for the finite population mean and the associated mean squared error. We also propose using the Student t-distribution to model the disturbances in order to accommodate extreme observations that are often present in data from social sample surveys. Numerical results show that Bayesian neural networks offer a significant improvement in finite population estimation over linear regression-based methods.

    Release date: 2005-02-03

  • Articles and reports: 12-001-X20040027752
    Description:

    The Best Linear Unbiased (BLU) estimator (or predictor) of a population total is based on the following two assumptions: i) the estimation model underlying the BLU estimator is correctly specified and ii) the sampling design is ignorable with respect to the estimation model. In this context, an estimator is robust if it stays close to the BLU estimator when both assumptions hold and if it keeps good properties when one or both assumptions are not fully satisfied. Robustness with respect to deviations from assumption (i) is called model robustness while robustness with respect to deviations from assumption (ii) is called design robustness. The Generalized Regression (GREG) estimator is often viewed as being robust since its property of being Asymptotically Design Unbiased (ADU) is not dependent on assumptions (i) and (ii). However, if both assumptions hold, the GREG estimator may be far less efficient than the BLU estimator and, in that sense, it is not robust. The relative inefficiency of the GREG estimator as compared to the BLU estimator is caused by widely dispersed design weights. To obtain a design-robust estimator, we thus propose a compromise between the GREG and the BLU estimators. This compromise also provides some protection against deviations from assumption (i). However, it does not offer any protection against outliers, which can be viewed as a consequence of a model misspecification. To deal with outliers, we use the weighted generalized M-estimation technique to reduce the influence of units with large weighted population residuals. We propose two practical ways of implementing M-estimators for multipurpose surveys; either the weights of influential units are modified and a calibration approach is used to obtain a single set of robust estimation weights or the values of influential units are modified. Some properties of the proposed approach are evaluated in a simulation study using a skewed finite population created from real survey data.

    Release date: 2005-02-03

  • Articles and reports: 12-001-X20040027749
    Description:

    A simple and practicable algorithm for constructing stratum boundaries in such a way that the coefficients of variation are equal in each stratum is derived for positively skewed populations. The new algorithm is shown to compare favourably with the cumulative root frequency method (Dalenius and Hodges 1957) and the Lavallée and Hidiroglou (1988) approximation method for estimating the optimum stratum boundaries.

    Release date: 2005-02-03
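
For comparison context, a minimal sketch of the cumulative root frequency (Dalenius and Hodges 1957) rule mentioned in the entry above, which places stratum boundaries at equal steps of the cumulative square root of the frequency distribution; the population is simulated and the paper's equal-CV algorithm is not reproduced.

```python
import numpy as np

def cum_root_freq_boundaries(values, n_classes=100, n_strata=4):
    """Dalenius-Hodges rule: histogram the variable, accumulate sqrt(frequency),
    and cut the cumulative scale into equal intervals."""
    freq, edges = np.histogram(values, bins=n_classes)
    cum = np.cumsum(np.sqrt(freq))
    targets = cum[-1] * np.arange(1, n_strata) / n_strata
    idx = np.searchsorted(cum, targets)
    return edges[idx + 1]          # upper class edges chosen as boundaries

rng = np.random.default_rng(4)
revenue = rng.lognormal(mean=10.0, sigma=1.0, size=5000)   # skewed population

bounds = cum_root_freq_boundaries(revenue, n_strata=4)
print("stratum boundaries:", np.round(bounds, 0))
strata = np.digitize(revenue, bounds)
for h in range(4):
    v = revenue[strata == h]
    print(f"stratum {h}: n={v.size}, CV={v.std(ddof=1)/v.mean():.2f}")
```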

  • Articles and reports: 12-001-X20040027747
    Description:

    The reduced accuracy of the revised classification of unemployed persons in the Current Population Survey (CPS) was documented in Biemer and Bushery (2000). In this paper, we provide additional evidence of this anomaly and attempt to trace the source of the error through extended analysis of the CPS data before and after the redesign. The paper presents a novel approach to decomposing the error in a complex classification process, such as the CPS labor force status classification, using Markov Latent Class Analysis (MLCA). To identify the cause of the apparent reduction in the accuracy of the unemployed classification, we identify the key question components that determine the classifications and estimate the contribution of each of these components to the total error in the classification process. This work provides guidance for further investigation into the root causes of the errors in the collection of labor force data in the CPS, possibly through cognitive laboratory and/or field experiments.

    Release date: 2005-02-03

  • Articles and reports: 12-001-X20040027753
    Description:

    Samplers often distrust model-based approaches to survey inference because of concerns about misspecification when models are applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator can be very inefficient. In Zheng and Little (2003, 2004) we used penalized splines (p-splines) to model smoothly varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples. We showed that p-spline model-based estimators are in general more efficient than the HT estimator, and can provide narrower confidence intervals with close to nominal confidence coverage. In this article, we extend this approach to two-stage sampling designs. We use a p-spline based mixed model that fits a nonparametric relationship between the primary sampling unit (PSU) means and a measure of PSU size, and incorporates random effects to model clustering. For variance estimation we consider the empirical Bayes model-based variance, the jackknife and balanced repeated replication (BRR) methods. Simulation studies on simulated data and samples drawn from public use microdata in the 1990 census demonstrate gains for the model-based p-spline estimator over the HT estimator and linear model-assisted estimators. Simulations also show the variance estimation methods yield confidence intervals with satisfactory confidence coverage. Interestingly, these gains can be seen for a common equal-probability design, where the first stage selection is PPS and the second stage selection probabilities are proportional to the inverse of the first stage inclusion probabilities, and the HT estimator leads to the unweighted mean. In situations that most favor the HT estimator, the model-based estimators have comparable efficiency.

    Release date: 2005-02-03

  • Articles and reports: 12-001-X20040027751
    Description:

    We revisit the relationship between the design effects for the weighted total estimator and the weighted mean estimator under complex survey sampling. Examples are provided for various cases. Furthermore, some of the misconceptions surrounding design effects are clarified with examples.

    Release date: 2005-02-03
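
To make the total-versus-mean distinction in the entry above concrete, a small sketch of Kish's design effect due to unequal weighting for an estimated mean, 1 + cv^2(w); the weights are simulated, and the article's own examples and clarifications are not reproduced.

```python
import numpy as np

def kish_weighting_deff(w):
    """Kish's design effect due to unequal weighting for an estimated mean,
    1 + cv^2(w) = n * sum(w^2) / (sum(w))^2, under the assumption that the
    weights are unrelated to the survey variable."""
    w = np.asarray(w, dtype=float)
    n = w.size
    return n * np.sum(w ** 2) / np.sum(w) ** 2

rng = np.random.default_rng(8)
weights = rng.lognormal(mean=3.0, sigma=0.6, size=1000)   # illustrative weights
print("deff due to weighting (mean):", round(kish_weighting_deff(weights), 3))
# The design effect for a weighted *total* is generally not the same quantity;
# how the weights relate to the variable being totalled also matters, which is
# part of what the article examines.
```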

  • Articles and reports: 12-001-X20040027750
    Description:

    Intelligent Character Recognition (ICR) has been widely used as a new technology in data capture processing. It was used for the first time at Statistics Canada to process the 2001 Canadian Census of Agriculture. This involved many new challenges, both operational and methodological. This paper presents an overview of the methodological tools used to put in place an efficient ICR system. Since the potential for high levels of error existed at various stages of the operation, Quality Assurance (QA) and Quality Control (QC) methods and procedures were built into this operation to ensure a high degree of accuracy in the captured data. This paper describes these QA/QC methods along with their results and shows how quality improvements were achieved in the ICR data capture operation. This paper also identifies the positive impacts of these procedures on this operation.

    Release date: 2005-02-03

  • Articles and reports: 12-001-X20040027755
    Description:

    Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use data files. For example, agencies can release partially synthetic datasets, comprising the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. This article presents an approach for generating multiply-imputed, partially synthetic datasets that simultaneously handles disclosure limitation and missing data. The basic idea is to fill in the missing data first to generate m completed datasets, then replace sensitive or identifying values in each completed dataset with r imputed values. This article also develops methods for obtaining valid inferences from such multiply-imputed datasets. New rules for combining the multiple point and variance estimates are needed because the double duty of multiple imputation introduces two sources of variability into point estimates, which existing methods for obtaining inferences from multiply-imputed datasets do not measure accurately. A reference t-distribution appropriate for inferences when m and r are moderate is derived using moment matching and Taylor series approximations.

    Release date: 2005-02-03
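
For readers unfamiliar with the baseline machinery behind the entry above, a minimal sketch of Rubin's standard combining rules for multiply imputed data; note that the article derives different rules for the nested missing-data-plus-synthesis setting, which are not reproduced here. The estimates and variances are made up.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Rubin's standard combining rules for m multiply imputed datasets:
    q_bar (point estimate), T (total variance) and the reference-t degrees
    of freedom. The article's new rules for partially synthetic data with
    missing values differ; this is only the familiar baseline."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()
    u_bar = u.mean()                       # within-imputation variance
    b = q.var(ddof=1)                      # between-imputation variance
    t_var = u_bar + (1 + 1 / m) * b        # total variance
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t_var, df

# point estimates and variances from m = 5 hypothetical completed datasets
est = [10.2, 9.8, 10.5, 10.1, 9.9]
var = [0.40, 0.38, 0.41, 0.39, 0.42]
q_bar, t_var, df = rubin_combine(est, var)
print(f"estimate {q_bar:.2f}, std. error {t_var**0.5:.2f}, df {df:.1f}")
```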

  • Articles and reports: 12-001-X20040027756
    Description:

    It is usually discovered in the data collection phase of a survey that some units in the sample are ineligible even if the frame information has indicated otherwise. For example, in many business surveys a nonnegligible proportion of the sampled units will have ceased trading since the latest update of the frame. This information may be fed back to the frame and used in subsequent surveys, thereby making forthcoming samples more efficient by avoiding sampling ineligible units. On the first of two survey occasions, we assume that all ineligible units in the sample (or set of samples) are detected and excluded from the frame. On the second occasion, a subsample of the eligible part is observed again. The subsample may be augmented with a fresh sample that will contain both eligible and ineligible units. We investigate what effect the process of feeding back information on ineligibility may have on survey estimation, and derive an expression for the bias that can occur as a result of feeding back. The focus is on estimation of the total using the common expansion estimator. An estimator that is nearly unbiased in the presence of feedback is obtained. This estimator relies on consistent estimates of the number of eligible and ineligible units in the population being available.

    Release date: 2005-02-03
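
For reference, a minimal single-occasion sketch of the common expansion estimator discussed in the entry above, with sampled ineligible units retained at a value of zero; the two-occasion feedback mechanism and the bias expression derived in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 5000                                   # frame size when the sample is drawn
eligible = rng.random(N) < 0.9             # some frame units are in fact ineligible
y = np.where(eligible, rng.gamma(2.0, 50.0, size=N), 0.0)  # ineligibles contribute 0

n = 300
s = rng.choice(N, size=n, replace=False)   # simple random sample from the frame

# common expansion estimator of the total: sampled ineligible units stay in the
# sample with a value of zero, and the full frame count N is the expansion factor
t_hat = N / n * y[s].sum()
print(f"true total over eligible units: {y.sum():,.0f}")
print(f"expansion estimate:             {t_hat:,.0f}")
```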

