Statistics by subject – Statistical methods

All (169) (25 of 169 results)

  • Technical products: 11-522-X19980015018
    Description:

    This paper presents a method for handling longitudinal data in which individuals belong to more than one unit at a higher level, and where there is missing information on the identification of the units to which they belong. In education, for example, a student might be classified as belonging sequentially to a particular combination of primary and secondary school, but for some students the identity of either the primary or the secondary school may be unknown. Likewise, in a longitudinal study, students may change school or class from one period to the next, thus 'belonging' to more than one higher-level unit. The procedures used to model these structures are extensions of a random effects cross-classified multilevel model.

    Release date: 1999-10-22

  • Technical products: 11-522-X19980015016
    Description:

    Models for fitting longitudinal binary responses are explored using a panel study of voting intentions. A standard repeated measures multilevel logistic model is shown to be inadequate due to the presence of a substantial proportion of respondents who maintain a constant response over time. A multivariate binary response model is shown to be a better fit to the data.

    Release date: 1999-10-22

  • Technical products: 11-522-X19980015030
    Description:

    Two-phase sampling designs have been conducted in waves to estimate the incidence of a rare disease such as dementia. Estimation of disease incidence from a longitudinal dementia study must appropriately adjust for data missing by death, as well as for the sampling design used at each study wave. In this paper we adopt a selection model approach to model the data missing by death and use a likelihood approach to derive incidence estimates. A modified EM algorithm is used to deal with data missing by sampling selection. The non-parametric jackknife variance estimator is used to derive variance estimates for the model parameters and the incidence estimates. The proposed approaches are applied to data from the Indianapolis-Ibadan Dementia Study.

    Release date: 1999-10-22

  • Articles and reports: 12-001-X19990014714
    Description:

    In this paper a general multilevel model framework is used to provide estimates for small areas using survey data. This class of models allows for variation between areas because of: (i) differences in the distributions of unit level variables between areas, (ii) differences in the distributions of area level variables between areas, and (iii) area specific components of variance which make provision for additional local variation which cannot be explained by unit-level or area-level covariates. Small area estimators are derived for this multilevel model formulation, and an approximation to the mean square error (MSE) of each small area estimate for this general class of mixed models is provided, together with an estimator of this MSE. Both the approximation to the MSE and the estimator of the MSE take into account three sources of variation: (i) the prediction MSE assuming that both the fixed effects and the components of variance in the multilevel model are known, (ii) the additional component due to the fact that the fixed coefficients must be estimated, and (iii) the further component due to the fact that the components of variance in the model must be estimated. The proposed methods are evaluated using a large data set as the basis for a numerical investigation. The results confirm that the extra components of variance contained in multilevel models, as well as small area covariates, can improve small area estimates, and that the MSE approximation and estimator are satisfactory.

    Release date: 1999-10-08

  • Articles and reports: 12-001-X19990014718
    Description:

    In this short note, we demonstrate that the well-known formula for the design effect intuitively proposed by Kish has a model-based justification. The formula can be interpreted as a conservative value for the actual design effect. (A small numerical illustration follows this entry.)

    Release date: 1999-10-08
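
    The note concerns Kish's weighting design-effect formula, deff = 1 + cv^2(w), equivalently n*sum(w^2)/(sum(w))^2. A minimal numerical illustration, not taken from the paper; the weights are simulated:

    ```python
    import numpy as np

    def kish_deff(weights):
        """Kish's design effect due to unequal weighting: 1 + cv^2(w)."""
        w = np.asarray(weights, dtype=float)
        # Equivalent closed forms: n * sum(w^2) / sum(w)^2 == 1 + cv^2(w)
        return w.size * np.sum(w ** 2) / np.sum(w) ** 2

    rng = np.random.default_rng(42)
    print(kish_deff(rng.uniform(0.5, 2.0, size=1000)))  # about 1.12
    print(kish_deff(np.ones(50)))                       # equal weights -> 1.0
    ```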

  • Articles and reports: 12-001-X19980013905
    Description:

    Two-phase sampling designs offer a variety of possibilities for the use of auxiliary information. We begin by reviewing the different forms that auxiliary information may take in two-phase surveys. We then set up the procedure by which this information is transformed into calibrated weights, which we use to construct efficient estimators of a population total. The calibration is done in two steps: (i) at the population level; (ii) at the level of the first-phase sample. We go on to show that the resulting calibration estimators are also derivable via regression fitting in two steps. We examine these estimators for a special case of interest, namely, when auxiliary information is available for population subgroups called calibration groups. Poststrata are the simplest example of such groups. Estimation for domains of interest and variance estimation are also discussed. These results are illustrated by applying them to two-phase designs at Statistics Canada. The general theory for using auxiliary information in two-phase sampling is being incorporated into Statistics Canada's Generalized Estimation System.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013912
    Description:

    Efficient estimates of population size and totals based on information from multiple list frames and an independent area frame are considered. This work is an extension of the methodology proposed by Hartley (1962), which considers two general frames. A main disadvantage of list frames is that they are typically incomplete. In this paper, we propose several methods to address frame deficiencies. A joint list-area sampling design incorporates multiple frames and achieves full coverage of the target population. For each combination of frames, we present the appropriate notation, likelihood function, and parameter estimators. Results from a simulation study that compares the various properties of the proposed estimators are also presented.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013908
    Description:

    In the present investigation, the problem of estimation of the variance of the general linear regression estimator has been considered. It has been shown that the efficiency of the low level calibration approach adopted by Särndal (1996) is less than or equal to that of a class of estimators proposed by Deng and Wu (1987). A higher level calibration approach has also been suggested, and its efficiency is shown to improve on the original approach. Several estimators are shown to be special cases of this proposed higher level calibration approach. An idea to find a non-negative estimate of the variance of the GREG has been suggested. Results have been extended to a stratified random sampling design. An empirical study has also been carried out to study the performance of the proposed strategies. The well-known statistical package GES, developed at Statistics Canada, can be further improved to obtain better estimates of the variance of the GREG using the proposed higher level calibration approach, under certain circumstances discussed in this paper.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013904
    Description:

    Many economic and agricultural surveys are multi-purpose. It would be convenient if one could stratify the target population of such a survey in a number of different ways, for different purposes, and then combine the samples for enumeration. We explore four different sampling methods that select similar samples across all stratifications, thereby reducing the overall sample size. Data from an agriculture survey are used to evaluate the effectiveness of these alternative sampling strategies. We then show how a calibration (i.e., reweighted) estimator can increase statistical efficiency by capturing what is known about the original stratum sizes in the estimation. Raking, which has been suggested in the literature for this purpose, is simply one method of calibration. (A sketch of linear calibration follows this entry.)

    Release date: 1998-07-31
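
    A minimal sketch of the calibration idea referred to above: chi-square-distance (linear, GREG-type) calibration of design weights to known stratum sizes, which in this special case reduces to post-stratification. All names and numbers are illustrative, not the paper's:

    ```python
    import numpy as np

    def calibrate_linear(d, X, t_x):
        """Linear (chi-square distance) calibration: returns w = d*(1 + X@lam)
        such that the calibrated totals X.T @ w equal the known totals t_x."""
        d, X = np.asarray(d, float), np.asarray(X, float)
        lam = np.linalg.solve(X.T @ (d[:, None] * X), t_x - X.T @ d)
        return d * (1.0 + X @ lam)

    rng = np.random.default_rng(1)
    strata = rng.integers(0, 3, size=200)      # stratum of each sampled unit
    X = np.eye(3)[strata]                      # stratum-indicator auxiliaries
    d = np.full(200, 10.0)                     # design weights
    t_x = np.array([800.0, 700.0, 500.0])      # known stratum sizes
    w = calibrate_linear(d, X, t_x)
    print(X.T @ w)                             # reproduces t_x exactly
    ```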

  • Articles and reports: 12-001-X19970023618
    Description:

    Statistical agencies often constitute their business panels by Poisson sampling, or by stratified sampling of fixed size with uniform probabilities in each stratum. This sampling corresponds to algorithms which use permanent random numbers following a uniform distribution. Since the characteristics of the units change over time, it is necessary to periodically conduct resamplings while endeavouring to conserve the maximum number of units. The solution by Poisson sampling is the simplest and provides the maximum theoretical coverage, but with the disadvantage of a random sample size. On the other hand, in the case of stratified sampling of fixed size, changes in strata cause difficulties precisely because of these fixed-size constraints. An initial difficulty is that the finer the stratification, the more the coverage is decreased; this is likely to occur if births constitute separate strata. We show how this effect can be corrected by rendering the numbers equidistant before resampling. The disadvantage, a fairly minor one, is that in each stratum the sampling is no longer simple random sampling, which makes the estimation of the variance less rigorous. Another difficulty is reconciling the resampling with a possible rotation of the units in the sample. We present a type of algorithm which extends, after resampling, the rotation begun before resampling. It is based on transformations of the random numbers used for the sampling, so as to return to resampling without rotation. These transformations are particularly simple when they involve equidistant numbers, but can also be carried out with numbers following a uniform distribution. (A toy illustration of permanent random numbers follows this entry.)

    Release date: 1998-03-12
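
    A toy illustration of the permanent-random-number mechanism underlying such algorithms, for the Poisson-sampling case: because every unit keeps its number across redraws, successive samples overlap as much as the inclusion probabilities allow. The equidistant-number transformations discussed in the paper are not shown; all values are invented:

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    N = 10_000
    prn = rng.uniform(size=N)      # permanent random numbers, fixed per unit

    # Poisson sampling: unit i is selected iff prn_i < pi_i
    pi_old = np.full(N, 0.10)      # inclusion probabilities at the first draw
    pi_new = np.full(N, 0.12)      # revised probabilities after a resampling

    s_old = prn < pi_old
    s_new = prn < pi_new           # same PRNs, so the overlap is maximal
    print(s_old.sum(), s_new.sum(), (s_old & s_new).sum())
    # Every unit in the old sample stays selected when pi only increases.
    ```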

  • Table: 62-010-X19970023422
    Description:

    The current official time base of the Consumer Price Index (CPI) is 1986=100. This time base was first used when the CPI for June 1990 was released. Statistics Canada is about to convert all price index series to the time base 1992=100. As a result, all constant dollar series will be converted to 1992 dollars. The CPI will shift to the new time base when the CPI for January 1998 is released on February 27, 1998. (A rebasing sketch follows this entry.)

    Release date: 1997-11-17
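
    Rebasing an index series is a proportional rescaling: divide every value by the series value in the new base period and multiply by 100. A sketch with invented numbers, not actual CPI data:

    ```python
    # Convert a 1986=100 series to 1992=100 (values are illustrative only)
    series_1986 = {1986: 100.0, 1990: 119.5, 1992: 128.1, 1997: 136.6}
    factor = 100.0 / series_1986[1992]
    series_1992 = {year: round(v * factor, 1) for year, v in series_1986.items()}
    print(series_1992)  # {1986: 78.1, 1990: 93.3, 1992: 100.0, 1997: 106.6}
    ```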

  • Articles and reports: 12-001-X19970013105
    Description:

    The problem of estimating transition rates from longitudinal survey data in the presence of misclassification error is considered. Approaches which use external information on misclassification rates are reviewed, together with alternative models for measurement error. We define categorical instrumental variables and propose methods for the identification and estimation of models including such variables by viewing the model as a restricted latent class model. The numerical properties of the implied instrumental variable estimators of flow rates are studied using data from the Panel Study of Income Dynamics.

    Release date: 1997-08-18

  • Articles and reports: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections is inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them is developed. Not all original samples can be inverted, but many practical special cases which cover a wide range of practices are discussed.

    Release date: 1997-08-18

  • Articles and reports: 12-001-X19960022978
    Description:

    The use of auxiliary information in estimation procedures in complex surveys, such as Statistics Canada's Labour Force Survey, is becoming increasingly sophisticated. In the past, regression and raking ratio estimation were the commonly used procedures for incorporating auxiliary data into the estimation process. However, the weights associated with these estimators could be negative or extremely large. Recent theoretical developments by Deville and Särndal (1992) in the construction of "restricted" weights, which can be forced to be positive and bounded above, have led us to study the properties of the resulting estimators. In this paper, we investigate the properties of a number of such weight generating procedures, as well as their corresponding estimated variances. In particular, two variance estimation procedures, jackknifing and Taylor linearization, are investigated via a Monte Carlo simulation study based on Labour Force Survey data. The conclusion is that the bias of both the point estimators and the variance estimators is minimal, even under severe "restricting" of the final weights.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X199500214399
    Description:

    This paper considers the winsorized mean as an estimator of the mean of a positive skewed population. A winsorized mean is obtained by replacing all the observations larger than some cut-off value R by R before averaging. The optimal cut-off value, as defined by Searls (1966), minimizes the mean square error of the winsorized estimator. Techniques are proposed for the evaluation of this optimal cut-off in several sampling designs including simple random sampling, stratified sampling and sampling with probability proportional to size. For most skewed distributions, the optimal winsorization strategy is shown, on average, to modify the value of about one data point in the sample. Closed form approximations to the efficiency of Searls’ winsorized mean are derived using the theory of extreme order statistics. Various estimators reducing the impact of large data values are compared in a Monte Carlo experiment. (A small numerical sketch follows this entry.)

    Release date: 1995-12-15
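
    A minimal sketch of the winsorized mean itself; the cut-off R is fixed arbitrarily here, whereas the paper's subject is choosing R optimally. Data are simulated:

    ```python
    import numpy as np

    def winsorized_mean(y, R):
        """Replace every observation larger than the cut-off R by R, then average."""
        return np.minimum(np.asarray(y, float), R).mean()

    # A positively skewed sample: winsorizing trades a small bias
    # for a reduction in the influence of extreme values.
    rng = np.random.default_rng(3)
    y = rng.lognormal(mean=0.0, sigma=1.5, size=500)
    print(y.mean(), winsorized_mean(y, R=20.0))
    ```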

  • Articles and reports: 12-001-X199500114407
    Description:

    The Horvitz-Thompson estimator (HT-estimator) is not robust against outliers. Outliers in the population may increase its variance even though it remains unbiased. The HT-estimator is expressed as a least squares functional in order to robustify it through M-estimators. An approximate variance of the robustified HT-estimator is derived using a kind of influence function for sampling, and an estimator of this variance is developed. An adaptive method to choose an M-estimator leads to minimum estimated risk estimators. These estimators and robustified HT-estimators are often more efficient than the HT-estimator when outliers occur. (A simple sketch follows this entry.)

    Release date: 1995-06-15
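
    For orientation, a sketch of the plain Horvitz-Thompson estimator alongside a crude truncation of its largest contributions. The truncation is only a stand-in for the paper's M-estimation approach, which chooses the downweighting adaptively; the data and cut-off c are invented:

    ```python
    import numpy as np

    def ht_total(y, pi):
        """Horvitz-Thompson estimator of a population total: sum of y_i/pi_i."""
        return np.sum(np.asarray(y, float) / np.asarray(pi, float))

    def truncated_ht_total(y, pi, c):
        """Crudely robustified variant: cap each contribution y_i/pi_i at c
        (a stand-in for M-estimation, not the paper's method)."""
        z = np.asarray(y, float) / np.asarray(pi, float)
        return np.sum(np.minimum(z, c))

    rng = np.random.default_rng(5)
    y = np.append(rng.gamma(shape=2.0, scale=10.0, size=99), 5000.0)  # one outlier
    pi = np.full(100, 0.05)
    print(ht_total(y, pi), truncated_ht_total(y, pi, c=2000.0))
    ```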

  • Articles and reports: 12-001-X199400114435
    Description:

    The problem of estimating domain totals and means from sample survey data is common. When the domain is large, the observed sample is generally large enough that direct, design-based estimators are sufficiently accurate. But when the domain is small, the observed sample size is small and direct estimators are inadequate. Small area estimation is a particular case in point, and alternative methods such as synthetic estimation or model-based estimators have been developed. The two usual facets of such methods are that information is ‘borrowed’ from other small domains (or areas) so as to obtain more precise estimators of certain parameters, and that these are then combined with auxiliary information, such as population means or totals, from each small area in turn to obtain a more precise estimate of the domain (or area) mean or total. This paper describes a case involving unequal probability sampling in which no auxiliary population means or totals are available and borrowing strength from other domains is not allowed; yet simple model-based estimators are developed which appear to offer substantial efficiency gains. The approach is motivated by an application to market research, but the methods are more widely applicable.

    Release date: 1994-06-15

  • Articles and reports: 12-001-X199300114474
    Description:

    The need for standards for the gathering and reporting of information on nonresponse across surveys within a statistical agency is discussed, and the standards being adopted at Statistics Canada are then described. Measures undertaken at different stages of survey design at Statistics Canada to reduce nonresponse are also described. These points are illustrated by examining the nonresponse experiences of two major surveys at Statistics Canada.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199200114497
    Description:

    The present article discusses a model-based approach towards adjustment of the 1988 Census Dress Rehearsal data collected from test sites in Missouri. The primary objective is to develop procedures that can be used to model data from the 1990 Census Post Enumeration Survey in April 1991 and to smooth survey-based estimates of the adjustment factors. We propose hierarchical Bayes (HB) and empirical Bayes (EB) procedures which meet this objective. The resulting estimators seem to improve consistently on the estimators of the adjustment factors based on dual system estimation (DSE), as well as on the smoothed regression estimators.

    Release date: 1992-06-15

  • Articles and reports: 12-001-X199100214502
    Description:

    A sample design for the initial selection, sample rotation and updating for sub-annual business surveys is proposed. The sample design is a stratified clustered design, with the stratification being carried out on the basis of industry, geography and size. Sample rotation of the sample units is carried out under time-in and time-out constraints. Updating is with respect to the selection of births (new businesses), removal of deaths (defunct businesses) and implementation of changes in the classification variables used for stratification, i.e. industry, geography and size. A number of alternate estimators, including the simple expansion estimator and Mickey’s (1959) unbiased ratio-type estimator have been evaluated for this design in an empirical study under various survey conditions. The problem of variance estimation has also been considered using the Taylor linearization method and the jackknife technique.

    Release date: 1991-12-16

  • Articles and reports: 12-001-X199000214535
    Description:

    Papers by Scott and Smith (1974) and Scott, Smith, and Jones (1977) suggested the use of signal extraction results from time series analysis to improve estimates in repeated surveys, what we call the time series approach to estimation in repeated surveys. We review the underlying philosophy of this approach, pointing out that it stems from recognition of two sources of variation - time series variation and sampling variation - and that the approach can provide a unifying framework for other problems where the two sources of variation are present. We obtain some theoretical results for the time series approach regarding design consistency of the time series estimators, and uncorrelatedness of the signal and sampling error series. We observe that, from a design-based perspective, the time series approach trades some bias for a reduction in variance and a reduction in average mean squared error relative to classical survey estimators. We briefly discuss modeling to implement the time series approach, and then illustrate the approach by applying it to time series of retail sales of eating places and of drinking places from the U.S. Census Bureau’s Retail Trade Survey.

    Release date: 1990-12-14

  • Articles and reports: 12-001-X199000214536
    Description:

    We discuss frame and sample maintenance issues that arise in recurring surveys. A new system is described that meets four objectives. Through time, it maintains (1) the geographical balance of a sample; (2) the sample size; (3) the unbiased character of estimators; and (4) the lack of distortion in estimated trends. The system is based upon the Peano key, which creates a fractal, space-filling curve. An example of the new system is presented using a national survey of establishments in the United States conducted by the A.C. Nielsen Company.

    Release date: 1990-12-14

  • Articles and reports: 12-001-X199000114552
    Description:

    The effects of utilizing a self-administered questionnaire or a personal interview procedure on the responses of an adolescent sample on their alcohol consumption and related behaviors are examined. The results are generally supportive of previous studies on the relationship between the method of data collection and the distribution of responses with sensitive or non-normative content. Although of significance in a statistical sense, many of the differences are not of sufficient magnitude to be considered significant in a substantive sense.

    Release date: 1990-06-15

  • Articles and reports: 12-001-X198900114580
    Description:

    Estimation of total numbers of hogs and pigs, sows and gilts, and cattle and calves in a state is studied using data obtained in the June Enumerative Survey conducted by the National Agricultural Statistics Service of the U.S. Department of Agriculture. It is possible to construct six different estimators using the June Enumerative Survey data. Three estimators involve data from area samples and three estimators combine data from list-frame and area-frame surveys. A rotation sampling scheme is used for the area frame portion of the June Enumerative Survey. Using data from the five years, 1982 through 1986, covariances among the estimators for different years are estimated. A composite estimator is proposed for the livestock numbers. The composite estimator is obtained by a generalized least-squares regression of the vector of different yearly estimators on an appropriate set of dummy variables. The composite estimator is designed to yield estimates for livestock inventories that are “at the same level” as the official estimates made by the U.S. Department of Agriculture.

    Release date: 1989-06-15

  • Articles and reports: 12-001-X198900114577
    Description:

    In this article the authors evaluate the relative performance of survey and diary data collection methods in the context of the long-distance telephone communication market. Based on an analysis of 1,530 respondents, the results indicate that two demographic variables, sex and income, are important in explaining the difference in survey reporting and diary recording of usage data.

    Release date: 1989-06-15

Data (1) (1 result)

  • Table: 62-010-X19970023422 (see the entry in the All list above)
    Release date: 1997-11-17

Analysis (81) (25 of 81 results)

  • Articles and reports: 82-003-X201700614829
    Description:

    POHEM-BMI is a microsimulation tool that includes a model of adult body mass index (BMI) and a model of childhood BMI history. This overview describes the development of BMI prediction models for adults and of childhood BMI history, and compares projected BMI estimates with those from nationally representative survey data to establish validity.

    Release date: 2017-06-21

  • Articles and reports: 12-001-X201600214662
    Description:

    Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600114540
    Description:

    In this paper, we compare the EBLUP and pseudo-EBLUP estimators for small area estimation under the nested error regression model, and three area level model-based estimators using the Fay-Herriot model. We conduct a design-based simulation study to compare the model-based estimators for unit level and area level models under informative and non-informative sampling. In particular, we are interested in the confidence interval coverage rates of the unit level and area level estimators. We also compare the estimators when the model is misspecified. Our simulation results show that estimators based on the unit level model perform better than those based on the area level model. The pseudo-EBLUP estimator is the best among the unit level and area level estimators. (A sketch of the Fay-Herriot EBLUP follows this entry.)

    Release date: 2016-06-22
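
    A compact sketch of the area-level (Fay-Herriot) EBLUP compared in the paper, with the model variance treated as known for brevity; in practice it is estimated, e.g., by REML or a moment method. All data are simulated:

    ```python
    import numpy as np

    def fay_herriot_eblup(y, X, D, sigma2_v):
        """Fay-Herriot EBLUP: shrink each direct estimate y_i toward the
        regression-synthetic part x_i'beta by gamma_i = s2_v/(s2_v + D_i)."""
        V = sigma2_v + D                      # marginal variance per area
        W = 1.0 / V
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
        gamma = sigma2_v / V
        return gamma * y + (1.0 - gamma) * (X @ beta)

    rng = np.random.default_rng(11)
    m = 30
    X = np.column_stack([np.ones(m), rng.uniform(0.0, 1.0, m)])
    theta = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.3, m)  # true means
    D = rng.uniform(0.05, 0.5, m)                  # known sampling variances
    y = theta + rng.normal(0.0, np.sqrt(D))        # direct survey estimates
    print(fay_herriot_eblup(y, X, D, sigma2_v=0.09)[:5])
    ```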

  • Articles and reports: 12-001-X201600114543
    Description:

    The regression estimator is extensively used in practice because it can improve the reliability of the estimated parameters of interest such as means or totals. It uses control totals of variables, known at the population level, that are included in the regression set-up. In this paper, we investigate the properties of the regression estimator that uses control totals estimated from the sample, as well as those known at the population level. This estimator is compared, both theoretically and via a simulation study, to the regression estimators that strictly use the known totals.

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201500214248
    Description:

    Unit level population models are often used in model-based small area estimation of totals and means, but the models may not hold for the sample if the sampling design is informative for the model. As a result, standard methods, assuming that the model holds for the sample, can lead to biased estimators. We study alternative methods that use a suitable function of the unit selection probability as an additional auxiliary variable in the sample model. We report the results of a simulation study on the bias and mean squared error (MSE) of the proposed estimators of small area means and on the relative bias of the associated MSE estimators, using informative sampling schemes to generate the samples. Alternative methods, based on modeling the conditional expectation of the design weight as a function of the model covariates and the response, are also included in the simulation study.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214236
    Description:

    We propose a model-assisted extension of weighting design-effect measures. We develop a summary-level statistic for different variables of interest, in single-stage sampling and under calibration weight adjustments. Our proposed design effect measure captures the joint effects of a non-epsem sampling design, unequal weights produced using calibration adjustments, and the strength of the association between an analysis variable and the auxiliaries used in calibration. We compare our proposed measure to existing design effect measures in simulations using variables like those collected in establishment surveys and telephone surveys of households.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214229
    Description:

    Self-weighting estimation through equal probability selection methods (epsem) is desirable for variance efficiency. Traditionally, the epsem property for (one phase) two stage designs for estimating population-level parameters is realized by using each primary sampling unit (PSU) population count as the measure of size for PSU selection along with equal sample size allocation per PSU under simple random sampling (SRS) of elementary units. However, when self-weighting estimates are desired for parameters corresponding to multiple domains under a pre-specified sample allocation to domains, Folsom, Potter and Williams (1987) showed that a composite measure of size can be used to select PSUs to obtain epsem designs when besides domain-level PSU counts (i.e., distribution of domain population over PSUs), frame-level domain identifiers for elementary units are also assumed to be available. The term depsem-A will be used to denote such (one phase) two stage designs to obtain domain-level epsem estimation. Folsom et al. also considered two phase two stage designs when domain-level PSU counts are unknown, but whole PSU counts are known. For these designs (to be termed depsem-B) with PSUs selected proportional to the usual size measure (i.e., the total PSU count) at the first stage, all elementary units within each selected PSU are first screened for classification into domains in the first phase of data collection before SRS selection at the second stage. Domain-stratified samples are then selected within PSUs with suitably chosen domain sampling rates such that the desired domain sample sizes are achieved and the resulting design is self-weighting. In this paper, we first present a simple justification of composite measures of size for the depsem-A design and of the domain sampling rates for the depsem-B design. Then, for depsem-A and -B designs, we propose generalizations, first to cases where frame-level domain identifiers for elementary units are not available and domain-level PSU counts are only approximately known from alternative sources, and second to cases where PSU size measures are pre-specified based on other practical and desirable considerations of over- and under-sampling of certain domains. We also present a further generalization in the presence of subsampling of elementary units and nonresponse within selected PSUs at the first phase before selecting phase two elementary units from domains within each selected PSU. This final generalization of depsem-B is illustrated for an area sample of housing units.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114199
    Description:

    In business surveys, it is not unusual to collect economic variables for which the distribution is highly skewed. In this context, winsorization is often used to treat the problem of influential values. This technique requires the determination of a constant that corresponds to the threshold above which large values are reduced. In this paper, we consider a method of determining the constant which involves minimizing the largest estimated conditional bias in the sample. In the context of domain estimation, we also propose a method of ensuring consistency between the domain-level winsorized estimates and the population-level winsorized estimate. The results of two simulation studies suggest that the proposed methods lead to winsorized estimators that have good bias and relative efficiency properties.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201400214119
    Description:

    When considering sample stratification by several variables, we often face the case where the expected number of sample units to be selected in each stratum is very small and the total number of units to be selected is smaller than the total number of strata. These stratified sample designs are specifically represented by the tabular arrays with real numbers, called controlled selection problems, and are beyond the reach of conventional methods of allocation. Many algorithms for solving these problems have been studied over about 60 years beginning with Goodman and Kish (1950). Those developed more recently are especially computer intensive and always find the solutions. However, there still remains the unanswered question: In what sense are the solutions to a controlled selection problem obtained from those algorithms optimal? We introduce the general concept of optimal solutions, and propose a new controlled selection algorithm based on typical distance functions to achieve solutions. This algorithm can be easily performed by a new SAS-based software. This study focuses on two-way stratification designs. The controlled selection solutions from the new algorithm are compared with those from existing algorithms using several examples. The new algorithm successfully obtains robust solutions to two-way controlled selection problems that meet the optimality criteria.

    Release date: 2014-12-19

  • Articles and reports: 82-003-X201401014098
    Description:

    This study compares registry and non-registry approaches to linking 2006 Census of Population data for Manitoba and Ontario to Hospital data from the Discharge Abstract Database.

    Release date: 2014-10-15

  • Articles and reports: 12-001-X201400114004
    Description:

    In 2009, two major surveys in the Governments Division of the U.S. Census Bureau were redesigned to reduce sample size, save resources, and improve the precision of the estimates (Cheng, Corcoran, Barth and Hogue 2009). The new design divides each of the traditional state by government-type strata with sufficiently many units into two sub-strata according to each governmental unit’s total payroll, in order to sample less from the sub-stratum with small size units. The model-assisted approach is adopted in estimating population totals. Regression estimators using auxiliary variables are obtained either within each created sub-stratum or within the original stratum by collapsing two sub-strata. A decision-based method was proposed in Cheng, Slud and Hogue (2010), applying a hypothesis test to decide which regression estimator is used within each original stratum. Consistency and asymptotic normality of these model-assisted estimators are established here, under a design-based or model-assisted asymptotic framework. Our asymptotic results also suggest two types of consistent variance estimators, one obtained by substituting unknown quantities in the asymptotic variances and the other by applying the bootstrap. The performance of all the estimators of totals and of their variance estimators are examined in some empirical studies. The U.S. Annual Survey of Public Employment and Payroll (ASPEP) is used to motivate and illustrate our study.

    Release date: 2014-06-27

  • Articles and reports: 12-001-X201300211885
    Description:

    Web surveys are generally associated with low response rates. Common suggestions in textbooks on Web survey research highlight the importance of the welcome screen in encouraging respondents to take part. The importance of this screen has been empirically proven in research showing that most respondents break off at the welcome screen. However, there has been little research on the effect of the design of this screen on the breakoff rate. In a study conducted at the University of Konstanz, three experimental treatments were added to a survey of the first-year student population (2,629 students) to assess the impact of different design features of this screen on breakoff rates. The methodological experiments included varying the background color of the welcome screen, varying the promised task duration on this first screen, and varying the length of the information provided on the welcome screen explaining the privacy rights of the respondents. The analyses show that the longer the stated duration and the more attention given to explaining privacy rights on the welcome screen, the fewer respondents started and completed the survey. However, the use of a different background color did not result in the expected significant difference.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211887
    Description:

    Multi-level models are extensively used for analyzing survey data with the design hierarchy matching the model hierarchy. We propose a unified approach, based on a design-weighted log composite likelihood, for two-level models that leads to design-model consistent estimators of the model parameters even when the within cluster sample sizes are small provided the number of sample clusters is large. This method can handle both linear and generalized linear two-level models and it requires level 2 and level 1 inclusion probabilities and level 1 joint inclusion probabilities, where level 2 represents a cluster and level 1 an element within a cluster. Results of a simulation study demonstrating superior performance of the proposed method relative to existing methods under informative sampling are also reported.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211869
    Description:

    The house price index compiled by Statistics Netherlands relies on the Sale Price Appraisal Ratio (SPAR) method. The SPAR method combines selling prices with prior government assessments of properties. This paper outlines an alternative approach where the appraisals serve as auxiliary information in a generalized regression (GREG) framework. An application on Dutch data demonstrates that, although the GREG index is much smoother than the ratio of sample means, it is very similar to the SPAR series. To explain this result we show that the SPAR index is an estimator of our more general GREG index and in practice almost as efficient. (A toy SPAR calculation follows this entry.)

    Release date: 2014-01-15
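
    A toy version of the SPAR calculation: the period-t ratio of mean selling price to mean prior appraisal, relative to the same ratio in the base period. The dwellings sold differ between periods; their appraisals anchor both ratios. All numbers are invented:

    ```python
    import numpy as np

    def spar_index(p_t, a_t, p_0, a_0):
        """SPAR index for period t (base period = 100)."""
        return 100.0 * (np.mean(p_t) / np.mean(a_t)) / (np.mean(p_0) / np.mean(a_0))

    p_0, a_0 = np.array([200.0, 310.0, 150.0]), np.array([190.0, 300.0, 160.0])
    p_t, a_t = np.array([260.0, 180.0, 420.0]), np.array([230.0, 165.0, 380.0])
    print(spar_index(p_t, a_t, p_0, a_0))
    ```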

  • Articles and reports: 12-001-X201300111830
    Description:

    We consider two different self-benchmarking methods for the estimation of small area means based on the Fay-Herriot (FH) area level model: the method of You and Rao (2002) applied to the FH model and the method of Wang, Fuller and Qu (2008) based on augmented models. We derive an estimator of the mean squared prediction error (MSPE) of the You-Rao (YR) estimator of a small area mean that, under the true model, is correct to second-order terms. We report the results of a simulation study on the relative bias of the MSPE estimator of the YR estimator and the MSPE estimator of the Wang, Fuller and Qu (WFQ) estimator obtained under an augmented model. We also study the MSPE and the estimators of MSPE for the YR and WFQ estimators obtained under a misspecified model.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201200111682
    Description:

    Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004) based on compromise allocation and Longford (2006) based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains, cutting across strata, are also specified. Performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments. (A small optimization sketch follows this entry.)

    Release date: 2012-06-27
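
    A small sketch of the non-linear-programming formulation: minimize the total sample size subject to CV tolerances on each stratum mean and on the population mean, under stratified SRSWOR. Frame sizes, variances, and tolerances are invented, and scipy's SLSQP merely stands in for whatever solver the authors used:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    N = np.array([2000.0, 5000.0, 3000.0])     # stratum frame sizes
    S2 = np.array([4.0, 9.0, 1.0])             # stratum unit variances
    mu = np.array([10.0, 12.0, 8.0])           # stratum means (for CV targets)
    cv_strata, cv_pop = 0.05, 0.02
    W = N / N.sum()
    mu_pop = W @ mu

    def var_mean(n, N, S2):                    # SRSWOR variance of a stratum mean
        return (1.0 / n - 1.0 / N) * S2

    cons = [{'type': 'ineq',                   # stratum-level CV tolerances
             'fun': lambda n, h=h: (cv_strata * mu[h]) ** 2
                                   - var_mean(n[h], N[h], S2[h])}
            for h in range(len(N))]
    cons.append({'type': 'ineq',               # population-level CV tolerance
                 'fun': lambda n: (cv_pop * mu_pop) ** 2
                                  - np.sum(W ** 2 * var_mean(n, N, S2))})

    res = minimize(lambda n: n.sum(), x0=np.full(3, 100.0),
                   bounds=[(2.0, Nh) for Nh in N], constraints=cons, method='SLSQP')
    print(np.ceil(res.x), res.x.sum())         # continuous 'optimal' allocation
    ```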

  • Articles and reports: 82-003-X201200111625
    Description:

    This study compares estimates of the prevalence of cigarette smoking based on self-report with estimates based on urinary cotinine concentrations. The data are from the 2007 to 2009 Canadian Health Measures Survey, which included self-reported smoking status and the first nationally representative measures of urinary cotinine.

    Release date: 2012-02-15

  • Articles and reports: 12-001-X201000211378
    Description:

    One key to poverty alleviation or eradication in the third world is reliable information on the poor and their location, so that interventions and assistance can be effectively targeted to the neediest people. Small area estimation is one statistical technique that is used to monitor poverty and to decide on aid allocation in pursuit of the Millennium Development Goals. Elbers, Lanjouw and Lanjouw (ELL) (2003) proposed a small area estimation methodology for income-based or expenditure-based poverty measures, which is implemented by the World Bank in its poverty mapping projects via the involvement of the central statistical agencies in many third world countries, including Cambodia, Lao PDR, the Philippines, Thailand and Vietnam, and is incorporated into the World Bank software program PovMap. In this paper, the ELL methodology which consists of first modeling survey data and then applying that model to census information is presented and discussed with strong emphasis on the first phase, i.e., the fitting of regression models and on the estimated standard errors at the second phase. Other regression model fitting procedures such as the General Survey Regression (GSR) (as described in Lohr (1999) Chapter 11) and those used in existing small area estimation techniques: Pseudo-Empirical Best Linear Unbiased Prediction (Pseudo-EBLUP) approach (You and Rao 2002) and Iterative Weighted Estimating Equation (IWEE) method (You, Rao and Kovacevic 2003) are presented and compared with the ELL modeling strategy. The most significant difference between the ELL method and the other techniques is in the theoretical underpinning of the ELL model fitting procedure. An example based on the Philippines Family Income and Expenditure Survey is presented to show the differences in both the parameter estimates and their corresponding standard errors, and in the variance components generated from the different methods and the discussion is extended to the effect of these on the estimated accuracy of the final small area estimates themselves. The need for sound estimation of variance components, as well as regression estimates and estimates of their standard errors for small area estimation of poverty is emphasized.

    Release date: 2010-12-21

  • Articles and reports: 12-001-X201000211385
    Description:

    In this short note, we show that simple random sampling without replacement and Bernoulli sampling have approximately the same entropy when the population size is large. An empirical example is given as an illustration. (A numerical check follows this entry.)

    Release date: 2010-12-21
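
    A quick numerical check of the claim: the entropy of SRSWOR is log C(N, n), since the design is uniform over the C(N, n) possible samples, while Bernoulli sampling makes N independent inclusion decisions:

    ```python
    from math import comb, log

    def entropy_srswor(N, n):
        """Entropy of SRSWOR: uniform over the C(N, n) possible samples."""
        return log(comb(N, n))

    def entropy_bernoulli(N, p):
        """Entropy of Bernoulli sampling: N independent inclusion decisions."""
        return N * (-p * log(p) - (1.0 - p) * log(1.0 - p))

    for N in (100, 1_000, 10_000):
        n = N // 10
        print(N, entropy_srswor(N, n), entropy_bernoulli(N, n / N))
    # The ratio of the two entropies approaches 1 as N grows.
    ```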

  • Articles and reports: 12-001-X201000211384
    Description:

    The current economic downturn in the US could challenge costly strategies in survey operations. In the Behavioral Risk Factor Surveillance System (BRFSS), ending the monthly data collection at 31 days could be a less costly alternative. However, this could exclude a portion of interviews completed after 31 days (late responders), whose characteristics could differ in many respects from those of respondents who completed the survey within 31 days (early responders). We examined whether there are differences between early and late responders in demographics, health-care coverage, general health status, health risk behaviors, and chronic disease conditions or illnesses. We used 2007 BRFSS data, in which a representative sample of the noninstitutionalized adult U.S. population was selected using a random digit dialing method. Late responders were significantly more likely to be male; to report race/ethnicity as Hispanic; to have annual income higher than $50,000; to be younger than 45 years of age; to have less than a high school education; to have health-care coverage; and to report good health. They were significantly less likely to report hypertension, diabetes, or obesity. The observed differences between early and late responders are unlikely to influence national and state-level estimates. As the proportion of late responders may increase in the future, however, its impact on surveillance estimates should be examined before late responders are excluded from analysis. Analyses of late responders alone should combine several years of data to produce reliable estimates.

    Release date: 2010-12-21

  • Articles and reports: 12-001-X201000111246
    Description:

    Many surveys employ weight adjustment procedures to reduce nonresponse bias. These adjustments make use of available auxiliary data. This paper addresses the issue of jackknife variance estimation for estimators that have been adjusted for nonresponse. Using the reverse approach for variance estimation proposed by Fay (1991) and Shao and Steel (1999), we study the effect of not re-calculating the nonresponse weight adjustment within each jackknife replicate. We show that the resulting 'shortcut' jackknife variance estimator tends to overestimate the true variance of point estimators in the case of several weight adjustment procedures used in practice. These theoretical results are confirmed through a simulation study where we compare the shortcut jackknife variance estimator with the full jackknife variance estimator obtained by re-calculating the nonresponse weight adjustment within each jackknife replicate. (A simulation sketch follows this entry.)

    Release date: 2010-06-29
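
    A simulation sketch of the comparison: a weighting-class nonresponse adjustment with a delete-one-unit jackknife that either re-computes the adjustment in every replicate ("full") or keeps the adjustment factors fixed ("shortcut"). The setup is invented and far simpler than the paper's:

    ```python
    import numpy as np

    def nr_adjust(d, resp, cls):
        """Weighting-class adjustment: respondents in each class absorb
        the design weight of that class's nonrespondents."""
        w = np.zeros_like(d)
        for c in np.unique(cls):
            m = cls == c
            w[m & resp] = d[m & resp] * d[m].sum() / d[m & resp].sum()
        return w

    def jk_var(d, y, resp, cls, recompute):
        """Delete-one-unit jackknife variance of the adjusted total."""
        n = d.size
        w = nr_adjust(d, resp, cls)
        theta = (w * y)[resp].sum()
        reps = np.empty(n)
        for j in range(n):
            dj = d * n / (n - 1.0)
            dj[j] = 0.0
            wj = nr_adjust(dj, resp, cls) if recompute else w * dj / d
            reps[j] = (wj * y)[resp].sum()
        return (n - 1.0) / n * np.sum((reps - theta) ** 2)

    rng = np.random.default_rng(9)
    n = 200
    d = np.full(n, 50.0)                  # equal design weights
    cls = rng.integers(0, 4, n)           # weighting classes
    resp = rng.uniform(size=n) < 0.7      # response indicators
    y = rng.normal(20.0, 5.0, n)
    print(jk_var(d, y, resp, cls, recompute=True),    # full jackknife
          jk_var(d, y, resp, cls, recompute=False))   # 'shortcut' jackknife
    ```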

  • Articles and reports: 12-001-X200900110885
    Description:

    Peaks in the spectrum of a stationary process are indicative of the presence of stochastic periodic phenomena, such as a stochastic seasonal effect. This work proposes to measure and test for the presence of such spectral peaks via assessing their aggregate slope and convexity. Our method is developed nonparametrically, and thus may be useful during a preliminary analysis of a series. The technique is also useful for detecting the presence of residual seasonality in seasonally adjusted data. The diagnostic is investigated through simulation and an extensive case study using data from the U.S. Census Bureau and the Organization for Economic Co-operation and Development (OECD).

    Release date: 2009-06-22

  • Articles and reports: 12-001-X200800110616
    Description:

    With complete multivariate data the BACON algorithm (Billor, Hadi and Vellemann 2000) yields a robust estimate of the covariance matrix. The corresponding Mahalanobis distance may be used for multivariate outlier detection. When items are missing, the EM algorithm is a convenient way to estimate the covariance matrix at each iteration step of the BACON algorithm. In finite population sampling the EM algorithm must be enhanced to estimate the covariance matrix of the population rather than of the sample. A version of the EM algorithm for survey data following a multivariate normal model, the EEM algorithm (Estimated Expectation Maximization), is proposed. The combination of the two algorithms, the BACON-EEM algorithm, is applied to two datasets and compared with alternative methods. (A simplified sketch follows this entry.)

    Release date: 2008-06-26
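
    A stripped-down sketch of the complete-data BACON step (no missing items, hence no EM/EEM): grow a "clean" subset by Mahalanobis distance until it stabilizes, then flag distant points. The fixed cut-off below is a simplification of the algorithm's chi-square-based threshold, and the data are simulated:

    ```python
    import numpy as np

    def bacon_distances(X, cutoff=4.0, max_iter=100):
        """Simplified BACON: iterate mean/covariance on a clean subset,
        recomputing Mahalanobis distances until the subset stabilizes."""
        n, p = X.shape
        d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
        subset = np.argsort(d0)[:4 * p]          # initial basic subset
        for _ in range(max_iter):
            mu = X[subset].mean(axis=0)
            cov = np.cov(X[subset], rowvar=False)
            diff = X - mu
            dist = np.sqrt(np.einsum('ij,ij->i', diff @ np.linalg.inv(cov), diff))
            new = np.flatnonzero(dist < cutoff)
            if np.array_equal(new, subset):
                break
            subset = new
        return dist

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (197, 3)),   # bulk of the data
                   rng.normal(8, 1, (3, 3))])    # three gross outliers
    print(np.flatnonzero(bacon_distances(X) >= 4.0))  # typically rows 197-199
    ```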

  • Articles and reports: 12-001-X200700210495
    Description:

    The purpose of this work is to obtain reliable estimates in study domains when there are potentially very small sample sizes and the sampling design stratum differs from the study domain. The population sizes are unknown for both the study domains and the sampling design strata. In calculating parameter estimates in the study domains, a random sample size is often necessary. We propose a new family of generalized linear mixed models with correlated random effects when there is more than one unknown parameter. The proposed model will estimate both the population size and the parameter of interest. General formulae for the full conditional distributions required for Markov chain Monte Carlo (MCMC) simulations are given for this framework, as are equations for Bayesian estimation and prediction at the study domains. We apply the methodology to the 1998 Missouri Turkey Hunting Survey, which stratified samples based on the hunter's place of residence; estimates are required at the domain level, defined as the county in which the turkey hunter actually hunted.

    Release date: 2008-01-03

  • Articles and reports: 12-001-X200700210498
    Description:

    In this paper we describe a methodology for combining a convenience sample with a probability sample in order to produce an estimator with a smaller mean squared error (MSE) than estimators based on only the probability sample. We then explore the properties of the resulting composite estimator, a linear combination of the convenience and probability sample estimators with weights that are a function of bias. We discuss the estimator's properties in the context of web-based convenience sampling. Our analysis demonstrates that the use of a convenience sample to supplement a probability sample for improvements in the MSE of estimation may be practical only under limited circumstances. First, the remaining bias of the estimator based on the convenience sample must be quite small, equivalent to no more than 0.1 of the outcome's population standard deviation. For a dichotomous outcome, this implies a bias of no more than five percentage points at 50 percent prevalence and no more than three percentage points at 10 percent prevalence. Second, the probability sample should contain at least 1,000-10,000 observations for adequate estimation of the bias of the convenience sample estimator. Third, it must be inexpensive and feasible to collect at least thousands (and probably tens of thousands) of web-based convenience observations. The conclusions about the limited usefulness of convenience samples with estimator bias of more than 0.1 standard deviations also apply to direct use of estimators based on that sample. (A back-of-envelope sketch of the weighting follows this entry.)

    Release date: 2008-01-03
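
    A back-of-envelope sketch of the composite estimator analyzed in the paper: the MSE-minimizing linear combination of an unbiased probability-sample estimate and a biased convenience-sample estimate, treating the estimates as independent and the bias as known. All numbers are invented:

    ```python
    def composite(theta_p, var_p, theta_c, var_c, bias_c):
        """Weight on the convenience estimate that minimizes the MSE of
        w*theta_c + (1-w)*theta_p (independent estimates, known bias)."""
        w = var_p / (var_p + var_c + bias_c ** 2)
        est = w * theta_c + (1.0 - w) * theta_p
        mse = w ** 2 * (var_c + bias_c ** 2) + (1.0 - w) ** 2 * var_p
        return est, mse, w

    # Small bias: the large convenience sample gets most of the weight.
    print(composite(theta_p=0.52, var_p=25e-4, theta_c=0.50, var_c=1e-4, bias_c=0.01))
    # Bias of 0.10: its weight collapses, echoing the paper's conclusion.
    print(composite(theta_p=0.52, var_p=25e-4, theta_c=0.50, var_c=1e-4, bias_c=0.10))
    ```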

Reference (87) (25 of 87 results)

  • Technical products: 11-522-X201700014714
    Description:

    The Labour Market Development Agreements (LMDAs) between Canada and the provinces and territories fund labour market training and support services to Employment Insurance claimants. The objective of this paper is to discuss the improvements over the years in the impact assessment methodology. The paper describes the LMDAs and past evaluation work and discusses the drivers to make better use of large administrative data holdings. It then explains how the new approach made the evaluation less resource-intensive, while results are more relevant to policy development. The paper outlines the lessons learned from a methodological perspective and provides insight into ways for making this type of use of administrative data effective, especially in the context of large programs.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014742
    Description:

    This paper describes the Quick Match System (QMS), an in-house application designed to match business microdata records, and the methods used to link the United States Patent and Trademark Office (USPTO) dataset to Statistics Canada’s Business Register (BR) for the period from 2000 to 2011. The paper illustrates the record-linkage framework and outlines the techniques used to prepare and classify each record and evaluate the match results. The USPTO dataset consisted of 41,619 U.S. patents granted to 14,162 distinct Canadian entities. The record-linkage process matched the names, city, province and postal codes of the patent assignees in the USPTO dataset with those of businesses in the January editions of the Generic Survey Universe File (GSUF) from the BR for the same reference period. As the vast majority of individual patent assignees are not engaged in commercial activity to provide taxable property or services, they tend not to appear in the BR. The relatively poor match rate of 24.5% among individuals, compared to 84.7% among institutions, reflects this tendency. Although the 8,844 individual patent assignees outnumbered the 5,318 institutions, the institutions accounted for 73.0% of the patents, compared to 27.0% held by individuals. Consequently, this study and its conclusions focus primarily on institutional patent assignees. The linkage of the USPTO institutions to the BR is significant because it provides access to business micro-level data on firm characteristics, employment, revenue, assets and liabilities. In addition, the retrieval of robust administrative identifiers enables subsequent linkage to other survey and administrative data sources. The integrated dataset will support direct and comparative analytical studies on the performance of Canadian institutions that obtained patents in the United States between 2000 and 2011.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014711
    Description:

    After the 2010 Census, the U.S. Census Bureau conducted two separate research projects matching survey data to databases. One study matched to the third-party database Accurint, and the other matched to U.S. Postal Service National Change of Address (NCOA) files. In both projects, we evaluated response error in reported move dates by comparing the self-reported move date to records in the database. We encountered similar challenges in the two projects. This paper discusses our experience using “big data” as a comparison source for survey data and our lessons learned for future projects similar to the ones we conducted.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014749
    Description:

    As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data such as Passport Canada files, Canada Border Service Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014726
    Description:

    Internal migration is one of the components of population growth estimated at Statistics Canada. It is estimated by comparing individuals’ addresses at the beginning and end of a given period. The Canada Child Tax Benefit and T1 Family File are the primary data sources used. Address quality and coverage of more mobile subpopulations are crucial to producing high-quality estimates. The purpose of this article is to present the results of evaluations of these elements using access to more tax data sources at Statistics Canada.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014719
    Description:

    Open Data initiatives are transforming how governments and other public institutions interact and provide services to their constituents. They increase transparency and value to citizens, reduce inefficiencies and barriers to information, enable data-driven applications that improve public service delivery, and provide public data that can stimulate innovative business opportunities. As one of the first international organizations to adopt an open data policy, the World Bank has been providing guidance and technical expertise to developing countries that are considering or designing their own initiatives. This presentation will give an overview of developments in open data at the international level along with current and future experiences, challenges, and opportunities. Mr. Herzog will discuss the rationales under which governments are embracing open data, demonstrated benefits to both the public and private sectors, the range of different approaches that governments are taking, and the availability of tools for policymakers, with special emphasis on the roles and perspectives of National Statistics Offices within a government-wide initiative.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014743
    Description:

    Probabilistic linkage is susceptible to linkage errors such as false positives and false negatives. In many cases, these errors may be reliably measured through clerical reviews, i.e., the visual inspection of a sample of record pairs to determine whether they are true matches. A framework is described to carry out such clerical reviews effectively, based on a probabilistic sample of pairs, repeated independent reviews of the same pairs, and latent class analysis to account for clerical errors.
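
    A minimal sketch of the latent class idea, under simplifying assumptions the paper’s framework does not require: all reviewers share one sensitivity and one specificity, and reviews are conditionally independent given the true match status. With R repeated reviews per sampled pair, EM recovers the prevalence of true matches along with the clerical error rates:

        from math import comb

        def em_latent_class(counts, R, iters=200):
            """counts[k] = number of sampled pairs coded 'match' by k of R
            independent reviews. Estimates pi (true-match prevalence),
            s = P(coded match | true match) and
            t = P(coded non-match | true non-match) by EM."""
            pi, s, t = 0.5, 0.9, 0.9
            for _ in range(iters):
                # E-step: posterior P(true match | k of R reviews say match).
                post = {}
                for k in counts:
                    a = pi * comb(R, k) * s**k * (1 - s)**(R - k)
                    b = (1 - pi) * comb(R, k) * (1 - t)**k * t**(R - k)
                    post[k] = a / (a + b)
                # M-step: weighted updates of pi, s and t.
                n_tot = sum(counts.values())
                pi = sum(counts[k] * post[k] for k in counts) / n_tot
                s = (sum(counts[k] * post[k] * k for k in counts)
                     / sum(counts[k] * post[k] * R for k in counts))
                t = (sum(counts[k] * (1 - post[k]) * (R - k) for k in counts)
                     / sum(counts[k] * (1 - post[k]) * R for k in counts))
            return pi, s, t

        # Hypothetical: 1,000 sampled pairs, each reviewed 3 times.
        print(em_latent_class({0: 520, 1: 60, 2: 70, 3: 350}, R=3))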

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements (LMDAs). We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We applied propensity score matching as in Blundell et al. (2002), Gerfin and Lechner (2002) and Sianesi (2004), and produced national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.
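
    A minimal sketch of the estimation logic, not the authors’ implementation: fit a propensity model, match each participant to the nearest non-participant on the estimated score, and average the difference-in-differences contrast over the matched pairs. The variable names and the one-nearest-neighbour rule with a caliper are illustrative stand-ins for the kernel matching estimator actually used.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def did_psm(X, treated, y_pre, y_post, caliper=0.05):
            """1-NN propensity score matching with a caliper, followed by a
            difference-in-differences contrast on the matched pairs."""
            model = LogisticRegression(max_iter=1000).fit(X, treated)
            ps = model.predict_proba(X)[:, 1]
            t_idx = np.where(treated == 1)[0]
            c_idx = np.where(treated == 0)[0]
            effects = []
            for i in t_idx:
                j = c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))]
                if abs(ps[j] - ps[i]) <= caliper:
                    effects.append((y_post[i] - y_pre[i]) - (y_post[j] - y_pre[j]))
            return np.mean(effects)  # incremental impact estimate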

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment until program start. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim had the best impacts on earnings and incidence of employment, while also experiencing reduced use of EI starting in the second year post-program.
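
    The stratification step can be sketched as follows: discretize the duration of unemployment at program start into bins, estimate an effect within each bin, and combine the within-bin estimates. The bins are illustrative, and a simple within-stratum mean contrast stands in for the full propensity score matching used in the study.

        import numpy as np

        def stratified_effect(duration_weeks, treated, outcome,
                              bins=(4, 12, 26, 52)):
            """Weighted average of within-stratum treatment/comparison
            contrasts, with strata defined by discretized duration."""
            strata = np.digitize(duration_weeks, bins)
            effects, weights = [], []
            for g in np.unique(strata):
                m = strata == g
                if (treated[m] == 1).any() and (treated[m] == 0).any():
                    eff = (outcome[m][treated[m] == 1].mean()
                           - outcome[m][treated[m] == 0].mean())
                    effects.append(eff)
                    weights.append(m.sum())
            return np.average(effects, weights=weights)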

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014716
    Description:

    Administrative data, depending on its source and original purpose, can be considered a more reliable source of information than survey-collected data. It does not require a respondent to be present and understand question wording, and it is not limited by the respondent’s ability to recall events retrospectively. This paper compares selected survey data, such as demographic variables, from the Longitudinal and International Study of Adults (LISA) to various administrative sources for which LISA has linkage agreements in place. The agreement between data sources, and some factors that might affect it, are analyzed for various aspects of the survey.
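
    The agreement analysis rests on a simple per-variable statistic over linked records. A minimal sketch with hypothetical values:

        def agreement_rate(pairs):
            """Share of linked records where the survey value equals the
            administrative value; 'pairs' holds (survey, admin) tuples
            for one variable, e.g. year of birth."""
            return sum(1 for s, a in pairs if s == a) / len(pairs)

        # Hypothetical linked values for year of birth.
        print(agreement_rate([(1980, 1980), (1975, 1976), (1990, 1990)]))  # ~0.67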

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014732
    Description:

    The Institute for Employment Research (IAB) is the research unit of the German Federal Employment Agency. Via the Research Data Centre (FDZ) at the IAB, administrative and survey data on individuals and establishments are provided to researchers. In cooperation with the Institute for the Study of Labor (IZA), the FDZ has implemented the Job Submission Application (JoSuA) environment, which enables researchers to submit jobs for remote data execution through a custom-built web interface. Moreover, two types of user-generated output files are distinguished within the JoSuA environment, which allows for faster and more efficient disclosure review services.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014746
    Description:

    Paradata research has focused on identifying opportunities for strategic improvement in data collection that could be operationally viable and lead to enhancements in quality or cost efficiency. To that end, Statistics Canada has developed and implemented a responsive collection design (RCD) strategy for computer-assisted telephone interview (CATI) household surveys to maximize quality and efficiency and to potentially reduce costs. RCD is an adaptive approach to survey data collection that uses information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. In practice, the survey managers monitor and analyze collection progress against a predetermined set of indicators for two purposes: to identify critical data-collection milestones that require significant changes to the collection approach and to adjust collection strategies to make the most efficient use of remaining available resources. In the RCD context, numerous considerations come into play when determining which aspects of data collection to adjust and how to adjust them. Paradata sources play a key role in the planning, development and implementation of active management for RCD surveys. Since 2009, Statistics Canada has conducted several RCD surveys. This paper describes Statistics Canada’s experiences in implementing and monitoring this type of survey.
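
    One way to picture the adjustment step: score the in-progress cases and re-order the remaining collection effort. The rule below (an attempt cap plus a ranking by estimated response propensity) is an illustrative toy, not Statistics Canada’s actual RCD decision rules:

        def prioritize_cases(cases, attempt_cap=8):
            """Drop in-progress cases past the attempt cap, then rank the
            rest so the cases most likely to respond are called first."""
            active = [c for c in cases if c["attempts"] < attempt_cap]
            return sorted(active, key=lambda c: c["est_propensity"], reverse=True)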

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014755
    Description:

    The National Children’s Study Vanguard Study was a pilot epidemiological cohort study of children and their parents, with measures to be taken from pre-pregnancy until adulthood. The use of extant data was planned to supplement direct data collection from the respondents. Our paper outlines a strategy for cataloging and evaluating extant data sources for use with large-scale longitudinal studies. Through our review, we selected five evaluation factors to guide researchers through available data sources: 1) relevance, 2) timeliness, 3) spatiality, 4) accessibility and 5) accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014725
    Description:

    Tax data are being used more and more to measure and analyze the population and its characteristics. One of the issues raised by the growing use of this type of data relates to the definition of the concept of place of residence. While the census uses the traditional concept of place of residence, tax data provide information based on the mailing address of tax filers. Using record linkage between the census, the National Household Survey and tax data from the T1 Family File, this study examines the consistency of place of residence between these two sources and the characteristics associated with it.

    Release date: 2016-03-24

  • Technical products: 11-522-X201300014256
    Description:

    The American Community Survey (ACS) added an Internet data collection mode as part of a sequential mode design in 2013. The ACS currently uses a single web application for all Internet respondents, regardless of whether they respond on a personal computer or on a mobile device. As market penetration of mobile devices increases, however, more survey respondents are using tablets and smartphones to take surveys that are designed for personal computers. Using mobile devices to complete these surveys may be more difficult for respondents and this difficulty may translate to reduced data quality if respondents become frustrated or cannot navigate around usability issues. This study uses several indicators to compare data quality across computers, tablets, and smartphones and also compares the demographic characteristics of respondents that use each type of device.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014278
    Description:

    In January and February 2014, Statistics Canada conducted a test aiming to measure the effectiveness of different collection strategies using an online self-reporting survey. Sampled units were contacted using mailed introductory letters and asked to complete the online survey without any interviewer contact. The objectives of this test were to measure the take-up rates for completing an online survey, and to profile the respondents and non-respondents. Different samples and letters were tested to determine the relative effectiveness of the different approaches. The results of this project will be used to inform various social surveys that are preparing to include an internet response option. The paper presents the general methodology of the test as well as results observed from collection and the analysis of respondent profiles.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014276
    Description:

    In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems, so it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey: an area may cease to be covered, which effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified.

    If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets.

    If an area is ‘abandoned’ during collection, we prioritize the remaining interviews. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools we can use to analyze collection by isolating underrepresented population groups, so that collection efforts can be increased for groups identified beforehand.

    In the oral presentation, we covered both points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.
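
    In the survey-methodology literature, the R indicator is typically defined as R = 1 - 2S(p), where S(p) is the standard deviation of the estimated response probabilities; R = 1 corresponds to equal probabilities, i.e. a fully representative response. A minimal sketch under that definition:

        import numpy as np

        def r_indicator(p_hat):
            """R = 1 - 2 * S(p_hat), where p_hat holds the estimated
            response probabilities of the sampled households."""
            return 1.0 - 2.0 * np.std(p_hat, ddof=1)

        print(r_indicator(np.array([0.6, 0.6, 0.6, 0.6])))  # 1.0: no dispersion
        print(r_indicator(np.array([0.2, 0.9, 0.5, 0.7])))  # < 1: dispersed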

    Release date: 2014-10-31

  • Technical products: 11-522-X200800010973
    Description:

    The Canadian Community Health Survey (CCHS) provides timely estimates of health information at the sub-provincial level. We explore two main issues that prevented us from using physical activity data from CCHS cycle 3.1 (2005) as part of the Profile of Women's Health in Manitoba. CCHS uses the term 'moderate' to describe physical effort that meets Canadian minimum guidelines, even though 'moderate' conventionally describes sub-minimal levels of activity. In addition, a Manitoba survey of physical activity interrogates a wider variety of activities to measure respondents' daily energy expenditure. We found the Manitoba survey better suited to our needs and likely a better measure of women's daily physical activity and health.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010964
    Description:

    Statistics Netherlands (SN) has been using electronic questionnaires for business surveys since the early nineties. Some years ago, SN decided to invest in large-scale use of electronic questionnaires. The large yearly production survey of about 80,000 forms, divided over many different economic activity areas, was redesigned using a metadata-driven approach. The resulting system can generate both non-intelligent personalized PDF forms and intelligent personalized Blaise forms. The Blaise forms are used by a new tool in the Blaise system that respondents can download from the SN website to run the questionnaire off-line. Essential to the system is the SN house style for paper and electronic forms. The flexibility of the new tool gave the questionnaire designers the possibility to implement a user-friendly form that follows this house style.

    Part of the implementation is an audit trail that offers insight into the way respondents operate the questionnaire program. The entered data, including the audit trail, can be transferred to SN via encrypted e-mail or through the internet. The paper gives an outline of the overall system architecture and the role of Blaise in the system. It also describes the results of using the system for several years and some results of the analysis of the audit trail.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011010
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) is a monthly survey using two sources of data: a census of payroll deduction (PD7) forms (administrative data) and a survey of business establishments. This paper focuses on the processing of the administrative data, from the weekly receipt of data from the Canada Revenue Agency to the production of SEPH's monthly estimates.

    The edit and imputation methods used to process the administrative data have been revised over the last several years. The goals of this redesign were primarily to improve data quality and to increase consistency with another administrative data source (T4), which serves as a benchmark measure for Statistics Canada's System of National Accounts. An additional goal was to ensure that the new process would be easier to understand and to modify, if needed. As a result, a new processing module was developed to edit and impute PD7 forms before their data are aggregated to the monthly level.

    This paper presents an overview of both the current and new processes, including a description of challenges that we faced during development. Improved quality is demonstrated both conceptually (by presenting examples of PD7 forms and their treatment under the old and new systems) and quantitatively (by comparison to T4 data).
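
    As a purely illustrative example of the kind of rule such a module applies (not the actual SEPH treatment), ratio imputation fills a missing monthly value by carrying a unit's previous value forward with the month-over-month movement observed among responding units:

        def ratio_impute(current, previous, resp_total_prev, resp_total_curr):
            """Return the reported value when present; otherwise scale the
            unit's previous value by the aggregate movement of respondents."""
            if current is not None:
                return current
            return previous * (resp_total_curr / resp_total_prev)

        # Hypothetical PD7 unit missing in the current month.
        print(ratio_impute(None, 120.0, 1000.0, 1030.0))  # 123.6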

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010956
    Description:

    The use of Computer Audio-Recorded Interviewing (CARI) as a tool to identify interview falsification is quickly growing in survey research (Biemer, 2000, 2003; Thissen, 2007). Similarly, survey researchers are starting to expand the usefulness of CARI by combining recordings with coding to address data quality (Herget, 2001; Hansen, 2005; McGee, 2007). This paper presents results from a study included as part of the establishment-based National Center for Health Statistics' National Home and Hospice Care Survey (NHHCS), which used CARI behavior coding and CARI-specific paradata to: 1) identify and correct problematic interviewer behavior or question issues early in the data collection period, before either could negatively impact data quality; and 2) identify ways to diminish measurement error in future implementations of the NHHCS. During the first 9 weeks of the 30-week field period, CARI recorded a subset of questions from the NHHCS application for all interviewers. Recordings were linked with the interview application and output, and then coded in one of two modes: Code by Interviewer or Code by Question. The Code by Interviewer method provided visibility into problems specific to an interviewer as well as more generalized problems potentially applicable to all interviewers. The Code by Question method yielded data on the understandability of the questions and other response problems; in this mode, coders coded multiple implementations of the same question across multiple interviewers. Using the Code by Question approach, researchers identified issues with three key survey questions in the first few weeks of data collection and provided guidance to interviewers on how to handle those questions as data collection continued. Results from coding the audio recordings (which were linked with the survey application and output) will inform question wording and interviewer training in the next implementation of the NHHCS, and guide future enhancement of CARI and the coding system.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010999
    Description:

    The choice of the number of call attempts in a telephone survey is an important decision. A large number of call attempts makes the data collection costly and time-consuming, while a small number of attempts decreases the response set from which conclusions are drawn and increases the variance. The decision can also affect the nonresponse bias. In this paper we study the effects of the number of call attempts on the nonresponse rate and the nonresponse bias in two surveys conducted by Statistics Sweden: the Labour Force Survey (LFS) and Household Finances (HF).

    Using paradata, we calculate the response rate as a function of the number of call attempts. To estimate the nonresponse bias we use estimates of some register variables, for which observations are available for both respondents and nonrespondents. We also calculate estimates of some real survey parameters as functions of the number of call attempts. The results indicate that it is possible to reduce the current number of call attempts without increasing the nonresponse bias.
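
    The first calculation reduces to a cumulative response rate at each call-attempt cap. A minimal sketch with a hypothetical data layout:

        def response_rate_by_attempts(first_response, max_attempts=10):
            """first_response[i] is the attempt on which case i responded,
            or None if it never responded. Returns (cap, response rate)
            pairs for caps from 1 to max_attempts."""
            n = len(first_response)
            rates, cum = [], 0
            for cap in range(1, max_attempts + 1):
                cum += sum(1 for a in first_response if a == cap)
                rates.append((cap, cum / n))
            return rates

        # Hypothetical: attempt at which each sampled case responded.
        print(response_rate_by_attempts([1, 1, 2, 3, None, 2, None, 5], 5))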

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011008
    Description:

    In one sense, a questionnaire is never complete. Test results, paradata and research findings constantly provide reasons to update and improve the questionnaire. In addition, establishments change over time and questions need to be updated accordingly. In reality, it doesn't always work like this. At Statistics Sweden there are several examples of questionnaires that were designed at one point in time and rarely improved later on. However, we are currently trying to shift the perspective on questionnaire design from a linear to a cyclic one. We are developing a cyclic model in which the questionnaire can be improved continuously in multiple rounds. In this presentation, we will discuss this model and how we work with it.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010988
    Description:

    Online data collection emerged in 1995 as an alternative approach for conducting certain types of consumer research studies and had grown substantially by 2008. This growth has been primarily in studies that use non-probability sampling methods. While online sampling has gained acceptance for some research applications, serious questions remain concerning online samples' suitability for research requiring precise volumetric measurement of the behavior of the U.S. population, particularly their travel behavior. This paper reviews the literature and compares results from studies using probability samples with those using online samples to understand whether the two sampling approaches produce different results. The paper also demonstrates that online samples underestimate critical types of travel even after demographic and geographic weighting.

    Release date: 2009-12-03
