# Statistics by subject – Statistical methods

• Articles and reports: 12-001-X199000214534
Description:

The common approach to small area estimation is to exploit the cross-sectional relationships of the data in an attempt to borrow information from one small area to assist in the estimation in others. However, in the case of repeated surveys, further gains in efficiency can be secured by modelling the time series properties of the data as well. We illustrate the idea by considering regression models with time varying, cross-sectionally correlated coefficients. The use of past relationships to estimate current means raises the question of how to protect against model breakdowns. We propose a modification which guarantees that the model dependent predictors of aggregates of the small area means coincide with the corresponding survey estimators and we explore the statistical properties of the modification. The proposed procedure is applied to data on home sale prices used for the computation of housing price indexes.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214529
Description:

The Canadian Labour Force Survey uses a rotating panel design. Every month, one sixth of the sample rotates out and five sixths remain. Hence, under this rotation scheme, once a panel enters the sample, it stays for six months before rotating out. Because of this design feature and the way the rotate-in panel is selected, the estimates based on the panels in the same or different months are correlated. The correlation between two panel estimates is called the panel correlation. Three kinds of panel correlations are defined in this paper: (1) the correlation (denoted by \rho) between estimates of the same characteristic based on the same panel in different months; (2) the correlation (denoted by \gamma) between estimates of the same characteristic based on geographically neighboring panels in different months; (3) the correlation (denoted by \tau) between estimates of different characteristics based on the same panel in the same or different months. This paper describes a methodology for estimating these panel correlations and presents, with some discussion, estimated correlations for selected variables using 1980-81 and 1985-87 data.

Release date: 1990-12-14
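
The rotation scheme in the Labour Force Survey entry above (one sixth of the sample replaced each month, six months in sample per panel) can be illustrated with a minimal sketch. The panel numbering and the function names `panels_in_sample` and `overlap_fraction` are assumptions made for illustration; they do not come from the paper, and the sketch only counts shared panels, it does not estimate any correlation.

```python
from fractions import Fraction

PANEL_LIFE = 6  # months a panel stays in the sample, as stated in the abstract


def panels_in_sample(month: int) -> set[int]:
    """Return the panels in sample in a given month.

    Hypothetical labelling: panel p is the one that rotates in during month p,
    so the sample in month t holds panels t-5, ..., t.
    """
    return set(range(month - PANEL_LIFE + 1, month + 1))


def overlap_fraction(month_a: int, month_b: int) -> Fraction:
    """Share of the sample common to two months under this scheme."""
    common = panels_in_sample(month_a) & panels_in_sample(month_b)
    return Fraction(len(common), PANEL_LIFE)


if __name__ == "__main__":
    for lag in range(0, 7):
        print(lag, overlap_fraction(10, 10 + lag))
```

Under these assumptions the shared portion of the sample shrinks by one sixth per month of lag and vanishes at a lag of six months, which is why estimates from nearby months are based partly on the same panels.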

• Articles and reports: 12-001-X199000214531
Description:

Benchmarking is a method of improving estimates from a sub-annual survey with the help of corresponding estimates from an annual survey. For example, estimates of monthly retail sales might be improved using estimates from the annual survey. This article first deals with the problem posed by the benchmarking of time series produced by economic surveys, and then reviews the most relevant methods for solving this problem. Next, two new statistical methods are proposed, based on a non-linear model for sub-annual data. The benchmarked estimates are then obtained by applying weighted least squares.

Release date: 1990-12-14
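
To make the idea of benchmarking in the entry above concrete, here is a minimal sketch of simple pro-rata benchmarking, in which monthly estimates are scaled so that they add up to an annual benchmark. This is deliberately the simplest textbook variant and is not the non-linear, weighted least squares method proposed in the article; the function name `prorate_to_benchmark` and the figures are hypothetical.

```python
def prorate_to_benchmark(monthly: list[float], annual_benchmark: float) -> list[float]:
    """Scale monthly estimates so their sum equals the annual benchmark.

    Pro-rata adjustment: every month is multiplied by the same factor,
    so month-to-month ratios within the year are preserved.
    """
    total = sum(monthly)
    if total == 0:
        raise ValueError("monthly estimates sum to zero; cannot prorate")
    factor = annual_benchmark / total
    return [value * factor for value in monthly]


if __name__ == "__main__":
    # Made-up monthly retail sales and a made-up annual survey total.
    monthly_sales = [95.0, 90.0, 100.0, 102.0, 105.0, 108.0,
                     110.0, 109.0, 104.0, 103.0, 107.0, 130.0]
    adjusted = prorate_to_benchmark(monthly_sales, annual_benchmark=1300.0)
    print(round(sum(adjusted), 6))
```

A known drawback of applying a separate factor to each year is that it can introduce a step between December and January, which is one reason more elaborate methods are studied.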

• Articles and reports: 12-001-X199000214537
Description:

Repeated surveys in which a portion of the units are observed at more than one time point and some units are not observed at some time points are of primary interest. Least squares estimation for such surveys is reviewed. Included in the discussion are estimation procedures in which existing estimates are not revised when new data become available. Also considered are techniques for the estimation of longitudinal parameters, such as gross change tables. Estimation for a repeated survey of land use conducted by the U.S. Soil Conservation Service is described. The effects of measurement error on gross change estimates are illustrated and it is shown that survey designs constructed to enable estimation of the parameters of the measurement error process can be very efficient.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214536
Description:

We discuss frame and sample maintenance issues that arise in recurring surveys. A new system is described that meets four objectives. Through time, it maintains (1) the geographical balance of a sample; (2) the sample size; (3) the unbiased character of estimators; and (4) the lack of distortion in estimated trends. The system is based upon the Peano key, which creates a fractal, space-filling curve. An example of the new system is presented using a national survey of establishments in the United States conducted by the A.C. Nielsen Company.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214528
Description:

Panel responses to the U.S. Consumer Expenditure Interview Survey are compared to assess the magnitude of telescoping in the unbounded first wave. Analysis of selected expense categories confirms other studies’ findings that telescoping can be considerable in unbounded interviews and tends to vary by type of expense. In addition, estimates from the first wave are found to be greater than estimates derived from subsequent waves, even after telescoping effects are deducted, and much of these effects can be attributed to the shorter recall period in the first wave of this survey.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214535
Description:

Papers by Scott and Smith (1974) and Scott, Smith, and Jones (1977) suggested the use of signal extraction results from time series analysis to improve estimates in repeated surveys, what we call the time series approach to estimation in repeated surveys. We review the underlying philosophy of this approach, pointing out that it stems from recognition of two sources of variation - time series variation and sampling variation - and that the approach can provide a unifying framework for other problems where the two sources of variation are present. We obtain some theoretical results for the time series approach regarding design consistency of the time series estimators, and uncorrelatedness of the signal and sampling error series. We observe that, from a design-based perspective, the time series approach trades some bias for a reduction in variance and a reduction in average mean squared error relative to classical survey estimators. We briefly discuss modeling to implement the time series approach, and then illustrate the approach by applying it to time series of retail sales of eating places and of drinking places from the U.S. Census Bureau’s Retail Trade Survey.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214527
Description:

The United States’ National Crime Survey is a large-scale, household survey used to provide estimates of victimizations. The National Crime Survey uses a rotating panel design under which sampled housing units are maintained in the sample for three-and-one-half years with residents of the housing units being interviewed every six months. Nonresponse is a serious problem in longitudinal data from the National Crime Survey since as few as 25% of all individuals interviewed for the survey are respondents over an entire three-and-one-half-year period. In addition, the nonresponse typically does not occur at random with respect to victimization status. This paper presents models for gross flows among two types of victimization reporting classifications: number of victimizations and seriousness of victimization. The models allow for random or nonrandom nonresponse mechanisms, and allow the probabilities underlying the gross flows to be either unconstrained or symmetric. The models are fit, using maximum likelihood estimation, to the data from the National Crime Survey.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214530
Description:

For a class of linear unbiased estimators in a class of sampling schemes, it is shown that one can forget the weights used for sample selection while estimating a population ratio by a ratio of two unbiased estimators, respectively of the numerator and the denominator defining the population ratio. This class of schemes includes commonly used sampling schemes such as unequal probability sampling with or without replacement, stratified proportional allocation sampling with unequal selection probabilities and without replacement in each stratum, etc.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214533
Description:

A commonly used model for the analysis of time series is the seasonal ARIMA model. However, the survey errors of the input data are usually ignored in the analysis. We show, through the use of state-space models with partially improper initial conditions, how to estimate the unknown parameters of this model using maximum likelihood methods. As well, the survey estimates can be smoothed using an empirical Bayes framework and model validation can be performed. We apply these techniques to an unemployment series from the Labour Force Survey.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000214532
Description:

Births by census division are studied via graphs and maps for the province of Saskatchewan for the years 1986-87. The goal of the work is to see how births are related to time and geography by obtaining contour maps that display the birth phenomenon in a smooth fashion. A principal difficulty arising is that the data are aggregate. A secondary goal is to examine the extent to which the Poisson-lognormal can replace, for data that are counts, the normal regression model for continuous variates. To this end a hierarchy of models for count-valued random variates is fit to the birth data by maximum likelihood. These models include the simple Poisson, the Poisson with year and weekday effects, and the Poisson-lognormal with year and weekday effects. The use of the Poisson-lognormal is motivated by the idea that important covariates are unavailable to include in the fitting. As the discussion indicates, the work is preliminary.

Release date: 1990-12-14

• Articles and reports: 12-001-X199000114550
Description:

Modular Test 2 was a survey conducted by Statistics Canada that used two different questionnaires. Its purpose was to assist in the making of the 1991 census questionnaire. The sample used for the survey was not a probability sample. This article briefly describes the survey methodology, and the use of randomization tests to compare the two questionnaires.

Release date: 1990-06-15
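
The entry above mentions randomization tests for comparing two questionnaires. A minimal sketch of such a test follows, assuming the comparison is a difference in mean response between the two questionnaire groups; the abstract does not specify the test statistic or how questionnaires were assigned, so the statistic, the function name `randomization_test`, and the data are all hypothetical.

```python
import random


def randomization_test(group_a, group_b, n_permutations=2000, seed=0):
    """Approximate two-sided randomization test for a difference in means.

    Group labels are reshuffled repeatedly; the p-value is the share of
    reshuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding in the means.
        if diff >= observed - 1e-12:
            extreme += 1

    # Add-one correction counts the observed arrangement itself.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up scores from respondents given questionnaire A or B.
    questionnaire_a = [3, 4, 4, 5, 3, 4]
    questionnaire_b = [2, 3, 3, 2, 4, 3]
    print(randomization_test(questionnaire_a, questionnaire_b))
```

Because the reference distribution comes from reshuffling the labels rather than from a sampling design, a test of this kind does not require a probability sample, although its conclusions then apply only to the units actually observed.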

• Articles and reports: 12-001-X199000114557
Description:

Rolling censuses combine F nonoverlapping periodic samples of 1/F each, so designed that cumulating the F periods yields a complete census of the whole population area with F / F = 1. Intermediate cumulations of k samples would yield samples of k/F for more timely uses (annual or quinquennial censuses). Area sampling frames would cover the national territory for naturally mobile populations. These methods may often be preferable to alternative methods for censuses, which are also discussed. Asymmetrical cumulations are also recommended to counter the problems of small sample cells for area domains (provinces, regions, states) common to most countries and to other population units. Split-panel designs offer another use for cumulating periodic surveys by combining nonoverlapping portions a - b - c - d - with panels p for partial overlaps, pa - pb - pc - pd -, for multipurpose designs.

Release date: 1990-06-15
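
The cumulation arithmetic in the rolling census entry above (F nonoverlapping samples of 1/F each, with k of them covering k/F of the areas) can be sketched as follows. The systematic assignment of areas to periods and the function names are assumptions for illustration only; the paper does not prescribe this particular allocation.

```python
from fractions import Fraction


def split_into_periodic_samples(area_ids: list[int], n_periods: int) -> list[list[int]]:
    """Assign areas to F nonoverlapping samples by taking every F-th area.

    Hypothetical allocation rule: sample j holds areas j, j+F, j+2F, ...
    """
    return [area_ids[start::n_periods] for start in range(n_periods)]


def cumulated_fraction(samples: list[list[int]], k: int, n_areas: int) -> Fraction:
    """Fraction of all areas covered after cumulating the first k samples."""
    covered: set[int] = set()
    for sample in samples[:k]:
        covered.update(sample)
    return Fraction(len(covered), n_areas)


if __name__ == "__main__":
    areas = list(range(100))  # made-up area identifiers
    samples = split_into_periodic_samples(areas, n_periods=10)
    for k in (1, 5, 10):
        print(k, cumulated_fraction(samples, k, len(areas)))
```

With 100 areas and F = 10, cumulating five periods covers half of the areas and cumulating all ten covers every area once, matching the k/F and F/F = 1 statements in the abstract.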

• Articles and reports: 12-001-X199000114559
Description:

The basic theme of this paper is that the development of survey methods in the technical sense can only be well understood in the context of the development of the institutions through which survey-taking is done. Thus we consider here survey methods in the large, in order to better prepare the reader for consideration of more formal methodological developments in sampling theory in the mathematical statistics sense. After a brief introduction, we give a historical overview of the evolution of institutional and contextual factors in Europe and the United States, up through the early part of the twentieth century, concentrating on governmental activities. We then focus on the emergence of institutional bases for survey research in the United States, primarily in the 1930s and 1940s. In a separate section, we take special note of the role of the U.S. Bureau of the Census in the study of non-sampling errors that was initiated in the 1940s and 1950s. Then, we look at three areas of basic change in survey methodology since 1960.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114552
Description:

The effects of utilizing a self-administered questionnaire or a personal interview procedure on the responses of an adolescent sample on their alcohol consumption and related behaviors are examined. The results are generally supportive of previous studies on the relationship between the method of data collection and the distribution of responses with sensitive or non-normative content. Although of significance in a statistical sense, many of the differences are not of sufficient magnitude to be considered significant in a substantive sense.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114558
Description:

Drawing upon experiences from developments at the U.S. Bureau of the Census, the paper briefly traces some contributions made by practitioners to the theory and application of censuses and surveys. Some guesses about future developments are also given.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114549
Description:

In many government surveys, respondents are interviewed a set number of times during the life of the survey, a practice referred to as a rotation design or repeated sampling. Often composite estimation - where data from the current and earlier periods of time are combined - is used to measure the level of a characteristic of interest. As other authors have observed, composite estimation can be used in a rotation design to decrease the variance of estimators of change in level. In this paper, simple expressions are derived for the variance of a general class of composite estimators for level, change in level, and average level over time. Considered first are “one-level” rotation designs, where only the current month is referenced in the interview. Results are developed for any sampling pattern of m interviews over a period of M months. Subsequently, “multi-level” plans are addressed. In each month one of p different groups is interviewed. Respondents then answer questions referring to the previous p months. Results from the several sections apply to a wide range of government surveys.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114553
Description:

The National Farm Survey is a sample survey which produces annual estimates on a variety of subjects related to agriculture in Canada. The 1988 survey was conducted using a new sample design. This design involved multiple sampling frames and multivariate sampling techniques different from those of the previous design. This article first describes the strategy and methods used to develop the new sample design, then gives details on factors affecting the precision of the estimates. Finally, the performance of the new design is assessed using the 1988 survey results.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114555
Description:

This paper proposes an unbiased variance estimation formula for a two-phase sampling design used in many agricultural surveys. In this design, geographically defined primary sampling units (PSUs) are first selected via stratified simple random sampling; then secondary sampling units within sampled PSUs are restratified based on their characteristics and subsampled in a second phase of stratified simple random sampling.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114556
Description:

In this paper we present some important features of the history of sample surveys in Sweden, and we comment on related developments of sampling techniques (methods and theory) in official statistics. The account is organized into three periods as follows: (i) before 1900; (ii) 1900-1950; and (iii) after 1950. The emphasis is on the third period.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114560
Description:

Early developments in sampling theory and methods largely concentrated on efficient sampling designs and associated estimation techniques for population totals or means. More recently, the theoretical foundations of survey based estimation have also been critically examined, and formal frameworks for inference on totals or means have emerged. During the past 10 years or so, rapid progress has also been made in the development of methods for the analysis of survey data that take account of the complexity of the sampling design. The scope of this paper is restricted to an overview and appraisal of some of these developments.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114554
Description:

The problem considered is that of estimation of the total of a finite population which is stratified at two levels: a deeper level which has low intrastratum variability but is not known until the first phase of sampling, and a known pre-stratification which is relatively effective, unit by unit, in predicting the deeper post-stratification. As an important example, the post-stratification may define two groups corresponding to responders and non-responders in the situation of two-phase sampling for non-response. The estimators of Vardeman and Meeden (1984) are employed in a variety of situations where different types of prior information are assumed. In a general case, the standard error relative to that of the usual methods is studied via simulation. In the situation where no prior information is available and where proportional sampling is employed, the estimator is unbiased and its variance is approximated. Here, the variance is always lower than that of the usual double sampling for stratification. Also, without prior information, but with non-proportional sampling, using a slight modification of the second phase sampling plan, an unbiased estimator is found along with its variance, an unbiased estimator of its variance, and an optimal allocation scheme for the two phases of sampling. Finally, applications of these methods are discussed.

Release date: 1990-06-15

• Articles and reports: 12-001-X199000114551
Description:

The problem of collapsing the imputation classes defined by a large number of cross-classifications of auxiliary variables is considered. A solution based on cluster analysis to reduce the number of levels of auxiliary variables to a reasonably small number of imputation classes is proposed. The motivation and solution of this general problem are illustrated by the imputation of age in the Hospital Morbidity System where auxiliary variables are sex and diagnosis.

Release date: 1990-06-15

