Statistics by subject – Statistical methods

All (27) (25 of 27 results)

  • Articles and reports: 89-553-X19980014027
    Description:

    I examine three questions in this paper: 1. Does existing knowledge about intergenerational transfers, both public and private, provide the basis for effective policy choices? What is missing? What is needed, in particular by Canada's statistical system? 2. With an aging society, rapidly shifting labour markets, and shrinking social transfers in Canada, is a new generational compact emerging? and 3. What are the roles of differing models of intergenerational transfers, indeed of the demographic concept of generation itself, in defining the field of policy options for Canadians in the late 1990s? In addressing these questions, I rely on analyses and a framework developed in McDaniel (1997).

    Release date: 1998-11-05

  • Technical products: 88F0006X1998006
    Description:

    The results of this paper, An Overview of Statistical Indicators of Regional Innovation in Canada: A Provincial Comparison, contribute to the analysis of regional differences in science and technology activity in Canada, as part of the Information System for Science and Technology Project at Statistics Canada. This working paper presents estimates of R&D expenditure and personnel for universities, for the federal government, for industry and for provincial research organizations, as well as providing general provincial statistics. The objective of the Project is to develop useful indicators of activity and a framework to tie them together into a coherent picture of science and technology in Canada. The indicators can provide the picture at the national level or at provincial or sub-provincial levels to reflect regional differences. A previously published working paper, R&D Tax Treatment in Canada: A Provincial Comparison, uses a method developed by the Conference Board of Canada to compare the tax incentives to do R&D in each of the provinces. Six out of ten provinces have their own incentive programmes and tax rates which differ from province to province. The B-Index analysis of the Conference Board provides a means of comparing tax incentives and of providing an indicator.

    Release date: 1998-10-30

  • Articles and reports: 12-001-X19980013907
    Description:

    Least squares estimation for repeated surveys is addressed. Several estimators of current level, change in level and average level for multiple time periods are developed. The Recursive Regression Estimator, a recursive computational form of the best linear unbiased estimator based on all periods of the survey, is presented. It is shown that the recursive regression procedure converges and that the dimension of the estimation problem is bounded as the number of periods increases indefinitely. The recursive procedure offers a solution to the problem of computational complexity associated with minimum variance unbiased estimation in repeated surveys. Data from the U.S. Current Population Survey are used to compare alternative estimators under two types of rotation designs: the intermittent rotation design used in the U.S. Current Population Survey, and two continuous rotation designs.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013910
    Description:

    Let A be a population domain of interest and assume that the elements of A cannot be identified on the sampling frame and the number of elements in A is not known. Further assume that a sample of fixed size (say n) is selected from the entire frame and the resulting domain sample size (say n_A) is random. The problem addressed is the construction of a confidence interval for a domain parameter such as the domain aggregate T_A = \sum_{i \in A} x_i. The usual approach to this problem is to redefine x_i, by setting x_i = 0 if i \notin A. Thus, the construction of a confidence interval for the domain total is recast as the construction of a confidence interval for a population total which can be addressed (at least asymptotically in n) by normal theory. As an alternative, we condition on n_A and construct confidence intervals which have approximately nominal coverage under certain assumptions regarding the domain population. We evaluate the new approach empirically using artificial populations and data from the Bureau of Labor Statistics (BLS) Occupational Compensation Survey.

    Release date: 1998-07-31
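
As a rough illustration of the "usual approach" summarized in the entry above (12-001-X19980013910), the following Python sketch sets x_i = 0 for sampled units outside the domain and applies a normal-theory interval to the resulting population total. It assumes simple random sampling without replacement from a frame of known size; the function name, the 1.96 multiplier and the sample values are illustrative assumptions and do not come from the paper. The conditional approach proposed by the authors is not shown.

```python
import math
from statistics import variance


def domain_total_ci(sample, population_size, z=1.96):
    """Estimate a domain total and a normal-theory confidence interval.

    sample: list of (x, in_domain) pairs drawn by simple random sampling
            without replacement (an assumption of this sketch).
    population_size: number of units N on the whole frame.
    Returns (estimate, lower, upper).
    """
    n = len(sample)
    # Redefine x_i as 0 for units that fall outside the domain A.
    zeroed = [float(x) if in_domain else 0.0 for x, in_domain in sample]
    # Expansion estimator of the total: N times the sample mean.
    estimate = population_size * sum(zeroed) / n
    # Standard error with the finite population correction (1 - n/N).
    s2 = variance(zeroed)
    se = population_size * math.sqrt((1 - n / population_size) * s2 / n)
    return estimate, estimate - z * se, estimate + z * se


# Hypothetical sample of four units from a frame of 100 units.
example = [(10, True), (20, False), (30, True), (40, False)]
estimate, lower, upper = domain_total_ci(example, population_size=100)
```

Because the domain sample size is random, the zeroed values can be highly variable when the domain is small, which is one reason the interval may be wide in practice.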

  • Articles and reports: 12-001-X19980013906
    Description:

    In sample surveys, the units contained in the sampling frame ideally have a one-to-one correspondence with the elements in the target population under study. In many cases, however, the frame has a many-to-many structure. That is, a unit in the frame may be associated with multiple target population elements and a target population element may be associated with multiple frame units. Such was the case in a building characteristics survey in which the frame was a list of street addresses, but the target population was commercial buildings. The frame was messy because a street address corresponded either to a single building, multiple buildings, or part of a building. In this paper, we develop estimators and formulas for their variances in both simple and stratified random sampling designs when the frame has a many-to-many structure.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013913
    Description:

    Temporary mobility is hypothesized to contribute toward within-household coverage error since it may affect an individual's determination of "usual residence" - a concept commonly applied when listing persons as part of a household-based survey or census. This paper explores a typology of temporary mobility patterns and how they relate to the identification of usual residence. Temporary mobility is defined by the pattern of movement away from, but usually back to, a single residence over a two- to three-month reference period. The typology is constructed using two dimensions: the variety of places visited and the frequency of visits made. Using data from the U.S. Living Situation Survey (LSS) conducted in 1993, four types of temporary mobility patterns are identified. In particular, two groups exhibiting patterns of repeat visit behavior were found to contain more of the types of people who tend to be missed during censuses and surveys. Log-linear modeling indicates spent away and demographic characteristics.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013911
    Description:

    This paper examines the main properties of the generalized regression estimator of a finite population mean and those of the regression estimator obtained from the optimal difference estimator. Given that the latter can be more efficient than the former, conditions allowing this to happen are established, and a criterion for choosing between the two types of regression estimators follows. A simulation study illustrates their finite sample performances.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013909
    Description:

    In this paper we study the model-assisted estimation of class frequencies of a discrete response variable by a new survey estimation method, which is closely related to generalized regression estimation. In generalized regression estimation the available auxiliary data are incorporated in the estimation procedure by a linear model fit. Instead of using a linear model for the class indicators, we describe the joint distribution of the class indicators by a multinomial logistic model. Logistic generalized regression estimators are introduced for class frequencies in a population and domains. Monte Carlo experiments were carried out for simulated data and for real data taken from the Labour Force Survey conducted monthly by Statistics Finland. The logistic generalized regression estimation yielded better results than the ordinary regression estimation for small domains and particularly for small class frequencies.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013905
    Description:

    Two-phase sampling designs offer a variety of possibilities for use of auxiliary information. We begin by reviewing the different forms that auxiliary information may take in two-phase surveys. We then set up the procedure by which this information is transformed into calibrated weights, which we use to construct efficient estimators of a population total. The calibration is done in two steps: (i) at the population level; (ii) at the level of the first-phase sample. We go on to show that the resulting calibration estimators are also derivable via regression fitting in two steps. We examine these estimators for a special case of interest, namely, when auxiliary information is available for population subgroups called calibration groups. Poststrata are the simplest example of such groups. Estimation for domains of interest and variance estimation are also discussed. These results are illustrated by applying them to two-phase designs at Statistics Canada. The general theory for using auxiliary information in two-phase sampling is being incorporated into Statistics Canada's Generalized Estimation System.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013904
    Description:

    Many economic and agricultural surveys are multi-purpose. It would be convenient if one could stratify the target population of such a survey for a number of different purposes and then combine the samples for enumeration. We explore four different sampling methods that select similar samples across all stratifications, thereby reducing the overall sample size. Data from an agriculture survey are used to evaluate the effectiveness of these alternative sampling strategies. We then show how a calibration (i.e., reweighted) estimator can increase statistical efficiency by capturing what is known about the original stratum sizes in the estimation. Raking, which has been suggested in the literature for this purpose, is simply one method of calibration.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013912
    Description:

    Efficient estimates of population size and totals based on information from multiple list frames and an independent area frame are considered. This work is an extension of the methodology proposed by Hartley (1962) which considers two general frames. A main disadvantage of list frames is that they are typically incomplete. In this paper, we propose several methods to address frame deficiencies. A joint list-area sampling design incorporates multiple frames and achieves full coverage of the target population. For each combination of frames, we present the appropriate notation, likelihood function, and parameter estimators. Results from a simulation study that compares the various properties of the proposed estimators are also presented.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013908
    Description:

    In the present investigation, the problem of estimating the variance of the general linear regression estimator (GREG) has been considered. It has been shown that the efficiency of the low level calibration approach adopted by Särndal (1996) is less than or equal to that of a class of estimators proposed by Deng and Wu (1987). A higher level calibration approach has also been suggested. The efficiency of the higher level calibration approach is shown to improve on the original approach. Several estimators are shown to be special cases of this proposed higher level calibration approach. An idea to find a non-negative estimate of the variance of the GREG has been suggested. Results have been extended to a stratified random sampling design. An empirical study has also been carried out to study the performance of the proposed strategies. The well known statistical package, GES, developed at Statistics Canada can further be improved to obtain better estimates of the variance of the GREG using the proposed higher level calibration approach under certain circumstances discussed in this paper.

    Release date: 1998-07-31

  • Table: 82-567-X
    Description:

    The National Population Health Survey (NPHS) is designed to enhance the understanding of the processes affecting health. The survey collects cross-sectional as well as longitudinal data. In 1994/95 the survey interviewed a panel of 17,276 individuals, then returned to interview them a second time in 1996/97. The response rate for these individuals was 96% in 1996/97. Data collection from the panel will continue for up to two decades. For cross-sectional purposes, data were collected for a total of 81,000 household residents in all provinces (except people on Indian reserves or on Canadian Forces bases) in 1996/97.

    This overview illustrates the variety of information available by presenting data on perceived health, chronic conditions, injuries, repetitive strains, depression, smoking, alcohol consumption, physical activity, consultations with medical professionals, use of medications and use of alternative medicine.

    Release date: 1998-07-29

  • Articles and reports: 87-003-X19980033853
    Description:

    The international travel data series - based on the International Travel Survey (ITS) - covers a whole range of information, such as where travellers went, what they did, how much they spent, and their satisfaction with a country's tourist attractions.

    Release date: 1998-07-13

  • Classification: 89F0077X199802B
    Description:

    The National Longitudinal Survey of Children and Youth (NLSCY) is the first Canada-wide survey of children. Starting in 1994, it will gather information on a sample of children and their life experiences. It will follow these children over time. The survey will collect information on children and their families, education, health, development, behaviour, friends, activities, etc.

    Along with 89F0077XIE issue 9802A, this document contains the various questionnaires used to gather information from parents, children, teachers and principals.

    Release date: 1998-06-04

  • Articles and reports: 91F0015M1998005
    Description:

    All countries that organize censuses have concerns about data quality and coverage error. Different methods have been developed in evaluating the quality of census data and census undercount. Some methods make use of information independent of the census itself, while some others are designed to check the internal consistency of the data. These are expensive and complicated operations.

    Given that the population in each country is organized differently and that the administrative structures differ from one country to another, no universal method can be applied. In order to compare the methods and identify their strengths and gaps, Demography Division of Statistics Canada has reviewed the procedures used in four industrialized countries: the United States, the United Kingdom, Australia and, of course, Canada. It appears from this review that demographic analysis can help considerably in the identification of inconsistencies through comparisons of consecutive censuses, while micro-level record linkage and survey based procedures are essential in order to estimate the number of people omitted or counted twice in census collection. The most important conclusion from this review is that demographers and statisticians have to work together in order to evaluate the figures the accuracy of which will always remain questionable.

    Release date: 1998-03-27

  • Articles and reports: 12-001-X19970023618
    Description:

    Statistical agencies often constitute their business panels by Poisson sampling, or by stratified sampling of fixed size and uniform probabilities in each stratum. This sampling corresponds to algorithms which use permanent numbers following a uniform distribution. Since the characteristics of the units change over time, it is necessary to periodically conduct resamplings while endeavouring to conserve the maximum number of units. The solution by Poisson sampling is the simplest and provides the maximum theoretical coverage, but with the disadvantage of a random sample size. On the other hand, in the case of stratified sampling of fixed size, the changes in strata cause difficulties precisely because of these fixed size constraints. An initial difficulty is that the finer the stratification, the more the coverage is decreased. Indeed, this is likely to occur if births constitute separate strata. We show how this effect can be corrected by rendering the numbers equidistant before resampling. The disadvantage, a fairly minor one, is that in each stratum the sampling is no longer a simple random sampling, which makes the estimation of the variance less rigorous. Another difficulty is reconciling the resampling with an eventual rotation of the units in the sample. We present a type of algorithm which extends after resampling the rotation before resampling. It is based on transformations of the random numbers used for the sampling, so as to return to resampling without rotation. These transformations are particularly simple when they involve equidistant numbers, but can also be carried out with the numbers following a uniform distribution.

    Release date: 1998-03-12
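
The idea of Poisson sampling with permanent uniform numbers, as summarized in the entry above (12-001-X19970023618), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each unit keeps one uniform number for life and is selected whenever that number falls below its current inclusion probability. The function names, the seed and the probabilities are hypothetical, and the equidistant-number and rotation techniques from the paper are not shown.

```python
import random


def assign_permanent_numbers(unit_ids, seed=0):
    """Give every unit a permanent uniform(0, 1) number, drawn once."""
    rng = random.Random(seed)
    return {unit: rng.random() for unit in unit_ids}


def poisson_sample(permanent, inclusion_probs):
    """Select each unit independently: keep it if its permanent number
    is below its current inclusion probability."""
    return {unit for unit, number in permanent.items()
            if number < inclusion_probs[unit]}


# Hypothetical frame of 1,000 units.
units = range(1000)
permanent = assign_permanent_numbers(units)

# First panel with inclusion probability 0.1, then a resampling at 0.2.
first = poisson_sample(permanent, {u: 0.1 for u in units})
second = poisson_sample(permanent, {u: 0.2 for u in units})
```

Because the numbers are permanent, every unit in the first panel is retained when the probabilities increase, which illustrates the coverage property mentioned in the abstract. The sample sizes `len(first)` and `len(second)` are random, which is the disadvantage the abstract notes.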

  • Articles and reports: 12-001-X19970023614
    Description:

    In 1993, Statistics Canada implemented Computer-assisted Interviewing (CAI) for conducting interviews for some household surveys that were conducted in a decentralised environment. The technology has been successfully used for a number of years, and most household surveys have now been converted to this collection mode. This paper is a summary of the experience and the lessons that have been learned since the research started. It describes some of the tests that led to the implementation of the technology, and some of the new opportunities that have arisen with its implementation. It also discusses some challenges that were faced when CAI was implemented (some are on-going issues), and ends with a brief overview of where this may lead us in the future.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023613
    Description:

    Many policy decisions are best made when there is supporting statistical evidence based on analyses of appropriate microdata. Sometimes all the needed data exist but reside in multiple files for which common identifiers (e.g., SINs, EINs, or SSNs) are unavailable. This paper demonstrates a methodology for analyzing two such files: (1) when there is common nonunique information subject to significant error and (2) when each source file contains uncommon quantitative data that can be connected with appropriate models. Such a situation might arise with files of businesses only having difficult-to-use name and address information in common, one file with the energy products consumed by the companies, and the other file containing the types and amounts of goods they produce. Another situation might arise with files on individuals in which one file has earnings data, another information about health-related expenses, and a third information about receipts of supplemental payments. The goal of the methodology presented is to produce valid statistical analyses; appropriate microdata files may or may not be produced.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023619
    Description:

    The presence of outliers in survey data is a recurring problem in applied statistics, and the INSEE survey on industrial investment is not immune from this. The forecasting of the rate of growth of capital investment expenditures in industry therefore comes down to robust estimation of a total in a finite population. The first part of this article analyses the estimator currently used in the Investment Survey. We show that it follows a strategy of reweighting the linear estimator. But the strict dichotomy imposed between outliers - all assumed to be nonrepresentative - and other points is not fully satisfactory from either a theoretical or a practical standpoint. These flaws can be overcome by adopting a model-based approach and estimating by GM-estimators, applied to the case of a finite population. We then construct a robust adaptive procedure that determines the appropriate estimator on the basis of the residuals observed in the sample in cases where the residuals may be assumed to be symmetrical. Lastly, this method is applied to the data from the Investment Survey for the period 1990-1995.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023621
    Description:

    The jackknife variance estimator has been shown to have desirable properties when used with smooth estimators based on stratified multi-stage samples. This paper focuses on the use of the jackknife given a particular two-phase sampling design: a stratified with-replacement probability cluster sample is drawn, elements from sampled clusters are then restratified, and simple random subsamples are selected within each second-phase stratum. It turns out that the jackknife can behave reasonably well as an estimator for the variance for one common "expansion" estimator but not for another. Extensions to more complex estimation strategies are then discussed. A Monte Carlo study supports our principal findings.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023615
    Description:

    This paper demonstrates the utility of a multi-stage survey design that obtains a total count of health facilities and of the potential client population in an area. The design has been used for a state-level survey conducted in mid-1995 in Uttar Pradesh, India. The design involves a multi-stage, areal cluster sample, wherein the primary sampling unit is either an urban block or rural village. All health service delivery points, either self-standing facilities or distribution agents, in or formally assigned to the primary sampling unit are mapped, listed, and selected. A systematic sample of households is selected, and all resident females meeting predetermined eligibility criteria are interviewed. Sample weights for facilities and individuals are applied. For facilities, the weights are adjusted for survey response levels. The survey estimate of the total number of government facilities compares well against the total published counts. Similarly the female client population estimated in the survey compares well with the total enumerated in the 1991 census.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023617
    Description:

    Much research has been conducted into the modelling of ordinal responses. Some authors argue that, when the response variable is ordinal, inclusion of ordinality in the model to be estimated should improve model performance. Under the condition of ordinality, Campbell and Donner (1989) compared the asymptotic classification error rate of the multinomial logistic model to that of the ordinal logistic model of Anderson (1984). They showed that the ordinal logistic model had a lower expected asymptotic error rate than the multinomial logistic model. This paper also aims to compare the performance of ordinal and multinomial logistic models for ordinal responses. However, rather than focussing on classification efficiency, the assessment is made in the context of an application where the objective is to estimate small area proportions. More specifically, using multinomial and ordinal logistic models, the empirical Bayes approach proposed by Farrell, MacGibbon and Tomberlin (1997a) for estimating small area proportions based on binomial outcome data is extended to response variables consisting of more than two outcome categories. The properties of estimators based on these two models are compared via a simulation study in which the empirical Bayes methods proposed here are applied to data from the 1950 United States Census with the objective of predicting, for a small area, the proportion of individuals who belong to the various categories of an ordinal response variable representing income level.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023616
    Description:

    A standard method for correcting for unequal sampling probabilities and nonresponse in sample surveys is poststratification: that is, dividing the population into several categories, estimating the distribution of responses in each category, and then counting each category in proportion to its size in the population. We consider poststratification as a general framework that includes many weighting schemes used in survey analysis (see Little 1993). We construct a hierarchical logistic regression model for the mean of a binary response variable conditional on poststratification cells. The hierarchical model allows us to fit many more cells than is possible using classical methods, and thus to include much more population-level information, while at the same time including all the information used in standard survey sampling inferences. We are thus combining the modeling approach often used in small-area estimation with the population information used in poststratification. We apply the method to a set of U.S. pre-election polls, poststratified by state as well as the usual demographic variables. We evaluate the models graphically by comparing to state-level election outcomes.

    Release date: 1998-03-12
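
The three steps of classical poststratification described in the entry above (12-001-X19970023616), namely dividing the population into categories, estimating the response in each category, and counting each category in proportion to its population size, can be sketched in Python as follows. This is only the classical estimator, not the hierarchical logistic regression model of the paper; the function name, cell labels and counts are hypothetical.

```python
from collections import defaultdict


def poststratified_mean(responses, cell_sizes):
    """Classical poststratified estimate of a population mean.

    responses: iterable of (cell, y) pairs from the sample.
    cell_sizes: dict mapping each cell to its known population count.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, y in responses:
        sums[cell] += y
        counts[cell] += 1

    # The classical estimator needs at least one respondent per cell.
    empty = [cell for cell in cell_sizes if counts[cell] == 0]
    if empty:
        raise ValueError(f"no respondents in cells: {empty}")

    total_population = sum(cell_sizes.values())
    # Weight each cell mean by the cell's share of the population.
    return sum(cell_sizes[cell] * sums[cell] / counts[cell]
               for cell in cell_sizes) / total_population


# Hypothetical binary responses in two cells of unequal population size.
sample = [("a", 1), ("a", 0), ("b", 1)]
sizes = {"a": 100, "b": 300}
estimate = poststratified_mean(sample, sizes)
```

The sketch raises an error when a cell has no respondents, which shows why the number of cells is limited under the classical method.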

  • Articles and reports: 12-001-X19970023620
    Description:

    Since France has no population registers, population censuses are the basis for its socio-demographic information system. However, between two censuses, some data must be updated, in particular at a high level of geographic detail, especially since censuses are tending, for various reasons, to be less frequent. In 1993, the Institut National de la Statistique et des Études Économiques (INSEE) set up a team whose objective was to propose a system to substantially improve the existing mechanism for making small area population estimates. Its task was twofold: to prepare an efficient and robust synthesis of the information available from different administrative sources, and to assemble a sufficient number of "good" sources. The "multi-source" system that it designed, which is reported on here, is flexible and reliable, without being overly complex.

    Release date: 1998-03-12

Analysis (22)

Analysis (22) (22 of 22 results)

  • Articles and reports: 89-553-X19980014027
    Description:

    I examine three questions in this paper : 1. Does existing knowledge about intergenerational transfers, both public and private, provide the basis for effective policy choices? What is missing? What is needed, in particular by Canada's statistical system? 2. With an aging society, rapidly shifting labour markets, and shrinking social transfers in Canada, is a new generational compact emerging? and 3. What are the roles of differing models of inter-generational transfers, indeed of the demo-graphic concept of generation itself, in defining the field of policy options for Canadians in the late 1990s? In addressing these questions, I rely on analyses and a framework developed in McDaniel (1997).

    Release date: 1998-11-05

  • Articles and reports: 12-001-X19980013907
    Description:

    Least squares estimation for repeated surveys is addressed. Several estimators of current level, change in level and average level for multiple time periods are developed. The Recursive Regression Estimator, a recursive computational form of the best linear unbiased estimator based on all periods of the survey, is presented. It is shown that the recursive regression procedure converges; and that the dimension of the estimation problem is bounded as the number of periods increases indefinitely. The recursive procedure offers a solution to the problem of computational complexity associated with minimum variance unbiased estimation in repeated surveys. Data from the U.S. Current Population Survey are used to compare alternative estimators under two types of rotation designs: the intermittent rotation design used in the U.S. Current Population Survey, and two continuous rotation designs.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013910
    Description:

    Let A be a population domain of interest and assume that the elements of A cannot be identified on the sampling frame and the number of elements in A is not known. Further assume that a sample of fixed size (say n) is selected from the entire frame and the resulting domain sample size (say n_A) is random. The problem addressed is the construction of a confidence interval for a domain parameter such as the domain aggregate T_A = \sum_{i \in A} x_i. The usual approach to this problem is to redefine x_i, by setting x_i = 0 if i \notin A. Thus, the construction of a confidence interval for the domain total is recast as the construction of a confidence interval for a population total which can be addressed (at least asymptotically in n) by normal theory. As an alternative, we condition on n_A and construct confidence intervals which have approximately nominal coverage under certain assumptions regarding the domain population. We evaluate the new approach empirically using artificial populations and data from the Bureau of Labor Statistics (BLS) Occupational Compensation Survey.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013906
    Description:

    In sample surveys, the units contained in the sampling frame ideally have a one-to-one correspondence with the elements in the target population under study. In many cases, however, the frame has a many-to-many structure. That is, a unit in the frame may be associated with multiple target population elements and a target population element may be associated with multiple frame units. Such was the case in a building characteristics survey in which the frame was a list of street addresses, but the target population was commercial buildings. The frame was messy because a street address corresponded either to a single building, multiple buildings, or part of a building. In this paper, we develop estimators and formulas for their variances in both simple and stratified random sampling designs when the frame has a many-to-many structure.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013913
    Description:

    Temporary mobility is hypothesized to contribute toward within-household coverage error since it may affect an individual's determination of "usual residence" - a concept commonly applied when listing persons as part of a household-based survey or census. This paper explores a typology of temporary mobility patterns and how they relate to the identification of usual residence. Temporary mobility is defined by the pattern of movement away from, but usually back to, a single residence over a two- to three-month reference period. The typology is constructed using two dimensions: the variety of places visited and the frequency of visits made. Using data from the U.S. Living Situation Survey (LSS) conducted in 1993, four types of temporary mobility patterns are identified. In particular, two groups exhibiting patterns of repeat visit behavior were found to contain more of the types of people who tend to be missed during censuses and surveys. Log-linear modeling indicates spent away and demographic characteristics.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013911
    Description:

    This paper examines the main properties of the generalized regression estimator of a finite population mean and those of the regression estimator obtained from the optimal difference estimator. Given that the latter can be more efficient than the former, conditions allowing this to happen are established, and a criterion for choosing between the two types of regression estimators follows. A simulation study illustrates their finite sample performances.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013909
    Description:

    In this paper we study the model-assisted estimation of class frequencies of a discrete response variable by a new survey estimation method, which is closely related to generalized regression estimation. In generalized regression estimation the available auxiliary data are incorporated in the estimation procedure by a linear model fit. Instead of using a linear model for the class indicators, we describe the joint distribution of the class indicators by a multinomial logistic model. Logistic generalized regression estimators are introduced for class frequencies in a population and domains. Monte Carlo experiments were carried out for simulated data and for real data taken from the Labour Force Survey conducted monthly by Statistics Finland. The logistic generalized regression estimation yielded better results than the ordinary regression estimation for small domains and particularly for small class frequencies.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013905
    Description:

    Two-phase sampling designs offer a variety of possibilities for use of auxiliary information. We begin by reviewing the different forms that auxiliary information may take in two-phase surveys. We then set up the procedure by which this information is transformed into calibrated weights, which we use to construct efficient estimators of a population total. The calibration is done in two steps: (i) at the population level; (ii) at the level of the first-phase sample. We go on to show that the resulting calibration estimators are also derivable via regression fitting in two steps. We examine these estimators for a special case of interest, namely, when auxiliary information is available for population subgroups called calibration groups. Poststrata are the simplest example of such groups. Estimation for domains of interest and variance estimation are also discussed. These results are illustrated by applying them to two-phase designs at Statistics Canada. The general theory for using auxiliary information in two-phase sampling is being incorporated into Statistics Canada's Generalized Estimation System.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013904
    Description:

    Many economic and agricultural surveys are multi-purpose. It would be convenient if one could stratify the target population of such a survey for a number of different purposes and then combine the samples for enumeration. We explore four different sampling methods that select similar samples across all stratifications, thereby reducing the overall sample size. Data from an agriculture survey are used to evaluate the effectiveness of these alternative sampling strategies. We then show how a calibration (i.e., reweighted) estimator can increase statistical efficiency by capturing what is known about the original stratum sizes in the estimation. Raking, which has been suggested in the literature for this purpose, is simply one method of calibration.

    Release date: 1998-07-31
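
Raking, mentioned in the abstract above as one method of calibration, can be illustrated with a minimal sketch. It assumes two stratifications (called "rows" and "columns" here) with known population sizes, and it alternately rescales the weights until both sets of weighted counts match. The function `rake` and its arguments are hypothetical names; this is not the estimator evaluated in the paper.

```python
def rake(weights, row_of, col_of, row_totals, col_totals, iterations=50):
    """Rake unit weights to two sets of known stratum sizes.

    weights:    initial weight of each sampled unit
    row_of:     stratum label of each unit under the first stratification
    col_of:     stratum label of each unit under the second stratification
    row_totals: known population size of each first-stratification stratum
    col_totals: known population size of each second-stratification stratum
    """
    w = list(weights)
    for _ in range(iterations):
        # Alternate between the two stratifications.
        for labels, targets in ((row_of, row_totals), (col_of, col_totals)):
            sums = {}
            for wi, lab in zip(w, labels):
                sums[lab] = sums.get(lab, 0.0) + wi
            # Scale each weight so the stratum's weighted count hits its target.
            w = [wi * targets[lab] / sums[lab] for wi, lab in zip(w, labels)]
    return w
```

A fixed number of iterations is used for brevity; a convergence check on the change in weights would be a natural refinement.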

  • Articles and reports: 12-001-X19980013912
    Description:

    Efficient estimates of population size and totals based on information from multiple list frames and an independent area frame are considered. This work is an extension of the methodology proposed by Hartley (1962) which considers two general frames. A main disadvantage of list frames is that they are typically incomplete. In this paper, we propose several methods to address frame deficiencies. A joint list-area sampling design incorporates multiple frames and achieves full coverage of the target population. For each combination of frames, we present the appropriate notation, likelihood function, and parameter estimators. Results from a simulation study that compares the various properties of the proposed estimators are also presented.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19980013908
    Description:

    This investigation considers the problem of estimating the variance of the general linear regression estimator (GREG). It is shown that the efficiency of the low level calibration approach adopted by Särndal (1996) is less than or equal to that of a class of estimators proposed by Deng and Wu (1987). A higher level calibration approach is also suggested and is shown to be more efficient than the original approach. Several estimators are shown to be special cases of the proposed higher level calibration approach. An idea for finding a non-negative estimate of the variance of the GREG is suggested. Results are extended to a stratified random sampling design. An empirical study has also been carried out to examine the performance of the proposed strategies. The well known statistical package, GES, developed at Statistics Canada can be further improved to obtain better estimates of the variance of the GREG using the proposed higher level calibration approach under certain circumstances discussed in this paper.

    Release date: 1998-07-31

  • Articles and reports: 87-003-X19980033853
    Description:

    The international travel data series - based on the International Travel Survey (ITS) - covers a whole range of information, such as where travellers went, what they did, how much they spent, and their satisfaction with a country's tourist attractions.

    Release date: 1998-07-13

  • Articles and reports: 91F0015M1998005
    Description:

    All countries that organize censuses have concerns about data quality and coverage error. Different methods have been developed for evaluating the quality of census data and census undercount. Some methods make use of information independent of the census itself, while others are designed to check the internal consistency of the data. These are expensive and complicated operations.

    Given that the population in each country is organized differently and that the administrative structures differ from one country to another, no universal method can be applied. In order to compare the methods and identify their strengths and gaps, Demography Division of Statistics Canada has reviewed the procedures used in four industrialized countries: the United States, the United Kingdom, Australia and, of course, Canada. It appears from this review that demographic analysis can help considerably in the identification of inconsistencies through comparisons of consecutive censuses, while micro-level record linkage and survey based procedures are essential in order to estimate the number of people omitted or counted twice in census collection. The most important conclusion from this review is that demographers and statisticians have to work together in order to evaluate figures whose accuracy will always remain questionable.

    Release date: 1998-03-27

  • Articles and reports: 12-001-X19970023618
    Description:

    Statistical agencies often constitute their business panels by Poisson sampling, or by stratified sampling of fixed size and uniform probabilities in each stratum. This sampling corresponds to algorithms which use permanent numbers following a uniform distribution. Since the characteristics of the units change over time, it is necessary to periodically conduct resamplings while endeavouring to conserve the maximum number of units. The solution by Poisson sampling is the simplest and provides the maximum theoretical coverage, but with the disadvantage of a random sample size. On the other hand, in the case of stratified sampling of fixed size, the changes in strata cause difficulties precisely because of these fixed size constraints. An initial difficulty is that the finer the stratification, the more the coverage is decreased. Indeed, this is likely to occur if births constitute separate strata. We show how this effect can be corrected by rendering the numbers equidistant before resampling. The disadvantage, a fairly minor one, is that in each stratum the sampling is no longer a simple random sampling, which makes the estimation of the variance less rigorous. Another difficulty is reconciling the resampling with an eventual rotation of the units in the sample. We present a type of algorithm which extends after resampling the rotation before resampling. It is based on transformations of the random numbers used for the sampling, so as to return to resampling without rotation. These transformations are particularly simple when they involve equidistant numbers, but can also be carried out with the numbers following a uniform distribution.

    Release date: 1998-03-12
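
The role of permanent numbers in Poisson sampling, as described in the abstract above, can be illustrated with a minimal sketch. It assumes each unit keeps one uniform random number for life and is selected whenever that number falls below its current inclusion probability. The function names and the `seed` parameter are hypothetical; the equidistant-number correction and the rotation algorithm from the paper are not reproduced.

```python
import random

def assign_permanent_numbers(unit_ids, seed=0):
    """Give every unit a permanent uniform(0, 1) number, drawn once."""
    rng = random.Random(seed)
    return {uid: rng.random() for uid in unit_ids}

def poisson_sample(permanent_numbers, inclusion_prob):
    """Select each unit independently: keep it if its number is below its probability."""
    return {uid for uid, u in permanent_numbers.items() if u < inclusion_prob[uid]}
```

Because the numbers are reused at each resampling, a unit whose inclusion probability does not decrease stays in the sample, which is how the overlap between successive samples is preserved. The realised sample size is random, as the abstract notes.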

  • Articles and reports: 12-001-X19970023614
    Description:

    In 1993, Statistics Canada implemented Computer-assisted Interviewing (CAI) for conducting interviews for some household surveys that were conducted in a decentralised environment. The technology has been successfully used for a number of years, and most household surveys have now been converted to this collection mode. This paper is a summary of the experience and the lessons that have been learned since the research started. It describes some of the tests that led to the implementation of the technology, and some of the new opportunities that have arisen with its implementation. It also discusses some challenges that were faced when CAI was implemented (some are on-going issues), and ends with a brief overview of where this may lead us in the future.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023613
    Description:

    Many policy decisions are best made when there is supporting statistical evidence based on analyses of appropriate microdata. Sometimes all the needed data exist but reside in multiple files for which common identifiers (e.g., SIN's, EIN's, or SSN's) are unavailable. This paper demonstrates a methodology for analyzing two such files: (1) when there is common nonunique information subject to significant error and (2) when each source file contains uncommon quantitative data that can be connected with appropriate models. Such a situation might arise with files of businesses only having difficult-to-use name and address information in common, one file with the energy products consumed by the companies, and the other file containing the types and amounts of goods they produce. Another situation might arise with files on individuals in which one file has earnings data, another information about health-related expenses, and a third information about receipts of supplemental payments. The goal of the methodology presented is to produce valid statistical analyses; appropriate microdata files may or may not be produced.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023619
    Description:

    The presence of outliers in survey data is a recurring problem in applied statistics, and the INSEE survey on industrial investment is not immune from this. The forecasting of the rate of growth of capital investment expenditures in industry therefore comes down to robust estimation of a total in a finite population. The first part of this article analyses the estimator currently used in the Investment Survey. We show that it follows a strategy of reweighting the linear estimator. But the strict dichotomy imposed between outliers - all assumed to be nonrepresentative - and other points is not fully satisfactory from either a theoretical or a practical standpoint. These flaws can be overcome by adopting a model-based approach and estimating by GM-estimators, applied to the case of a finite population. We then construct a robust adaptive procedure that determines the appropriate estimator on the basis of the residuals observed in the sample in cases where the residuals may be assumed to be symmetrical. Lastly, this method is applied to the data from the Investment Survey for the period 1990-1995.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023621
    Description:

    The jackknife variance estimator has been shown to have desirable properties when used with smooth estimators based on stratified multi-stage samples. This paper focuses on the use of the jackknife given a particular two-phase sampling design: a stratified with-replacement probability cluster sample is drawn, elements from sampled clusters are then restratified, and simple random subsamples are selected within each second-phase stratum. It turns out that the jackknife can behave reasonably well as an estimator for the variance for one common "expansion" estimator but not for another. Extensions to more complex estimation strategies are then discussed. A Monte Carlo study supports our principal findings.

    Release date: 1998-03-12
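
For orientation, the standard delete-one-cluster jackknife for a stratified cluster sample can be sketched as below. This covers only a single-phase design and is not the two-phase procedure studied in the paper; the data layout (a dict mapping each stratum to a list of `(weight, cluster_value)` pairs) and the function names are assumptions made for the illustration.

```python
def weighted_total(strata):
    """Expansion estimator: sum of weight times cluster value over all clusters."""
    return sum(w * y for clusters in strata.values() for w, y in clusters)

def jackknife_variance(strata, estimator):
    """Delete-one-cluster stratified jackknife variance of estimator(strata)."""
    theta = estimator(strata)
    variance = 0.0
    for h, clusters in strata.items():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled clusters")
        scale = n_h / (n_h - 1)
        for i in range(n_h):
            # Drop cluster i and reweight the remaining clusters in stratum h.
            replicate = dict(strata)
            replicate[h] = [(w * scale, y) for j, (w, y) in enumerate(clusters) if j != i]
            variance += (n_h - 1) / n_h * (estimator(replicate) - theta) ** 2
    return theta, variance
```

Passing the estimator as a function lets the same replication loop be reused for nonlinear (smooth) estimators such as ratios.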

  • Articles and reports: 12-001-X19970023615
    Description:

    This paper demonstrates the utility of a multi-stage survey design that obtains a total count of health facilities and of the potential client population in an area. The design has been used for a state-level survey conducted in mid-1995 in Uttar Pradesh, India. The design involves a multi-stage, areal cluster sample, wherein the primary sampling unit is either an urban block or rural village. All health service delivery points, either self-standing facilities or distribution agents, in or formally assigned to the primary sampling unit are mapped, listed, and selected. A systematic sample of households is selected, and all resident females meeting predetermined eligibility criteria are interviewed. Sample weights for facilities and individuals are applied. For facilities, the weights are adjusted for survey response levels. The survey estimate of the total number of government facilities compares well against the total published counts. Similarly the female client population estimated in the survey compares well with the total enumerated in the 1991 census.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023617
    Description:

    Much research has been conducted into the modelling of ordinal responses. Some authors argue that, when the response variable is ordinal, inclusion of ordinality in the model to be estimated should improve model performance. Under the condition of ordinality, Campbell and Donner (1989) compared the asymptotic classification error rate of the multinomial logistic model to that of the ordinal logistic model of Anderson (1984). They showed that the ordinal logistic model had a lower expected asymptotic error rate than the multinomial logistic model. This paper also aims to compare the performance of ordinal and multinomial logistic models for ordinal responses. However, rather than focussing on classification efficiency, the assessment is made in the context of an application where the objective is to estimate small area proportions. More specifically, using multinomial and ordinal logistic models, the empirical Bayes approach proposed by Farrell, MacGibbon and Tomberlin (1997a) for estimating small area proportions based on binomial outcome data is extended to response variables consisting of more than two outcome categories. The properties of estimators based on these two models are compared via a simulation study in which the empirical Bayes methods proposed here are applied to data from the 1950 United States Census with the objective of predicting, for a small area, the proportion of individuals who belong to the various categories of an ordinal response variable representing income level.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X19970023616
    Description:

    A standard method for correcting for unequal sampling probabilities and nonresponse in sample surveys is poststratification: that is, dividing the population into several categories, estimating the distribution of responses in each category, and then counting each category in proportion to its size in the population. We consider poststratification as a general framework that includes many weighting schemes used in survey analysis (see Little 1993). We construct a hierarchical logistic regression model for the mean of a binary response variable conditional on poststratification cells. The hierarchical model allows us to fit many more cells than is possible using classical methods, and thus to include much more population-level information, while at the same time including all the information used in standard survey sampling inferences. We are thus combining the modeling approach often used in small-area estimation with the population information used in poststratification. We apply the method to a set of U.S. pre-election polls, poststratified by state as well as the usual demographic variables. We evaluate the models graphically by comparing to state-level election outcomes.

    Release date: 1998-03-12
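
The classical poststratification step defined at the start of the abstract above can be sketched as follows. This minimal illustration uses raw sample means within each cell; the paper instead estimates the cell means with a hierarchical logistic regression, which is not reproduced here. The function name and arguments are hypothetical.

```python
def poststratified_mean(responses, cells, population_counts):
    """Weight each cell's sample mean by the cell's share of the population.

    responses:         observed values (e.g. 0/1 for a binary response)
    cells:             poststratification cell label of each respondent
    population_counts: known population size of each cell
    """
    sums, counts = {}, {}
    for y, c in zip(responses, cells):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    # Only cells with at least one respondent can contribute a sample mean.
    observed_population = sum(population_counts[c] for c in counts)
    weighted = sum(population_counts[c] * sums[c] / counts[c] for c in counts)
    return weighted / observed_population
```

Cells with no respondents are dropped in this sketch, which is one limitation of the classical method that motivates modelling the cell means.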

  • Articles and reports: 12-001-X19970023620
    Description:

    Since France has no population registers, population censuses are the basis for its socio-demographic information system. However, between two censuses, some data must be updated, in particular at a high level of geographic detail, especially since censuses are tending, for various reasons, to be less frequent. In 1993, the Institut National de la Statistique et des Études Économiques (INSEE) set up a team whose objective was to propose a system to substantially improve the existing mechanism for making small area population estimates. Its task was twofold: to prepare an efficient and robust synthesis of the information available from different administrative sources, and to assemble a sufficient number of "good" sources. The "multi-source" system that it designed, which is reported on here, is flexible and reliable, without being overly complex.

    Release date: 1998-03-12

Reference (4) (4 of 4 results)

  • Technical products: 88F0006X1998006
    Description:

    The results of this paper, An Overview of Statistical Indicators of Regional Innovation in Canada: A Provincial Comparison, contribute to the analysis of regional differences in science and technology activity in Canada, as part of the Information System for Science and Technology Project at Statistics Canada. This working paper presents estimates of R&D expenditure and personnel for universities, for the federal government, for industry and for provincial research organizations, as well as providing general provincial statistics. The objective of the Project is to develop useful indicators of activity and a framework to tie them together into a coherent picture of science and technology in Canada. The indicators can provide the picture at the national level or at provincial or sub-provincial levels to reflect regional differences. A previously published working paper, R&D Tax Treatment in Canada: A Provincial Comparison, uses a method developed by the Conference Board of Canada to compare the tax incentives to do R&D in each of the provinces. Six out of ten provinces have their own incentive programmes and tax rates which differ from province to province. The B-Index analysis of the Conference Board provides a means of comparing tax incentives and of providing an indicator.

    Release date: 1998-10-30

  • Classification: 89F0077X199802B
    Description:

    The National Longitudinal Survey of Children and Youth (NLSCY) is the first Canada-wide survey of children. Starting in 1994, it will gather information on a sample of children and their life experiences. It will follow these children over time. The survey will collect information on children and their families, education, health, development, behaviour, friends, activities, etc.

    Along with 89F0077XIE issue 9802A, this document contains the various questionnaires used to gather information from parents, children, teachers and principals.

    Release date: 1998-06-04

  • Classification: 89F0077X1996001
    Description:

    The National Longitudinal Survey of Children (NLSC) is the first Canada-wide survey of children. Starting in 1994, it will gather information on a sample of children and their life experiences. It will follow these children over time, collecting information on the children and their families, education, health, development, behaviour, friends, activities, etc.

    Release date: 1998-02-27

  • Classification: 89F0077X199802A
    Description:

    The National Longitudinal Survey of Children and Youth (NLSCY) is the first Canada-wide survey of children. Starting in 1994, it will gather information on a sample of children and their life experiences. It will follow these children over time. The survey will collect information on children and their families, education, health, development, behaviour, friends, activities, etc.

    Along with 89F0077XPE issue 9802b, this document contains the various questionnaires used to gather information from parents, children, teachers and principals.

    Release date: 1998-02-27
