Statistics by subject – Statistical methods

All (24 results)

  • Articles and reports: 91F0015M1997004
    Description:

    The estimation of the population by age, sex and marital status for each province is a difficult task, principally because of migration. The characteristics of migrants are available only from responses to the census. Until 1991, the census included only the question on place of residence five years ago. Thus, a person who had a different residence five years earlier was considered as a migrant and was attributed the characteristics reported for him/her at the time of the census. However, the respondent had up to five years to change characteristics, particularly those relating to marital status.

    Since 1991, the census has asked a question on the place of residence one year ago. The same procedure attributes to the migrant the characteristics reported one year earlier, but this time there is only one year to change them. The article describes, in some detail, the methods now used by Statistics Canada to estimate the characteristics of migrants and evaluates the advantages of using the data on place of residence one year ago.

    Release date: 1997-12-23

  • Table: 62-010-X19970023422
    Description:

    The current official time base of the Consumer Price Index (CPI) is 1986=100. This time base was first used when the CPI for June 1990 was released. Statistics Canada is about to convert all price index series to the time base 1992=100. As a result, all constant dollar series will be converted to 1992 dollars. The CPI will shift to the new time base when the CPI for January 1998 is released on February 27th, 1998.

    Release date: 1997-11-17
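
    A sketch of the conversion arithmetic described above: rebasing simply rescales every index value by a constant so that the 1992 value becomes 100, leaving all relative movements unchanged. The figures below are invented for illustration and are not actual CPI values.

        # Rebase an index series from 1986=100 to 1992=100.
        # The index values below are illustrative, not actual CPI figures.
        old_base = {1986: 100.0, 1990: 119.5, 1992: 128.1, 1997: 138.6}

        factor = 100.0 / old_base[1992]                  # rescaling constant
        new_base = {year: round(value * factor, 1) for year, value in old_base.items()}

        print(new_base)   # the 1992 entry becomes 100.0; period-to-period ratios are preserved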

  • Index and guides: 92-125-G
    Description:

    This consultation guide marks the beginning of the content consultation and testing process for the 2001 Census. A broad range of data users, including those at every level of government, national associations, non-governmental organizations, community groups, businesses and the private sector, universities and the general public, will be asked to provide their comments on the questions asked, requirements for future census information, and the identification of data gaps.

    Release date: 1997-10-31

  • Articles and reports: 91F0015M1997003
    Description:

    For historical reasons, the best known and most often used life tables are period tables. They are built from death rates by age for a short period of observation (often a single year) and are intended to represent the state of mortality for that period. The survivors and deaths appearing in their columns are in a sense abstractions rather than reality. It is thus erroneous to believe that the life table for a given year (for example, 1995) can in any way predict the rate at which those born that year will die or, hence, the average length of the lives they have just begun. With rare exceptions, the average number of years actually lived by individuals has always been longer than the life expectancy found in the life table constructed for the year of their birth. This is because period tables are built from the risks of death by age prevailing in that year, while the ceaseless battle against death reduces those risks year after year and, as they grow older, people benefit from these successive gains.

    To reconstitute (or foresee) the rate at which the members of a cohort have (or will) really pass away, it is necessary to deploy very long series of death rates by age and to possess reliable indicators of missing data, and then to adjust them to establish the actual experience of the persons in a cohort. Built in exactly the same way as period tables, these tables are naturally called cohort tables, but comparing observations of their parameters yields conclusions of a different kind.

    Release date: 1997-10-01
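
    To make the period-table construction referred to above concrete, the toy calculation below turns a set of age-specific death rates observed in a single year into survivorship figures and a life expectancy at birth. The rates are invented and the closure of the table at age 100 is deliberately crude; this is a sketch of the mechanics, not an official methodology.

        # Toy period life table: life expectancy at birth from single-year
        # age-specific death rates m_x. All rates are invented.
        death_rates = [0.005] + [0.001] * 39 + [0.01] * 30 + [0.08] * 30   # ages 0..99

        survivors = 100000.0          # radix: survivors at exact age 0
        person_years = 0.0
        for m in death_rates:
            q = m / (1.0 + 0.5 * m)                    # probability of dying within the year
            deaths = survivors * q
            person_years += survivors - 0.5 * deaths   # years lived in the age interval
            survivors -= deaths
        person_years += survivors / 0.08               # crude open-ended interval beyond age 100

        print(f"life expectancy at birth: {person_years / 100000.0:.1f} years")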

  • Articles and reports: 12-001-X19970013100
    Description:

    A system of procedures that can be used to automate complicated algebraic calculations frequently encountered in sample survey theory is introduced. It is shown that three basic techniques in sampling theory depend on the repeated application of rules that give rise to partitions: the computation of expected values under any unistage sampling design, the determination of unbiased or consistent estimators under these designs and the calculation of Taylor series expansions. The methodology is illustrated here through applications to moment calculations of the sample mean, the ratio estimator and the regression estimator under the special case of simple random sampling without replacement. The innovation presented here is that calculations can now be performed instantaneously on a computer, without error and without reliance on existing formulae, which may be long and involved. One other immediate benefit is that calculations can be performed where no formulae presently exist. The computer code developed to implement this methodology is available via anonymous ftp at fisher.stats.uwo.ca.

    Release date: 1997-08-18
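
    For reference, the quantities that such symbolic calculations reproduce in the simple random sampling without replacement case are the textbook moments of the sample mean, with Taylor series expansions giving the corresponding approximations for the ratio and regression estimators:

        \[
          E(\bar{y}) = \bar{Y}, \qquad
          \operatorname{Var}(\bar{y}) = \Bigl(1 - \frac{n}{N}\Bigr)\frac{S^2}{n},
          \qquad S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\bigl(y_i - \bar{Y}\bigr)^2 .
        \]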

  • Articles and reports: 12-001-X19970013106
    Description:

    The standard error estimation method used for sample data in the U.S. Decennial Census from 1970 through 1990 yielded irregular results. For example, the method gave different standard error estimates for the "yes" and "no" response for the same binomial variable, when both standard error estimates should have been the same. If most respondents answered a binomial variable one way and a few answered the other way, the standard error estimate was much higher for the response with the most respondents. In addition, when 100 percent of respondents answered a question the same way, the standard error of this estimate was not zero, but was still quite high. Reporting average design effects which were weighted by the number of respondents that reported particular characteristics magnified the problem. An alternative to the random groups standard error estimate used in the U.S. census is suggested here.

    Release date: 1997-08-18
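
    The inconsistency described above is easiest to see against the elementary binomial benchmark: under simple random sampling, the estimated "yes" and "no" proportions have identical standard errors, and the standard error vanishes when everyone answers the same way,

        \[
          \widehat{\operatorname{se}}(\hat{p})
          = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
          = \widehat{\operatorname{se}}(1-\hat{p}),
          \qquad
          \widehat{\operatorname{se}}(\hat{p}) = 0 \ \text{when } \hat{p}\in\{0,1\}.
        \]

    Complex designs rescale these quantities by a design effect but do not break the symmetry, which is why the behaviour reported for the census method signals a problem.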

  • Articles and reports: 12-001-X19970013102
    Description:

    The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

    Release date: 1997-08-18

  • Articles and reports: 12-001-X19970013107
    Description:

    Often one of the key objectives of multi-purpose demographic surveys in the U.S. is to produce estimates for small domains of the population such as race, ethnicity, and income. Geographic-based oversampling is one of the techniques often considered for improving the reliability of the small domain statistics using block or block group information from the Bureau of the Census to identify areas where the small domains are concentrated. This paper reviews the issues involved in oversampling geographical areas in conjunction with household screening to improve the precision of small domain estimates. The results from an empirical evaluation of the variance reduction from geographic-based oversampling are given along with an assessment of the robustness of the sampling efficiency over time as information for stratification becomes out of date. The simultaneous oversampling of several small domains is also discussed.

    Release date: 1997-08-18

  • Articles and reports: 12-001-X19970013105
    Description:

    The problem of estimating transition rates from longitudinal survey data in the presence of misclassification error is considered. Approaches which use external information on misclassification rates are reviewed, together with alternative models for measurement error. We define categorical instrumental variables and propose methods for the identification and estimation of models including such variables by viewing the model as a restricted latent class model. The numerical properties of the implied instrumental variable estimators of flow rates are studied using data from the Panel Study of Income Dynamics.

    Release date: 1997-08-18
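
    As a point of comparison for the model-based methods reviewed above, the classical "matrix" use of external misclassification rates inverts the known error probabilities to recover the true category distribution. The sketch below uses invented rates and proportions and illustrates only this baseline idea, not the latent class or instrumental variable estimators studied in the paper.

        import numpy as np

        # M[i, j] = P(observed category j | true category i); rows sum to 1.
        # Misclassification rates and observed shares are invented.
        M = np.array([[0.95, 0.05],
                      [0.10, 0.90]])
        observed = np.array([0.40, 0.60])        # observed category proportions

        # observed = M.T @ true, so recover the true proportions by solving the system.
        true = np.linalg.solve(M.T, observed)
        print(true)

    Applied to cross-tabulations of status at two waves, the same correction yields adjusted transition rates, but it can produce inadmissible (negative) estimates when misclassification is severe, which is one reason model-based alternatives such as those reviewed above are of interest.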

  • Articles and reports: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections is inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them is developed. Not all original samples can be inverted, but many practical special cases that cover a wide range of practices are discussed.

    Release date: 1997-08-18

  • Articles and reports: 12-001-X19970013104
    Description:

    Measures of income inequality and polarization are fundamental to discussions of many economic issues. Their variances are not expressible by simple formulae, and one must rely on approximate variance estimation techniques. In this paper, several methods of variance estimation for six particular income inequality and polarization measures are summarized and their performance is investigated empirically through a simulation study based on the Canadian Survey of Consumer Finance. Our findings indicate that for the measures studied here, the bootstrap and the estimating equations approach perform considerably better than the other methods.

    Release date: 1997-08-18
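
    As an illustration of the bootstrap approach that performed well in this study, the sketch below estimates the standard error of a Gini coefficient by resampling. It uses synthetic unweighted data; survey weights and design features, which the paper's simulation does account for, are ignored here.

        import numpy as np

        rng = np.random.default_rng(12345)

        def gini(x):
            """Gini coefficient of a positive, unweighted sample."""
            x = np.sort(np.asarray(x, dtype=float))
            n = x.size
            ranks = np.arange(1, n + 1)
            return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

        incomes = rng.lognormal(mean=10.0, sigma=0.8, size=2000)    # synthetic incomes

        boot = np.array([gini(rng.choice(incomes, size=incomes.size, replace=True))
                         for _ in range(500)])
        print(f"Gini = {gini(incomes):.3f}, bootstrap s.e. = {boot.std(ddof=1):.4f}")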

  • Articles and reports: 12-001-X19970013108
    Description:

    We show how the use of matrix calculus can simplify the derivation of the linearization of the regression coefficient estimator and the regression estimator.

    Release date: 1997-08-18
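
    For reference, one common (non-matrix) statement of the linearization in question writes the survey-weighted regression coefficient and its linearized variate as follows; the notation is generic rather than the authors' own:

        \[
          \hat{B} = \Bigl(\sum_{i \in s} w_i x_i x_i^{\top}\Bigr)^{-1}\sum_{i \in s} w_i x_i y_i ,
          \qquad
          z_i = \Bigl(\sum_{i \in s} w_i x_i x_i^{\top}\Bigr)^{-1} x_i\bigl(y_i - x_i^{\top}\hat{B}\bigr),
        \]

    so that the design variance of the weighted total of the z_i approximates the variance of the coefficient estimator.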

  • Articles and reports: 12-001-X19970013103
    Description:

    This paper discusses the use of some simple diagnostics to guide the formation of nonresponse adjustment cells. Following Little (1986), we consider construction of adjustment cells by grouping sample units according to their estimated response probabilities or estimated survey items. Four issues receive principal attention: assessment of the sensitivity of adjusted mean estimates to changes in k, the number of cells used; identification of specific cells that require additional refinement; comparison of adjusted and unadjusted mean estimates; and comparison of estimation results from estimated-probability and estimated-item based cells. The proposed methods are motivated and illustrated with an application involving estimation of mean consumer unit income from the U.S. Consumer Expenditure Survey.

    Release date: 1997-08-18
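
    A minimal sketch of the estimated-probability version of the cell construction discussed above: units are grouped into k cells by quantiles of their estimated response probabilities, and respondents' weights in each cell are scaled up by the inverse of the cell's weighted response rate. The propensities are taken as given here (in practice they come from a fitted model) and all values are invented.

        import numpy as np

        rng = np.random.default_rng(7)
        n, k = 1000, 5

        base_weight = np.full(n, 50.0)                  # design weights (invented)
        propensity = rng.uniform(0.3, 0.9, size=n)      # estimated response probabilities
        responded = rng.uniform(size=n) < propensity    # observed response indicator

        # Form k adjustment cells from quantiles of the estimated propensities.
        cuts = np.quantile(propensity, np.linspace(0, 1, k + 1))
        cell = np.clip(np.searchsorted(cuts, propensity, side="right") - 1, 0, k - 1)

        adjusted = base_weight.copy()
        for c in range(k):
            in_cell = cell == c
            rate = base_weight[in_cell & responded].sum() / base_weight[in_cell].sum()
            adjusted[in_cell & responded] /= rate       # respondents absorb nonrespondents' weight
        adjusted[~responded] = 0.0

        print(adjusted.sum(), base_weight.sum())        # totals agree by construction

    Varying k and recomputing the adjusted means is essentially the sensitivity check described in the abstract.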

  • Articles and reports: 11F0019M1997099
    Description:

    Context: Lung cancer has been the leading cause of cancer deaths in Canadian males for many years and, since 1994, this has been the case for Canadian females as well. It is therefore important to evaluate the resources required for its diagnosis and treatment. This article presents an estimate of the direct medical costs associated with the diagnosis and treatment of lung cancer, calculated through the use of a micro-simulation model. For disease incidence, 1992 was chosen as the reference year, whereas costs are evaluated according to the rates that prevailed in 1993.

    Methods: A model for lung cancer has been incorporated into the Population Health Model (POHEM). The parameters of the model were drawn in part from Statistics Canada's Canadian Cancer Registry (CCR), which provides information on the incidence and histological classification of lung cancer cases in Canada. The distribution of cancer stage at diagnosis was estimated using information from two provincial cancer registries. A team of oncologists derived "typical" treatment approaches reflective of current practice, and the associated direct costs were calculated for these approaches. Once this information and the appropriate survival curves were incorporated into the POHEM model, overall costs of treatment were estimated by means of a Monte Carlo simulation.

    Results: It is estimated that, overall, the direct medical costs of lung cancer diagnosis and treatment were just over $528 million. The cost per year of life gained as a result of treatment of the disease was approximately $19,450. For the first time in Canada, it was possible to estimate the five-year costs following diagnosis, by stage of the disease at the time of diagnosis. It was also possible to estimate the cost per year of additional life gained for three alternative treatments of non-small-cell lung cancer (NSCLC). Sensitivity analyses showed that these costs varied between $1,870 and $6,860 per year of additional life gained, which compares favourably with the costs that the treatment of other diseases may involve.

    Conclusions: Contrary to widespread perceptions, it appears that the treatment of lung cancer is effective from an economic standpoint. In addition, the use of a micro-simulation model such as POHEM not only makes it possible to incorporate information from various sources in a coherent manner but also offers the possibility of estimating the effect of alternative medical procedures from the standpoint of financial pressures on the health care system.

    Release date: 1997-04-22

  • Articles and reports: 12-001-X19960022986
    Description:

    Within a survey re-engineering context, the combined methodology developed in the paper addresses the problem of finding the minimal sample size for the generalized regression estimator in skewed survey populations (e.g., business, institutional, agriculture populations). Three components necessary in identifying an efficient sample redesign strategy involve i) constructing an efficient partitioning between the “take-all” and “sampled” groups, ii) identifying an efficient sample selection scheme, and iii) finding the minimal sample size required to meet the desired precision constraint(s). A scheme named the “Transfer Algorithm” is devised to address the first issue (Pandher 1995) and is integrated with the other two components to arrive at a combined iterative procedure that converges to a globally minimal sample size and population partitioning under the imposed precision constraint. An equivalence result is obtained allowing the solution to the proposed algorithm to be alternatively determined in terms of simple quantities computable directly from the population auxiliary data. Results from the application of the proposed sample redesign methodology to the Local Government Survey in Ontario are reported. A 52% reduction in the total sample size is achieved for the regression estimator of the total at a minimum coefficient of variation of 2%.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X19960022980
    Description:

    In this paper, we study a confidence interval estimation method for a finite population average when some auxiliary information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inference on the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that when transformation is needed, our approach performs well compared to the usual regression method.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X19960022982
    Description:

    In work with sample surveys, we often use estimators of the variance components associated with sampling within and between primary sample units. For these applications, it can be important to have some indication of whether the variance component estimators are stable, i.e., have relatively low variance. This paper discusses several data-based measures of the stability of design-based variance component estimators and related quantities. The development emphasizes methods that can be applied to surveys with moderate or large numbers of strata and small numbers of primary sample units per stratum. We direct principal attention toward the design variance of a within-PSU variance estimator, and two related degrees-of-freedom terms. A simulation-based method allows one to assess whether an observed stability measure is consistent with standard assumptions regarding variance estimator stability. We also develop two sets of stability measures for design-based estimators of between-PSU variance components and the ratio of the overall variance to the within-PSU variance. The proposed methods are applied to interview and examination data from the U.S. Third National Health and Nutrition Examination Survey (NHANES III). These results indicate that the true stability properties may vary substantially across variables. In addition, for some variables, within-PSU variance estimators appear to be considerably less stable than one would anticipate from a simple count of secondary units within each stratum.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X19960022983
    Description:

    We propose a second-order inclusion probability approximation for the Chao plan (1982) to obtain an approximate variance estimator for the Horvitz and Thompson estimator. We will then compare this variance with other approximations provided for the randomized systematic sampling plan (Hartley and Rao 1962), the rejective sampling plan (Hájek 1964) and the Rao-Sampford sampling plan (Rao 1965 and Sampford 1967). Our conclusion will be that these approximations are equivalent if the first-order inclusion probabilities are small and if the sample is large.

    Release date: 1997-01-30
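
    The role of the second-order inclusion probabilities approximated in this article is most visible in the Sen-Yates-Grundy form of the variance of the Horvitz-Thompson estimator of a total for a fixed-size design, quoted here for reference:

        \[
          \operatorname{Var}\bigl(\hat{Y}_{HT}\bigr)
          = \tfrac{1}{2}\sum_{i \in U}\sum_{j \in U,\, j \neq i}
            \bigl(\pi_i \pi_j - \pi_{ij}\bigr)
            \Bigl(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Bigr)^{2} .
        \]

    An approximation to the pairwise probabilities therefore translates directly into an approximate variance estimator, which is what the comparison among sampling plans above rests on.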

  • Articles and reports: 12-001-X19960022979
    Description:

    This paper empirically compares three estimation methods - regression, restricted regression, and principal person - used in a household survey of consumer expenditures. The three methods are applied to post-stratification which is important in many household surveys to adjust for under-coverage of the target population. Post-stratum population counts are typically available from an external census for numbers of persons but not for numbers of households. If household estimates are needed, a single weight must be assigned to each household while using the person counts for post-stratification. This is easily accomplished with regression estimators of totals or means by using person counts in each household's auxiliary data. Restricted regression estimation refines the weights by controlling extremes and can produce estimators with lower variance than Horvitz-Thompson estimators while still adhering to the population controls. The regression methods also allow controls to be used for both person-level counts and quantitative auxiliaries. With the principal person method, persons are classified into post-strata and person weights are ratio adjusted to achieve population control totals. This leads to each person in a household potentially having a different weight. The weight associated with the "principal person" is then selected as the household weight. We will compare estimated means from the three methods and their estimated standard errors for a number of expenditures from the Consumer Expenditure survey sponsored by the U.S. Bureau of Labor Statistics.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X19960022984
    Description:

    In this paper we present two applications of spatial smoothing using data collected in a large scale economic survey of Australian farms: one a small area and the other a large area application. In the small area application, we describe how the sample weights can be spatially smoothed in order to improve small area estimates. In the large area application, we give a method for spatially smoothing and then mapping the survey data. The standard method of weighting in the survey is a variant of linear regression weighting. For the small area application, this method is modified by introducing a constraint on the spatial variability of the weights. Results from a small scale empirical study indicate that this decreases the variance of the small area estimators as expected, but at the cost of an increase in their bias. In the large area application, we describe the nonparametric regression method used to spatially smooth the survey data as well as techniques for mapping this smoothed data using a Geographic Information System (GIS) package. We also present the results of a simulation study conducted to determine the most appropriate method and level of smoothing for use in the maps.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X19960022981
    Description:

    Results from the Current Population Survey split panel studies indicated a centralized computer-assisted telephone interviewing (CATI) effect on labor force estimates. One hypothesis is that CATI interviewing increased the probability of respondents' changing their reported labor force status. The two-sample McNemar test is appropriate for testing this type of hypothesis: the hypothesis of interest is that the marginal changes in each of two independent samples' tables are equal. We show two adaptations of this test to complex survey data, along with applications to the Current Population Survey's Parallel Survey split data and to the Current Population Survey's CATI Phase-in data.

    Release date: 1997-01-30
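
    A sketch of the two-sample comparison of marginal change under the simplifying assumption of simple random sampling; adapting it to complex survey data is precisely the contribution of the paper and is not attempted here. The counts are invented: b and c are the off-diagonal cells of each sample's 2x2 change table, i.e. respondents whose reported status differed between the two occasions.

        import math

        def net_change(b, c, n):
            """Estimated net marginal change (b - c)/n and its approximate
            variance under simple random sampling."""
            d = (b - c) / n
            var = (b + c) / n**2 - d**2 / n
            return d, var

        # Invented paired counts for two independent samples.
        d1, v1 = net_change(b=120, c=95, n=4000)
        d2, v2 = net_change(b=150, c=90, n=4200)

        z = (d1 - d2) / math.sqrt(v1 + v2)
        print(f"two-sample z statistic: {z:.2f}")    # compare with N(0,1) critical values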

  • Articles and reports: 12-001-X19960022973
    Description:

    There exist well known methods due to Deville and Sárndal (1992) which adjust sampling weights to meet benchmark constraints and range restrictions. The resulting estimators are known as calibration estimators. There also exists an earlier, but perhaps not as well known, method due to Huang and Fuller (1978). In addition, alternative methods were developed by Singh (1993), who showed that, similar to the Deville-Sárndal result, all these methods are asymptotically equivalent to the regression method. The purpose of this paper is threefold: (i) to provide a simple heuristic justification of all calibration estimators (both well known and not so well known) by taking a non-traditional approach, in which a model (instead of a distance function) for the weight adjustment factor is first chosen and a suitable method of model fitting is then shown to correspond to the distance minimization solution; (ii) to provide practitioners with computational algorithms as a quick reference; and (iii) to illustrate how the various methods compare in terms of the distribution of weight adjustment factors, point estimates, estimated precision, and computational burden, using numerical examples based on a real data set. A descriptive analysis of the numerical results suggests that while all the calibration methods behave similarly to the regression method for loose bounds, they behave differently for tight bounds.

    Release date: 1997-01-30
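
    A numerical sketch of the linear (regression-type) calibration case discussed above: design weights d_i are multiplied by adjustment factors g_i = 1 + x_i'lambda, with lambda chosen so that the calibrated totals of the auxiliary variables hit known benchmarks. Range restrictions and the alternative distance functions compared in the paper are not shown, and all numbers are invented.

        import numpy as np

        rng = np.random.default_rng(3)
        n = 500

        d = np.full(n, 20.0)                                       # design weights
        X = np.column_stack([np.ones(n), rng.integers(1, 6, n)])   # intercept + a count variable
        benchmarks = np.array([10500.0, 31000.0])                  # known population totals for X

        # Solve sum_i d_i x_i x_i' lambda = benchmarks - sum_i d_i x_i for lambda.
        A = (X * d[:, None]).T @ X
        lam = np.linalg.solve(A, benchmarks - X.T @ d)
        g = 1.0 + X @ lam                                          # weight adjustment factors
        w = d * g                                                  # calibrated weights

        print(X.T @ w)            # reproduces the benchmark totals
        print(g.min(), g.max())   # spread of the adjustment factors (unbounded in the linear case)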

  • Articles and reports: 12-001-X19960022985
    Description:

    Telephone surveys in the U.S. are subject to coverage bias because about 6 percent of all households do not have a telephone at any particular point in time. The bias resulting from this undercoverage can be important since those who do not have a telephone are generally poorer and have other characteristics that differ from the telephone population. Poststratification and the other usual methods of adjustment often do not fully compensate for this bias. This research examines a procedure for adjusting the survey estimates based on the observation that some households have a telephone for only part of the year, often due to economic circumstances. By collecting data on interruptions in telephone service in the past year, statistical adjustments of the estimates can be made which may reduce the bias in the estimates but which at the same time increase variances because of greater variability in weights. This paper considers a method of adjustment using data collected from a national telephone survey. Estimates of the reductions in bias and the effect on the mean square error of the estimates are computed for a variety of statistics. The results show that when the estimates from the survey are highly related to economic conditions the telephone interruption adjustment procedure can improve the mean square error of the estimates.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X19960022978
    Description:

    The use of auxiliary information in estimation procedures in complex surveys, such as Statistics Canada's Labour Force Survey, is becoming increasingly sophisticated. In the past, regression and raking ratio estimation were the commonly used procedures for incorporating auxiliary data into the estimation process. However, the weights associated with these estimators could be negative or highly positive. Recent theoretical developments by Deville and Sárndal (1992) in the construction of "restricted" weights, which can be forced to be positive and bounded above, have led us to study the properties of the resulting estimators. In this paper, we investigate the properties of a number of such weight generating procedures, as well as their corresponding estimated variances. In particular, two variance estimation procedures, the jackknife and Taylor linearization, are investigated via a Monte Carlo simulation study based on Labour Force Survey data. The conclusion is that the bias of both the point estimators and the variance estimators is minimal, even under severe "restricting" of the final weights.

    Release date: 1997-01-30
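
    Of the two variance estimation procedures examined above, the jackknife is the simpler to sketch: the point estimate is recomputed with each primary sampling unit deleted in turn, the remaining weights in its stratum are rescaled, and the squared deviations are combined by stratum. The stratified design, the data and the choice of a weighted mean as the point estimate below are all invented; this is a generic illustration, not the Labour Force Survey implementation.

        import numpy as np

        rng = np.random.default_rng(11)

        # Invented stratified design: 4 strata, 6 PSUs each, 30 units per PSU.
        records = []
        for h in range(4):
            for psu in range(6):
                y = rng.normal(50 + 5 * h, 10, size=30)
                w = np.full(30, 25.0)
                records.append((h, psu, y, w))

        def weighted_mean(recs):
            num = sum((w * y).sum() for _, _, y, w in recs)
            den = sum(w.sum() for _, _, y, w in recs)
            return num / den

        theta = weighted_mean(records)

        var = 0.0
        for h in range(4):
            psus_h = [r for r in records if r[0] == h]
            n_h = len(psus_h)
            for dropped in psus_h:
                # Delete one PSU; rescale the weights of the other PSUs in the stratum.
                replicate = [r for r in records if r[0] != h]
                replicate += [(hh, p, y, w * n_h / (n_h - 1))
                              for hh, p, y, w in psus_h if p != dropped[1]]
                var += (n_h - 1) / n_h * (weighted_mean(replicate) - theta) ** 2

        print(f"estimate {theta:.2f}, jackknife s.e. {np.sqrt(var):.3f}")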
