Statistics by subject – Statistical methods

All (29) (25 of 29 results)

  • Technical products: 75F0002M1993008
    Description:

    This paper describes the learning curve associated with moving from the traditional method of developing questions for a "paper and pencil" questionnaire to providing question and flow specifications for a programmer. It uses the Survey of Labour and Income Dynamics (SLID) as a case study.

    Release date: 1995-12-30

  • Technical products: 75F0002M1995016
    Description:

    This paper examines the development of survey data files, or data modelling, for longitudinal surveys.

    Release date: 1995-12-30

  • Articles and reports: 11F0019M1995083
    Description:

    This paper examines the robustness of a measure of the average complete duration of unemployment in Canada to a host of assumptions used in its derivation. In contrast to the average incomplete duration of unemployment, which is a lagging cyclical indicator, this statistic is a coincident indicator of the business cycle. The impact of using a steady-state as opposed to a non-steady-state assumption, as well as the impact of various corrections for response bias, are explored. It is concluded that a non-steady-state estimator would be a valuable complement to the statistics on unemployment duration currently released by many statistical agencies, and particularly Statistics Canada.

    Release date: 1995-12-30

  • Articles and reports: 11F0019M1995084
    Description:

    The objective of this paper is to introduce a new measure of the average duration of unemployment spells using Canadian data. The paper summarizes the work of Corak (1993) and Corak and Heisz (1994) on the average complete duration of unemployment in a non-technical way, focusing on the distinction between it and the average incomplete duration of unemployment, which is regularly released by Statistics Canada. It is pointed out that the latter is a lagging cyclical indicator. The average complete duration of unemployment is a more accurate indicator of prevailing labour market conditions, but some assumptions required in its derivation also imply that it lags actual developments.

    Release date: 1995-12-30

  • Articles and reports: 12-001-X199500214392
    Description:

    Although large scale surveys conducted in developing countries can provide an invaluable snapshot of the health situation in a community, results produced rarely reflect the current reality as they are often released several months or years after data collection. The time lag can be partially attributed to delays in entering, coding and cleaning data after it is collected in the field. Recent advances in computer technology have provided a means of directly recording data onto a hand-held computer. Errors are reduced because in-built checks triggered as the questionnaire is administered reject illogical or inconsistent entries. This paper reports the use of one such computer-assisted interviewing tool in the collection of demographic data in Kenya. Although initial costs of establishing computer-assisted interviewing are high, the benefits are clear: errors that can creep into data collected by experienced field staff can be reduced to negligible levels. In situations where speed is essential, a large number of staff are involved, or a pre-coded questionnaire is used to collect data routinely over a long period, computer-assisted interviewing could prove a means of saving costs in the long term, as well as producing a dramatic improvement in data quality in the immediate term.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500214395
    Description:

    When redesigning a sample with a stratified multi-stage design, it is sometimes considered desirable to maximize the number of primary sampling units retained in the new sample without altering unconditional selection probabilities. For this problem, an optimal solution which uses transportation theory exists for a very general class of designs. However, this procedure has never been used in the redesign of any survey (that the authors are aware of), in part because even for moderately-sized strata, the resulting transportation problem may be too large to solve in practice. In this paper, a modified reduced-size transportation algorithm is presented for maximizing the overlap, which substantially reduces the size of the problem. This reduced-size overlap procedure was used in the recent redesign of the Survey of Income and Program Participation (SIPP). The performance of the reduced-size algorithm is summarized, both for the actual production SIPP overlap and for earlier, artificial simulations of the SIPP overlap. Although the procedure is not optimal and theoretically can produce only negligible improvements in expected overlap compared to independent selection, in practice it gave substantial improvements in overlap over independent selection for SIPP, and generally provided an overlap that is close to optimal.

    Release date: 1995-12-15
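
    The transportation formulation this abstract refers to can be illustrated directly for a single small stratum. The sketch below is a hypothetical example rather than the authors' reduced-size algorithm: it maximizes the probability of retaining the same primary sampling unit, subject to the old and new selection probabilities as marginals, using scipy's linear-programming solver. All probabilities are made up for illustration.

```python
# Sketch of overlap maximization as a transportation problem (one stratum).
# Choose a joint distribution x[i, j] = P(old sample is PSU i, new sample is
# PSU j) whose marginals match the old and new selection probabilities,
# maximizing the retention probability sum_i x[i, i].
import numpy as np
from scipy.optimize import linprog

p_old = np.array([0.5, 0.3, 0.2])   # hypothetical old selection probabilities
p_new = np.array([0.4, 0.4, 0.2])   # hypothetical new selection probabilities
n = len(p_old)

# Objective: maximize the sum of diagonal entries -> minimize its negative.
c = -np.eye(n).ravel()

# Equality constraints: row sums equal p_old, column sums equal p_new.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # row i of x sums to p_old[i]
    A_eq[n + i, i::n] = 1.0            # column i of x sums to p_new[i]
b_eq = np.concatenate([p_old, p_new])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
x = res.x.reshape(n, n)
print("maximum expected overlap:", x.diagonal().sum())
print("independent selection overlap:", (p_old * p_new).sum())
```

    For these hypothetical probabilities the optimal overlap is 0.9, against 0.36 under independent selection, which mirrors the gain over independent selection reported in the abstract.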

  • Articles and reports: 12-001-X199500214397
    Description:

    Regression estimation and its generalization, calibration estimation, introduced by Deville and Särndal in 1993, serve to reduce a posteriori the variance of estimators through the use of auxiliary information. In sample surveys there is often usable supplementary information distributed according to a complex schema, especially where the sampling is carried out in several phases. An adaptation of regression estimation, along with its variants, was proposed in the framework of two-phase sampling by Särndal and Swensson in 1987. This article examines alternative estimation strategies under two alternative configurations of auxiliary information, linking the two possible approaches to the problem: use of a regression model and calibration estimation.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500214398
    Description:

    We present empirical evidence from 14 surveys in six countries concerning the existence and magnitude of design effects (defts) for five designs of two major types. The first type concerns deft(p_i – p_j), the difference of two proportions from a polytomous variable of three or more categories. The second type uses chi-square tests for differences from two samples. We find that for all variables in all designs, deft(p_i – p_j) ≈ [deft(p_i) + deft(p_j)] / 2 is a good approximation. These are empirical results, and the existence of exceptions shows that they are not mere analytical inequalities. The results hold despite great variations of defts between variables and also between categories of the same variable. They show the need for sample survey treatment of survey data even for analytical statistics, and they permit useful approximations of deft(p_i – p_j) from the more accessible deft(p_i) values.

    Release date: 1995-12-15
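
    The averaging rule reported above is easy to apply in practice. The sketch below uses hypothetical deft values purely to show the arithmetic; the SRS variance formula is the standard one for a difference of two proportions from the same multinomial variable.

```python
# Approximate design-based SE for a difference of proportions, using the
# empirical rule deft(p_i - p_j) ~= (deft(p_i) + deft(p_j)) / 2.
import math

def approx_deft_difference(deft_i, deft_j):
    """Average the two category defts, per the empirical rule above."""
    return (deft_i + deft_j) / 2.0

def design_se_difference(p_i, p_j, n, deft_i, deft_j):
    """Inflate the SRS standard error of p_i - p_j (two categories of one
    multinomial variable) by the approximate design effect."""
    srs_var = (p_i + p_j - (p_i - p_j) ** 2) / n
    return approx_deft_difference(deft_i, deft_j) * math.sqrt(srs_var)

# Hypothetical survey: p_i = 0.30, p_j = 0.20, n = 1000, defts 1.4 and 1.1.
print(design_se_difference(0.30, 0.20, n=1000, deft_i=1.4, deft_j=1.1))
```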

  • Articles and reports: 12-001-X199500214399
    Description:

    This paper considers the winsorized mean as an estimator of the mean of a positively skewed population. A winsorized mean is obtained by replacing all the observations larger than some cut-off value R by R before averaging. The optimal cut-off value, as defined by Searls (1966), minimizes the mean square error of the winsorized estimator. Techniques are proposed for the evaluation of this optimal cut-off in several sampling designs, including simple random sampling, stratified sampling and sampling with probability proportional to size. For most skewed distributions, the optimal winsorization strategy is shown, on average, to modify the value of about one data point in the sample. Closed-form approximations to the efficiency of Searls’ winsorized mean are derived using the theory of extreme order statistics. Various estimators reducing the impact of large data values are compared in a Monte Carlo experiment.

    Release date: 1995-12-15
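
    A minimal sketch of the estimator described above: every observation above the cut-off R is replaced by R before averaging. The cut-off and data below are arbitrary illustrations; in the paper the cut-off is Searls' optimal value, chosen to minimize the mean square error.

```python
import numpy as np

def winsorized_mean(y, cutoff):
    """Winsorized mean as described above: replace observations larger
    than the cut-off R by R itself, then average."""
    y = np.asarray(y, dtype=float)
    return np.minimum(y, cutoff).mean()

# Hypothetical positively skewed sample; the cut-off tames the extreme value.
sample = np.array([2.0, 3.0, 4.0, 5.0, 80.0])
print(sample.mean())                  # 18.8, inflated by the outlier
print(winsorized_mean(sample, 10.0))  # 4.8, one value replaced by R = 10
```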

  • Articles and reports: 12-001-X199500214396
    Description:

    We summarize some salient aspects of the theory of estimation for finite populations. In particular, we discuss the problem of estimating means and totals and extend this theory to estimating functions. We then apply the estimating functions framework to the problem of estimating measures of income inequality. The resulting statistics are nonlinear functions of the observations, and some of them depend on the order of observations or on quantiles. Consequently, the mean squared errors of these estimates cannot be expressed by simple formulae or estimated by conventional variance estimation methods. We show that within the estimating function framework this problem can be resolved using the Taylor linearization method. Finally, we illustrate the proposed methodology using income data from the Canadian Survey of Consumer Finances, comparing it to the ‘delete-one-cluster’ jackknife method.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500214391
    Description:

    Statistical process control can be used as a quality tool to assure the accuracy of sampling frames that are constructed periodically. Sampling frame sizes are plotted in a control chart to detect special causes of variation. Procedures to identify the appropriate time series (ARIMA) model for serially correlated observations are described. Applications of time series analysis to the construction of control charts are discussed. Data from the United States Department of Labor’s Unemployment Insurance Benefits Quality Control Program is used to illustrate the technique.

    Release date: 1995-12-15
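
    A minimal sketch of the idea above, on simulated data: because periodic frame sizes are serially correlated, control limits are placed on the residuals of a fitted time series model rather than on the raw counts. A first-order autoregression fitted by least squares stands in here for the paper's full ARIMA identification procedure, and the series is entirely hypothetical.

```python
# Control chart for serially correlated frame sizes: fit an AR(1) by least
# squares, then apply Shewhart-style 3-sigma limits to the residuals.
import numpy as np

rng = np.random.default_rng(42)
n = 120
frame_sizes = np.empty(n)
frame_sizes[0] = 1000.0
for t in range(1, n):   # hypothetical AR(1) frame-size series
    frame_sizes[t] = 300.0 + 0.7 * frame_sizes[t - 1] + rng.normal(0, 20)

# Least-squares fit of x_t = a + b * x_{t-1}
x_lag, x_cur = frame_sizes[:-1], frame_sizes[1:]
b, a = np.polyfit(x_lag, x_cur, 1)        # slope first, then intercept
resid = x_cur - (a + b * x_lag)

sigma = resid.std(ddof=2)                 # two parameters were estimated
out_of_control = np.abs(resid) > 3.0 * sigma
print("points outside 3-sigma limits:", np.flatnonzero(out_of_control))
```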

  • Articles and reports: 12-001-X199500214393
    Description:

    Major uncertainties about the quality of elderly population and death enumerations in the United States result from coverage and content errors in the censuses and the death registration system. This study evaluates the consistency of reported data between the two sources for the white and the African-American populations. The focus is on the older population (aged 60 and above), where mortality trends have the greatest impact on social programs and where data are most problematic. Using intercensal cohort analysis, age-specific inconsistencies between the sources are identified for two periods: 1970-1980 and 1980-1990. The U.S. data inconsistencies are examined in light of evidence in the literature regarding the nature of coverage and content errors in the data sources. Data for African-Americans are highly inconsistent in the 1970-1990 period, likely the result of age overstatement in censuses relative to death registration. Inconsistencies also exist for whites in the 1970-1980 intercensal period. We argue that the primary source of this error is an undercount in the 1970 census relative to both the 1980 census and the death registration. In contrast, the 1980-1990 data for whites, and particularly for white females, are highly consistent, far better than in most European countries.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500214405
    Description:

    In this paper we explore the effect of interviewer variability on the precision of estimated contrasts between domain means. In the first part we develop a correlated components of variance model to identify the factors that determine the size of the effect. This has implications for sample design and for interviewer training. In the second part we report on an empirical study using data from a large multi-stage survey on dental health. Gender of respondent and ethnic affiliation are used to establish two sets of domains for the comparisons. Overall interviewer and cluster effects make little difference to the variance of male/female comparisons, but there is a noticeable increase in the variance of some contrasts between the two ethnic groupings used in this study. Indeed, the impact of interviewer effects for the ethnic comparison is two or three times higher than it is for gender contrasts. These findings have particular relevance for health surveys where it is common to use a small cadre of highly trained interviewers.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500214394
    Description:

    In a 1992 National Test Census, the mailing sequence of a prenotice letter, census form, reminder postcard, and replacement census form resulted in an overall mailback response of 63.4 percent. The response was substantially higher than the 49.2 percent response rate obtained in the 1986 National Content Test Census, which also utilized a replacement form mailing. Much of this difference appeared to be the result of the prenotice - census form - reminder sequence, but the extent to which each main effect and interaction contributed to overall response was not known. This paper reports results from the 1992 Census Implementation Test, a test of the individual and combined effectiveness of a prenotice letter, a stamped return envelope, and a reminder postcard on response rates. The test was based on a national sample of households (n = 50,000) conducted in the fall of 1992. A factorial design was used to test all eight possible combinations of the main effects and interactions. Logistic regression and multiple comparisons were employed to analyze test results.

    Release date: 1995-12-15

  • Articles and reports: 82-003-X19950022507
    Description:

    Indicators based on the registration of vital events are used to determine the health status of populations. The need for these indicators at the regional and community levels has grown with the trend toward decentralization in the delivery of health services. Such indicators are important because they affect funding and the types of service that are provided. Health status indicators tend to be associated with variables such as the level of urbanization or socioeconomic status. According to four indicators - mortality ratios for all causes of death, mortality ratios for external causes of death, infant mortality ratios, and low birth weight live birth ratios - some areas of British Columbia, specifically along the border with Alberta, have relatively good health, although the characteristics of these regions suggest that this should not be the case. However, a much different picture emerges when vital event data registered in Alberta for residents of these areas of British Columbia are considered. This article shows that for adequate health planning and program implementation, some communities need data from neighbouring provinces. It illustrates the effect of incorporating Alberta data into the development of health status indicators for British Columbia. It also suggests that similar adjustments may be necessary for data compiled in other provinces.

    Release date: 1995-11-20

  • Articles and reports: 11F0019M1995081
    Description:

    Users of socio-economic statistics typically want more and better information. Often, these needs can be met simply by more extensive data collections, subject to usual concerns over financial costs and survey respondent burdens. Users, particularly for public policy purposes, have also expressed a continuing, and as yet unfilled, demand for an integrated and coherent system of socio-economic statistics. In this case, additional data will not be sufficient; the more important constraint is the absence of an agreed conceptual approach.

    In this paper, we briefly review the state of frameworks for social and economic statistics, including the kinds of socio-economic indicators users may want. These indicators are motivated first in general terms from basic principles and intuitive concepts, leaving aside for the moment the practicalities of their construction. We then show how a coherent structure of such indicators might be assembled.

    A key implication is that this structure requires a coordinated network of surveys and data collection processes, and higher data quality standards. This in turn implies a breaking down of the "stovepipe" systems that typify much of the survey work in national statistical agencies (i.e. parallel but generally unrelated data "production lines"). Moreover, the data flowing from the network of surveys must be integrated. Since the data of interest are dynamic, the proposed method goes beyond statistical matching to microsimulation modelling. Finally, these ideas are illustrated with preliminary results from the LifePaths model currently under development at Statistics Canada.

    Release date: 1995-07-30

  • Articles and reports: 82-003-X19950011661
    Description:

    In 1994, Statistics Canada began data collection for the National Population Health Survey (NPHS), a household survey designed to measure the health status of Canadians and to expand knowledge of health determinants. The survey is longitudinal, with data being collected on selected panel members every second year. This article focuses on the NPHS sample design and its rationale. Topics include sample allocation, representativeness, and selection; modifications in Quebec and the territories; and integration of the NPHS with the National Longitudinal Survey of Children. The final section considers some methodological issues to be addressed in future waves of the survey.

    Release date: 1995-07-27

  • Articles and reports: 11F0019M1995067
    Description:

    The role of technical innovation in economic growth is both a matter of keen public policy interest and an area of active exploration in economic theory. However, formal economic theorizing is often constrained by considerations of mathematical tractability. Evolutionary economic theories realized as computerized microsimulation models offer significant promise both for transcending mathematical constraints and for addressing fundamental questions in a more realistic and flexible manner. This paper sketches XEcon, a microsimulation model of economic growth in the evolutionary tradition.

    Release date: 1995-06-30

  • Articles and reports: 12-001-X199500114407
    Description:

    The Horvitz-Thompson estimator (HT-estimator) is not robust against outliers. Outliers in the population may increase its variance though it remains unbiased. The HT-estimator is expressed as a least squares functional to robustify it through M-estimators. An approximate variance of the robustified HT-estimator is derived using a kind of influence function for sampling and an estimator of this variance is developed. An adaptive method to choose an M-estimator leads to minimum estimated risk estimators. These estimators and robustified HT-estimators are often more efficient than the HT-estimator when outliers occur.

    Release date: 1995-06-15
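
    The robustification idea can be sketched for the weighted (Hajek-type) mean, which solves sum_i (1/pi_i)(y_i - mu) = 0: replacing the raw residual with Huber's bounded psi function limits the influence of outliers. This is an illustration of the M-estimation approach on made-up data, not the paper's exact estimator or its variance formulas.

```python
import numpy as np

def huber_psi(r, k=1.345):
    """Huber's psi: identity for small residuals, clipped for large ones."""
    return np.clip(r, -k, k)

def weighted_huber_mean(y, pi, k=1.345, iters=100):
    """M-estimation analogue of the Hajek weighted mean (illustration only)."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)              # design weights
    s = np.median(np.abs(y - np.median(y))) / 0.6745   # robust scale (MAD)
    mu = np.median(y)                                  # robust starting value
    for _ in range(iters):
        mu = mu + s * np.sum(w * huber_psi((y - mu) / s, k)) / np.sum(w)
    return mu

y = np.array([10.0, 12.0, 9.0, 11.0, 300.0])  # hypothetical, one outlier
pi = np.full(5, 0.1)                          # equal inclusion probabilities
print(np.average(y, weights=1.0 / pi))        # ordinary weighted mean: 68.4
print(weighted_huber_mean(y, pi))             # stays near the bulk, ~11
```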

  • Articles and reports: 12-001-X199500114410
    Description:

    As part of the decision on adjustment of the 1990 Decennial Census, the U.S. Census Bureau investigated possible heterogeneity of undercount rates between parts of different states falling in the same adjustment cell or poststratum. Five “surrogate variables” believed to be associated with undercount were analyzed using a large extract from the census, and significant heterogeneity was found. Analysis of Post Enumeration Survey data on undercount rates showed that more variance was explained by poststratification variables than by state, supporting the decision to use the poststratum as the adjustment cell. Significant interstate heterogeneity was found in 19 out of 99 poststratum groups (mainly in nonurban areas), but there was little if any evidence that the poststratified estimator was biased against particular states after aggregating across poststrata. Nonetheless, this issue should be addressed in future coverage evaluation studies.

    Release date: 1995-06-15

  • Articles and reports: 12-001-X199500114415
    Description:

    Stanley Warner’s contributions to randomized response are reviewed. Following this review, a linear model, based on random permutation models, is developed to include many known randomized response designs as special cases. Under this model optimal estimators for finite population variances and covariances are obtained within a general class of quadratic design-unbiased estimators. From these results an estimator of the finite population correlation is obtained. Three randomized response designs are examined in particular: (i) the unrelated questions model of Greenberg et al. (1969); (ii) the additive constants model of Pollock and Bek (1976); and (iii) the multiplicative constants model of Pollock and Bek (1976). Simple models for response bias are presented to illustrate the effect of this bias on estimation of the correlation.

    Release date: 1995-06-15
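
    A minimal sketch of one of the designs named above, the additive constants model (Pollock and Bek 1976): each respondent reports y + s, where s is a scrambling variable with a known distribution, so the interviewer never sees y itself. With E[s] and Var(s) known, the mean and variance of the sensitive variable can be recovered from the scrambled reports. The data are simulated for illustration; the paper's optimal variance and covariance estimators under its random permutation model are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
y = rng.gamma(shape=2.0, scale=5.0, size=n)  # hypothetical sensitive variable
s = rng.normal(loc=50.0, scale=4.0, size=n)  # scrambling noise, known dist.
z = y + s                                    # what respondents actually report

mean_hat = z.mean() - 50.0                   # E[y] = E[z] - E[s]
var_hat = z.var(ddof=1) - 4.0 ** 2           # Var(y) = Var(z) - Var(s)
print(mean_hat, y.mean())                    # both near 10
print(var_hat, y.var(ddof=1))                # both near 50
```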

  • Articles and reports: 12-001-X199500114411
    Description:

    In 1991, Statistics Canada for the first time adjusted the Population Estimates Program for undercoverage in the 1991 Census. The Census coverage studies provided reliable estimates of undercoverage at the provincial level and for national estimates of large age-sex domains. However, the population series required estimates of undercoverage for age-sex domains within each province and territory. Since the direct survey estimates for some of these small domains had large standard errors due to the small sample size in the domain, small area modelling techniques were needed. In order to incorporate the varying degrees of reliability of the direct survey estimates, a regression model utilizing an Empirical Bayes methodology was used to estimate the undercoverage in small domains. A raking ratio procedure was then applied to the undercoverage estimates to preserve consistency with the marginal direct survey estimates. The results of this modelling process are shown along with the estimated reduction in standard errors.

    Release date: 1995-06-15
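
    The raking ratio step described above can be sketched in a few lines: the model-based cell estimates are alternately scaled so that their row and column sums match the direct marginal estimates. All numbers below are hypothetical, and the Empirical Bayes modelling that would produce the cell inputs is not shown.

```python
# Raking ratio (iterative proportional fitting) of small-domain estimates
# to direct marginal totals.
import numpy as np

cells = np.array([[12.0, 30.0],      # model-based undercoverage estimates
                  [18.0, 40.0]])     # (e.g. age x sex domains in a province)
row_targets = np.array([45.0, 55.0]) # direct marginal estimates
col_targets = np.array([32.0, 68.0])

for _ in range(100):                 # iterate to convergence
    cells *= (row_targets / cells.sum(axis=1))[:, None]  # match row sums
    cells *= col_targets / cells.sum(axis=0)             # match column sums

print(cells)
print(cells.sum(axis=1), cells.sum(axis=0))  # marginals now match targets
```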

  • Articles and reports: 12-001-X199500114408
    Description:

    The problem of estimating the median of a finite population when an auxiliary variable is present is considered. Point and interval estimators based on a non-informative Bayesian approach are proposed. The point estimator is compared to other possible estimators and is seen to perform well in a variety of situations.

    Release date: 1995-06-15

  • Articles and reports: 12-001-X199500114406
    Description:

    This paper discusses the design of visitor surveys. To illustrate, two recent surveys are described. The first is a survey of visitors to National Park Service areas nationwide throughout the year (1992). The second is a survey of recreational users of the three-river basin around Pittsburgh, Pennsylvania, during a twelve-month period. Both surveys involved sampling in time with temporal as well as spatial stratification. Sampling units had the form of site-period pairs for the stage before the final, visitor sampling stage. Random assignment of sample sites to periods permits the computation of unbiased estimates for the temporal strata (e.g., monthly and seasonal estimates) as well as estimates for strata defined by region and by type of use.

    Release date: 1995-06-15

  • Articles and reports: 12-001-X199500114414
    Description:

    It is well known that the sample mean based on the distinct sample units in simple random sampling with replacement is more efficient than the sample mean based on all units selected including repetitions (Murthy 1967, pp. 65-66). Seth and Rao (1964) showed that the mean of the distinct units is less efficient than the sample mean in sampling without replacement under the same average sampling cost. Under Warner’s (1965) method of randomized response we compare simple random sampling without replacement and sampling with replacement when only the distinct number of units in the sample are considered.

    Release date: 1995-06-15
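
    The first result cited above (Murthy 1967) is easy to verify by simulation: under simple random sampling with replacement, the mean of the distinct units in the sample has a smaller mean square error than the plain mean over all n draws, repetitions included. The sketch below uses a hypothetical population; the randomized response comparison that is the paper's actual subject is not reproduced.

```python
# Monte Carlo comparison of the two SRSWR estimators of the population mean.
import numpy as np

rng = np.random.default_rng(1)
pop = rng.normal(100.0, 15.0, size=50)   # hypothetical finite population
n, reps = 10, 20_000

err_all, err_distinct = [], []
for _ in range(reps):
    draws = rng.choice(pop, size=n, replace=True)       # with replacement
    err_all.append(draws.mean() - pop.mean())           # all n draws
    err_distinct.append(np.unique(draws).mean() - pop.mean())  # distinct only

print("MSE, all draws:     ", np.mean(np.square(err_all)))
print("MSE, distinct units:", np.mean(np.square(err_distinct)))
```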

Data (0) (0 results)

Analysis (27) (25 of 27 results)

  • Articles and reports: 12-001-X199500114412
    Description:

    Household panel surveys often start with a sample of households and then attempt to follow all the members of those households for the life of the panel. At subsequent waves data are collected for the original sample members and for all the persons who are living with the sample members at the time. It is desirable to include the data collected both for the original sample persons and for the persons living with them in making person-level cross-sectional estimates for a particular wave. Similarly, it is desirable to include data for all the households for which data are collected at a particular wave in making household-level cross-sectional estimates for that wave. This paper reviews weighting schemes that can be used for these purposes. These weighting schemes may also be used in other settings in which units have more than one way of being selected for the sample.

    Release date: 1995-06-15

  • Articles and reports: 12-001-X199500114416
    Description:

    Stanley Warner was widely known for the creation of the randomized response technique for asking sensitive questions in surveys. Over almost two decades he also formulated and developed statistical methodology for another problem, that of deriving balanced information in advocacy settings so that both positions regarding a policy issue can be fairly and adequately represented. We review this work, including two survey applications implemented by Warner in which he applied the methodology, and we set the ideas into the context of current methodological thinking.

    Release date: 1995-06-15

Reference (2) (2 results)