Statistics by subject – Statistical methods


Data (1 result)

  • Table: 53-222-X19980006587
    Description:

    The primary purpose of this article is to present new time series data and to demonstrate their analytical potential, not to provide a detailed analysis of these data. The analysis in section 5.2.4 deals primarily with the trends of major variables relating to domestic and transborder traffic.

    Release date: 2000-03-07

Analysis (25 of 28 results)

  • Articles and reports: 85-002-X20000128385
    Description:

    In 1999, as part of its General Social Survey program, Statistics Canada conducted a survey on victimization and public perceptions of crime and the justice system. It was the third time that the General Social Survey (GSS) had examined victimization - previous surveys were conducted in 1993 and 1988.

    For the 1999 survey, interviews were conducted by telephone with approximately 26,000 people, aged 15 and older, living in the 10 provinces. Respondents were asked for their opinions concerning the level of crime in their neighbourhood, their fear of crime and their views concerning the performance of the justice system. They were also asked about their attitudes toward sentencing adult and young offenders. Respondents were randomly presented with one of four hypothetical situations for which they were asked to choose "prison" or "non-prison". Respondents who selected prison sentences were given a follow-up question that asked them whether a sentence of one year of probation and 200 hours of community work was an acceptable alternative to the prison sentence.

    This Juristat examines public attitudes toward sentencing adult and young offenders. It also analyzes public attitudes toward four sectors of the justice system: the police, the criminal courts, and the prison and parole systems.

    Release date: 2000-12-04

  • Articles and reports: 81-003-X19990045145
    Description:

    This paper examines the characteristics of young people who responded to the 1991 School Leavers Survey (SLS), but who subsequently failed to respond to the 1995 School Leavers Follow-up Survey (SLF).

    Release date: 2000-09-01

  • Articles and reports: 12-001-X200000110774
    Description:

    In this Issue is a column in which the Editor briefly presents each paper of the current issue of Survey Methodology. It also sometimes contains information on structural or management changes in the journal.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015179
    Description:

    This paper suggests estimating the conditional mean squared error of small area estimators to evaluate their accuracy. This mean squared error is conditional in the sense that it measures the variability with respect to the sampling design for a particular realization of the smoothing model underlying the small area estimators. An unbiased estimator of the conditional mean squared error is easily constructed using Stein's lemma for the expectation of normal random variables.

    Release date: 2000-08-30
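
    A quick numerical illustration (not taken from the paper) of the version of Stein's lemma invoked above: for Z ~ N(theta, sigma^2) and a smooth function g, E[g(Z)(Z - theta)] = sigma^2 E[g'(Z)]. The choice of g below is an arbitrary assumption for the sketch.

        # Monte Carlo check of Stein's lemma for a normal random variable.
        import numpy as np

        rng = np.random.default_rng(0)
        theta, sigma = 2.0, 1.5
        z = rng.normal(theta, sigma, size=1_000_000)

        def g(t):
            return t ** 3            # example smooth function (illustrative only)

        def g_prime(t):
            return 3 * t ** 2

        lhs = np.mean(g(z) * (z - theta))       # E[g(Z)(Z - theta)]
        rhs = sigma**2 * np.mean(g_prime(z))    # sigma^2 * E[g'(Z)]
        print(f"{lhs:.2f} vs {rhs:.2f}")        # the two sides should roughly agree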

  • Articles and reports: 12-001-X20000015176
    Description:

    A components-of-variance approach and an estimated covariance error structure were used in constructing predictors of adjustment factors for the 1990 Decennial Census. The variability of the estimated covariance matrix is the suspected cause of certain anomalies that appeared in the regression estimation and in the estimated adjustment factors. We investigate alternative prediction methods and propose a procedure that is less influenced by variability in the estimated covariance matrix. The proposed methodology is applied to a data set composed of 336 adjustment factors from the 1990 Post Enumeration Survey.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015174
    Description:

    Computation is an integral part of statistical analysis in general and survey sampling in particular. The kinds of analyses that can be carried out depend on the kind of computational power available. The general development of sampling theory is traced in connection with technological developments in computation.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015178
    Description:

    Longitudinal observations consist of repeated measurements on the same units over a number of occasions, with fixed or varying time spells between the occasions. Each vector observation can be viewed therefore as a time series, usually of short length. Analyzing the measurements for all the units permits the fitting of low-order time series models, despite the short lengths of the individual series.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015183
    Description:

    For surveys that involve more than one stage of data collection, one recommended method for adjusting weights for nonresponse (after the first stage of data collection) uses auxiliary variables (from previous stages of data collection) that have been identified as predictors of nonresponse.

    Release date: 2000-08-30
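
    A minimal sketch, not the method of the paper above, of the simplest version of this idea: a weighting-class nonresponse adjustment in which an auxiliary variable from an earlier stage defines classes, and respondent weights are inflated to restore each class's weight total. All data are hypothetical.

        # Weighting-class nonresponse adjustment using an auxiliary predictor.
        import numpy as np
        import pandas as pd

        df = pd.DataFrame({
            "weight":    [10.0, 10.0, 12.0, 12.0, 15.0, 15.0],  # first-stage weights
            "aux_class": ["a", "a", "a", "b", "b", "b"],        # auxiliary variable
            "responded": [1, 0, 1, 1, 1, 0],                    # later-stage response
        })

        full = df.groupby("aux_class")["weight"].sum()
        resp = df[df["responded"] == 1].groupby("aux_class")["weight"].sum()
        factor = full / resp                      # inverse estimated response rate

        df["adj_weight"] = np.where(df["responded"] == 1,
                                    df["weight"] * df["aux_class"].map(factor),
                                    0.0)
        print(df)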

  • Articles and reports: 12-001-X20000015181
    Description:

    Samples from hidden and hard-to-access human populations are often obtained by procedures in which social links are followed from one respondent to another. Inference from the sample to the larger population of interest can be affected by the link-tracing design and the type of data it produces. The population with its social network structure can be modeled as a stochastic graph with a joint distribution of node values representing characteristics of individuals and arc indicators representing social relationships between individuals.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015180
    Description:

    Imputation is a common procedure to compensate for nonresponse in survey problems. Using auxiliary data, imputation may produce estimators that are more efficient than those constructed by ignoring nonrespondents and re-weighting. We study and compare the mean squared errors of survey estimators based on data imputed using three different imputation techniques: the commonly used ratio imputation method and two cold deck imputation methods that are frequently adopted in economic area surveys conducted by the U.S. Census Bureau and the U.S. Bureau of Labor Statistics.

    Release date: 2000-08-30
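
    The "commonly used ratio imputation method" mentioned above can be sketched in a few lines: a missing y-value is replaced by R_hat * x, where R_hat is the ratio of the respondent totals of y and of an auxiliary variable x. The data are hypothetical.

        # Ratio imputation for item nonresponse, using an auxiliary variable x.
        import numpy as np

        x = np.array([4.0, 6.0, 8.0, 10.0, 12.0])        # fully observed auxiliary
        y = np.array([9.0, 13.0, np.nan, 21.0, np.nan])  # study variable, with gaps

        observed = ~np.isnan(y)
        r_hat = y[observed].sum() / x[observed].sum()    # respondent-based ratio
        y_imp = np.where(observed, y, r_hat * x)         # impute missing y as R_hat * x
        print(y_imp)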

  • Articles and reports: 12-001-X20000015173
    Description:

    In recognition of Survey Methodology's silver anniversary, this paper reviews the major advances in survey research that have taken place in the past 25 years. It provides a general overview of developments in: the survey research profession; survey methodology - questionnaire design, data collection methods, handling missing data, survey sampling, and total survey error; and survey applications - panel surveys, international surveys, and secondary analysis. It also attempts to forecast some future developments in these areas.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015177
    Description:

    The 1996 Canadian Census is adjusted for coverage error as estimated primarily through the Reverse Record Check (RRC). In this paper, we show that the 1996 Reverse Record Check provides a wealth of additional information of direct value to population estimation. Beyond its ability to estimate coverage error, the Reverse Record Check classification results can be extended to obtain an alternative estimate of demographic growth, potentially decomposed by component. This added feature of the Reverse Record Check shows promise for the evaluation of estimated census coverage error, as well as insight into possible problems in the estimation of selected components in the population estimates program.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015184
    Description:

    Survey statisticians frequently use superpopulation linear regression models. The Gauss-Markov theorem, assuming fixed regressors or conditioning on observed values of regressors, asserts that the standard estimators of regression coefficients are best linear unbiased.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015175
    Description:

    Mahalanobis provided an example of how to use statistics to enlighten and inform government policy makers. His pioneering work was used by the US Bureau of the Census to learn more about measurement errors in censuses and surveys. People have many misconceptions about censuses, among them who is to be counted and where. Errors in the census do occur, among them errors in coverage. Over the years, the US Bureau of the Census has developed statistical techniques, including sampling in the census, to increase accuracy and reduce response burden.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X20000015182
    Description:

    To better understand the impact of imposing a restricted region on calibration weights, the author reviews their asymptotic behaviour. Necessary and sufficient conditions are provided for the existence of a solution to the calibration equation with weights within given intervals.

    Release date: 2000-08-30
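
    The calibration equation discussed above can be made concrete with a small linear (GREG-type) example: find weights w_k = d_k(1 + x_k'lambda) that reproduce known auxiliary totals, then ask, as the paper does, whether such a solution can exist with every w_k inside a given interval. The numbers are hypothetical and the linear method is only one member of the calibration family.

        # Linear calibration of design weights to known auxiliary totals.
        import numpy as np

        d = np.array([10.0, 10.0, 10.0, 10.0])                   # design weights
        x = np.array([[1, 2.0], [1, 4.0], [1, 6.0], [1, 8.0]])   # auxiliary data
        t_x = np.array([42.0, 220.0])                            # known totals

        lam = np.linalg.solve((d[:, None] * x).T @ x, t_x - d @ x)
        w = d * (1 + x @ lam)          # calibrated weights
        print(w, w @ x)                # w @ x reproduces t_x exactly
        # Whether a solution exists with all w inside a given interval [L, U]
        # is precisely the existence question characterized in the paper above.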

  • Articles and reports: 89-552-M2000007
    Description:

    This paper addresses the problem of statistical inference with ordinal variates and examines how robust rankings of average literacy, and estimates of the impact of literacy on individual earnings, are to alternative literacy measurement and scaling choices.

    Release date: 2000-06-02

  • Articles and reports: 12-001-X19990024875
    Description:

    Dr. Fellegi considers the challenges facing government statistical agencies and strategies to prepare for these challenges. He first describes the environment of changing information needs and the social, economic and technological developments driving this change. He goes on to describe both internal and external elements of a strategy to meet these evolving needs. Internally, a flexible capacity for survey taking and information gathering must be developed. Externally, contacts must be developed to ensure continuing relevance of statistical programs while maintaining non-political objectivity.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19990024879
    Description:

    Godambe and Thompson consider the problem of confidence intervals in survey sampling. They first review the use of estimating functions to obtain model robust pivotal quantities and associated confidence intervals, and then discuss the adaptation of this approach to the survey sampling context. Details are worked out for some more specific types of models, and an empirical comparison of this approach with more conventional methods is presented.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19990024881
    Description:

    Sirken and Shimizu derive a Horvitz-Thompson estimator for population-based establishment sample surveys (PBESs). A PBES is a survey of establishments where the sampling frame consists of establishments with which a preliminary sample of households or individuals has had some contact.

    Release date: 2000-03-01
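
    For readers unfamiliar with the Horvitz-Thompson estimator named above, a minimal sketch with hypothetical numbers (not the PBES estimator itself): each sampled unit's value is weighted by the inverse of its inclusion probability to estimate a population total.

        # Horvitz-Thompson estimate of a population total.
        import numpy as np

        y = np.array([120.0, 80.0, 200.0])   # values for sampled establishments
        pi = np.array([0.2, 0.1, 0.5])       # inclusion probabilities, assumed known

        t_hat = np.sum(y / pi)               # inverse-probability weighted total
        print(t_hat)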

  • Articles and reports: 12-001-X19990024883
    Description:

    Brewer proposes a method of weight calibration in survey sampling, called cosmetic calibration, which yields cosmetic estimators of totals, i.e. estimators that can be interpreted as both design-based and prediction-based. He also discusses variance estimation and shows how the problem of negative weights can be easily and naturally handled using cosmetic calibration. Finally, he compares the properties of the weights and the resulting estimators to some alternative approaches using some Australian farm data.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19990024884
    Description:

    In the final paper of this special issue, Estevao and Särndal consider two types of design-based estimators used for domain estimation. The first, a linear prediction estimator, is built on the principle of model fitting, requires known auxiliary information at the domain level, and results in weights that depend on the domain to be estimated. The second, a uni-weight estimator, has weights which are independent of the domain being estimated and has the clear advantage that it does not require the calculation of different weight systems for each different domain of interest. These estimators are compared and situations under which one is preferred over the other are identified.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19990024876
    Description:

    Leslie Kish describes the challenges and opportunities of combining data from surveys of different populations. Examples include multinational surveys where the data from surveys of several countries are combined for comparison and analysis, as well as cumulated periodic surveys of the "same" population. He also compares and contrasts the combining of surveys with the combining of experiments.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19990024882
    Description:

    Jean-Claude Deville shows how to use simple tools to calculate the variance of a complex estimator using a linearization technique. The procedure is the one implemented in software used at INSEE to estimate the variance of complex estimators. It gives a way of computing the variance of a total estimated by the simple expansion estimator; in the case of a complex statistic, the procedure uses a derived variable that reduces the computations to those for the simple expansion estimator. Multiple examples are given to illustrate the process.

    Release date: 2000-03-01
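
    A minimal sketch of the linearization idea, for the ratio R = Y/X under simple random sampling rather than INSEE's general software: the derived variable u_k = (y_k - R_hat x_k)/X_hat reduces the variance of the complex statistic to that of an estimated total. Data and sizes are hypothetical.

        # Taylor linearization variance for a ratio under SRS.
        import numpy as np

        y = np.array([12.0, 9.0, 15.0, 7.0, 11.0])
        x = np.array([6.0, 5.0, 8.0, 4.0, 6.0])
        N, n = 100, len(y)                   # assumed population and sample sizes
        w = N / n                            # expansion weight

        x_hat = w * x.sum()
        r_hat = (w * y.sum()) / x_hat        # estimated ratio
        u = (y - r_hat * x) / x_hat          # derived (linearized) variable

        # variance of the expansion estimator of the total of u, with fpc
        var_r = N**2 * (1 - n / N) * np.var(u, ddof=1) / n
        print(r_hat, var_r)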

  • Articles and reports: 12-001-X199900210900
    Description:

    In this Issue is a column in which the Editor briefly presents each paper of the current issue of Survey Methodology. It also sometimes contains information on structural or management changes in the journal.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19990024878
    Description:

    In his paper, Fritz Scheuren considers the possible uses of administrative records to enhance and improve population censuses. After reviewing previous uses of administrative records in an international context, he puts forward several proposals for research and development towards increased use of administrative records in the American statistical system.

    Release date: 2000-03-01

Reference (25 of 35 results)

  • Technical products: 85-602-X
    Description:

    The purpose of this report is to provide an overview of existing methods and techniques making use of personal identifiers to support record linkage. Record linkage can be loosely defined as a methodology for manipulating and/or transforming personal identifiers from individual data records from one or more operational databases and subsequently attempting to match these personal identifiers to create a composite record about an individual. Record linkage is not intended to uniquely identify individuals for operational purposes; however, it does provide probabilistic matches of varying degrees of reliability for use in statistical reporting. Techniques employed in record linkage may also be of use for investigative purposes to help narrow the field of search against existing databases when some form of personal identification information exists.

    Release date: 2000-12-05
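
    A toy illustration, not Statistics Canada's methodology: personal identifiers are normalized and compared field by field, and agreement/disagreement weights (the values here are invented) are summed into a match score in the spirit of probabilistic record linkage.

        # Toy identifier comparison for record linkage.
        def normalize(s: str) -> str:
            return "".join(ch for ch in s.lower() if ch.isalnum())

        # Hypothetical agreement/disagreement log-weights per identifier.
        WEIGHTS = {"surname": (4.0, -2.0), "birth_year": (3.0, -3.0), "sex": (1.0, -4.0)}

        def match_score(rec_a: dict, rec_b: dict) -> float:
            score = 0.0
            for field, (agree, disagree) in WEIGHTS.items():
                same = normalize(str(rec_a[field])) == normalize(str(rec_b[field]))
                score += agree if same else disagree
            return score

        a = {"surname": "O'Brien", "birth_year": 1956, "sex": "F"}
        b = {"surname": "OBRIEN", "birth_year": 1956, "sex": "F"}
        print(match_score(a, b))   # a high score marks a candidate (probabilistic) link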

  • Technical products: 75F0002M2000006
    Description:

    This paper discusses methods and tools considered and used to produce cross-sectional estimates based on the combination of two longitudinal panels for the Survey of Labour and Income Dynamics (SLID).

    Release date: 2000-10-05

  • Index and guides: 12-001-S
    Description:

    This index to the more than 450 articles that have appeared in Survey Methodology since its inception is intended to simplify the task of locating articles of interest to readers. The index is in three parts. Four bibliographies are listed in the bibliographical index, and all other articles are classified in both the author index and the subject index.

    Release date: 2000-08-31

  • Surveys and statistical programs – Documentation: 62F0026M2000004
    Description:

    The Survey of Household Spending (SHS), which replaced the periodic Family Expenditure Survey (FAMEX) in 1997, is an annual survey that collects detailed expenditure information from households for a given calendar year. Because of the heavy response burden placed on respondents of this survey, it was decided for the 1997 survey to test the effect of incentives on response rates. Two incentives were used: a one-year subscription to the Statistics Canada publication Canadian Social Trends and a telephone calling card. The response rate data were analysed using Fisher's exact test and some non-parametric methods. After controlling for a discovered interviewer assignment effect, some evidence was found of a telephone card effect in the westernmost and easternmost regions of Canada, while there was no evidence of any effect for the magazine. These findings were somewhat corroborated by a separate study testing the effects of incentives on respondent relations. All these results are discussed in this paper.

    Release date: 2000-08-31
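
    Fisher's exact test, as named above, is easy to reproduce on a hypothetical 2x2 table of response outcomes for an incentive group and a control group (the counts below are invented, not the survey's data).

        # Fisher's exact test on a 2x2 response-rate table.
        from scipy.stats import fisher_exact

        #         responded  did not respond
        table = [[182, 18],   # incentive group
                 [165, 35]]   # control group

        odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
        print(odds_ratio, p_value)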

  • Technical products: 11-522-X19990015686
    Description:

    The U.S. Consumer Expenditure Survey uses two instruments, a diary and an in-person interview, to collect data on many categories of consumer expenditures. Consequently, it is important to use these data efficiently to estimate mean expenditures and related parameters. Three options are: (1) use only data from the diary source; (2) use only data from the interview source; and (3) use generalized least squares, or related methods, to combine the diary and interview data. Historically, the U.S. Bureau of Labor Statistics has focused on options (1) and (2) for estimation at the five- or six-digit Universal Classification Code level. Evaluation and possible implementation of option (3) depends on several factors, including possible measurement biases in the diary and interview data; the empirical magnitude of these biases, relative to the standard errors of customary mean estimators; and the degree of homogeneity of these biases across strata and periods. This paper reviews some issues related to options (1) through (3); describes a relatively simple generalized least squares method for implementation of option (3); and discusses the need for diagnostics to evaluate the feasibility and relative efficiency of the generalized least squares method.

    Release date: 2000-03-02
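
    In its simplest form, option (3) above reduces to inverse-variance (generalized least squares) weighting of two unbiased estimates of the same mean. The sketch below uses invented numbers and ignores the measurement biases that the paper identifies as the hard part.

        # GLS (inverse-variance) combination of diary and interview estimates.
        import numpy as np

        est = np.array([245.0, 260.0])   # diary and interview mean estimates
        var = np.array([36.0, 16.0])     # their estimated variances

        w = (1 / var) / np.sum(1 / var)  # GLS weights for independent estimates
        combined = np.sum(w * est)
        combined_var = 1 / np.sum(1 / var)
        print(combined, combined_var)    # combined variance is below either input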

  • Technical products: 11-522-X19990015664
    Description:

    Much work on probabilistic methods of linkage can be found in the statistical literature. However, although many groups undoubtedly still use deterministic procedures, not much literature is available on these strategies. Furthermore, there appears to be no documentation comparing the results of the two strategies. Such a comparison is pertinent when only non-unique identifiers, such as names, sex and race, are available as the common identifiers on which the databases are to be linked. In this work we compare a stepwise deterministic linkage strategy with the probabilistic strategy, as implemented in AUTOMATCH, for such a situation. The comparison was carried out on a linkage between medical records from the Regional Perinatal Intensive Care Centers database and education records from the Florida Department of Education. Social security numbers, available in both databases, were used to determine the true status of each record pair after matching. Match rates and error rates for the two strategies are compared, and a discussion of their similarities and differences, strengths and weaknesses is presented.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015684
    Description:

    Often, the same information is gathered almost simultaneously for several different surveys. In France, this practice is institutionalized for household surveys that have a common set of demographic variables, i.e., employment, residence and income. These variables are important co-factors for the variables of interest in each survey and, if used carefully, can reinforce the estimates derived from each survey. Techniques for calibrating uncertain data can apply naturally in this context. This involves finding the best unbiased estimator of the common variables and calibrating each survey on that estimator. The estimator thus obtained in each survey is always a linear estimator, whose weights are easy to explain and whose variance, as well as the variance estimate, poses no new problems. To supplement the list of regression estimators, this technique can also be seen as a ridge-regression estimator, or as a Bayesian-regression estimator.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015648
    Description:

    We estimate the parameters of a stochastic model for labour force careers involving distributions of correlated durations spent employed, unemployed (with and without job search) and not in the labour force. If the model is to account for sub-annual labour force patterns as well as advancement towards retirement, no single data source is adequate to inform it. However, it is possible to build up an approximation from a number of different sources.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015666
    Description:

    The fusion sample obtained by a statistical matching process can be considered a sample from an artificial population. The distribution of this artificial population is derived. If the correlation between specific variables is the only focus, the strong requirement of conditional independence can be weakened. A simulation study examines the effects of violating some of the assumptions underlying the distribution of the artificial population. Finally, some ideas are presented on establishing the claimed conditional independence by means of latent class analysis.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015660
    Description:

    There are many different situations in which one or more files need to be linked. With one file, the purpose of the linkage is to locate duplicates within the file. When there are two files, the linkage is done to identify the units that are the same on both files and thus create matched pairs. Often, records that need to be linked do not have a unique identifier. Hierarchical record linkage, probabilistic record linkage and statistical matching are three methods that can be used when there is no unique identifier on the files to be linked. We describe the major differences between the methods. We consider how to choose variables for linkage, how to prepare files for linkage and how links are identified. As well, we review tips and tricks used when linking files. Two examples, the probabilistic record linkage used in the Reverse Record Check and the hierarchical record linkage of the Business Number (BN) master file to the Statistical Universe File (SUF) of unincorporated tax filers (T1), will be illustrated.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015656
    Description:

    Time series studies have shown associations between air pollution concentrations and morbidity and mortality. These studies have largely been conducted within single cities, and with varying methods. Critics of these studies have questioned the validity of the data sets used and the statistical techniques applied to them; the critics have noted inconsistencies in findings among studies and even in independent re-analyses of data from the same city. In this paper we review some of the statistical methods used to analyze a subset of a national data base of air pollution, mortality and weather assembled during the National Morbidity and Mortality Air Pollution Study (NMMAPS).

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015682
    Description:

    The application of dual system estimation (DSE) to matched Census/Post Enumeration Survey (PES) data in order to measure net undercount is well understood (Hogan, 1993). However, this approach has so far not been used to measure net undercount in the UK. The 2001 PES in the UK will use this methodology. This paper presents the general approach to design and estimation for this PES (the 2001 Census Coverage Survey). The estimation combines DSE with standard ratio and regression estimation. A simulation study using census data from the 1991 Census of England and Wales demonstrates that the ratio model is in general more robust than the regression model.

    Release date: 2000-03-02
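
    Dual system estimation in its simplest (Lincoln-Petersen) form can be sketched as below; the 2001 Census Coverage Survey combines this with ratio and regression estimation, which the sketch does not attempt. The counts are invented.

        # Dual system (capture-recapture) estimate of population size.
        n_census = 9_500    # people counted by the census in an area
        n_pes = 1_000       # people counted by the post-enumeration survey
        n_match = 930       # people found in both sources after matching

        n_hat = n_census * n_pes / n_match             # estimated true population
        print(round(n_hat), round(n_hat - n_census))   # size and net undercount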

  • Technical products: 11-522-X19990015638
    Description:

    The focus of Symposium'99 is on techniques and methods for combining data from different sources and on analysis of the resulting data sets. In this talk we illustrate the usefulness of taking such an "integrating" approach when tackling a complex statistical problem. The problem itself is easily described - it is how to approximate, as closely as possible, a "perfect census", and in particular, how to obtain census counts that are "free" of underenumeration. Typically, underenumeration is estimated by carrying out a post enumeration survey (PES) following the census. In the UK in 1991 the PES failed to identify the full size of the underenumeration, and so demographic methods were used to estimate the extent of the undercount. The problems with the "traditional" PES approach in 1991 resulted in a joint research project between the Office for National Statistics and the Department of Social Statistics at the University of Southampton, aimed at developing a methodology that will allow a "One Number Census" in the UK in 2001. That is, underenumeration will be accounted for not just at high levels of aggregation, but right down to the lowest levels at which census tabulations are produced. In this way all census outputs will be internally consistent, adding up to the national population estimates. The basis of this methodology is the integration of information from a number of data sources in order to achieve this "One Number".

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015680
    Description:

    To augment the amount of available information, data from different sources are increasingly being combined. These databases are often combined using record linkage methods. When there is no unique identifier, a probabilistic linkage is used. In that case, a record on a first file is linked to a record on a second file with a given probability, and a decision is then taken on whether a possible link is a true link or not. This usually requires a non-negligible amount of manual resolution. It might then be legitimate to evaluate whether manual resolution can be reduced or even eliminated. This issue is addressed in this paper, where one tries to produce an estimate of a total (or a mean) of one population when using a sample selected from another population linked somehow to the first population. In other words, having two populations linked through probabilistic record linkage, we try to avoid any decision concerning the validity of links and still be able to produce an unbiased estimate for a total of one of the two populations. To achieve this goal, we suggest the use of the Generalised Weight Share Method (GWSM) described by Lavallée (1995).

    Release date: 2000-03-02
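
    A heavily simplified sketch of the weight-sharing idea behind the GWSM (see Lavallée 1995 for the actual method): each sampled unit of population A passes a share of its design weight across its links, divided by the linked B-unit's total link count, so no judgment about individual links is required. All data are hypothetical.

        # Generalised Weight Share Method, minimal illustration.
        from collections import defaultdict

        # Sampled A-units: design weight and the B-units each one links to.
        sample_a = [
            {"weight": 50.0, "links": ["b1"]},
            {"weight": 40.0, "links": ["b1", "b2"]},
            {"weight": 60.0, "links": ["b3"]},
        ]
        total_links = {"b1": 4, "b2": 2, "b3": 3}  # link counts over all of A

        share = defaultdict(float)
        for unit in sample_a:
            for b in unit["links"]:
                share[b] += unit["weight"] / total_links[b]

        y_b = {"b1": 12.0, "b2": 7.0, "b3": 9.0}      # B-side variable of interest
        print(sum(share[b] * y_b[b] for b in share))  # estimated B-population total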

  • Technical products: 11-522-X19990015654
    Description:

    A meta-analysis was performed to estimate the proportion of liver carcinogens, the proportion of chemicals carcinogenic at any site, and the corresponding proportion of anticarcinogens among chemicals tested in 397 long-term cancer bioassays conducted by the U.S. National Toxicology Program. Although the estimator used was negatively biased, the study provided persuasive evidence for a larger proportion of liver carcinogens (0.43, 90% CI: 0.35 to 0.51) than was identified by the NTP (0.28). A larger proportion of chemicals carcinogenic at any site was also estimated (0.59, 90% CI: 0.49 to 0.69) than was identified by the NTP (0.51), although this excess was not statistically significant. A larger proportion of anticarcinogens (0.66) was estimated than carcinogens (0.59). Despite the negative bias, it was estimated that 85% of the chemicals were either carcinogenic or anticarcinogenic at some site in some sex-species group. This suggests that most chemicals tested at high enough doses will cause some sort of perturbation in tumor rates.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015672
    Description:

    Data fusion as discussed here means creating a set of data on variables that were not jointly observed, from two different sources. Suppose for instance that observations are available for (X,Z) on a set of individuals and for (Y,Z) on a different set of individuals. Each of X, Y and Z may be a vector variable. The main purpose is to gain insight into the joint distribution of (X,Y) using Z as a so-called matching variable. First, however, an attempt is made to recover as much information as possible on the joint distribution of (X,Y,Z) from the distinct sets of data. Such fusions can only be done at the cost of imposing some distributional properties on the fused data, namely conditional independencies given the matching variables. Fused data are typically discussed from the point of view of how appropriate this underlying assumption is. Here we give a different perspective. We formulate the problem as follows: how can distributions be estimated when only observations from certain marginal distributions are available? It can be solved by applying the maximum entropy criterion. We show in particular that data created by fusing different sources can be interpreted as a special case of this situation. Thus, we derive the needed assumption of conditional independence as a consequence of the type of data available.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015668
    Description:

    Following the problems with estimating underenumeration in the 1991 Census of England and Wales, the aim for the 2001 Census is to create a database that is fully adjusted for net underenumeration. To achieve this, the paper investigates a weighted donor imputation methodology that utilises information from both the census and the Census Coverage Survey (CCS). The US Census Bureau has considered a similar approach for their 2000 Census (see Isaki et al. 1998). The proposed procedure distinguishes between individuals who are not counted by the census because their household is missed and those who are missed in counted households. Census data are linked to data from the CCS. Multinomial logistic regression is used to estimate the probabilities that households are missed by the census and the probabilities that individuals are missed in counted households. Household and individual coverage weights are constructed from the estimated probabilities, and these feed into the donor imputation procedure.

    Release date: 2000-03-02
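
    A minimal sketch of the weighting step described above, with a binary rather than multinomial model and synthetic data: estimate each person's probability of being missed, and take the coverage weight as the inverse of the estimated probability of being counted.

        # Coverage weights from a logistic model of census undercoverage.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 3))     # person/household covariates (synthetic)
        p_miss = 1 / (1 + np.exp(-(X @ np.array([0.8, -0.5, 0.3]) - 1.5)))
        missed = rng.binomial(1, p_miss)  # 1 = missed by the census

        model = LogisticRegression().fit(X, missed)
        p_counted = 1 - model.predict_proba(X)[:, 1]
        coverage_weight = 1 / p_counted   # feeds the donor imputation step
        print(coverage_weight[:5])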

  • Technical products: 11-522-X19990015694
    Description:

    We use data on 14 populations of coho salmon to estimate critical parameters that are vital for the management of fish populations. Parameter estimates from individual data sets are inefficient and can be highly biased, and we investigate methods to overcome these problems. Combining the data sets using nonlinear mixed effects models provides more useful results; however, questions of influence and robustness are raised. For comparison, robust estimates are obtained. Model-robustness is also explored using a family of alternative functional forms. Our results allow ready calculation of the limits of exploitation and may help to prevent extinction of fish stocks. Similar methods can be applied in other contexts where parameter estimation is part of a larger decision-making process.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015690
    Description:

    The artificial sample was generated in two steps. The first step, based on a master panel, was a Multiple Correspondence Analysis (MCA) carried out on basic variables. "Dummy" individuals were then generated randomly using the distribution of each "significant" factor in the analysis, and, for each individual, a value was generated for each basic variable most closely linked to one of those factors. This method ensured that sets of variables were drawn independently. The second step consisted of grafting on some other databases, based on certain property requirements. A variable to be added was generated on the basis of its estimated distribution, using a generalized linear model involving the common variables and those already added. The same procedure was then used to graft on the other samples. This method was applied to the generation of an artificial sample taken from two surveys. The artificial sample that was generated was validated using sample comparison tests. The results were positive, demonstrating the feasibility of this method.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015644
    Description:

    One method of enriching survey data is to supplement information collected directly from the respondent with information obtained from administrative systems. The aims of such a practice include collecting data that it might not otherwise be possible to collect, providing better quality information for data items that respondents may not be able to report accurately (or at all), reducing respondent load, and maximising the utility of information held in administrative systems. Given the direct link with administrative information, the data set resulting from such techniques is potentially a powerful basis for policy-relevant analysis and evaluation. However, the processes involved in effectively combining data from different sources raise a number of challenges that need to be addressed by the parties involved. These include issues associated with privacy, data linking, data quality, estimation and dissemination.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015652
    Description:

    Objective: To create an occupational surveillance system by collecting, linking, evaluating and disseminating data relating to occupation and mortality with the ultimate aim of reducing or preventing excess risk among workers and the general population.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015640
    Description:

    This paper describes how SN is preparing for a new era in the production of statistics, triggered by technological and methodological developments. An essential feature of the turn to the new era is the farewell to the stovepipe way of data processing. The paper discusses how new technological and methodological tools will affect processes and their organization. Special emphasis is put on one of the major opportunities and challenges the new tools offer: establishing coherence in the content of statistics and in its presentation to users.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015688
    Description:

    The geographical and temporal relationship between outdoor air pollution and asthma was examined by linking together data from multiple sources. These included the administrative records of 59 general practices widely dispersed across England and Wales, covering half a million patients and all their consultations for asthma, supplemented by a socio-economic interview survey. Postcodes enabled linkage with: (i) computed local road density; (ii) emission estimates of sulphur dioxide and nitrogen oxides; and (iii) measured/interpolated concentrations of black smoke, sulphur dioxide, nitrogen dioxide and other pollutants at the practice level. Parallel Poisson time series analysis took into account between-practice variations in examining daily correlations in practices close to air quality monitoring stations. Preliminary analyses show small and generally non-significant geographical associations between consultation rates and pollution markers. The methodological issues relevant to combining such data, and the interpretation of these results, will be discussed.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015650
    Description:

    The U.S. Manufacturing Plant Ownership Change Database (OCD) was constructed using plant-level data taken from the Census Bureau's Longitudinal Research Database (LRD). It contains data on all manufacturing plants that experienced at least one ownership change during the period 1963-92. This paper reports the status of the OCD and discusses its research possibilities. As an empirical demonstration, data taken from the database are used to study the effects of ownership changes on plant closure.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015662
    Description:

    As the availability of both health utilization and outcome information becomes increasingly important to health care researchers and policy makers, the ability to link person-specific health data becomes a critical objective. This type of linkage of population-based administrative health databases has been realized in British Columbia. The database was created by constructing an historical file of all persons registered with the health care system, and then probabilistically linking various program files to this 'coordinating' file. The first phase of development included the linkage of hospital discharge data, physician billing data, continuing care data, data about drug costs for the elderly, births data and deaths data. The second phase of development has seen the addition of data sources external to the Ministry of Health, including cancer incidence data, workers' compensation data and income assistance data.

    Release date: 2000-03-02
