Disclosure control and data dissemination

Results

All (10 results)

  • Articles and reports: 12-001-X202100100003
    Description:

    One effective way to conduct statistical disclosure control is to use scrambled responses. Scrambled responses can be generated by using a controlled random device. In this paper, we propose using the sample empirical likelihood approach to conduct statistical inference under complex survey designs with scrambled responses. Specifically, we propose using a Wilk-type confidence interval for statistical inference. Our proposed method can be used as a general tool for inference with confidential public-use survey data files. Asymptotic properties are derived, and a limited simulation study verifies the validity of the theory. We further apply the proposed method to some real applications.

    Release date: 2021-06-24
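
    As a minimal sketch of the scrambled-response idea the abstract refers to (not the paper's empirical-likelihood procedure), the following Python snippet simulates a Warner-style randomized-response device for a binary item and unscrambles the observed proportion; the probabilities and sample size are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(12345)

      # Warner-style randomized response: with probability p the respondent
      # answers the sensitive yes/no item truthfully, otherwise the device
      # flips the answer. Only the scrambled responses are released.
      p = 0.75                  # design parameter of the randomizing device
      true_prevalence = 0.30    # unknown in practice; used here only to simulate
      n = 5000

      truth = rng.random(n) < true_prevalence
      flip = rng.random(n) >= p
      scrambled = np.where(flip, ~truth, truth)

      # Unscrambling by method of moments: E[scrambled] = p*pi + (1-p)*(1-pi).
      lam_hat = scrambled.mean()
      pi_hat = (lam_hat - (1 - p)) / (2 * p - 1)
      print(f"estimated prevalence: {pi_hat:.3f}")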

  • Articles and reports: 11-522-X201700014731
    Description:

    Our study describes various factors that are of concern when evaluating the disclosure risk of contextualized microdata and some of the empirical steps involved in their assessment. Utilizing synthetic sets of survey respondents, we illustrate how different postulates shape the assessment of risk when considering: (1) estimated probabilities that unidentified geographic areas are represented within a survey; (2) the number of people in the population who share the same personal and contextual identifiers as a respondent; and (3) the anticipated amount of coverage error in census population counts and extant files that provide identifying information (like names and addresses).

    Release date: 2016-03-24
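
    A hypothetical sketch of point (2) above: counting how many people in a reference population share a respondent's combination of quasi-identifiers. The data frames and column names (age_group, sex, region) are invented for illustration and are not the study's identifiers.

      import pandas as pd

      # Invented reference population and survey respondents; age_group, sex and
      # region stand in for the personal and contextual identifiers.
      population = pd.DataFrame({
          "age_group": ["30-34", "30-34", "30-34", "65-69", "65-69"],
          "sex":       ["F",     "F",     "M",     "F",     "M"],
          "region":    ["R1",    "R1",    "R1",    "R2",    "R2"],
      })
      respondents = pd.DataFrame({
          "age_group": ["30-34", "65-69"],
          "sex":       ["F",     "M"],
          "region":    ["R1",    "R2"],
      })

      keys = ["age_group", "sex", "region"]

      # Number of population members sharing each respondent's identifier combination.
      counts = population.groupby(keys).size().rename("pop_count").reset_index()
      risk = respondents.merge(counts, on=keys, how="left").fillna({"pop_count": 0})

      # A naive per-respondent re-identification risk: 1 / number of matches.
      risk["naive_risk"] = 1.0 / risk["pop_count"].clip(lower=1)
      print(risk)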

  • Articles and reports: 11-522-X201700014733
    Description:

    The social value of data collections is dramatically enhanced by the broad dissemination of research files and the resulting increase in scientific productivity. Currently, most studies are designed with a focus on collecting information that is analytically useful and accurate, with little forethought as to how it will be shared. Both literature and practice also presume that disclosure analysis will take place after data collection. But to produce public-use data of the highest analytical utility for the largest user group, disclosure risk must be considered at the beginning of the research process. Drawing upon economic and statistical decision-theoretic frameworks and survey methodology research, this study seeks to enhance the scientific productivity of shared research data by describing how disclosure risk can be addressed in the earliest stages of research through the formulation of "safe designs" and "disclosure simulations". An applied statistical approach has been taken in: (1) developing and validating models that predict the composition of survey data under different sampling designs; (2) selecting and/or developing measures and methods used in the assessments of disclosure risk, analytical utility, and disclosure survey costs that are best suited for evaluating sampling and database designs; and (3) conducting simulations to gather estimates of risk, utility, and cost for studies with a wide range of sampling and database design characteristics.

    Release date: 2016-03-24
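
    A toy illustration, not the study's framework, of what a disclosure simulation along the lines of point (3) might look like: for a few candidate sample sizes under simple random sampling, it estimates a crude risk proxy (the share of sampled records that are population-unique on two quasi-identifiers) and a crude utility proxy (the standard error of an estimated mean). The population, variables and measures are invented assumptions.

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(2024)

      # Invented population: two quasi-identifiers and one analysis variable.
      N = 20_000
      pop = pd.DataFrame({
          "age":    rng.integers(0, 91, size=N),
          "region": rng.integers(0, 200, size=N),
          "income": rng.lognormal(10.5, 0.6, size=N),
      })
      pop_counts = pop.groupby(["age", "region"]).size()

      def simulate_design(n, reps=50):
          """Crude risk and utility proxies for a simple random sample of size n."""
          risks, ses = [], []
          for _ in range(reps):
              sample = pop.sample(n=n)
              keys = list(zip(sample["age"], sample["region"]))
              # Risk proxy: share of sampled records that are population-unique.
              risks.append(np.mean([pop_counts[k] == 1 for k in keys]))
              # Utility proxy: standard error of the estimated mean income.
              ses.append(sample["income"].std(ddof=1) / np.sqrt(n))
          return np.mean(risks), np.mean(ses)

      for n in (500, 2_000, 8_000):
          risk, se = simulate_design(n)
          print(f"n={n:5d}  population-unique share={risk:.3f}  SE(mean income)={se:,.0f}")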

  • Articles and reports: 12-001-X201300111826
    Description:

    It is routine practice for survey organizations to provide replication weights as part of survey data files. These replication weights are meant to produce valid and efficient variance estimates for a variety of estimators in a simple and systematic manner. Most existing methods for constructing replication weights, however, are only valid for specific sampling designs and typically require a very large number of replicates. In this paper we first show how to produce replication weights based on the method outlined in Fay (1984) such that the resulting replication variance estimator is algebraically equivalent to the fully efficient linearization variance estimator for any given sampling design. We then propose a novel weight-calibration method to simultaneously achieve efficiency and sparsity, in the sense that a small number of sets of replication weights can produce valid and efficient replication variance estimators for key population parameters. Our proposed method can be used in conjunction with existing resampling techniques for large-scale complex surveys. Validity of the proposed methods and extensions to some balanced sampling designs are also discussed. Simulation results show that our proposed variance estimators perform very well in tracking coverage probabilities of confidence intervals. Our proposed strategies will likely have an impact on how public-use survey data files are produced and how these data sets are analyzed.

    Release date: 2013-06-28
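
    The following is a generic sketch of replication variance estimation (a delete-one-group jackknife), not the calibrated Fay-type construction proposed in the paper; it only illustrates how a set of replicate weights yields a variance estimate for a weighted mean. All data and group assignments are simulated.

      import numpy as np

      rng = np.random.default_rng(7)

      # Simulated survey data: values, base weights and G random variance groups.
      n, G = 1_000, 20
      y = rng.normal(50, 10, size=n)
      w = rng.uniform(1, 3, size=n)
      group = rng.integers(0, G, size=n)

      def weighted_mean(y, w):
          return np.sum(w * y) / np.sum(w)

      theta_full = weighted_mean(y, w)

      # Delete-one-group jackknife replicate weights: zero out group g, scale up the rest.
      theta_reps = np.array([
          weighted_mean(y, np.where(group == g, 0.0, w * G / (G - 1)))
          for g in range(G)
      ])

      # Replication variance estimate for the weighted mean.
      var_hat = (G - 1) / G * np.sum((theta_reps - theta_full) ** 2)
      print(f"estimate: {theta_full:.2f}, replication SE: {np.sqrt(var_hat):.3f}")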

  • Articles and reports: 11-522-X20050019433
    Description:

    Spatially explicit data present a series of opportunities and challenges for all the actors involved in providing data for long-term preservation and secondary analysis - the data producer, the data archive, and the data user.

    Release date: 2007-03-02

  • Articles and reports: 11-522-X20050019456
    Description:

    The metadata associated with the production of microdata for major Statistics Canada household and social surveys are often voluminous and daunting. There does not appear to be a systematic approach to disseminating the metadata of confidential microdata files across all surveys; this heterogeneity applies to the content as well as the method of dissemination. A pilot project was conducted within the RDC Program to evaluate one standard, the Data Documentation Initiative (DDI), that might support such a process.

    Release date: 2007-03-02

  • Articles and reports: 11-522-X20050019462
    Description:

    The traditional approach to presenting variance information to data users is to publish estimates of variance or related statistics, such as standard errors, coefficients of variation, confidence limits or simple grading systems. The paper examines potential sources of variance, such as sample design, sample allocation, sample selection and non-response, and considers what might best be done to reduce variance. Finally, the paper briefly assesses the financial costs to producers and users of reducing or not reducing variance, and how we might trade off the costs of producing more accurate statistics against the financial benefits of greater accuracy.

    Release date: 2007-03-02
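
    A small worked example of the variance-related statistics listed above (standard error, coefficient of variation and confidence limits) for a single estimate; the estimate and variance are made-up numbers.

      import math

      # Made-up example: an estimated total and its estimated variance.
      estimate = 125_000.0
      variance = 2.56e7

      std_error = math.sqrt(variance)            # standard error
      cv = std_error / estimate                  # coefficient of variation
      z = 1.96                                   # normal critical value, 95% level
      ci_lower = estimate - z * std_error
      ci_upper = estimate + z * std_error

      print(f"SE = {std_error:,.0f}")
      print(f"CV = {cv:.1%}")
      print(f"95% confidence interval = ({ci_lower:,.0f}, {ci_upper:,.0f})")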

  • Articles and reports: 12-001-X199600114381
    Description:

    Problems arising from statistical disclosure control, which aims to prevent information about individual respondents from being disclosed by users of data, have come to the fore rapidly in recent years. The main reason for this is the growing demand for detailed data provided by statistical offices, driven by the still-increasing use of computers. In the past, tables with relatively little information were published. Nowadays, users of data demand much more detailed tables and, moreover, microdata to analyze themselves. Because of this increase in information content, statistical disclosure control has become much more difficult. In this paper the authors give their view on the problems one encounters when trying to protect microdata against disclosure. This view is based on their experience with statistical disclosure control acquired at Statistics Netherlands.

    Release date: 1996-06-14

  • Articles and reports: 75F0002M1995011
    Description:

    This paper outlines the challenges of disseminating microdata from longitudinal surveys and some of the measures being proposed to deal with them. It uses the Survey of Labour and Income Dynamics (SLID) as a case study.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1995018
    Description:

    This paper presents a preview of the variables on the first microdata file of the Survey of Labour and Income Dynamics.

    Release date: 1995-12-30