Disclosure control and data dissemination

Results

All (8)

  • Articles and reports: 12-001-X202300100006
    Description: My comments consist of three components: (1) a brief account of my professional association with Chris Skinner; (2) observations on Skinner’s contributions to statistical disclosure control; and (3) some comments on making inferences from masked survey data.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X201200111687
    Description:

    To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants' confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemination. We propose to create disclosure-protected subsamples from large scale surveys based on multiple imputation. The idea is to replace identifying or sensitive values in the original sample with draws from statistical models, and release subsamples of the disclosure-protected data. We present methods for making inferences with the multiple synthetic subsamples.

    Release date: 2012-06-27

  • Articles and reports: 11-522-X20050019483
    Description:

    All member countries in Europe face similar problems with respect to statistical disclosure control (SDC). They all need to find a balance between preserving the privacy of respondents and the very legitimate requests of society, researchers and policy makers for more and more detailed information. This growing demand, due to developments of the information age and knowledge society, is a common problem of the European Statistical System (ESS). The paper discusses current Eurostat confidentiality issues and strategy, and describes a European SDC approach through the establishment of Centres and Networks of Excellence (CENEX).

    Release date: 2007-03-02

  • Articles and reports: 12-001-X20040027755
    Description:

    Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use data files. For example, agencies can release partially synthetic datasets, comprising the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. This article presents an approach for generating multiply-imputed, partially synthetic datasets that simultaneously handles disclosure limitation and missing data. The basic idea is to fill in the missing data first to generate m completed datasets, then replace sensitive or identifying values in each completed dataset with r imputed values. This article also develops methods for obtaining valid inferences from such multiply-imputed datasets. New rules for combining the multiple point and variance estimates are needed because the double duty of multiple imputation introduces two sources of variability into point estimates, which existing methods for obtaining inferences from multiply-imputed datasets do not measure accurately. A reference t-distribution appropriate for inferences when m and r are moderate is derived using moment matching and Taylor series approximations.

    Release date: 2005-02-03

  • Articles and reports: 11-522-X20030017692
    Description:

    This paper discusses regression servers, which are data dissemination systems that return some of the output generated by regression analyses in response to user queries. It details work on the special case where the data contain a sensitive variable whose regressions must be protected.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20010016282
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The Discharge Abstract Database (DAD) is one of the key data holdings held by the Canadian Institute for Health Information (CIHI). The institute is a national, not-for-profit organization, which plays a critical role in the development of Canada's health information system. The DAD contains acute care discharge data from most Canadian hospitals. The data generated are essential for determining, for example, the number and types of procedures and the length of hospital stays. CIHI is conducting the first national data quality study of selected clinical and administrative data from the DAD. This study is evaluating and measuring the accuracy of the DAD by returning to the original data sources and comparing this information with what exists in the CIHI database, in order to identify any discrepancies and their associated reasons. This paper describes the DAD data quality study and some preliminary findings. The findings are also briefly compared with another similar study. In conclusion, the paper discusses subsequent steps for the study and how the findings from the first year are contributing to improvements in the quality of the DAD.

    Release date: 2002-09-12

  • Articles and reports: 11-522-X20010016286
    Description:

    It is customary for statistical agencies to audit tables containing suppressed cells in order to ensure that there is sufficient protection against inadvertent disclosure of sensitive information. If the table contains rounded values, this fact may be ignored by the audit procedure. This oversight can result in over-protection, reducing the utility of the published data. This paper provides the correct auditing formulation and gives examples of over-protection.

    Release date: 2002-09-12

  • Articles and reports: 11-522-X20010016287
    Description:

    In this paper we discuss a specific component of a research agenda aimed at disclosure protections for "non-traditional" statistical outputs. We argue that these outputs present different disclosure risks than normally faced and hence may require new thinking on the issue. Specifically, we argue that kernel density estimators, while powerful (high quality) descriptions of cross-sections, pose potential disclosure risks that depend materially on the selection of bandwidth. We illustrate these risks using a unique, non-confidential data set on the statistical universe of coal mines and present potential solutions. Finally, we discuss current practices at the U.S. Census Bureau's Center for Economic Studies for performing disclosure analysis on kernel density estimators.

    Release date: 2002-09-12
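
The last abstract above notes that the disclosure risk of a kernel density estimator depends on the choice of bandwidth. The following is a minimal sketch of that idea only, not the authors' method or data: the ten values are made up, the function names `gaussian_kde` and `local_modes` are hypothetical, and the two bandwidths are arbitrary illustrative choices. With a narrow bandwidth the estimate has a separate bump at the isolated value, so a published curve could reveal it; with a very wide bandwidth that bump is smoothed away, at the cost of detail.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


def local_modes(density, lo, hi, steps):
    """Grid points at which the estimated density has a strict local maximum."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [density(x) for x in grid]
    return [
        grid[i]
        for i in range(1, steps)
        if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]
    ]


# Made-up values for ten hypothetical establishments; 97.0 is an isolated outlier.
data = [10.2, 11.5, 12.1, 12.8, 13.0, 13.4, 14.1, 14.9, 15.3, 97.0]

# Narrow bandwidth: each isolated observation produces its own visible peak.
narrow = local_modes(gaussian_kde(data, 0.5), 0.0, 120.0, 1200)

# Very wide bandwidth: the outlier's peak is absorbed into a single broad mode.
wide = local_modes(gaussian_kde(data, 40.0), 0.0, 120.0, 1200)

print("modes with bandwidth 0.5:", [round(m, 1) for m in narrow])
print("modes with bandwidth 40:", [round(m, 1) for m in wide])
```

A disclosure review along these lines might compare the peaks of a proposed density plot against the underlying records before release; the grid search here is a deliberately simple stand-in for such a check.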