Statistics by subject – Data dissemination


Data (0) (0 results)

Analysis (8) (8 of 8 results)

  • Articles and reports: 12-001-X201700114818
    Description:

    The protection of data confidentiality in tables of magnitude can become extremely difficult when working in a custom tabulation environment. A relatively simple solution consists of perturbing the underlying microdata beforehand, but the negative impact on the accuracy of aggregates can be too high. A perturbative method is proposed that aims to better balance the needs of data protection and data accuracy in such an environment. The method works by processing the data in each cell in layers, applying higher levels of perturbation for the largest values and little or no perturbation for the smallest ones. The method is primarily aimed at protecting personal data, which tend to be less skewed than business data. (A toy sketch of the layering idea appears after this list.)

    Release date: 2017-06-22

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2014-11-12

  • Articles and reports: 12-001-X201300111826
    Description:

    It is routine practice for survey organizations to provide replication weights as part of survey data files. These replication weights are meant to produce valid and efficient variance estimates for a variety of estimators in a simple and systematic manner. Most existing methods for constructing replication weights, however, are only valid for specific sampling designs and typically require a very large number of replicates. In this paper we first show how to produce replication weights based on the method outlined in Fay (1984) such that the resulting replication variance estimator is algebraically equivalent to the fully efficient linearization variance estimator for any given sampling design. We then propose a novel weight-calibration method to simultaneously achieve efficiency and sparsity in the sense that a small number of sets of replication weights can produce valid and efficient replication variance estimators for key population parameters. Our proposed method can be used in conjunction with existing resampling techniques for large-scale complex surveys. Validity of the proposed methods and extensions to some balanced sampling designs are also discussed. Simulation results showed that our proposed variance estimators perform very well in tracking coverage probabilities of confidence intervals. Our proposed strategies will likely have an impact on how public-use survey data files are produced and how these data sets are analyzed. (A generic replication-variance computation is sketched after this list.)

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201200111687
    Description:

    To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants' confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemination. We propose to create disclosure-protected subsamples from large scale surveys based on multiple imputation. The idea is to replace identifying or sensitive values in the original sample with draws from statistical models, and release subsamples of the disclosure-protected data. We present methods for making inferences with the multiple synthetic subsamples.

    Release date: 2012-06-27

  • Articles and reports: 12-001-X20040027755
    Description:

    Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use data files. For example, agencies can release partially synthetic datasets, comprising the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. This article presents an approach for generating multiply-imputed, partially synthetic datasets that simultaneously handles disclosure limitation and missing data. The basic idea is to fill in the missing data first to generate m completed datasets, then replace sensitive or identifying values in each completed dataset with r imputed values. This article also develops methods for obtaining valid inferences from such multiply-imputed datasets. New rules for combining the multiple point and variance estimates are needed because the double duty of multiple imputation introduces two sources of variability into point estimates, which existing methods for obtaining inferences from multiply-imputed datasets do not measure accurately. A reference t-distribution appropriate for inferences when m and r are moderate is derived using moment matching and Taylor series approximations.

    Release date: 2005-02-03

  • Articles and reports: 12-001-X199600114381
    Description:

    Problems arising from statistical disclosure control, which aims to prevent information about individual respondents from being disclosed by users of data, have come to the fore rapidly in recent years. The main reason for this is the growing demand for detailed data provided by statistical offices, driven by the ever-increasing use of computers. In the past, tables with relatively little information were published. Nowadays, users demand much more detailed tables and, moreover, microdata to analyze themselves. Because of this increase in information content, statistical disclosure control has become much more difficult. In this paper the authors give their view on the problems one encounters when trying to protect microdata against disclosure. This view is based on their experience with statistical disclosure control acquired at Statistics Netherlands.

    Release date: 1996-06-14

  • Articles and reports: 12-001-X199400214420
    Description:

    The statistical literature contains many methods for disclosure limitation in microdata. However, their use by statistical agencies, and understanding of their properties and effects, have been limited. For purposes of furthering research and use of these methods, and facilitating their evaluation and quality assurance, it would be desirable to formulate them within a single framework. A framework called matrix masking, based on ordinary matrix arithmetic, is presented, and explicit matrix mask formulations are given for the principal microdata disclosure limitation methods in current use. This enables improved understanding and implementation of these methods by statistical agencies and other practitioners. (A toy example of the matrix-mask form is sketched after this list.)

    Release date: 1994-12-15

  • Articles and reports: 12-001-X198800214582
    Description:

    A comprehensive bibliography of books, research reports and published papers, dealing with the theory, application and development of randomized response techniques, includes a subject classification.

    Release date: 1988-12-15
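
The layer-based perturbation described in 12-001-X201700114818 can be pictured with a short sketch. The code below is a minimal toy illustration of the layering idea only, not the method proposed in the paper: the contributions to a cell are ranked, and multiplicative noise whose amplitude grows with rank is applied, so the largest values receive the most perturbation and the smallest little or none. The function name and the linear noise schedule are assumptions made for illustration.

```python
import numpy as np

def layered_perturbation(values, max_noise=0.10, rng=None):
    """Toy layer-based perturbation: the largest contributions in a cell get
    the strongest multiplicative noise, the smallest little or none.
    Illustrative only; not the algorithm proposed in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)              # ascending order of contributions
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(values))   # rank 0 = smallest value
    levels = max_noise * ranks / max(len(values) - 1, 1)
    noise = rng.uniform(-1.0, 1.0, size=len(values)) * levels
    return values * (1.0 + noise)

cell = [120_000, 45_000, 8_000, 3_500, 900]   # contributions to one cell
print(layered_perturbation(cell, rng=np.random.default_rng(42)))
```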
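
Abstract 12-001-X201300111826 rests on the idea that a set of replicate weights lets an analyst compute variance estimates in a simple, systematic way. The sketch below shows only the generic replication-variance computation for a weighted total, assuming the replicate weights and the combining coefficient are already supplied; it does not show the Fay-based construction or the weight calibration proposed in the paper, and the toy numbers are made up.

```python
import numpy as np

def replication_variance(y, full_weights, replicate_weights, coef):
    """Generic replication variance estimate for a weighted total:
    v = coef * sum_r (theta_r - theta_full)^2, where theta_r is the total
    recomputed with the r-th set of replicate weights."""
    theta_full = np.sum(full_weights * y)
    theta_reps = replicate_weights @ y      # one weighted total per replicate
    return coef * np.sum((theta_reps - theta_full) ** 2)

y = np.array([3.0, 5.0, 2.0, 7.0])          # study variable (toy)
w = np.full(4, 10.0)                        # full-sample weights (toy)
rep_w = np.array([[12.0, 12.0,  8.0,  8.0],
                  [ 8.0,  8.0, 12.0, 12.0],
                  [12.0,  8.0, 12.0,  8.0],
                  [ 8.0, 12.0,  8.0, 12.0]])
print(replication_variance(y, w, rep_w, coef=1.0))
```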
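
Abstract 12-001-X199400214420 describes matrix masking only at a high level. A commonly cited general form of a matrix mask is M(X) = AXB + C, where A transforms records, B transforms attributes and C displaces values; additive noise is then the special case A = I, B = I. The snippet below is a toy illustration under that assumed general form, not the paper's own formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))               # toy microdata: 6 records, 3 attributes

# Matrix mask M(X) = A X B + C (assumed general form).
A = np.eye(6)                             # record transform: identity
B = np.eye(3)                             # attribute transform: identity
C = rng.normal(scale=0.2, size=X.shape)   # displacement: additive noise
masked = A @ X @ B + C
print(np.round(masked - X, 2))            # the noise actually added to each value
```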

Reference (32) (25 of 32 results)

  • Technical products: 11-522-X201700014735
    Description:

    Microdata dissemination normally requires that data reduction and modification methods be applied, and the degree to which these methods are applied depends on the control methods that will be required to access and use the data. An approach that is in some circumstances more suitable for accessing data for statistical purposes is secure computation, which involves computing analytic functions on encrypted data without the need to decrypt the underlying source data to run a statistical analysis. This approach also allows multiple sites to contribute data while providing strong privacy guarantees. In this way the data can be pooled and contributors can compute analytic functions without any party learning the other parties' inputs. We explain how secure computation can be applied in practical contexts, with some theoretical results and real healthcare examples. (A toy additive secret-sharing example is sketched after this list.)

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014719
    Description:

    Open Data initiatives are transforming how governments and other public institutions interact and provide services to their constituents. They increase transparency and value to citizens, reduce inefficiencies and barriers to information, enable data-driven applications that improve public service delivery, and provide public data that can stimulate innovative business opportunities. As one of the first international organizations to adopt an open data policy, the World Bank has been providing guidance and technical expertise to developing countries that are considering or designing their own initiatives. This presentation will give an overview of developments in open data at the international level along with current and future experiences, challenges, and opportunities. Mr. Herzog will discuss the rationales under which governments are embracing open data, demonstrated benefits to both the public and private sectors, the range of different approaches that governments are taking, and the availability of tools for policymakers, with special emphasis on the roles and perspectives of National Statistics Offices within a government-wide initiative.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014721
    Description:

    Open data is becoming an increasingly important expectation of Canadians, researchers, and developers. Learn how and why the Government of Canada has centralized the distribution of all Government of Canada open data through Open.Canada.ca and how this initiative will continue to support the consumption of statistical information.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014720
    Description:

    This paper is intended to give a brief overview of Statistics Canada’s involvement with open data. It will first discuss how the principles of open data are being adopted in the agency’s ongoing dissemination practices. It will then discuss the agency’s involvement with the whole-of-government open data initiative. This involvement is twofold: Statistics Canada is the major data contributor to the Government of Canada Open Data portal, but it also plays an important behind-the-scenes role as the service provider responsible for developing and maintaining the Open Data portal (which is now part of the wider Open Government portal).

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014732
    Description:

    The Institute for Employment Research (IAB) is the research unit of the German Federal Employment Agency. Via the Research Data Centre (FDZ) at the IAB, administrative and survey data on individuals and establishments are provided to researchers. In cooperation with the Institute for the Study of Labor (IZA), the FDZ has implemented the Job Submission Application (JoSuA) environment, which enables researchers to submit jobs for remote data execution through a custom-built web interface. Moreover, two types of user-generated output files are distinguished within the JoSuA environment, which allows for faster and more efficient disclosure review services.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014733
    Description:

    The social value of data collections is dramatically enhanced by the broad dissemination of research files and the resulting increase in scientific productivity. Currently, most studies are designed with a focus on collecting information that is analytically useful and accurate, with little forethought as to how it will be shared. Both literature and practice also presume that disclosure analysis will take place after data collection. But to produce public-use data of the highest analytical utility for the largest user group, disclosure risk must be considered at the beginning of the research process. Drawing upon economic and statistical decision-theoretic frameworks and survey methodology research, this study seeks to enhance the scientific productivity of shared research data by describing how disclosure risk can be addressed in the earliest stages of research with the formulation of "safe designs" and "disclosure simulations", where an applied statistical approach has been taken in: (1) developing and validating models that predict the composition of survey data under different sampling designs; (2) selecting and/or developing measures and methods used in the assessments of disclosure risk, analytical utility, and disclosure survey costs that are best suited for evaluating sampling and database designs; and (3) conducting simulations to gather estimates of risk, utility, and cost for studies with a wide range of sampling and database design characteristics.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014731
    Description:

    Our study describes various factors that are of concern when evaluating disclosure risk of contextualized microdata and some of the empirical steps that are involved in their assessment. Utilizing synthetic sets of survey respondents, we illustrate how different postulates shape the assessment of risk when considering: (1) estimated probabilities that unidentified geographic areas are represented within a survey; (2) the number of people in the population who share the same personal and contextual identifiers as a respondent; and (3) the anticipated amount of coverage error in census population counts and extant files that provide identifying information (like names and addresses). (A toy count of records sharing the same identifiers is sketched after this list.)

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014734
    Description:

    Data protection and privacy are key challenges that need to be tackled with high priority in order to enable the use of Big Data in the production of Official Statistics. This was emphasized in 2013 by the Directors of National Statistical Institutes (NSIs) of the European Statistical System Committee (ESSC) in the Scheveningen Memorandum. The ESSC requested Eurostat and the NSIs to elaborate an action plan with a roadmap for following up the implementation of the Memorandum. At the Riga meeting on September 26, 2014, the ESSC endorsed the Big Data Action Plan and Roadmap 1.0 (BDAR) presented by the Eurostat Task Force on Big Data (TFBD) and agreed to integrate it into the ESS Vision 2020 portfolio. Eurostat also collaborates in this field with external partners such as the United Nations Economic Commission for Europe (UNECE). The big data project of the UNECE High-Level Group is an international project on the role of big data in the modernization of statistical production. It comprised four ‘task teams’ addressing different aspects of Big Data issues relevant for official statistics: Privacy, Partnerships, Sandbox, and Quality. The Privacy Task Team finished its work in 2014 and gave an overview of the existing tools for risk management regarding privacy issues, described how the risk of identification relates to Big Data characteristics and drafted recommendations for National Statistical Offices (NSOs). It mainly concluded that extensions to existing frameworks, including the use of new technologies, were needed in order to deal with privacy risks related to the use of Big Data. The BDAR builds on the work achieved by the UNECE task teams. Specifically, it recognizes that a number of big data sources contain sensitive information, that their use for official statistics may induce negative perceptions with the general public and other stakeholders, and that this risk should be mitigated in the short to medium term. It proposes to launch several actions, such as a review of the ethical principles governing the roles and activities of the NSIs and a strong communication strategy. The paper presents the different actions undertaken within the ESS and in collaboration with UNECE, as well as potential technical and legal solutions to be put in place in order to address the data protection and privacy risks in the use of Big Data for Official Statistics.

    Release date: 2016-03-24

  • Technical products: 11-522-X201300014285
    Description:

    The 2011 National Household Survey (NHS) is a voluntary survey that replaced the traditional mandatory long-form questionnaire of the Canadian census of population. The NHS sampled about 30% of Canadian households and achieved a design-weighted response rate of 77%. In comparison, the last census long form was sent to 20% of households and achieved a response rate of 94%. Based on the long-form data, Statistics Canada traditionally produces two public use microdata files (PUMFs): the individual PUMF and the hierarchical PUMF. Both give information on individuals, but the hierarchical PUMF provides extra information on the household and family relationships between the individuals. To produce two PUMFs, based on the NHS data, that cover the whole country evenly and that do not overlap, we applied a special sub-sampling strategy. Difficulties in the confidentiality analyses have increased because of the numerous new variables, the more detailed geographic information and the voluntary nature of the NHS. This paper describes the 2011 PUMF methodology and how it balances the requirements for more information and for low risk of disclosure.

    Release date: 2014-10-31

  • Technical products: 11-522-X200600110433
    Description:

    The creation of public-use microdata files involves a number of components. One of its key elements is RTI International's innovative MASSC methodology. However, there are other major components in this process, such as the treatment of non-core identifying variables and of extreme outcomes for extra protection. Statistical disclosure limitation is designed to counter both inside and outside intrusion, and the components of the process are designed accordingly.

    Release date: 2008-03-17

  • Technical products: 11-522-X20050019434
    Description:

    Traditional methods for statistical disclosure limitation in tabular data are cell suppression, data rounding and data perturbation. Because the suppression mechanism is not describable in probabilistic terms, suppressed tables are not amenable to statistical methods such as imputation. Data quality characteristics of suppressed tables are consequently poor. (A toy example of unbiased random rounding is sketched after this list.)

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019483
    Description:

    All member countries in Europe face similar problems with respect to statistical disclosure control (SDC). They all need to find a balance between preservation of privacy for the respondents and the very legitimate requests of society, researchers and policy makers to provide more and more detailed information. This growing demand, due to developments of the information age and the knowledge society, is a common problem of the European Statistical System (ESS). The paper discusses current Eurostat confidentiality issues and strategy and describes a European SDC approach based on the establishment of Centres and Networks of Excellence (CENEX).

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019451
    Description:

    The international comparative Generations and Gender Program, which is coordinated by the Population Activities Unit of the United Nations' Economic Commission for Europe, combines a panel survey carried out in various European countries, Japan, and Australia with a comparative contextual database developed as an integral part of the program.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019487
    Description:

    The goal of this presentation is to describe the different quality measures used to evaluate and manage the collection process related to the Telephone First Contact methodology in the LFS.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019488
    Description:

    This paper sets out the importance of quality measures that can be used to monitor current and future information needs in the ESS. Particular emphasis is put on the need to generalise ESS initiatives for the development and implementation of operational quality measures that enhance the quality of statistical processes.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019463
    Description:

    Statisticians are developing additional concepts for communicating errors associated with estimates. Many of these concepts are readily understood by statisticians but are even more difficult to explain to users than the traditional confidence interval. The proposed solution, when communicating with non-statisticians, is to improve the estimates so that the requirement for explaining the error is minimised. The user is then not confused by having too many numbers to understand.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019462
    Description:

    The traditional approach to presenting variance information to data users is to publish estimates of variance or related statistics, such as standard errors, coefficients of variation, confidence limits or simple grading systems. The paper examines potential sources of variance, such as sample design, sample allocation, sample selection and non-response, and considers what might best be done to reduce variance. Finally, the paper briefly assesses the financial costs to producers and users of reducing or not reducing variance and how we might trade off the costs of producing more accurate statistics against the financial benefits of greater accuracy.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019436
    Description:

    Regardless of the specifics of any given metadata scheme, there are common metadata constructs used to describe statistical data. This paper will give an overview of the different approaches taken to achieve the common goal of providing consistent information.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019433
    Description:

    Spatially explicit data pose a series of opportunities and challenges for all the actors involved in providing data for long-term preservation and secondary analysis - the data producer, the data archive, and the data user.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019461
    Description:

    We propose a generalization of the usual coefficient of variation (CV) to address some of the known problems with its use in measuring the quality of estimates. These problems include the interpretation of the CV when the estimate is near zero and the inconsistency in its interpretation of precision when it is computed for different one-to-one monotonic transformations. (A small numerical illustration of both problems follows this list.)

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019494
    Description:

    Traditionally, the data quality indicators reported by surveys have been the sampling variance, coverage error, non-response rate and imputation rate. To obtain an imputation rate when combining survey data and administrative data, one of the problems is to compute the imputation rate itself. The presentation will discuss how to solve this problem. First, we will discuss the desired properties when developing a rate in a general context. Second, we will develop some concepts and definitions that will help us to develop combined rates. Third, we will propose different combined rates for the case of imputation. We will then present three combined rates and discuss the properties of each. We will end with some illustrative examples. (A toy weighted imputation rate is sketched after this list.)

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019489
    Description:

    This presentation (1) explores the meanings of quality within a national statistical organization, (2) examines the users and uses of the quality measurements, (3) identifies particular issues in the measurement of quality, and (4) argues the need for a balanced set of measures. Among the issues discussed are the roles of customer satisfaction measures, traditional survey quality measures, financial measures, and the reliability of the quality measures themselves. The discussion draws upon the statistical and the quality management literature, and includes examples drawn from a variety of national statistical organizations.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019456
    Description:

    The metadata associated with microdata production of major Statistics Canada household and social surveys are often voluminous and daunting. There does not appear to be a systematic approach to disseminating the metadata of confidential microdata files across all surveys. This heterogeneity applies to content as well as method of dissemination. A pilot project was conducted within the RDC Program to evaluate one standard, the Data Documentation Initiative (DDI), that might support such a process.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019484
    Description:

    The presentation reviewed the methodological problems related to the anonymisation of a European database, problems compounded by the multiplicity of perceptions and realities of disclosure risk encountered in the different countries. Best practices are benchmarked against the practical issues. The presentation detailed, first, the Eurostat policy and practical arrangements for the release of the EU-SILC microdata base and, second, the methodological options taken to anonymise the database. The close inter-relation between these two aspects is highlighted in the presentation. The solution strikes a trade-off between the reduction of disclosure risk in each national component and the utility of the released microdata, by preserving information content and harmonising the procedures used. Future perspectives with respect to European microdata release were also discussed.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019460
    Description:

    Users will analyse and interpret the time series of estimates in various ways, often involving estimates for several time periods. Despite the large sample sizes and the degree of overlap between the samples for some periods, sampling errors can still substantially affect the estimates of movements and the functions of them used to interpret the series of estimates. We consider how to account for sampling errors in the interpretation of the estimates from repeated surveys and how to inform users and analysts of their possible impact. (A worked example of the variance of a movement estimate under sample overlap follows this list.)

    Release date: 2007-03-02
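
The secure-computation abstract (11-522-X201700014735) says that analytic functions can be computed without any party revealing its inputs. One textbook way to obtain that behaviour for sums is additive secret sharing; the toy below is a minimal sketch of that general idea, not the protocols or healthcare applications discussed in the presentation.

```python
import secrets

P = 2**61 - 1   # large prime modulus for additive secret sharing (toy choice)

def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two data holders secret-share their totals; the parties add shares locally,
# so only the combined sum is ever reconstructed.
a_shares = share(1200)
b_shares = share(3400)
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))   # 4600, without either input being revealed
```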
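
Abstract 11-522-X201700014731 reasons about how many people share the same personal and contextual identifiers as a respondent. The snippet below shows only the mechanical part of such a check on a toy file: counting, for each record, how many records share its combination of quasi-identifiers. The variable names and the toy data are assumptions made for illustration.

```python
from collections import Counter

# Toy records: (age group, sex, region) act as quasi-identifiers.
records = [
    ("30-34", "F", "East"),
    ("30-34", "F", "East"),
    ("30-34", "M", "East"),
    ("65-69", "F", "North"),
]

key_counts = Counter(records)
for rec in records:
    k = key_counts[rec]
    note = "sample unique (higher disclosure risk)" if k == 1 else f"shared by {k} records"
    print(rec, "->", note)
```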
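
Abstract 11-522-X20050019434 lists cell suppression, data rounding and data perturbation as the traditional disclosure-limitation methods for tables. As a concrete, generic member of the rounding family, the sketch below implements unbiased random rounding to base 5; it is a standard textbook technique shown for illustration, not the specific approach evaluated in the paper.

```python
import numpy as np

def random_round(counts, base=5, rng=None):
    """Unbiased random rounding of table counts to a base: a cell with
    remainder r is rounded up with probability r/base and down otherwise,
    so its expected value is unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts)
    r = counts % base
    round_up = rng.random(counts.shape) < r / base
    return counts - r + base * round_up

print(random_round(np.array([3, 17, 42, 100]), rng=np.random.default_rng(1)))
```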
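
Abstract 11-522-X20050019461 points to two well-known problems with the coefficient of variation: it becomes hard to interpret when the estimate is near zero, and it is not invariant under one-to-one transformations such as reporting a proportion versus its complement. The few lines below simply work through those two facts with made-up numbers.

```python
# CV = standard error / estimate, usually reported as a percentage.
se = 0.02

# 1) Near-zero estimates: the same standard error gives wildly different CVs.
for estimate in (0.50, 0.05, 0.005):
    print(f"estimate={estimate:.3f}  CV={100 * se / estimate:.1f}%")

# 2) Lack of invariance: a proportion p and its complement 1 - p carry the
#    same information and have the same standard error, but different CVs.
p = 0.95
print(f"CV(p)   = {100 * se / p:.1f}%")
print(f"CV(1-p) = {100 * se / (1 - p):.1f}%")
```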
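
Abstract 11-522-X20050019494 asks how an imputation rate should be defined when survey and administrative data are combined. As a baseline for what a rate can look like, the sketch below computes one simple weighted imputation rate on a toy file; it is not one of the combined rates the presentation proposes, and the names and data are assumptions made for illustration.

```python
import numpy as np

def weighted_imputation_rate(weights, imputed_flags):
    """Share of the weighted total that comes from imputed records:
    sum of weights of imputed records / sum of all weights."""
    weights = np.asarray(weights, dtype=float)
    imputed = np.asarray(imputed_flags, dtype=bool)
    return weights[imputed].sum() / weights.sum()

w = [150.0, 90.0, 210.0, 60.0]      # survey weights (toy)
flags = [False, True, False, True]  # True = the reported value was imputed
print(f"{100 * weighted_imputation_rate(w, flags):.1f}%")
```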
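
Abstract 11-522-X20050019460 notes that sampling errors still affect estimates of movement even when consecutive samples overlap heavily. The standard variance identity below makes the point: for a movement estimate theta2 - theta1, Var = V1 + V2 - 2*rho*sqrt(V1*V2), so the positive correlation induced by sample overlap reduces, but does not remove, the movement's sampling error. The variances and correlations used are made up.

```python
import math

# Var(theta2_hat - theta1_hat) = V1 + V2 - 2 * rho * sqrt(V1 * V2),
# where rho is the correlation between the two level estimates that the
# overlap between samples induces.
V1, V2 = 4.0, 4.0                    # variances of the level estimates (toy)
for rho in (0.0, 0.5, 0.8):
    var_movement = V1 + V2 - 2 * rho * math.sqrt(V1 * V2)
    print(f"rho={rho:.1f}  Var(movement)={var_movement:.1f}  "
          f"SE={math.sqrt(var_movement):.2f}")
```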
