Chapter 4.4: Access to microdata

Context

Both data users and national statistical offices (NSOs) agree on the importance of maximum access to statistical information. From a data user's perspective, statistical information provided by NSOs should be easily accessible so that it can be used to the fullest extent possible. From an NSO's perspective, data should be made available to the public in order to maximize their utility and, therefore, their relevance. NSOs should provide a continuum of options or services allowing different levels of access for different needs, while ensuring appropriate protection of confidentiality.

At Statistics Canada, survey data can be accessed in three ways, as shown in Figure 4.4.1:

  • Overall statistical products intended for dissemination to the public at large are accessible through the agency's corporate data repository, which contains most publicly available current and time-series data produced. In addition, the agency's website has analytical articles, other aggregate data tables, and other metadata that are disseminated and accessible for public consumption. For more details about the dissemination of data through the website, refer to Chapter 4.1: Disseminating data through the website.
  • Custom requests, including data tabulations, are available through Statistical Information Services and from a few subject-matter and service divisions. For more details about these services, consult Chapter 4.2: External communications and outreach.
  • Microdata programs and services are the focus of this chapter.

Figure 4.4.1: Data access mechanisms at Statistics Canada

Description of Figure 4.4.1

This graphic represents the data access mechanisms at Statistics Canada.

Data access is divided into three categories.

  1. Statistical Products (which include the Statistics Canada website, The Daily releases, CANSIM tables, analytical articles and aggregate tables)
  2. Custom Requests (which include data tabulations)
  3. Microdata Programs or Services (which include the Data Liberation Initiative and Public Use Microdata Files, Real Time Remote Access, Research Data Centres, the Federal Research Data Centre and the Canadian Centre for Data Development and Economic Research)

Under those sections, there are three connections: Data links to Access, Aggregated data links to Microdata, and Open statistics links to Restricted data.

This diagram shows the three ways of accessing data at Statistics Canada. The three lines at the bottom illustrate that statistical products are usually aggregate data openly available through the website (except custom requests), while access to microdata programs and services is restricted to protect confidentiality.

Increasingly, strong evidence-based policy research requires access not only to aggregate statistics but also to anonymized data at the level of the individual business, household or person.

While providing a continuum of access to data, Statistics Canada is also ensuring that the following principles are applied:

  • The privacy and confidentiality of respondents must be protected in all agreements for microdata access.
  • Multiple-access options are made available as part of a continuum.
  • Access to microdata is provided for research purposes, for the public good.
  • Access costs will be covered by the researcher or the research community.

Figure 4.4.2 illustrates the interplay between ease of access and the level of detail or level of confidentiality of the data being accessed.

Figure 4.4.2: Access continuum and confidentiality challenge

Description of Figure 4.4.2

This graphic shows a scale for the access continuum and confidentiality risk. The scale runs from Aggregation at one end to Confidentiality Risk at the other.

Near the Aggregation end are Aggregate and Public Data (e.g., CANSIM, Census Profiles).

Near the middle, on the Aggregation side, are Custom Tabulations and Other Products and Services.

Near the middle, on the Confidentiality Risk side, is Public Use Microdata.

At the Confidentiality Risk end are Confidentiality and Microdata Master Files.

At Statistics Canada, the delivery of microdata access services is centralized within a single area, the Microdata Access Division, which serves as the focal point for external requests to access microdata on a cost-recovery basis. This centralization also provides opportunities for cost-efficiencies through harmonization, resource sharing, and minimization of duplication.

Strategies and tools

In order to meet the distinct needs of researchers, Statistics Canada has implemented a diverse program of microdata access offering different options. These options can be grouped into three types of services:

  • Access to public use microdata files (PUMFs) (access to the Public Use Microdata File Collection (PUMF Collection) and the Data Liberation Initiative (DLI))
  • Direct access to detailed microdata in a secure physical environment (research data centres (RDCs), Federal Research Data Centre (FRDC), and Canadian Centre for Data Development and Economic Research (CDER))
  • Real Time Remote Access (RTRA).

1. Access to public use microdata files (PUMFs)

Two types of services allow the public use of microdata files: access to PUMFs and access to the DLI.

1.1 Access to the Public Use Microdata File (PUMF) Collection

Public use microdata files contain anonymized microdata. Individual access to PUMFs is free to the public. Individuals wishing to access a PUMF need only contact Statistics Canada and sign a license agreement to receive the file (to be used for statistical purposes).

The PUMF Collection is a subscription-based service that offers institutional access to the collection of available public microdata files. For an annual fee, designated contacts at subscribing institutions can have unlimited access to all microdata and documentation available in the PUMF Collection.

1.2 Access to the Data Liberation Initiative (DLI)

The DLI is a partnership between postsecondary institutions and Statistics Canada for improving access to Canadian data resources by academic researchers, teachers and students. The DLI provides a wide range of data and metadata to participating postsecondary educational institutions, allowing their faculty and students unlimited access to numerous PUMFs, databases and geographic files. Academic institutions pay a service fee for DLI support.

The public use of microdata files (both the PUMF Collection and the DLI) is guided by the same legal rules, including the following:

  • The Policy on Microdata Release governs access to Statistics Canada microdata by providing a framework for authorizing the release of Statistics Canada microdata files for statistical purposes, while ensuring that the confidentiality of the information is protected. The policy details the governance structures, mechanisms and resources in place to ensure the continuous and effective management of Statistics Canada's microdata holdings.
  • A license agreement must be in place before a user can obtain any PUMF. This agreement stipulates that the agency grants the user a worldwide, royalty-free, non-exclusive license to use, reproduce, publish, freely distribute, or sell that information. In return, usage must conform to certain rules, including the requirement to reproduce the information accurately and the obligation not to merge, or otherwise use, this information concurrently with information in other database(s) for the purpose of attempting to identify an individual person, business or organization.

2. Direct access to detailed microdata in a secure physical environment

2.1 The Research Data Centres (RDC) Program

RDCs provide researchers with direct access to a wide range of population and household surveys, as well as administrative microdata files, in a secure facility managed and staffed by Statistics Canada but located in a university setting. RDCs are staffed by Statistics Canada analysts and are accessible only to researchers with approved projects who have been sworn in under the Statistics Act as "deemed employees." Being sworn in under the act makes researchers subject not only to its confidentiality requirements but also to the legal sanctions it outlines. Researchers must submit research proposals and the associated microdata access requirements. Proposals are vetted by Statistics Canada for compliance with the public-good criteria that underlie the RDC Program.

RDCs are located throughout the country so that researchers do not have to travel to Ottawa to access Statistics Canada microdata. The RDC network is a cost-recovery program funded by the universities and academic funding agencies. In some situations, access fees may be charged to researchers not affiliated with a university in the network.

2.2 The Federal Research Data Centre (FRDC) Program

The FRDC Program is similar to the RDC Program, except that this access service is streamlined for the research needs of federal government departments. The FRDC provides a secure site where federal employees can conduct complex statistical analysis. Like RDCs, the FRDC provides researchers with access to a wide range of population and household surveys, as well as administrative microdata files, in a secure setting. The FRDC Program has two locations in the National Capital Region; both are staffed by Statistics Canada employees and are accessible only to researchers with approved projects who have been sworn in under the Statistics Act as "deemed employees." The FRDC operates on a cost-recovery basis.

2.3 The Canadian Centre for Data Development and Economic Research (CDER)

The Canadian Centre for Data Development and Economic Research provides researchers with direct access to a wide range of business and economic microdata files for analytical research. The Centre is located at Statistics Canada's head office (in the National Capital Region). It operates entirely on a cost-recovery basis. The microdata files are accessible only to researchers with approved projects who have been sworn in under the Statistics Act as "deemed employees."

The RDC Program, the FRDC Program, and CDER are guided by the following policy frameworks:

  • The Directive on the Use of Deemed Employees outlines the Statistics Act requirements that allow researchers access to confidential microdata in RDCs, the FRDC, and CDER. This directive distinguishes between researchers accessing microdata for analysis and those accessing data for other purposes, such as quality control validation.
  • The Policy on Microdata Access provides a framework to help achieve efficient and effective access to Statistics Canada microdata for statistical purposes, while ensuring that the confidentiality of the information is protected.

3. The Real Time Remote Access (RTRA) system

The RTRA system is an online remote-access service that allows users to run statistical software used for tabulation (SAS) in real time against microdata files located in a central and secure location. Researchers using the RTRA system do not gain direct access to the microdata and cannot view the contents of the microdata file. Instead, users submit SAS programs to extract results in the form of frequency tables. As RTRA researchers cannot view the microdata, becoming a deemed employee of Statistics Canada is not necessary. There is a subscription fee to obtain access to the RTRA service, and there is no requirement to submit proposals.

To obtain an account on the RTRA, each user must be associated with an organization; each user must also acknowledge in writing the terms and conditions of use for the RTRA system. Access to microdata through the RTRA system allows researchers to submit a program and receive output that has been automatically vetted for confidentiality. All aggregate or statistical outputs from the RTRA are covered by the open-data license, and can be shared freely.
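The workflow described above (a user submits a tabulation program and receives a frequency table that has been automatically vetted) can be sketched as follows. This is a minimal illustration written in Python rather than SAS, and it is not the RTRA's actual vetting logic, which this chapter does not describe. The minimum-cell-count rule, the threshold `MIN_CELL_COUNT`, the function names and the synthetic records are all hypothetical.

```python
from collections import Counter

# Hypothetical threshold: the chapter does not state the RTRA's real vetting rules.
MIN_CELL_COUNT = 5


def frequency_table(records, variable):
    """Count how many records fall into each category of `variable`."""
    return Counter(record[variable] for record in records)


def vet_table(table, min_count=MIN_CELL_COUNT):
    """Replace counts below `min_count` with None (suppressed) before release."""
    return {
        category: (count if count >= min_count else None)
        for category, count in table.items()
    }


# Synthetic records standing in for a protected microdata file.
records = (
    [{"region": "East"}] * 12
    + [{"region": "West"}] * 7
    + [{"region": "North"}] * 2
)

# The user only ever sees the vetted table, never the records themselves.
released = vet_table(frequency_table(records, "region"))
print(released)  # {'East': 12, 'West': 7, 'North': None}
```

The point of the sketch is the separation of roles: the tabulation runs where the microdata live, and only the vetted aggregate leaves the secure environment.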

4. Governance of Microdata Access Services

The Microdata Management Access Committee plays an important role in the governance of access to microdata. It has a mandate to provide advice, guidance and direction on access to microdata and matters pertaining to access to information, privacy and confidentiality obligations under the Statistics Act and other federal legislation and policies, by

  • reviewing all requests for changes to policies and processes pertaining to access to microdata and providing strategic direction to the Microdata Access Division on matters related to data access, as per the Policy on Microdata Access;
  • overseeing the review and approval of submissions for advance release (including work-in-progress for data validation purposes and collaborative programs);
  • approving all research proposals that are initiated or sponsored by Statistics Canada and that use the services of researchers as deemed employees;
  • supporting evidence-based research in Canada by working to increase the data available in RDCs, the FRDC, and the CDER, and by working to expand secure access modes to data by developing new technologies.

Key success factors

The growing use of these programs and services by universities, government, institutions, and the private sector is a key success factor for microdata access. This increase in use is also continuous, as is evident not only from the number of universities and federal departments accessing the files, but also from the number of microdata files available in Canadian universities, RDCs and other institutions.

It is important to acknowledge that these access services have been built and enhanced as a result of partnerships with data providers and researchers from government, non-governmental institutions, academia, and the private sector.

More specifically, the DLI and the RDC program models are considered success stories; the successful partnerships implemented with the researcher and academic communities have become internationally recognized "brands." Other national statistical agencies and the international research community have expressed interest in adopting a model similar to the DLI, which would make their own data accessible to researchers around the world.

Although the DLI and the RDC Program are distinct access programs, they have developed synergies over the years through enhanced information-sharing, efficiencies, best practices, and close teamwork between the two programs.

The CDER, while a newer model, is another success story that has opened the door to more research using microdata files from business surveys. The CDER is also modelling many of its practices on those used in the RDCs to ensure consistency in approaches.

In parallel, the RTRA system is the most recent access program to be offered to the research community, and is seeing early success. The success of the RTRA can be attributed to the speed at which researchers can access a large volume of data files for tabulations. Since the RTRA does not require a rigorous review process or limit access to the particular files specified in a proposal, researchers can create tables from multiple microdata files very quickly. The service fills a gap between RDCs, where researchers can use many methods to analyze detailed microdata, but are restricted to a single purpose, and the PUMFs, which provide broader use of the microdata, but have more limited content.
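The differences among the access modes discussed in this chapter can be summarized in a small lookup structure. The sketch below is illustrative only: the class, field and function names are hypothetical, and the values simply restate what this chapter says about each service (where the chapter is silent, as with proposals for the DLI, the value is an assumption noted in a comment).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessMode:
    name: str
    views_records: bool               # can the user see individual records?
    deemed_employee_required: bool    # sworn in under the Statistics Act?
    approved_project_required: bool   # must a project be approved first?


MODES = [
    # PUMFs are anonymized files delivered to the user under a license agreement.
    AccessMode("PUMF", True, False, False),
    # Assumption: the chapter mentions no proposal requirement for the DLI.
    AccessMode("DLI", True, False, False),
    # RTRA users submit programs and receive tables; they never view the microdata.
    AccessMode("RTRA", False, False, False),
    AccessMode("RDC", True, True, True),
    AccessMode("FRDC", True, True, True),
    AccessMode("CDER", True, True, True),
]


def modes_without_swearing_in(modes):
    """Names of access modes that do not require deemed-employee status."""
    return [m.name for m in modes if not m.deemed_employee_required]


print(modes_without_swearing_in(MODES))  # ['PUMF', 'DLI', 'RTRA']
```

Read this way, the RTRA is the only mode that works against detailed microdata while requiring neither an approved project nor deemed-employee status, which is the gap the paragraph above describes.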

Challenges

The DLI faces mostly technical barriers, relating to users accessing, downloading or manipulating data. Statistics Canada is currently investing in the update of the technical infrastructure and the enhancement of the search platform.

With regard to RDCs and CDER, challenges relate to the availability of resources. The scope of the program has grown more quickly than its funding; main stakeholders have expressed the need to access an increasing number and variety of data types. Additional resources are required to support the expected growth: from improving the information technology infrastructure to protecting confidentiality and better meeting researchers' needs.

The challenges facing the RTRA are centred on the statistics that the system can output, which are currently limited to descriptive statistics. Research is currently underway to determine whether more analytical statistics, such as regression analysis, could be included as part of the RTRA system.

Looking ahead

In the coming years, Statistics Canada's microdata access programs will work towards increasing the number and types of available data files and facilitating access to data, by enhancing metadata, building tools, and improving their technological infrastructure.

Access programs will continue to expand their collections by making new and existing Statistics Canada survey and administrative data available through the DLI, the PUMF Collection, the RTRA, RDCs, and CDER. The RDC and CDER Programs will increase their data holdings through efforts in data development, pilot and linkage projects, and acquisitions of administrative data from external organizations.

Going forward, CDER will continue to improve its data documentation and its databases that support longitudinal analysis. In addition, CDER will continue to examine how it can implement broader access while maintaining the required control conditions.

Bibliography

Statistics Canada. Corporate Business Plan, 2014-15 to 2015-2016, Ottawa. Internal document. Accessible on demand.

Statistics Canada (1985). Statistics Act (R.S.C., 1985, c. S-19). Consulted on the 11th of March 2016 and retrieved from http://laws-lois.justice.gc.ca/eng/acts/S-19/FullText.html.

Statistics Canada (1987). Policy on Microdata Release, Ottawa. Internal document. Accessible on demand.

Statistics Canada (2012). Policy on Microdata Access, Ottawa. Internal document. Accessible on demand.

Statistics Canada (2013). Directive on the Use of Deemed Employees, Ottawa. Internal document. Accessible on demand.
