The Research Data Centres (RDC) Program

Decision-makers need an up-to-date and in-depth understanding of Canadian society to help them respond not only to today's needs, but to anticipate tomorrow's as well. This need is underlined by a growing demand for analytical output from the rich sources of data collected by Statistics Canada.

The Research Data Centres (RDCs) are part of an initiative by Statistics Canada, the Social Sciences and Humanities Research Council (SSHRC) and university consortia to help strengthen Canada's social research capacity and to support the policy research community. The program would like to acknowledge the generous support of the Canada Foundation for Innovation (CFI) and the Canadian Institutes of Health Research (CIHR).

RDCs provide researchers with access, in a secure university setting, to microdata from population and household surveys. The centres are staffed by Statistics Canada employees. They are operated under the provisions of the Statistics Act in accordance with all the confidentiality rules and are accessible only to researchers with approved projects who have been sworn in under the Statistics Act as 'deemed employees.'

RDCs are located throughout the country, so researchers do not need to travel to Ottawa to access Statistics Canada microdata.

RDC origins

In 1998, the Canadian Initiative on Social Statistics studied the challenges facing the research community in Canada. One of the recommendations of the national task force report on the Advancement of Research using Social Statistics was the creation of research facilities to give academic researchers improved access to Statistics Canada's microdata files. This access would allow researchers in the social sciences to build expertise in quantitative methodology and analysis.

The benefits of RDCs

The research data centres provide opportunities to:

  • generate a wide perspective on Canada's social landscape;
  • provide social science research facilities across the country in both larger and smaller population centres;
  • expand the collaboration between Statistics Canada, SSHRC, CIHR, CFI, universities and academic researchers, and build on the Data Liberation Initiative; and
  • train a new generation of Canadian quantitative social scientists.

The difference between a confidential microdata file and a Public Use Microdata file (PUMF)

A Research Data Centre (RDC) provides access to Statistics Canada’s confidential microdata files. RDCs operate under the provisions of the Statistics Act in accordance with confidentiality rules. They are accessible only to researchers with approved projects who have been sworn in as "deemed employees" of Statistics Canada. The RDC confidential microdata files contain most of the original information collected during the survey interview with the subject, as well as derived variables added to the dataset afterwards. They also contain the bootstrap weights used to estimate the variance of survey estimates. These weights are available only in the Master file.

However, not all research projects require access to the confidential microdata files. In many cases, a Public Use Microdata File (PUMF) is sufficient to meet a researcher’s needs. A PUMF is produced by aggregating, capping, or completely deleting variables that are considered "identifiers", so that an individual survey respondent cannot be identified. For example, age and income are grouped, body height is capped and grouped, and most of the geographic variables are removed with the exception, in most cases, of the province and health region where the respondent resides. The Data Liberation Initiative (DLI) at Statistics Canada is the portal that provides Canadian universities with the PUMFs.
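The grouping, capping and removal of variables described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the variable names, the band widths, the height cap and the sample record are hypothetical and do not reflect the actual rules Statistics Canada applies when producing a PUMF.

```python
from typing import Any

# Hypothetical detailed geography removed from the public file (assumed names).
DROPPED_VARIABLES = {"postal_code", "cma"}
# Assumed top-code for body height, for illustration only.
HEIGHT_CAP_CM = 195


def group_value(value: float, width: int) -> str:
    """Return the band of the given width that contains value, e.g. 37 -> '30-39'."""
    lower = int(value // width) * width
    return f"{lower}-{lower + width - 1}"


def to_public_record(master: dict[str, Any]) -> dict[str, Any]:
    """Derive a PUMF-style record from a master-file-style record."""
    # Delete identifying variables; province and health region are kept.
    public = {k: v for k, v in master.items() if k not in DROPPED_VARIABLES}
    # Group age and income into bands.
    public["age"] = group_value(master["age"], 10)
    public["income"] = group_value(master["income"], 20_000)
    # Cap body height, then group whatever falls below the cap.
    height = master["height_cm"]
    if height >= HEIGHT_CAP_CM:
        public["height_cm"] = f"{HEIGHT_CAP_CM}+"
    else:
        public["height_cm"] = group_value(height, 5)
    return public


# Made-up record, not real survey data.
master_record = {
    "age": 37,
    "income": 52_500,
    "height_cm": 203,
    "province": "ON",
    "health_region": "3595",
    "postal_code": "K1A 0T6",
    "cma": "505",
}
public_record = to_public_record(master_record)
```

In this sketch the public record keeps only banded age and income, a top-coded height, and the province and health region, which mirrors the kind of information loss that leads some projects to need the confidential file instead.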

The RDC mode of access is appropriate when a research question can only be answered using inferential statistical analysis on the confidential microdata. The researcher must become a “deemed employee” of Statistics Canada and conduct the analysis in a secure computer lab. RDC research is appropriate when:

  • access to sensitive variables not provided in the PUMF is required for the analysis;
  • a PUMF does not exist;
  • longitudinal data are required for the analysis; or
  • the analytical work is complex in nature and not suitable for other forms of data access.

Example of a PUMF and a confidential microdata file

To protect confidentiality, the Canadian Community Health Survey (CCHS) PUMF contains limited information on each respondent: some data are suppressed, levels of geography are restricted to the province and health region, and variables are broadly coded (e.g. income and education categories).

By contrast, the CCHS microdata Master file contains disaggregated information on each respondent, including variables suppressed in the public file (e.g. the exact body height of a respondent), lower levels of geography (e.g. CMA and postal code), more detailed variable coding (e.g. additional categories of income and education) and variables on a continuous scale (e.g. age).
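The bootstrap weights mentioned earlier as available only in the Master file are used to estimate the variance of a survey estimate. The Python sketch below shows one common way replicate weights are used: the estimate is recomputed once per set of bootstrap weights, and the variance is taken as the average squared deviation of those replicate estimates from the full-sample estimate. The function names and the toy numbers are assumptions for illustration, and the exact formula appropriate for a given survey should be taken from that survey's documentation.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of values under one set of survey weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def bootstrap_variance(
    values: Sequence[float],
    final_weights: Sequence[float],
    bootstrap_weights: Sequence[Sequence[float]],
) -> float:
    """Estimate the variance of a weighted mean from B sets of replicate weights.

    Assumes the simple form (1/B) * sum((replicate - full_sample) ** 2).
    """
    full_sample = weighted_mean(values, final_weights)
    replicates = [weighted_mean(values, bw) for bw in bootstrap_weights]
    squared_deviations = [(r - full_sample) ** 2 for r in replicates]
    return sum(squared_deviations) / len(replicates)


# Toy data, not real survey values: two respondents, two replicate weight sets.
toy_values = [1.0, 3.0]
toy_final_weights = [1.0, 1.0]
toy_bootstrap_weights = [[2.0, 0.0], [0.0, 2.0]]
toy_variance = bootstrap_variance(toy_values, toy_final_weights, toy_bootstrap_weights)
```

With the toy inputs, the full-sample mean is 2.0, the replicate means are 1.0 and 3.0, and the resulting variance estimate is 1.0. Real files carry hundreds of replicate weight sets per respondent, which is one reason this kind of analysis relies on the Master file.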

