Frequently asked questions on using new and existing data for official statistics

Administrative data

What are administrative data?

Administrative data are information collected by government and private sector organizations as part of their ongoing operations, which are then obtained by Statistics Canada to help meet its mandated objectives. Examples include records of births and deaths, taxation records, records about the flow of goods and people across borders, and data collected by satellites. Statistics Canada is obligated to keep administrative data, such as vital statistics or tax data, private, secure and confidential. It's the law.

What does Statistics Canada do with administrative data?

Like most other statistical agencies, Statistics Canada uses administrative data in lieu of or to complement survey data, and to support statistical operations. Using administrative data responsibly means the agency is able to improve data quality, meet new and ongoing information needs, reduce data collection costs and save time for Canadians who respond to our surveys. Administrative data are especially helpful to obtain data pertaining to populations or topics that may be difficult or costly to obtain by survey. Statistics Canada does this in a transparent manner.

These data enable Statistics Canada to produce statistics and research to benefit Canada, such as the use of health records to provide input into local health initiatives.

How does Statistics Canada ensure confidentiality of administrative data?

At Statistics Canada, the confidentiality of data is governed by the Statistics Act, the Access to Information Act and the Privacy Act, as well as by departmental policies, directives, and supporting systems and tools on the collection, protection and use of administrative data.

Statistics Canada employees and deemed employees are also sworn to secrecy, and subject to fines and/or imprisonment, should they reveal confidential information.

Is the use of administrative data something new at Statistics Canada?

No. Statistics Canada has been turning existing data into official statistics for about 100 years. For example, we have been receiving vital statistics data from the provinces and territories since 1921 and import and export data about businesses since 1938. Today, a large number of Statistics Canada's programs are based in whole or in part on data available from administrative sources.

Why is Statistics Canada asking for more administrative data now?

Statistics Canada has always worked towards achieving greater efficiency in data collection to reduce both duplication and the response burden placed on Canadians. Using administrative data also helps to improve the quality, accuracy and timeliness of results. Moreover, this type of data is used to measure changes in the economy or society (such as the digital economy, opioid use or the cannabis industry) that cannot always be measured with survey data.

What are the benefits of using administrative data?

Using administrative data saves time and money, both yours and ours. Administrative data can complement or replace survey data, reduce response burden and costs, make statistical operations more efficient, and improve data quality and timeliness.

What legislation governs the disclosure of administrative data by organizations to Statistics Canada?

Privacy legislation such as the Personal Information Protection and Electronic Documents Act (PIPEDA), and legislation pertaining to data providers, governs the disclosure of administrative data to Statistics Canada.

Crowdsourcing

What is crowdsourcing?

Crowdsourcing involves collecting information from a large community of users, and relies on the principle that individual citizens are experts within their local environments. Examples of crowdsourcing projects include the project on crowdsourcing cannabis prices and the OpenStreetMap crowdsourcing pilot project.

How is crowdsourcing useful?

Crowdsourcing is an innovative way to collect information. It relies on the public, who are considered experts within their local environments, to provide data on a given subject on a voluntary basis. Crowdsourced data can also be benchmarked and validated against other complementary data sources to ensure that the results are of good quality. This valuable information can provide data for new and exciting projects in a timely and less costly manner.

Microdata linkage

What is microdata linkage?

Microdata linkage is an internationally recognized statistical method that maximizes the use of existing information by linking different files and variables to create new information that benefits Canadians. Statistics Canada performs microdata linkages to support the design, maintenance, evaluation, research and redesign of ongoing data collection and methodological studies within Statistics Canada, as well as to provide statistical information in aggregate or anonymous format in support of research studies.

How do we ensure confidentiality of microdata linkages?

Statistics Canada takes your privacy and the confidentiality of your data very seriously. Several steps are taken during the process of microdata linkage to ensure that your personal information is kept confidential at all times.

Microdata can include records for units of a population, such as individuals, households or businesses. We first link the different data records by using the variables they have in common. To protect your confidentiality, all personal information is then removed, so that the linked files are anonymized, or de-identified. In addition, synthetic versions of the data, rather than the original data, are often generated for researchers to access. This ensures your private information remains confidential while allowing researchers the access required to develop policies to help the Canadian public. The data are then aggregated or compiled to produce non-confidential published products. Access to the confidential linked files is restricted to Statistics Canada employees and deemed employees, who are sworn to secrecy and subject to fines and/or imprisonment should they reveal confidential information.
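The link, de-identify and aggregate sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Statistics Canada's actual procedure: the toy records, the field names (person_id, name, region, income) and the helper functions are all hypothetical. Removing direct identifiers, as done here, is only one part of disclosure control in practice.

```python
from collections import defaultdict

# Hypothetical toy files; all field names and values are illustrative assumptions.
health_records = [
    {"person_id": "A1", "name": "Pat", "region": "East", "hospital_visits": 2},
    {"person_id": "B2", "name": "Sam", "region": "West", "hospital_visits": 0},
    {"person_id": "C3", "name": "Lee", "region": "East", "hospital_visits": 1},
]
income_records = [
    {"person_id": "A1", "income": 40000},
    {"person_id": "B2", "income": 55000},
    {"person_id": "C3", "income": 60000},
]

# Fields treated as direct identifiers in this sketch.
IDENTIFIERS = {"person_id", "name"}


def link_records(left, right, key):
    """Link two files on a variable they have in common."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


def de_identify(rows, identifiers):
    """Drop direct identifiers from every linked record."""
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]


def aggregate_mean(rows, group_field, value_field):
    """Compute the mean of value_field within each group_field value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {group: sum(values) / len(values) for group, values in groups.items()}


linked = link_records(health_records, income_records, key="person_id")
anonymized = de_identify(linked, IDENTIFIERS)
mean_income_by_region = aggregate_mean(anonymized, "region", "income")
```

Only mean_income_by_region, the aggregate, corresponds to something that could be published; the linked and anonymized record-level files stand in for the restricted files described above.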

Statistics Canada does recognize that researchers require access to microdata at the individual business, household or person level for research purposes. To preserve the privacy and confidentiality of individuals and businesses, and to encourage the use of microdata, Statistics Canada offers a wide range of options through a series of online channels, facilities and programs. More details are available at Access to microdata.

Microdata linkages must adhere to Statistics Canada's Directive on Microdata Linkage, which is designed to ensure the public value of each linkage truly outweighs any intrusion on privacy that it represents.

Open data

What is open data?

According to the Canada Open Government website, open data is defined as structured data that is machine-readable, freely shared, used and built on without restrictions. For more information, and to see other open data released by the Canadian federal government, visit the Canada Open data portal. Statistics Canada is a publisher of open data, and we are now also considering open data files as inputs to our statistical programs and processes.

What is the importance of open data?

Smart cities and governments are increasingly making use of data when looking to implement problem-solving measures in order to provide efficient and effective services to constituents. Open data invites innovation, not only through governmental channels, but also through grassroots organizations, individuals, and businesses.

The benefit of open data is that any user can access and make use of it freely. Individuals, formal and informal organizations, or enterprises can use the data and other information to research and innovate on any number of topics.

Using new and existing data for official statistics

How does Statistics Canada collect data?

Statistics Canada collects survey data by paper, through crowdsourcing or online surveys, as well as by telephone or in person. We also use existing data, such as administrative data, web-scraped data, open data and microdata, to complement or replace survey data in the development of official statistics.

We collect data directly from individuals, businesses or organizations, and when possible, we use existing data. At times, data can also be combined from different sources to provide additional insight into a specific subject. Data collection is done in the most timely and cost-efficient manner, and we always ensure that data quality remains high, and that response burden is lessened where possible.

Why does Statistics Canada use existing data sources for official statistics?

Today, many of Statistics Canada's programs use existing data sources. We have been using existing data and turning them into official statistics for about 100 years.

Government agencies and private sector organizations collect or produce data as part of their ongoing operations, which can then be used by Statistics Canada as a complement to, or in lieu of, survey data. This aims to reduce time and effort in data collection, improve data quality and accuracy, and ensure that we meet new information needs in a timely fashion. Using existing data provides researchers and policy makers with insight into our society and economy, creating a statistical portrait of our country.

Why does Statistics Canada collect data?

Canadians need accurate and reliable information—the cornerstone for democratic decision making. Through the Statistics Act, Parliament has mandated Statistics Canada, as the national statistical agency, to produce such information.

Statistics Canada collects data on an ongoing basis to measure and report on the state of Canada's economy and society.

Why does Statistics Canada conduct mandatory surveys?

Statistics Canada conducts mandatory surveys because of their impact on the economy and society. The more precise, detailed and timely the data needed for the decisions that we make, or that are made for us, the greater the need for unbiased and accurate data. As a result, in some cases, we have to conduct a mandatory survey, which is approved by the Chief Statistician and is mandated by the Statistics Act.

Business and agricultural surveys, for example, collect important economic information that is used by businesses, unions, non-profit organizations and all levels of government to make informed decisions, such as monitoring the growth of inflation, gathering information to provide more affordable housing to Canadians, and learning about infrastructure needs in communities across Canada (e.g., determining the need for more schools, daycare centres or public transportation).

Participation is also mandatory for the Labour Force Survey, whose data are used to produce the unemployment rate and other indicators such as the employment rate and the participation rate, as well as to evaluate and plan employment programs in Canada.

Moreover, Statistics Canada is required by law to conduct the Census of Population and the Census of Agriculture, both of which are mandatory.

Even though some Statistics Canada surveys are voluntary in nature, your survey participation is essential to produce results that best represent you, your community, and your country. Aggregated data guides policy makers and researchers in making important decisions for you.

More questions on survey collection

Web scraping

What is web scraping?

Web scraping is a process through which information is gathered and copied from the Web for retrieval and analysis. It can be conducted manually or through the use of automated software.
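As a rough illustration of the automated case, the sketch below extracts product names and prices from a small HTML page using only Python's standard library. The page content, its class names ("name", "price") and the PriceParser class are assumptions made up for this example; they do not describe any real website or Statistics Canada's own tools. The page is an inline string so that the example runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical page content; the markup and class names are illustrative assumptions.
SAMPLE_PAGE = """
<html><body>
  <div class="product"><span class="name">Milk 2L</span><span class="price">4.99</span></div>
  <div class="product"><span class="name">Bread</span><span class="price">3.25</span></div>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self._field = None    # field currently being read, if any
        self._current = {}    # fields gathered for the product in progress
        self.products = []    # completed (name, price) pairs

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self._field is not None:
            self._field = None
            # Emit the product once both of its fields have been seen.
            if "name" in self._current and "price" in self._current:
                self.products.append(
                    (self._current["name"], float(self._current["price"]))
                )
                self._current = {}


parser = PriceParser()
parser.feed(SAMPLE_PAGE)
products = parser.products
```

In a real collection the page text would come from an HTTP request rather than a constant, and the parsing rules would have to follow each site's actual markup.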

Why is Statistics Canada web scraping?

Statistics Canada is committed to exploring alternative information sources to complement traditional collection methods.

As more goods and services are available online for purchase by Canadians, Statistics Canada is testing the use of automated means for collecting information from websites. The use of web scraping is part of a broader effort to reduce burden on individuals, businesses and organizations, while continuing to provide high-quality data in a timely and cost-effective manner.

What is the goal of Statistics Canada's web scraping initiative?

Statistics Canada is evaluating the viability of web scraping as a means of automating the collection of information from websites. Data collected will be used only for statistical and research purposes, to meet the needs of the agency's various programs.

For example, the Consumer Price Index program will use web scraped data to identify differences between online and in-store price movements and trends, and to assess the possibility of using such collection methods to supplement or replace field collection activities in the future.

What are the benefits of web scraping?

Web scraping has the potential to significantly reduce survey content and response burden, thus saving businesses and organizations valuable time and resources. It is also a cost-effective means of acquiring large volumes of information. Web scraping can also be used to complement traditional collection methods, particularly in the areas of data analysis and research, and is expected to produce better quality information.

Businesses, organizations, and Canadians at large will benefit from this new method of collection, as more timely and accurate statistics will be made available.

How are websites selected for web scraping?

Websites are selected based on several factors, including the design, function and amount of activity on the site, as well as the size and composition of the industry to which the underlying enterprise belongs.

How frequently will websites be scraped?

The frequency of data collection will be determined by the various requirements of Statistics Canada's program areas, and in line with industry best practices. In general, price collection activities for a particular website may occur up to once per day.
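One simple way to respect an "at most once per day" limit is to record when each website was last collected and skip any site collected less than a day ago. The sketch below assumes this approach; the function names, the site labels and the one-day constant are illustrative and are not taken from Statistics Canada's systems.

```python
from datetime import datetime, timedelta

# Assumed minimum gap between two collections from the same website.
MIN_INTERVAL = timedelta(days=1)


def is_due(last_collected, now, min_interval=MIN_INTERVAL):
    """Return True if a site has never been collected or its interval has elapsed."""
    if last_collected is None:
        return True
    return now - last_collected >= min_interval


def sites_due(last_runs, now):
    """List the sites whose last collection is at least MIN_INTERVAL old."""
    return [site for site, last in last_runs.items() if is_due(last, now)]


# Hypothetical collection log: site label -> time of last collection (None = never).
now = datetime(2024, 1, 2, 12, 0)
last_runs = {
    "site-a": datetime(2024, 1, 1, 12, 0),  # exactly one day ago
    "site-b": datetime(2024, 1, 2, 6, 0),   # six hours ago
    "site-c": None,                         # never collected
}
due = sites_due(last_runs, now)
```

Here site-a and site-c are due for collection, while site-b is skipped until a full day has passed since its last visit.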

Will there be any impact on the websites?

As with all methods of data collection, Statistics Canada takes steps to minimize the burden on individuals, businesses and organizations. These steps include limiting collection to only what is required, and coordinating across statistical program areas to avoid obtaining the same information twice.
