Request for information - Population and demography

Under the authority of the Statistics Act, Statistics Canada is hereby requesting the following information, which will be used solely for statistical and research purposes and will be protected in accordance with the provisions of the Statistics Act and any other applicable law. This is a mandatory request for data.

Census counts

Alberta Driver's Licence information

What information is being requested?

Statistics Canada is requesting selected variables from the complete listing of driver's licences in Alberta for the purposes of improving Census addresses in its statistical registers, and for other statistical linkage functions.

This request will be limited to variables from the Alberta Driver's Licence file that are relevant for statistical purposes. These variables include:

  1. Driver's licence number (Operator number)
  2. Full name (First, Middle, Last names)
  3. Sex (M/F)
  4. Date of birth
  5. Residential, civic, or service address
  6. Mailing address
  7. Phone number (Home phone and/or Business phone)
  8. Licence issue/renewal date
  9. Licence expiry date
  10. Close-out indicator (cancellation due to registration in another province)

What personal information is included in this request?

This request contains personal information such as: Name, Address, Demographics, and Contact information. Personal identifiers (Name, Driver's licence number) are required to perform data linkages for statistical purposes only. Once the data are linked, the personal identifiers will be deleted, and only an anonymous ID number will be retained.
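The replace-identifiers-after-linkage step described above can be pictured with a small sketch. It assumes an exact-match join on the licence number and a random key from Python's `secrets` module; the source does not describe the actual linkage method, and all field names and sample values here are hypothetical.

```python
import secrets


def link_and_anonymize(register_rows, licence_rows):
    """Join two record sets on licence number, then drop personal identifiers.

    Assumption: exact matching on a hypothetical "licence_number" field.
    Only a random anonymous ID and the non-identifying fields are kept.
    """
    by_licence = {row["licence_number"]: row for row in licence_rows}
    linked = []
    for row in register_rows:
        match = by_licence.get(row["licence_number"])
        if match is None:
            continue  # no linkage possible for this record
        linked.append({
            "anon_id": secrets.token_hex(8),      # random key, not derived from identifiers
            "register_id": row["register_id"],    # hypothetical internal reference
            "issue_date": match["issue_date"],
        })
    return linked


# Made-up sample records, for illustration only.
register = [{"register_id": 1, "licence_number": "A100"},
            {"register_id": 2, "licence_number": "B200"}]
licences = [{"licence_number": "A100", "name": "Sample Person", "issue_date": "2020-05-01"}]

linked_records = link_and_anonymize(register, licences)
```

The linked output carries no name or licence number, only the anonymous key and the fields kept for statistical use.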

What years of data will be requested?

The requested data would begin in May 2016 or the earliest available, and monthly thereafter.

From whom will the information be requested?

This information is being requested from the Ministry of Service Alberta.

Why is this information being requested?

Statistics Canada is requesting this information to update the address databases with addresses from Alberta. Obtaining an accurate and comprehensive list of all dwellings is necessary for achieving full Census coverage, especially in light of the current COVID-19 pandemic. A key COVID-19 adaptation of the 2021 Census is to limit the number of door-to-door census visits, which makes addresses more important for mailing census invitation letters. Historically, the address databases have not adequately covered rural areas in the Western provinces. The updated address databases, with information supplied by administrative data from Alberta Driver's Licence files, will ensure these areas are adequately covered by the 2021 Census. To support the 2021 Census, administrative data such as the Driver's Licence files could be used, after collection, if enumeration is not possible (or is severely impacted) in a given area due to the COVID-19 pandemic or a natural disaster, and only where the quality of the available administrative data is deemed acceptable.

Statistics Canada may also use the information for other statistical and research purposes.

Why were these organizations selected as data providers?

The Ministry of Service Alberta manages the Alberta Driver's Licence registration system. This Provincial Government Department collects and maintains up-to-date information on Driver's Licence holders in Alberta.

When will this information be requested?

This request for information to the Province of Alberta will be sent by May 01, 2021. The request will be for monthly data on an ongoing basis.

What Statistics Canada programs will primarily use these data?

  • Census of Population
  • Statistical Building Register (SBgR), a product of the Data Integration Infrastructure Division (DIID)

When was this request published?

April 14, 2021

Québec Driver’s Licence information

What information is being requested?

Statistics Canada is requesting essential variables from the complete listing of driver’s licences in Québec for the purposes of improving Census coverage in its statistical registers, and for other statistical linkage functions.

This request will be limited to variables that are relevant for statistical purposes. These variables include:

  • Driver’s licence number
  • Full name (First, Middle, Last names)
  • Date of birth
  • Residential, civic, or service address
  • Phone number
  • Licence issue date
  • Licence expiry date
  • Date last updated
  • Close-out / cancellation indicator
  • Sex and/or Gender

What personal information is included in this request?

This request contains personal information such as name, address, demographics, and contact information.  Personal identifiers (name, driver’s licence number) are required to perform data linkages, for statistical purposes only. Once the data are linked, the personal identifiers will be replaced by an anonymized key.

What years of data will be requested?

The requested data would begin in 2025 (September) or the earliest available, and monthly thereafter.

From whom will the information be requested?

This information is being requested from the Société de l'assurance automobile du Québec (SAAQ).

Why is this information being requested?

Statistics Canada is requesting this information to update the address databases with addresses from Québec. The updated information will help improve the coverage of addresses and geographic areas, including rural population groups, within Québec. Obtaining an accurate and comprehensive list of all dwellings is necessary for achieving full Census coverage. Driver’s licence data from other provinces and territories were found to be highly accurate in placing individuals at the correct address, as well as in enumerating individuals not found in other sources.

The data will also be used to update the Agency’s Census contingency plans and in future Census research.

Statistics Canada may also use the information for other statistical and research purposes.

Why were these organizations selected as data providers?

The Société de l'assurance automobile du Québec is the only organization capable of providing the requested attributes for the province of Québec.

When will this information be requested?

September 2025

What Statistics Canada programs will primarily use these data?

When was this request published?

April 9, 2025

Veterans Affairs Canada administrative data file (additional variables)

What information is being requested?

Statistics Canada is requesting to receive four additional variables as part of the annual Veterans Affairs Canada administrative data file that is routinely provided by Veterans Affairs Canada to Statistics Canada under the existing data acquisition agreement between the two departments. These variables are:

  • gender
  • Veterans Affairs Canada program participation indicator
  • preferred language of correspondence
  • Indigenous identity (when collected in future by Veterans Affairs Canada)

What personal information is included in this request?

This request contains personal information such as gender and Indigenous identity. Personal identifiers (gender and Indigenous identity) are required to perform data linkages, for statistical purposes only. Once the data are linked, the personal identifiers will be replaced by an anonymized person key.

What years of data will be requested?

Annual data as of 2023 (ongoing)

From whom will the information be requested?

Veterans Affairs Canada

Why is this information being requested?

Statistics Canada requires this information to create and publish statistics on Veterans Affairs Canada (VAC) client reported gender or gender identity, the VAC program(s) in which VAC clients have participated, preferred language of correspondence of VAC clients, and Indigenous identity (as applicable) of VAC clients. 

These statistics will help Veterans Affairs Canada and will be used by policy makers, researchers and industry stakeholders to measure the well-being of Veterans and their families, to report on the stratified socioeconomic and health outcomes of Veterans and compare them with the respective outcomes of the broader Canadian population, to deliver programs that are responsive to identified areas of need, and to enhance the public understanding of Veterans. Moreover, the gender and Indigenous identity variables are important for delivering on the initiatives of equity in service and Indigenous reconciliation.

Statistics Canada may also use the information for other statistical and research purposes.

Why were these organizations selected as data providers?

Veterans Affairs Canada is responsible for supporting the well-being of Veterans and their families, and for promoting recognition and remembrance of the achievements and sacrifices of those who served Canada in times of war, military conflict and peace. Veterans Affairs Canada collects and maintains up-to-date data on former Canadian Armed Forces personnel.

When will this information be requested?

September 2023 and by January 2024 onward for the next calendar year (annually)

What Statistics Canada programs will primarily use these data?

When was this request published?

August 25, 2023

Migration

Immigrant data (permanent and non-permanent residents)

What information is being requested?

In addition to the data already collected by Statistics Canada on immigrant selection characteristics, sponsorship and citizenship, Statistics Canada is requesting additional variables related to asylum claims and permits.

What personal information is included in this request?

This request covers details on the immigration process such as dates, status and selection results, as well as sociodemographic characteristics. The new data we receive will provide additional information on these sociodemographic characteristics of immigrants.

Immigrant data collected by Statistics Canada contain personal identifiers, including unique client identifiers, name, sex and date of birth.

The requested information is required for record linkage purposes for statistical use only. Once the data have been linked, personal identifiers will be replaced by an anonymized key.

What years of data will be requested?

Statistics Canada holds data dating back to 1996, which are received on an ongoing basis.

The newly requested data will cover the period starting from 2004 and will also be received on a continuous basis.

From whom will the information be requested?

The information is being requested from the department of Immigration, Refugees and Citizenship Canada (IRCC) and provincial partners.

Why is this information being requested?

Statistics Canada responds to the data needs of federal, provincial and municipal stakeholders, as well as academics. The Government of Canada and IRCC use this information to better understand the impact of policies and to ensure that IRCC is meeting its integration and demographic objectives.

The requested information will be used to produce and publish accurate and up-to-date statistics on permanent and temporary immigration. It will also be used to evaluate policies on immigrant selection, family reunification and citizenship grants. Statistics Canada requests this information so that these policies can be adjusted to meet the needs of Canadians.

Statistics Canada may also use the information for other statistical and research purposes.

Why were these organizations selected as data providers?

IRCC is responsible for permanent and temporary immigration services, as well as citizenship. IRCC develops and implements policies, programs and services that facilitate the arrival and integration of individuals in Canada in a way that maximizes their contribution to the country. IRCC is the only department that collects and maintains information on immigrants (permanent residents) and non-permanent residents (temporary residents).

Provincial partners (departments managing immigration) are responsible for selecting immigrants through the Provincial Nominee Program and can provide up-to-date data on the immigrants they have selected.

When will this information be requested?

Statistics Canada has been receiving immigration data annually since 2021.

The mailing address file will be received monthly starting in October 2026.

When was this request published?

November 4, 2025

Summary of changes

Variables on the characteristics of asylum claims and permits were requested in 2025.

Migration within Canada

What information is being requested?

Statistics Canada is requesting the number of private household moves in Canada from Canada Post's "Mail Forwarding Program." The data includes the month in which the move was made, the province/territory of origin and the province/territory of destination.
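As a rough illustration of how move counts with a month, an origin and a destination can be turned into net interprovincial flows, here is a minimal sketch. The row layout and the numbers are made up for illustration, and this is not Statistics Canada's estimation model.

```python
from collections import defaultdict

# Hypothetical rows: (month, origin, destination, household_moves).
# The counts are invented for illustration only.
moves = [
    ("2021-01", "ON", "BC", 120),
    ("2021-01", "BC", "ON", 80),
    ("2021-01", "AB", "BC", 50),
]


def net_moves_by_province(rows):
    """Return net household moves (arrivals minus departures) per month and province."""
    net = defaultdict(int)
    for month, origin, destination, count in rows:
        net[(month, origin)] -= count       # households leaving the origin
        net[(month, destination)] += count  # households arriving at the destination
    return dict(net)


net = net_moves_by_province(moves)
```

Because every move leaves one province and arrives in another, the net values for a given month sum to zero, which is a useful consistency check on the input.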

What personal information is included in this request?

No personal information is included in this request.

What years of data will be requested?

Data are being requested for the period between January 1, 2021, and December 31, 2021.

From whom will the information be requested?

Canada Post

Why is this information being requested?

Statistics Canada must produce monthly population estimates by province and territory. To do so, monthly estimates of interprovincial migration are required.

These monthly population estimates are required to calculate demographic control totals used by the Labour Force Survey, a monthly survey that measures the current state of the Canadian labour market and is used, among other things, to calculate the national, provincial, territorial and regional unemployment rates.

These monthly population estimates also help produce the Population Certificates that Statistics Canada must provide annually to Finance Canada to fulfill its mandate under the Federal-Provincial Fiscal Arrangements Act.

The requested data will help determine if the current estimation model reflects changes in migration due to the COVID-19 pandemic.

Statistics Canada may also use the information for other statistical and research purposes.

Why were these organizations selected as data providers?

Canada Post is the only organization that maintains timely nation-wide data on interprovincial moves by Canadian private households.

When will this information be requested?

July 2020

When was this request published?

July 14, 2020

Population estimates

Sub-provincial population data

What information is being requested?

The annual aggregated number of British Columbia residents by age, sex and postal code is being requested.

What personal information is included in this request?

This request does not contain any personal information.

What years of data will be requested?

Annual data as of July 1, 2001, and onward (ongoing)

From whom will the information be requested?

This information is being requested from the British Columbia Ministry of Health.

Why is this information being requested?

This information is required to ensure robust population estimates and will be used to evaluate the accuracy of population estimates at the census subdivision (municipality) level in British Columbia. These estimates are currently derived from Canada Revenue Agency data.

This information will be used for an evaluation of municipality population estimates in British Columbia.

Statistics Canada may consider using this information for the population estimates calculation at the municipality level.

Statistics Canada may also use the information for other statistical and research purposes.

Why were these organizations selected as data providers?

The British Columbia Ministry of Health collects and maintains timely data on its population at the sub-provincial level.

When will this information be requested?

This information will be requested in January 2021 and onward (yearly).

What Statistics Canada programs will primarily use these data?

When was this request published?

December 22, 2020

Archived - Annual Exploration, Development and Capital Expenditures Survey - Petroleum and Natural Gas Industry - Preliminary Estimate for 2020 and Intentions for 2021 - Reporting Guide

Preliminary Estimate for 2020 and Intentions for 2021

Integrated Business Statistics Program (IBSP)

This guide is designed to assist you as you complete the Annual Exploration, Development and Capital Expenditures Survey - Petroleum and Natural Gas Industry - Preliminary Estimate for 2020 and Intentions for 2021.

If you need more information, please call the Statistics Canada Help Line at the number below.

Help Line: 1-833-977-8287 (1-833-97STATS)

Table of contents

Reporting period information
Definitions

Reporting period information

For the purpose of this survey, please report information for your 12-month fiscal period for which the final day falls on or between April 1, 2020, and March 31, 2021.

Here are twelve common fiscal periods that fall within the required dates:

  • May 1, 2019 – April 30, 2020
  • June 1, 2019 – May 31, 2020
  • July 1, 2019 – June 30, 2020
  • August 1, 2019 – July 31, 2020
  • September 1, 2019 – August 31, 2020
  • October 1, 2019 – September 30, 2020
  • November 1, 2019 – October 31, 2020
  • December 1, 2019 – November 30, 2020
  • January 1, 2020 – December 31, 2020
  • February 1, 2020 – January 31, 2021
  • March 1, 2020 – February 28, 2021
  • April 1, 2020 – March 31, 2021

Here are other examples of fiscal periods that fall within the required dates:

  • September 18, 2019 to September 15, 2020 (e.g., floating year-end)
  • June 1, 2020 to December 31, 2020 (e.g., a newly opened business)
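The eligibility rule above depends only on the last day of the fiscal period. A minimal sketch of that check follows; the function name is hypothetical and the window dates are taken from the guide.

```python
from datetime import date

# Reporting window stated in the guide: the fiscal period's final day
# must fall on or between these two dates.
WINDOW_START = date(2020, 4, 1)
WINDOW_END = date(2021, 3, 31)


def fiscal_period_qualifies(period_end: date) -> bool:
    """Return True if the fiscal period's final day is inside the window."""
    return WINDOW_START <= period_end <= WINDOW_END


# Floating year-end example from the list: ends September 15, 2020.
floating_ok = fiscal_period_qualifies(date(2020, 9, 15))
# A period ending March 31, 2020 falls one day before the window opens.
too_early = fiscal_period_qualifies(date(2020, 3, 31))
```

The start date of the period plays no role in the check, which is why a shorter period such as that of a newly opened business still qualifies.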

Definitions

  • When there are partnerships and joint venture activities or projects, report the expenditures reflecting this corporation's net interest in such projects or ventures.
  • Report all dollar amounts in thousands of Canadian dollars ('000).
  • Do not include sales tax. Percentages should be rounded to whole numbers.
  • When precise figures are not available, please provide your best estimates.

If there are no capital expenditures, please enter '0'.

What are Capital Expenditures?

Capital Expenditures are the gross expenditures on fixed assets for use in the operations of your organization or for lease or rent to others. Gross expenditures are expenditures before deducting proceeds from disposals, and credits (capital grants, donations, government assistance and investment tax credits).

Include:

  • Cost of all new buildings, engineering, machinery and equipment which normally have a life of more than one year and are charged to fixed asset accounts
  • Modifications, acquisitions and major renovations
  • Capital costs such as feasibility studies, architectural, legal, installation and engineering fees
  • Subsidies received and used for capital expenditures
  • Capitalized interest charges on loans with which capital projects are financed
  • Work done by own labour force
  • Additions to capital work in progress.

Exclude:

  • transfers from capital work in progress (construction-in-progress) to fixed assets accounts
  • assets associated with the acquisition of companies
  • property developed for sale and machinery or equipment acquired for sale (inventory).

1. Oil and gas rights acquisition and retention costs (exclude inter-company sales or transfers):

Include acquisition costs and fees for oil and gas rights (including bonuses, legal fees and filing fees), and oil and gas retention costs.

2. Exploration and evaluation, capitalized or expensed (e.g., seismic, exploration drilling):

These expenditures include geological, geophysical and seismic expenses, exploration drilling, and other costs incurred during the reporting period in order to determine whether oil or gas reserves exist and can be exploited commercially. Report gross expenditures, before deducting any incentive grants, incurred for oil and gas activities on a contracted basis and/or by your own employees. Exclude the cost of land acquired from other oil and gas companies.

3. Building construction (e.g., process building, office building, camp, storage building, and maintenance garage):

Include capital expenditures on buildings such as office buildings, camps, warehouses, maintenance garages, workshops, and laboratories. Fixtures, facilities and equipment that are integral parts of the building are included.

4. Other construction assets (e.g., development drilling and completions, processing facilities, natural gas plants, upgraders):

Include all infrastructure, other than buildings, such as the cost of well pads, extraction and processing infrastructure and plants, upgrading units, transportation infrastructure, water and sewage infrastructure, tailings, pipelines and wellhead production facilities (pumpjacks, separators, etc). Include all preconstruction planning and design costs such as development drilling, regulatory approvals, environmental assessments, engineering and consulting fees and any materials supplied to construction contractors for installation, as well as site clearance and preparation. Equipment which is installed as an integral or built-in feature of a fixed structure (e.g. casings, tanks, steam generators, pumps, electrical apparatus, separators, flow lines, etc.) should be reported with the construction asset; however, when the equipment is replaced within an existing structure, the replacement cost should be reported in machinery and equipment (sustaining capital).

5. Machinery and equipment purchases (e.g., trucks, shovels, computers, etc.):

Include transportation equipment for people and materials, computers, software, communication equipment, and processing equipment not included in the above categories.

Research and Development

Research and experimental development (R&D) comprise creative and systematic work undertaken in order to increase the stock of knowledge – including knowledge of humankind, culture and society – and to devise new applications of available knowledge.

For an activity to be an R&D activity, it must satisfy five core criteria:

  1. To be aimed at new findings (novel);
  2. To be based on original, not obvious, concepts and hypotheses (creative);
  3. To be uncertain about the final outcome (uncertainty);
  4. To be planned and budgeted (systematic);
  5. To lead to results that could possibly be reproduced (transferable and/or reproducible).

The term R&D covers three types of activity: basic research, applied research and experimental development. Basic research is experimental or theoretical work undertaken primarily to acquire new knowledge of the underlying foundations of phenomena and observable facts, without any particular application or use in view. Applied research is original investigation undertaken in order to acquire new knowledge. It is, however, directed primarily towards a specific, practical aim or objective. Experimental development is systematic work, drawing on knowledge gained from research and practical experience and producing additional knowledge, which is directed to producing new products or processes or to improving existing products or processes.

Archived - Annual Capital Expenditures Survey - Preliminary Estimate for 2020 and Intentions for 2021 - Reporting Guide

Integrated Business Statistics Program (IBSP)

This guide is designed to assist you as you complete the Annual Capital Expenditures Survey - Preliminary Estimate for 2020 and Intentions for 2021.

If you need more information, please call the Statistics Canada Help Line at the number below.

Help Line: 1-833-977-8287 (1-833-97STATS)

Reporting period information

For the purpose of this survey, please report information for your 12-month fiscal period for which the final day falls on or between April 1, 2020, and March 31, 2021.

Here are twelve common fiscal periods that fall within the required dates:

  • May 1, 2019 – April 30, 2020
  • June 1, 2019 – May 31, 2020
  • July 1, 2019 – June 30, 2020
  • August 1, 2019 – July 31, 2020
  • September 1, 2019 – August 31, 2020
  • October 1, 2019 – September 30, 2020
  • November 1, 2019 – October 31, 2020
  • December 1, 2019 – November 30, 2020
  • January 1, 2020 – December 31, 2020
  • February 1, 2020 – January 31, 2021
  • March 1, 2020 – February 28, 2021
  • April 1, 2020 – March 31, 2021

Here are other examples of fiscal periods that fall within the required dates:

  • September 18, 2019 to September 15, 2020 (e.g., floating year-end)
  • June 1, 2020 to December 31, 2020 (e.g., a newly opened business)

Dollar amounts

  • all dollar amounts reported should be rounded to thousands of Canadian dollars (e.g., $6,555,444.00 should be rounded to $6,555);
  • exclude sales tax;
  • your best estimates are acceptable when precise figures are not available;
  • if there are no capital expenditures, please enter '0'.
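The rounding rule can be sketched as follows. The guide gives only the $6,555,444.00 example and does not say how exact halves are handled, so the round-half-up choice here is an assumption, as is the function name.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_thousands(amount_in_dollars: str) -> int:
    """Convert a dollar amount to whole thousands of dollars.

    Assumption: values exactly halfway between two thousands round up;
    the guide does not specify a tie-breaking rule.
    """
    cleaned = amount_in_dollars.replace("$", "").replace(",", "")
    thousands = Decimal(cleaned) / Decimal(1000)
    return int(thousands.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


reported = to_thousands("$6,555,444.00")  # the guide's example amount
no_capex = to_thousands("0")              # no capital expenditures
```

`Decimal` is used instead of floats so that amounts with cents are handled exactly before rounding.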

Definitions

What are Capital Expenditures?

Capital Expenditures are the gross expenditures on fixed assets for use in the operations of your organization or for lease or rent to others. Gross expenditures are expenditures before deducting proceeds from disposals, and credits (capital grants, donations, government assistance and investment tax credits).

Include:

  • Cost of all new buildings, engineering, machinery and equipment which normally have a life of more than one year and are charged to fixed asset accounts
  • Modifications, acquisitions and major renovations
  • Capital costs such as feasibility studies, architectural, legal, installation and engineering fees
  • Subsidies received and used for capital expenditures
  • Capitalized interest charges on loans with which capital projects are financed
  • Work done by own labour force
  • Additions to capital work in progress.

Exclude:

  • transfers from capital work in progress (construction-in-progress) to fixed assets accounts
  • assets associated with the acquisition of companies
  • property developed for sale and machinery or equipment acquired for sale (inventory).

How to Treat Leases

Include:

  • assets acquired as a lessee through either a capital or financial lease;
  • assets acquired for lease to others as an operating lease.

Industry characteristics

Report the value of the projects expected to be put in place during the year. Include the gross expenditures (including subsidies) on fixed assets for use in the operations of your organization or for lease or rent to others. Include all capital costs such as feasibility studies, architectural, legal, installation and engineering fees as well as work done by your own labour force. Include all additions to work in progress.

The category New Assets, Renovation, Retrofit includes both upgrades to existing assets and acquisitions of new assets.

Purchase of Used Canadian Assets

Definition: Used fixed assets may be defined as existing buildings, structures or machinery and equipment which have been previously used by another organization in Canada that you have acquired during the time period being reported on this questionnaire.

Explanation: The objective of our survey is to measure gross annual new acquisitions to fixed assets separately from the acquisition of gross annual used fixed assets in the Canadian economy as a whole.

Hence, the acquisition of a used fixed Canadian asset should be reported separately, since such acquisitions would not change the aggregates of our domestic inventory of fixed assets; they would simply mean a transfer of assets within Canada from one organization to another.

Imports of used assets, on the other hand, should be included with the new assets (Column 1) because they are newly acquired for the Canadian economy.

Work in Progress

Work in progress represents accumulated costs since the start of capital projects which are intended to be capitalized upon completion.

Land

Capital expenditures for land should include all costs associated with the purchase of the land that are not amortized or depreciated.

Residential Construction

Report the value of residential structures including the housing portion of multi-purpose projects and of townsites.

Exclude:

  • buildings that have accommodation units without self-contained or exclusive use of bathroom and kitchen facilities (e.g., some student and senior citizen residences)
  • the non-residential portion of multi-purpose projects and of townsites
  • associated expenditures on services

The exclusions should be included in the appropriate construction (e.g., non-residential) asset.

Non-Residential Building Construction (excluding land purchase and residential construction)

Building construction represents any permanent structure with walls and a roof affording protection and shelter from and for a social and/or physical environment for people and/or materials.

For example, building construction represents expenditures on aircraft hangars, factories, hospitals, hotels, office buildings, railway stations, schools and shopping centres.

Report the total cost incurred during the year of building construction (contract and by own employees) whether for your own use or rent to others.

Include also:

  • the cost of demolition of buildings, land servicing and of site-preparation
  • leasehold and land improvements
  • all preconstruction planning and design costs such as engineer and consulting fees and any materials supplied to construction contractors for installation, etc.
  • townsite facilities, such as streets, sewers, stores, schools.

Non-residential engineering construction

Engineering construction encompasses the direct or indirect conveyance of people, machinery, materials, gases, and/or electrical impulses. It also includes free standing structures which contain or restrain such objects either as part of such conveyance or separately and independently.

In addition, the cost associated with significantly altering any terrain in the preparation for specialized use of that terrain will fall under engineering construction.

Report the total cost incurred during the year of engineering construction (contract and by own employees) whether for your own use or rent to others. Include also:

  • the cost of demolition of buildings, land servicing and of site-preparation
  • leasehold and land improvements
  • all preconstruction planning and design costs such as engineer and consulting fees and any materials supplied to construction contractors for installation, etc.
  • oil or gas pipelines, including pipe and installation costs
  • communication engineering, including transmission support structures, cables and lines, etc.
  • electric power engineering, including wind and solar plants, nuclear production plants, power distribution networks, etc.

Machinery and Equipment

Report total cost incurred during the year of all new machinery, whether for your own use or for lease or rent to others. Any capitalized tooling should also be included. Include progress payments paid out before delivery in the year in which such payments are made. Receipts from the sale of your own fixed assets or allowance for scrap or trade-in should not be deducted from your total capital expenditures. Any balance owing or holdbacks should be reported in the year the cost is incurred.

Include:

  • automobiles, trucks, professional and scientific equipment, office and store furniture and appliances
  • computers (hardware and software), broadcasting, telecommunication and other information and communication technology equipment
  • motors, generators, transformers
  • any capitalized tooling expenses
  • progress payments paid out before delivery in the year in which such payments are made
  • any balance owing or holdbacks, reported in the year the cost is incurred
  • leasehold improvements.

Software

Capital expenditures for software should include all costs associated with the purchase or development of software.

Include:

  • Pre-packaged software
  • Custom software developed in-house/own account
  • Custom software design and development, contracted out

Research and Development

Research and experimental development (R&D) comprise creative and systematic work undertaken in order to increase the stock of knowledge – including knowledge of humankind, culture and society – and to devise new applications of available knowledge.

For an activity to be an R&D activity, it must satisfy five core criteria:

  1. To be aimed at new findings (novel);
  2. To be based on original, not obvious, concepts and hypotheses (creative);
  3. To be uncertain about the final outcome (uncertainty);
  4. To be planned and budgeted (systematic);
  5. To lead to results that could possibly be reproduced (transferable and/or reproducible).

The term R&D covers three types of activity: basic research, applied research and experimental development. Basic research is experimental or theoretical work undertaken primarily to acquire new knowledge of the underlying foundations of phenomena and observable facts, without any particular application or use in view. Applied research is original investigation undertaken in order to acquire new knowledge. It is, however, directed primarily towards a specific, practical aim or objective. Experimental development is systematic work, drawing on knowledge gained from research and practical experience and producing additional knowledge, which is directed to producing new products or processes or to improving existing products or processes.

Data science projects

Data science plays an important role at Statistics Canada. All across the agency, new data science methods are being used to make our projects more efficient and provide better data insights to Canadians.

Project categories

Contact the Centre for Artificial Intelligence Research and Excellence (CAIRE) for more information on Statistics Canada's data science projects.

Natural language processing

Event Detection and Sentiment Indicators

Statistics Canada is developing a tool to detect specific economic events by analyzing millions of news articles. The tool uses machine learning algorithms to research and summarize information from the articles and organize the data into an informative dashboard. Time that was once spent on research can now be spent analyzing the reasons for economic changes.

The agency is also exploring the development of sentiment indicators to measure economic tendencies and their connection with key economic variables. Based on positive and negative interpretations of economic-related news articles, these indicators could allow subject matter experts to gain better insights into economic trends by industry, and support the publication of near real-time economic indicators.

Retail Scanner Data

Statistics Canada publishes the total amount of products sold, as classified by the North American Product Classification System (NAPCS). Large scanner databases, containing millions of records, are currently available from major retailers. Previously, products were assigned to NAPCS categories according to their description and other indicators, using dictionary-based coding in combination with manual coding when required. Statistics Canada now uses machine learning to classify all of the product descriptions in the scanner data to the NAPCS and then obtains aggregate sales for each area. This approach has resulted in a higher degree of automation, as well as in accurate, detailed retail data and a reduced response burden for major retailers.

Survey of Sexual Misconduct in the Canadian Armed Forces Comment Classification

Data scientists at Statistics Canada created a machine learning model to automatically classify the electronic comments from respondents of the Survey of Sexual Misconduct in the Canadian Armed Forces (SSMCAF). The SSMCAF required automated classification of comments into five categories: personal story, negative, positive, advice for content, and other. The machine learning model coded 6,000 comments from the first 2018 survey cycle with 89% accuracy for French and English comments. This approach will be expanded to other surveys at Statistics Canada.

Census 2021 Comments Classification

Statistics Canada has developed a machine learning algorithm to classify 1.8 million French and English respondent comments from the 2021 Census. This algorithm quickly and objectively classifies comments into different classes. The model is trained on comments from the 2016 Census and the 2019 Census test. Respondent feedback is used to support decision making regarding content determination for the next census and to monitor factors such as respondent burden. Visit 2021 Census Comment Classification for more information about this project.

Canadian Coroner and Medical Examiner Database (CCMED) Dynamic Topic Modelling

Statistics Canada has designed and deployed a dynamic topic modelling system. This system uses data from the Canadian Coroner and Medical Examiner Database to detect emerging narratives on causes of death. The objective is to provide analysts with patterns of death over time. For more information, please visit Topic Modelling and Dynamic Topic Modelling: A Technical Review.

Canadian Export Reporting System Text Classification

The Canada Border Services Agency (CBSA) and Statistics Canada recently developed a new web-based reporting tool, the Canadian Export Reporting System (CERS), for Canadian exporters shipping to non-US countries. CERS requires exporters to self-code the Harmonized System (HS) code of their goods and to provide an additional text description for the CBSA. The Data Science Division, in partnership with the International Accounts and Trade Division (IATD), developed a FastText machine learning model that classifies the additional text descriptions of the exported commodities to HS codes, so that IATD can use the predicted codes to validate the self-coded HS codes provided by the exporters. The motivation for adding this validation is that analysis of the data from the previous systems revealed inconsistencies between the product description and the code chosen by the exporter. With the move to CERS, electronic reporting is now mandatory and may result in an increase of cases with such inconsistencies, which is why an automated solution for review is being developed.

Image classification

In-Season Crop Classification

Monitoring the production of farms in Canada is an important but costly undertaking. Surveys and in-person inspections require a large amount of resources, and the current approach to predicting crop yields is time-consuming. For these reasons, Statistics Canada is modernizing crop classification using an image classification approach. An automated pipeline is used to download and process freely available Landsat-8 satellite imagery throughout the crop season.

Crop types are predicted using satellite imagery and the application of neural networks. The new model estimations are then used to update a database, allowing end users to acquire the most up-to-date estimates throughout the crop season. Initial results show that this approach is much faster and will reduce the survey response burden for farm owners, especially during the busy times of the year.

Geo-Spatial Construction Starts Using Satellite Images

Canada Mortgage and Housing Corporation tracks the starts and completions of residential building construction projects across Canada, and results are used by Statistics Canada to calibrate estimates for its Investment in Building Construction program. Statistics Canada has been employing various data science methods to detect construction starts from satellite images, such as using image augmentation to diversify and increase the data set. These methods enabled data scientists to detect the area of the building in the pre-foundation and foundation phases. The pre-foundation phase consists of excavation and the creation of footings and concrete slabs to support the foundation walls. The foundation is part of a structural system that supports and anchors the superstructure of a building. AI model building and evaluation required the processing of more than 1,400 km² of imagery at 50 cm resolution over many months, for which a highly scalable and efficient processing pipeline was created. The developed artificial intelligence algorithms might eventually lead to more accurate and timely data, while helping to eliminate existing data gaps for the non-residential sector and small/remote communities excluded from the current survey.
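As a rough illustration of image augmentation, the sketch below produces flipped and rotated copies of a small image tile, so that one labelled example becomes several. The source does not say which transformations were used for the construction-starts work; the transforms, the function names and the nested-list tile representation are all assumptions chosen for illustration.

```python
# Illustrative only: the transforms and names below are assumptions,
# not the agency's actual augmentation pipeline.

def flip_horizontal(tile):
    """Mirror the tile left to right."""
    return [row[::-1] for row in tile]


def flip_vertical(tile):
    """Mirror the tile top to bottom."""
    return [row[:] for row in tile[::-1]]


def rotate_90_clockwise(tile):
    """Rotate the tile a quarter turn clockwise."""
    return [list(column) for column in zip(*tile[::-1])]


def augment(tile):
    """Return a copy of the tile plus five flipped or rotated variants."""
    rot90 = rotate_90_clockwise(tile)
    rot180 = rotate_90_clockwise(rot90)
    rot270 = rotate_90_clockwise(rot180)
    return [
        [row[:] for row in tile],
        flip_horizontal(tile),
        flip_vertical(tile),
        rot90,
        rot180,
        rot270,
    ]


# A hypothetical 2 x 3 tile of pixel intensities.
tile = [[1, 2, 3],
        [4, 5, 6]]
variants = augment(tile)
```

When the training labels are themselves images, such as building footprint masks, the same transformation would need to be applied to both the tile and its mask so that they stay aligned.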

Agriculture Greenhouse Detection Using Aerial Images

The greenhouse project has been using Earth Observation data to identify greenhouses and measure their total area in Canada. It also includes a proof of concept to determine whether greenhouses can be classified by the produce grown inside them and by their type of cover (glass or plastic). In an effort to produce more timely estimates and reduce the burden on survey respondents, data scientists at Statistics Canada are working to automate the identification process using machine learning, administrative data sources and other technologies, such as satellite imagery and high resolution aerial imagery.

PDF extraction

Extraction of Economic Variables from Financial Reports

Statistics Canada has been applying data science solutions to extract information from PDFs and other documents in a timely and more efficient manner. For example, Statistics Canada has been experimenting with the historical dataset for SEDAR, a system used by publicly traded Canadian companies to file securities documents with various Canadian securities commissions.

Statistics Canada's data scientists developed a state-of-the-art machine learning pipeline that correctly identifies and extracts key financial variables (e.g. total assets) from the correct table (e.g. balance sheet) in an annual financial statement of a company. The algorithm used for table extraction, called SLICE (Spatial Layout-Based Information and Content Extraction), was developed within Statistics Canada and made open source under the MIT license. SLICE is a unique computer vision algorithm that simultaneously uses textual, visual and layout information to segment pages into a tabular structure. The pipeline therefore turns a large amount of unstructured public documents from SEDAR into structured datasets, allowing the automation of information extraction related to Canadian companies. This significantly reduces the manual hours spent identifying and capturing the required information and reduces data redundancy within the organization by providing a one-point solution to access information.

Public Sector Statistics Division Scanned PDF Extraction

The Public Sector Statistics Division (PSSD) at Statistics Canada receives financial statements from provincial governments and their respective municipalities on a quarterly and annual basis. These statements are in text-based and scanned PDF format, and store valuable information in tables. Each row of the table contains numerical values which must be manually extracted and stored in a database for further analysis, but this manual process is time-consuming and subject to human error. Data scientists at Statistics Canada developed a proof-of-concept that involves extracting financial data from reported financial statements using an in-house machine learning algorithm and displaying them in a tabular format that can be edited by the analysts. Additionally, the data are auto-coded and records of previous and current year numerical values are provided. Once the project transitions to production, it will reduce data redundancy within the organization by providing a one-point solution to access information, and will save PSSD analysts the manual work hours spent identifying and capturing the required information.

Predictive analytics

Nowcasting of Economic Indicators

Many initiatives at Statistics Canada work towards near real-time estimates and the production of advanced indicators for many of the agency's key data series. In the Investment in Building Construction Program, building permit values are a key series for which an early indicator could be produced via nowcasting. To facilitate the effort, an analytical cloud environment was created which allows analysts to leverage timely external data and advanced time series models. An extensive time series database was created, containing economic time series (from Statistics Canada programs), external open data, temperature sensor data and stock market data. This environment may pave the way towards a generalized Nowcasting System at Statistics Canada. Exploratory analysis was conducted to apply nowcasting models, including ARIMA-X, PROPHET and the machine learning algorithm XGBoost, to several economic indicators, including monthly building permit values. It was found that ARIMA-X and PROPHET performed similarly in terms of mean absolute percentage error and mean directional accuracy, while XGBoost with external open data did not perform as well.
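For readers unfamiliar with the two evaluation measures named above, the following minimal sketch shows one common way to compute them. Definitions of mean directional accuracy vary; the variant here, which compares each predicted and actual change against the previous actual value, is an assumption and not necessarily the one used in the agency's analysis. The function names and the sample figures are hypothetical.

```python
# Illustrative definitions; the exact formulas used by the agency are not
# stated in the source, and the sample values below are made up.

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent.

    Assumes that no actual value is zero.
    """
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mean_directional_accuracy(actual, predicted):
    """Share of periods in which the predicted direction of change
    (relative to the previous actual value) matches the actual direction."""
    hits = 0
    for t in range(1, len(actual)):
        actual_change = actual[t] - actual[t - 1]
        predicted_change = predicted[t] - actual[t - 1]
        if sign(actual_change) == sign(predicted_change):
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical monthly values of a series and a model's nowcasts.
actual = [100.0, 110.0, 105.0, 120.0]
predicted = [98.0, 108.0, 112.0, 115.0]

mape = mean_absolute_percentage_error(actual, predicted)
mda = mean_directional_accuracy(actual, predicted)
```

A lower mean absolute percentage error indicates nowcasts that are closer in level, while a higher mean directional accuracy indicates nowcasts that more often move in the right direction; a model can do well on one and poorly on the other.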

Crop Yield Predictions

Statistics Canada recently completed a research project for the Field Crop Reporting Series (FCRS) on the use of machine learning, specifically supervised regression techniques, for early-season crop yield prediction. The objective was to investigate whether this could be used to improve the precision of the existing crop yield prediction method, while also reducing the survey response burden for busy farm operators. The main contribution of the research project was the adaptation of rolling window forward validation (RWFV) as the validation protocol. RWFV is a special case of forward validation, a family of validation protocols designed to prevent temporal information leakage for supervised learning based on time series data. Our adaptation of RWFV enabled a customized validation protocol that realistically reflects the statistical production context of the FCRS. Visit Use of Machine Learning for Crop Yield Prediction for more details on the technical side of this project.
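The general idea behind rolling window forward validation can be sketched as follows: each validation period is predicted by a model trained only on a fixed-length window of the periods immediately before it, so no future information leaks into training. This is a generic illustration, not the customized FCRS protocol; the function name, the window length and the years used are assumptions.

```python
# Generic sketch of rolling window forward validation splits.
# The helper name, window length and years are hypothetical.

def rolling_window_splits(periods, window):
    """Yield (training_periods, validation_period) pairs.

    Each validation period is paired with the `window` periods that
    immediately precede it, so training data never come from the future.
    """
    ordered = sorted(periods)
    for i in range(window, len(ordered)):
        yield ordered[i - window:i], ordered[i]


years = [2000, 2001, 2002, 2003, 2004, 2005]
splits = list(rolling_window_splits(years, window=3))

for training_years, validation_year in splits:
    print("train on", training_years, "validate on", validation_year)
```

With six years and a window of three, this yields three splits, each of which moves the training window forward by one year. Candidate models or hyperparameters would then be compared on their average error across the validation years.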

Hospital Occupancy Forecast

Data scientists at Statistics Canada are helping in the fight against COVID-19 by creating short-term hospital occupancy forecasts based on two daily inputs, using Ottawa hospital data as a test case. The inputs are daily new hospital admission counts and hospital midnight in-patient headcounts. Admission forecasts are determined by using two hierarchical Bayesian models. The first model captures the random delay between the unobserved event of COVID-19 infection and hospital admission, for the subgroup of infected individuals who will be hospitalized due to COVID-19. The second model captures the random delay between hospital admission and discharge/death.

A series of 25 consecutive weeks of mock forecasts based on real data was performed to assess the effectiveness of the forecast model. The resulting credible bands, on the one hand, consistently encompassed the real hospitalization counts within one week after the respective training data cut-offs, and on the other hand, were sufficiently narrow to be informative. The results of this project strongly suggest the feasibility of accurate and informative hospitalization forecasts at the municipality level, provided timely hospital admission and discharge/death data are available.

High Pandemic Hubs

Data scientists at Statistics Canada created a research project using a general machine learning framework to identify and predict health regions that could be considered vulnerable or at high risk of increased COVID-19 infection rates. By identifying these regions, federal and provincial health authorities would be able to divert public health resources such as PPE or frontline workers from lower risk regions to higher risk regions, and would also be able to contain cases in higher risk areas sooner with contact tracing and quarantine measures.

This effort also contributed to the creation of an interactive dashboard that could allow users to monitor COVID-19 cases and deaths at the health-region level and to choose among multiple risk prediction models and approaches.

Using COVID-19 Epidemiological Modelling to Inform Personal Protective Equipment Supply and Demand in Canada

At the beginning of the pandemic, there were concerns surrounding Personal Protective Equipment (PPE) preparedness in Canada and whether there was enough supply to support the healthcare sector, and other sectors of the economy throughout the pandemic. In response to this emerging need, Statistics Canada customized an existing epidemiological model to allow policy makers to stress-test the PPE supply under various epidemiological scenarios. Projections generated from this epidemiological model have been used by the PPE supply and demand model to compare on-hand and in-bound supplies with demand projections over twelve months. For more information, please visit Modelling SARS-CoV-2 Dynamics to Forecast PPE Demand.

Optimal Social Distancing Policy via Reinforcement Learning

Data scientists at Statistics Canada collaborated with the Public Health Agency of Canada to develop a novel epidemiological modelling framework optimizing Non-Pharmaceutical Interventions using Reinforcement Learning. This model determines the best set of population behaviours to minimize the spread of an infection within simulations. Visit Non-Pharmaceutical Intervention and Reinforcement Learning for more details on this project.

Research

Statistics Canada's First Quantum Machine Learning Project: A collaboration with Université de Sherbrooke

Quantum computing—a new way of computing that uses principles of quantum mechanics to store and process information—holds a lot of promise as a solution for some computationally heavy processes and algorithms. Increasingly, governments and major companies are working to assess how quantum computing will impact their businesses in the near future.

As of June 2021, Statistics Canada is collaborating with the Université de Sherbrooke to explore the potential of quantum computing and identify opportunities early on in its development. The six-month project marks the first collaboration between Statistics Canada and the Quantum Hub at Université de Sherbrooke's Institut quantique (IQ). The Quantum Hub offers its members cloud-based access to advanced quantum computing systems, as well as a community of experts to support quantum research projects.

The project will explore ways to optimize the agency's machine learning processes and text classification computations, and how this technology could be used to support Statistics Canada's goal of providing high-quality data and insights to Canadians.

Homomorphic Encryption

Data security remains one of the highest priorities at Statistics Canada. Our data scientists are training a machine learning text classifier to use homomorphic encryption to protect data while they're being processed. The data are protected at two points. The first point, located at their ingestion points, allows data files to be processed remotely or on the cloud. The second point is at their dissemination points–this allows accredited external researchers at virtual labs access to more data in a secure manner. The use of homomorphic encryption not only ensures data protection but also acts as a solution for outsourcing computation. Visit A Brief Survey of Privacy Preserving Technologies for more information on homomorphic encryption and other privacy-preserving approaches.
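To make the idea of computing on encrypted data concrete, the sketch below implements a toy version of the Paillier cryptosystem, in which multiplying two ciphertexts yields an encryption of the sum of the underlying values. The source does not say which homomorphic encryption scheme the agency's text classifier uses, so Paillier is an assumption chosen because it is easy to show; the tiny primes make this example insecure and suitable for illustration only. All function names are hypothetical.

```python
# Toy Paillier cryptosystem (additively homomorphic). Illustration only:
# the primes are far too small to be secure, and the scheme is an assumed
# stand-in, not necessarily the one used by the agency.
import math
import random


def generate_keys(p, q):
    """Build a key pair from two primes; returns (public n, private (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam (requires Python 3.8+)
    return n, (lam, mu)


def encrypt(n, message, rng=random):
    """Encrypt an integer 0 <= message < n using generator g = n + 1."""
    while True:
        r = rng.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    n_squared = n * n
    # (n + 1)^message mod n^2 equals 1 + message * n mod n^2.
    return ((1 + message * n) * pow(r, n, n_squared)) % n_squared


def decrypt(n, private_key, ciphertext):
    """Recover the plaintext integer from a ciphertext."""
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Combine two ciphertexts so that the result decrypts to the sum."""
    return (c1 * c2) % (n * n)


n, private_key = generate_keys(61, 53)
c1 = encrypt(n, 120)
c2 = encrypt(n, 345)
total = decrypt(n, private_key, add_encrypted(n, c1, c2))
```

Here the party that adds the two ciphertexts never sees 120 or 345; only the holder of the private key can read the total. This is the property that allows computation to be outsourced without exposing the underlying data.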

A Novel Estimation Method for Non-Probability Samples

Probability samples allow reliable estimation of population characteristics and have been successfully used in statistics for many decades. However, due to rising costs and declining response rates, researchers have begun to develop theory for reliable estimation based on alternative data sources. Non-probability samples, such as web-based opt-in panels, are often relatively easy and inexpensive to obtain, but may suffer from severe self-selection bias where traditional estimation techniques cannot be applied. To help with this, researchers at Statistics Canada have developed nppCART, a novel estimation methodology for non-probability samples. nppCART attempts to correct for the self-selection bias by incorporating additional information from an auxiliary probability sample. nppCART is a variant of the well-known CART algorithm, and may be considered a nonparametric method. It was conceived with the hope that its nonparametric nature may be more useful against nonlinearity or complex interactions among predictor variables than existing non-probability sample estimation techniques. Visit the 2019 Annual Meeting in Calgary site for resources on the project.

Framework

Framework for Responsible Machine Learning Processes

Machine learning is becoming an increasingly integral part of many projects at Statistics Canada. Data scientists are looking to implement a responsible framework for machine learning and artificial intelligence applications that are transitioning to production. The framework includes an evaluation of the project through the use of a checklist, followed by a peer review of the project. As a final step, the methodology is presented to the Scientific Review Committee. The goal of this project is to establish a review process that ensures responsible machine learning processes are put into production while promoting good and ethical data science practices. This framework will also guide data scientists as they develop new projects. For more information, please visit Responsible Machine Learning at Statistics Canada.

Data science expertise

The agency's data scientists are experts in artificial intelligence and machine learning, leading the agency in data science-related research and development.

The data scientists are pioneering new technologies and innovative data science methods, offering expertise in image processing, natural language processing, integration of cloud tools, traceability methods, privacy preserving techniques, information retrieval and much more!

These experts have many areas of specialization, including supervised and unsupervised learning, artificial neural networks, reinforcement learning, data cloud engineering and more.

At Statistics Canada, these innovative methods are used to generate more meaningful, powerful data insights.

Mission: building data science capacity

Statistics Canada's data science mission is to expand the capacity of data science and analytics within the Government of Canada and beyond.

What are the keys to building data science capacity?

Trust—Deliver concrete results while adhering to high ethical standards at all times to build trust in data science methods.

Innovation—Statistics Canada's data scientists are committed to identifying and adopting the latest data science practices to deliver fast results.

Quality—Statistics Canada's data science methods follow rigorous practices, including internal reviews of projects, to ensure high-quality results and valid statistical inference.

Collaboration—The agency is working with partners across the Government of Canada, academia, international partners and other members of the data science community to learn from one another and share leading-edge data science methods.

What are the benefits of data science for Canadians?

Data science allows Statistics Canada to better serve Canadians by creating high-value products and services. By applying the latest machine learning and artificial intelligence practices, the agency is able to process large data sets more quickly, supporting the need for increasingly nuanced data to better understand our country and our economy.

Machine learning can also be used to make sense of unstructured data such as images or sensor data, quickly classify large amounts of information, summarize and extract key information from narratives, provide predictions and assist with research.

Providing timely, high-quality information

As information needs continue to expand, it is critical for national statistical agencies to apply these innovative solutions to support evidence-based decision making. There are many benefits to data science for Canadians, including:

  • faster, timelier access to data products
  • more accurate results
  • more detailed, granular data
  • reduced response burden on households and businesses.

These solutions also benefit Statistics Canada by giving data scientists the ability to process large amounts of unstructured data, eliminating manual work and reducing costs without compromising data quality.

Data science at Statistics Canada

As the world around us continues to evolve and change rapidly in the digital age, data and how they are used have become critically important.

Data science is a rapidly evolving field that can tap into the power of data and empower governments to serve citizens more effectively and efficiently. As the role of national statistical organizations continues to change and expand, these organizations must adapt and embrace new technologies and innovative thinking to support the information needs of society.

Statistics Canada is one of the leaders in the Government of Canada's adoption of data science and artificial intelligence. By taking a collaborative approach to data science, the agency is pushing the boundaries of modernization and harnessing the power of new approaches and technologies to better serve Canadians.

Data science supporting the COVID-19 response

Data science allows statistical agencies to respond quickly to changing economic and social situations. Statistics Canada is using the power of data science to support the COVID-19 response in Canada.

The agency collaborated with Health Canada to visualize the supply and demand information for Personal Protective Equipment (PPE). Before the data visualization could begin, the data needed to be extracted and ingested. The data were coming daily from many different sources (different provincial/territorial governments, other federal departments and private sector companies that had been hired to help source the PPE) and in many different formats (e.g. Word documents, Excel files, PDFs) and required a significant amount of manual work to create standardized reports.

To improve this process, data scientists at Statistics Canada created an algorithm that parses the data into different data entries. Machine learning was used to identify numbers and dates within the text. The structured data were then presented in a PowerBI dashboard that was shared with other government departments to meet their information needs and better understand the supply and demand for PPE in Canada.

For more information on Statistics Canada's response to COVID-19, visit the COVID-19: A data perspective portal.

Commitment to privacy and security

As Statistics Canada continues to implement new technologies and innovations, the agency's commitment to protecting privacy and security remains the highest priority. The agency has rigorous measures in place to preserve confidentiality and privacy in the modern digital era.

The amount of data we gather and use and the power of the insights they generate are increasing rapidly. It is known that data are vulnerable throughout their lifecycle: at rest, in transit and during computation or processing. While the security mechanisms for data protection at rest (e.g. Symmetric Key Encryption) and in transit (e.g. Transport Layer Security) are well studied, Privacy Preserving Technologies have emerged in recent years to provide data protection while enabling data processing, such as in statistical analyses.

Privacy Preserving Technologies, or Privacy Preserving Computation Techniques, is a generic term that covers a broad range of approaches that promise to provide protection while collecting the data, processing them and disseminating the results. These approaches include homomorphic encryption, secure multi-party computation, differential privacy, trusted execution environments and zero-knowledge proofs. The data scientists at Statistics Canada are exploring the use of these existing and emerging privacy preserving technologies to continuously address the privacy preservation needs for highly sensitive data. This will also allow for alternative storage options to permit secure remote computing on encrypted data, to benefit from potential multi-party computation opportunities and to derive insights from distributed and inaccessible data.

For more information on how Statistics Canada protects data, visit Statistics Canada's Trust Centre.

Visit Data science projects at Statistics Canada to see data science in action!

Data Science Centre

In this rapidly-changing digital era, statistical agencies need to find innovative ways to harness the power of data. Statistics Canada is embracing the possibilities of data science to better serve the information needs of Canadians.

Data science at Statistics Canada

Statistics Canada is one of the leaders in the Government of Canada’s adoption of data science and artificial intelligence. Find out about the benefits of data science and how they are being used at Canada’s national statistical agency.

Data Science Network for the Federal Public Service

Join a community of data science enthusiasts to learn all about data science in the public service, collaborate on projects, share information on the latest tools, and much more.

Mission: building data science capacity

Learn about Statistics Canada’s mission to expand the capacity for data science within the Government of Canada and beyond.

Data science expertise

Discover the various areas of expertise of Statistics Canada’s data scientists who are leading the way with cutting-edge research and development.

Data science projects

Explore some of the agency’s innovative projects that are fueled by data science using natural language processing, satellite images, neural networks and other cutting-edge techniques.

Data science resources

Learn more about data science with these helpful resources.

Contact

Contact the Centre for Artificial Intelligence Research and Excellence (CAIRE) for more information about data science at Statistics Canada.

Canadian Centre for Energy Information (CCEI)

Consultation objectives

The Canadian Centre for Energy Information (CCEI) is an independent one-stop shop for comprehensive energy data and expert analysis. The centre compiles, reconciles and integrates energy data from a number of Canadian sources and makes data from multiple providers available free of charge on a user-friendly website. It works collaboratively to harmonize energy definitions, measurements and standards, and improve completeness, coherence and timeliness of Canada's energy information.

The CCEI is being developed by Statistics Canada in partnership with Canada Energy Regulator (CER), Natural Resources Canada (NRCan) and Environment and Climate Change Canada (ECCC). Statistics Canada launched the CCEI to expand publicly available data and analysis, and ensure all Canadians have access to centralized energy information.

The consultations ensured that the CCEI meets users' needs and identified any potential usability issues.

Consultation methodology

Statistics Canada conducted remote usability testing in both official languages with participants from across the country. Participants were asked to complete a series of tasks and to provide feedback on the product.

How participants got involved

This consultation is now closed.

Individuals who wished to obtain more information or to take part in a consultation were asked to contact Statistics Canada by sending an email to statcan.consultations@statcan.gc.ca.

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Results

Overall, the beta version of the CCEI website was well-received by participants. They reported that it was easy to navigate and that it provided easy access to a variety of information.

Participants noted that the following areas worked:

  • The overall look and feel of the website
  • The icons and subjects on the home page
  • The inclusion of interactive features, such as data visualizations

Participants suggested that the following areas could be improved:

  • The use of space in the search results
  • The contextual information provided in the indicators
  • The organization of lists of resources throughout the website

After analysis, recommendations include:

  • Condense the search results as much as possible to allow users to easily browse through them
  • Ensure that relevant contextual information is available for the indicators
  • Ensure that the lists of datasets and publications allow users to easily sort through the content by organizing the lists logically and adding a sort or filter function

Statistics Canada thanks participants for their participation in this consultation. Their insights will guide the agency's web development and ensure that the final products meet users' expectations.