Integrated Business Statistics Program (IBSP)

This guide is designed to assist you as you complete the Annual Capital Expenditures Survey: Preliminary Estimate for 2020 and Intentions for 2021. If you need more information, please call the Statistics Canada Help Line at the number below.

Help Line: 1-833-977-8287 (1-833-97STATS)

Reporting period information

For the purpose of this survey, please report information for your 12-month fiscal period whose final day falls on or between April 1, 2020 and March 31, 2021. Qualifying fiscal periods include:

  • May 1, 2019 – April 30, 2020
  • June 1, 2019 – May 31, 2020
  • July 1, 2019 – June 30, 2020
  • August 1, 2019 – July 31, 2020
  • September 1, 2019 – August 31, 2020
  • October 1, 2019 – September 30, 2020
  • November 1, 2019 – October 31, 2020
  • December 1, 2019 – November 30, 2020
  • January 1, 2020 – December 31, 2020
  • February 1, 2020 – January 31, 2021
  • March 1, 2020 – February 28, 2021
  • April 1, 2020 – March 31, 2021

Here are other examples of fiscal periods that fall within the required dates:

  • September 18, 2019 to September 15, 2020 (e.g., floating year-end)
  • June 1, 2020 to December 31, 2020 (e.g., a newly opened business)
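
As a quick illustration of the rule above, the qualifying check can be sketched in a few lines of Python (the helper name is ours, not part of the survey):

```python
from datetime import date

# Hypothetical helper: a fiscal period qualifies for this survey cycle
# if its final day falls on or between April 1, 2020 and March 31, 2021.
def period_qualifies(period_end: date) -> bool:
    return date(2020, 4, 1) <= period_end <= date(2021, 3, 31)

print(period_qualifies(date(2020, 12, 31)))  # True: calendar year 2020
print(period_qualifies(date(2021, 4, 30)))   # False: ends after the window
```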

Dollar amounts

  • all dollar amounts reported should be rounded to thousands of Canadian dollars (e.g., $6,555,444.00 should be rounded to $6,555);
  • exclude sales tax;
  • your best estimates are acceptable when precise figures are not available;
  • if there are no capital expenditures, please enter '0'.
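
The rounding rule above can be sketched as follows (the helper name is ours; note Python's built-in round uses banker's rounding at exact halves, which is acceptable given that best estimates are allowed):

```python
# Round a dollar amount to thousands of Canadian dollars, as required
# by the survey (e.g., $6,555,444.00 is reported as 6,555).
def to_thousands(amount_dollars: float) -> int:
    return round(amount_dollars / 1000)

print(to_thousands(6_555_444.00))  # 6555
print(to_thousands(0))             # 0 -- report '0' when there are no capital expenditures
```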

Definitions

What are Capital Expenditures?

Capital Expenditures are the gross expenditures on fixed assets for use in the operations of your organization or for lease or rent to others. Gross expenditures are expenditures before deducting proceeds from disposals, and credits (capital grants, donations, government assistance and investment tax credits).

Include:

  • Cost of all new buildings, engineering, machinery and equipment which normally have a life of more than one year and are charged to fixed asset accounts
  • Modifications, acquisitions and major renovations
  • Capital costs such as feasibility studies, architectural, legal, installation and engineering fees
  • Subsidies received and used for capital expenditures
  • Capitalized interest charges on loans with which capital projects are financed
  • Work done by own labour force
  • Additions to capital work in progress.

Exclude:

  • transfers from capital work in progress (construction-in-progress) to fixed assets accounts
  • assets associated with the acquisition of companies
  • property developed for sale and machinery or equipment acquired for sale (inventory).

How to Treat Leases

Include:

  • assets acquired as a lessee through either a capital or financial lease;
  • assets acquired for lease to others as an operating lease.

Industry characteristics

Report the value of the projects expected to be put in place during the year. Include the gross expenditures (including subsidies) on fixed assets for use in the operations of your organization or for lease or rent to others. Include all capital costs such as feasibility studies, architectural, legal, installation and engineering fees as well as work done by your own labour force. Include all additions to work in progress.

New Assets, Renovation, Retrofit includes both upgrades to existing assets and acquisitions of new assets.

Purchase of Used Canadian Assets

Definition: Used fixed assets may be defined as existing buildings, structures or machinery and equipment which have been previously used by another organization in Canada that you have acquired during the time period being reported on this questionnaire.

Explanation: The objective of our survey is to measure gross annual new acquisitions to fixed assets separately from the acquisition of gross annual used fixed assets in the Canadian economy as a whole.

Hence, the acquisition of a used Canadian fixed asset should be reported separately, since such an acquisition does not change the aggregate domestic inventory of fixed assets; it simply transfers an asset from one organization in Canada to another.

Imports of used assets, on the other hand, should be included with the new assets (Column 1) because they are newly acquired for the Canadian economy.

Work in Progress

Work in progress represents accumulated costs since the start of capital projects which are intended to be capitalized upon completion.

Land

Capital expenditures for land should include all costs associated with the purchase of the land that are not amortized or depreciated.

Residential Construction

Report the value of residential structures including the housing portion of multi-purpose projects and of townsites.

Exclude:

  • buildings that have accommodation units without self-contained or exclusive use of bathroom and kitchen facilities (e.g., some student and senior citizen residences)
  • the non-residential portion of multi-purpose projects and of townsites
  • associated expenditures on services

The exclusions should be included in the appropriate construction (e.g., non-residential) asset.

Non-Residential Building Construction (excluding land purchase and residential construction)

Building construction represents any permanent structure with walls and a roof that affords people and/or materials protection and shelter from the social and/or physical environment.

For example, building construction represents expenditures on aircraft hangars, factories, hospitals, hotels, office buildings, railway stations, schools and shopping centres.

Report the total cost incurred during the year of building construction (contract and by own employees) whether for your own use or rent to others.

Include also:

  • the cost of demolition of buildings, land servicing and of site-preparation
  • leasehold and land improvements
  • all preconstruction planning and design costs such as engineer and consulting fees and any materials supplied to construction contractors for installation, etc.
  • townsite facilities, such as streets, sewers, stores, schools.

Non-residential engineering construction

Engineering construction encompasses the direct or indirect conveyance of people, machinery, materials, gases and/or electrical impulses. It also includes free-standing structures which contain or restrain such objects, either as part of such conveyance or separately and independently.

In addition, the cost associated with significantly altering any terrain in the preparation for specialized use of that terrain will fall under engineering construction.

Report the total cost incurred during the year of engineering construction (contract and by own employees) whether for your own use or rent to others. Include also:

  • the cost of demolition of buildings, land servicing and of site-preparation
  • leasehold and land improvements
  • all preconstruction planning and design costs such as engineer and consulting fees and any materials supplied to construction contractors for installation, etc.
  • oil or gas pipelines, including pipe and installation costs
  • communication engineering, including transmission support structures, cables and lines, etc.
  • electric power engineering, including wind and solar plants, nuclear production plants, power distribution networks, etc.

Machinery and Equipment

Report the total cost incurred during the year of all new machinery and equipment, whether for your own use or for lease or rent to others. Any capitalized tooling should also be included. Include progress payments paid out before delivery in the year in which such payments are made. Receipts from the sale of your own fixed assets, or allowances for scrap or trade-in, should not be deducted from your total capital expenditures. Any balance owing or holdbacks should be reported in the year the cost is incurred.

Include:

  • automobiles, trucks, professional and scientific equipment, office and store furniture and appliances
  • computers (hardware and software), broadcasting, telecommunication and other information and communication technology equipment
  • motors, generators, transformers
  • any capitalized tooling expenses
  • progress payments paid out before delivery in the year in which such payments are made
  • any balance owing or holdbacks should be reported in the year the cost is incurred
  • leasehold improvements.

Software

Capital expenditures for software should include all costs associated with the purchase or development of software.

Include:

  • Pre-packaged software
  • Custom software developed in-house/own account
  • Custom software design and development, contracted out

Research and Development

Research and experimental development (R&D) comprise creative and systematic work undertaken in order to increase the stock of knowledge – including knowledge of humankind, culture and society – and to devise new applications of available knowledge.

For an activity to be an R&D activity, it must satisfy five core criteria:

  1. To be aimed at new findings (novel);
  2. To be based on original, not obvious, concepts and hypotheses (creative);
  3. To be uncertain about the final outcome (uncertainty);
  4. To be planned and budgeted (systematic);
  5. To lead to results that could possibly be reproduced (transferable and/or reproducible).

The term R&D covers three types of activity: basic research, applied research and experimental development. Basic research is experimental or theoretical work undertaken primarily to acquire new knowledge of the underlying foundations of phenomena and observable facts, without any particular application or use in view. Applied research is original investigation undertaken in order to acquire new knowledge. It is, however, directed primarily towards a specific, practical aim or objective. Experimental development is systematic work, drawing on knowledge gained from research and practical experience and producing additional knowledge, which is directed to producing new products or processes or to improving existing products or processes.

Data science resources

Join the Data Science Network for the Federal Public Service

Calling all data science enthusiasts! Subscribe to the Data Science Network for the Federal Public Service newsletter to discover the world of data science and find opportunities to collaborate with peers.

Data science projects

Data science plays an important role at Statistics Canada. All across the agency, new data science methods are being used to make our projects more efficient and provide better data insights to Canadians.

Project categories

Contact the Data Science Centre for more information on Statistics Canada's data science projects.

Natural language processing

Event Detection and Sentiment Indicators

Statistics Canada is developing a tool to detect specific economic events by analyzing millions of news articles. The tool uses machine learning algorithms to research and summarize information from the articles and organize the data into an informative dashboard. Time that was once spent on research can now be spent analyzing the reasons for economic changes.

The agency is also exploring the development of sentiment indicators to measure economic tendencies and their connection with key economic variables. Based on positive and negative interpretations of economic-related news articles, these indicators could allow subject matter experts to gain better insights into economic trends by industry, and support the publication of near real-time economic indicators.
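
As a toy illustration of the idea behind such indicators (the word lists and headlines below are invented, and the production system uses machine learning rather than fixed lexicons), a crude sentiment index over news text might look like:

```python
# Lexicon-based sketch of a news sentiment indicator: score each article
# by counting positive and negative words, then average into an index.
POSITIVE = {"growth", "gains", "surge", "record", "strong"}
NEGATIVE = {"losses", "decline", "layoffs", "slump", "weak"}

def article_score(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

headlines = [
    "manufacturing posts record gains on strong exports",
    "retail sector faces layoffs amid weak demand",
    "housing starts surge to new highs",
]
index = sum(article_score(h) for h in headlines) / len(headlines)
print(round(index, 2))  # 0.67 -- mildly positive overall tone
```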

Retail Scanner Data

Statistics Canada publishes the total amount of products sold, as classified by the North American Product Classification System (NAPCS). Large scanner databases, with millions of records, are currently available from major retailers. Previously, products were assigned to NAPCS with dictionary-based coding, in combination with manual coding when required, according to their description and other indicators. Statistics Canada uses machine learning to classify all of the product descriptions in the scanner data to the NAPCS and then obtains aggregate sales for each area. This approach has resulted in a higher degree of automation, as well as accurate, detailed retail data and a reduced response burden for major retailers.

Survey of Sexual Misconduct in the Canadian Armed Forces Comment Classification

Data scientists at Statistics Canada created a machine learning model to automatically classify the electronic comments from respondents of the Survey of Sexual Misconduct in the Canadian Armed Forces (SSMCAF). The SSMCAF required automated classification of comments into five categories: personal story, negative, positive, advice for content, and other. The machine learning model coded 6,000 comments from the first 2018 survey cycle with 89% accuracy for French and English comments. This approach will be expanded to other surveys at Statistics Canada.

Census 2021 Comments Classification

Statistics Canada has developed a machine learning algorithm to classify 1.8 million French and English respondent comments from the 2021 Census. This algorithm quickly and objectively classifies comments into different classes. The model is trained on comments from the 2016 Census and the 2019 Census test. Respondent feedback is used to support decision making regarding content determination for the next census and to monitor factors such as respondent burden. Visit 2021 Census Comment Classification for more information about this project.

Canadian Coroner and Medical Examiner Database (CCMED) Dynamic Topic Modelling

Statistics Canada has designed and deployed a dynamic topic modelling system. This system uses data from the Canadian Coroner and Medical Examiner Database to detect emerging narratives on causes of death. The objective is to provide analysts with patterns of death over time. For more information, please visit Topic Modelling and Dynamic Topic Modelling: A Technical Review.

Canadian Export Reporting System Text Classification

The Canada Border Services Agency (CBSA) and Statistics Canada recently developed a new web-based reporting tool for Canadian exporters to non-US countries called the Canadian Export Reporting System (CERS). CERS requires that an exporter self-code their goods' Harmonized System (HS) code plus an additional text description for more information for CBSA. The Data Science Division, in partnership with the International Accounts and Trade Division (IATD), developed a FastText machine learning model to classify the additional text descriptions for the exported commodities to the HS codes so that IATD can use them to validate the self-coded HS codes provided by the exporters. The motivation for adding this validation is that analysis of the data from the previous systems revealed inconsistencies between the product description and the code chosen by the exporter. With the move to CERS, electronic reporting is now mandatory and may result in an increase of cases with such inconsistencies, which is why an automated solution for review is being developed.
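
The production model is built with FastText; as a self-contained stand-in, the underlying idea — mapping free-text goods descriptions to codes — can be illustrated with a crude bag-of-words scorer (the HS-style codes, descriptions and training examples below are all invented):

```python
from collections import Counter, defaultdict

# Toy training data: (goods description, HS-style code). All examples invented.
train = [
    ("frozen atlantic salmon fillets", "0304"),
    ("fresh salmon whole fish", "0302"),
    ("softwood lumber spruce boards", "4407"),
    ("pine lumber planks kiln dried", "4407"),
    ("frozen fish fillets cod", "0304"),
]

# Count word occurrences per code -- a crude stand-in for FastText's
# learned word vectors and linear classifier.
word_counts = defaultdict(Counter)
for text, code in train:
    word_counts[code].update(text.split())

def classify(description: str) -> str:
    words = description.split()
    # Score each code by how often its training text used the description's words.
    scores = {code: sum(counts[w] for w in words)
              for code, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(classify("frozen fillets of salmon"))  # 0304
```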

Image classification

In-Season Crop Classification

Monitoring the production of farms in Canada is an important but costly undertaking. Surveys and in-person inspections require a large amount of resources, and the current approach to predicting crop yields is time-consuming. For these reasons, Statistics Canada is modernizing crop classification using an image classification approach. An automated pipeline is used to download and process freely available Landsat-8 satellite imagery throughout the crop season.

Crop types are predicted using satellite imagery and the application of neural networks. The new model estimations are then used to update a database, allowing end users to acquire the most up-to-date estimates throughout the crop season. Initial results show that this approach is much faster and will reduce the survey response burden for farm-owners, especially during the busy times of the year.

Geo-Spatial Construction Starts Using Satellite Images

The Canada Mortgage and Housing Corporation (CMHC) tracks the starts and completions of residential building construction projects across Canada, and the results are used by Statistics Canada to calibrate estimates for its Investment in Building Construction program. Statistics Canada has been employing various data science methods to detect construction starts from satellite images, such as using image augmentation to diversify and increase the data set. These methods enabled data scientists to detect the area of a building in the pre-foundation and foundation phases. Pre-foundation work consists of creating footings and concrete slabs to support the foundation walls, including excavation; the foundation is part of a structural system that supports and anchors the superstructure of a building. AI model building and evaluation required processing more than 1,400 km² of imagery at 50 cm resolution over many months, for which a highly scalable and efficient processing pipeline was created. The resulting artificial intelligence algorithms might eventually lead to more accurate and timely data, while helping to eliminate existing data gaps for the non-residential sector and for small and remote communities excluded from the current survey.

Agriculture Greenhouse Detection Using Aerial Images

The greenhouse project has been using Earth Observation data to identify greenhouses and measure their total area in Canada, along with a proof of concept to classify greenhouses by the produce grown inside them and by cover type (glass or plastic). In an effort to produce more timely estimates and reduce the need for survey respondents, data scientists at Statistics Canada are working to automate the identification process using machine learning, administrative data sources and other technologies, such as satellite imagery and high-resolution aerial imagery.

PDF extraction

Extraction of Economic Variables from Financial Reports

Statistics Canada has been applying data science solutions to extract information from PDFs and other documents in a timely and more efficient manner. For example, Statistics Canada has been experimenting with the historical dataset from SEDAR, a system used by publicly traded Canadian companies to file securities documents with various Canadian securities commissions.

Statistics Canada's data scientists developed a state-of-the-art machine learning pipeline that correctly identifies and extracts key financial variables (e.g., total assets) from the correct table (e.g., the balance sheet) in a company's annual financial statement. The table-extraction algorithm, SLICE (Spatial Layout-Based Information and Content Extraction), was developed within Statistics Canada and made open source under the MIT licence. SLICE is a unique computer vision algorithm that simultaneously uses textual, visual and layout information to segment pages into a tabular structure. The pipeline therefore turns a large number of unstructured public documents from SEDAR into structured datasets, allowing the automation of information extraction related to Canadian companies. This significantly reduces the manual hours spent identifying and capturing the required information and reduces data redundancy within the organization by providing a one-point solution to access information.

Public Sector Statistics Division Scanned PDF Extraction

The Public Sector Statistics Division (PSSD) at Statistics Canada receives financial statements from provincial governments and their respective municipalities on a quarterly and annual basis. These statements are in text-based and scanned PDF formats, and store valuable information in tables. Each row of a table contains numerical values which must be manually extracted and stored in a database for further analysis, but this manual process is time-consuming and subject to human error. Data scientists at Statistics Canada developed a proof of concept that extracts financial data from reported financial statements using an in-house machine learning algorithm and displays them in a tabular format that can be edited by analysts. Additionally, the data are auto-coded, and records of previous and current year numerical values are provided. Once the project transitions to production, it will reduce data redundancy within the organization by providing a one-point solution to access information, and save analysts in the PSSD the manual hours spent identifying and capturing required information.

Predictive Analytics

Nowcasting of Economic Indicators

Many initiatives at Statistics Canada work towards near real-time estimates and the production of advance indicators for many of the agency's key data series. In the Investment in Building Construction program, building permit values are a key series for which an early indicator could be produced via nowcasting. To facilitate the effort, an analytical cloud environment was created which allows analysts to leverage timely external data and advanced time series models. An extensive time series database was created containing economic time series (from Statistics Canada programs), external open data, temperature sensor data and stock market data. This environment may potentially pave the way towards a generalized nowcasting system at Statistics Canada. Exploratory analysis applied nowcasting models, including ARIMA-X, Prophet and the machine learning algorithm XGBoost, to several economic indicators, including monthly building permit values. ARIMA-X and Prophet performed similarly in terms of mean absolute percentage error and mean directional accuracy, while XGBoost with external open data did not perform as well.

Crop Yield Predictions

Statistics Canada recently completed a research project for the Field Crop Reporting Series (FCRS) on the use of machine learning, specifically supervised regression techniques, for early-season crop yield prediction. The objective was to investigate whether this could improve the precision of the existing crop yield prediction method, while also reducing the survey response burden for busy farm operators. The main contribution of the research project was the adaptation of rolling window forward validation (RWFV) as the validation protocol. RWFV is a special case of forward validation, a family of validation protocols designed to prevent temporal information leakage in supervised learning based on time series data. Our adaptation of RWFV enabled a customized validation protocol that realistically reflects the statistical production context of the FCRS. Visit Use of Machine Learning for Crop Yield Prediction for more details on the technical side of this project.
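
The splitting logic behind rolling window forward validation can be sketched as follows (the window lengths and the index-list representation are our illustrative choices, not the FCRS configuration):

```python
# Rolling window forward validation (RWFV) for time-series data: each fold
# trains on a fixed-length window and validates on the period immediately
# after it, so no future information leaks into training.
def rwfv_splits(n_periods: int, train_window: int, val_window: int = 1):
    splits = []
    start = 0
    while start + train_window + val_window <= n_periods:
        train_idx = list(range(start, start + train_window))
        val_idx = list(range(start + train_window,
                             start + train_window + val_window))
        splits.append((train_idx, val_idx))
        start += 1  # roll the window forward by one period
    return splits

# Ten annual periods, training on 5 years and validating on the next year:
for train_idx, val_idx in rwfv_splits(10, train_window=5):
    print(train_idx, "->", val_idx)
```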

Hospital Occupancy Forecast

Data scientists at Statistics Canada are helping in the fight against COVID-19 by creating short-term hospital occupancy forecasts based on two daily inputs, using Ottawa hospital data as a test case. The inputs are daily new hospital admission counts and hospital midnight in-patient headcounts. Forecasts are produced using two hierarchical Bayesian models. The first models the random delay between the unobserved event of COVID-19 infection and hospital admission, for the subgroup of infected individuals who will be hospitalized due to COVID-19. The second models the random delay between hospital admission and discharge/death.

A series of 25 consecutive weeks of mock forecasts based on real data was performed to assess the effectiveness of the forecast model. The resulting credible bands consistently encompassed the real hospitalization counts within one week after the respective training data cut-offs, while remaining sufficiently narrow to be informative. The results of this project strongly suggest the feasibility of accurate and informative hospitalization forecasts at the municipal level, provided timely hospital admission and discharge/death data are available.

High Pandemic Hubs

Data scientists at Statistics Canada created a research project using a general machine learning framework to identify and predict health regions that could be considered vulnerable or at high risk of increased COVID-19 infection rates. By identifying these regions, federal and provincial health authorities would be able to divert public health resources, such as PPE or frontline workers, from lower-risk regions to higher-risk regions, and would also be able to contain cases in higher-risk areas sooner through contact tracing and quarantine measures.

This effort also contributed to the creation of an interactive dashboard that could allow users to monitor COVID-19 cases and deaths at the health-region level and to choose among multiple risk prediction models and approaches.

Using COVID-19 Epidemiological Modelling to Inform Personal Protective Equipment Supply and Demand in Canada

At the beginning of the pandemic, there were concerns surrounding Personal Protective Equipment (PPE) preparedness in Canada and whether there was enough supply to support the healthcare sector, and other sectors of the economy throughout the pandemic. In response to this emerging need, Statistics Canada customized an existing epidemiological model to allow policy makers to stress-test the PPE supply under various epidemiological scenarios. Projections generated from this epidemiological model have been used by the PPE supply and demand model to compare on-hand and in-bound supplies with demand projections over twelve months. For more information, please visit Modelling SARS-CoV-2 Dynamics to Forecast PPE Demand.

Optimal Social Distancing Policy via Reinforcement Learning

Data scientists at Statistics Canada collaborated with the Public Health Agency of Canada to develop a novel epidemiological modelling framework optimizing Non-Pharmaceutical Interventions using Reinforcement Learning. This model determines the best set of population behaviours to minimize the spread of an infection within simulations. Visit Non-Pharmaceutical Intervention and Reinforcement Learning for more details on this project.

Research

Statistics Canada's First Quantum Machine Learning Project: A collaboration with Université de Sherbrooke

Quantum computing—a new way of computing that uses principles of quantum mechanics to store and process information—holds a lot of promise as a solution for some computationally heavy processes and algorithms. Increasingly, governments and major companies are working to assess how quantum computing will impact their businesses in the near future.

As of June 2021, Statistics Canada is collaborating with the Université de Sherbrooke to explore the potential of quantum computing and identify opportunities early in its development. The six-month project marks the first collaboration between Statistics Canada and the Quantum Hub at Université de Sherbrooke's Institut quantique (IQ). The Quantum Hub offers its members cloud-based access to advanced quantum computing systems, as well as a community of experts to support quantum research projects.

The project will explore ways to optimize the agency's machine learning processes and text classification computations, and how this technology could be used to support Statistics Canada's goal of providing high-quality data and insights to Canadians.

Homomorphic Encryption

Data security remains one of the highest priorities at Statistics Canada. Our data scientists are training a machine learning text classifier that uses homomorphic encryption to protect data while they are being processed. The data are protected at two points: at ingestion, allowing data files to be processed remotely or in the cloud, and at dissemination, allowing accredited external researchers in virtual labs to access more data in a secure manner. Homomorphic encryption not only ensures data protection but also acts as a solution for outsourcing computation. Visit A Brief Survey of Privacy Preserving Technologies for more information on homomorphic encryption and other privacy-preserving approaches.
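
As a toy illustration of what "computing on encrypted data" means, textbook RSA's multiplicative property stands in below for the lattice-based schemes actually used in privacy-preserving machine learning; the tiny primes are for demonstration only and offer no security:

```python
# Textbook RSA is multiplicatively homomorphic:
#   Enc(a) * Enc(b) mod n  decrypts to  a * b  (when a*b < n).
p, q = 61, 53
n = p * q                  # modulus (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse)

def enc(m: int) -> int:
    return pow(m, e, n)

def dec(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 6
# The product is computed on ciphertexts, without ever seeing a or b:
product_cipher = (enc(a) * enc(b)) % n
print(dec(product_cipher))  # 42
```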

A Novel Estimation Method for Non-Probability Samples

Probability samples allow reliable estimation of population characteristics and have been successfully used in statistics for many decades. However, due to rising costs and declining response rates, researchers have begun to develop theory for reliable estimation based on alternative data sources. Non-probability samples, such as web-based opt-in panels, are often relatively easy and inexpensive to obtain, but may suffer from severe self-selection bias where traditional estimation techniques cannot be applied. To help with this, researchers at Statistics Canada have developed nppCART, a novel estimation methodology for non-probability samples. nppCART attempts to correct for the self-selection bias by incorporating additional information from an auxiliary probability sample. nppCART is a variant of the well-known CART algorithm, and may be considered a nonparametric method. It was conceived with the hope that its nonparametric nature may be more useful against nonlinearity or complex interactions among predictor variables than existing non-probability sample estimation techniques. Visit the 2019 Annual Meeting in Calgary site for resources on the project.
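
The core idea — estimating each unit's self-selection propensity and weighting by its inverse — can be sketched as follows (nppCART's tree-based propensity model, fitted against the auxiliary probability sample, is replaced here by fixed per-group propensities, and all numbers are invented):

```python
# Inverse-propensity weighting for a non-probability sample (toy example).
sample = [
    # (y value, group) -- e.g., opt-in web panel respondents
    (10, "young"), (12, "young"), (11, "young"),
    (20, "old"), (22, "old"),
]
# Suppose the auxiliary probability sample suggests young people opt in
# three times as often as old people:
propensity = {"young": 0.3, "old": 0.1}

# Hajek-style weighted mean: up-weight the under-represented group.
num = sum(y / propensity[g] for y, g in sample)
den = sum(1 / propensity[g] for _, g in sample)
estimate = num / den

print(round(estimate, 2))  # 17.67 -- vs. a naive mean of 15.0
```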

Framework

Framework for Responsible Machine Learning Processes

Machine learning is becoming an increasingly integral part of many projects at Statistics Canada. Data scientists are looking to implement a responsible framework for machine learning and artificial intelligence applications that are transitioning to production. The framework includes an evaluation of the project through the use of a checklist, followed by a peer review of the project. As a final step, the methodology is presented to the Scientific Review Committee. The goal of this project is to establish a review process that ensures responsible machine learning processes are put into production while promoting good and ethical data science practices. This framework will also guide data scientists as they develop new projects. For more information, please visit Responsible Machine Learning at Statistics Canada.

Data science expertise

The agency's data scientists are experts in artificial intelligence and machine learning, leading the agency in data science-related research and development.

The data scientists are pioneering new technologies and innovative data science methods, offering expertise in image processing, natural language processing, integration of cloud tools, traceability methods, privacy preserving techniques, information retrieval and much more!

These experts have many areas of specialization, including supervised and unsupervised learning, artificial neural networks, reinforcement learning, data cloud engineering and more.

At Statistics Canada, these innovative methods are used to make more meaningful, powerful data insights.

Mission: building data science capacity

Statistics Canada's data science mission is to expand the capacity of data science and analytics within the Government of Canada and beyond.

What are the keys to building data science capacity?

Trust—Deliver concrete results while adhering to high ethical standards at all times to build trust in data science methods.

Innovation—Statistics Canada's data scientists are committed to identifying and adopting the latest data science practices to deliver fast results.

Quality—Statistics Canada's data science methods follow rigorous practices, including internal reviews of projects, to ensure high-quality results and valid statistical inference.

Collaboration—The agency is working with partners across the Government of Canada, academia, international partners and other members of the data science community to learn from one another and share leading-edge data science methods.

What are the benefits of data science for Canadians?

Data science allows Statistics Canada to better serve Canadians by creating high-value products and services. By applying the latest machine learning and artificial intelligence practices, the agency is able to process large data sets in shorter periods of time, supporting the need for increasingly nuanced data to better understand our country and our economy.

Machine learning can also be used to make sense of unstructured data such as images or sensor data, quickly classify large amounts of information, summarize and extract key information from narratives, provide predictions and assist with research.

Providing timely, high-quality information

As information needs continue to expand, it is critical for national statistical agencies to apply these innovative solutions to support evidence-based decision making. There are many benefits to data science for Canadians, including:

  • faster, timelier access to data products
  • more accurate results
  • more detailed, granular data
  • reduced response burden on households and businesses

These solutions also benefit Statistics Canada by giving data scientists the ability to process large amounts of unstructured data, eliminating manual work and reducing costs without compromising data quality.

Data science at Statistics Canada

As the world around us continues to evolve rapidly in the digital age, data, and how they are used, have never been more important.

Data science is a rapidly evolving field that can tap into the power of data and empower governments to serve citizens more effectively and efficiently. As the role of national statistical organizations continues to change and expand, these organizations must adapt and embrace new technologies and innovative thinking to support the information needs of society.

Statistics Canada is one of the leaders in the Government of Canada's adoption of data science and artificial intelligence. By taking a collaborative approach to data science, the agency is pushing the boundaries of modernization and harnessing the power of new approaches and technologies to better serve Canadians.

Data science supporting the COVID-19 response

Data science allows statistical agencies to respond quickly to changing economic and social situations. Statistics Canada is using the power of data science to support the COVID-19 response in Canada.

The agency collaborated with Health Canada to visualize supply and demand information for personal protective equipment (PPE). Before the data visualization could begin, the data needed to be extracted and ingested. The data arrived daily from many different sources (provincial and territorial governments, other federal departments, and private sector companies hired to help source the PPE) and in many different formats (e.g., Word documents, Excel files, PDFs), so creating standardized reports required a significant amount of manual work.

To improve this process, data scientists at Statistics Canada created an algorithm that parses the incoming files into structured data entries. Machine learning was used to identify numbers and dates within the text. The structured data were then presented in a Power BI dashboard that was shared with other government departments to meet their information needs and better understand the supply of and demand for PPE in Canada.
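As an illustration only (this is not the agency's production algorithm), a rule-based parser for such semi-structured text might pull out dates and quantities along these lines:

```python
import re

def parse_line(line):
    """Illustrative parser: extract ISO-style dates first, then quantities,
    so that date digits are not double-counted as quantities."""
    date_pattern = r"\d{4}-\d{2}-\d{2}"
    dates = re.findall(date_pattern, line)
    remainder = re.sub(date_pattern, " ", line)
    # Quantities: either comma-grouped (e.g., 5,000) or plain runs of digits.
    numbers = re.findall(r"\d{1,3}(?:,\d{3})+|\d+", remainder)
    return {"dates": dates, "numbers": numbers}

parse_line("Received 5,000 surgical masks on 2020-04-15")
# -> {"dates": ["2020-04-15"], "numbers": ["5,000"]}
```

A production pipeline would replace the fixed patterns with learned models, as described above, but the overall structure (extract, standardize, load into a dashboard) is the same.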

For more information on Statistics Canada's response to COVID-19, visit COVID-19: A data perspective portal.

Commitment to privacy and security

As Statistics Canada continues to implement new technologies and innovations, the agency's commitment to protecting privacy and security remains the highest priority. The agency has rigorous measures in place to preserve confidentiality and privacy in the modern digital era.

The amount of data we gather and use and the power of the insights they generate are increasing rapidly. It is known that data are vulnerable throughout their lifecycle: at rest, in transit and during computation or processing. While the security mechanisms for protecting data at rest (e.g., symmetric key encryption) and in transit (e.g., Transport Layer Security) are well studied, Privacy Preserving Technologies have emerged in recent years to provide protection while still enabling data processing, such as in statistical analyses.

Privacy Preserving Technologies, or Privacy Preserving Computation Techniques, is a generic term covering a broad range of approaches that promise to provide protection while data are collected and processed and while results are disseminated. These approaches include homomorphic encryption, secure multi-party computation, differential privacy, trusted execution environments and zero-knowledge proofs. The data scientists at Statistics Canada are exploring these existing and emerging privacy preserving technologies to address the privacy preservation needs of highly sensitive data. This work will also open up alternative storage options, permit secure remote computing on encrypted data, enable potential multi-party computation opportunities and help derive insights from distributed and otherwise inaccessible data.
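To make one of these techniques concrete, here is a minimal sketch of differential privacy using the Laplace mechanism (an illustrative example, not Statistics Canada's implementation). A counting query changes by at most 1 when one record is added or removed, so adding Laplace noise of scale 1/ε satisfies ε-differential privacy:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by inverting its cumulative distribution."""
    u = rng.random() - 0.5
    while u == -0.5:  # guard against log(0) on the astronomically rare u = -0.5
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, rng):
    """Release a count under epsilon-differential privacy.
    Counting queries have sensitivity 1, so Laplace(1/epsilon) noise suffices."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
noisy = dp_count(1200, 0.5, rng)  # true count 1200, privacy budget epsilon = 0.5
```

A smaller ε means more noise and stronger privacy; the analyst sees only the noisy value, never the true count.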

For more information on how Statistics Canada protects data, visit Statistics Canada's Trust Centre.

Visit Data science projects at Statistics Canada to see data science in action!

Data Science Centre

In this rapidly-changing digital era, statistical agencies need to find innovative ways to harness the power of data. Statistics Canada is embracing the possibilities of data science to better serve the information needs of Canadians.

Data science at Statistics Canada

Statistics Canada is one of the leaders in the Government of Canada’s adoption of data science and artificial intelligence. Find out about the benefits of data science and how it is being used at Canada’s national statistical agency.

Data Science Network for the Federal Public Service

Join a community of data science enthusiasts to learn all about data science in the public service, collaborate on projects, share information on the latest tools, and much more.

Mission: building data science capacity

Learn about Statistics Canada’s mission to expand the capacity for data science within the Government of Canada and beyond.

Data science expertise

Discover the various areas of expertise of Statistics Canada’s data scientists who are leading the way with cutting-edge research and development.

Data science projects

Explore some of the agency’s innovative projects that are fueled by data science using natural language processing, satellite images, neural networks and other cutting-edge techniques.

Data science resources

Learn more about data science with these helpful resources.

Contact

Contact the Data Science Centre for more information about data science at Statistics Canada.

Canadian Centre for Energy Information (CCEI)

Consultation objectives

The Canadian Centre for Energy Information (CCEI) is an independent one-stop shop for comprehensive energy data and expert analysis. The centre compiles, reconciles and integrates energy data from a number of Canadian sources and makes data from multiple providers available free of charge on a user-friendly website. It works collaboratively to harmonize energy definitions, measurements and standards, and improve completeness, coherence and timeliness of Canada's energy information.

The CCEI is being developed by Statistics Canada in partnership with Canada Energy Regulator (CER), Natural Resources Canada (NRCan) and Environment and Climate Change Canada (ECCC). Statistics Canada launched the CCEI to expand publicly available data and analysis, and ensure all Canadians have access to centralized energy information.

The consultations ensured that the CCEI meets users' needs and identified any potential usability issues.

Consultation methodology

Statistics Canada conducted remote usability testing in both official languages with participants from across the country. Participants were asked to complete a series of tasks and to provide feedback on the product.

How participants got involved

This consultation is now closed.

Individuals who wished to obtain more information or to take part in a consultation were asked to contact Statistics Canada by sending an email to statcan.consultations@statcan.gc.ca.

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Results

Overall, the beta version of the CCEI website was well-received by participants. They reported that it was easy to navigate and that it provided easy access to a variety of information.

Participants noted that the following areas worked:

  • The overall look and feel of the website
  • The icons and subjects on the home page
  • The inclusion of interactive features, such as data visualizations

Participants suggested that the following areas could be improved:

  • The use of space in the search results
  • The contextual information provided in the indicators
  • The organization of lists of resources throughout the website

After analysis, recommendations include:

  • Condense the search results as much as possible to allow users to easily browse through them
  • Ensure that relevant contextual information is available for the indicators
  • Ensure that the lists of datasets and publications allow users to easily sort through the content by organizing the lists logically and adding a sort or filter function

Statistics Canada thanks everyone who took part in this consultation. Their insights will guide the agency's web development and ensure that the final products meet users' expectations.

Retail Commodity Survey: CVs for Total Sales (April 2020)

NAPCS-CANADA | 202001 | 202002 | 202003 | 202004
Total commodities, retail trade commissions and miscellaneous services | 0.58 | 0.60 | 0.53 | 0.56
Retail Services (except commissions) [561] | 0.58 | 0.60 | 0.52 | 0.56
Food at retail [56111] | 0.86 | 0.54 | 0.50 | 0.78
Soft drinks and alcoholic beverages, at retail [56112] | 0.51 | 0.42 | 0.45 | 0.57
Cannabis products, at retail [56113] | 0.00 | 0.00 | 0.00 | 0.00
Clothing at retail [56121] | 1.01 | 0.72 | 0.94 | 1.64
Footwear at retail [56122] | 1.17 | 1.27 | 1.80 | 3.64
Jewellery and watches, luggage and briefcases, at retail [56123] | 5.07 | 5.19 | 10.71 | 31.84
Home furniture, furnishings, housewares, appliances and electronics, at retail [56131] | 0.90 | 0.67 | 0.64 | 0.78
Sporting and leisure products (except publications, audio and video recordings, and game software), at retail [56141] | 2.60 | 3.68 | 3.45 | 3.78
Publications at retail [56142] | 8.20 | 6.64 | 8.24 | 12.62
Audio and video recordings, and game software, at retail [56143] | 5.38 | 4.88 | 0.99 | 0.84
Motor vehicles at retail [56151] | 1.79 | 1.98 | 2.11 | 2.39
Recreational vehicles at retail [56152] | 3.98 | 4.74 | 4.73 | 4.70
Motor vehicle parts, accessories and supplies, at retail [56153] | 1.46 | 1.51 | 1.71 | 2.03
Automotive and household fuels, at retail [56161] | 2.34 | 2.50 | 1.98 | 1.95
Home health products at retail [56171] | 2.91 | 2.81 | 2.28 | 2.66
Infant care, personal and beauty products, at retail [56172] | 2.69 | 2.77 | 2.66 | 3.40
Hardware, tools, renovation and lawn and garden products, at retail [56181] | 2.61 | 2.49 | 1.69 | 1.97
Miscellaneous products at retail [56191] | 2.35 | 1.89 | 2.25 | 2.47
Total retail trade commissions and miscellaneous services (Footnote 1) | 1.41 | 1.47 | 1.62 | 1.79

Footnotes

Footnote 1

Comprises the following North American Product Classification System (NAPCS): 51411, 51412, 53112, 56211, 57111, 58111, 58121, 58122, 58131, 58141, 72332, 833111, 841, 85131 and 851511.


Data science terminology

Application Programming Interface (API)
Collection of software routines, protocols, and tools which provide a programmer with all the building blocks for developing an application program for a specific platform (environment). An API also provides an interface that allows a program to communicate with other programs, running in the same environment. (BusinessDictionary.com)
Artificial Intelligence (AI)

Artificial intelligence is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence such as learning, problem solving, visual perception and speech and pattern recognition.

Artificial Intelligence System

A technological system that uses a model to make inferences to generate output, including predictions, recommendations or decisions.

Corpus
In linguistics, a corpus is a large and structured set of texts. In the context of topic modelling, a corpus is a set of documents, and each document is viewed as a mixture of the topics present in the corpus. (wikipedia.org)
Data Science
Data Science is an interdisciplinary field that uses scientific methods and algorithms to extract information and insights from diverse data types. It combines domain expertise, programming skills and knowledge of mathematics and statistics to solve analytically complex problems.
Deep Learning
Subset of machine learning that imitates the workings of the human brain in processing data, improving its performance as it learns. Typically, a multi-level algorithm that gradually identifies things at higher levels of abstraction. For example, the first level may identify certain lines, then the next level identifies combinations of lines as shapes, and then the next level identifies combinations of shapes as specific objects. Deep learning is popular for image classification. (www.datascienceglossary.org)
Event
An event in the Unified Modeling Language (UML) is a notable occurrence at a particular point in time. Events can, but do not necessarily, cause state transitions from one state to another in state machines represented. (wikipedia.org)
Latent variables
Latent variables are variables that are not directly observed but are rather inferred (through a mathematical model) from other variables that are observed (directly measured). Mathematical models that aim to explain observed variables in terms of latent variables are called latent variable models. (datascienceglossary.org)
Machine Learning (ML)

"Machine learning is the science of getting computers to automatically learn from experience instead of relying on explicitly programmed rules, and generalize the acquired knowledge to new settings."

Source: United Nations Economic Commission for Europe's Machine Learning Team, The use of machine learning in official statistics (2018 report).

In essence, Machine Learning automates analytical model building through optimization algorithms and parameters that can be modified and fine-tuned.

Machine Learning Algorithms
Machine learning algorithms use computational methods to "learn" information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases. (Mathworks.com)
Machine Learning Model
A digital representation of patterns identified in data through automated processing using an algorithm designed to enable the recognition or replication of those patterns.
Natural Language Processing (NLP)
Natural language processing (NLP) is a method to translate between computer and human languages. It is a method of getting a computer to understandably read a line of text without the computer being fed some sort of clue or calculation. In other words, NLP automates the translation process between computers and humans. (techopedia.com)
One-hot vector
In NLP, a one-hot vector is a 1 x N matrix (vector) containing a single 1 and 0s elsewhere, used to distinguish each word in a vocabulary from every other word in the vocabulary. One-hot encoding ensures that a machine learning model does not assume that higher numbers are more important. For example, 'laughter' is not more important than 'laugh' when both words are represented in the vector. (wikipedia.org)
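A minimal sketch of one-hot encoding over a toy vocabulary (a hypothetical example, not from the source):

```python
def one_hot(word, vocab):
    """Return a 1 x N vector with a single 1 at the word's index in vocab."""
    return [1 if w == word else 0 for w in vocab]

vocab = ["laugh", "laughter", "smile"]
one_hot("laughter", vocab)  # -> [0, 1, 0]
```

Every word's vector has the same magnitude, so no word is treated as numerically "larger" than another.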
Parsing
Breaking a data block into smaller chunks by following a set of rules, so that it can be more easily interpreted, managed, or transmitted by a computer. Spreadsheet programs, for example, parse data to fit them into cells of a certain size. (businessdictionary.com) ML algorithms can also be used to parse data.
Poisson process

A Poisson Process is a model for a series of discrete events where the average time between events is known, but the exact timing of events is random. A Poisson Process meets the following criteria: (towardsdatascience.com)

  • Events are independent of each other. The occurrence of one event does not affect the probability another event will occur.
  • The average rate (events per time period) is constant.
  • Two events cannot occur at the same time.
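The criteria above can be simulated directly: with a constant rate λ, the gaps between events are independent exponential draws with mean 1/λ. A small illustrative sketch:

```python
import random

def simulate_poisson_process(rate, t_end, seed=None):
    """Generate event times on (0, t_end] for a Poisson process.
    Inter-arrival gaps are independent exponentials with mean 1/rate."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)  # next random gap
        if t > t_end:
            return events
        events.append(t)

events = simulate_poisson_process(rate=2.0, t_end=10.0, seed=1)
# The number of events is random, but averages rate * t_end = 20.
```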
Python
A programming language available since 1994 that is popular with people doing data science. Python is noted for ease of use among beginners, and great power when used by advanced users, especially when taking advantage of specialized libraries such as those designed for machine learning and graph generation. (datascienceglossary.org)
R
An open-source programming language and environment for statistical computing and graph generation available for Linux, Windows, and Mac. (datascienceglossary.org)
Reinforcement Learning (RL)
Reinforcement Learning (RL) is a sub-field of Machine Learning involving a controller (termed an agent) capable of taking actions in the form of decisions within a system. After each decision is made by the controller, the system evolves to a new state and the controller receives a measure of utility. By trial and error, the controller learns from its experience to optimize an action selection strategy that maximizes the expected cumulative utility within the system. RL is typically used to solve problems that can be modelled as sequential decision processes.
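A tiny sketch of the trial-and-error loop described above: an epsilon-greedy agent on a two-action problem with fixed rewards (a hypothetical example, deliberately simplified):

```python
import random

def run_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: estimate each action's utility from experience.
    rewards[i] is the (deterministic, for simplicity) utility of action i."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)   # estimated utility per action
    n = [0] * len(rewards)     # times each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:                     # explore at random
            a = rng.randrange(len(rewards))
        else:                                          # exploit best estimate
            a = max(range(len(rewards)), key=lambda i: q[i])
        n[a] += 1
        q[a] += (rewards[a] - q[a]) / n[a]             # running-average update
    return q

q = run_bandit([0.2, 0.8])
# After enough trials, the agent's estimates rank action 1 (utility 0.8) highest.
```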
Robotic Process Automation (RPA)
Robotic process automation (RPA) is the term used for software tools that partially or fully automate human activities that are manual, rule-based, and repetitive. They work by replicating the actions of an actual human interacting with one or more software applications to perform tasks such as entering data, processing standard transactions, or responding to simple customer service queries. (aiim.org)
Semantic
Semantics can address meaning at the levels of words, phrases, sentences, or larger units of discourse. In machine learning, semantic analysis of a corpus is the task of building structures that approximate concepts from a large set of documents. It generally does not involve prior semantic understanding of the documents. (wikipedia.org)
Stochastic optimization
Stochastic optimization methods are optimization methods that generate and use random variables. For stochastic problems, the random variables appear in the formulation of the optimization problem itself, which involves random objective functions or random constraints. Stochastic optimization methods also include methods with random iterates. (wikipedia.org)
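As a minimal illustration (not from the source), stochastic gradient descent minimizes the average of (x - d_i)^2 by following the gradient of one randomly drawn term at a time; the minimizer is the data mean:

```python
import random

def sgd_mean(data, lr=0.1, steps=5000, tail=1000, seed=0):
    """SGD on f(x) = average of (x - d)^2 over the data. Random draws make
    each iterate noisy, so the last `tail` iterates are averaged."""
    rng = random.Random(seed)
    x, tail_sum = 0.0, 0.0
    for step in range(steps):
        d = rng.choice(data)          # random term -> stochastic gradient
        x -= lr * 2.0 * (x - d)       # gradient of (x - d)^2 is 2(x - d)
        if step >= steps - tail:
            tail_sum += x
    return tail_sum / tail

sgd_mean([1, 2, 3, 4, 5])  # -> approximately 3, the mean of the data
```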
Supervised Learning
A type of machine learning algorithm in which a system is taught via examples. For instance, a supervised learning algorithm can be taught to classify input into specific, known classes. The classic example is sorting email into spam versus non-spam. (datascienceglossary.org)
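A toy version of the spam example (illustrative only, far simpler than a production classifier): learn word counts per label from examples, then assign new text to the label whose training words cover it best:

```python
from collections import Counter

def train(examples):
    """Count word occurrences per label from (text, label) pairs."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(text, counts):
    """Score each label by how often its training words match the input."""
    words = text.lower().split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

counts = train([
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda attached", "ham"),
    ("project status report", "ham"),
])
classify("claim your free money", counts)  # -> "spam"
```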
Unsupervised Learning
A class of machine learning algorithms designed to identify groupings of data without knowing in advance what the groups will be. (datascienceglossary.org)
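A minimal one-dimensional k-means sketch (a hypothetical example): the groupings are not given in advance; they emerge from the data:

```python
def kmeans_1d(points, k=2, iters=10):
    """Cluster 1-D points into k groups by alternating assignment and
    centre updates. Deterministic init: centres spread over sorted data."""
    pts = sorted(points)
    centers = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

kmeans_1d([1, 2, 3, 10, 11, 12])  # -> [2.0, 11.0]
```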
Web scraping
Web scraping is a term for various methods used to collect information from across the Internet. Generally, this is done with software that simulates human Web surfing to collect specified bits of information from different websites. (techopedia)
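As a small illustration using only the standard library (run here on an HTML snippet rather than a live site), the core scraping step is parsing fetched HTML for specific elements:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every anchor tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<html><body><a href="https://www.statcan.gc.ca">StatCan</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
parser.links  # -> ['https://www.statcan.gc.ca']
```

In a real scraper, the `page` string would come from an HTTP request, and a site's terms of use and robots.txt should always be respected.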