Data science at Statistics Canada

As the world around us continues to change rapidly in the digital age, data and how they are used have become critically important.

Data science is a rapidly evolving field that can tap into the power of data and empower governments to serve citizens more effectively and efficiently. As the role of national statistical organizations continues to change and expand, these organizations must adapt and embrace new technologies and innovative thinking to support the information needs of society.

What is data science?

Data science is an interdisciplinary field that uses scientific methods and algorithms to extract information and insights from diverse data types. It combines domain expertise, programming skills and knowledge of mathematics and statistics to solve analytically complex problems.

Statistics Canada is one of the leaders in the Government of Canada's adoption of data science and artificial intelligence. By taking a collaborative approach to data science, the agency is pushing the boundaries of modernization and harnessing the power of new approaches and technologies to better serve Canadians.

What is artificial intelligence?

Artificial intelligence is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, problem solving, visual perception, and speech and pattern recognition.

Data science supporting the COVID-19 response

Data science allows statistical agencies to respond quickly to changing economic and social situations. Statistics Canada is using the power of data science to support the COVID-19 response in Canada.

The agency collaborated with Health Canada to visualize supply and demand information for personal protective equipment (PPE). Before the data could be visualized, they needed to be extracted and ingested. The data arrived daily from many different sources (provincial and territorial governments, other federal departments and private sector companies that had been hired to help source the PPE) and in many different formats (e.g. Word documents, Excel files, PDFs). Creating standardized reports from them required a significant amount of manual work.

To improve this process, data scientists at Statistics Canada created an algorithm that parses the incoming data into separate data entries. Machine learning was used to identify numbers and dates within the text. The structured data were then presented in a Power BI dashboard that was shared with other government departments to meet their information needs and help them better understand the supply and demand for PPE in Canada.
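As a rough illustration of the parsing step described above, the sketch below pulls dates and quantities out of a line of free text. It is not the agency's algorithm: the source says machine learning was used, while this example swaps in plain regular expressions to keep things short. The function name extract_entry, the two patterns and the assumption that dates arrive in YYYY-MM-DD form are all hypothetical.

```python
import re
from datetime import date

# Hypothetical patterns: ISO-style dates and integers with optional
# thousands separators. Real submissions would need many more formats.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
NUMBER_PATTERN = re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\b")


def extract_entry(text: str) -> dict:
    """Return the dates and quantities found in one line of free text."""
    dates = [date(int(y), int(m), int(d)) for y, m, d in DATE_PATTERN.findall(text)]
    # Blank out the dates first so their digits are not counted as quantities.
    remainder = DATE_PATTERN.sub(" ", text)
    numbers = [int(tok.replace(",", "")) for tok in NUMBER_PATTERN.findall(remainder)]
    return {"dates": dates, "numbers": numbers}
```

Called on a line such as "2020-04-15: received 12,500 masks and 300 gowns", the function returns one date and two quantities, which could then feed a standardized report or dashboard. A string that matches the date pattern but is not a valid calendar date would raise a ValueError here; a fuller version would need to handle that case.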

For more information on Statistics Canada's response to COVID-19, visit the COVID-19: A data perspective portal.

Commitment to privacy and security

As Statistics Canada continues to implement new technologies and innovations, the agency's commitment to protecting privacy and security remains the highest priority. The agency has rigorous measures in place to preserve confidentiality and privacy in the modern digital era.

The amount of data we gather and use and the power of the insights they generate are increasing rapidly. It is known that data are vulnerable throughout their lifecycle: at rest, in transit and during computation or processing. While the security mechanisms for data protection at rest (e.g. symmetric key encryption) and in transit (e.g. Transport Layer Security) are well studied, Privacy Preserving Technologies have emerged in recent years to provide data protection while enabling data processing, such as in statistical analyses.

Privacy Preserving Technologies, or Privacy Preserving Computation Techniques, is a generic term that covers a broad range of approaches that promise to provide protection while collecting the data, processing them and disseminating the results. These approaches include homomorphic encryption, secure multi-party computation, differential privacy, trusted execution environments and zero-knowledge proofs. The data scientists at Statistics Canada are exploring the use of these existing and emerging privacy preserving technologies to continuously address the privacy preservation needs for highly sensitive data. This work will also allow for alternative storage options that permit secure remote computing on encrypted data, and will make it possible to benefit from potential multi-party computation opportunities and to derive insights from distributed and otherwise inaccessible data.
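To make one of the approaches listed above more concrete, here is a minimal sketch of differential privacy using the Laplace mechanism on a counting query. It is an illustration only and does not describe how Statistics Canada applies these techniques. The function names private_count and laplace_noise, and the choice of the Laplace mechanism, are assumptions made for the example. A production system would also need a vetted library, since naive floating-point noise has known weaknesses.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count of the values that satisfy the predicate.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller epsilon adds more noise and gives stronger privacy protection, while a larger epsilon keeps the released count closer to the true value. For example, private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random()) would release an approximate count of records aged 40 or over.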

For more information on how Statistics Canada protects data, visit Statistics Canada's Trust Centre.

What is machine learning?

"Machine learning is the science of getting computers to automatically learn from experience instead of relying on explicitly programmed rules, and generalize the acquired knowledge to new settings."

United Nations Economic Commission for Europe's Machine Learning Team (2018 report)
The use of machine learning in official statistics.

In essence, machine learning automates analytical model building through optimization algorithms and parameters that can be modified and fine-tuned.
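The idea of building a model through optimization can be sketched with a very small example: fitting a straight line by gradient descent. This is a generic textbook illustration, not a method attributed to Statistics Canada. The function name fit_line and the default values for learning_rate and steps are assumptions chosen for the example.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

Here w and b are the parameters that the optimization adjusts automatically, while learning_rate and steps are the settings a practitioner can modify and fine-tune. Given points that lie on the line y = 2x + 1, the fitted values approach w = 2 and b = 1.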

Visit Data science projects at Statistics Canada to see data science in action!
