Data Science Network for the Federal Public Service (DSNFPS)

Responsible use of machine learning at Statistics Canada

As the volume and velocity of data continue to increase, Statistics Canada is embracing this new reality and conducting many projects that use machine learning methods to deliver insights to Canadians. While there are many benefits to this approach, the agency recognized the need for a framework to guide the development of machine learning processes in a responsible, accountable manner. This article gives an overview of the Framework for Responsible Machine Learning Processes at Statistics Canada.

Deploying your machine learning project as a service

Software engineering practices and deployment operations are commonly overlooked Software Development Life Cycle steps in machine learning projects. Many projects never live long enough to be shared among multiple teams because of complex setup, lack of scalability or failure to deploy. The capability to share work is as important as the work itself. Developing solutions as shareable products makes the work far easier to reuse, collaborate on, deploy and build upon. This article shares five relevant practices and open-source tools that can help deploy any machine learning project as a service: version control, documentation, REST APIs, containerization and modular programming.
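
As a rough illustration of the REST API practice mentioned above, here is a minimal sketch of serving a trained model with FastAPI; the model file name, input schema and endpoint are hypothetical, not taken from the article.

```python
# Minimal sketch of exposing a trained model through a REST API with FastAPI.
# The model artifact name ("model.pkl") and input schema are illustrative assumptions.
import pickle
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Example model service")

# Load a previously trained model (hypothetical pickled scikit-learn estimator).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class Features(BaseModel):
    values: List[float]  # a real service would mirror the model's actual features

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}
```

Containerizing such a service then amounts to copying the code and model artifact into an image and running it under a server such as uvicorn.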

Enabling responsible use of automated decision systems in the federal government

Automated decision systems are technologies used to assist or replace decision-making by humans. The Treasury Board Directive on Automated Decision-Making is the Government of Canada policy instrument which helps ensure the responsible and ethical use of automated decision systems (including systems using artificial intelligence) by federal institutions. This article highlights the importance of the Directive to data scientists by describing situations where the Directive applies and giving an overview of some of its requirements.

An image is worth a thousand words: let your dashboard speak for you!

Dashboards are powerful tools that can be used to consolidate information, observe trends and monitor the performance of models to facilitate decision-making. A team of data scientists at Statistics Canada built dashboards to visualize millions of COVID-19-related news articles and economic events. This article explores two projects that took different approaches to building dashboards, and highlights what makes some solutions more desirable than others.
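
As a small, hedged illustration of the kind of chart such dashboards consolidate, the sketch below plots invented daily article counts by topic with Plotly Express; the data, column names and topics are placeholders.

```python
# Minimal sketch of a dashboard-style visualization, assuming a pandas DataFrame
# of daily article counts by topic (all values and names are illustrative).
import pandas as pd
import plotly.express as px

# Hypothetical data: daily counts of COVID-19-related news articles per topic.
df = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=5).repeat(2),
    "topic": ["economy", "health"] * 5,
    "articles": [120, 340, 150, 310, 170, 290, 160, 330, 180, 300],
})

fig = px.line(df, x="date", y="articles", color="topic",
              title="Daily news volume by topic (illustrative data)")
fig.show()  # A full dashboard would embed figures like this in a framework such as Dash.
```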

From Exploring to Building Accurate Interpretable Machine Learning Models for Decision-Making: Think Simple, not Complex

Machine learning (ML) technology for building interpretable models from scratch has advanced considerably since the early models of the 1980s. A spectrum of techniques now exists, delivering different levels of interpretability and accuracy for different applications. This article explores how accurate, simple models (labeled as “interpretable”) can be used as a tool to help debug scenarios that rely on more complex “black box” models. Current advances already offer many opportunities for choosing simple, understandable prediction models for decision-making.
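
As a rough, self-contained illustration of the simple-versus-complex trade-off, the sketch below fits a shallow decision tree alongside a random forest on a built-in scikit-learn dataset; the data and model choices are illustrative rather than those examined in the article.

```python
# Sketch comparing a simple interpretable model with a more complex one,
# using a built-in scikit-learn dataset purely for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow decision tree: its rules can be read directly.
simple = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
# A "black box" ensemble for comparison.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("Tree accuracy:  ", simple.score(X_test, y_test))
print("Forest accuracy:", forest.score(X_test, y_test))
print(export_text(simple, feature_names=list(X.columns)))  # human-readable rules
```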

Topic Modelling and Dynamic Topic Modelling: A technical review

Topic modelling is a form of unsupervised learning used for discovering the topics that occur in a collection of documents (called a corpus). Dynamic topic modelling additionally keeps track of how those topics vary over time within the corpus. Latent Dirichlet Allocation (LDA) is an example of a topic model commonly used in the machine learning community. This article provides a technical review of the LDA model and its application in identifying emerging topics from the Canadian Coroner and Medical Examiner Database (CCMED).
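
For readers who want to experiment, below is a minimal LDA sketch using scikit-learn on a toy corpus; the documents are invented placeholders, whereas the article applies the model to CCMED narratives.

```python
# Minimal sketch of LDA topic modelling with scikit-learn on toy documents.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # Invented illustrative corpus, not CCMED data.
    "opioid overdose toxicity fentanyl",
    "motor vehicle collision highway driver",
    "fentanyl toxicity accidental overdose",
    "vehicle crash intersection collision",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words for each discovered topic.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"Topic {k}: {', '.join(top)}")
```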

Official Languages in Natural Language Processing

As you may know, English is the most studied language in the field of Natural Language Processing (NLP). Given the prevalence of English on the internet and in technology, most methods or tools are first developed and optimized for it. This results in an imbalance between the two official languages in terms of available resources when applying NLP techniques. This can be a significant challenge for Government of Canada data science practitioners, who must ensure treatment of equivalent quality for both French and English data.

Non-Pharmaceutical Intervention and Reinforcement Learning

To reduce the spread of COVID-19, and to reduce the hospitalizations and deaths resulting from infections, provinces and territories across Canada have imposed restrictions on the population based on Non-Pharmaceutical Intervention (NPI) strategies. One of the many factors that play a role in the selection of NPIs is epidemiological modelling. Using Reinforcement Learning, data scientists at Statistics Canada, in collaboration with partners at the Public Health Agency of Canada, helped determine the optimal set of population behaviours that minimizes the spread of an infection within simulations, in order to model and forecast the effect of specific scenarios.
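
As a toy illustration of the reinforcement learning approach, the sketch below runs tabular Q-learning on a drastically simplified epidemic simulation; the states, dynamics, costs and parameters are invented for illustration and are unrelated to the actual model used by the agency and its partners.

```python
# Toy Q-learning sketch: choosing an NPI stringency level to limit infections
# in a highly simplified epidemic simulation. All dynamics, costs and
# parameters are illustrative assumptions, not the model used in practice.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 10                  # discretized infection prevalence (0 = low, 9 = high)
ACTIONS = [0, 1, 2]            # NPI stringency: none, moderate, strict
ACTION_COST = [0.0, 0.5, 1.5]  # assumed societal/economic cost of each NPI level

def step(state, action):
    """Toy dynamics: stricter NPIs push prevalence down, laxer ones up."""
    drift = {0: +1, 1: 0, 2: -1}[action] + rng.integers(-1, 2)
    next_state = int(np.clip(state + drift, 0, N_STATES - 1))
    reward = -next_state - ACTION_COST[action]  # penalize infections and NPI cost
    return next_state, reward

Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    state = rng.integers(N_STATES)
    for _ in range(50):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            action = int(rng.integers(len(ACTIONS)))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        # Standard Q-learning update.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print("Greedy NPI level per prevalence state:", np.argmax(Q, axis=1))
```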

Greenhouse Detection with Remote Sensing and Machine Learning: Phase One

A modernization effort is underway at Statistics Canada to replace agricultural surveys with more innovative data collection methods. A key part of this modernization is the use of remote sensing classification methods for land use mapping and building detection from satellite imagery. This project explored machine learning techniques to detect the total area of greenhouses in Canada from satellite imagery.
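
As a simplified, hypothetical illustration of this kind of remote sensing classification, the sketch below trains a random forest on synthetic "pixel" spectra; actual work would use labelled satellite rasters rather than generated arrays.

```python
# Illustrative sketch of pixel-level land-cover classification, in the spirit
# of detecting greenhouses from satellite imagery. The arrays are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "pixels": each row holds spectral band values;
# label 1 = greenhouse, 0 = other land cover (purely illustrative).
n = 2000
bands_other = rng.normal(loc=0.3, scale=0.1, size=(n, 4))
bands_greenhouse = rng.normal(loc=0.6, scale=0.1, size=(n, 4))
X = np.vstack([bands_other, bands_greenhouse])
y = np.array([0] * n + [1] * n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Held-out accuracy on synthetic pixels:", clf.score(X_test, y_test))
```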

Writing a Satellite Imaging Pipeline, Twice: A Success Story

A story about the writing, rewriting and writing again of a satellite imaging pipeline, and how computing costs were cut from an estimated $80,000 to $200. Follow along for adventures with cloud computing, Kubernetes, consultants and two different pipeline orchestrators, and lessons learned about the development process, collaboration between data engineering and data science, and the timely business of open source licenses.

2021 Census Comment Classification

In an effort to improve the analysis of the 2021 Census of Population comments, Statistics Canada’s Data Science Division worked in collaboration with the Census Subject Matter Secretariat to create a proof of concept on the use of machine learning techniques to quickly and objectively classify census comments. In addition to classifying comments by subject matter area, the model also sought to classify comments regarding technical issues and privacy concerns.
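
A minimal sketch of this style of comment classification, assuming TF-IDF features and logistic regression, follows; the comments, labels and model choice are placeholders rather than the actual proof of concept.

```python
# Minimal sketch of multi-class text classification for short comments,
# using TF-IDF features and logistic regression. Comments and labels are
# invented placeholders, not actual census data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "The online form would not load on my browser",
    "I am worried about how my personal information is used",
    "There should be more questions about housing costs",
    "The website crashed before I could submit",
]
labels = ["technical", "privacy", "content", "technical"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(comments, labels)

print(clf.predict(["My data better stay confidential"]))
```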

A Brief Survey of Privacy Preserving Technologies

Big data technologies such as deep learning have increased the utility of data exponentially, and cloud computing has been an enabling vehicle for this, in particular when working with unclassified data. However, computations on unencrypted sensitive data in a cloud environment may expose that data to confidentiality threats and cyber-security attacks. To address the new requirements of operating in the cloud, we consider a class of emerging cryptographic techniques called Privacy Preserving Technologies (PPTs) that might help increase utility, by taking greater advantage of technologies such as the cloud or machine learning, while continuing to preserve the security of information resources.
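
As a toy example of one class of PPT (secure multiparty computation based on additive secret sharing), the sketch below splits sensitive values into random shares and computes a total without any single party seeing an individual value; it is a teaching sketch with invented values, not production cryptography.

```python
# Toy illustration of one privacy preserving technique: additive secret
# sharing. A sensitive value is split into random shares so that no single
# party (e.g. a single cloud host) can recover it, yet sums can still be
# computed on the shares. Teaching sketch only, not production crypto.
import secrets

MODULUS = 2**61 - 1  # arbitrary large prime for modular arithmetic

def share(value, n_parties=3):
    """Split `value` into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

salaries = [52000, 61000, 58000]       # invented sensitive values
shared = [share(s) for s in salaries]

# Each party sums the shares it holds; only the combination reveals the total.
party_totals = [sum(col) % MODULUS for col in zip(*shared)]
print("Total computed on shares:", reconstruct(party_totals))
print("Plain total:             ", sum(salaries))
```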

First Data Science Network Directors' Meeting

On November 25, 2020, senior managers involved in many facets of data science gathered for the first directors' meeting of the Data Science Network for the Federal Public Service. This meeting was an important stepping stone for the Network as it continues to grow and expand its reach in the public service and beyond.

Use of Machine Learning for Crop Yield Prediction

Data scientists at Statistics Canada recently investigated how to incorporate machine learning techniques into an official statistics production environment to improve the crop yield prediction method and how to evaluate prediction methods meaningfully within the production context.
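
As an illustration of framing crop yield prediction as a supervised regression problem, the sketch below fits a gradient boosting model to synthetic weather and vegetation features; the features, data and evaluation choices are placeholders, not the agency's production method.

```python
# Illustrative sketch of crop yield prediction as supervised regression;
# the features and data here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical features: growing-season precipitation, mean temperature,
# and a vegetation index, with a noisy synthetic yield response.
n = 500
X = np.column_stack([
    rng.normal(300, 50, n),   # precipitation (mm)
    rng.normal(16, 2, n),     # mean temperature (degrees C)
    rng.normal(0.6, 0.1, n),  # NDVI-like vegetation index
])
y = 0.01 * X[:, 0] + 0.5 * X[:, 1] + 40 * X[:, 2] + rng.normal(0, 2, n)

model = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("Cross-validated MAE:", -scores.mean())
```

Cross-validation is only a starting point; as the article notes, evaluating a prediction method meaningfully within a production context requires more than a single hold-out score.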

NRCan’s Digital Accelerator: Revolutionizing the way NRCan serves Canadians through digital innovation

Natural Resources Canada (NRCan) has been integrating advanced analytics into its science and research programs and aims to lead the digital transformation of the natural resource sector. Learn how their Digital Accelerator supports the exploration of innovative applications and the development of strategic partnerships to augment NRCan’s expertise.

Version Control with Git for Analytics Professionals

Analytics and data science workflows are becoming more complex than ever before, and the need to enable collaboration among team members is one that has parallels in classic computer science workflows. We take a look at leveraging Git and applying it to collaboration problems faced by analytics teams.

Using data science and cloud-based tools to assess the economic impact of COVID-19

As COVID-19 continues to impact the economy at a rapid pace, it is more important than ever for Canadians and businesses to have reliable information to understand these changes. A team of data scientists and analysts at Statistics Canada is working hard to meet this information need by automating the extraction and near real-time analysis of text data from a variety of sources.

Protected workloads on public cloud

This summer saw an increased need for flexible services that could be accessed outside of traditional networks and scale rapidly, all while maintaining the security of information entrusted to the public service. The opportunity for data science to provide timely insights to help decision makers and the public alike has never been so great, but at the same time data scientists need to be able to ensure data and workflows operate in secure environments.

The COVID-19 cloud platform for advanced analytics

As Canadians grew increasingly concerned about the impact of COVID-19 on our society and our economy in March 2020, Statistics Canada set to work collecting vital information to support citizens and critical government operations during unprecedented times.

Co-op student explores the power of Big Data

This article offers a look at what it is like to be a co-op student in the Data Science Division at Statistics Canada, highlighting the experience of Mihir Gajjar, a co-op student from Simon Fraser University’s (SFU) Big Data program.
