Data Science Network for the Federal Public Service (DSNFPS)

The information in these articles is provided 'as-is' and Statistics Canada makes no warranty, either expressed or implied, including but not limited to, warranties of merchantability and fitness for a particular purpose. In no event will Statistics Canada be liable for any direct, special, indirect, consequential or other damages, however caused.

Recent articles

Computer vision models: seed classification project

Topics covered in this article: Computer vision

By collaborating with members from inter-departmental branches of government, the Artificial Intelligence Lab team at the Canadian Food Inspection Agency leverages state-of-the-art machine learning algorithms to provide data-driven solutions to real-world problems and drive positive change.



Introduction to Privacy Enhancing Cryptographic Techniques – Trusted Execution Environment

Topics covered in this article: Ethics and responsible machine learning

With the increasing popularity of connected devices and the prevalence of technologies such as cloud, mobile computing, and the Internet of Things, organizations that handle Personally Identifiable Information must "mitigate threats that target the confidentiality and integrity of either the application, or the data in system memory" (The Confidential Computing Consortium, 2021). This article introduces the Trusted Execution Environment (TEE), a secure area inside a device that is built with special hardware modules.



Production level code in Data Science

Topics covered in this article: Other

This article emphasizes the importance of implementing production-level code in data science projects, highlighting its essential practices and benefits. It discusses scalability, robustness, and maintainability as key aspects for efficient deployment and integration of data science models. The article covers often-overlooked practices such as version control, testing, documentation, code reviews, reproducibility, style guidelines, type hints, error logging, data validation, low-maintenance code, and continuous integration and deployment (CI/CD). A real-life case study of the Administrative Data Preprocessing (ADP) project showcases the consequences of neglecting programming practices and the effectiveness of the Red Green refactoring approach. Prioritizing production-level code enables businesses to maximize data science investments, gain a competitive edge, and make informed decisions in the data-driven economy.



Indigenous Communities Food Receipts Crowdsourcing with Optical Character Recognition

Topics covered in this article: Computer vision, Text analysis and generation

Many Canadians living in northern and isolated communities face increased costs related to shipping rates and supply chains. To better understand the challenges impacting food security, Statistics Canada's Data Science Division investigated crowdsourcing as a potential solution to collect food price data. This included assessing the feasibility of using optical character recognition (OCR) and natural language processing (NLP) to extract and tabulate pricing information from images of grocery receipts, as well as developing a web application for uploading and processing receipt images. This article focuses on the text identification and extraction algorithm; the web application component is not covered.



Other recent articles

Browse articles by topic

Computer vision
Data processing and engineering
Predictive analytics
Text analysis and generation
Ethics and responsible machine learning
Other