Market Basket Measure of poverty (2023-base) consultative engagement

Opened: September 2023
Closed: April 2024

Consultative engagement objectives

From Spring 2023 to Winter 2024, Statistics Canada will conduct consultation activities to gather input from Canadians to help validate how we measure poverty. These activities will be key to informing the Third Comprehensive Review of the Market Basket Measure (MBM).

The Market Basket Measure (MBM) was adopted as Canada's Official Poverty Line in 2019 following the release of Opportunity for All – Canada's First Poverty Reduction Strategy. According to the MBM, an individual or family is considered to live in poverty if their disposable income is insufficient to purchase a predetermined basket of goods and services required to achieve a modest, basic standard of living. Reviews and updates to the MBM methodology are required on a regular basis to ensure that the MBM continues to reflect these modest, basic living standards over time, and that key MBM parameters are sourced using the latest available data and methods. This process is referred to as a comprehensive review of the MBM.

The consultative engagement phase of the third comprehensive review of the MBM began with the publication of Launch of the Third Comprehensive Review of the Market Basket Measure. These consultative engagement activities will help inform decisions made to refine and update Canada's official poverty measure.

By participating in this consultation, you will support Statistics Canada's ability to accurately measure poverty.

How to get involved

This consultative engagement activity is now closed.

Statistics Canada is now at the stage of engaging with the public as part of the third comprehensive review of the Market Basket Measure (MBM), Canada's Official Poverty Line, and we invite you to take part in these consultative engagement activities.

A series of public information sessions to explain the key features of the MBM have been planned for:

  1. Halifax, N.S. on October 31, 2023, at 1:00 p.m. ET / 2:00 p.m. AT (English)
  2. Montréal, Que. on November 7, 2023, at 1:00 p.m. ET (French)
  3. Ottawa, Ont. on November 14, 2023, at 1:00 p.m. ET (English)
  4. Yellowknife, N.W.T. on November 21, 2023, at 1:00 p.m. ET / 11:00 a.m. MT (English)
  5. Vancouver, B.C. on November 28, 2023, at 1:00 p.m. ET / 10:00 a.m. PT (English)
  6. Saskatoon, Sask. on December 5, 2023, at 2:00 p.m. ET / 1:00 p.m. CT (English)

The objective of these sessions is to engage with Canadians of different backgrounds and to hear from individuals working directly in the area of poverty reduction. These sessions will have options for both in-person and virtual attendance. The spaces available for in-person attendance are limited and will be assigned on a first come, first served basis through registration.

To register for your preferred session, please email statcan.mbm-mpc.statcan@statcan.gc.ca and indicate if you plan on attending virtually or in-person. Once you are registered, you will be sent an Outlook invitation containing a Microsoft Teams link for virtual attendance, and the address for in-person attendance.

Please note that these public information sessions are not intended for an expert or academic audience. Thematic workshops for an expert and academic audience will be scheduled in the first quarter of 2024. More information will be available closer to those dates.

The feedback collected from these sessions will allow Statistics Canada to identify key themes for discussion and future research.

For more information regarding the third comprehensive review of the MBM, you can consult the following resource: Launch of the Third Comprehensive Review of the Market Basket Measure or you may contact us by email at statcan.mbm-mpc.statcan@statcan.gc.ca.

Statistics Canada is committed to respecting the privacy of participants. All personal information created, held or collected by the agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the privacy notice.

Results

Summary results of the engagement initiatives will be published when available.

Invitation to participate in the revision of the North American Product Classification System (NAPCS) Canada

Opened: August 2023

Introduction

Statistics Canada invites data producers and data users, representatives of business associations, government bodies at the federal, provincial and local levels, academics and researchers, and all other interested parties to submit proposals for the revision of the North American Product Classification System (NAPCS) Canada.

Following the decision of Statistics Canada's Economic Standards Steering Committee (ESSC) on April 28, 2023 to institute a permanent consultation process for NAPCS Canada, proposals for changes to NAPCS Canada may be submitted and reviewed on an ongoing basis. Moving forward, the only fixed deadline will be a cut-off date after which proposed changes can no longer be considered for inclusion in a new version of NAPCS Canada. For example, for NAPCS Canada 2027, the deadline for changes to be included has been set to the end of June 2025. For revisions beyond 2027, the cut-off date will be maintained at about a year and a half prior to the release date of the new classification version, based on the 5-year revision cycle.

As was done with NAPCS Canada 2017 (two updates), in exceptional circumstances, when a consensus is reached among data producers and users at Statistics Canada, the classification might be revised before the end of the regular five-year revision cycle, as a way of "evergreening" the standard.

In the context of statistical classifications, evergreening refers to updating the classification and the related reference (index) file on a continuous basis with the objective of maintaining timeliness and relevance. However, evergreening does not necessarily result in the release of a new version of the classification every year. A decision to release a new version (before the end of the regular five-year revision cycle) needs to be discussed and assessed by key classification stewards, considering potential impacts on data and statistical programs.

Objectives

We are seeking proposals for changes for two main reasons:

  • collect input from data producers and users as an integral part of the NAPCS Canada revision process, and
  • ensure that users' needs continue to be met, so that the classification remains relevant.

Background

The first version of NAPCS Canada, known as provisional version 0.1, was published in 2007. The development of the classification started a few years earlier as a joint project of the national statistical agencies of Canada, Mexico and the United States (U.S.). The NAPCS project followed the adoption by the three countries of a common industry classification, the North American Industry Classification System (NAICS) in 1997. The purpose of the NAICS project was to develop a standard that allowed comparisons of industry data among the three participating countries. The development of NAPCS was meant to serve the same purpose for product statistics.

There are currently two types of NAPCS classification: NAPCS Canada and Trilateral NAPCS. Each NAPCS cooperating country may maintain its own version of NAPCS or a national product classification; Trilateral NAPCS is hence considered a reference classification. At the moment, only Canada has published a national version using the NAPCS acronym. Mexico is preparing the release of its new national product classification. NAPCS Canada differs from Trilateral NAPCS in its aggregation structure. NAPCS Canada uses a more traditional aggregation structure based on supply-side attributes of products, more or less following the industry of origin of products. The Trilateral NAPCS structure, on the other hand, emphasizes demand-based attributes of products, such as the substitutability of products, the complementary nature of products, or the similarity of the markets they serve. The most detailed categories of NAPCS Canada have been defined so as to permit mapping into the most detailed trilateral categories. This means that the Canadian detailed categories can be reorganized using the trilateral aggregation structure; in effect, the trilateral aggregation structure becomes a regrouping variant of the Canadian aggregation structure, with few exceptions. The availability of data at the most detailed level of NAPCS Canada will continue to influence the extent of the trilateral work.

Since its creation, NAPCS Canada has been revised on a 5-year cycle: in 2012, 2017 and 2022.

NAPCS Canada was also revised in 2018 with NAPCS Canada 2017 Version 2.0, as Canada began moving towards a permanent "evergreen" practice for NAPCS, meaning that NAPCS Canada is updated on an as-needed basis, with version updates between the standard 5-year revision milestones. These "evergreen" updates are intended to be limited to specific situations or cases. For instance, in NAPCS Canada 2017 Version 2.0, the classification was revised to account for new products created after Canada adopted a new law legalizing cannabis for non-medical use, with impacts across the Canadian economy and society.

Nature and content of proposals

Respondents are invited to provide their comments, feedback and suggestions on how to improve the NAPCS Canada content. They must outline their rationale for proposed changes.

No restrictions have been placed on content. Respondents may propose virtual changes (not affecting the meaning of a classification item) and real changes (affecting the meaning of a classification item, whether or not accompanied by changes in naming and/or coding). Examples of real changes, those that affect the scope of classification items or categories (with or without a change in the codes), are the creation of new classification items, the combination or decomposition of classification items, and the elimination of classification items. A classification item (sometimes referred to as a "class") represents a category at a certain level within a statistical classification structure. It defines the content and the borders of the category, and generally contains a code, title, definition/description, as well as exclusions where necessary. For NAPCS Canada, the classification items are: Group (3-digit), Class (5-digit), Subclass (6-digit) and Detail (7-digit).

Key dates for NAPCS Canada 2027 revision process

Here are key dates for the NAPCS Canada 2027 revision process:

  • Official public consultation period for changes proposed for inclusion in NAPCS Canada 2027: ongoing until the end of June 2025. Beyond 2027, the cut-off date to incorporate approved changes into a new classification version will be about a year and a half before the release date of the next version of NAPCS Canada, based on the 5-year revision cycle.
  • Completion of trilateral negotiations: July 2025
  • Public notice containing proposals in consideration for changes in NAPCS Canada 2027: November 2025
  • Public notice containing the final approved proposal for changes in NAPCS Canada 2027: February 2026
  • Public release of NAPCS Canada 2027 Version 1.0: February 2027

The next version of NAPCS Canada will be called NAPCS Canada 2027 Version 1.0.

Individuals and organizations wishing to submit proposals for changes in NAPCS Canada may do so at any time, in accordance with the permanent consultation process adopted by Statistics Canada with regards to NAPCS Canada.

Submitting Proposals

Proposals for NAPCS Canada revisions must contain the contact information of those submitting the change request:

  1. Name
  2. Organization (when an individual is proposing changes on behalf of an organization)
  3. Mailing address
  4. Email address
  5. Phone number

Should additional information or clarification to the proposal be required, participants might be contacted.

Proposals must be submitted by email to statcan.napcs-consultation-scpan-consultation.statcan@statcan.gc.ca.

Consultation guidelines for submitting proposals for change in NAPCS Canada

Individuals or organizations are encouraged to follow the guidelines below when developing their proposals.

Proposals should:

  • clearly identify the proposed addition or change to NAPCS Canada; this can include the creation of entirely new classes, or modifications to existing classes;
  • outline the rationale and include supporting information for the proposed change;
  • when possible, describe the empirical significance (e.g., revenue or sales, expenses, value-added, trade values, prices, volume of sales or production) of proposed changes, especially real changes;
  • be consistent with classification principles (e.g., mutual exclusivity, exhaustiveness and homogeneity within categories);
  • be relevant, that is
    • describe the present analytical interest;
    • enhance the usefulness of data;
    • base the proposal on appropriate statistical research or subject matter expertise.

Please consider the questions below when preparing your input for the consultation on the revision of NAPCS Canada:

  • Are there products for which you cannot find a satisfactory NAPCS Canada code?
  • Are there products that you find difficult to place in NAPCS Canada?
  • Are any products missing?
  • Are there products or combinations of products that have significant economic value and analytical interest that you would like to see with a specific or separate NAPCS classification item (group, class, subclass or detail)?
  • Are there classification items you find difficult to use because their descriptions are vague or unclear?
  • Are there pairs of classification items you find difficult to distinguish from each other? Are there boundaries that could be clarified?
  • Are there products that you are able to locate in NAPCS Canada but would like to see located in a different classification item or group of products? Why?
  • Is the language or terminology used in NAPCS Canada in need of updating to be consistent with current usage?

Note that submissions do not need to cover every topic; you can submit your comments on your particular area(s) of concern only.

The following criteria will be used to review the proposals received. Proposals should:

  • be consistent with classification principles such as mutual exclusivity, exhaustiveness, and homogeneity of products within categories;
  • have empirical significance as an industry output (goods or services), inputs to production, consumer expenditures, exports, imports, etc.;
  • concern data that can be collected and published;
  • be linked, where possible, to a funded program for data collection;
  • be relevant, that is, be of analytical interest, result in data useful to users, and be based on appropriate statistical research and subject-matter expertise;
  • be consistent with the Canadian System of National Accounts;
  • propose products that can be used to construct price indexes;
  • propose products closely aligned with Trilateral NAPCS and other product classifications such as the Classification of Individual Consumption by Purpose (COICOP), the Central Product Classification (CPC), the Harmonized Commodity Description and Coding System (HS) and the Extended Balance of Payments Services (EBOPS).

Special attention will be given to specific products, including:

  • new or emerging goods and services
  • products related to new or advanced technologies
  • bundles in general (and of services in particular).

NAPCS Canada Classification Structure

NAPCS Canada is a 7-digit classification organized in a 4-level structure: 3-digit groups, 5-digit classes, 6-digit subclasses and 7-digit details.
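As an informal illustration of this nested structure (not an official Statistics Canada tool), the sketch below derives the parent categories of a 7-digit NAPCS Canada detail code, assuming that each level of the hierarchy is a digit prefix of the levels below it; the example code is hypothetical.

```python
def napcs_levels(detail_code: str) -> dict:
    """Split a 7-digit NAPCS Canada detail code into its four levels,
    assuming each level is a prefix of the levels below it."""
    if len(detail_code) != 7 or not detail_code.isdigit():
        raise ValueError("expected a 7-digit numeric NAPCS Canada code")
    return {
        "group": detail_code[:3],     # 3-digit group
        "class": detail_code[:5],     # 5-digit class
        "subclass": detail_code[:6],  # 6-digit subclass
        "detail": detail_code,        # 7-digit detail
    }

# Hypothetical example code:
print(napcs_levels("1151111"))
# {'group': '115', 'class': '11511', 'subclass': '115111', 'detail': '1151111'}
```

A prefix-based hierarchy like this is what allows data tabulated at the 7-digit detail level to be rolled up to any higher level of the classification.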

Changes may be proposed for any level; however, changes to the higher levels have the most impact on the existing statistical programs using NAPCS Canada (national accounts, price indexes for businesses, international trade statistics, retail trade statistics, agriculture statistics, manufacturing statistics, etc.), and on the correspondence to be maintained with Trilateral NAPCS. Any change made to NAPCS Canada could have an impact on Trilateral NAPCS, which is subject to trilateral negotiation and approval by the three countries (though Mexico will start concentrating more on its national product classification). Changes at the 7-digit level of NAPCS Canada are less likely to have a major impact on current statistical programs or Trilateral NAPCS. Statistics Canada makes the final decision about changes to all levels of NAPCS Canada, but needs to consider the impact on alignment with Trilateral NAPCS, in particular to avoid conceptual misalignments and to maintain comparability.

The North American Product Classification System (NAPCS) Canada 2022 Version 1.0 is the latest version of the classification, and participants in this consultation should base their input on it. In the context of a permanent consultation process, persons or organizations proposing a change should always make sure they refer to the latest available version of NAPCS Canada.

Costs associated with proposals

Statistics Canada will not reimburse respondents for expenses incurred in developing their proposal.

Treatment of proposals

Statistics Canada will review all proposals received. Statistics Canada reserves the right to use independent consultants or government employees, if deemed necessary, to assess proposals.

If deemed appropriate, a representative of Statistics Canada will contact respondents to ask additional questions or seek clarification on a particular aspect of their proposal.

Please note that proposals will not necessarily result in changes to NAPCS Canada.

Official languages

Proposals may be written in either of Canada's official languages – English or French.

Confidentiality

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Thank You

We thank all participants for their continued interest and participation in the various NAPCS Canada engagement activities.

Enquiries

If you have any enquiries about this process, please send them to statcan.napcs-consultation-scpan-consultation.statcan@statcan.gc.ca.

Invitation to participate in the revision of the North American Industry Classification System (NAICS) Canada

Opened: August 2023

Introduction

Statistics Canada invites data producers and data users, representatives of business associations, government bodies at the federal, provincial and local levels, academics and researchers, and all other interested parties to submit proposals for the revision of the North American Industry Classification System (NAICS) Canada.

Following the decision of Statistics Canada's Economic Standards Steering Committee (ESSC) on April 28, 2023 to institute a permanent consultation process for NAICS Canada, proposals for changes to NAICS Canada may be submitted and reviewed on an ongoing basis. Moving forward, the only fixed deadline will be a cut-off date after which proposed changes can no longer be considered for inclusion in a new version of NAICS Canada. For example, for NAICS Canada 2027, the deadline for changes to be included has been set to the end of June 2025. For revisions beyond 2027, the cut-off date will be maintained at about a year and a half prior to the release date of the new classification version, based on the 5-year revision cycle.

As was done with NAICS Canada 2017 (two updates), in exceptional circumstances, when a consensus is reached among data producers and users at Statistics Canada, the classification might be revised before the end of the regular five-year revision cycle, as a way of "evergreening" the standard.

In the context of statistical classifications, evergreening refers to updating the classification and the related reference (index) file on a continuous basis with the objective of maintaining timeliness and relevance. However, evergreening does not necessarily result in the release of a new version of the classification every year. A decision to release a new version (before the end of the regular five-year revision cycle) needs to be discussed and assessed by key classification stewards, considering potential impacts on data and statistical programs.

Objectives

We are seeking proposals for changes for two main reasons:

  • collect input from data producers and users as an integral part of the NAICS revision process, and
  • ensure that users' needs continue to be met, so that the classification remains relevant.

Background

The North American Industry Classification System was released for the first time in 1997, with NAICS 1997. This classification was developed through the cooperation of Statistics Canada, Mexico's Instituto Nacional de Estadística y Geografía (INEGI) and the Economic Classification Policy Committee (ECPC) of the United States. Each country maintains its own version of NAICS (NAICS Canada, NAICS U.S., and NAICS Mexico). The three country versions are generally the same, with some differences found primarily in wholesale trade, retail trade and government, and at the 6-digit national industry level.

NAICS replaced the existing industry classification system used in Canada, which was the Standard Industrial Classification (SIC). Since then, NAICS Canada, U.S. and Mexico have been revised on a 5-year cycle in 2002, 2007, 2012, 2017 and 2022. The three NAICS partner agencies meet regularly to discuss possible changes to the common NAICS structure.

Canada has adopted a permanent "evergreen" practice with regards to NAICS, which means updating NAICS Canada on an as-needed basis, with version updates between the standard 5-year revision milestones, usually to adapt to exceptional circumstances when structural changes are approved. These "evergreen" updates are intended to be limited to specific situations or cases: for example, NAICS Canada 2017 Version 2.0, where changes were made to Internet publishing activities, and NAICS Canada 2017 Version 3.0, where the classification was revised to account for new industries created after Canada adopted a new law legalizing cannabis for non-medical use, with impacts across the Canadian economy and society. These changes were approved by consensus following demands from data producers and users at Statistics Canada and externally. The intent remains to minimize revisions to the structure of NAICS between revision cycles, striking a balance between keeping the classification timely and relevant and maintaining historical data series, in particular from a National Accounts perspective.

We will continue to look at the best way to communicate to the public about revisions not affecting the structure or scope of NAICS categories between revision cycles (e.g., adding new activities to help for coding or identification of their placement in the classification, clarifying the texts/explanatory notes, etc.).

Nature and content of proposals

Respondents are invited to provide their comments, feedback, and suggestions on how to improve the NAICS content. They must outline their rationale for proposed changes.

No restrictions have been placed on content. Respondents may propose virtual changes (not affecting the meaning of a classification item) and real changes (affecting the meaning of a classification item, whether or not accompanied by changes in naming and/or coding). Examples of real changes, those that affect the scope of classification items or categories (with or without a change in the codes), are the creation of new classification items, the combination or decomposition of classification items, and the elimination of classification items. A classification item (sometimes referred to as a "class") represents a category at a certain level within a statistical classification structure. It defines the content and the borders of the category, and generally contains a code, title, definition/description, as well as exclusions where necessary. For NAICS, the classification items are: Sector (2-digit), Subsector (3-digit), Industry group (4-digit), Industry (5-digit) and Canadian industry (6-digit).

Key dates for NAICS Canada 2027 revision process

Here are key dates for the NAICS Canada 2027 revision process:

  • Official public consultation period for changes proposed for inclusion in NAICS Canada 2027: ongoing until the end of June 2025. Beyond 2027, the cut-off date to incorporate approved changes into a new classification version will be about a year and a half before the release date of the next version of NAICS Canada, based on the 5-year revision cycle.
  • Completion of trilateral negotiations: September 2025.
  • Public notice containing proposals in consideration for changes in NAICS Canada: November 2025.
  • Public notice containing the final approved proposal for changes in NAICS Canada: February 2026.
  • Public release of NAICS Canada 2027 Version 1.0: January 2027.

The next revised version of NAICS Canada will be called NAICS Canada 2027 Version 1.0.

Individuals and organizations wishing to submit proposals for changes in NAICS Canada may do so at any time, in accordance with the permanent consultation process adopted by Statistics Canada with regards to NAICS Canada.

Submitting Proposals

Proposals for NAICS Canada revisions must contain the contact information of those submitting the change request:

  1. Name
  2. Organization (when an individual is proposing changes on behalf of an organization)
  3. Mailing address
  4. Email address
  5. Phone number

Should additional information or clarification to the proposal be required, participants might be contacted.

Proposals must be submitted by email to statcan.naics-consultation-scian-consultation.statcan@statcan.gc.ca.

Consultation guidelines for submitting proposals for change in NAICS Canada

Individuals or organizations are encouraged to follow the guidelines below when developing their proposals.

Proposals should:

  • clearly identify the proposed addition or change to NAICS; this can include the creation of entirely new classes, or modifications to existing classes;
  • outline the rationale and include supporting information for the proposed change;
  • if possible, describe the empirical significance (e.g., revenue, expenses, value-added, employment) of proposed changes, especially changes affecting the scope of existing classification items/categories; note that new industries could be subjected to tests of empirical significance with respect to revenue, value added, employment, and number of establishments;
  • be consistent with classification principles (e.g., mutual exclusivity, exhaustiveness, and homogeneity within categories);
  • be relevant, that is
    • describe the present analytical interest;
    • enhance the usefulness of data;
    • base the proposal on appropriate statistical research or subject matter expertise.

Please consider the questions below when preparing your input for the consultation on the revision of NAICS Canada:

  • Are there socioeconomic activities for which you cannot find a satisfactory NAICS code?
  • Are there classification items that you find difficult to use because their descriptions are vague or unclear?
  • Are there pairs of classification items you find difficult to distinguish from each other? Are there boundaries that could be clarified?
  • Are there socioeconomic activities that you think should have their own NAICS category? Please indicate at which level and why, with the support documentation about the activities (see guidelines above for a proposal).
  • Are there activities that you are able to locate in NAICS, but you would like to have them located in a different sector or industry?
  • Is the language or terminology used in NAICS in need of updating to be consistent with current usage?

Note that submissions do not need to cover every topic; you can submit your comments on your particular area(s) of concern only.

The following criteria will be used to review the proposals received. Proposals should:

  • be consistent with classification principles such as mutual exclusivity, exhaustiveness, and homogeneity of activities and output (products) within categories;
  • have empirical significance as an industry;
  • concern data that can be collected and published;
  • be linked, where possible, to a funded program for data collection;
  • be relevant, that is, be of analytical interest, result in data useful to users, and be based on appropriate statistical research and subject-matter expertise;
  • be consistent with the Canadian System of National Accounts.

Special attention will be given to specific industries, including:

  • new or emerging activities
  • activities related to new production processes.

NAICS Classification Structure

NAICS has a 6-digit, 5-level classification structure, consisting of 2-digit sectors, 3-digit sub-sectors, 4-digit industry groups, 5-digit industries and 6-digit national industries. Changes may be proposed for any level, but changes to the 2-digit to 5-digit levels will be subject to trilateral negotiation and approval. Changes to the 6-digit national industry level are at the discretion of each trilateral partner (i.e., Statistics Canada makes the final decision about changes to 6-digit industries in NAICS Canada).
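As an informal illustration (not an official Statistics Canada tool), the sketch below derives the parent categories of a 6-digit NAICS code by digit prefix, following the five-level structure described above. One caveat: a few NAICS sectors span several 2-digit prefixes (e.g., manufacturing covers 31-33), so the 2-digit prefix is only an approximation of the sector; the example code is hypothetical.

```python
def naics_levels(code: str) -> dict:
    """Split a 6-digit NAICS code into its five levels by prefix.
    The 2-digit prefix approximates the sector; some sectors
    (e.g., manufacturing, 31-33) span several 2-digit prefixes."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a 6-digit numeric NAICS code")
    return {
        "sector": code[:2],            # 2-digit sector (approximate)
        "subsector": code[:3],         # 3-digit subsector
        "industry_group": code[:4],    # 4-digit industry group
        "industry": code[:5],          # 5-digit industry
        "national_industry": code,     # 6-digit national industry
    }

# Hypothetical example code:
print(naics_levels("311111"))
```

Because only the 6-digit national industry level is at each country's discretion, a change proposal targeting the 6-digit level leaves the shared 2- to 5-digit prefixes, and hence trilateral comparability, untouched.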

North American Industry Classification System (NAICS) Canada 2022 Version 1.0 is the latest version of the classification for the participants of this consultation to base their input on. In the context of a permanent consultation process, persons or organizations proposing a change should always make sure they refer to the latest available version of NAICS Canada.

Costs associated with proposals

Statistics Canada will not reimburse respondents for expenses incurred in developing their proposal.

Treatment of proposals

Statistics Canada will review all proposals received. Statistics Canada reserves the right to use independent consultants or government employees, if deemed necessary, to assess proposals.

If deemed appropriate, a representative of Statistics Canada will contact respondents to ask additional questions or seek clarification on a particular aspect of their proposal.

Please note that a proposal will not necessarily result in changes to NAICS Canada.

Official languages

Proposals may be written in either of Canada's official languages – English or French.

Confidentiality

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Thank You

We thank all participants for their continued interest and participation in the various NAICS engagement activities.

Enquiries

If you have any enquiries about this process, please send them to statcan.naics-consultation-scian-consultation.statcan@statcan.gc.ca.

Computer vision models: seed classification project

By AI Lab, Canadian Food Inspection Agency

Introduction

The AI Lab team at the Canadian Food Inspection Agency (CFIA) is composed of a diverse group of experts, including data scientists, software developers, and graduate researchers, all working together to provide innovative solutions for the advancement of Canadian society. By collaborating with members from inter-departmental branches of government, the AI Lab leverages state-of-the-art machine learning algorithms to provide data-driven solutions to real-world problems and drive positive change.

At the CFIA's AI Lab, we harness the full potential of deep learning models. Our dedicated team of data scientists leverages the power of this transformative technology and develops customised solutions tailored to meet the specific needs of our clients.

In this article, we motivate the need for computer vision models for the automatic classification of seed species. We demonstrate how our custom models have achieved promising results using "real-world" seed images and describe our future directions for deploying a user-friendly SeedID application.

At the CFIA AI Lab, we strive not only to push the frontiers of science by leveraging cutting-edge models, but also to make these services accessible to others and to foster knowledge sharing, for the continued advancement of Canadian society.

Computer vision

To understand how image classification models work, we first define what exactly computer vision tasks aim to address.

What is computer vision:

Computer vision models are fundamentally trying to solve what are mathematically referred to as ill-posed problems. They seek to answer the question: what gave rise to the image?

As humans, we do this naturally. When photons enter our eyes, our brain is able to process the different patterns of light enabling us to infer the physical world in front of us. In the context of computer vision, we are trying to replicate our innate human ability of visual perception through mathematical algorithms. Successful computer vision models could then be used to address questions related to:

  • Object categorisation: the ability to classify objects in an image scene or recognise someone's face in pictures
  • Scene and context categorisation: the ability to understand what is going on in an image through its components (e.g. indoor/outdoor, traffic/no traffic, etc.)
  • Qualitative spatial information: the ability to qualitatively describe objects in an image, such as a rigid moving object (e.g. bus), a non-rigid moving object (e.g. flag), a vertical/horizontal/slanted object, etc.

Yet, while these appear to be simple tasks, computers still have difficulties in accurately interpreting and understanding our complex world.

Why is computer vision so hard:

To understand why computers seemingly struggle to perform these tasks, we must first consider what an image is.

Figure 1

Are you able to describe what this image is from these values?

Description - Figure 1

This image shows a brown and white pixelated image of a person's face, with white pixels against a brown background. Next to the image, a zoomed-in view shows the pixel values corresponding to a small patch of the original image.

An image is a set of numbers, typically with three colour channels: red, green, and blue. In order to derive any meaning from these values, the computer must perform what is known as image reconstruction. In its most simplified form, we can express this idea mathematically through an inverse function:

x = F⁻¹(y)

Where:

y represents the data measurements (i.e. pixel values), and
x represents the reconstructed version of the measurements, y, as an image.

However, it turns out solving this inverse problem is harder than expected due to its ill-posed nature.
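To make concrete the idea that an image is nothing but numbers, here is a minimal sketch in Python using a toy 2×2 RGB image (the values are purely illustrative):

```python
# A toy 2x2 RGB image represented as nested lists of pixel values.
# Each pixel is a (red, green, blue) triple in the range 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # row 0: a red pixel, a green pixel
    [(0, 0, 255), (128, 128, 128)],  # row 1: a blue pixel, a grey pixel
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])
print(height, width, channels)  # -> 2 2 3

# Flattened, all the computer "sees" is this list of numbers:
flat = [value for row in image for pixel in row for value in pixel]
print(flat[:6])  # -> [255, 0, 0, 0, 255, 0]
```

Recovering "a red pixel next to a green pixel" (let alone a face) from such a flat list of measurements is exactly the inverse problem described above.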

What is an ill-posed problem

When an image is registered, there is an inherent loss of information as the 3D world gets projected onto a 2D plane. Even for us, collapsing the spatial information we get from the physical world can make it difficult to discern what we are looking at through photos.

Figure 2

Michelangelo (1475-1564). Occlusion caused by different viewpoints can make it difficult to recognise the same person.

Description - Figure 2

The image shows three paintings of different figures, each with a different expression on their faces. One figure appears to be in deep thought, while the other two appear to be in a state of contemplation. The paintings are made of a dark, rough material, and the details of their faces are well-defined. The overall effect of the image is one of depth and complexity. The paintings are rotated in each frame to create a sense of change.

Figure 3

Bottom of soda cans. Different orientations can make it impossible to identify what is contained in the can.

Description - Figure 3

The image shows five metal cans, four of them with a different patch of color on the lid. The colors are blue, green, red, and yellow. The cans are arranged on a countertop. The countertop is made of a dark surface, such as granite or concrete.

Figure 4

Yale Face Database. Variations in lighting can make it difficult to recognise the same person (recall: all a computer "sees" are pixel values).

Description - Figure 4

The image shows two images of the same face. The images are captured from different angles, resulting in two different perceived expressions of the face. In the left frame the man has a neutral facial expression, whereas in the right frame he has a serious and angry expression.

Figure 5

Rick Scuteri-USA TODAY Sports. Different scales can make it difficult to understand context from images.

Description - Figure 5

The image shows four different images at different scales. The first image contains only what looks like the eye of a bird. The second image contains the head and neck of a goose. The third image shows the entire animal, and the fourth image shows a man standing in front of the bird, pointing in a direction.

Figure 6

Different photos of chairs. Intra-class variation can make it difficult to categorise objects (we can discern a chair through its functional aspect)

Description - Figure 6

The image shows 5 different chairs. The first one is a red chair with a wooden frame. The second one is a black leather swivel chair. The third looks like an unconventional artistic chair. The fourth one looks like a minimalist office chair, and the last one looks like a bench.

It can be difficult to recognise objects in 2D pictures due to possible ill-posed properties, such as:

  • Lack of uniqueness: Several objects can give rise to the same measurement.
  • Uncertainty: Noise (e.g. blurring, pixelation, physical damage) in photos can make it difficult or impossible to reconstruct and identify an image.
  • Inconsistency: slight changes in images (e.g. different viewpoints, different lighting, different scales, etc.) can make it challenging to solve for the solution, x, from available data points, y.

While computer vision tasks may, at first glance, appear superficial, the underlying problem they are trying to address is quite challenging!

Next, we will look at some deep-learning-driven solutions to these computer vision problems.

Convolutional Neural Networks (CNNs)

Figure 7

Graphical representation of a convolutional neural network (CNN) architecture for image recognition. (Hoeser and Kuenzer, 2020)

Description - Figure 7

This is a diagram of a convolutional neural network (ConvNet) architecture. The network consists of several layers, including an input layer, a convolutional layer, a pooling layer, and an output layer. The input layer takes in an image and passes it through the convolutional layer, which applies a set of filters to the image to extract features. The pooling layer reduces the size of the image by applying a pooling operation to the output of the convolutional layer. The output layer processes the image and produces a final output. The network is trained using a dataset of images and their corresponding labels.

Convolutional Neural Networks (CNNs) are a type of algorithm that has been very successful at solving the kinds of computer vision problems described above. In order to classify or identify objects in images, a CNN model first learns to recognize simple features in the images, such as edges, corners, and textures. It does this by applying different filters to the image; these filters help the network focus on specific patterns. As the model learns, it starts recognizing more complex features, combining the simple features learned in earlier layers into more abstract and meaningful representations. Finally, the CNN uses the learned features to classify images into the classes it has been trained on.
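The filtering step can be made concrete with a minimal pure-Python sketch of a single convolution: a hand-set vertical-edge kernel slid over a toy greyscale image. Note that this is only an illustration; in a real CNN the filter weights are learned during training, not set by hand.

```python
def convolve2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the image patch and sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        output.append(row)
    return output

# Toy greyscale image: dark on the left, bright on the right.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]

# A simple vertical-edge kernel: responds where intensity changes left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

print(convolve2d(image, kernel))  # -> [[0, 20, 0], [0, 20, 0]]
```

The filter output is large exactly where the image transitions from dark to bright, which is how early CNN layers come to represent edges.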

Figure 8

Evolution of CNN architectures and their accuracy, for image recognition tasks from 2012 to 2019. (Hoeser and Kuenzer, 2020).

Description - Figure 8

The image shows the plot of the size of different CNN architectures and models from the year 2012 until 2019. Each neural network is depicted as a circle, with the size of the circle corresponding to the size of the neural network in terms of number of parameters.

The first CNN was proposed by Yann LeCun in 1989 (LeCun, 1989) for the recognition of handwritten digits. Since then, CNNs have evolved significantly, driven by advancements in both model architecture and available computing power. To this day, CNNs continue to prove themselves as powerful architectures for various recognition and data analysis tasks.

Vision Transformers (ViTs)

Vision Transformers (ViTs) are a recent development in the field of computer vision that apply the concept of transformers, originally designed for natural language processing tasks, to visual data. Instead of treating an image as a 2D object, Vision Transformers view an image as a sequence of patches, similar to how transformers treat a sentence as a sequence of words.

Figure 9

An overview of a ViT as illustrated in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Since the publication of the original ViT, numerous variations and flavours have been proposed and studied.

Description - Figure 9

The image shows a diagram of the ViT architecture. The input image is split into different patches, and each patch is fed into the neural network. The network consists of a transformer encoder block and an MLP head block, followed by a classification head.

The process starts by splitting an image into a grid of patches. Each patch is then flattened into a vector of pixel values, and the patches together form a sequence. Positional encodings are added to retain positional information, as is done in transformers for language tasks. The transformed input is then processed through multiple layers of transformer encoders to create a model capable of understanding complex visual data.

Just as Convolutional Neural Networks (CNNs) learn to identify patterns and features in an image through convolutional layers, Vision Transformers identify patterns by focusing on the relationships between patches in an image. They essentially learn to weigh the importance of different patches in relation to others to make accurate classifications. The ViT model was first introduced by the Google Brain team in a 2020 paper. While CNNs dominated the field of computer vision for years, the introduction of Vision Transformers demonstrated that methods developed for natural language processing could also be used for image classification tasks, often with superior results.

One significant advantage of Vision Transformers is that, unlike CNNs, they do not have a built-in assumption of spatial locality and shift invariance. This means they are better suited for tasks where global understanding of an image is required, or where small shifts can drastically change the meaning of an image.

However, ViTs typically require a larger amount of data and compute resources compared to CNNs. This factor has led to a trend of hybrid models that combine both CNNs and transformers to harness the strengths of both architectures.

Seed classification

Background:

Canada's multi-billion-dollar seed and grain industry has established a global reputation in the production, processing, and exportation of premium-grade seeds for planting and grains for food across a diverse range of crops. Its success is built on Canada's commitment to innovation and the development of advanced technologies, allowing the delivery of high-quality products, with diagnostic certification, that meet both international and domestic standards.

Naturally, a collaboration was formed between a research group from the Seed Science and Technology Section and the AI Lab of the CFIA to maintain Canada's role as a reputable leader in the global seed and grain industries and their associated testing services.

Background: Quality Control

The seed quality of a crop is reflected in a grading report, whereby the final grade reflects how well a seed lot conforms with Canada's Seeds Regulations and meets minimum quality standards. Factors used to determine crop quality include contamination by weed seeds listed in Canada's Weed Seeds Order, physical purity, germination, and disease. While germination provides an indication of potential field performance, assessing physical purity is essential to ensure that the crop contains a high proportion of the desired seeds and is free from contaminants, such as prohibited and regulated species, other crop seeds, or other weed seeds. Seed inspection plays an important role in preventing the spread of the prohibited and regulated species listed in the Weed Seeds Order. Canada is one of the largest production bases for the global food supply, exporting huge quantities of grains such as wheat, canola, lentils, and flax. To meet phytosanitary certification requirements and access a wide range of foreign markets, analysis of the weed seeds regulated by importing destinations is in high demand, with quick turnaround times and frequently changing requirements. Expanding testing capacity for weed seeds requires the support of advanced technologies, as traditional methods face great challenges under these demands.

Motivation

Presently, the evaluation of a crop's quality is done manually by human experts. However, this process is tedious and time consuming. At the AI Lab, we leverage advanced computer vision models to automatically classify seed species from images, rendering this process more efficient and reliable.

This project aims to develop and deploy a powerful computer vision pipeline for seed species classification. By automating this classification process, we are able to streamline and accelerate the assessment of crop quality. We build upon advanced algorithms and deep learning techniques, while ensuring an unbiased and efficient evaluation of crop quality, paving the way for improved agricultural practices.

Project #1: Multispectral Imaging and Analysis

In this project, we employ a custom computer vision model to assess content purity, by identifying and classifying desired seed species from undesired seed species.

We successfully recover and identify contamination by three different weed species in a screening mixture of wheat samples.

Our model is customised to accept unique high resolution, 19-channel multi-spectral image inputs and achieves greater than 95% accuracy on held out testing data.

We further explored our model's potential to classify new species, by injecting five new canola species into the dataset and observing similar results. These encouraging findings highlight our model's potential for continual use even as new seed species are introduced.

Our model was trained to classify the following species:

  • Three different thistle (weed) species:
    • Cirsium arvense (regulated species)
    • Carduus nutans (Similar to the regulated species)
    • Cirsium vulgare (Similar to the regulated species)
  • Six crop seeds:
    • Triticum aestivum subspecies aestivum
    • Brassica napus subspecies napus
    • Brassica juncea
    • Brassica juncea (yellow type)
    • Brassica rapa subspecies oleifera
    • Brassica rapa subspecies oleifera (brown type)

Our model was able to correctly identify each seed species with an accuracy of over 95%.

Moreover, when the three thistle seeds were integrated with the wheat screening, the model achieved an average accuracy of 99.64% across 360 seeds. This demonstrated the model's robustness and ability to classify new images.

Finally, we introduced five new canola species and types and evaluated our model's performance. Preliminary results from this experiment showed a ~93% accuracy on the testing data.

Project #2: Digital Microscope RGB Imaging and Analysis

In this project, we employ a 2-step process to identify a total of 15 seed species of regulatory significance and morphological difficulty, across varying magnification levels.

First, a seed segmentation model is used to identify each instance of a seed in the image. Then, a classification model classifies each seed species instance.
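The two-step structure can be sketched as a short pipeline. The function names and the stub segmenter and classifier below are hypothetical placeholders, not the AI Lab's actual models; they exist only to show how the two stages compose:

```python
def segment_seeds(image):
    """Step 1 (stub): find each seed instance and return its cropped region.
    A real pipeline would use an instance-segmentation model here."""
    # Hypothetical placeholder: treat each row of the toy "image" as one seed.
    return [row for row in image]

def classify_seed(seed_crop):
    """Step 2 (stub): assign a species label to one cropped seed.
    A real pipeline would run a trained classification model here."""
    # Hypothetical placeholder: "classify" by the crop's mean intensity.
    mean = sum(seed_crop) / len(seed_crop)
    return "Cirsium arvense" if mean > 100 else "Brassica napus"

def classify_image(image):
    """Full pipeline: segment the image, then classify each seed instance."""
    return [classify_seed(crop) for crop in segment_seeds(image)]

# Toy "image": two rows standing in for two seed instances.
toy_image = [[200, 210, 190], [10, 20, 30]]
print(classify_image(toy_image))  # -> ['Cirsium arvense', 'Brassica napus']
```

The design choice is that segmentation and classification remain independent stages, so either model can be retrained or replaced without touching the other.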

We perform multiple ablation studies by training on one magnification profile then testing on seeds coming from a different magnification set. We show promising preliminary results of over 90% accuracy across magnification levels.

Three different magnification levels were provided for the following 15 species:

  • Ambrosia artemisiifolia
  • Ambrosia trifida
  • Ambrosia psilostachya
  • Brassica juncea
  • Brassica napus
  • Bromus hordeaceus
  • Bromus japonicus
  • Bromus secalinus
  • Carduus nutans
  • Cirsium arvense
  • Cirsium vulgare
  • Lolium temulentum
  • Solanum carolinense
  • Solanum nigrum
  • Solanum rostratum

Images of a mix of the 15 different species were taken at varying magnification levels. The magnification level was denoted by the total number of seed instances present in the image: either 1, 2, 6, 8, or 15 seeds per image.

In order to establish a standardised image registration protocol, we independently trained separate models from a subset of data at each magnification then evaluated the model performance across a reserved test set for all magnification levels.
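That cross-evaluation scheme can be sketched as a simple train/test grid. The helper functions below are hypothetical stubs, shown only to illustrate the shape of the ablation:

```python
# Hypothetical sketch of the cross-magnification evaluation grid:
# train one model per magnification level, then test each model on
# the reserved test set of every magnification level.
magnifications = [1, 2, 6, 8, 15]  # seeds per image, as in the protocol above

def train_model(train_mag):
    """Stub: a real pipeline would train a classifier on this subset."""
    return {"trained_on": train_mag}

def evaluate(model, test_mag):
    """Stub: a real pipeline would compute accuracy on the test set."""
    return (model["trained_on"], test_mag)

results = {}
for train_mag in magnifications:
    model = train_model(train_mag)
    for test_mag in magnifications:
        results[(train_mag, test_mag)] = evaluate(model, test_mag)

print(len(results))  # -> 25 train/test combinations
```

The off-diagonal cells of this grid (train on one magnification, test on another) are what reveal how sensitive a model is to the magnification level.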

Preliminary results demonstrated the model's ability to correctly identify seed species across magnifications with over 90% accuracy.

This revealed the model's potential to accurately classify previously unseen data at varying magnification levels.

Throughout our experiments, we tried and tested out different methodologies and models.

Advanced models equipped with a canonical form, such as Swin Transformers, fared much better and proved to be less perturbed by the magnification and zoom level.

Discussion + Challenges

Automatic seed classification is a challenging task. Training a machine learning model to classify seeds poses several challenges due to the inherent heterogeneity within and between species. Consequently, large datasets are required to effectively train a model to learn species-specific features. Additionally, the high degree of similarity among species within some genera makes it challenging even for human experts to differentiate between closely related intra-genus species. Furthermore, the quality of image acquisition can also impact the performance of seed classification models, as low-quality images can result in the loss of important information necessary for accurate classification.

To address these challenges and improve model robustness, data augmentation techniques were performed as part of the preprocessing steps. Affine transformations, such as scaling and translating images, were used to increase the sample size, while adding Gaussian noise can increase variation and improve generalization on unseen data, preventing overfitting on the training data.
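As an illustration of this preprocessing step, here is a minimal pure-Python sketch of two such augmentations on a toy greyscale image. A production pipeline would typically use a library such as torchvision or Albumentations instead of hand-rolled code like this:

```python
import random

def horizontal_flip(image):
    """A simple affine augmentation: mirror each row left to right."""
    return [list(reversed(row)) for row in image]

def add_gaussian_noise(image, sigma=5.0, seed=None):
    """Add zero-mean Gaussian noise to every pixel, clamped to [0, 255]."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]

image = [[0, 50, 100], [150, 200, 250]]

print(horizontal_flip(image))  # -> [[100, 50, 0], [250, 200, 150]]

# Each noisy copy is a new training sample carrying the same label,
# which increases variation without collecting new images.
noisy = add_gaussian_noise(image, sigma=5.0, seed=42)
```

Because each augmented copy keeps the original label, the effective training set grows while the model is forced to become invariant to these nuisance variations.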

Selecting the appropriate model architecture was crucial in achieving the desired outcome. A model may fail to produce accurate results if end users do not adhere to a standardized protocol, particularly when given data that falls outside the expected distribution. Therefore, it was imperative to consider various data sources and utilize a model that can effectively generalize across domains to ensure accurate seed classification.

Conclusion

The seed classification project is an example of the successful and ongoing collaboration between the AI Lab and the Seed Science group at the CFIA. By pooling their respective knowledge and expertise, both teams contribute to the advancement of Canada's seed and grain industries. The seed classification project showcases how leveraging advanced machine learning tools has the potential to significantly enhance the accuracy and efficiency of evaluating seed or grain quality, in compliance with seed and plant protection regulations, ultimately benefiting the agricultural industry, consumers, Canadian biosecurity, and food safety.

As Data Scientists, we recognise the importance of open-source collaboration, and we are committed to upholding the principles of open science. Our objective is to promote transparency and engagement through open sharing with the public.

By making our application available, we invite fellow researchers, seed experts, and developers to contribute to its further improvement and customisation. This collaborative approach fosters innovation, allowing the community to collectively enhance the capabilities of the SeedID application and address specific domain requirements.

Meet the Data Scientist

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Register for the Meet the Data Scientist event. We hope to see you there!

MS Teams – link will be provided to the registrants by email

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.


Eh Sayers Episode 14 - I Got 99 Problems But Being Misgendered on the Census Isn't One

Release date: August 21, 2023

Catalogue number: 45200003
ISSN: 2816-2250



Ladies, Gentlemen, and Gentlethem!

While every census is special, the 2021 Census was historic. It was the first to include a question about gender, making Canada the first country to collect and publish data on gender diversity from a national census.

In this episode, we explore gender with drag king Cyril Cinder and we talk Census 2021 with StatCan’s Anne Milan.

Join us for a new kind of gender reveal.

Host

Tegan Bridge

Guests

Cyril Cinder, Anne Milan

Listen to audio

Eh Sayers Episode 14 - I Got 99 Problems But Being Misgendered on the Census Isn't One - Transcript

Cyril: Molly knew from experience that dresses were trouble. Dresses have tight places and zippers you can’t reach. Dresses mean troublesome tights and fancy shoes with no purpose. Dresses with no pockets mean nowhere to put interesting rocks, and nowhere to keep dog treats in case you find a stray. Dresses were not right on a regular day and they were definitely not right for something as important as picture day!

Tegan: (Stage whisper) Welcome to Eh Sayers, a podcast from Statistics Canada, where we meet the people behind the data and explore the stories behind the numbers. I'm your host, Tegan Bridge, and I'm whispering because we're listening to drag story time. Shh!

Cyril: Molly wanted to look like she was going on an adventure, not like she was going to a tea party. But she had an idea to save picture day. Her brother’s old tuxedo! It was perfect.

Dashing. Comfortable. Plenty of pockets. She had tried it on once when he was at chess club. It fit just right then, and Molly was sure it would look just as great today!

Tegan: You just heard part of Molly’s Tuxedo by Vicki Johnson, in which Molly has to decide whether to wear the dress her mom picked out for her school picture day or the tuxedo she wants to wear. It’s a book which explores gender in a kid-friendly way. You might hear Molly’s Tuxedo read at a drag story time, and if you’re in the Ottawa area, you might just hear it read by Cyril Cinder.

Cyril: My name is Cyril Cinder, and I'm an Ottawa based drag performer and drag king who has been performing since 2014.

Tegan: What's a drag king?

Cyril: Drag kings are drag artists who present or perform a masculine persona as part of their drag performance. That might involve parody, exploration or expansion of masculine gender norms and performance types. Um, they can be, you know, suave, they can be comedic, they can be big and extravagant and super over the top. Drag kings can be absolutely anyone and everything.

Tegan: Where does the name Cyril Cinder come from?

Cyril: Uh, I named myself in drag. That's not always a thing that happens. Sometimes you are given a name, but in the Canadian scene, usually we pick our own names, and I kind of wanted to sound like the alter ego of a super villain. So, and alliteration sounds cool. So I went with Cyril Cinder. I didn't wanna choose a pun name because I didn't trust myself to be clever enough to come up with something that nobody else would come up with. And the great benefit of it has been that it's also appropriate for all ages. It just sounds cool.

Tegan: I love it. What kind of performances do you do?

Cyril: I tend to lean into the age-old tradition of drag, which is the lip sync performance. I perform a lot in bars or in very different venues, music venues. I'm also a story time performer, so I will entertain and read books to children and audiences and families of all ages. I'm also a speaker and travel to conferences to talk about what I do as a drag performer, the increase in attacks against the 2SLGBTQIA+ community in Canada, and how that does tend to specifically target drag performers, as well as mental health because I also work as a registered psychotherapist.

Tegan: We mentioned drag story times at the top of the show. Why are drag story times important?

Cyril: For me, drag story time is a couple of important things. I mean, right off the bat, it's a pro-literacy initiative. Whenever you're doing something to make reading look a little bit more fun, you are encouraging children to be more interested in reading, to be more excited about books, and literacy is such an important foundation of our society. But the other part is that it's also an anti-bullying initiative, right? Whether or not we are, you know, exposing kids to a positive queer role model for the first time, or to someone who's maybe a bit more gender nonconforming, doesn't quite match the boys and girls archetypes and binaries that they're often exposed to at home and at school and in media. It's an opportunity for them to see that and see that that's not that strange actually, that that's okay, that there's nothing too weird about being fabulous and sparkly and excited and fun. And if we can introduce children to positive role models of all different kinds of diversity at a young age, as they get older, that becomes less and less of an axis of difference for them, something for them to isolate or pick out about their peers and say, oh, you're not like me in this way. It becomes something that they can say, oh yeah, there are people like this. This is normal. This is okay.

Tegan:  You mentioned binaries and archetypes for little boys and little girls. Could you say more about that? What's the gender binary and why is it a problem?

Cyril: So the gender binary is the idea that there are only two genders. Gender is distinct from sex.

Sex is a series of biological characteristics. Sex is also not an actual binary between male and female, and we don't only see this non-binary nature of sex represented in the human species, but in multiple different species. So that is just to say that there are many, many different biological expressions that don't fit within the male-female solid archetype.  

Gender is a different thing than sex, though. Gender is an experience of one's gender, role, and identity within society: the concept of being masculine, feminine, androgynous, the idea of rather than being male or female, of being man or woman. And as much as the archetypes and binary roles of male and female don't capture the full breadth of the experience of sex, the binary of woman and man also doesn't capture the full breadth of experience that humans can have of our gender and of our gender identities.

The gender binary can be used to control people. It can be used to force people into things that they don't want for themselves, right? We can see very strong, uh, expectations on men, for example, on what kind of emotions they're allowed to show, what kind of careers they're allowed to pursue, how they're supposed to feel about caretaking or sex or power or any of these things. And these are equally damaging to people of all genders. You know, women get told not to be too bossy, uh, that they're over emotional, that they can't be trusted to make decisions or be in positions of leadership, right? These sorts of boxes that we force people into. And people who don't feel like they fit into either of the binary gender options, male or female, people who fall under the very large umbrella of the non-binary spectrum really also deserve to have their experience of gender understood, respected, and validated.

Tegan: What are your preferred pronouns?

Cyril: Yeah, so I actually identify as non-binary out of drag, but my drag character, Cyril Cinder, is a man that is the gender identity of that character. So in drag, I exclusively use he/ him pronouns, but out of drag I use she/he/they pronouns. Any pronouns really I'm comfortable with. But whenever I'm referring to Cyril Cinder, the drag character, I always prefer to use he/him.

Tegan: Is it a challenge living as an out non-binary person?

Cyril: It can be. I think I certainly don't experience certain challenges that other people who might hold the same identity of non-binary as, as me actually experience, um, because of my flexibility with pronouns, with the she/he/they. I am comfortable if someone, you know, defaults to she looking at me because I was assigned female at birth and I, you know, when I'm not in drag, do have a somewhat feminine gender presentation and someone looks at me and goes, ah, a woman. It's not correct, but it's also not like the worst thing in the world for me.

At the same time, other people for whom that would feel really actually quite distressing and upsetting and invalidating for them. I think they face a greater challenge, right? Someone who exclusively uses they/them pronouns who might constantly have to correct people who would default to he or she or might overtly refuse to use their pronouns because of a belief that they have and a subsequent then desire to invalidate that person.

And that's really hard because then you're moving through the world and you're trying to tell people who you are. We have this innate human desire to be seen by the people around us. We are a social animal. We exist in a society. We don't do well independently on our own. We're not built for that. So when you're trying to say to someone like, “Hey, this is who I am!” And they go, “No, you're not. I know you better than you do, and in fact, this thing that you're doing, it's really a problem. It's really dangerous. It's really bad actually, and you should feel shame for that.” That's not an experience anyone seeks to have when they're trying to order a coffee or talk to their boss or just go about their day and go about their life.

Tegan: You're a drag king. You're a performer, and I think it goes without saying that drag is a performance. In what ways is all gender a performance?

Cyril: So gender is performance. It is something that we put on whether you are choosing to wear a dress or a three-piece suit or both at the same time. You were saying something about what feels right to you about who you are, about, you know, just different little elements of yourself that we can express in little ways, and that might be through comparison, through contrast, through exaggeration, through celebration. We're all doing little things and all of those expressions are a completely valid way of looking at everyone's individual experience of gender.

We have gender identity, we have gender expression. Those are also two different things. How someone identifies in their gender might be different than how they express it. I'm non-binary. And so that's my gender identity, but I have a very feminine gender expression. I also can exhibit a very masculine gender expression. That's quite fluid for me. That moves around a lot. Not everybody might have the same experience, but it's, it's important to be able to articulate these different parts of it because we're only just enriching our understanding of the human experience.

Tegan: A good friend of mine, whenever he meets someone who looks at the world in a very different way than he does, pauses, gets a thoughtful look on his face, and says, “Life is a rich tapestry!” The human experience and the diversity of that experience certainly is a rich tapestry. Gender identity, especially, is fascinating. But it's not something that we here at StatCan have measured. Until now.

Tegan: Why was the 2021 census a big deal?

Anne: Well, I would say each census is a big deal. The 2021 specifically, in terms of gender, is because it was the first time that we asked the gender question on the, on the census.

Tegan: This is, of course, our resident census expert.

Anne: I am Anne Milan. I am chief of the Census Demographic Variables Section in the Centre for Demography at Statistics Canada.

Tegan: Yeah. So you said it's the gender question. Could you elaborate? So what's different about the 2021 census that we didn't ask before?

Anne: There's two, two main changes. There is a precision of “at birth” added to the sex question, and the question on gender is completely new. So that asked someone to identify whether they were male or female, and there was an “or please specify” category where persons could write in their response.

It was historic to, to include this information. It's the first time that census data was released on the transgender and non-binary population among all the countries around the world. And so we're very proud of that.

Tegan: The census allowed people to write in whatever gender identity they would prefer. Why do that instead of boxes to tick?

Anne: With the gender question, we felt that having a write-in, or please specify this person's gender, was the most respectful and inclusive strategy to use so that people could select the gender that was most relevant to them.

Tegan: Do you have any idea how many different genders people put?

Anne: There were many.

Tegan: Many.

Anne: There were many,

Tegan: Many is a valid answer.

Tegan: Putting a blank space allowed respondents to describe themselves as they saw fit. And they did. StatCan uses the term non-binary as an umbrella term, but that's not necessarily how everyone describes themselves. Almost a third of those under this rainbow coloured non-binary umbrella used a different word to describe their gender: androgynous, bigender, intergender, pangender, polygender, queer, and two spirit. These were all terms provided by census respondents, but to be sure, these weren't all of them. As Anne said, there were many, many more.

Anne: If there was no gender question before, they wouldn't have had that opportunity to select their own gender. And that was one of the things that we noticed with the 2016 census when we were reviewing the comments that, uh, that people had put in, they were saying that the sex question, which had been there for many census cycles, was not precise enough for their needs. So some people felt that they were excluded. And of course, the goal of the census is to count all Canadians and have everybody see themselves in the data.

Tegan: Why is it important that people see themselves in the data?

Anne: Well, I think the, the census is a count of the total population in Canada. So of course it's important for people to see themselves in the data. We want people to have that experience and to feel that this information is relevant to them, that they're counted, that their voices matter. So really, that's the goal of the census: to include everybody and to have everybody feel included.

Tegan: So you said that people gave feedback in 2016, uh, saying that the census wasn't, asking necessarily the right question. Is that where the idea came from to make this change?

Anne: It was part of it. Each census cycle, we review all of the content, all of the questions, all of the definitions, and as part of that content determination process, there's an extensive consultation and engagement that takes place almost immediately after any census. And one of the common responses was that, gender was an information gap that, uh, that was needed.

And so following that, we had some more specific focus groups, individual, in-depth conversations. And this included all Canadians, cisgender, as well as transgender, non-binary persons. And so we, we took all that information into account. We developed the new content.

Tegan: What have we learned? What do the stats say about gender?

Anne: One in 300 Canadians aged 15 and over living in private households were transgender or non-binary individuals. Numerically, this is about a hundred thousand, so about 60,000 were transgender persons and just over 40,000 were non-binary persons.

They tend to be younger on average than cisgender persons. So just to give you an idea about two thirds are less than age 35, so between age 15 and 34.

Tegan: There really is quite an age difference. Among those who were between 15 and 34, about one in 150 were transgender or non-binary. For those who were over the age of 35, it was only one in 550. That means, proportionally, there were more than three times as many transgender people aged 15 to 34 as there were 35 and over.  
 
Anne: And it might be that, you know, younger people are more comfortable reporting their gender. From a generational perspective, attitudes and behaviors of a particular generation are informed by historical context in which they're raised. So age differences were a big trend that we noticed.

Tegan: How comfortable people are reporting their gender on an official government form was something Cyril Cinder mentioned as well.

Cyril: I think a lot of people in the 2SLGBTQIA+ community are still somewhat apprehensive or nervous to, you know, really tell the government, yes, I am transgender, or, yes, I am non-binary due to historical systems of oppression and how they have impacted those communities.

Tegan: Age differences weren't the only notable finding in the data.

Anne: There were also regional differences. So, for example, among the largest urban areas, what we call census metropolitan areas, Victoria, Halifax, and Fredericton really stood out. And these three large urban areas had certain elements in common. They had stronger population growth between 2016 and 2021 compared to the national average. They had larger shares of people aged 15 to 34 than the national average. And they're all home to several major universities and colleges. And of course students tend to have a, a younger age profile.

So those are some of the high level findings. And we were so pleased with the reaction to the information. It was overwhelmingly positive. And so we were very, very proud of that.

Cyril: I think it's so important. I mean, other levels of government, we are recognizing that transgender and non-binary people exist, right? We have the option for different gender markers, on our IDs, and, you know, normally when you go to a doctor's office, or you fill out any other demographic form, you're gonna have an opportunity to indicate what your gender identity is. And when we're talking about something as large-scale as the census, which drives so many of the decisions that we make at various levels of government. It informs, you know, grant money, who's getting what, how many resources need to be allocated to which communities. It's important to have an accurate measure of those communities. You know, Canada was the first country to include this in the census, but these calls have been going on for over a decade to be able to access this information.

But even just asking the question, it, it indicates that the government does care about transgender and non-binary Canadians, that our experiences are important and that we are part of Canadian society.

Tegan: From your perspective, now that we have these data about transgender Canadians, what should we do with it? How can it be put to best use? And what would your hopes be for next steps?

Cyril: Mm-hmm. I think some of the important things is, is being able to use this information about like where are resources needed in particular, right. I was looking through the data and things like the vast majority of non-binary people live in six urban centers in Canada.

Right? That is huge to know, but also to know like how many aging trans, non-binary people do we have in Canada? What kind of services might they need within the elder care system in this country, which is dealing with a lot of struggles, but how might these people's needs be unique? Where are they? Where are those services needed to go? What can effectively serve these communities? How can we support these people who we know are more likely to struggle with negative mental health impacts? And other research also shows we're more likely to live in poverty, more likely to deal with other axes of systemic oppression and various things like that. Making the information publicly available is also very helpful because it allows us to use it for advocacy work. And I think also just kind of sometimes putting in context how much vitriol is directed towards the trans non-binary community and how few of us there actually are. Right? We are a small community. Looking at the data, there's just about a hundred thousand of us in the country and knowing how to support that. Putting in context of how large our population is actually becoming and, at the same time, in terms of next steps, I think it's important to get more accurate data.

Tegan: This is why the census is so crucial. StatCan doesn't just gather data. Our experts also analyze them, and Cyril's not the only one looking forward to getting more information.

Anne: It's very exciting and as an analyst this is the part that I enjoy because all of the census variables are available now: education, labor, income. So we can do a deeper dive into some of the patterns. And that's exactly what we're doing now. So there's a paper that's currently underway on the socioeconomic wellbeing of the transgender and non-binary population, looking at characteristics like education, labor force participation, income, and housing. So that's underway, and that will be available in the coming months. That's one activity that we're working on: in-depth analysis.

And what's exciting about 2026 is we will have trends. 2021, it was, it was the excitement of having that data for the first time. But now we will have two time points so we can see what were the changes over time. And that will allow us to do even more interesting analysis.

Tegan: How often does the census change, and maybe more importantly, why does it change?

Anne: I would say that the census changes every census cycle. And that's what keeps it relevant. I mean, for, for over the past hundred years, it continues to evolve as society evolves, and that's what makes it exciting.

One example I can maybe give of content that has continued but also changed over time is the adult population 15 plus in couples. We've been measuring it for, for over a hundred years. So in 1921, a couple was a married couple. In 1981, we introduced the concept of common law couples. In 2001, we introduced the idea of same sex and opposite sex common law couples. Following national legislation that permitted same-sex couples to marry in 2005, we then counted same-sex married couples in 2006. So we have this increasing way to slice and dice the data, but we also have this continuity over time. And so then in 2021, we added this further element of being able to look at couples by gender. So whether a couple is comprised of one transgender person or one non-binary person. And so that ability to look at emerging family forms continues while maintaining that ability to look at historical trends as well.

Tegan: While on the subject of history and trends. It's important to make the point that even though 2021 was the first census to ask about gender, trans and non-binary Canadians have always been here.

Anne: I think there is a recognition that... that, for example, in this, in this situation, transgender and non-binary people have, have always existed, but it's our ability to measure it. And that's the, that's the new part.

Tegan: Just because you're not measuring something, doesn't mean it doesn't exist.

Anne: Exactly.

Tegan: Does the future look bright for trans and gender nonconforming Canadian kids? What opportunities and what challenges do you foresee?

Cyril: I, I think the future does look bright for Canadian transgender, non-binary, gender non-conforming youth. I, I think that there is something really, really wonderful ahead, but the path to that wonderful future currently has a lot of barriers in the way.

We have made incredible progress in recent decades as a community and we are seeing intense reactions to that progress from people who would like to see it clawed back and, you know, everything I ever let go had claw marks on it. So good luck with that initiative. I think a lot of us feel that way. But we cannot become complacent. We cannot, you know, pat ourselves on the back and say, job well done. The fight's over. We did it. And ignore what's actually happening on the ground. Because if we do that, we are going to lose that bright future.

We are going to repeat history and the repetition of history loses lives. People die in the circumstances that we have been living in for centuries at this point. And to me, that's not an acceptable way forward. It's, it's not okay to have our trans non-binary and gender non-conforming siblings lost in this fight.

Queer and trans kids should get to grow up to be queer and trans adults. And that should not be a matter of debate. That is asked and answered at this point. And we need to be firm in that and not fall into the paradox of tolerance, whereby tolerating intolerance, it is allowed to fester and grow and become cancerous and take over, and then all of a sudden, oh, where'd all those rights go that we fought so hard to win?

So, I do believe that the future is bright because I do believe that Canadians care about this, and I do believe that Canadians are intelligent and capable of understanding honest facts when they're placed in front of them, that we can dispel negative myths, that we can march forward together towards something that is better for all of us, but we need to put in the work to make sure that that happens.

Tegan: What is allyship to you and how can people be allies to the queer community?

Cyril: Allyship is active, not passive. A lot of people, you know, identify with the idea of allyship. They want to be an ally, and I think that's a wonderful thing. But when someone tells me like, oh, I'm an ally to the queer community, I'm like, great. What does that look like? What do you do to be an ally to the queer community? Because it's not enough to just not be homophobic, transphobic, and queer phobic. It's not enough to just not be a bigot. You have to oppose it in some way. You have to support the community in some way.

We can't be left alone just to fight for our own cause. We need our cisgender and heterosexual allies to also show up for us. And so allyship is an active thing. It is something you can be bestowed, something you can be granted from the community. You are an ally. You are showing up for us. You fight for us. You are willing to be uncomfortable if it means being able to protect our dignity and our personhood. That means a lot. It is not something that you can just claim.

Tegan: Is there anything you'd like to add?

Anne: Well, maybe just one more word about the value of the census in general. I see the value of it every day to us at StatCan, but also to Canadians more broadly. It's the best source of data for looking at smaller populations and subgroups, and of course, transgender and non-binary persons fall into that category. But there are many other smaller populations as well that are important to study. It's a valuable source for detailed and local geographies so that municipal planners can plan schools and hospitals and home care. As the concepts broaden, we often don't lose content, but it does allow us to integrate these new patterns that we're seeing. And so that allows it to evolve and maintain its relevance, and I think I can't finish without thanking all Canadians for their participation, for their input. It's very much appreciated and we certainly couldn't have the census without them.

Tegan: Thank you for your time. Thank you for sharing your expertise.

Anne: Thank you.

Tegan: If someone would like to learn more about you and your work, maybe they'd be interested in seeing what drag is firsthand. Where can they go?

Cyril: Oh, so I have a website, www.cyrilcinder.com. C-Y-R-I-L-C-I-N-D-E-R. I'm also all over all the social medias, uh, Instagram, TikTok, Facebook, all of those required things. They can come support their local drag. That is, to me, the greatest, most important thing. It is your local drag artists, the ones who maybe don't get to be on tv, who are maybe a little bit more different. Who are the ones who are out working in your community, who I think have the most valuable things to say. Um, I saw my first drag show in 2014 and it opened my eyes so, so much, and I just hope more people can go have that experience.

Tegan: And if maybe someone's listening and questioning their own gender. Do you have any suggestions or resources to recommend?

Cyril: If you're sort of questioning your own gender identity, there has been so much work done, so much writing, to help you with that experience. There are really wonderful books. Um, You and Your Gender Identity by Hoffman-Fox is a great workbook that people can look through. It's often available at your local library. Your local library will have a lot of resources on gender identity and gender exploration for a variety of age groups. Um, you could look at Interligne, which is a 2SLGBTQIA+ listening service in Canada that's based out of Montreal. If you're Indigenous, there are Indigenous-focused resources for exploring two-spirit identity, you know? Open yourself up, ask questions. Go to your local queer bookstore or your local queer venue if you have one. If you don't have one, the internet is a fantastic place to find some good free educational resources and support from other people who feel like you, because I promise you, no matter what questions you have, no matter what feeling you are struggling with, you are not alone in that experience, and there is somebody out there who's asking the exact same questions, and you don't deserve to go through that journey alone.

Tegan: Thank you so much for joining us. We really appreciate it.

Cyril: Thank you for having me.

Tegan: You’ve been listening to Eh Sayers. Thank you to our guests, Cyril Cinder and Anne Milan. Molly’s Tuxedo was written by Vicki Johnson and illustrated by Gillian Reid. It was published by little bee books. Thank you for letting us share it on our show. It was read by Cyril Cinder.

If you’re interested in learning more about our census gender data, check out the links in the show notes.

You can subscribe to this show wherever you get your podcasts. There you can also find the French version of our show, called Hé-coutez bien. If you liked this show, please rate, review, and subscribe. Thanks for listening!

Sources:

The Daily - Canada is the first country to provide census data on transgender and non-binary people

Filling the gaps: Information on gender in the 2021 Census

2021 Census: Sex at birth and gender - the whole picture

Production-level code in Data Science

By David Chiumera, Statistics Canada

In recent years, the field of data science has experienced explosive growth, with businesses across many sectors investing heavily in data-driven solutions to optimize decision-making processes. However, the success of any data science project relies heavily on the quality of the code that underpins it. Writing production-level code is crucial to ensure that data science models and applications can be deployed and maintained effectively, enabling businesses to realize the full value of their investment in data science.

Production-level code refers to code that is designed to meet the needs of the end user, with a focus on scalability, robustness, and maintainability. This contrasts with code that is written purely for experimentation and exploratory purposes, which may not be optimized for production use. Writing production-level code is essential for data science projects as it allows for the efficient deployment of solutions into production environments, where they can be integrated with other systems and used to inform decision-making.

Production-level code has several key benefits for data science projects. First, it ensures that data science solutions can be easily deployed and maintained. Second, it reduces the risk of errors, vulnerabilities, and downtime. Third, it facilitates collaboration between data scientists and software developers, enabling them to work together more effectively to deliver high-quality solutions. Finally, it promotes code reuse and transparency, allowing data scientists to share their work with others and build on existing code to improve future projects.

Overall, production-level code is an essential component of any successful data science project. By prioritizing the development of high-quality, scalable, and maintainable code, businesses can ensure that their investment in data science delivers maximum value, enabling them to make more informed decisions and gain a competitive edge in today's data-driven economy.

Scope of Data Science and its various applications

The scope of data science is vast, encompassing a broad range of techniques and tools used to extract insights from data. At its core, data science involves the collection, cleaning, and analysis of data to identify patterns and make predictions. Its applications are numerous, ranging from business intelligence and marketing analytics to healthcare and scientific research. Data science is used to solve a wide range of problems, such as predicting consumer behavior, detecting fraud, optimizing operations, and improving healthcare outcomes. As the amount of data generated continues to grow, the scope of data science is expected to expand further, with increasing emphasis on the use of advanced techniques such as machine learning and artificial intelligence.

Proper programming and software engineering practices for Data Scientists

Proper programming and software engineering practices are essential for building robust data science applications that can be deployed and maintained effectively. Robust applications are those that are reliable, scalable, and efficient, with a focus on meeting the needs of the end user. There are several types of programming and software engineering practices that are particularly important in the context of data science, such as version control, automated testing, documentation, security, code optimization, and proper use of design patterns to name a few.

By following proper practices, data scientists can build robust applications that are reliable, scalable, and efficient, with a focus on meeting the needs of the end user. This is critical for ensuring that data science solutions deliver maximum value to businesses and other organizations.

Administrative Data Pre-processing (ADP) project and its purpose: an example

The ADP project is a Field 7 application that required involvement from the Data Science Division to refactor a citizen-developed component due to a variety of issues that were negatively impacting its production readiness. Specifically, the codebase used to integrate workflows external to the system was found to be lacking in adherence to established programming practices, leading to a cumbersome and difficult user experience. Moreover, there was a notable absence of meaningful feedback from the program upon failure, making it difficult to diagnose and address issues.

Further exacerbating the problem, the codebase was also found to be lacking in documentation, error logging, and meaningful error messages for users. The codebase was overly coupled, making it difficult to modify or extend the functionality of the program as needed, and there were no unit tests in place to ensure reliability or accuracy. Additionally, the code was overfitted to a single example, which made it challenging to generalize to other use cases, and several features desired by the client were missing.

Given these issues, the ability for the ADP project to pre-process semi-structured data was seriously compromised. The lack of feedback and documentation made it exceedingly difficult for the client to use the integrated workflows effectively, if at all, leading to frustration and inefficiencies. The program outputs were often inconsistent with expectations, and the absence of unit tests meant that reliability and accuracy were not assured. In summary, the ADP project's need for a refactor of the integrated workflows (a.k.a. clean-up or redesign) was multifaceted and involved addressing a range of programming and engineering challenges to ensure a more robust and production-ready application. To accomplish this, we used a Red Green refactoring approach to improve the quality of the product.

Red Green vs Green Red approach to refactoring

Refactoring is the process of restructuring existing code in order to improve its quality, readability, maintainability, and performance. This can involve a variety of activities, including cleaning up code formatting, eliminating code duplication, improving naming conventions, and introducing new abstractions and design patterns.

There are several reasons why refactoring is beneficial. Firstly, it can improve the overall quality of the codebase, making it easier to understand and maintain. This can save time and effort over the long term, especially as codebases become larger and more complex. Additionally, refactoring can improve performance and reduce the risk of bugs or errors, leading to a more reliable and robust application.

One popular approach to refactoring is the "Red Green" approach, as part of the test-driven development process. In the Red Green approach, a failing test case is written before any code is written or refactored. This failing test is then followed by writing the minimum amount of code required to make the test pass, before proceeding to refactor the code to a better state if necessary. In contrast, the Green Red approach is the reverse of this, where the code is written before the test cases are written and run.

The benefits of the Red Green approach include the ability to catch errors early in the development process, leading to fewer bugs and more efficient development cycles. The approach also emphasizes test-driven development, which can lead to more reliable and accurate code. Additionally, it encourages developers to consider the user experience from the outset, ensuring that the codebase is designed with the end user in mind.

Figure 1: Red Green Refactor

The first step, the Red component, refers to writing a test that fails. From there, the code is modified to make the test pass, which is the Green component. Lastly, any refactoring needed to further improve the codebase is done; then another test is created and run, which fails again: the Red component once more. The cycle continues until the desired state is reached, terminating the feedback loop.
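As a minimal sketch, the cycle might look like this in Python. The `clean_postal_code` helper and its test are hypothetical examples invented for illustration, not code from the ADP project:

```python
# Step 1 (Red): the test below is written first and fails, because
# clean_postal_code does not exist yet (or is incomplete).
# Step 2 (Green): the minimal implementation below makes it pass.
# Step 3 (Refactor): tidy the code while keeping the test green,
# then write the next failing test and repeat.

def clean_postal_code(raw):
    """Normalize a Canadian postal code to the form 'A1A 1A1'."""
    compact = raw.replace(" ", "").upper()  # drop all spaces, uppercase
    return f"{compact[:3]} {compact[3:]}"   # re-insert the single space

def test_clean_postal_code():
    assert clean_postal_code(" k1a0b1 ") == "K1A 0B1"

test_clean_postal_code()  # runs silently: the cycle is currently Green
```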

  • Deployed and maintained means that code can be configured and put into production by the operations team easily, is easy for developers to understand, and is robust (with the added benefit that the code can be readily picked up by new team members).
  • Reduced risk of errors and downtime means the code is of good quality, stable (or handles errors gracefully), and provides consistent and accurate results.

In the case of the ADP project, the Red Green approach was applied during the refactoring process. This led to a smooth deployment process, with the application being more reliable, robust, and easier to use. By applying this approach, we were able to address the various programming and engineering challenges facing the project, resulting in a more efficient, effective, stable, and production-ready application.

Standard Practices Often Missing in Data Science Work

While data science has become a critical field in many industries, it is not without its challenges. One of the biggest issues is that standard practices are often missing in data science work. While there are many standard practices that can improve the quality, maintainability, and reproducibility of data science code, many data scientists overlook them in favor of quick solutions.

This section will cover some of the most important standard practices that are often missing in data science work. These include:

  • version control
  • testing code (unit, integration, system, acceptance)
  • documentation
  • code reviews
  • ensuring reproducibility
  • adhering to style guidelines (e.g. PEP standards)
  • using type hints
  • writing clear docstrings
  • logging errors
  • validating data
  • writing low-overhead code
  • implementing continuous integration and continuous deployment (CI/CD) processes

By following these standard practices, data scientists can improve the quality and reliability of their code, reduce errors and bugs, and make their work more accessible to others.

Documenting Code

Documenting code is crucial for making code understandable and usable by other developers. In data science, this can include documenting data cleaning, feature engineering, model training, and evaluation steps. Without proper documentation, it can be difficult for others to understand what the code does, what assumptions were made, and what trade-offs were considered. It can also make it difficult to reproduce results, which is a fundamental aspect of scientific research as well as building robust and reliable applications.

Writing Clear Docstrings

Docstrings are strings that provide documentation for functions, classes, and modules. They are typically written in a special format that can be easily parsed by tools like Sphinx to generate documentation. Writing clear docstrings can help other developers understand what a function or module does, what arguments it takes, and what it returns. It can also provide examples of how to use the code, which can make it easier for other developers to integrate the code into their own projects.

def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...

Multi-Line Docstring Example

Adhering to Style Guidelines

Style guidelines in code play a crucial role in ensuring readability, maintainability, and consistency across a project. By adhering to these guidelines, developers can enhance collaboration and reduce the risk of errors. Consistent indentation, clear variable naming, concise commenting, and following established conventions are some key elements of effective style guidelines that contribute to producing high-quality, well-organized code. An example of this are PEP (Python Enhancement Proposal) standards, which provide guidelines and best practices for writing Python code. It ensures that code can be understood by other Python developers, which is important in collaborative projects but also for general maintainability. Some PEP standards address naming conventions, code formatting, and how to handle errors and exceptions.
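As a small before-and-after illustration of these conventions, consider a hypothetical unit-conversion function (invented for this example) rewritten to follow PEP 8:

```python
# Before: cryptic name, no spacing, magic number inline
# def fn(x):return x*0.0254

# After: descriptive snake_case name, a named module-level constant,
# spaces around operators, and a one-line docstring
INCHES_TO_METRES = 0.0254

def inches_to_metres(length_in_inches):
    """Convert a length in inches to metres."""
    return length_in_inches * INCHES_TO_METRES
```

Both versions compute the same result; the second is the one another developer can read, review, and safely modify months later.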

Using Type Hints

Type hints are annotations that indicate the type of a variable or function argument. They are not strictly necessary for Python code to run, but they can improve code readability, maintainability, and reliability. Type hints can help detect errors earlier in the development process and make code easier to understand by other developers. They also provide better integrated development environment (IDE) support, and static type checkers such as mypy can use them to catch type errors before the code runs.
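
As a minimal sketch, a type-hinted function might look like the following (the function itself is a hypothetical example): the annotations document that the function accepts a list of floats and may return either a float or None.

```python
from typing import Optional


def mean(values: list[float]) -> Optional[float]:
    """Return the arithmetic mean of values, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)
```

A static type checker could then flag a call such as `mean("abc")` before the code is ever run.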

Version Control

Version control is the process of managing changes to code and other files over time. It allows developers to track and revert changes, collaborate on code, and ensure that everyone is working with the same version of the code. In data science, version control is particularly important because experiments can generate large amounts of data and code. By using version control, data scientists can ensure that they can reproduce and compare results across different versions of their code and data. It also provides a way to track and document changes, which can be important for compliance and auditing purposes.

Figure 2: Version Control Illustration

A master branch (V1) is created as the main project. A new branch, created off V1, is used to develop and test changes until they are ready to be merged back into the master branch, creating V2. V2 is then released.

Testing Code

Testing code is the formal (and sometimes automated) verification of the completeness, quality, and accuracy of code against expected results. Testing code is essential for ensuring that the codebase works as expected and can be relied upon. In data science, testing can include unit tests for functions and classes, integration tests for models and pipelines, and validation tests for datasets. By testing code, data scientists can catch errors and bugs earlier in the development process and ensure that changes to the code do not introduce new problems. This can save time and resources in the long run by reducing the likelihood of unexpected errors and improving the overall quality of the code.
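
As a minimal sketch, a unit test for a hypothetical `normalize` function using Python's built-in unittest module might look like this (both the function and the test cases are invented for the example):

```python
import unittest


def normalize(values):
    """Scale values linearly to the range [0, 1]; a hypothetical function under test."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class TestNormalize(unittest.TestCase):
    def test_range(self):
        # A typical input should span exactly [0, 1]
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        # An edge case: all-equal inputs should not divide by zero
        self.assertEqual(normalize([5, 5]), [0.0, 0.0])
```

Running `python -m unittest` in the project directory discovers and executes such tests automatically.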

Code Reviews

Code reviews are a process in which other developers review new code and code changes to ensure that they meet quality and style standards, are maintainable, and meet the project requirements. In data science, code reviews can be particularly important because experiments can generate complex code and data, and because data scientists often work independently or in small teams. Code reviews can catch errors, ensure that code adheres to best practices and project requirements, and promote knowledge sharing and collaboration among team members.

Ensuring Reproducibility

Reproducibility is a critical aspect of scientific research and data science. Reproducible results are necessary for verifying and building on previous research, and for ensuring that results are consistent, valid and reliable. In data science, ensuring reproducibility can include documenting code and data, using version control, rigorous testing, and providing detailed instructions for running experiments. By ensuring reproducibility, data scientists can make their results more trustworthy and credible and can increase confidence in their findings.

Logging

Logging refers to the act of keeping a register of events that occur in a computer system. This is important for troubleshooting, information gathering, security, providing audit info, among other reasons. It generally refers to writing messages to a log file. Logging is a crucial part of developing robust and reliable software, including data science applications. Logging errors can help identify issues with the application, which in turn helps to debug and improve it. By logging errors, developers can gain visibility into what went wrong in the application, which can help them diagnose the problem and take corrective action.
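
The error-logging practice described above can be sketched with Python's standard logging module; the log file name and the `safe_divide` function below are illustrative assumptions, not a prescribed setup.

```python
import logging

# Write timestamped messages with severity levels to a log file;
# the file name is an arbitrary example.
logging.basicConfig(
    filename="pipeline.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logger = logging.getLogger(__name__)


def safe_divide(numerator, denominator):
    """Divide two numbers, logging any failure instead of crashing."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        logger.error("Division by zero: %s / %s", numerator, denominator)
        return None
```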

Logging also enables developers to track the performance of the application over time, allowing them to identify potential bottlenecks and areas for improvement. This can be particularly important for data science applications that may be dealing with large datasets or complex algorithms.

Overall, logging is an essential practice for developing and maintaining high-quality data science applications.

Writing Low-Overhead Code

When it comes to data science applications, performance is often a key consideration. To ensure that the application is fast and responsive, it's important to write code that is optimized for speed and efficiency.

One way to achieve this is by writing low-overhead code. Low-overhead code is code that uses minimal resources and has a low computational cost. This can help to improve the performance of the application, particularly when dealing with large datasets or complex algorithms.

Writing low-overhead code requires careful consideration of the algorithms and data structures used in the application, as well as attention to detail when it comes to memory usage and processing efficiency. Thought should be given to the system needs and overall architecture and design of a system up front to avoid major design changes down the road.

Additionally, well-designed low-overhead code tends to be easier to maintain, requiring less frequent reviews and updates. This is important as it reduces the cost of maintaining systems and allows development effort to be focused on improvements or new solutions.

Overall, writing low-overhead code is an important practice for data scientists looking to develop fast and responsive applications that can handle large datasets and complex analyses while keeping maintenance costs low.

Data Validation

Data validation is the process of checking that the input data meets certain requirements or standards. Data validation is another important practice in data science as it can help to identify errors or inconsistencies in the data before they impact the analysis or modeling process.

Data validation can take many forms, from checking that the data is in the correct format to verifying that it falls within expected ranges or values. Different types of data validation checks exist, such as type, format, correctness, consistency, and uniqueness. By validating data, data scientists can ensure that their analyses are based on accurate and reliable data, which can improve the accuracy and credibility of their results.
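
As a minimal sketch of such checks, assuming a hypothetical record schema with `age` and `province` fields, type, range, and format validations might look like this:

```python
def validate_record(record):
    """Return a list of validation errors for one input record."""
    errors = []
    # Type check: age must be an integer
    if not isinstance(record.get("age"), int):
        errors.append("age must be an integer")
    # Range check: age must fall within a plausible interval
    elif not 0 <= record["age"] <= 120:
        errors.append("age out of expected range")
    # Format check: province code must be two letters
    province = record.get("province", "")
    if len(province) != 2 or not province.isalpha():
        errors.append("province must be a two-letter code")
    return errors
```

Records with a non-empty error list can then be quarantined or corrected before they reach the analysis or modeling stage.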

Continuous Integration and Continuous Deployment (CI/CD)

Continuous Integration and Continuous Deployment (CI/CD) is a set of best practices for automating the process of building, testing, and deploying software. CI/CD can help to improve the quality and reliability of data science applications by ensuring that changes are tested thoroughly and deployed quickly and reliably.

CI/CD involves automating the process of building, testing, and deploying software, often using tools and platforms such as Jenkins, GitLab, or GitHub Actions. By automating these processes, developers can ensure that the application is built and tested consistently, and that errors or issues are identified and addressed quickly before problematic code is deployed.
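
As an illustration, a minimal CI workflow in GitHub Actions syntax might look like the following; the file path (.github/workflows/ci.yml), Python version, and step details are assumptions for the sketch, not a prescribed configuration.

```yaml
# Hypothetical .github/workflows/ci.yml: run the test suite on every
# push and pull request.
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python -m unittest discover
```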

CI/CD can also help to improve collaboration among team members, by ensuring that changes are integrated and tested as soon as they are made, rather than waiting for a periodic release cycle.

Figure 3: CI/CD

The image illustrates a repeating process represented by an infinity symbol divided into eight parts. Starting from the centre and moving counterclockwise around the left loop, the first four parts are: plan, code, build, and continuous testing. Returning to the centre and moving clockwise around the right loop, the remaining parts are: release, deploy, operate, and monitor, before the cycle returns to plan.

Overall, CI/CD is an important practice for data scientists looking to develop and deploy high-quality data science applications quickly and reliably.

Conclusion

In summary, production-level code is critical for data science projects and applications. Proper programming practices and software engineering principles such as adhering to PEP standards, using type hints, writing clear docstrings, version control, testing code, logging errors, validating data, writing low-overhead code, implementing continuous integration and continuous deployment (CI/CD), and ensuring reproducibility are essential for creating robust, maintainable, and scalable applications.

Not following these practices can result in difficulties such as a lack of documentation, no error logging, no meaningful error messages for users, highly coupled code, code overfitted to a single example, missing features desired by clients, and failure to provide feedback upon failure. These issues can severely impact production readiness and frustrate users. User frustration, in turn, reduces productivity and has negative downstream impacts on an organization's ability to effectively deliver its mandate.

The most practical tip for implementing production-level code is to work together, assign clear responsibilities and deadlines, and understand the importance of each of these concepts. By doing so, it becomes much easier to apply these practices in projects and to create maintainable and scalable applications.

Meet the Data Scientist

Register for the Data Science Network's Meet the Data Scientist Presentation

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Register for the Meet the Data Scientist event. We hope to see you there!

MS Teams – link will be provided to the registrants by email

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.

Introduction to Cryptographic Techniques: Trusted Execution Environment

Hardware-based protection of data in use that can be applied anywhere

By: Betty Ann Bryanton, Canada Revenue Agency

Introduction

The increasing popularity of connected devices and the prevalence of technologies, such as cloud, mobile computing, and the Internet of Things (IoT), has strained existing security capabilities and exposed "gaps in data security" (Lowans, 2020). Organizations that handle Personally Identifiable Information (PII) must "mitigate threats that target the confidentiality and integrity of either the application, or the data in system memory" (The Confidential Computing Consortium, 2021).

As a result, Gartner predicts, "by 2025, 50% of large organizations will adopt privacy-enhancing computation (PEC)Footnote1 for processing data in untrusted environmentsFootnote2 and multiparty data analytics use cases" (Gartner, 2020). Of the several PEC techniques, Trusted Execution Environment is the only one that relies on hardware to accomplish its privacy-enhancing goal.

What is a Trusted Execution Environment?

A Trusted Execution Environment (TEE), or Secure Enclave as it is sometimes known, is an environment built with special hardware modules that allows for a secure area inside the device. This isolated environment runs in parallel with the operating system (OS). Input is passed into the TEE and computation is performed within the TEE (the 'secure world'), thereby protected from the rest of the untrusted system (the 'normal world'). These secure and isolated environments protect content confidentiality and integrity, preventing unauthorizedFootnote3 access to, or modification of, applications and data while in use.

The term 'confidential computing' is often used synonymously with TEE; they are related, but distinct. As per the Confidential Computing Consortium (CCC),Footnote4 confidential computing is enabled by the TEE; further, the confidential computing provided by a hardware-based TEE is independent of location (whether in the cloud, on a user's device, etc.), of processor (a regular processor or a separate one), and of isolation technique (e.g., whether encryption is used).

Why is hardware necessary?

"Security is only as strong as the layers below it, since security in any layer of the compute stack could potentially be circumvented by a breach at an underlying layer" (The Confidential Computing Consortium, 2021). Moving security down to the silicon level reduces the potential for compromise, since it minimizes dependencies higher in the stack (e.g., on the OS, peripherals, and their administrators and vendors).

Why is it important?

Using a TEE allows a massive range of functionality to be provided to the user, while still meeting the requirements of privacy and confidentiality, without risking data when it is decrypted during processing. This allows users to secure intellectual property and ensure that PII is inaccessible. This protects against insider threats, attackers running malicious code, or unknown cloud providers. As such, TEEs represent a crucial layer in a layered security approach (aka defense-in-depth) and "have the potential to significantly boost the security of systems" (Lindell, 2020).

Uses

A TEE "can be applied anywhere including public cloud servers, on-premises servers, gateways, IoT devices, EdgeFootnote5 deployments, user devices, etc." (The Confidential Computing Consortium, 2021).

As per Confidential Computing: Hardware-Based Trusted Execution for Applications and Data, below is a summary of possible use cases for a TEE.

  • Keys, secrets, credentials, tokens: These high-value assets are the 'keys to the kingdom.' Historically, the storage and processing of these assets required an on-premises hardware security module (HSM), but within TEEs, applications to manage these assets can provide security comparable to a traditional HSM.
  • Multi-party computing: TEEs allow organizations, such as those offering financial services or healthcare, to take advantage of shared data (e.g., federated analytics) without compromising the data sources.
  • Mobile, personal computing, and IoT devices: Device manufacturers or application developers include TEEs to provide assurances that personal data is not observable during sharing or processing.
  • Point of sale devices / payment processing: To protect user-entered information, such as a PIN, the input from the number pad is only readable by code within the device's hardware-based TEE, thereby ensuring it cannot be read or attacked by malicious software that may exist on the device.

Benefits

  • Controlled environment: Since the TEE runs on specialized hardware, it is controlled, and it prevents eavesdropping while encrypted data is decrypted.
  • Privacy: It is possible to encrypt PII in a database; however, to process the data, it must be decrypted, at which point it is vulnerable to any attacker and to insider threats. If the data is only ever decrypted and processed inside the TEE, it is isolated from unauthorized users, thereby safeguarding data privacy.
  • Speed: Since the TEE is a secure enclave already, code or data may exist in unencrypted form in the TEE. If so, "this allows execution within the TEE to be much faster than execution tied to complex cryptography" (Choi & Butler, 2019).
  • Trust: Since the data in the TEE is not obfuscated (as in some of the other PEC techniques), this provides a comfort level that the computation and its results are correct, i.e., not having errors introduced by the obfuscation techniques.
  • Separation of concerns: As there are two distinct environments, there is a separation of workload and data administered and owned by the 'normal world' versus workload and data isolated in the 'secure world.' This protects against insider threats and potentially corrupt workloads running on the same device.
  • Decryption: If the data is encrypted in the TEE, it must be decrypted for processing; however, that decryption benefits by being contained within a tightly controlled space.

Challenges

  • Implementation: Implementation is challenging and requires customized knowledge and expertise, whether building the entire secure OS from scratch, employing a trusted OS from a commercial vendor, or implementing emerging components such as Software Development Kits (SDKs), libraries, or utilities.
  • Lack of standardization: Not all TEEs offer the same security guarantees or the same requirements for integration with existing and new code.
  • Design specification: It is the TEE developer's responsibility to ensure secure TEE design. Mere existence of a TEE is not enough.
  • Lock-in: There is potential for lock-in and dependencies with hardware vendors, TEE developers, or proprietary processing (due to lack of standardization).
  • Not bulletproof: There is the possibility of side-channel attacksFootnote6, vulnerable application code, or hardware-based security vulnerabilities, e.g., in the hardware chip, any of which can cause the whole security model to collapse.
  • Performance and cost: In comparison to setup and processing in a 'normal world', using a TEE ('secure world') negatively impacts performance and will cost more.

What's possible now?

TEEs are provided by solutions such as Intel's Software Guard eXtensions (SGX) or ARM's TrustZone; via hardware vendor Software Development Kits (SDKs); or with abstraction layers (e.g., Google's Asylo) that eliminate the requirement to code explicitly for a TEE.

Many cloud vendors (e.g., Alibaba, Microsoft, IBM, and Oracle) are now providing TEE capabilities as a dedicated low-level service aligned with their computation offerings. However, due to lack of standardization, the specifications offered by cloud vendors should be closely examined to ensure they meet the organization's desired privacy and security requirements (Fritsch, Bartley, & Ni, 2020).

What's next?

While protecting sensitive data poses significant architecture, governance, and technology challenges, using a TEE may provide a starting point for an alternative means of enhancing security from the lowest level.

However, a TEE is not plug-and-play; it is a technically challenging mechanism that "should be reserved for the highest-risk use cases" (Lowans, 2020). Nonetheless, "it is certainly harder to steal secrets from inside [a secured TEE than from the unsecured 'normal world']. It makes the attacker's job harder, and that is always a good thing" (Lindell, 2020).

Related Topics

Homomorphic Encryption, Secure Multiparty Computation, differential privacy, data anonymization, Trusted Platform Module.

References

2021 Public Consultation on Gender and Sexual Diversity Statistical Metadata Standards - What We Heard Report

Introduction

For this consultation, members of the Canadian public and international partners were invited to review and provide feedback on Statistics Canada's gender and sexual diversity statistical metadata standardsFootnote 1. Specifically, Statistics Canada was seeking feedback on proposed updates to the standard for gender of person and new standards for sexual orientation and LGBTQ2+Footnote 2 status. Statistical standards for gender and sexual diversity (such as the definition of each concept and the classification which establishes its categories) allow for the reporting of statistically diverse groups of the population in a consistent manner. This report summarizes the feedback received from the consultation. For more information on statistical standards as well as the additional engagement activities that took place to inform standards on gender of person, sexual orientation of person, and LGBTQ2+ status of person, see the Consulting Canadians landing page and the StatCan Plus article.

Gender

Statistics Canada initially released new sex at birth and gender variables and classifications on April 13, 2018. Prior to the 2021 Census – in which the gender question was asked for the first time, and the 'at birth' precision was added to 'sex' – Statistics Canada reviewed the gender standard to ensure its relevance. Other engagement activities, including a targeted expert consultation, supplemented this public consultation in order to update the gender standard.

The updated sex at birth and gender standards were released on October 1, 2021. Among other changes, the definition of gender, the usage sections and the comparison to relevant internationally recognized standards were expanded. In addition, some category names and definitions in the classifications were updated.

Sexual orientation and LGBTQ2+ population

Statistics Canada has been collecting information about sexual orientation since 2003. The variable 'sexual orientation of person' used in the consultation included proposed classifications for the main components of sexual orientation - sexual identity, sexual attraction, and sexual behaviour - which could be measured separately.

The new sex at birth and gender standards have allowed for a more nuanced understanding of sexual orientation and the ability to collect data on the full LGBTQ2+ population. The creation of standards on sexual orientation and LGBTQ2+ status of person will establish a framework to address information gaps on sexual and gender diversity in Canada.

Consultation overview

The purpose of the consultation was to ask data producers and users; representatives of civil society organizations; government bodies at the federal, provincial and local levels; academics and researchers; and all other interested parties, including the general public, to submit feedback regarding the proposed updates to the standard for gender and the new standards for sexual orientation and LGBTQ2+ status.

The consultation was conducted electronically and publicized through public announcements that described the proposed updates to the standard for gender of person, and proposed new standards for sexual orientation of person and LGBTQ2+ status of person. The announcements also listed the types of inputs sought, provided a timeline for the consultation and gave contact information for interested parties to make submissions and contact Statistics Canada with questions and comments.

Announcements were disseminated through the Statistics Canada's website and social media. In addition, stakeholders and partners, including civil society organizations, as well as a number of researchers in the field of gender and sexual diversity and gender studies, were invited by email to participate and encouraged to share the consultation invitation with others within their network.

Interested parties were invited to submit written proposals to Statistics Canada. The official consultation period started on February 2, 2021 and closed on March 12, 2021. In addition to the public consultation, virtual meetings were organized with key stakeholders and researchers to gather their feedback.

Summary of submissions

Statistics Canada received 205 responses by email in both official languages from a range of individuals and organizations:

  • 19 responses from academics or research groups;
  • 31 responses from organizations, such as civil society organizations and government departments or agencies at the federal, provincial or territorial level in Canada and overseas;
  • 155 responses from the general public.

The consultation also included a number of follow up discussions with academics and subject matter experts.

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. As such, the identity of organizations, individuals and academics who participated in the consultation process are kept confidential.

Summary of feedback on the updated gender standard

Definition – Gender

The consultation materials presented an updated definition of gender. In this update, gender was defined as "a person's social or personal identity as a man, woman or non-binary person (a person who is not exclusively male or female)." This definition included the following concepts:

  • gender identity (felt gender), which is the gender that a person feels internally, and;
  • gender expression (lived gender), which is the gender a person expresses publicly in their daily life, including at work, at home or in the broader community.

The proposed definition stated that a person's current gender may differ from the sex they were assigned at birth (male or female) and that a person's gender may change over time.

Some of the most consistent feedback received regarding the English version of the gender definition was related to the incongruent use of the biological terms 'male' and 'female'. Respondents also commented that the gender standard conflated sex and gender. To this end, a number of suggestions were received regarding terminology. These included suggestions for the use of 'male', 'female' (or 'intersex') when referring to the biological characteristics of sex, and 'men/boy', 'women/girl', 'transgender', 'cisgender' and 'non-binary' when referring to gender identities. Suggestions were received for the use of gender terminology in the non-binary definition, replacing 'male' or 'female' with 'man' or 'woman'.

Input was received providing suggested modifications to the definitions of gender identity and expression. Feedback was also received on the sex at birth variable which reflected differing perspectives. Some respondents suggested that more emphasis be put on sex assigned at birth, while others suggested that sex is not assigned at birth, but rather observed and reported, and recommended using other terminology.

Usage – Gender

The proposed usage section included the following explanation, among other content:

The variable 'Gender of person' and the 'Classification of gender' are expected to be used by default in most social statistics programs at Statistics Canada. The variable 'Sex of person' and the 'Classification of sex' are to be used in conjunction with the variable 'Gender of person' and the 'Classification of gender', where information on sex at birth is needed.

While comments specifically referencing the gender usage section were limited, some overarching feedback was received that expressed disagreement with the overall concept of gender identity and communicated concerns about self-identification into protected groups. Some respondents were not supportive of the introduction of the gender variable by default at Statistics Canada and argued that collecting data on gender, rather than sex, could disrupt the historical comparability of the data and result in a loss of informationFootnote 3.

Classification – gender

The proposed Classification of gender contained three categories: male gender; female gender; and non-binary gender. The 'non-binary' gender category of the classification is intended to capture relevant write-in responses to the gender question where a respondent indicates being neither exclusively 'man' nor 'woman'.

A few respondents suggested that the classification contain additional categories, such as a Two-Spirit category, with the recommendation that the response option only be available to Indigenous respondents when asked on surveys.

Feedback received regarding the English version of the classifications of gender and transgender status was similar to the comments mentioned above regarding the use of the biological terms 'male' and 'female' in the proposed definitions, with 'man' and 'woman' as suggested replacements.

Comments regarding the reference to 'current gender' (e.g., "This category includes persons whose current gender was reported as male") were received, which suggested the removal of the term 'current'. Similar comments were made regarding the use of the word 'current' in the Classification of transgender status.

Classification – transgender status

Consultations sought input on a classification consisting of the following two broader categories with their respective subcategories (along with their definitions, not presented here):

  • 1. Cisgender person
    • 1.1 Cisgender man
    • 1.2 Cisgender woman
  • 2. Transgender person
    • 2.1 Transgender man
    • 2.2 Transgender woman
    • 2.3 Transgender non-binary person

Respondents suggested the creation of a third, standalone category, 'Non-binary person', rather than being included as a sub-category of 'Transgender person'.

Respondents also provided feedback regarding the terminology. A few commented that the terms 'trans' and 'transgender' are not necessarily interchangeable, while others suggested replacing the term 'cisgender' with 'non-transgender'. A few respondents suggested using the term 'gender modality' as the name of the classification; for example, the Classification of transgender status could be called the Classification of gender modality.

Summary of feedback on sexual orientation

Definition – sexual orientation

In the proposed standard, sexual orientation was presented as a multidimensional concept defined as an umbrella term that includes a person's sexual identity, sexual attraction and sexual behaviour. Sexual identity refers to how a person perceives their sexuality (e.g., lesbian, straight, bisexual), sexual attraction refers to whom a person finds sexually appealing, and sexual behaviour refers to with whom a person engages in sexual activity. A person's sexual orientation may change over time.

Input was supportive of sexual orientation being a multidimensional concept. Some minor changes were suggested for how to define sexual orientation as well as its different components. Feedback was received in favour of using the term 'sexual orientation' rather than 'sexual identity'. It was also suggested that the definition of sexual orientation should include the concept of emotional attraction.

Usage – sexual orientation

Feedback regarding usage mainly consisted of the need for transparency around the rationale for collecting data on sexual orientation, ensuring data are only collected as needed. While the consultation did not specifically focus on the minimum age for responding to the sexual orientation question, a few organizations and academics provided input on the proposed minimum age of 15. They noted the value of having data on youth who are LGB+ (lesbian, gay, bisexual or of a sexual orientation other than heterosexual), and felt that a rationale should be provided for requiring a minimum age for asking questions about sexual orientation. Similarly, it was suggested that the minimum age to collect data on sexual orientation should be lower than 15 or that a minimum age may not be needed at all.

Classification - sexual identity

Consultations sought input on the Classification of sexual identity which included proposed categories along with their definitions. The classification included: 'heterosexual or straight'; 'gay or lesbian'; 'bisexual'; 'pansexual'; 'asexual'; 'queer'; and 'Two-Spirit'. Some respondents suggested including more categories, while others thought the classification should include fewer categories.

It was also pointed out that some of the proposed categories were not mutually exclusive and that this should be addressed (e.g., a person could be both Two-Spirit and bisexual, or asexual and gay, or queer and lesbian). The 'queer' category, in addition to not being mutually exclusive, saw some support, but some respondents suggested avoiding the term because of its loaded political history and potential derogatory interpretation. Input was also received that the 'Two-Spirit' category was a distinct concept requiring a separate measure only made available to Indigenous respondents.

In addition, feedback was received suggesting that the proposed definitions of different sexual orientations conflated sex and gender by referring to attraction based on sex and/or gender. Some respondents felt that the definition of sexual orientation should solely be based on sex. Other input suggested that sexual orientation definitions include being attracted to a person's gender expression, along with their sex and gender.

Feedback from different types of respondents (i.e., individuals, academics, and organizations) recommended combining the 'bisexual' and 'pansexual' categories, as these terms may overlap and be used interchangeably, making the two categories not mutually exclusive. It was pointed out in some comments that responses may be influenced by whether a person conceptualizes sex/gender as binary or not.

The proposed Classification of sexual identity also included higher levels of aggregation, including category groupings 'heterosexual or straight' and 'minority sexual identity'. Some of the most consistent feedback was that the 'minority sexual identity' category carried a negative connotation and that it was inappropriate. Other feedback suggested that different sexual identities should not be aggregated together.

Summary of feedback on sexual attraction

Classification – sexual attraction

The proposed Classification of sexual attraction was presented in two versions, each including a number of categories for respondents to identify their sexual attraction. One version measured attraction in reference to the respondent's own gender, without specifying the gender or genders of the persons they are attracted to (e.g., 'person only attracted to persons of a different gender'). The other version specified the gender or genders of the persons to which the respondent is attracted (e.g., 'person only attracted to persons of male gender'). Each version also included categories for people who are 'equally attracted' to more than one gender, as well as for people who do not experience sexual attraction or who are unsure of their sexual attraction.

While feedback on which version was preferable was very limited, one of the key issues identified in responses was that both versions included too much detail or that they were too complicated. Others argued that sexual attraction should be defined on the basis of sex rather than gender. While both versions included a category for people who do not experience sexual attraction, some comments suggested that the classification should be more inclusive of people with little or no sexual attraction (i.e., people on the asexual spectrum). It was also suggested to re-name the 'unsure' category to 'questioning'.

Summary of feedback on sexual behaviour

Classification – sexual behaviour

Overall, this classification elicited stronger reactions than the other classifications. Some feedback indicated understanding of the need to refer to the concept of sex rather than gender in the context of sexual behaviour. However, many comments expressed surprise or confusion that sex rather than gender terminology was used in the proposed sexual behaviour classification, which differed from the proposed sexual identity and sexual attraction classifications, both of which used gender terminology. Some suggested that the purpose of the Classification of sexual behaviour was unclear, and proposed that the classification provide basic standard definitions of 'sexual activity'. Other input suggested shifting the focus away from the sex of sexual partners towards specific acts.

A significant amount of feedback from different sources (i.e., organizations, individuals and academics) noted that intersex people were only included as partners in the Classification of sexual behaviour and that there was not a specific category for intersex respondents. Some input also indicated that no definition of intersex was provided.

Some respondents suggested that the number of sexual partners should be included within the sexual behaviour dimension. Finally, it was recommended not to refer to 'men who have sex with men' in the classification as the term may have a negative connotation to some people.

Summary of feedback on LGBTQ2+ status

Definition – LGBTQ2+

Statistics Canada is committed to supporting disaggregated data analysis in order to highlight the experiences of specific segments of the population. Recognizing that sample size may be an issue for small populations, the consultation proposed an aggregate LGBTQ2+ standard to establish a consistent approach to combining data on gender identity and sexual orientation. Input was sought on the proposed definition of LGBTQ2+ status as well as the choice of acronym. The proposed definition was that "LGBTQ2+ status refers to whether or not a person is lesbian, gay, bisexual, transgender, queer, Two-Spirit, or another non-binary gender or minority sexual identity." Feedback received focused on the acronym rather than the proposed definition. The majority of feedback proposed moving the '2' referring to Two-Spirit people to the beginning of the acronym to acknowledge Indigenous people in the context of reconciliation.

Classification – LGBTQ2+

Feedback was sought on the proposed Classification of LGBTQ2+ status as two distinct categories (i.e., 'LGBTQ2+ person' and 'non-LGBTQ2+ person', that is, 'heterosexual and cisgender person'), as well as their definitions. Some feedback argued against aggregating diverse populations under one umbrella category, as these groups have different experiences and are not homogenous in their characteristics. However, others indicated that this approach was a useful way to analyze complex issues experienced by the LGBTQ2+ population as a whole.

Next steps

Statistics Canada has completed the review process for the updated gender standard and the new sexual orientation standard. The updated gender standard was released on October 1, 2021. All of the comments received during this consultation and other engagement activities were taken into account, and many are reflected in this updated standard.

The new sexual orientation standard was released on August 16, 2023. The public consultation summarized in this What We Heard report was one of four phases that informed the development of the sexual orientation standard. In addition to the public consultation, Statistics Canada undertook a targeted expert consultation, focus groups, and a testing phase which consisted of one-on-one interviews.

The focus groups and testing were conducted in English and French and engaged diverse participants from urban and rural communities in different regions across the country. Participants included LGBTQ2+ and non-LGBTQ2+ individuals from a range of ages, genders and socio-economic status groups. Focus groups and testing also engaged Indigenous Two-Spirit participants, as well as immigrant and racialized participants.


Participate in the consultation for the update of the Canadian Research and Development Classification (CRDC) 2020 V1.0

Opened: August 2023
Closed: October 2023
Results posted: March 2024

Introduction

The Social Sciences and Humanities Research Council of Canada (SSHRC), the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation (CFI), the Canadian Institutes of Health Research (CIHR), and Statistics Canada collaboratively developed and released the new Canadian Research and Development Classification (CRDC) 2020 Version 1.0 in October 2020. This shared standard classification is available for use by the federal research granting agencies, Statistics Canada, and any other organization or individual that finds it useful to implement. The CRDC is aligned with international research and development classification standards.

Statistics Canada, as custodian of the CRDC, and its close partner research funding agencies have agreed to undertake minor revisions of the classification every year or two, and major revisions every five years. All parties agreed that CRDC 2020 Version 1.0 would be revised within two years of its first release date, and on a five-year cycle after that, with the possibility of 'evergreening' for minor changes once a year to reflect changes in the research fields. We now have the opportunity to revise the CRDC 2020 V1.0, which has been in use for more than two years.

This consultation targeted only the Field of research (FOR) component of the CRDC 2020 V1.0.

Consultative engagement objectives

This consultation aimed to gather feedback from users who have already implemented the classification, as well as other interested parties who might want to suggest updates or changes, but not significant conceptual or structural ones (which are reserved for the 5-year revision cycle).

Federal research funding agencies, Statistics Canada's statistical programs related to R&D data, members of the research community and their partners, and Canadians who feel the need for the CRDC 2020 V1.0 to be revised at this time are invited to provide feedback for the revision of the Field of research (FOR) of the CRDC 2020 V1.0.

The feedback will be analyzed, and recommendations for changes or revisions to the CRDC will be made, following 2 key steps:

  1. Collection of feedback and data to assess classification revision needs and gaps
    1. Launch of a consultation process that will capture the needs and gaps of the CRDC 2020 V1.0 - FOR as perceived mainly by the federal research granting agencies, Statistics Canada and the research community
    2. Analysis of data collected at the research funding agencies to identify any missing fields of research
  2. Review of CRDC 2020 V1.0 - FOR and validation of proposed changes
    1. Review of feedback and analysis to inform any possible revisions
    2. Validation of proposed revisions with field expertise

Closing date

This consultation is closed.

Results of the consultative engagement

Statistics Canada received feedback from a variety of respondents, including members of the research community and organizations, and we want to thank participants for their contributions to this consultative engagement initiative. Their feedback has helped guide the revision of the CRDC 2020 V1.0.

We invite you to read the report on the Revision of the Canadian Research and Development Classification (CRDC) 2020 Version 1.0.

How to provide feedback during the consultation

Proposals for the revision of the Field of research (FOR) of the CRDC 2020 V1.0 must contain the contact information of those submitting the change request:

  • Full Name
  • Organization (when an individual is proposing changes on behalf of an organization)
  • Mailing address
  • Email address
  • Phone number

Should additional information or clarification of the proposal be required, participants may be contacted.

Proposals must be submitted by email to statcan.crdc-ccrd.statcan@statcan.gc.ca

Consultation guidelines

Individuals or organizations are encouraged to follow the guidelines below when developing their proposals.

Proposals should:

  • be based on the CRDC 2020 V1.0 - FOR; reading it before submitting changes is therefore important;
  • clearly identify the proposed addition or change to the Field of research (FOR) of the CRDC 2020 Version 1.0; this can include the creation of entirely new classification items related to the classes and subclasses or modifications to existing classification items within the classes and subclasses. This consultation will not result in the modification of higher-level classifications items (divisions and groups);
  • outline the rationale and include supporting information for the proposed change;
  • when possible, describe the empirical significance (i.e., expenses, value-added or GDP, number of researchers, etc.) of proposed changes, and especially real structural changes (resulting in a change in the scope of a current classification item);
  • be consistent with classification principles (e.g., mutual exclusivity, exhaustiveness and homogeneity within categories);
  • be relevant, that is, proposals should:
    • describe the present analytical interest;
    • define how the change would enhance the usefulness of data;
    • be based on appropriate statistical research or subject matter expertise.

Please consider the questions below when preparing your input for the consultation on the revision of CRDC 2020 V1.0-FOR:

  • Are there research and development (R&D) services or activities for which you cannot find a satisfactory CRDC code?
  • Are there R&D activities or services that you find difficult to place in CRDC 2020 V1.0?
  • Are any R&D activities or services missing?
  • Are there R&D or combinations of R&D that have significant economic value and analytical interest that you would like to see with a specific or separate CRDC classification item (classes and subclasses)?
  • Are there classification items you find difficult to use because their descriptions are vague or unclear?
  • Are there pairs of classification items you find difficult to distinguish from each other? Are there boundaries that could be clarified?
  • Are there R&D activities or services that you are able to locate in CRDC 2020 V1.0, but that you would like to see located in a different classification item or level of R&D activities? Please clearly indicate why.
  • Is the language or terminology used in CRDC 2020 V1.0 in need of updating to be consistent with current usage in the research field?

Note that submissions do not need to cover every topic; you can submit your comments or proposals on your specific area(s) of concern only.

The following criteria can be used to review the proposals received. Proposals should:

  • be consistent with classification principles such as mutual exclusivity, exhaustiveness, and homogeneity of R&D activities or services within categories, with no overlap, to avoid double counting;
  • have empirical significance as an R&D activity or service (e.g., expenditures in the government and private sectors, number of researchers involved);
  • relate to collectable and publishable data;
  • be relevant, that is, be of analytical interest, result in data useful to users, and be based on appropriate statistical research, subject-matter expertise, and administrative relevance;
  • be consistent with the Canadian System of National Accounts to some extent (for statistical purposes).

In addition, special attention could be given to specific R&D activities or services, including:

  • new or emerging R&D activities or services;
  • R&D related to new or advanced technologies;
  • any field of research that may be missing from the current version of the classification.

Treatment of proposals

Statistics Canada will review all proposals received in collaboration with the research funding agencies. Statistics Canada reserves the right to use independent parties or other government employees, if deemed necessary, to assess proposals.

The federal research granting agencies and Statistics Canada will consider feedback received from this consultation to finalize the revision of the Canadian Research and Development Classification (CRDC) 2020 V1.0 - FOR, which will be published in early 2024 as either CRDC 2020 V1.1 or CRDC 2020 Version 2.0, depending on the extent of the approved changes.

If deemed appropriate, a representative of Statistics Canada or the research funding agencies will contact respondents (including through virtual or in-person meetings) to ask additional questions or seek clarification on a particular aspect of their proposal.

A report summarizing the findings of this consultation will be published on the Statistics Canada website later in 2024.

Please note that a proposal will not necessarily result in a change to the CRDC 2020 V1.0.

Official languages

Proposals may be written in either of Canada's official languages - English or French.

Confidentiality

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Note of appreciation

We thank all respondents in advance for their interest and participation in this consultation on the revision of the Canadian Research and Development Classification (CRDC) 2020 Version 1.0 - FOR. Your contributions are valuable to us.