Eh Sayers Episode 17 - It's 8pm... Do You Know What Your Kids Are Googling?

Release date: April 12, 2024

Catalogue number: 45200003
ISSN: 2816-2250

It's 8pm... Do You Know What Your Kids Are Googling?


StatCan released new analysis into the online culture our kids are growing up in, and it’s far from the best of all possible worlds: misinformation, bullying, violence… and worse.

Analyst Rachel Tsitomeneas joins us to dive into the findings.

Host

Tegan Bridge

Guest

Rachel Tsitomeneas


Eh Sayers Episode 17 - It's 8pm... Do You Know What Your Kids Are Googling? - Transcript

Tegan: Welcome to Eh Sayers, a podcast from Statistics Canada, where we meet the people behind the data and explore the stories behind the numbers. I'm your host, Tegan Bridge.

Listen, I'm not going to lie. I'm really glad that the Internet was a different place when I was a kid. I remember using online message boards: I was part of a community that talked about video games when I was a pre-teen. That was a kind of proto-social media, I suppose, but we didn't have smartphones. My first cell phone was a flip phone that lived in my backpack and was usually dead because it cost 15 cents to send one text message.

The Internet is not the same now as it used to be, and, I think it's an important point, even two people using the Internet at the same time, they're experiencing two different Internets: the sites you choose to visit, but even the same site might display different information to two different users: algorithms guessing what you want to see, what you might want to click on, what you might want to buy. Your Internet is not my Internet, and the Internet of today is definitely not the Internet of yesteryear.

StatCan recently released new analysis on the online environment and cyberaggression among young people. Joining us in the studio is that article's author.

Rachel: My name is Rachel Tsitomeneas and I am an analyst with Statistics Canada in the Centre for Social Data Insights and Innovation.

Tegan: Are young people being exposed to more, shall we say, concerning content online compared to the average user?

Rachel: So, based on data from the 2022 Canadian Internet Use Survey, more than 8 in 10 Canadians aged 15 to 24 saw information online in the year prior to the survey that they suspected to be false, which is considerably higher than the national average of 70%.

And misinformation can be as simple as giving somebody a wrong time or a date for a party, or it can go as far as becoming disinformation where it's intentionally weaponized and intends to mislead people and purposefully misstates facts.

And young people are seeing this more often, but they're also not as concerned about it as the rest of the population, which is quite interesting.

Tegan: If you're interested in misinformation, we did an entire episode about that called "A little less misinformation a little more true facts, please," but it isn't just misinformation that young people are seeing online.

Rachel: In the numbers that I have seen and in the research that I have recently done, I have found that young people are, in fact, seeing a lot more information and content that may incite hate or violence online than the rest of the population.

Tegan: This type of content might be things like terrorist content or violence towards ethnic groups. And young Canadians were more likely than any other age group to see this content online: 71% compared to the national average of 49%.

People bring their whole selves online for better or worse, the good and the bad. What kind of aggressive behaviours do we see online?

Rachel: Aggressive behavior online definitely exists on a continuum: it starts with something as simple as bullying, name calling, teasing, and it can escalate all the way to a hate crime, which can be directed at individuals or groups of people. It's a criminal violation, and it's motivated by hate. It's based on race, language, colour, religion, sex, et cetera. And it can be taken that far online. So there's this whole big continuum, and youth, especially, are exposed to this because they're online so much more often than the rest of the population or older generations.

Tegan: So who's being targeted for bullying and who's being targeted for hate crimes? Is that the same population?

Rachel: It definitely is the same population, and what we found in the data, especially from the Uniform Crime Reporting Survey, is that young people are definitely the most likely to be victims of online hate crime. And they're also the most likely to be the perpetrators of online hate crimes. The median age of victims of cyber related hate crimes was only 32 years old and the median age of cyber related hate crime perpetrators was only 27 years old.

There are some demographic differences in the types of victimization that people are experiencing. So, young women are often the most likely to be bullied online in a sexualized way, whereas young people in general are going to be the victims and the perpetrators of hate crimes online.

Tegan: Young people, then, are the victims of these online crimes, but they're also the ones we think are committing the crimes?

Rachel: Yeah, a lot of young people are being charged with or suspected of committing cyber related hate crimes. A large chunk of people that were charged between 2018 and 2022 were younger people, you know, even between the ages of 12 and 17. So children were being charged with these crimes. But a very stark contrast in the perpetrators of cyber related hate crimes is between males and females. So, you know, between 2018 and 2022 again, 87% of the total people charged with or suspected of committing these types of crimes online were men or boys.

Tegan: What are some of the challenges in studying online interactions? Things like misinformation, bullying and hate crimes.

Rachel: The hard part about studying these online interactions is that it's so new. The internet, relatively speaking, is so incredibly new, and especially social media. So it's hard for us to figure out ways to collect data on these new and evolving, you know, spaces where people interact with each other and with media. So it's really hard for us to try and collect data in a way that's going to help us understand what's going on, and we don't yet use a lot of web scraping and data-science-type collection online. We've relied really heavily on surveys for this type of data collection. The problem with relying on survey data all the time is that it has some limitations, as it's all self-reported data.

Tegan: From the article that I read that was published a few weeks ago, it looked like it was also limited by police-reported interactions.

So, the fact that it's police-reported: if somebody, you know, calls me a mean word that you only use for women, you know, I'm not going to report that to the police, but that was certainly egregious behaviour online.

Rachel: Absolutely. And that's, uh, that's where this continuum comes in again, from bullying to discrimination to hate crimes. People experience these types of things all the time, but when it comes to reporting to the police, there's a lot of limitations that people feel and that people have. They're, they're scared to report. They don't feel that they can trust authorities when they report. And so a lot of these, these instances go unreported. And so the numbers that we have are definitely underrepresenting what's actually happening.

Tegan: Why do these findings matter? Why is this important?

Rachel: These findings matter because we can see who views online hate content, who views misinformation, and the types of people who are the perpetrators and the victims of online hate crimes. And then we can better understand where we should be implementing policy and where we should be trying to help these people and provide resources to them, or to try and, you know, encourage people to actually report when something like this happens to them.

Tegan: The findings also matter for well-being.

Rachel: So, recently, I actually have done some work that related hate crime rates to quality of life indicators. And so, what that research found is that census metropolitan areas with high rates of hate crime were actually associated with lower quality of life indicators, such as self-reported health, self-reported mental health, and knowing your neighbours.

So, I think that that is an interesting avenue that, uh, could definitely be explored more in the future.

Tegan: What's the biggest takeaway for you?

Rachel: Young people are both the victims and the perpetrators of cyber related hate crimes. And I think it really, um, points to the fact that we need to look at this demographic closer, and we need to understand why they are the victims, why they are the perpetrators beyond just that they're on the Internet more.

Tegan: Is there anything that you would have liked to include in this Daily article that you weren't able to for whatever reason?

Rachel: I was going to say, I mean, I would have loved to include all the people that didn't report, but of course they didn't report.

I think that what I would have liked to include, but it just simply doesn't exist, is more information on the types of content that young people are seeing online. So I would have loved to get more into the specifics of what young people thought of as harmful or aggressive content online and really dig deeper into what they thought it was and how they experience it.

Tegan: You've been listening to Eh Sayers. Thank you to Rachel Tsitomeneas for taking the time to speak with us.

For more information on this topic, check out the article published in The Daily on February 27, 2024, called "Online hate and aggression among young people in Canada."

You can subscribe to this show wherever you get your podcasts. There, you can also find the French version of our show, called Hé-coutez bien! If you liked this show, please rate, review, and subscribe. And thanks for listening!

Sources

The Daily - Online hate and aggression among young people in Canada

Invitation to participate in the revision of the Classification of Instructional Programs (CIP) Canada

Opened: April 2024

Introduction

Statistics Canada invites data producers and data users, representatives of educational institutions and professional associations, government bodies at the federal, provincial/territorial, and local levels, educational experts, academics and researchers and all other interested parties to submit proposals for the revision to the Classification of Instructional Programs (CIP) Canada.

Following the decision of Statistics Canada's Social Standards Steering Committee (SSSC) on January 9, 2024, to institute a permanent consultation process for CIP Canada, proposals for changes may be submitted and reviewed on an ongoing basis. Going forward, the only deadline will be a cut-off date for considering proposed changes for inclusion in a new version of CIP Canada. For future revisions of CIP Canada, this cut-off date will be set at about a year and a half before the release date of the new classification version.

In exceptional circumstances, when a consensus is reached among the data producers and users at Statistics Canada, the classification might be revised before the end of the regular 5-year revision cycle, as a way of 'evergreening' the standard.

In the context of statistical classifications, evergreening refers to updating the classification and the related reference (index) file on a continuous basis with the objective of maintaining quality, timeliness and relevance. However, evergreening does not necessarily result in the release of a new version of the classification every year. A decision to release a new version before the end of the regular 5-year revision cycle needs to be discussed and assessed by key classification stewards, considering potential impacts on data and statistical programs.

Objectives

This consultation aims to gather feedback from users who have already implemented the classification, as well as other interested parties who might want to suggest updates or changes. The principal objective of the consultation is to receive input from classification users to determine if the classification remains relevant and reflective of Canadian postsecondary educational programs. This ensures that quantitative and qualitative information on postsecondary educational programs continues to be reliable, timely and relevant for a wide range of audiences.

Background

The Classification of Instructional Programs (CIP) Canada 2021 is the fourth Canadian version of this classification, the others being CIP Canada 2000, 2011 and 2016. The CIP Canada revisions were accomplished through the joint efforts of Statistics Canada and the National Center for Education Statistics (NCES) of the United States Department of Education.

In September 2023, Statistics Canada's Social Standards Steering Committee (SSSC) decided to move from a 10-year to a 5-year CIP revision cycle. The next version, CIP Canada 2027, will align with the new 5-year U.S. CIP 2025.

Nature and content of proposals

Respondents are invited to provide their comments, feedback, and suggestions on how to improve the CIP Canada. They must outline their rationale for proposed changes.

Respondents may propose virtual (not affecting the meaning of a classification item) and real changes (affecting the meaning of a classification item, whether accompanied by changes in naming and/or coding or not). Examples of real changes are: the creation of new classification items, the combination or decomposition of classification items, as well as the elimination of classification items. A classification item (sometimes referred to as a "class") represents a category at a certain level within a statistical classification structure. It defines the content and the borders of the category, and generally contains a code, title, definition/description, as well as exclusions where necessary. For CIP Canada, classification items are: Series (2-digit), Subseries (4-digit) and Class (6-digit).

Key dates for the CIP Canada 2027 revision process

Here are key dates for the CIP Canada 2027 revision process:

  • Official public consultation period for changes proposed for inclusion in the CIP Canada 2027: Ongoing to the end of June 2024. Beyond the CIP Canada 2027, the cut-off date to incorporate approved changes from proposals into the new classification version will be about a year and a half before the release date of the next version of the CIP based on the 5-year revision cycle.
  • Public notice containing proposals under consideration for changes in the CIP Canada 2027: winter of 2024
  • Public notice containing the final approved proposal for changes in the CIP Canada 2027: spring/summer 2025
  • Public release of the CIP Canada 2027 Version 1.0: late 2027/early 2028

The next revised version of CIP Canada will be called CIP Canada 2027 Version 1.0.

Individuals and organizations wishing to submit proposals for changes in CIP Canada may do so at any time, in accordance with the permanent consultation process adopted by Statistics Canada with regard to CIP Canada.

How to provide feedback during the consultation?

Proposals for CIP Canada revisions must contain the contact information of those submitting the change request:

  1. Name
  2. Organization (when an individual is proposing changes on behalf of an organization)
  3. Mailing address
  4. Email address
  5. Phone number

Should additional information or clarification to the proposal be required, participants might be contacted.

Proposals must be submitted by email to: statcan.cip-consultation-cpe-consultation.statcan@statcan.gc.ca.

Consultation guidelines

Individuals or organizations are encouraged to follow the guidelines below when developing their proposals.

Proposals should:

  • clearly identify the proposed addition, change or modification to CIP Canada; this can include creating new classes, or modifying existing classes;
  • outline the rationale and include supporting information for the proposed change, such as:
    • title/name of the proposed new postsecondary educational program
    • curriculum of the proposed new program (the courses or subjects that make up the program)
    • names and number of educational institutions offering the proposed new program;
  • when possible, describe the empirical significance (e.g., field of study analysis, educational forecasting, comparing education and salary outcomes amongst groups) of proposed changes;
  • be consistent with classification principles (e.g., mutual exclusivity, exhaustiveness, and homogeneity within categories);
  • be relevant, that is
    • describe the present analytical interest;
    • enhance the usefulness of data;
    • base the proposal on appropriate statistical research or subject matter expertise.

Please consider the questions below when preparing your input for the consultation on the current version of the CIP Canada:

  • Are there postsecondary educational programs for which you cannot find a satisfactory CIP Canada code?
  • Are there classification items that you find difficult to use because their descriptions are vague or unclear?
  • Are there different postsecondary educational programs you find difficult to distinguish from each other? Are there boundaries that could be clarified?
  • Are there postsecondary educational programs that you think should have their own CIP Canada category? Please indicate at which level and why, with the support documentation about the postsecondary educational program (see guidelines above for a proposal).
  • Are there postsecondary educational programs that you are able to locate in CIP Canada, but you would like to have them located in a different 2-digit series or 4-digit subseries? And why?
  • Does the language or terminology used in CIP Canada need updating to be consistent with current usage?

Note that submissions do not need to cover every topic; you can submit comments on your particular area(s) of concern only.

The following criteria will be used to review the proposals received:

  • consistency with classification principles such as mutual exclusivity, exhaustiveness, and homogeneity of postsecondary educational programs within categories;
  • empirical significance as a field of study;
  • data that can be collected and published;
  • relevance, that is, being of analytical interest, resulting in data useful to users, and being based on appropriate statistical research and subject-matter expertise;
  • special attention will be given to specific postsecondary educational programs, including:
    • new or emerging programs
    • changes related to the scope of existing programs.

CIP Canada Classification Structure

CIP Canada has a three-level hierarchical classification structure, consisting of 2-digit 'series', 4-digit 'subseries', and 6-digit 'instructional program classes'. Changes may be proposed for any level.
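Because each level nests inside the one above it, a 6-digit class code carries its series and subseries with it. The minimal sketch below splits a class code into its three levels, assuming the dotted "XX.XXXX" format used in published CIP code lists (for example, a class 14.0801 would sit in subseries 14.08 within series 14). The function name `cip_hierarchy` and the example code are illustrative only and are not an official Statistics Canada tool.

```python
import re

# Assumption: a 6-digit class code is written as "SS.BBCC", where SS is the
# 2-digit series, SS.BB the 4-digit subseries, and SS.BBCC the class.
CIP_CLASS = re.compile(r"([0-9]{2})\.([0-9]{2})([0-9]{2})")


def cip_hierarchy(code: str) -> dict[str, str]:
    """Return the series, subseries and class codes for a 6-digit CIP class code."""
    match = CIP_CLASS.fullmatch(code)
    if match is None:
        raise ValueError(f"not a 6-digit CIP class code: {code!r}")
    series, sub, cls = match.groups()
    return {
        "series": series,                 # 2-digit level
        "subseries": f"{series}.{sub}",   # 4-digit level
        "class": f"{series}.{sub}{cls}",  # 6-digit level
    }


print(cip_hierarchy("14.0801"))
# {'series': '14', 'subseries': '14.08', 'class': '14.0801'}
```

Matching the whole string, rather than slicing it, rejects codes with missing or extra digits instead of silently returning a truncated hierarchy.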

The Classification of Instructional Programs (CIP) Canada 2021 Version 1.0 is the latest version of the classification for participants of this consultation to base their input on. Persons or organizations proposing a change should always make sure they refer to the latest available version of CIP Canada.

Costs associated with proposals

Statistics Canada will not reimburse respondents for expenses incurred in developing their proposal.

Treatment of proposals

A team of representatives from Statistics Canada will review all proposals received. Statistics Canada reserves the right to use independent consultants, or government employees, if deemed necessary, to assess proposals.

If deemed appropriate, Statistics Canada will contact the respondents to ask additional questions or ask for clarification on a particular aspect of their proposal.

Please note changes will only be implemented during planned revision cycles and that a proposal will not necessarily result in changes to CIP Canada.

Official languages

Proposals may be written in either of Canada's official languages – English or French.

Confidentiality

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Thank You

We thank all participants for their continued interest and participation in the various CIP Canada engagement activities.

Enquiries

If you have any enquiries about this process, please send them to statcan.cip-consultation-cpe-consultation.statcan@statcan.gc.ca.

Invitation to participate in the revision of the National Occupational Classification (NOC)

Opened: April 2024

Introduction

Statistics Canada (StatCan) invites data producers and data users, experts in the field of employment, representatives of business associations, government bodies at the federal, provincial/territorial, and local levels, academics and researchers and all other interested parties to submit proposals for the revision to the National Occupational Classification (NOC).

Following the decision of Statistics Canada's Social Standards Steering Committee (SSSC) on January 9, 2024, to institute a permanent consultation process for the NOC, proposals may be submitted and reviewed on an ongoing basis. Going forward, the only deadline will be a cut-off date for considering proposed changes for inclusion in a new version of the NOC.

As was done with the NOC 2016, in exceptional circumstances, when a consensus is reached among the data producers and users at StatCan and our partners at Employment and Social Development Canada (ESDC), the classification might be revised before the end of the regular 10-year revision cycle or 5-year update cycle, as a way of 'evergreening' the standard.

In the context of statistical classifications, evergreening refers to updating the classification and the related reference (index) file on a continuous basis with the objective of maintaining quality, timeliness, and relevance. However, evergreening does not necessarily result in the release of a new version of the classification every year. A decision to release a new version before the milestone revision/update cycles needs to be discussed and assessed by key classification stewards, considering potential impacts on data and statistical programs.

Objectives

This consultation aims to gather feedback from users who have already implemented the classification, as well as other interested parties who might want to suggest updates or changes. The principal objective of the consultation is to receive input from classification users to determine if the classification remains relevant and reflective of the Canadian labour market. This ensures that quantitative and qualitative information on occupations continues to be reliable, timely and relevant for a wide range of audiences.

Background

The NOC was jointly developed by ESDC and StatCan and has been maintained in partnership since the first edition was published in 1991/1992. Prior to 2011, ESDC's NOC and StatCan's NOC-S differed in their major group structures and, consequently, in their coding systems. The revised NOC 2011 eliminated the differences between the two former systems.

In 2016, the NOC was updated as part of the content update that takes place every 5 years; such updates generally occur in response to labour market changes or to improve clarity, and have no impact on data. Since 2016, ESDC and StatCan have implemented an "evergreen" practice for the NOC, where updates occur on an as-needed basis between the standard update/revision milestones. These "evergreen" updates are meant to be limited to specific situations or cases. For instance, NOC 2016 Version 1.2 revised the classification to account for new job titles created after Canada adopted a law legalizing cannabis for non-medical use, a change with impacts across the Canadian economy and society.

Nature and content of proposals

Respondents are invited to provide their comments, feedback, and suggestions on how to improve the NOC, including a rationale for proposed changes. No restrictions have been placed on the type of change.

Respondents may propose virtual (not affecting the meaning of a classification item) and real changes (affecting the meaning or scope of a classification item, whether accompanied by changes in naming and/or coding or not). Examples of real changes are: the creation of new classification items, the combination or decomposition of classification items, as well as the elimination of classification items. A classification item (sometimes referred to as a "class") represents a category at a certain level within a statistical classification structure. It defines the content and the borders of the category, and generally contains a code, title, definition/description, as well as exclusions where necessary. For the NOC 2021 V1.0, classification items are: Major group (2-digit), Sub-major group (3-digit), Minor group (4-digit) and Unit group (5-digit).

Individuals and organizations wishing to submit proposals for changes in the NOC may do so at any time, in accordance with the permanent consultation process adopted by Statistics Canada with regard to the NOC.

How to provide feedback during the consultation?

Proposals for the NOC revisions must contain the contact information of those submitting the change request:

  1. Name
  2. Organization (when an individual is proposing changes on behalf of an organization)
  3. Mailing address
  4. Email address
  5. Phone number

Should additional information or clarification to the proposal be required, participants might be contacted.

Proposals must be submitted by email to statcan.noc-consultation-cnp-consultation.statcan@statcan.gc.ca

Consultation guidelines

Individuals or organizations are encouraged to follow the guidelines below when developing their proposals.

Proposals should:

  • clearly identify the proposed addition, change or modification to the NOC;
  • outline the rationale and include supporting information for the proposed change, such as:
    • approximate population of workers across the country;
    • duties;
    • requirements for certification (if any);
    • educational background, tools and technology used, as well as experience required for entry into the occupation;
    • current job titles used in the labour market;
  • when possible, describe the empirical significance (e.g., labour market analysis, career intelligence, occupational forecasting, employment equity, job training and skills development) of the proposed change;
  • be consistent with classification principles (e.g., mutual exclusivity, exhaustiveness, and homogeneity within categories);
  • be relevant, that is
    • describe the present analytical interest;
    • enhance the usefulness of data;
    • base the proposal on appropriate statistical research or subject matter expertise.

Please consider the questions below when preparing your input for the consultation on the current version of the NOC:

  • Are there occupations for which you cannot find a satisfactory NOC code?
  • Are there classification items that you find difficult to use because their descriptions are vague or unclear?
  • Are there different occupations you find difficult to distinguish from each other? Are there boundaries that could be clarified?
  • Are there occupations that you think should have their own NOC category? Please indicate at which level and why, with the support documentation about the occupation or occupational grouping (see guidelines above for a proposal).
  • Are there occupations that you are able to locate in the NOC, but you would like to have them located in a different broad occupational category or TEER? And why?
  • Is the language or terminology used in the NOC in need of updating to be consistent with current usage?

Note that submissions do not need to cover every topic; you can submit comments on your particular area(s) of concern only.

The following criteria will be used to review the proposals received:

  • consistency with classification principles such as mutual exclusivity, exhaustiveness, and homogeneity of occupational groupings within categories;
  • empirical significance as an occupation, both as an output (labour force data) and as an input to labour market information;
  • data that can be collected and published;
  • relevance, that is, being of analytical interest, resulting in data useful to users, and being based on appropriate statistical research and subject-matter expertise;
  • occupations that can be used to create labour market information;
  • special attention will be given to specific occupations, including:
    • new or emerging occupations
    • changes related to duties and requirements.

NOC 2021 Classification Structure

The NOC 2021 V1.0 has a 5-digit classification structure with five hierarchical levels, consisting of 1-digit broad occupational categories, 2-digit major groups, 3-digit sub-major groups, 4-digit minor groups and 5-digit unit groups. Changes may be proposed for any level.
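Since the five levels are nested, every level can in principle be recovered from a 5-digit unit group code by taking its leading digits. The minimal sketch below assumes that, as in the NOC 2021 structure, each level's code is a prefix of the code one level down; the function name `noc_hierarchy` and the example code "21231" are illustrative and are not an official Statistics Canada or ESDC tool.

```python
# Level names follow the NOC 2021 structure; each level is assumed to be
# identified by the first 1 to 5 digits of the unit group code.
NOC_LEVELS = (
    (1, "broad category"),
    (2, "major group"),
    (3, "sub-major group"),
    (4, "minor group"),
    (5, "unit group"),
)


def noc_hierarchy(unit_group: str) -> dict[str, str]:
    """Split a 5-digit NOC unit group code into its five nested levels."""
    if len(unit_group) != 5 or not (unit_group.isascii() and unit_group.isdigit()):
        raise ValueError(f"not a 5-digit NOC unit group code: {unit_group!r}")
    # Truncate the code to the width of each level, broadest first.
    return {name: unit_group[:width] for width, name in NOC_LEVELS}


print(noc_hierarchy("21231"))
# {'broad category': '2', 'major group': '21', 'sub-major group': '212', 'minor group': '2123', 'unit group': '21231'}
```

Keeping codes as strings rather than integers preserves leading zeros, which matter for groups whose codes begin with 0.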

The National Occupational Classification (NOC) 2021 Version 1.0 is the latest version of the classification for the participants of this consultation to base their input on. Persons or organizations proposing a change should always make sure they refer to the latest available version of the NOC.

Costs associated with proposals

Statistics Canada will not reimburse respondents for expenses incurred in developing their proposal.

Treatment of proposals

A team of representatives from Statistics Canada and ESDC will review all proposals received. Statistics Canada reserves the right to use independent consultants, or government employees, if deemed necessary, to assess proposals.

If deemed appropriate, Statistics Canada will contact the respondents to ask additional questions or ask for clarification on a particular aspect of their proposal.

Please note changes will only be implemented during planned evergreening or milestone revision/update cycles and that a proposal will not necessarily result in changes to the NOC.

Official languages

Proposals may be written in either of Canada's official languages – English or French.

Confidentiality

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Thank You

We thank all participants for their continued interest and participation in the various NOC engagement activities.

Enquiries

If you have any enquiries about this process, please send them to statcan.noc-consultation-cnp-consultation.statcan@statcan.gc.ca.

Revision of the Canadian Research and Development Classification (CRDC) 2020 Version 1.0 - What We Heard

March 2024

Introduction

In 2020, Statistics Canada, in collaboration with the Canada Foundation for Innovation (CFI), the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC), published the Canadian Research and Development Classification (CRDC). This new classification was designed to include all research sectors and represent the current research landscape in Canada while contributing to greater alignment with international standards. It is comprehensive enough to support a wide range of needs within the R&D ecosystem, and it was developed to facilitate the peer review process and the reporting of investments by federal research funding agencies and the Government of Canada. The CRDC is meant to help ensure the compatibility and comparability of statistics across research funding agencies, both in Canada and internationally, while balancing the needs of different users and highlighting specific areas of Canadian research strength.

The CRDC is a set of three interrelated classifications:

  • Type of activity (TOA): This is categorization by type of research being undertaken, e.g., fundamental, applied, experimental development.
  • Field of research (FOR): This is categorization by field of research; it is the methodology used in R&D that is being considered. The categories within this classification include major fields of research based on knowledge source, subject of interest, and methods and techniques used.
    • There are four hierarchical levels: divisions are the broadest level, and groups, classes and subclasses represent increasingly detailed dissections of these categories. This resulted in a comprehensive list of fields of research—nearly 1,800 in total—to help reflect Canada's current research landscape.
  • Socioeconomic objectives (SEO): This is categorization by R&D purpose or outcome.
    • There are two hierarchical levels: divisions are the broadest level, followed by groups. There are approximately 85 groups.

Adopting a common approach for classifying research and expertise across different key stakeholders in Canada aims to:

  • provide a common language for discussing research in the higher education sector, in the public sector and within government, enabling better evidence-based decision making within the research ecosystem
  • make it possible to identify expertise and research areas in a truly multidisciplinary classification
  • improve the identification of emerging research fields
  • help identify potential collaboration opportunities to optimize research efforts and improve outcomes
  • improve the identification of research funding gaps and opportunities
  • provide the research community with a harmonized and integrated R&D classification
  • improve reporting on the agencies' combined contributions to research and science in Canada
  • help the agencies streamline their operational processes for peer review, recruitment and reviewer selection.

One of the commitments made by Statistics Canada and its collaborators was to conduct a minor review of the CRDC every two years and a major review every five years. This commitment was based on a continuous improvement model and was intended to respond to shifts in the research ecosystem, including new and emerging fields of research. Due to the pandemic, the minor review was delayed.
The need for this review was reinforced by messages from the research community that highlighted its urgency. It was decided that the minor review would take place in 2023, to be followed by the major review in 2025. The scope of the 2023 review was limited to Fields of Research. Broader changes will fall within the scope of the 2025 review.

Engagement and Outreach

  • The CRDC 2020 Version 1.0 review notice was posted on Statistics Canada's Consulting Canadians and Standards websites, as well as on StatCan's accounts on social media platforms such as X (Twitter), LinkedIn, Facebook and Reddit.
  • The consultation period for the public was launched in August 2023 and closed in October 2023. Feedback was gathered from the public through the consultation call.
  • CFI was invited to share its data on the CRDC, gathered since it implemented the CRDC in its systems.
  • Feedback was also gathered through internal sources, such as various advisory groups to the agencies and reports.
  • University Vice-Presidents of Research (VPRs) were also invited, through the office of the Vice-President of Corporate Affairs at SSHRC, to provide feedback; a similar process was used in the 2018 review.
  • Feedback gathered over the years on an ad hoc basis through the SSHRC CRDC inbox was also included for consideration; these requests were sent by researchers and academic groups representing new or emerging fields.
  • A working group was formed with members of SSHRC, NSERC, CIHR, CFI and Statistics Canada to review the data and make recommendations for changes based on this information.

Summary of what we heard

In the open consultation (Participate in the consultation for the update of the Canadian Research and Development Classification (CRDC) 2020 V1.0), participants and subject-matter experts were asked to review proposed categories and suggest any changes to specific categories (including adding, removing, combining, splitting and renaming) to represent the current Canadian research landscape, and to ensure that the classification would meet the needs of different stakeholders across the Canadian research ecosystem. The objective of the consultation process was to obtain feedback on fields of research (FOR) and not on socioeconomic objectives (SEO) or type of activity (TOA).

Comments and suggestions provided for consideration
Field of research Most frequent comments and suggestions provided for consideration
General
  • Over 50 recommendations were received regarding the creation or elaboration of new FORs.
  • Of these 50 recommendations, 15 were accepted for revision.
  • Interdisciplinarity came up several times as something that the CRDC failed to capture; this is outside of the scope of this review and will require more consultation and consideration in the future.
  • Some categories seem to be more granular than others.
  • Some categories seem to be outdated; there may be a need to consider new and emerging fields in the next review.
  • The delineation between categories is not always evident, and the definitions provided are not always helpful; it is acknowledged that the CRDC does not have specific definitions for each FOR (beyond defining what R&D is). Defining each field of science would be a large undertaking.
Black Studies
  • SSHRC's external Advisory Committee Addressing Anti-Black Racism in Research and Research Training (2021-2022) submitted several recommendations to SSHRC in its final report (2023).
  • The advisory committee recommended the addition of Black Studies in the CRDC.
  • Black Studies is a well-established field of research in the current Canadian research landscape.
  • The working group noted that since Black Studies is an interdisciplinary field, it did not fit neatly within the Social Sciences, the Humanities or other domains such as health research. It was decided that, on a provisional basis, Black Studies would be included under 'Other Social Sciences' with a definition addressing the interdisciplinary nature of this field. A wider consultation can take place in 2025.
Indigenous Studies
  • While Indigenous research is currently included under 16 different FORs (e.g., Indigenous law, Indigenous languages, Indigenous education systems, Indigenous economics, Indigenous literature), the research community and members of the SSHRC Indigenous Advisory Circle have observed that this unnecessarily restricts and limits Indigenous research.
  • Feedback from the research community, including from members of the SSHRC Indigenous Advisory Circle, recommended adding Indigenous Studies to the classification as an interdisciplinary field.
  • It was decided that, on a provisional basis, Indigenous Studies would be included under 'Other Social Sciences' with a definition addressing the interdisciplinary nature of this field. A wider consultation can take place in 2025.
  • The SSHRC Indigenous Advisory Circle also recommended changing the existing FOR "Indigenous performing arts" to "Indigenous arts" to permit classification of other art forms.
  • The SSHRC Indigenous Advisory Circle also recommended adding "Indigenous Knowledge Systems." Based on the rationale provided, this would be a Level 1 change, and thus it will be considered for the 2025 review.
Critical Disabilities Studies
  • SSHRC's external Advisory Committee on Accessibility and Systemic Ableism (2022-2023) submitted several recommendations to SSHRC in its final report (2024).
  • The advisory committee recommended changes to the CRDC, specifically, the need to add Critical Disabilities Studies.
  • The advisory committee underlined the importance of advocacy as a part of their research and provided a detailed rationale for the inclusion of this FOR.
  • The working group recommended expanding the current definition of Disability Studies, adding the clause "including accessibility and critical disability studies".
  • A wider consultation can take place in 2025.
Comments and suggestions for consideration on the CRDC
Overall Comments and suggestions for consideration on the CRDC
Overall
  • The way the codes are displayed needs to be more user-friendly and intuitive to make it easier for users to identify their area of research or expertise. The categories will need to be reviewed regularly to ensure that areas developing past the "emerging" stage are captured in the future.
  • There needs to be a framework, or a set of guiding principles, for determining how changes are made to the CRDC. These guidelines need to take into consideration objectives and measurable metrics, and keep the spirit of the CRDC classification system in mind, not just from a usability or 'visibility' perspective for the user, but also for its reporting and statistical function. These will be considered in the context of a wider review for 2025.

Eh Sayers Episode 16 - How Do You Say "Language Revitalization" in Cree?

Release date: March 27, 2024

Catalogue number: 45200003
ISSN: 2816-2250

How Do You Say "Language Revitalization" in Cree?

Listen to "Eh Sayers" on:

Social media shareables

Tag us in your social media posts

  • Facebook StatisticsCanada
  • Instagram @statcan_eng
  • Twitter @StatCan_eng
  • Reddit StatCanada
  • YouTube StatisticsCanada

Visuals for social media

How Do You Say "Language Revitalization" in Cree? graphic 1

How Do You Say "Language Revitalization" in Cree?

More than 70 distinct Indigenous languages are spoken by First Nations people, Métis and Inuit in Canada, but these languages are under threat.

In this episode, we speak with Randy Morin and Belinda kakiyosēw Daniels, who share their knowledge of the Cree language with learners at the Nêhiyawak Language Experience. They talk about the wisdom encoded in Indigenous languages, as well as the opportunities for these languages and the barriers they face.

Host

Tegan Bridge

Guests

Randy Morin, Belinda kakiyosēw Daniels

Listen to audio

Eh Sayers Episode 16 - How Do You Say "Language Revitalization" in Cree? - Transcript

Tegan:  Welcome to Eh Sayers, a podcast from Statistics Canada, where we meet the people behind the data and explore the stories behind the numbers. I’m your host, Tegan Bridge. 

When we’re talking about Indigenous languages, we’re not talking about one thing. Canada is home to over 70 distinct Indigenous languages (Indigenous languages across Canada). 

They have different statuses. For example, Inuktitut is an official language of both Nunavut and the Northwest Territories, and almost 40,000 Inuit reported that they speak it well enough to have a conversation in the 2021 census. The Cree and Ojibway languages also have tens of thousands of speakers, making these three the most widely reported Indigenous languages spoken in Canada. Then there are languages like Haisla, Haida and Ktunaxa, each with fewer than 300 speakers. Among these, between the 2016 and 2021 censuses, the number of Ktunaxa and Haisla speakers rose, while the number of Haida speakers declined.

Just over 237,000 Indigenous people in Canada reported speaking an Indigenous language well enough to have a conversation in the 2021 Census. However, this number has declined since the previous Census, when approximately 260,000 Indigenous people reported being able to speak an Indigenous language. This decline is driven by an ongoing decrease in the number of people with an Indigenous language as a mother tongue (Indigenous languages across Canada).

At the same time, the 2021 census found that more Indigenous people are learning an Indigenous language as a second language. Second-language speakers accounted for over one-quarter of Indigenous language speakers overall, up 4,100 speakers, or 6.7%, from 2016 (Indigenous languages across Canada).

UNESCO, or the United Nations Educational, Scientific and Cultural Organization, considers all Indigenous languages in Canada "at risk," that is, either vulnerable or endangered. Indigenous languages are threatened because of discrimination and colonial practices, including the Indian Residential School system, which aimed to destroy Indigenous cultures and languages (Indigenous languages across Canada). Indigenous children had their languages taken away from them when they were forcibly removed from their families and punished or shamed for speaking their language.

All of this means that today, these languages are at risk, and Indigenous communities are fighting to keep their languages alive.

Randy: Yeah, they say, there’s only gonna be three languages that are gonna be still spoken in 20, 25 years and Cree is one of them. Cree, Inuktitut and Ojibwe.  That's what the stats are saying.

Tegan: This is knowledge keeper Randy Morin.

Randy: Randy Morin, Assistant Professor, University of Saskatchewan. 

Tegan: There were almost 87,000 Cree speakers in Canada according to the last Census (Indigenous languages in Canada, 2021). Cree would be the most widely spoken Indigenous language in Canada... if there were a single Cree language. But there are actually many Cree languages, and they aren't all the same. 

Randy: There is a great risk, especially for the smaller dialects, like the R dialect, the L dialect.  Those ones are more like Eastern Canada. But we do have also small dialects in Saskatchewan, the N dialect and the TH dialect, Woodland Cree, Swampy Cree, but the biggest one would be the Plains Cree dialect. So that one will probably survive a little more than the other dialects.

Tegan: It's not just that different dialects sound different or use different words.

Randy: Each dialect has their own distinct way of looking at the world, right? The Swampy Crees have their own way, the Woodland Cree, they call themselves the Rock Cree, and us Plains Cree, there's differences in the dialects and how we see the world.

Tegan: Randy is a first language Cree speaker.

Randy: Well, I grew up speaking the language. It was just spoken to me right from the womb, so I didn't speak any English until I was 10 years old. And I often tell this story: how did I pass kindergarten and grade one with just speaking Cree? Many of us in my community, my age, we all spoke the language, and we passed grade one and kindergarten. So I just grew up speaking it. I didn't know any different. It's like a fish, a fish doesn't know it's in water; that's just how I was, in my language. I was just immersed in it; it was all around me.

Tegan: But, as I mentioned, many speakers of Indigenous languages learned these as second languages, like Belinda Kakiyosew Daniels.

Belinda:  Dr. Belinda Kakiyosew Daniels, University of Victoria. The Indigenous Education Department. 

As far as learning goes, I’ve always listened to the language ever since I was born. I was raised by my grandparents most of my life and so they spoke Cree to each other. They didn't encourage me to speak to them, but I've often heard the language. And so, just their ability to speak to each other in our original language was inspiring to me, and this is what raised the curiosity of why am I not encouraged to speak my own language? 

Tegan: In 2017, half of Indigenous youth reported that speaking an Indigenous language was important or very important, and while they were less likely to have an Indigenous mother tongue, many were learning it as a second language (Chapter 4: Indigenous Youth in Canada). The majority of First Nations and Inuit youth, 68% and 87% respectively, who could speak an Indigenous language learned their language as a mother tongue. Among Métis youth who could speak an Indigenous language, the share was closer to half, as 55% learned their language as a mother tongue and the remaining share learned it as a second language.

Randy and Kakiyosew work with second language speakers to hone their skills. They share the Cree language with students at a language immersion camp.

Belinda: We are co-directors for the Néhiyawak Language Experience, which is a grassroots not-for-profit. We are having our 20th anniversary this summer.

So, this is exciting for the work that we do.  I would suggest that we are actually pioneers in this way of reclaiming land and language immersion in our home territories as a focused intention. 

Tegan: Their efforts range from grassroots to academia.

Belinda: We also write books together. We do research together.

I have 12 graduate students that I'm either supervising or sitting on a committee.

Randy: I teach the Cree language at the University of Saskatchewan. I'm creating a Cree speaking certificate program there, hoping to launch it next year.

Tegan: But it doesn't stop there! We're also talking TV and books.

Randy: I've also done a lot of work with APTN. So, I've done a lot of Cree work for Wapos Bay, the cartoon. Also, the Guardians, the cartoon. And now, the newest children's series is called CHUMS. And that's going to be launching this year, so I've been doing a lot of work in the Cree.

Belinda:  I'm holding the book that I coauthored with, um, Andrea Custer and it's called, Speaking Cree in the Home: A Beginner's Guide for Families. nēhiyawētān kīkināhk.

Tegan: There isn't one single way to revitalize Indigenous languages. There are historical reasons why these languages are at risk, but there are also barriers that exist to this day. 

Belinda: What are the barriers? Let's get some policy into place in making our languages official. Let's get some funding in place for our First Nations languages in our school systems. The money has always been underrated or underfunded. Let's start validating our speakers and showing the respect that those speakers deserve when it comes to credentials. They might not have a B.Ed. degree or a Ph.D. degree, but they are a speaker of the language and hold this vast amount of knowledge about the world. So, let's value that and credit that.

Randy: Growing up, we all spoke the language and all of a sudden FM radio came, and pop culture came. Oh, my goodness, everyone overnight just started speaking English, you know, and it's just been on a decline ever since with, like, modernization, globalization, and celebrity... I guess our young people look up to celebrities… the wealth, and that’s what they want to be. And so we need to bring them back to our own role models, and that's a challenge.

The federal and provincial governments, they really messed up, with Indigenous people for years, but there's still not that sense of urgency, like they need to like really to push for this sense of urgency. 

And universities need to work with language people. There's only two of us at the University of Saskatchewan, and it's a real challenge. We need more. We need more people that… All hands on deck is what we’re saying. There are only two schools in Saskatoon, St. Francis and wâhkôhtowin.

There's a lot of challenges, but there's also a lot of successes. Technology can be used to connect with people in remote communities.  In fact, our friend, Bill Cook, our brother, who is part of NLE, his PhD project is to connect speakers, remote communities to learners and, and they get paid. Isn't that cool? What an awesome project! All these elders that are home could be making money on their computers, talking to learners, man, what a project! I wish I thought of that for my PhD. So, lots of challenges, but also a lot of successes.

Belinda: I wanted to add to grassroots successes. So, I mentioned my name, right? Kakiyosew. I'm from Pakitahwâkan sâkahikan. Sturgeon Lake First Nations, Saskatchewan.  And as far as grassroots goes, I, for one, I'm always being of service to my home community and, overall, to my nation.

Some of the successes have been creating signs of Nēhiyawēwin in our home community. Um, creating camps for families, creating Oskapios programs for boys and men, creating a whole community wide language program from leadership to the different agencies that are employed in our home community, such as the health department, the education department, our economic department, resources for roads and for the bison, everybody learning Cree.

We have Indigenous language revitalization as a field of study at the University of Victoria. I know that First Nations University of Canada is always working, is also working on the same thing. As Néhiyawak people, original peoples of Turtle Island, we're working together. We're coming together, working together to help lift up our original languages. So these are some successes.

Tegan: Earlier in our conversation, Randy mentioned that the different Cree languages have their own unique worldviews. Different languages aren't just interchangeable labels for the same things. Languages are pretty interesting. English, with all its funny spellings and weird idioms, carries its history within itself. And the Cree languages are unique in their own way too.

Randy: You definitely see the world in two different ways when you know the language. And I'm speaking from experience, so your worldview is really different.

You can understand the stories, you know, the teachings that are embedded within the stories. You'll understand the ceremonies a lot more if you have an intimate knowledge of the language.

Humor also plays a big part in the language. The humor is so descriptive, you can see it in your mind better than in English. English, it's kind of one-dimensional. So yeah, it's relationships to the natural world. And how we see the world as living, and that really clashes with this worldview today. Like there's no connections to the earth. You know, they see the earth as inanimate. Not alive. They see animals as having no spirits. Insects. Birds. You know, we call these CREE our relatives, eh? So, you have a better relationship when you have that.

The language teaches you values. Laws, teachings, these are in our languages and again, if I mentioned valid knowledge systems are embedded in it, and this language is thousands of years old.

So, you see the world differently in two different ways.  Uh, and I wanna, you know, people ask me, how do you see it in two different ways?  I see it as making you a better human being. You know, you’re not as greedy. You’re not as selfish.

But in our, in our language it’s for the whole, for the betterment because we believe, philosophically, spiritually, we go back to the spirit world. So, we have to have these laws, we have to follow these teachings, because our time (Randy snaps his fingers) is like that. We can't take anything with us, right? So, uh, so that's the fundamental differences I think of seeing the world in two different ways.

Tegan: If you're familiar with French, you're familiar with the way it sorts its nouns by gender, masculine and feminine. Cree sorts its nouns according to animacy: that which is living and that which is not.

Belinda: Animacy is how we view the world as alive and with spirit, imbued with spirit.  And so, when we look at the land, askiy, the land provides us with everything that we need, that we’ve needed, so the land is alive, the earth is alive with a spirit. And when you look at the trees and the rocks and the mountains and the rivers, the animals, the ocean, the sun, even the weather. If you have that perspective of these elements are alive with spirit, imbued with a life force of something, a source, and if you can think of them like the way you think of your own parents, your own grandparents, your brother, your sister, your newborn child. That whole way of thinking of the natural world makes you more conscious and respectful and grateful of where we're living. 

Tegan: You mentioned the importance of nature. You run a language camp, with "camp" being a very key word, I think. Why is a camp, being out in nature, disconnected, unplugged, the ideal place? Not for a language camp generally, but for a Cree language camp specifically: why is that the best place?

Belinda: This is how our ancestors lived. This is what our ancestors have always done. This is the solution. And again, just coming back and connecting. It's a spiritual place, like being out in nature, walking on the land, swimming in the waters, listening to the birds when they start singing early in the morning, and going to sleep when they stop. It's amazing. And then learning the language for your surroundings. I don't know why or how, but when I'm out in the context, in the language, hearing the language, something goes on in my head, in my brain, in the way I think. There's a shift that happens, and I often try to describe it as like... like a puzzle being put together. That's what happens to my brain. Like I can literally feel this connection to the context, on the land, in the language. And my whole worldview is just, you know, like a switch goes off, and I leave this English world behind. I leave all the memories of the violence behind in the English colonial context. And when I'm out on the land, it's a feeling of sâkihitowin. It's a feeling of love.

Randy: It teaches you humility, you know, being out on the land, you know, there's no ego, there's no power tripping, and you get to connect with the communities that you're in. It's the energy of the place, eh, it’s a really clean environment, detoxifying. It's empowering. It's very loving. It's very gentle.  It's very organic… It's just a feeling of home, you know, your spirit just, you leave this concrete jungle and go out into the natural, and the spirits of the land… It's a beautiful place.

Belinda: It's a natural rhythm.

Randy: Mm hmm.

Tegan: When your language categorizes nouns by whether they are imbued with spirit, how does that affect your perception of the natural world? Especially in the context of the climate crisis?

Randy: Well, I'll say it in one word. Wâhkôhtowin, wâhkôhtowin, we are all related. That includes the plants, that includes the stars, the moon, the mountains. 

If you're related to a relative, are you going to hurt that relative? Because in our laws, we have pâstâhowin, overstepping Creator's law.  Right. So, if you know that you're not going to cut down trees… not overharvesting for the profit of exorbitant amounts. 

People are getting rich and, and, you know, there's no word in their worldview about wâhkôhtowin. And then we have this thing called ohcinêwin, it's like harm against sentient beings. And that includes everything in creation. The water is alive; we're not gonna poison the water; we're not gonna drill in the water. You know what I mean? We have all these examples of what not to do, in their language.

Belinda: It's just saying what we were saying earlier in regards to this idea of animacy. The word for people, like I'm always saying, the Néhiyawak, is plural, but we also refer to that, such as like the trees, mîtos, mîtosak, or the animals, or like birds, piyêsîsak; it's the same reference of something being alive, rocks, asiniyak, it's the same reference of they're alive with spirit and that's how they're referred to. So, if you think of the world as alive and refer to them as kin, like Randy said, like we said earlier, you're not going to clear-cut. You're not going to extract and build those big mining holes. Climate change is on this drastic rise. We're seeing it with the change in temperatures, these warm temperatures. It's frightening, unusual. You see bears coming out of their winter habitat. You see geese having ducklings, and it's only January.

Tegan: It's unnatural.

Randy: Yeah, so different languages around the world refer, like, they know the Earth is a mother. The Earth is a mother. She provides. It's not just a concept. It's actually like a real belief that the Earth is our mother, right? So, with that understanding, like Belinda said, we're going to protect, we're going to protect and care for our mother and the medicine she provides for us. Everything has a medicinal purpose and a spirit, right? So, it's on us to learn those, those medicines so we keep them for future generations.

Tegan: Ryan DeCaire said: “It's said that people revitalize a language, but really, it's a language that revitalizes a people.” What are the benefits of learning an Indigenous language, whether as a first or a second language?

Belinda: The benefits are unlimiting, just as Randy was speaking as going through my own thoughts and the benefits are reclaiming your language is your connection to land. Your connection to where you belong, your culture, your connection to what you know of natural laws and natural governance in, in those systems and also the people, your connection to the people. So, there's multiple benefits. 

As well… once you realize and know where you belong and where you come from, where you've been, moving forward, you can walk into the trauma. You can walk into that historical trauma, that intergenerational trauma, and you can smash it. Once you know your language and where you come from and understand all of that. And it's been that way for me. And it's been a feeling of coming home, knowing my purpose, knowing my role, knowing that my ancestors are behind me, and I stand on the shoulders of giants.

Tegan: What are your hopes for the future of your language and your community, and do you think the work will ever be done?

Belinda: My dreams for the future. I was kind of asked that question about 10 years ago so I'm glad I'm getting asked this question again. My dreams for the future is that our languages are federal, provincial laws throughout Canada. I hope that our communities are speaking, and the languages are flourishing in our communities, that our schools are land based, spaces and places operated all in the language and that we're speaking, not just our language, but our neighboring languages.

Prior to contact, this is based on Onowa McIvor’s work, is that our languages were very multilingual. We were a very multilingual continent. And we spoke more than one language. We had to speak multiple languages to have this commerce, this massive commerce, trading, alliance system. And when the early settlers came over, they learned our languages. So it's only been recent that we've had this banning of our ways of knowing and being and doing within the last hundred years or so, and so my dreams are, how do we rectify this? How does the federal government rectify this? And then how do the powers that be help lift up our languages?

And my dreams are, again, that we have this love and understanding and empathy for the original peoples of this land. And that there is no one incarcerated. That we have no addictions. That we have no homelessness. That we don't have mental illnesses. Those are my dreams, and I just hold on to that. 

Tegan: Randy?

Randy: I hope our environment’s also intact. And I hope we get an Indigenous prime minister. You know what I mean? These are all my hopes and dreams, but I want my children, my grandchildren to be in this world that values them for who they are. Cause you know, look at me, wherever I go, people are scared of me in my own lands, my own treaty territory. People are scared of me. I get stereotyped, the racism that's thrown at me daily. I want the future to be better for my children and grandchildren. We can get along and work together from kindergarten all the way to university that the language is, you know, is embedded, language instruction… Everyone's just speaking the language, it's a goal of mine, but I don't know if I'll see it, but that's what I want to see.

Tegan: Is there anything that I haven't brought up that you'd like to talk about?

Belinda: Oh, I would like to say for people listening, learn the original names of the nations that live on Turtle Island. I'm not exactly sure where the word Indigenous comes from, but I would prefer being called a Nehiyawakamakiano. We're one nation. There's multiple nations within Canada. Learn a greeting, learn how to say hello in whatever lands you live on. I've learned how to say 'uy' skweyl. I've learned how to say 'uy' skweyl ch'u. Like, these are just very helpful little words, that go a long way. Especially if you're a visitor on lands that you don't originally come from. And I'm not sure if I said this, but encourage your children like you do, you know, with sports or with dance or with music. Encourage your children to speak the language and to participate in the culture. Those are just a couple of things that come to my mind.

Tegan: You've been listening to Eh Sayers. Thank you to Randy Morin and Belinda Kakiyosew Daniels for taking the time to speak with us.

You can subscribe to this show wherever you get your podcasts. There, you can also find the French version of our show, called Hé-coutez bien! If you liked this show, please rate, review, and subscribe. Thanks for listening!

Sources

Anderson, Thomas. “Chapter 4: Indigenous Youth in Canada.” Statistics Canada. Government of Canada, December 1, 2021.

Statistics Canada. “Indigenous Languages across Canada.” Statistics Canada. Government of Canada, March 29, 2023.

Statistics Canada. “Indigenous Languages in Canada, 2021.” Statistics Canada. Government of Canada, March 29, 2023.

Visible minority concept consultative engagement

Opened: October 2022
Updated: September 2023
Results posted: October 2023

Consultative engagement objectives

The visible minority concept is currently under review. Statistics Canada has been committed to engaging with partners, stakeholders, ethnocultural groups, and the general public to identify the appropriate terminology and categories to describe the population and properly address data needs in health, education, justice, and employment equity.

Consultative engagement methods

These consultative engagements on the visible minority concept were conducted virtually, through group discussions and information sessions, and electronically, through e-forms and written submissions, in both official languages. They were publicized through Statistics Canada's Consulting Canadians page, various events and social media. Moreover, stakeholders and partners, ethnocultural groups, non-profit and non-governmental organizations, and researchers were invited by email to participate and to share the invitation with others in their networks.

How participants got involved

Overall, Statistics Canada received feedback, in both official languages, from more than 460 individuals representing a wide range of groups and organizations, including anti-racism groups, civil society organizations, ethnocultural community organizations, religious networks, social inclusion groups and the general public.

The consultative engagement also included several follow-up discussions with subject-matter experts from these ethnically diverse groups.

Statistics Canada thanks participants for their contributions to this consultative engagement initiative. Their insights will help guide the agency in this review.

Initial findings of the consultative engagements

Terminology

What we heard regarding terminology to replace "visible minority"

A number of participants preferred the term "racialized groups." They noted that the term "racialized" is already used by various federal departments, by provincial and municipal governments, and in the media. They also argued that the term more accurately presents race as a social construct by emphasizing the process of racialization.

However, the term "racialized" was also the most controversial option. Most francophone participants did not think that Statistics Canada should adopt race-based terminology, because such terminology is generally considered offensive in French. In fact, many participants (both French- and English-speaking) were offended at being described as belonging to a racialized group. They also felt that labelling all non-White people as "racialized" reinforces the idea that White is the dominant group. Participants also noted the various definitions of "racialization" currently in use, related to skin colour, culture, religion, ethnicity, language, etc.

The term "population group" (or another neutral term, such as "diverse groups") was the second most preferred. Participants argued that it is sufficiently broad and flexible to apply to a number of situations and to be defined differently according to the needs of different organizations or programs. It was considered a more neutral term that would likely have a longer lifespan, given the sensitivity of this topic. Participants also noted that the term could include the White population without making this population either the reference or the norm. On the other hand, some participants opposed this term because of its vagueness.

Categories

Option 1

  • White
  • South Asian (e.g., East Indian, Pakistani, Sri Lankan)
  • Chinese
  • Black
  • Filipino
  • Arab
  • Latin American
  • Southeast Asian (e.g., Vietnamese, Cambodian, Laotian, Thai)
  • West Asian (e.g., Iranian, Afghan)
  • Korean
  • Japanese

Option 2

  • White
  • South Asian
  • East Asian
  • Black
  • Southeast Asian
  • Middle Eastern
  • Latin American

Note:

  • The "Option 1 – Current categories" list above reflects the categories included in the last Census. Information from this question is collected in accordance with the Employment Equity Act. Respondents can select multiple categories, and the data collected on these groups are used for various purposes, including in the fields of labour, education, health and justice.
  • Option 2 is currently used by certain federal departments.
  • The Census does have a question on ethnic and cultural origin which includes a list of over 500 response options and derives multiple responses showing the diversity of the population at a very granular level (see this infographic created with data from the 2021 Census).
  • The Census also provides specific data on Indigenous identity, on place of birth, on generation status, on religion, and on languages.

What we heard regarding the categories

During the consultations, no clear consensus emerged on a list of categories to measure groups. Some participants suggested that combining certain categories, as in Option 2 above, would be more useful for anti-racism purposes, because the resulting data would better reflect how respondents are perceived by others rather than their personal identity, which can often be quite specific.

Other participants argued that more detail is always preferable and saw no advantage in a reduction of the number of categories. Moreover, these participants noted that reducing the number of categories would mean that detail for certain groups would be lost (e.g., Chinese, Japanese, Korean, Filipino, Arab, West Asian).

One common criticism was that the categories on both lists are incoherent because they straddle race, ethnicity, nationality, and geographical descent. Most respondents believed that some categories (in particular, the "Black" category) are too broad and should be more granular.

That said, most respondents felt that comparability between census cycles is important for their data needs and were concerned with the potential impacts caused by changing the categories in the questionnaire.

Further summary results of the consultative engagement initiatives will be published online when available.

Identifying Personal Identifiable Information (PII) in Unstructured Data with Microsoft Presidio

By Saptarshi Dutta Gupta, Statistics Canada

Editor's note: The content of this article represents the position of the author and may not necessarily represent that of Statistics Canada.

Introduction

In today's digital age, organizations collect and store vast amounts of data about their customers, employees and partners. This data often contains personally identifiable information (PII). With the growing prevalence of data breaches and cyberattacks, protecting PII has become a critical concern for businesses and government agencies alike. For example, Statistics Canada conducts hundreds of surveys each year on a variety of topics and is obligated to protect the information that individuals provide.

Canada has two federal privacy laws that are enforced by the Office of the Privacy Commissioner of Canada:

  • Privacy Act: covers how the federal government handles personal information. The Privacy Act offers protections for personal information, which it defines as any recorded information about an 'identifiable individual'.
  • Personal Information Protection and Electronic Documents Act (PIPEDA): PIPEDA is the federal privacy law that applies to organizations that collect, use, or disclose personal data during commercial activities. PIPEDA requires organizations to obtain consent for the collection, use, or disclosure of personal data, and to protect personal data from unauthorized access, use, or disclosure.

In addition to the above-mentioned laws, organizations may also be bound by the General Data Protection Regulation (GDPR), which is often described as the toughest privacy and security law in the world. Though it was drafted and passed by the European Union (EU), it imposes obligations on organizations anywhere, so long as they target or collect data related to people in the EU. The GDPR levies harsh fines against those who violate its privacy and security standards, with penalties reaching into the tens of millions of euros.

In this article, we take a detailed look at Microsoft Presidio and how it can help organizations in Canada meet their obligations under privacy laws. We start by discussing its key features and capabilities.

Definitions

Before proceeding with the rest of the article, it is important to understand the differences between three terms used throughout: anonymization, deidentification and pseudo-anonymization.

  • Anonymization: Anonymization refers to the process of irreversibly removing or obscuring identifiable information from data in such a way that the original data cannot be re-identified. The goal is to make it impossible or extremely difficult to link the data back to the individual it represents. Anonymized data should not contain any direct or indirect identifiers that could be used to identify individuals. 
  • Deidentification: Deidentification involves the removal or alteration of PII from a data set in order to prevent the identification of individuals. Unlike anonymization, deidentification does not necessarily require the data to be rendered completely unidentifiable. Instead, it focuses on removing or modifying specific identifiers, such as names, addresses, social security numbers, or any other information that could be used alone or in combination with other data to identify individuals. 
  • Pseudo-anonymization: Pseudo-anonymization is a technique that involves replacing direct identifiers with pseudonyms or unique identifiers, thereby unlinking the data from the individuals it represents. Unlike anonymization, where the original data is altered to prevent re-identification, pseudo-anonymization retains the ability to re-identify individuals using additional information stored separately, such as a key or lookup table. Pseudo-anonymization is commonly used in situations where data needs to be linked across different systems or databases while still protecting individual privacy.
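To make the distinction concrete, here is a minimal pseudo-anonymization sketch in Python. It assumes a keyed hash (HMAC-SHA256) as the pseudonym generator; the function names and the `ID_` token format are hypothetical and not taken from any particular tool. The lookup table plays the role of the "additional information stored separately": whoever holds it can re-identify records, while discarding both the key and the table would move the data closer to anonymization.

```python
import hashlib
import hmac
import secrets


def pseudonymize(values, key):
    """Replace each identifier with a keyed pseudonym and return the mapping.

    The same input always yields the same pseudonym, so records can still be
    linked across data sets, but the pseudonym cannot be reversed without the
    lookup table, which must be stored separately from the shared data.
    """
    lookup = {}
    pseudonyms = []
    for value in values:
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "ID_" + digest[:12]  # hypothetical token format
        lookup[token] = value        # kept apart from the pseudonymized data
        pseudonyms.append(token)
    return pseudonyms, lookup


def reidentify(token, lookup):
    """Re-identification is only possible for whoever holds the lookup table."""
    return lookup[token]


key = secrets.token_bytes(32)  # secret key, stored alongside the lookup table
names = ["John Smith", "Jane Doe", "John Smith"]
tokens, table = pseudonymize(names, key)
print(tokens)                        # both "John Smith" rows share one token
print(reidentify(tokens[1], table))  # Jane Doe
```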

What is PII?

Personally identifiable information (PII) is any data that can be used to identify an individual. This includes, but is not limited to, names, addresses, phone numbers, social security numbers, financial information and medical records. PII is highly sensitive information that needs to be protected from unauthorized access, as it can be used for identity theft and other fraudulent activities.

Depending on whether a piece of information can be used directly or indirectly to re-identify an individual, the information mentioned above can be categorized into direct identifiers and quasi-identifiers [4]:

  • Direct identifiers: A set of variables unique to an individual (a name, address, phone number, or bank account) that may be used to directly identify the subject.
  • Quasi-identifiers: Information such as gender, nationality, or city of residence that in isolation does not enable re-identification but may do so when combined with other quasi-identifiers and background knowledge.

Why is PII protection important?

PII protection is important because individuals have a right to privacy and should have control over how their personal information is collected, used, and disclosed. Data breaches and identity theft can have significant consequences for individuals, including financial losses, reputational damage, and emotional distress. Therefore, it is essential for organizations to have robust measures in place to protect PII.

Background

a) Anonymizing structured data

When it comes to anonymizing structured data, there are established mathematical models of privacy. These include:

  • K-anonymity: A masked dataset has the k-anonymity property if the information for each person in the dataset cannot be distinguished from that of at least k-1 other individuals in the dataset. Two methods can be used to achieve k-anonymity. The first is suppression, which involves completely removing an attribute's value from a dataset. The second is generalization, in which a specific value of an attribute is replaced with a more general one.
  • L-diversity: This is an extension of k-anonymity. If rows that have identical quasi-identifiers are grouped together, and every group contains at least l distinct values for each sensitive attribute, then the dataset has l-diversity.
  • Differential privacy: this aims to ensure that the output of a process or algorithm remains roughly the same, regardless of whether an individual's data is included. This means that it is impossible to determine with certainty whether a specific individual is present in the dataset just by examining the output of a differentially private analysis.
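The three models above can be checked on a toy table with a few lines of Python. This is a minimal sketch under stated assumptions: the records, the quasi-identifiers (age band, city) and the sensitive attribute (diagnosis) are invented for illustration, and the differential privacy part uses the standard Laplace mechanism for a counting query (sensitivity 1, noise scale 1/epsilon), which is only one of several ways to achieve differential privacy.

```python
import random
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share identical values for every quasi-identifier."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k for which the data set is k-anonymous."""
    return min(len(g) for g in equivalence_classes(rows, quasi_identifiers).values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values in any group."""
    return min(
        len({row[sensitive] for row in g})
        for g in equivalence_classes(rows, quasi_identifiers).values()
    )


def laplace_count(true_count, epsilon, rng=random):
    """Differentially private count: add Laplace(0, 1/epsilon) noise.

    The difference of two independent exponential draws with mean b is
    Laplace-distributed with scale b.
    """
    scale = 1.0 / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rows = [
    {"age": "30-39", "city": "Ottawa", "diagnosis": "flu"},
    {"age": "30-39", "city": "Ottawa", "diagnosis": "asthma"},
    {"age": "40-49", "city": "Toronto", "diagnosis": "flu"},
    {"age": "40-49", "city": "Toronto", "diagnosis": "flu"},
]
qi = ["age", "city"]
print(k_anonymity(rows, qi))                  # 2: every group has at least 2 rows
print(l_diversity(rows, qi, "diagnosis"))     # 1: the Toronto group only has "flu"
print(laplace_count(len(rows), epsilon=0.5))  # 4 plus random noise
```

The example shows why l-diversity adds something: the table is 2-anonymous, yet anyone who knows a person is in the Toronto group learns their diagnosis.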

There are several other anonymization techniques that can be applied to both structured and unstructured data. Some of these techniques include:

  • Data shuffling: This involves randomly rearranging the rows or columns of a dataset to disrupt any potential correlations between variables.
  • Data perturbation: This involves adding random noise or errors to the data to reduce the risk of re-identification. This can be done through techniques such as adding Gaussian noise or rounding values to the nearest multiple of a certain number.
  • Data aggregation: This involves aggregating the data at a higher level, such as at the city or state level, to protect individual-level data.
  • Data suppression: This involves removing sensitive information from the dataset altogether, such as by deleting specific columns or rows, or replacing sensitive values with a placeholder value (e.g., "******").
  • Data generalization: This involves replacing specific values with more general values, such as replacing a specific street address with just the city or state.
  • Data obfuscation: This involves replacing sensitive information with fake or misleading data, such as through random name generation or generating fake addresses.
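Several of these techniques are often applied together to the same records. The sketch below combines suppression, generalization, perturbation and shuffling on two invented records; the field names, the 10-year age bands, the noise level and the rounding base are all illustrative assumptions rather than recommended settings.

```python
import random

records = [
    {"name": "Alice Tremblay", "street": "12 Elm St", "city": "Ottawa", "age": 34, "income": 61250},
    {"name": "Bob Singh", "street": "98 Oak Ave", "city": "Toronto", "age": 47, "income": 83900},
]


def suppress(record, fields, placeholder="******"):
    """Data suppression: replace sensitive values with a placeholder."""
    return {k: (placeholder if k in fields else v) for k, v in record.items()}


def generalize(record):
    """Data generalization: drop the street and bucket age into a 10-year band."""
    out = {k: v for k, v in record.items() if k != "street"}
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    return out


def perturb(record, rng, sd=1000.0, base=1000):
    """Data perturbation: add Gaussian noise, then round to the nearest `base`."""
    out = dict(record)
    noisy = record["income"] + rng.gauss(0.0, sd)
    out["income"] = int(base * round(noisy / base))
    return out


def shuffle_column(rows, column, rng):
    """Data shuffling: permute one column across rows to break linkages."""
    values = [r[column] for r in rows]
    rng.shuffle(values)
    return [{**r, column: v} for r, v in zip(rows, values)]


rng = random.Random(42)
masked = [perturb(generalize(suppress(r, {"name"})), rng) for r in records]
masked = shuffle_column(masked, "income", rng)
for row in masked:
    print(row)
```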

It is essential to understand that no single anonymization technique is completely foolproof. Therefore, it is usually necessary to use a combination of techniques to effectively protect sensitive data. It is also crucial to continuously evaluate and update anonymization techniques as new re-identification risks and techniques arise.

b) Anonymizing unstructured data

Anonymizing unstructured data, such as text or images, is a more challenging task. It entails first detecting where sensitive information appears in the data and then applying anonymization techniques to it. Because unstructured data does not follow a fixed schema, simple rule-based models used on their own may perform poorly.

Therefore, Natural Language Processing (NLP) techniques have been applied to text anonymization. In particular, Named Entity Recognition (NER), a type of sequence labeling task, indicates whether each token (such as a word) corresponds to a named entity, such as PERSON (PER), LOCATION, DATETIME or ORGANIZATION (ORG), as shown in Image 1. The label O indicates that no entity has been recognized.

Image 1. Sequence Labeling Task – Named Entity Recognition

Description - Image 1. Sequence Labeling Task – Named Entity Recognition

This picture shows the result of passing a string through a Named Entity Recognition (NER) model. The input is the string “John bought 30 Amazon shares in 2022”, and the NER model classifies each word with its corresponding entity: John is tagged as PERSON, Amazon as ORGANIZATION, 2022 as DATETIME, and all remaining words as OTHERS.
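The labels in Image 1 are assigned per token, while an anonymizer needs whole entities. A minimal sketch of grouping consecutive tokens with the same label into entities follows, assuming the tag set from Image 1 (PERSON, ORG, DATETIME, with O for no entity); the function name is hypothetical.

```python
def tags_to_entities(tokens, tags):
    """Group consecutive tokens that share a non-"O" tag into entities.

    Uses plain per-token labels as in Image 1 (no B-/I- prefixes), so two
    adjacent entities of the same type would be merged; BIO tagging avoids that.
    """
    entities = []
    current_tokens, current_tag = [], None
    for token, tag in zip(tokens, tags):
        if tag != "O" and tag == current_tag:
            current_tokens.append(token)  # extend the entity in progress
            continue
        if current_tag not in (None, "O"):
            entities.append((" ".join(current_tokens), current_tag))
        current_tokens, current_tag = [token], tag
    if current_tag not in (None, "O"):
        entities.append((" ".join(current_tokens), current_tag))
    return entities


tokens = ["John", "bought", "30", "Amazon", "shares", "in", "2022"]
tags = ["PERSON", "O", "O", "ORG", "O", "O", "DATETIME"]
print(tags_to_entities(tokens, tags))
# [('John', 'PERSON'), ('Amazon', 'ORG'), ('2022', 'DATETIME')]
```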

Several neural models have achieved state-of-the-art performance on NER tasks using datasets with general named entities. When trained on medical-domain data containing various types of personal information, they have also been shown to achieve state-of-the-art performance on that data. These architectures include Recurrent Neural Networks (RNNs) with character embeddings and bidirectional Transformer encoders such as BERT (Bidirectional Encoder Representations from Transformers).

spaCy's transformer-based English pipeline (en_core_web_trf) uses a RoBERTa-based language model fine-tuned on the OntoNotes 5.0 corpus, which has 18 named entity categories, such as PERSON, GPE, CARDINAL and LOC.

Microsoft Presidio uses a combination of rule-based and natural language processing (NLP) methods to anonymize sensitive content, as discussed next.

Microsoft Presidio

Why do we need Microsoft Presidio?

When we apply PII anonymization to real-world applications, different business requirements can make it challenging to use pretrained models directly. For example, the Government of Canada (GoC) receives many applications during an advertised hiring process, which are then reviewed. Before the review, PII needs to be redacted to prevent personal information from leaking and to avoid bias. Apart from the common PII entities, the GoC also assigns every employee a Personal Record Identifier (PRI), whose last digit is a modulus-11 check digit [Source: TBS - Incumbent Data Element Dictionary].

A pre-trained NER model cannot identify these special entities; fine-tuning the model with additional labeled data would be required to achieve good performance. Therefore, there is a need for a tool that can use a pre-trained NER model and can easily be customized and extended.
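Because the PRI's last digit is a modulus-11 check digit, a rule-based validator can reject many false matches that a regular expression alone would accept. The sketch below is a generic modulus-11 check, not the official PRI algorithm: the body length and the weights (9 down to 2) are HYPOTHETICAL, and treating a computed value of 10 as invalid is also an assumption. The real specification should be taken from the TBS data dictionary cited above.

```python
def mod11_check_digit(body, weights):
    """Compute a modulus-11 check digit for a string of digits.

    Returns None when the computed check digit would be 10, which this
    sketch treats as "no valid single-digit check digit" (an assumption).
    """
    if not body.isdigit() or len(body) != len(weights):
        raise ValueError("body must contain exactly one digit per weight")
    total = sum(int(d) * w for d, w in zip(body, weights))
    check = (11 - total % 11) % 11
    return None if check == 10 else check


def is_valid_id(number, weights):
    """Return True if the last digit of `number` is its mod-11 check digit."""
    if not number.isdigit() or len(number) != len(weights) + 1:
        return False
    return mod11_check_digit(number[:-1], weights) == int(number[-1])


# HYPOTHETICAL weights (9 down to 2 over an 8-digit body). The real PRI
# length and weighting must be taken from the TBS data dictionary.
WEIGHTS = [9, 8, 7, 6, 5, 4, 3, 2]

print(mod11_check_digit("12345678", WEIGHTS))  # 9
print(is_valid_id("123456789", WEIGHTS))       # True
print(is_valid_id("123456780", WEIGHTS))       # False
```

In Presidio, checksum logic of this kind is typically attached to a regex-based PatternRecognizer through its validate_result hook, which built-in recognizers such as the credit card recognizer use; the Presidio documentation should be consulted for the exact signature.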

Presidio (from the Latin praesidium, 'protection, garrison') helps to ensure that sensitive data is properly managed and governed. It provides fast identification and anonymization modules for private entities in text and images, such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.

One of the key benefits of the Presidio framework is its ability to scale. It can handle large data sets, making it suitable for use by organizations with large amounts of data. It is also designed to be flexible and adaptable, allowing organizations to customize its use to meet their specific needs.

Image 2: PII detection workflow in Microsoft Presidio [Source: Presidio: Data Protection and De-identification SDK]

Description - Image 2: PII detection workflow in Microsoft Presidio

The image shows the Presidio detection flow used to detect PII. An input passes through regex-based pattern recognition, followed by a Named Entity Recognition model to detect entities, checksums to validate patterns, and context words to increase detection confidence, before multiple anonymization techniques are applied. The example input is ‘Hi, my name is David, and my number is 212 555 1234’. After the input passes through the Presidio detection flow, David and the number 212 555 1234 are detected as PII.

Goals

  • Introduce de-identification technologies to organizations in a user-friendly manner to promote privacy and transparency in decision-making.
  • Make the technology flexible and customizable to fit specific business needs.
  • Support both fully automated and semi-automated PII de-identification on multiple platforms.

Main features

  • Provides PII recognition using a variety of methods such as Named Entity Recognition, regular expressions, rule-based logic, and checksum with context, in multiple languages.
  • Offers the ability to connect to external PII detection models.
  • Offers multiple options for use, including Python or PySpark workloads, Docker, and Kubernetes.
  • Allows for customization in PII identification and anonymization.
  • Includes a module for redacting PII text in images.

Main modules of Presidio

a) Presidio Analyzer:

(i) Overview

The Presidio Analyzer is a Python-based service for detecting PII entities in text. During analysis, it runs a set of PII recognizers, each responsible for detecting one or more PII entities using different mechanisms. The analyzer comes with a set of predefined recognizers and can easily be extended with custom ones. Predefined and custom recognizers leverage Named Entity Recognition, regular expressions, rule-based logic, and checksums with relevant context, in multiple languages, to detect PII in unstructured text, as shown in Image 3:

Image 3: Presidio Analyzer for Identifying PII [Source: Presidio Analyzer]

Description - Image 3: Presidio Analyzer for Identifying PII

The image shows how the Presidio Analyzer identifies PII. The input text is passed through multiple PII recognizers, which include built-in recognizers, custom recognizers and custom models. The built-in recognizers use regex, checksums, NER and context words. After the text has passed through all the recognizers, the detected PII is returned.
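To illustrate the flow in Images 2 and 3 (pattern match, then checksum validation, then a context-word boost), here is a simplified sketch in plain Python. It is not Presidio's code: the recognizer table, base scores, context window and boost value are invented for illustration, and there is no NER step, so names such as David are not detected.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    entity_type: str
    start: int
    end: int
    score: float


def luhn_valid(digits):
    """Checksum step: the Luhn algorithm used for credit card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


# Hypothetical recognizers: (entity, regex, base score, validator, context words)
RECOGNIZERS = [
    ("EMAIL_ADDRESS", r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", 0.5, None, ["email", "e-mail"]),
    ("CREDIT_CARD", r"\b\d{16}\b", 0.3, luhn_valid, ["card", "visa"]),
    ("PHONE_NUMBER", r"\b\d{3}[ -]\d{3}[ -]\d{4}\b", 0.4, None, ["phone", "number", "call"]),
]


def analyze(text, window=5, boost=0.35):
    """Regex match -> optional checksum -> context-word score boost."""
    results = []
    words = [(w.group().lower(), w.start()) for w in re.finditer(r"\w+", text)]
    for entity, pattern, base_score, validator, context in RECOGNIZERS:
        for m in re.finditer(pattern, text):
            score = base_score
            if validator is not None:
                if not validator(re.sub(r"\D", "", m.group())):
                    continue  # failed checksum: drop the candidate
                score = 1.0   # passed checksum: high confidence
            # Boost the score if a context word appears just before the match
            before = [w for w, pos in words if pos < m.start()][-window:]
            if any(w in context for w in before):
                score = min(1.0, score + boost)
            results.append(Result(entity, m.start(), m.end(), round(score, 2)))
    return sorted(results, key=lambda r: r.start)


print(analyze("Hi, my name is David, and my number is 212 555 1234"))
```

Here the phone number's base score of 0.4 is raised to 0.75 because "number" appears shortly before it, which mirrors the role context words play in the detection flow.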

For the full list of entities that Microsoft Presidio recognizes by default, see Supported entities - Microsoft Presidio.

(ii) Installation

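Presidio Analyzer can be installed using pip or Docker, or built from source. With pip, for example, run pip install presidio_analyzer, followed by python -m spacy download en_core_web_lg to fetch the default spaCy model.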

(iii) Running a Basic Analyzer

Once installation is complete, a basic analyzer can be run with a few lines of code as shown:

from presidio_analyzer import AnalyzerEngine
# Set up the engine, loads the NLP module (spaCy model by default) and other PII recognizers
analyzer = AnalyzerEngine()
# Call analyzer to get results
results = analyzer.analyze(text="Mr. John lives in Vancouver. His email id is john@sfu.ca", language='en')
print(results)

[type: EMAIL_ADDRESS, start: 45, end: 56, score: 1.0, type: PERSON, start: 4, end: 8, score: 0.85, type: LOCATION, start: 18, end: 27, score: 0.85, type: URL, start: 50, end: 56, score: 0.5]

By default, Presidio uses spaCy's en_core_web_lg model. As the output above shows, PERSON, EMAIL_ADDRESS, LOCATION and URL entities have been identified; note that the URL match (sfu.ca, score 0.5) lies inside the email address. We can extend the analyzer to detect new entities, as discussed next.

(iv) Capabilities of Presidio Analyzer

1. Support detection of new PII entities

To expand Presidio's detection abilities to new types of PII entities, EntityRecognizer objects should be added to the current list of recognizers. These objects are Python-based and can detect one or more entities in a specific language.

The following class diagram shows the different types of recognizer families Presidio contains.

Image 4: Class Diagram for different types of Recognizers in Presidio [Source: Supporting detection of new types of PII entities]

Description - Image 4: Class Diagram for different types of Recognizers in Presidio

The image shows the class diagram for different types of recognizers in Presidio. The EntityRecognizer is an abstract class for all recognizers. The RemoteRecognizer is an abstract class for calling external PII detectors. The abstract class LocalRecognizer is implemented by all recognizers running within the Presidio-analyzer process. The PatternRecognizer is a class for supporting regex and deny-list based recognition logic, including validation (e.g., with checksum) and context support.

In the above diagram:

  • The EntityRecognizer is an abstract class for all recognizers.
  • The RemoteRecognizer is an abstract class for calling external PII detectors.
  • The abstract class LocalRecognizer is implemented by all recognizers running within the Presidio-analyzer process.
  • The PatternRecognizer is a class for supporting regex and deny-list based recognition logic, including validation (e.g., with checksum) and context support.

The analyzer can be extended to identify additional PII entities in two steps:

  1. Create a new class based on EntityRecognizer.
  2. Add the new recognizer to the recognizer registry so that the AnalyzerEngine can use it during analysis.

Example:

For simple recognizers based on regular expressions or deny-lists, we can use the provided PatternRecognizer and call its analyze method directly:

from presidio_analyzer import PatternRecognizer
# Flag the listed honorifics as a custom TITLE entity
titles_recognizer = PatternRecognizer(supported_entity="TITLE", deny_list=["Mr.", "Mrs.", "Miss"])
print(titles_recognizer.analyze(text="Mr. John lives in Vancouver. His email id is john@sfu.ca", entities=["TITLE"]))

[type: TITLE, start: 0, end: 3, score: 1.0]

Next, we can add it to the list of Recognizers for the detection of additional PII entities:

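from presidio_analyzer import AnalyzerEngine, PatternRecognizer, RecognizerRegistry
# Recreate the TITLE recognizer from the previous example
titles_recognizer = PatternRecognizer(supported_entity="TITLE", deny_list=["Mr.", "Mrs.", "Miss"])
registry = RecognizerRegistry()
# Load Presidio's predefined recognizers (email, phone, person, etc.)
registry.load_predefined_recognizers()
# Add the recognizer to the existing list of recognizers
registry.add_recognizer(titles_recognizer)
# Set up analyzer with our updated recognizer registry
analyzer = AnalyzerEngine(registry=registry)
# Run with input text
text = "Mr. John lives in Vancouver. His email id is john@sfu.ca"
results = analyzer.analyze(text=text, language="en")
print(results)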

[type: TITLE, start: 0, end: 3, score: 1.0,
type: EMAIL_ADDRESS, start: 45, end: 56, score: 1.0,
type: PERSON, start: 4, end: 8, score: 0.85,
type: LOCATION, start: 18, end: 27, score: 0.85,
type: URL, start: 50, end: 56, score: 0.5]

For more complex recognizers, such as one that detects GoC PRIs, the recognizer can be created in code using the following steps:

  • Create a new Python class that implements LocalRecognizer (LocalRecognizer itself implements the base EntityRecognizer class). This class has the following functions:
    • load: loads a model or resource to be used during recognition
    • analyze: the main function called to get entities out of the new recognizer
  • Add it to the recognizer registry using registry.add_recognizer(my_recognizer). For more examples, see the Customizing Presidio Analyzer Jupyter notebook.

There are several other ways to create a Custom Recognizer in Presidio, such as:

  • Creating a remote recognizer: A remote recognizer interacts with an external service for PII detection. This could be a third-party service or a custom service running alongside Presidio.
  • Creating ad-hoc recognizers: Ad-hoc recognizers, defined in JSON form, can be added to a request to the Presidio Analyzer API's /analyze endpoint and are only used for that specific request.
  • Reading pattern recognizers from YAML: Pattern recognizers can be read from YAML files, which lets users add recognition logic without writing code. An example YAML file can be found here: Example Recognizers. Once the YAML file is created, it can be loaded into the RecognizerRegistry instance.

2. Multi-language support

Presidio can detect PII in multiple languages using its built-in recognizers and models. By default, it includes recognizers and models for English. Many of these recognizers are language-dependent, either through their logic or through the context words they use to scan for entities.

To improve results for specific languages, it is possible to update the context words of existing recognizers or add new recognizers that support additional languages. Because each recognizer supports only one language, new recognizers must be added for each additional language.

3. Customizing the NLP models

As mentioned before, the Presidio Analyzer uses spaCy's en_core_web_lg model by default, but it can easily be customized to use other NLP models, either public or proprietary. Presidio uses NLP engines for two main tasks: NER-based PII identification, and feature extraction for custom rule-based logic (such as leveraging context words for improved detection). These models can be trained or downloaded from existing NLP frameworks such as spaCy, Stanza and Hugging Face Transformers.

The new model can be configured either:

  • Via code: by creating an NlpEngine using the NlpEngineProvider class and passing it to the AnalyzerEngine as input.
  • Via configuration: by setting up the models to be used in the default configuration file, which is read during the default initialization of the AnalyzerEngine. Alternatively, the path to a custom configuration file can be passed to the NlpEngineProvider.

In addition to the built-in spaCy, Stanza and Transformers capabilities, it is possible to create new recognizers that serve as interfaces to other models, such as Flair.

b) Presidio Anonymizer:

The Anonymizer is also a Python-based service. It anonymizes the detected PII entities by applying operators such as replace, mask and redact. By default, it replaces each detected PII entity with its entity type, such as <EMAIL_ADDRESS> or <PHONE_NUMBER>, directly in the text. This can be customized by providing different anonymization logic for different types of entities.

The Presidio-Anonymizer package contains both Anonymizers and Deanonymizers.

  • Anonymizers are used to replace a PII entity text with some other value by applying a certain operator. The various built-in operators are:
    • replace: Replace the PII with a desired value
    • redact: Remove the PII completely from the text
    • hash: Hash the PII text (using sha256, sha512 or md5)
    • mask: Replace the PII with a given character
    • encrypt: Encrypt the PII using a given cryptographic key
    • custom: Replace the PII with the result of the function executed on the PII

Image 5: PII Anonymizer workflow [Source: Presidio Anonymizer]

Description - Image 5: PII Anonymizer workflow

The image shows the function of the Presidio Anonymizer. On the left, the text and detected PII are passed to both built-in and custom anonymizers. The built-in anonymizer consists of operators such as redact, hash and replace. After the text and detected PII pass through the PII Anonymizer, the anonymized text is returned.
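To show what the built-in operators do to a detected span, here is a plain-Python sketch of replace, redact, hash and mask. It is a conceptual illustration, not Presidio's implementation: the function names and the span tuple layout are hypothetical, and the mask defaults only loosely echo the parameters Presidio's mask operator accepts (a masking character, a number of characters to mask, and a direction). Spans are processed from right to left so that each replacement leaves the offsets of the remaining spans valid.

```python
import hashlib


def replace_op(text, entity_type):
    return f"<{entity_type}>"


def redact_op(text, entity_type):
    return ""


def hash_op(text, entity_type):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def mask_op(text, entity_type, masking_char="*", chars_to_mask=4, from_end=True):
    n = min(chars_to_mask, len(text))
    if from_end:
        return text[: len(text) - n] + masking_char * n
    return masking_char * n + text[n:]


OPERATORS = {"replace": replace_op, "redact": redact_op, "hash": hash_op, "mask": mask_op}


def anonymize(text, spans, operators, default="replace"):
    """Apply one operator per (entity_type, start, end) span.

    Spans are processed from right to left so that replacing one span does
    not shift the offsets of the spans still to be processed. Assumes the
    spans do not overlap.
    """
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        op = OPERATORS[operators.get(entity_type, default)]
        text = text[:start] + op(text[start:end], entity_type) + text[end:]
    return text


text = "Mr. John lives in Vancouver. His email id is john@sfu.ca"
spans = [("TITLE", 0, 3), ("PERSON", 4, 8), ("LOCATION", 18, 27), ("EMAIL_ADDRESS", 45, 56)]
print(anonymize(text, spans, {}))
# <TITLE> <PERSON> lives in <LOCATION>. His email id is <EMAIL_ADDRESS>
print(anonymize(text, spans, {"EMAIL_ADDRESS": "mask"}))
# <TITLE> <PERSON> lives in <LOCATION>. His email id is john@sf****
```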

Example:

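from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult

text = "Mr. John lives in Vancouver. His email id is john@sfu.ca"
# Analyzer results (in practice, the list returned by presidio-analyzer above;
# written out here so the example runs on its own)
analyzer_results = [
    RecognizerResult(entity_type="TITLE", start=0, end=3, score=1.0),
    RecognizerResult(entity_type="PERSON", start=4, end=8, score=0.85),
    RecognizerResult(entity_type="LOCATION", start=18, end=27, score=0.85),
    RecognizerResult(entity_type="EMAIL_ADDRESS", start=45, end=56, score=1.0),
]
# Initialize the engine
engine = AnonymizerEngine()
# Without an operators argument, each entity is replaced by <ENTITY_TYPE>
result = engine.anonymize(text=text, analyzer_results=analyzer_results)
print(result)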

Output:

text: <TITLE> <PERSON> lives in <LOCATION>. His email id is <EMAIL_ADDRESS>
items:
[
    {'start': 54, 'end': 69, 'entity_type': 'EMAIL_ADDRESS', 'text': '<EMAIL_ADDRESS>', 'operator': 'replace'},
    {'start': 26, 'end': 36, 'entity_type': 'LOCATION', 'text': '<LOCATION>', 'operator': 'replace'},
    {'start': 8, 'end': 16, 'entity_type': 'PERSON', 'text': '<PERSON>', 'operator': 'replace'},
    {'start': 0, 'end': 7, 'entity_type': 'TITLE', 'text': '<TITLE>', 'operator': 'replace'}
]

Presidio also allows the extension of the Presidio anonymizer to support additional operators.

  • Deanonymizers are used to revert the anonymization operation. (e.g., to decrypt an encrypted text).

Because the input text may contain overlapping PII entities, several anonymization scenarios can occur:

  • No overlap (single PII): When entity spans do not overlap, Presidio Anonymizer uses the given or default anonymization operator to anonymize and replace the PII text entity.
  • Full overlap of PII entity spans: When entities cover exactly the same span of text, the PII with the higher score is used. Between PIIs with identical scores, the selection is arbitrary.
  • One PII is contained in another: Presidio Anonymizer uses the PII with the larger text span, even if its score is lower.
  • Partial intersection: Presidio Anonymizer anonymizes each entity individually and returns a concatenation of the anonymized text.

To get started, install Presidio as described in the official Installing Presidio instructions.
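The overlap rules above can be illustrated with a simplified, stdlib-only sketch. It is not Presidio's source code. `Result`, `resolve_conflicts` and `replace_right_to_left` are hypothetical names, and ties between identical scores are broken by keeping the first result, which is one valid reading of "arbitrary". Full overlaps and containment are resolved by dropping the losing result. Partial intersections keep both results, and replacing spans from right to left produces the concatenated placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    entity_type: str
    start: int
    end: int
    score: float


def resolve_conflicts(results: List[Result]) -> List[Result]:
    """Apply the full-overlap and containment rules; partial overlaps are kept."""
    kept = []
    for i, r in enumerate(results):
        dropped = False
        for j, other in enumerate(results):
            if i == j:
                continue
            if (r.start, r.end) == (other.start, other.end):
                # Full overlap: the higher score wins; on a tie the earlier result is kept.
                if r.score < other.score or (r.score == other.score and j < i):
                    dropped = True
                    break
            elif other.start <= r.start and r.end <= other.end:
                # Containment: the larger span wins, whatever the scores.
                dropped = True
                break
        if not dropped:
            kept.append(r)
    return kept


def replace_right_to_left(text: str, results: List[Result]) -> str:
    """Replace spans from the end of the text backwards.

    Each replacement stops where the previously replaced (later) span began,
    so partially intersecting entities come out as concatenated placeholders.
    """
    output = text
    boundary = len(text)
    for r in sorted(results, key=lambda x: x.start, reverse=True):
        end = min(r.end, boundary)
        output = output[:r.start] + f"<{r.entity_type}>" + output[end:]
        boundary = r.start
    return output


examples = [
    ("Call 555-1234 now", [Result("PHONE", 5, 13, 0.9), Result("ID", 5, 13, 0.4)]),       # full overlap
    ("Dr. John Smith", [Result("PERSON", 4, 14, 0.5), Result("FIRST_NAME", 4, 8, 0.9)]),  # containment
    ("abcdefghij", [Result("A", 0, 6, 0.8), Result("B", 4, 10, 0.8)]),                   # partial
]
for text, results in examples:
    print(replace_right_to_left(text, resolve_conflicts(results)))
# Call <PHONE> now
# Dr. <PERSON>
# <A><B>
```

In the second example, the contained FIRST_NAME result loses even though its score is higher, which mirrors the containment rule.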

Conclusion

In conclusion, Microsoft Presidio is a valuable tool for detecting and anonymizing personally identifiable information (PII) in text data. Its flexible design allows users to create custom recognizers and models to match specific use cases, and its support for multiple languages allows for efficient PII detection in a wide range of scenarios. Additionally, the ability to use external services, ad-hoc recognizers and pattern recognizers defined in YAML files enables users to easily incorporate new detection capabilities. Overall, Presidio's comprehensive PII detection capabilities, together with its customization options, make it an asset for organizations looking to protect sensitive data.

Meet the Data Scientist

Register for the Data Science Network's Meet the Data Scientist Presentation

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Register for the Meet the Data Scientist event. We hope to see you there!

MS Teams – link will be provided to the registrants by email

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.

References

Summary of privacy laws in Canada - Office of the Privacy Commissioner of Canada

What is GDPR, the EU's new data protection law? - GDPR.eu

How we protect the privacy and confidentiality of your personal information

Pierre Lison, Ildikó Pilán, David Sánchez, Montserrat Batet, and Lilja Øvrelid, Anonymisation Models for Text Data: State of the Art, Challenges and Future Directions (2021). Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing

Official Documentation: Microsoft Presidio

GitHub - microsoft/presidio: Context aware, pluggable and customizable data protection and de-identification SDK for text and images

PII anonymization made easy by Presidio | by Lingzhen Chen | Towards Data Science

Presidio Research · spaCy Universe

Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting - ScienceDirect

Statistics Canada’s Trust Centre

2026 Census of Agriculture dissemination consultative engagement

Opened: February 2024
Closed: April 2024

Before each Census of Agriculture, Statistics Canada conducts consultative engagement activities to obtain user feedback on the Census of Agriculture dissemination strategy and products.

Consultative engagement objectives

The consultative engagement activities will provide opportunities for you to share feedback and indicate your satisfaction with the 2021 Census of Agriculture dissemination products. The results will inform decisions on the 2026 Census of Agriculture dissemination strategy, its products and its services.

How to get involved

This consultative engagement activity is now closed.

Individuals who wish to obtain more information or to participate in engagement activities should contact us at statcan.censusconsultation-consultationrecensement.statcan@statcan.gc.ca

Statistics Canada is committed to respecting the privacy of participants. All personal information created, held or collected by the agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Results

Summary results of the consultative engagement will be published online in fall 2024.

Sexual and Reproductive Health Initiative - What We Heard Report

Introduction

The 2021 Canadian federal budget recognized the need for nationally representative data on sexual and reproductive health to better inform and support programs and policies. There are currently no comprehensive data on a wide range of sexual and reproductive health indicators in Canada. The budget designated $7.6 million in funding over five years to Statistics Canada to develop and implement a national survey on sexual and reproductive health, with a focus on supporting women's health. Better information will help government and non-governmental organizations monitor, plan, implement and evaluate programs to improve the sexual and reproductive health of women and the efficiency of health services.

To ensure that the information collected remains relevant for the people and organizations that use it, Statistics Canada embarked on a broad engagement process with stakeholders and data users.

Consultative engagement objectives

A program with an emphasis on sexual and reproductive health was a new area of focus for Statistics Canada. To ensure the relevancy of the initiative, Statistics Canada conducted extensive engagement activities with diverse data users, stakeholders and interested parties across Canada to understand the type of information that should be available through the initiative and how the data could be used.

The objectives of the engagement activities were to

  • understand specific sexual and reproductive health information needs
  • ensure the relevancy of products and analyses
  • enhance information sharing with stakeholders
  • manage stakeholder expectations with respect to the scope of the initiative
  • align with key government priorities, such as Statistics Canada's Disaggregated Data Action Plan and the federal 2SLGBTQI+ Action Plan.

Consultative engagement methods

Engagement activities with partners and stakeholders—which included federal, provincial and territorial governments; advocacy and civil society organizations; clinicians; administrators; medical and service delivery organizations; and academic researchers—took place between December 2021 and June 2022. Feedback was collected in three phases: virtual meetings with federal partners and non-governmental stakeholders; email submissions from partners, stakeholders, and data users and providers; and small-group virtual discussions with data users.

PHASE 1: Virtual meetings with partners and stakeholders

Statistics Canada began by engaging with federal partners and non-governmental stakeholders to inform them about the Sexual and Reproductive Health Initiative and to learn about their information needs for sexual and reproductive health. Virtual meetings were held in December 2021 to inform these stakeholders of broader engagement activities. These partners and stakeholders also helped identify additional people and groups, both governmental and non-governmental, for the second phase of engagement activities.

PHASE 2: Email submissions from partners, stakeholders, and data users and providers

The email submission phase was conducted between December 2021 and January 2022 to understand the information needs in relation to sexual and reproductive health from a broad perspective.

An open-ended engagement document was sent to partners and stakeholders, who were encouraged to forward the email to their networks and partners. Statistics Canada received feedback from about 40 different organizations. To gain an understanding of specific information needs and existing sources of information, participants were asked:

  • which topics related to sexual and reproductive health they or their organization were interested in
  • the population and geographies of interest
  • why the information was relevant to their organization
  • what existing sources of information their organization uses in their work on sexual and reproductive health.

PHASE 3: Virtual group discussions with data users

Email participants, their networks and partners were invited to participate in small-group discussions to identify priority topics for the Sexual and Reproductive Health Initiative. The discussions were held between March 2022 and June 2022.

The types of feedback sought from the discussions included:

  • sexual and reproductive health information needs in relation to policy and research questions
  • the policies and programs that the data and analyses could be used to inform
  • the most important sexual and reproductive health information to capture for decision making
  • barriers and challenges in regard to sexual and reproductive health equity.

Discussions were organized around the broad topics that came out of the virtual meetings and email submissions. The seven broad topics were:

  • access to sexual and reproductive health resources and services
  • reproductive decision making, contraception and abortion
  • sexual behaviour
  • sexual and reproductive health literacy
  • sexually transmitted and blood-borne infections (STBBIs)
  • female reproductive health over the life course, with a focus on menstruation, perimenopause and menopause
  • sexual and reproductive health equity.

Statistics Canada also reached out to stakeholders to participate in sessions with a specific focus on Indigenous peoples, people with disabilities, racialized groups, 2SLGBTQI+ people, youth and seniors.

Findings of the consultative engagements

Overall, the engagement activities generated close to 400 information needs. The engagement activities were not meant to prioritize information needs but rather identify key information needs from a wide range of stakeholders and data users. The following key information needs, in no particular order, were identified.

Equity

Equity was a recurring theme across all engagement activities. As one participant stated, "Equity is a spectrum—what serves the majority of the population is not enough. Sexual and reproductive health is subjective to each person, community and culture." Partners and stakeholders indicated a need for sexual and reproductive health information broken down by characteristics such as gender, age, socioeconomic status, disability, sexual orientation, education, Indigenous groups, racialized groups, immigrant status and geographical location, to better understand differences between and among groups.

Sexual and reproductive health awareness and literacy

There was agreement across the engagement activities that knowledge about sexual and reproductive health would help in the development of healthy attitudes and practices towards sexual and reproductive health and facilitate conversations with health care professionals, sexual partners and families about sexuality, gender identity, and other sexual and reproductive health needs.

Feedback indicated a need for information about access to comprehensive sexuality education. This included information on the breadth of sexual and reproductive health topics taught in schools (for example, the treatment, management and prevention of STBBIs; pregnancy prevention; and healthy sexuality). Participants also noted that culturally relevant sexuality education can help provide diverse perspectives and non-judgmental information. Participants raised concerns about misinformation around sexual and reproductive health and the sources people use to obtain information.

Menstruation, perimenopause and menopause

Participants stated that information is needed about physical, mental, social and economic experiences related to menstruation, perimenopause and menopause. It is important to understand the different reproductive stages and the impact on the daily lives of girls and women.

Information needs around menstruation included the age of menarche, as it marks the start of reproductive years. Participants stated that young women need better information before the start of menarche. This includes what is happening to their bodies, the types of products they can use, options for the management of menstrual pain, knowledge about pregnancy prevention and contraception, the prevention of sexually transmitted infections, and healthy sexuality.

Period poverty was seen as a contributing factor to reduced physical and social well-being of girls and women. Access to quality menstrual products was thought to be key to participating in school, work, home and leisure activities.

In relation to menopause, participants identified a need for information about the symptoms of perimenopause, the age of onset of menopause and whether the onset of menopause was natural or surgical. Menopause marks the end of the reproductive years and may be accompanied by physical, emotional and sexual changes. Understanding how perimenopausal and menopausal symptoms are experienced will help to understand the physical, mental, social and financial impact that menopause can have on women.

Menstruation to menopause: Symptoms and treatment

Participants mentioned that to identify health challenges related to sexual and reproductive health, women and health care providers need to be informed about what is normal and what is not normal in relation to symptoms of menstruation, perimenopause and menopause. Participants highlighted that too often women's symptoms of poor health are attributed to menstruation, perimenopause or menopause, and underlying conditions are not diagnosed. It was pointed out that women suffer through symptoms such as pain, excess bleeding, night sweats, memory loss and vaginal dryness rather than find a way to manage the symptoms or seek help from a health care provider.

Many participants stated that pain has been normalized across the lifespan for women and that this normalization often prevents treatment. Women's experiences of both physical and emotional pain need to be validated and better treatment options offered.

Access to medical treatment rather than surgical intervention was another area of concern that participants pointed out. For example, many perimenopausal women may have hysterectomies rather than medical treatment such as the use of hormonal intrauterine devices because of costs. Surgical interventions are often pursued because of lack of coverage for medical treatment.

Maternal health, pregnancy and pregnancy outcomes

Participants spoke about the importance of data on maternal health, in particular maternal mortality and morbidity, pregnancy and childbirth, and access and barriers to receiving care during pregnancy. To track progress towards a more equitable health care system, it is important to have data available by Indigenous and racialized identity, care provider, geography, sexual orientation, and gender.

It was noted that data on alcohol and drug use during pregnancy were needed to understand what type of support and programs are required to help prevent negative maternal and fetal health outcomes, preterm birth, and fetal loss.

Nutrition during pregnancy was also of interest, in particular, to examine food insecurity. Knowing the prevalence and severity of food insecurity during pregnancy can help to inform policies and identify resources required for women who are pregnant. Participants also emphasized the importance of access to traditional foods for First Nations people, Métis and Inuit, as lack of access to these foods adds to Indigenous people's experiences of food insecurity and may negatively impact their spiritual, emotional, physical and mental well-being.

Participants also emphasized that services such as fertility treatments and in vitro fertilization can help same-sex couples, couples dealing with infertility and people wishing to have a child without a partner. Cost can be a barrier to accessing these services. Information on the need for, use of and accessibility of these services can help when making program and policy decisions.

Participants mentioned that people should be given the means to time their pregnancies and space their children to avoid unintended pregnancy and childbirth. Data are needed to determine rates of unintended pregnancy to help inform policy, program development and monitoring. Better information about unintended pregnancy and pregnancy outcomes will help to inform understanding of the relationship of these experiences to social, ecological and behavioural determinants of health. Information about the use and accessibility of care, medications and procedures for abortions, miscarriages, stillbirths and ectopic or tubal pregnancies will help to better understand issues related to accessing services, such as cost, wait times and distance travelled.

Contraception

Access and barriers to, and knowledge and use of, birth control were frequently mentioned by participants. Participants stated that access to contraception should be an informed choice. Women should be able to choose the type of contraception that is best suited to their needs rather than choosing no method or cheaper methods. Participants felt that there is a lack of information and knowledge about different types of contraception and the correct use of contraception. Some participants asserted that most policies focus on women's ability to breed the next generation rather than women having the choice of when and how to become pregnant.

Information needs around awareness, use and accessibility of emergency contraception were also identified.

Sexual behaviours

There were numerous information needs related to sexual behaviours. Sexual behaviours were regarded as a key indicator of sexual and reproductive health.

Participants noted that the age at first sexual experience and whether the experience was wanted or not can impact long-term sexual behaviour and well-being, as they can be associated with engaging in unprotected sex or being a victim of sexual abuse and can potentially lead to pregnancy. Participants discussed that information is needed on diverse sexual behaviours to better understand the prevalence of various sexual behaviours and the impact they have on sexual health outcomes.

Several participants expressed that while people with disabilities are sexual beings, their sexual and reproductive health needs have been disregarded. More information is needed to understand sexual and reproductive health care needs and barriers among people with disabilities to help support education, training, policies and programs.

Stress and anxiety around sexual performance were considered an important topic to many participants as they relate to sexual behaviour and sexual well-being. Additionally, information is needed to understand the impact of changes in sexual and reproductive health on mental health over the life cycle and the impact they may have on overall quality of life.

Participants emphasized that understanding healthy sexual functioning can reduce societal stigma and shame about sexual difficulties and empower people to seek treatment. It was further mentioned that data are needed on sexual satisfaction and pleasure, the ability to communicate about sexual needs and desires, and erectile dysfunction and other sexual difficulties. Additionally, information on the use of substances or medications for engaging in sexual activity can help to understand the impact they have on sexual behaviour and the enjoyment of sexual activity.

Participants explained that there is very little Canadian data on the prevalence of transactional sex (exchange of sex for money, favours or goods), the use of sexual assistive technology like vibrators for sexual activity, and the use of and increased access to pornography.

Participants highlighted that new trends may be emerging in the digital era. It is important to understand the impact of digital technology on sexual behaviours and relationships. Participants discussed the growth in using digital technology to find new sexual and romantic partners and initiate sexual experiences. Participants highlighted the need for more information about digital technologies in relation to sexual isolation, sexual satisfaction and sexual identity.

Aging and sexual and reproductive well-being

Participants expressed that aging should not be seen as a barrier to having a healthy sex life. Feedback included that there are many social stigmas attached to women as they age and are no longer fertile. However, this is a stage in life where there are fewer barriers for women; for example, there is no longer a need to worry about pregnancy.

Participants noted the need for information to help educate women as they age about the risks of sexually transmitted infections despite no longer requiring birth control to prevent pregnancies. Education should also include understanding symptoms such as vaginal dryness, hot flashes, sleep disturbances and changes to cognitive functioning and options for available treatments. Participants indicated that healthy sexual activity is still important as women age, and information is needed to promote their overall physical and emotional well-being.

Sexual violence

Experiences of sexual violence were an important issue for many participants. They stated that sexual violence can impact physical, mental and sexual health.

Participants expressed the need for information on unreported sexual assaults and on the availability of support services for sexual and reproductive health healing after experiencing sexual violence.

Participants also discussed the need for information about forced and coerced sterilization. Data are needed on the scope, severity and health outcomes among marginalized and vulnerable groups, for example, Indigenous women, people with disabilities and racialized groups.

Participants also identified a need for information on the health outcomes of women and girls who have experienced female genital mutilation/cutting (FGM/C). It was noted that this practice is recognized internationally as having physical and obstetric health complications, as well as psychological consequences. Determining whether the health outcomes of immigrant women and girls from FGM/C-practising countries differ from those of their counterparts from other countries can help inform health care providers of the unique health care needs of this group.

Chronic conditions and reproductive surgeries

Participants stated that information is needed to examine rates and determinants of health conditions affecting female sexual and reproductive health (e.g., pelvic organ prolapse, fibroids, endometriosis, polycystic ovary syndrome) and male sexual and reproductive health (e.g., ejaculation disorders, infertility). Other information needs included the prevalence of gynecological cancer; cancer treatments; and reproductive surgeries such as hysterectomy, removal of ovaries and vasectomy. Participants indicated that better information is needed on the impact these conditions and surgeries can have on overall sexual and reproductive health.

Infections

Participants indicated that STBBIs and human immunodeficiency virus (HIV) impact overall health, and the stigma and judgment of diagnoses can also impact one's social life. Data on STBBIs can support guidelines and can help to detect emerging trends of infections in the population. Participants further explained that information about access to testing, treatment and support for STBBIs, including preferred treatment and services such as self-test kits, can help identify and assess potential barriers and opportunities for new testing, treatment and prevention methods. A need for additional information on the use of pre-exposure prophylaxis and post-exposure prophylaxis for HIV prevention was also mentioned.

Information about urinary tract infections and yeast infections was also important to participants in understanding how these infections affect female reproductive health.

Access to and experiences with sexual and reproductive health services

Access to a family doctor whom patients are comfortable with was one of the most important information needs mentioned. Whether people seek services can be influenced by their access to care that is safe; tailored to their culture, religion and personal needs; and free from language barriers. Participants also mentioned the importance of good experiences with health care, as high-quality care can help prevent negative health outcomes and increase the likelihood of seeking care in the future.

Participants indicated that there is very limited information about differences in access to sexual and reproductive health services. It is important to have information about who is accessing sexual and reproductive health services, the type of services being accessed, and the experiences of those accessing these services. For example, 2SLGBTQI+ individuals are considered underserved and require equitable sexual and reproductive health services that reflect their needs.

Information about experiences of discrimination in a health care setting was also of interest to participants, as they mentioned that this information can help to examine systemic and intersecting barriers and racism. Whether people have access to health care providers via alternate methods such as virtual clinics was also mentioned, as some participants pointed out that it may help to alleviate some barriers to sexual and reproductive health services.

Conclusion

The Sexual and Reproductive Health Initiative received support across all engagement activities. Participants agreed that data on sexual and reproductive health are important to support policies and programs at all levels of government, including across ministries, such as health and education ministries, and across advocacy organizations and support services. There are a number of existing and emerging data gaps related to sexual and reproductive health. While understanding health disparities is highly relevant to decision makers, the required data are often not available to support and implement evidence-based policies and programs.

Statistics Canada would like to thank all participants for their involvement in the engagement activities. Their valuable insights have helped guide the development of the Sexual and Reproductive Health Initiative, including the development of the questionnaire for the first pan-Canadian sexual and reproductive health survey conducted by Statistics Canada. This survey is scheduled to begin data collection in fall 2024.

By the numbers: Black History Month 2024

Sociodemographic diversity

In 2021, the Black population in Canada reached 1.5 million, accounting for 4.3% of the total population and 16.1% of the racialized population (see Footnote 1).

Source: Statistics Canada, Census of Population, 2021.

Among Canada’s Black population born outside the country, 55.3% were born in Africa and 35.6% were born in the Caribbean and Bermuda.

Source: Statistics Canada, Census of Population, 2021.

Educational attainment 

In 2021, about one-third (32.4%) of the Black population aged 25 to 64 held a bachelor’s degree or higher, which is comparable to the total population aged 25 to 64 (32.9%).

Source: Statistics Canada, Census of Population, 2021.

Future outlook

In 2021/2022, nearly three-quarters (72.5%) of the Black population reported having a hopeful view of the future, compared with 64.1% of the total population.

Source: Statistics Canada, Canadian Social Survey, 2021/2022.