Gender, sex at birth, sexual orientation, and related standards by variable

Statistical data and metadata standards are a set of rules that enable consistent and repeatable description, representation, structuring, and sharing of data and metadata. Statistics Canada has many statistical standards, data sources and methods used for collecting and publishing statistical data. This page contains standard variables of interest relating to the concepts of gender, sex at birth, and sexual orientation.

Couple relationships

Gender

Sex at birth

Sexual orientation

Gender, sex at birth, sexual orientation, and related standards by classification

Statistical data and metadata standards are a set of rules that enable consistent and repeatable description, representation, structuring, and sharing of data and metadata. Statistics Canada has many statistical standards, data sources and methods used for collecting and publishing statistical data. This page contains standard classifications of interest relating to the concepts of gender, sex at birth, and sexual orientation.

Couple relationships

Gender

Sex at birth

Sexual orientation

Gender, diversity and inclusion, and related standards by classification

Statistical data and metadata standards are a set of rules that enable consistent and repeatable description, representation, structuring, and sharing of data and metadata. Statistics Canada has many statistical standards, data sources and methods used for collecting and publishing statistical data. This page contains standard classifications of interest relating to the concepts of gender, diversity and inclusion.

Disability

Ethnocultural diversity

First Nations, Métis, and Inuit

Gender, sex at birth, sexual orientation and couple relationships

Immigration

Languages

Gender, diversity and inclusion, and related standards by variable

Statistical data and metadata standards are a set of rules that enable consistent and repeatable description, representation, structuring, and sharing of data and metadata. Statistics Canada has many statistical standards, data sources and methods used for collecting and publishing statistical data. This page contains standard variables of interest relating to the concepts of gender, diversity and inclusion.

Disability

Ethnocultural diversity

First Nations, Métis, and Inuit

Gender, sex at birth, sexual orientation and couple relationships

Immigration

Revision of the Canadian Research and Development Classification (CRDC) 2020 Version 1.0 - What We Heard

March 2024

Introduction

In 2020, Statistics Canada, in collaboration with the Canada Foundation for Innovation (CFI), the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC), published the Canadian Research and Development Classification (CRDC). This new classification was designed to include all research sectors and represent the current research landscape in Canada, while also contributing to greater alignment with international standards. It is comprehensive enough to support a wide range of needs within the R&D ecosystem, and it was developed to facilitate the peer review process and the reporting of investments by federal research funding agencies and the Government of Canada. The CRDC is meant to help ensure the consistent compatibility and comparability of statistics across research funding agencies, both in Canada and internationally, while balancing the needs of different users and highlighting specific areas of Canadian research strength.

The CRDC is a set of three interrelated classifications:

  • Type of activity (TOA): This is categorization by type of research being undertaken, e.g., fundamental, applied, experimental development.
  • Field of research (FOR): This is categorization by field of research; it is the methodology used in R&D that is being considered. The categories within this classification include major fields of research based on knowledge source, subject of interest, and methods and techniques used.
    • There are four hierarchical levels: divisions are the broadest level, and groups, classes and subclasses represent increasingly detailed subdivisions of these categories. The result is a comprehensive list of nearly 1,800 fields of research, intended to reflect Canada's current research landscape.
  • Socioeconomic objectives (SEO): This is categorization by R&D purpose or outcome.
    • There are two hierarchical levels: divisions are the broadest level, followed by groups. There are approximately 85 groups.

Adopting a common approach for classifying research and expertise across different key stakeholders in Canada aims to:

  • provide a common language for discussing research in the higher education sector, in the public sector and within government, enabling better evidence-based decision making within the research ecosystem
  • make it possible to identify expertise and research areas in a truly multidisciplinary classification
  • improve the identification of emerging research fields
  • help identify potential collaboration opportunities to optimize research efforts and improve outcomes
  • improve the identification of research funding gaps and opportunities
  • provide the research community with a harmonized and integrated R&D classification
  • improve reporting on the agencies' combined contributions to research and science in Canada
  • help the agencies streamline their operational processes for peer review, recruitment and reviewer selection.

One of the commitments made by Statistics Canada and its collaborators was to conduct a minor review of the CRDC every two years and a major review every five years. This commitment follows a continuous improvement model and is intended to respond to shifts in the research ecosystem, including new and emerging fields of research. Due to the pandemic, the minor review was delayed.
Messages from the research community reinforced the urgency of a review. It was decided that the minor review would take place in 2023, to be followed by the major review in 2025. The scope of the 2023 review was limited to fields of research (FOR); broader changes will fall within the scope of the 2025 review.

Engagement and Outreach

  • The CRDC 2020 Version 1.0 review notice was posted on Statistics Canada's Consulting Canadians and Standards websites, as well as on StatCan's social media accounts, such as X (Twitter), LinkedIn, Facebook and Reddit.
  • The consultation period for the public was launched in August 2023 and closed in October 2023. Feedback was gathered from the public through the consultation call.
  • CFI was invited to share the data it has gathered since implementing the CRDC in its systems.
  • Feedback was also gathered through internal sources, such as various advisory groups to the agencies and reports.
  • University Vice-Presidents of Research (VPRs) were also invited, through the office of the Vice-President of Corporate Affairs at SSHRC, to provide feedback; a similar process was used in the 2018 review.
  • Feedback gathered over the years on an ad hoc basis through the SSHRC CRDC inbox was also considered; these requests were sent by researchers and academic groups representing new or emerging fields.
  • A working group was formed with members of SSHRC, NSERC, CIHR, CFI and Statistics Canada to review the data and make recommendations for changes based on this information.

Summary of what we heard

In the open consultation (Participate in the consultation for the update of the Canadian Research and Development Classification (CRDC) 2020 V1.0), participants and subject-matter experts were asked to review the proposed categories and suggest changes to specific categories (including adding, removing, combining, splitting and renaming) so that the classification would represent the current Canadian research landscape and meet the needs of different stakeholders across the Canadian research ecosystem. The consultation sought feedback on fields of research (FOR) only, not on socioeconomic objectives (SEO) or type of activity (TOA).

Comments and suggestions provided for consideration
Field of research | Most frequent comments and suggestions provided for consideration
General
  • Over 50 recommendations were received regarding the creation or elaboration of new FORs.
  • Of these, 15 were accepted for revision.
  • Interdisciplinarity came up several times as something that the CRDC failed to capture; this is outside of the scope of this review and will require more consultation and consideration in the future.
  • Some categories seem to be more granular than others.
  • Some categories seem to be outdated; there may be a need to consider new and emerging fields in the next review.
  • The delineation between categories is not always evident, and the definitions provided are not always helpful. It is acknowledged that the CRDC does not have specific definitions for each FOR (beyond defining what R&D is); defining each field of research would be a large undertaking.
Black Studies
  • SSHRC's external Advisory Committee Addressing Anti-Black Racism in Research and Research Training (2021-2022) submitted several recommendations to SSHRC in its final report (2023).
  • The advisory committee recommended the addition of Black Studies in the CRDC.
  • Black Studies is a well-established field of research in the current Canadian research landscape.
  • The working group noted that since Black Studies is an interdisciplinary field, it did not fit neatly within the Social Sciences, the Humanities or other domains such as health research. It was decided that, on a provisional basis, Black Studies would be included under 'Other Social Sciences', with a definition addressing the interdisciplinary nature of the field. A wider consultation can take place in 2025.
Indigenous Studies
  • While Indigenous research is currently included under 16 different FORs (e.g., Indigenous Law, Indigenous languages, Indigenous education system, Indigenous economics, Indigenous literature), the research community and members of the SSHRC Indigenous Advisory Circle have observed that this unnecessarily restricts and limits Indigenous research.
  • Feedback from the research community, including from members of the SSHRC Indigenous Advisory Circle, recommended adding Indigenous Studies to the classification as an interdisciplinary field.
  • It was decided that, on a provisional basis, Indigenous Studies would be included under 'Other Social Sciences', with a definition addressing the interdisciplinary nature of the field. A wider consultation can take place in 2025.
  • The SSHRC Indigenous Advisory Circle also recommended changing the existing FOR "Indigenous performing arts" to "Indigenous arts" to permit classification of other art forms.
  • The SSHRC Indigenous Advisory Circle also recommended adding "Indigenous Knowledge Systems." Based on the rationale provided, this would be a Level 1 change, and thus it will be considered for the 2025 review.
Critical Disabilities Studies
  • SSHRC's external Advisory Committee on Accessibility and Systemic Ableism (2022-2023) submitted several recommendations to SSHRC in its final report (2024).
  • The advisory committee recommended changes to the CRDC, specifically, the need to add Critical Disabilities Studies.
  • The advisory committee underlined the importance of advocacy as a part of their research and provided a detailed rationale for the inclusion of this FOR.
  • The working group recommended expanding the current definition of Disability Studies, adding the clause "including accessibility and critical disability studies".
  • A wider consultation can take place in 2025.
Comments and suggestions for consideration on the CRDC
Overall | Comments and suggestions for consideration on the CRDC
Overall
  • The way the codes are displayed needs to be more user-friendly and intuitive, to make it easier for users to identify their area of research or expertise. The categories will need to be reviewed regularly to ensure that areas developing beyond the "emerging" stage are captured in the future.
  • A framework or set of guiding principles needs to be developed for determining how changes are made to the CRDC. These guidelines need to take into consideration objectives and measurable metrics, and to keep the spirit of the CRDC classification system in mind, not just from a usability or 'visibility' perspective for the user, but also for its reporting and statistical function. These will be considered in the context of the wider review in 2025.

Eh Sayers Episode 16 - How Do You Say "Language Revitalization" in Cree?

Release date: March 27, 2024

Catalogue number: 45200003
ISSN: 2816-2250

Social media shareables

Tag us in your social media posts

  • Facebook StatisticsCanada
  • Instagram @statcan_eng
  • Twitter @StatCan_eng
  • Reddit StatCanada
  • YouTube StatisticsCanada

How Do You Say Language Revitalization in Cree?

More than 70 distinct Indigenous languages are spoken by First Nations people, Métis and Inuit in Canada, but these languages are under threat.

In this episode, we speak with Randy Morin and Belinda kakiyosēw Daniels, who share their knowledge of the Cree language with learners at the Nêhiyawak Language Experience. We talk about the wisdom encoded in Indigenous languages, as well as the opportunities for these languages and the barriers they face.

Host

Tegan Bridge

Guests

Randy Morin, Belinda kakiyosēw Daniels

Eh Sayers Episode 16 - How Do You Say "Language Revitalization" in Cree? - Transcript

Tegan:  Welcome to Eh Sayers, a podcast from Statistics Canada, where we meet the people behind the data and explore the stories behind the numbers. I’m your host, Tegan Bridge. 

When we’re talking about Indigenous languages, we’re not talking about one thing. Canada is home to over 70 distinct Indigenous languages (Indigenous languages across Canada). 

They have different statuses. For example, Inuktitut is an official language of both Nunavut and the Northwest Territories, and in the 2021 Census, almost 40,000 Inuit reported that they could speak it well enough to have a conversation. The Cree and Ojibway languages also have tens of thousands of speakers, making these three the most widely reported Indigenous languages spoken in Canada. Then there are languages like Haisla, Haida and Ktunaxa, each with fewer than 300 speakers. Between the 2016 and 2021 censuses, the number of Ktunaxa and Haisla speakers rose, while the number of Haida speakers declined.

Just over 237,000 Indigenous people in Canada reported speaking an Indigenous language well enough to have a conversation in the 2021 Census. However, this number has declined since the previous Census, when approximately 260,000 Indigenous people reported being able to speak an Indigenous language. This decline is driven by an ongoing decrease in the number of people with an Indigenous language as a mother tongue (Indigenous languages across Canada).

At the same time, the 2021 Census found that more Indigenous people are learning an Indigenous language as a second language. Second-language speakers accounted for over one-quarter of Indigenous language speakers overall, up 4,100 speakers, or 6.7%, from 2016 (Indigenous languages across Canada).

UNESCO, or the United Nations Educational, Scientific and Cultural Organization, considers all Indigenous languages in Canada “at risk,” that is, either vulnerable or endangered. Indigenous languages are threatened because of discrimination, colonization and practices such as the Indian Residential School system, which aimed to destroy Indigenous cultures and languages (Indigenous languages across Canada). Indigenous children had their languages taken away from them when they were forcibly removed from their families and punished or shamed for speaking their language.

All of this means that today, these languages are at risk, and Indigenous communities are fighting to keep them alive.

Randy: Yeah, they say, there’s only gonna be three languages that are gonna be still spoken in 20, 25 years and Cree is one of them. Cree, Inuktitut and Ojibwe.  That's what the stats are saying.

Tegan: This is knowledge keeper Randy Morin.

Randy: Randy Morin, Assistant Professor, University of Saskatchewan. 

Tegan: There were almost 87,000 Cree speakers in Canada according to the last Census (Indigenous languages in Canada, 2021). Cree would be the most widely spoken Indigenous language in Canada... if there were a single Cree language. But there are actually many Cree languages, and they aren't all the same. 

Randy: There is a great risk, especially for the smaller dialects, like the R dialect, the L dialect.  Those ones are more like Eastern Canada. But we do have also small dialects in Saskatchewan, the N dialect and the TH dialect, Woodland Cree, Swampy Cree, but the biggest one would be the Plains Cree dialect. So that one will probably survive a little more than the other dialects.

Tegan: It's not just that different dialects sound different or use different words.

Randy: Each dialect has their own distinct way of looking at the world, right? The Swampy Crees have their own way, the Woodland Cree, they call themselves the Rock Cree, and us Plains Cree, there's differences in the dialects and how we see the world.

Tegan: Randy is a first language Cree speaker.

Randy: Well, I grew up speaking the language. It was just spoken to me right from the womb, so I didn't speak any English until I was 10 years old. And I often tell this story, how did I pass kindergarten and grade one with just speaking Cree? Many of us in my community, my age, we all spoke the language, and we passed grade one and kindergarten. So I just grew up speaking it. I didn't know any different. It's like a fish, a fish doesn't know it's in water, that's just how I was, in my language. I was just immersed in it; it was all around me.

Tegan: But, as I mentioned, many speakers of Indigenous languages learned them as second languages, like Belinda kakiyosēw Daniels.

Belinda: Dr. Belinda kakiyosēw Daniels, University of Victoria, Indigenous Education Department.

As far as learning goes, I’ve always listened to the language ever since I was born. I was raised by my grandparents most of my life and so they spoke Cree to each other. They didn't encourage me to speak to them, but I've often heard the language. And so, just their ability to speak to each other in our original language was inspiring to me, and this is what raised the curiosity of why am I not encouraged to speak my own language? 

Tegan: In 2017, half of Indigenous youth reported that speaking an Indigenous language was important or very important, and while they were less likely to have an Indigenous mother tongue, many were learning an Indigenous language as a second language (Chapter 4: Indigenous Youth in Canada). Among First Nations and Inuit youth who could speak an Indigenous language, the majority (68% and 87%, respectively) learned it as a mother tongue. Among Métis youth who could speak an Indigenous language, the share was closer to half: 55% learned their language as a mother tongue, and the rest learned it as a second language.

Randy and kakiyosēw work with second-language speakers to hone their skills. They share the Cree language with students at a language immersion camp.

Belinda: We are co-directors for the Nêhiyawak Language Experience, which is a grassroots not-for-profit. We are having our 20th anniversary this summer.

So, this is exciting for the work that we do.  I would suggest that we are actually pioneers in this way of reclaiming land and language immersion in our home territories as a focused intention. 

Tegan: Their efforts range from grassroots to academia.

Belinda: We also write books together. We do research together.

I have 12 graduate students that I'm either supervising or sitting on the committee for.

Randy: I teach the Cree language at the University of Saskatchewan. I'm creating a Cree speaking certificate program there, hoping to launch it next year.

Tegan: But it doesn't stop there! We're also talking TV and books.

Randy: I've also done a lot of work with APTN. So, I've done a lot of Cree work for Wapos Bay, the cartoon. Also, the Guardians, the cartoon. And now, the newest children's series is called CHUMS. And that's going to be launching this year, so I've been doing a lot of work in the Cree.

Belinda:  I'm holding the book that I coauthored with, um, Andrea Custer and it's called, Speaking Cree in the Home: A Beginner's Guide for Families. nēhiyawētān kīkināhk.

Tegan: There isn't one single way to revitalize Indigenous languages. There are historical reasons why these languages are at risk, but there are also barriers that exist to this day. 

Belinda: What are the barriers? Let's get some policy into place in making our languages official. Let's get some funding in place for our First Nations languages in our school systems. The money has always been underrated or underfunded. Let's start validating our speakers and showing the respect that those speakers deserve when it comes to credentials. They might not have a B.Ed. degree or a Ph.D. degree, but they are speakers of the language and hold this vast amount of knowledge about the world. So, let's value that and credit that.

Randy: Growing up, we all spoke the language and all of a sudden FM radio came, and pop culture came. Oh, my goodness, everyone overnight just started speaking English, you know, and it's just been on a decline ever since with, like, modernization, globalization, and celebrity... I guess our young people look up to celebrities… the wealth, and that’s what they want to be. And so we need to bring them back to our own role models, and that's a challenge.

The federal and provincial governments, they really messed up with Indigenous people for years, but there's still not that sense of urgency. Like, they need to really push with a sense of urgency.

And universities need to work with language people. There's only two of us at the University of Saskatchewan, and it's a real challenge. We need more. We need more people that… All hands on deck is what we’re saying. There are only two schools in Saskatoon, St. Francis and wâhkôhtowin.

There's a lot of challenges, but there's also a lot of successes. Technology can be used to connect with people in remote communities. In fact, our friend, Bill Cook, our brother, who is part of NLE, his PhD project is to connect speakers in remote communities to learners, and they get paid. Isn't that cool? What an awesome project! All these elders that are home could be making money on their computers, talking to learners, man, what a project! I wish I thought of that for my PhD. So, lots of challenges, but also a lot of successes.

Belinda: I wanted to add to grassroots successes. So, I mentioned my name, right? Kakiyosew. I'm from Pakitahwâkan sâkahikan. Sturgeon Lake First Nations, Saskatchewan.  And as far as grassroots goes, I, for one, I'm always being of service to my home community and, overall, to my nation.

Some of the successes have been creating signs in Nēhiyawēwin in our home community. Um, creating camps for families, creating Oskapios programs for boys and men, creating a whole community-wide language program, from leadership to the different agencies that are employed in our home community, such as the health department, the education department, our economic department, resources for roads and for the bison, everybody learning Cree.

We have Indigenous language revitalization as a field of study at the University of Victoria. I know that First Nations University of Canada is also working on the same thing. As Nêhiyawak people, original peoples of Turtle Island, we're coming together, working together to help lift up our original languages. So these are some successes.

Tegan: Earlier in our conversation, Randy mentioned that the different Cree languages have their own unique worldviews. Different languages aren't just interchangeable labels for the same things. Languages are pretty interesting. English, with all its funny spellings and weird idioms, carries its history within itself. And the Cree languages are unique in their own way too.

Randy: You definitely see the world in two different ways when you know the language. And I'm speaking from experience, so your worldview is really different.

You can understand the stories, you know, the teachings that are embedded within the stories. You'll understand the ceremonies a lot more if you have an intimate knowledge of the language.

Humor also plays a big part in the language. The humor is so descriptive, you can see it in your mind better than in English. English, it's kind of one-dimensional. So yeah, it's relationships to the natural world. And how we see the world as living, and that really clashes with this worldview today. Like there's no connections to the earth. You know, they see the earth as inanimate. Not alive. They see animals as having no spirits. Insects. Birds. You know, we call these [Cree word] our relatives, eh? So, you have a better relationship when you have that.

The language teaches you values. Laws, teachings, these are in our languages, and, as I mentioned, valid knowledge systems are embedded in it, and this language is thousands of years old.

So, you see the world differently in two different ways.  Uh, and I wanna, you know, people ask me, how do you see it in two different ways?  I see it as making you a better human being. You know, you’re not as greedy. You’re not as selfish.

But in our, in our language it’s for the whole, for the betterment because we believe, philosophically, spiritually, we go back to the spirit world. So, we have to have these laws, we have to follow these teachings, because our time (Randy snaps his fingers) is like that. We can't take anything with us, right? So, uh, so that's the fundamental differences I think of seeing the world in two different ways.

Tegan: If you're familiar with French, you're familiar with the way it sorts its nouns by gender, masculine and feminine. Cree sorts its nouns according to animacy: that which is living and that which is not.

Belinda: Animacy is how we view the world as alive and with spirit, imbued with spirit.  And so, when we look at the land, askiy, the land provides us with everything that we need, that we’ve needed, so the land is alive, the earth is alive with a spirit. And when you look at the trees and the rocks and the mountains and the rivers, the animals, the ocean, the sun, even the weather. If you have that perspective of these elements are alive with spirit, imbued with a life force of something, a source, and if you can think of them like the way you think of your own parents, your own grandparents, your brother, your sister, your newborn child. That whole way of thinking of the natural world makes you more conscious and respectful and grateful of where we're living. 

Tegan: You mentioned the importance of nature. You run a language camp, with "camp" being a very key word, I think. Why is a camp, being out in nature, disconnected and unplugged, the ideal place? Not for a language camp generally, but for a Cree language camp specifically: why is that the best place?

Belinda: This is how our ancestors lived. This is what our ancestors have always done. This is the solution. And again, just coming back and connecting. It's a spiritual place, like being out in nature, walking on the land, swimming in the waters, listening to the birds when they start singing early in the morning, and going to sleep when they stop. It's amazing. And then learning the language for your surroundings. I don't know why or how but when I'm out in the context, in the language, hearing the language, something goes on in my head, in my brain, in the way I think. There's a shift that happens and I often try to describe it as like... like a puzzle being put together. That's what happens to my brain. Like I can literally feel this connection to the context, on the land, in the language. And my whole worldview is just, you know, like a switch goes off, and I leave this English world behind. I leave all the memories of the violence behind in the English colonial context. And when I'm out on the land, it's a feeling of sâkihitowin. It's a feeling of love.

Randy: It teaches you humility, you know, being out on the land, you know, there's no ego, there's no power tripping, and you get to connect with the communities that you're in. It's the energy of the place, eh, it’s a really clean environment, detoxifying. It's empowering. It's very loving. It's very gentle.  It's very organic… It's just a feeling of home, you know, your spirit just, you leave this concrete jungle and go out into the natural, and the spirits of the land… It's a beautiful place.

Belinda: It's a natural rhythm.

Randy: Mm hmm.

Tegan: When your language categorizes nouns by whether they are imbued with spirit, how does that affect your perception of the natural world? Especially in the context of the climate crisis?

Randy: Well, I'll say it in one word. Wâhkôhtowin, wâhkôhtowin, we are all related. That includes the plants, that includes the stars, the moon, the mountains. 

If you're related to a relative, are you going to hurt that relative? Because in our laws, we have pâstâhowin, overstepping Creator's law. Right. So, if you know that, you're not going to cut down trees… not overharvesting for exorbitant profit.

People are getting rich and, and, you know, there's no word in their worldview about wâhkôhtowin. And then we have this thing called ohcinêwin, it's like harm against sentient beings. And that includes everything in creation. The water is alive; we're not gonna poison the water; we're not gonna drill in the water. You know what I mean? We have all these examples of what not to do, in their language.

Belinda: It's just like what we were saying earlier in regards to this idea of animacy. The word for people, like I'm always saying, Nêhiyawak, is plural, but we also use that same form for the trees, mîtos, mîtosak, or the animals, or like birds, piyêsîsak; it's the same reference to something being alive. Rocks, asiniyak: it's the same reference, that they're alive with spirit, and that's how they're referred to. So, if you think of the world as alive and refer to them as kin, like Randy said, like we said earlier, you're not going to clear-cut. You're not going to extract and build those big mining holes. Climate change is on this drastic rise. We're seeing it with the change in temperatures, these warm temperatures. It's frightening, unusual. You see bears coming out of their winter habitat. You see geese having ducklings, and it's only January.

Tegan: It's unnatural.

Randy: Yeah, so different languages around the world refer, like, they know the Earth is a mother. The Earth is a mother. She provides. It's not just a concept. It's actually like a real belief that the Earth is our mother, right? So, with that understanding, like Belinda said, we're going to protect, we're going to protect and care for our mother and the medicine she provides for us. Everything has a medicinal purpose and a spirit, right? So, it's on us to learn those, those medicines so we keep them for future generations.

Tegan: Ryan DeCaire said: “It's said that people revitalize a language, but really, it's a language that revitalizes a people.” What are the benefits of learning an Indigenous language, whether as a first or a second language?

Belinda: The benefits are limitless. Just as Randy was speaking, I was going through my own thoughts, and the benefits are that reclaiming your language is your connection to land. Your connection to where you belong, your culture, your connection to what you know of natural laws and natural governance in those systems, and also the people, your connection to the people. So, there are multiple benefits.

As well… once you realize and know where you belong and where you come from, where you've been, moving forward, you can walk into the trauma. You can walk into that historical trauma, that intergenerational trauma, and you can smash it. Once you know your language and where you come from and understand all of that. And it's been that way for me. And it's been a feeling of coming home, knowing my purpose, knowing my role, knowing that my ancestors are behind me, and I stand on the shoulders of giants.

Tegan: What are your hopes for the future of your language and your community, and do you think the work will ever be done?

Belinda: My dreams for the future. I was kind of asked that question about 10 years ago so I'm glad I'm getting asked this question again. My dreams for the future is that our languages are federal, provincial laws throughout Canada. I hope that our communities are speaking, and the languages are flourishing in our communities, that our schools are land based, spaces and places operated all in the language and that we're speaking, not just our language, but our neighboring languages.

Prior to contact, this is based on Onowa McIvor’s work, is that our languages were very multilingual. We were a very multilingual continent. And we spoke more than one language. We had to speak multiple languages to have this commerce, this massive commerce, trading, alliance system. And when the early settlers came over, they learned our languages. So it's only been recent that we've had this banning of our ways of knowing and being and doing within the last hundred years or so, and so my dreams are, how do we rectify this? How does the federal government rectify this? And then how do the powers that be help lift up our languages?

And my dreams are, again, that we have this love and understanding and empathy for the original peoples of this land. And that there is no one incarcerated. That we have no addictions. That we have no homelessness. That we don't have mental illnesses. Those are my dreams, and I just hold on to that. 

Tegan: Randy?

Randy: I hope our environment’s also intact. And I hope we get an Indigenous prime minister. You know what I mean? These are all my hopes and dreams, but I want my children, my grandchildren to be in this world that values them for who they are. Cause you know, look at me, wherever I go, people are scared of me in my own lands, my own treaty territory. People are scared of me. I get stereotyped, the racism that's thrown at me daily. I want the future to be better for my children and grandchildren. We can get along and work together from kindergarten all the way to university that the language is, you know, is embedded, language instruction… Everyone's just speaking the language, it's a goal of mine, but I don't know if I'll see it, but that's what I want to see.

Tegan: Is there anything that I haven't brought up that you'd like to talk about?

Belinda: Oh, I would like to say for people listening, learn the original names of the nations that live on Turtle Island. I'm not exactly sure where the word Indigenous comes from, but I would prefer being called a Nehiyawakamakiano. We're one nation. There's multiple nations within Canada. Learn a greeting, learn how to say hello in whatever lands you live on. I've learned how to say 'uy' skweyl. I've learned how to say 'uy' skweyl ch'u. Like, these are just very helpful little words that go a long way. Especially if you're a visitor on lands that you don't originally come from. And I'm not sure if I said this, but encourage your children like you do, you know, with sports or with dance or with music. Encourage your children to speak the language and to participate in the culture. Those are just a couple of things that come to my mind.

Tegan: You've been listening to Eh Sayers. Thank you to Randy Morin and Belinda Kakiyosew Daniels for taking the time to speak with us.

You can subscribe to this show wherever you get your podcasts. There, you can also find the French version of our show, called Hé-coutez bien! If you liked this show, please rate, review, and subscribe. Thanks for listening!

Sources

Anderson, Thomas. “Chapter 4: Indigenous Youth in Canada.” Statistics Canada. Government of Canada, December 1, 2021.

Statistics Canada. “Indigenous Languages across Canada.” Statistics Canada. Government of Canada, March 29, 2023.

“Indigenous Languages in Canada, 2021.” Statistics Canada. Government of Canada, March 29, 2023.

Visible minority concept consultative engagement

Opened: October 2022
Updated: September 2023
Results posted: October 2023

Consultative engagement objectives

The visible minority concept is currently under review. Statistics Canada has been committed to engaging with partners, stakeholders, ethnocultural groups, and the general public to identify the appropriate terminology and categories to describe the population and properly address data needs in health, education, justice, and employment equity.

Consultative engagement methods

These consultative engagements on the visible minority concept were conducted virtually, through group discussions and information sessions, and electronically, through e-forms and written submissions, in both official languages. They were publicized through Statistics Canada's Consulting Canadians page, various events and social media. Moreover, stakeholders and partners, ethnocultural groups, non-profit and non-governmental organizations, and researchers were invited by email to participate and to share the invitation within their networks.

How participants got involved

Overall, Statistics Canada received feedback, in both official languages, from more than 460 individuals from a variety of backgrounds and organizations, including anti-racism groups, civil society organizations, ethnocultural community organizations, religious networks, social inclusion groups and the general public.

The consultative engagement also included several follow-up discussions with subject-matter experts from these ethnically diverse groups.

Statistics Canada thanks participants for their contributions to this consultative engagement initiative. Their insights will help guide the agency in this review.

Initial findings of the consultative engagements

Terminology

What we heard regarding terminology to replace "visible minority"

A number of participants preferred the term "racialized groups." They noted that the term "racialized" is already used by various federal departments, by provincial and municipal governments, and in the media. They also argued that the term more accurately presents race as a social construct by emphasizing the process of racialization.

However, the term "racialized" was also the most controversial option. Most francophone participants did not think that Statistics Canada should adopt race-based terminology, because such terminology is generally considered offensive in French. In fact, many participants (both French- and English-speaking) were offended when they were described as belonging to a racialized group. They also felt that labelling all non-White people as "racialized" reinforces that White is the dominant group. Participants also noted the various definitions of "racialization" currently in use, related to colour of skin, culture, religion, ethnicity, language, etc.

The term "population group" (or another neutral term, such as "diverse groups") was the second most preferred. Participants argued that it is sufficiently broad and flexible to apply to a number of situations and to be defined differently according to the needs of different organizations or programs. It was considered to be a more neutral term that would likely have a longer lifespan, considering the sensitivity of this topic. Participants also noted that the term could include the White population, without making this population either the reference or the norm. On the other hand, some participants opposed this term because of its vagueness.

Categories

Option 1

  • White
  • South Asian (e.g., East Indian, Pakistani, Sri Lankan)
  • Chinese
  • Black
  • Filipino
  • Arab
  • Latin American
  • Southeast Asian (e.g., Vietnamese, Cambodian, Laotian, Thai)
  • West Asian (e.g., Iranian, Afghan)
  • Korean
  • Japanese

Option 2

  • White
  • South Asian
  • East Asian
  • Black
  • Southeast Asian
  • Middle Eastern
  • Latin American

Note:

  • The Option 1 list above reflects the current categories, as included in the last Census. Information is collected through this question in accordance with the Employment Equity Act. Respondents can select multiple categories, and the data collected on these groups are used for various purposes, including in the fields of labour, education, health and justice.
  • Option 2 is currently used by certain federal departments.
  • The Census does have a question on ethnic and cultural origin which includes a list of over 500 response options and derives multiple responses showing the diversity of the population at a very granular level (see this infographic created with data from the 2021 Census).
  • The Census also provides specific data on Indigenous identity, on place of birth, on generation status, on religion, and on languages.

What we heard regarding the categories

During the consultations, no clear consensus emerged on a list of categories to measure groups. Some participants suggested that combining certain categories, as in Option 2 above, would be more useful for anti-racism purposes, because the resulting data would better reflect how respondents are perceived by others rather than their personal identity, which can often be quite specific.

Other participants argued that more detail is always preferable and saw no advantage in a reduction of the number of categories. Moreover, these participants noted that reducing the number of categories would mean that detail for certain groups would be lost (e.g., Chinese, Japanese, Korean, Filipino, Arab, West Asian).

One common criticism was that the categories on both lists are incoherent because they straddle race, ethnicity, nationality, and geographical descent. Most respondents believed that some categories (in particular, the "Black" category) are too broad and should be more granular.

That said, most respondents felt that comparability between census cycles is important for their data needs and were concerned with the potential impacts caused by changing the categories in the questionnaire.

Further summary results of the consultative engagement initiatives will be published online when available.

Identifying Personal Identifiable Information (PII) in Unstructured Data with Microsoft Presidio

By Saptarshi Dutta Gupta, Statistics Canada

Editor's note: The content of this article represents the position of the author and may not necessarily represent that of Statistics Canada.

Introduction

In today's digital age, organizations collect and store vast amounts of data about their customers, employees, and partners. This data often contains Personal Identifiable Information (PII). With the growing prevalence of data breaches and cyber attacks, protecting PII has become a critical concern for businesses and government agencies alike. For example, Statistics Canada conducts hundreds of surveys each year on a variety of topics and is obligated to protect the information that individuals provide.

Canada has two federal privacy laws that are enforced by the Office of the Privacy Commissioner of Canada:

  • Privacy Act: covers how the federal government handles personal information. The Privacy Act offers protections for personal information, which it defines as any recorded information about an 'identifiable individual'.
  • Personal Information Protection and Electronic Documents Act (PIPEDA): PIPEDA is the federal privacy law that applies to organizations that collect, use, or disclose personal data during commercial activities. PIPEDA requires organizations to obtain consent for the collection, use, or disclosure of personal data, and to protect personal data from unauthorized access, use, or disclosure.

In addition to these laws, organizations may also be bound by the General Data Protection Regulation (GDPR), often described as the toughest privacy and security law in the world. Though it was drafted and passed by the European Union (EU), it imposes obligations on organizations anywhere, so long as they target or collect data related to people in the EU. The GDPR levies harsh fines against those who violate its privacy and security standards, with penalties reaching into the tens of millions of euros.

In this article, we take a detailed look at Microsoft Presidio and how it can help organizations in Canada comply with privacy laws. We start by discussing its key features and capabilities, and how it can assist organizations in meeting their obligations under these laws.

Definitions

Before proceeding with the rest of the article, it is important to understand the differences between three terms used throughout it: anonymization, deidentification and pseudo-anonymization.

  • Anonymization: Anonymization refers to the process of irreversibly removing or obscuring identifiable information from data in such a way that the original data cannot be re-identified. The goal is to make it impossible or extremely difficult to link the data back to the individual it represents. Anonymized data should not contain any direct or indirect identifiers that could be used to identify individuals. 
  • Deidentification: Deidentification involves the removal or alteration of PII from a data set in order to prevent the identification of individuals. Unlike anonymization, deidentification does not necessarily require the data to be rendered completely unidentifiable. Instead, it focuses on removing or modifying specific identifiers, such as names, addresses, social security numbers, or any other information that could be used alone or in combination with other data to identify individuals. 
  • Pseudo-anonymization: Pseudo-anonymization is a technique that involves replacing direct identifiers with pseudonyms or unique identifiers, thereby unlinking the data from the individuals it represents. Unlike anonymization, where the original data is altered to prevent re-identification, pseudo-anonymization retains the ability to re-identify individuals using additional information stored separately, such as a key or lookup table. Pseudo-anonymization is commonly used in situations where data needs to be linked across different systems or databases while still protecting individual privacy.
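To make the difference between pseudo-anonymization and irreversible removal concrete, here is a minimal Python sketch under simple assumptions: records are plain dictionaries, pseudonyms are random tokens, and the lookup table is returned to the caller so it can be stored separately. The function names (pseudonymize, reidentify, drop_field) and the sample data are hypothetical and are not part of Presidio.

```python
import secrets


def pseudonymize(records, field):
    """Replace each distinct value of `field` with a random pseudonym.

    Returns the new records plus a lookup table (pseudonym -> original value).
    The table is the "additional information" that allows re-identification,
    so it must be stored separately from the pseudonymized data.
    """
    pseudonym_for = {}
    lookup = {}
    result = []
    for record in records:
        value = record[field]
        if value not in pseudonym_for:
            token = "ID-" + secrets.token_hex(4)
            while token in lookup:  # guard against an (unlikely) collision
                token = "ID-" + secrets.token_hex(4)
            pseudonym_for[value] = token
            lookup[token] = value
        result.append({**record, field: pseudonym_for[value]})
    return result, lookup


def reidentify(records, field, lookup):
    """Reverse pseudonymization using the separately stored lookup table."""
    return [{**record, field: lookup[record[field]]} for record in records]


def drop_field(records, field):
    """Irreversibly remove a direct identifier; no key exists to undo this.

    On its own this does not guarantee anonymity: the quasi-identifiers
    that remain may still allow re-identification.
    """
    return [{k: v for k, v in record.items() if k != field} for record in records]


patients = [
    {"name": "Alice Tremblay", "city": "Ottawa"},
    {"name": "Bob Singh", "city": "Halifax"},
    {"name": "Alice Tremblay", "city": "Ottawa"},
]
pseudo, key_table = pseudonymize(patients, "name")
# The same person gets the same pseudonym, so records can still be linked.
print(pseudo[0]["name"] == pseudo[2]["name"])             # True
print(reidentify(pseudo, "name", key_table) == patients)  # True
print(drop_field(patients, "name")[0])                    # {'city': 'Ottawa'}
```

Keeping the same pseudonym for the same value is what preserves linkability across records or systems; whoever holds key_table can reverse the process, which is exactly what distinguishes pseudo-anonymization from anonymization.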

What is PII?

Personal identifiable information (PII) is any data that can be used to identify an individual. This includes, but is not limited to, names, addresses, phone numbers, social security numbers, financial information, and medical records. PII is highly sensitive information that needs to be protected from unauthorized access, as it can be used for identity theft and other fraudulent activities.

Depending on whether a piece of information can be used directly or indirectly to re-identify an individual, one can categorize the information mentioned above into direct-identifiers and quasi-identifiers [4]:

  • Direct-identifiers: A set of variables unique for an individual (a name, address, phone number, or bank account) that may be used to directly identify the subject.
  • Quasi-identifiers: Information such as gender, nationality, or city of residence that in isolation does not enable re-identification but may do so when combined with other quasi-identifiers and background knowledge.

Why is PII protection important?

PII protection is important because individuals have a right to privacy and should have control over how their personal information is collected, used, and disclosed. Data breaches and identity theft can have significant consequences for individuals, including financial losses, reputational damage, and emotional distress. Therefore, it is essential for organizations to have robust measures in place to protect PII.

Background

a) Anonymizing structured data

When it comes to anonymizing structured data, there are established mathematical models of privacy. These include:

  • K-anonymity: A masked dataset has the k-anonymity property if each individual's record cannot be distinguished from those of at least k-1 other individuals on the basis of its quasi-identifiers. Two methods can be used to achieve k-anonymity: suppression, which completely removes an attribute's value from the dataset, and generalization, which replaces a specific value of an attribute with a more general one.
  • L-diversity: This is an extension of k-anonymity. A dataset has l-diversity if, when rows with identical quasi-identifiers are grouped together, each group contains at least l distinct values for each sensitive attribute.
  • Differential privacy: this aims to ensure that the output of a process or algorithm remains roughly the same, regardless of whether an individual's data is included. This means that it is impossible to determine with certainty whether a specific individual is present in the dataset just by examining the output of a differentially private analysis.
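The k-anonymity and l-diversity definitions above can be checked mechanically. The sketch below assumes rows are Python dictionaries whose quasi-identifiers have already been generalized (ages to ranges, postal codes to their first three characters), and it measures the simplest, "distinct" form of l-diversity. The function names and the sample data are invented for illustration.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share identical values on every quasi-identifier."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return list(groups.values())


def k_anonymity(rows, quasi_identifiers):
    """Largest k for which the table is k-anonymous (size of the smallest group)."""
    return min(len(g) for g in equivalence_classes(rows, quasi_identifiers))


def l_diversity(rows, quasi_identifiers, sensitive):
    """Largest l for which the table is distinct l-diverse: the fewest
    distinct sensitive values found in any group."""
    return min(len({row[sensitive] for row in g})
               for g in equivalence_classes(rows, quasi_identifiers))


# Invented, already-generalized data: age ranges and 3-character postal prefixes.
table = [
    {"age": "30-39", "postal": "K1A", "diagnosis": "flu"},
    {"age": "30-39", "postal": "K1A", "diagnosis": "asthma"},
    {"age": "30-39", "postal": "K1A", "diagnosis": "flu"},
    {"age": "40-49", "postal": "M5V", "diagnosis": "diabetes"},
    {"age": "40-49", "postal": "M5V", "diagnosis": "diabetes"},
]
qi = ["age", "postal"]
print(k_anonymity(table, qi))               # 2
print(l_diversity(table, qi, "diagnosis"))  # 1
```

The table is 2-anonymous yet only 1-diverse: everyone in the 40-49/M5V group shares the same diagnosis, so learning that someone belongs to that group reveals their diagnosis. This is the gap l-diversity is meant to close.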

There are several other anonymization techniques that can be applied to both structured and unstructured data. Some of these techniques include:

  • Data shuffling: This involves randomly rearranging the values within one or more columns across records, so that they no longer line up with the rest of each record, disrupting potential correlations between variables.
  • Data perturbation: This involves adding random noise or errors to the data to reduce the risk of re-identification. This can be done through techniques such as adding Gaussian noise or rounding values to the nearest multiple of a certain number.
  • Data aggregation: This involves aggregating the data at a higher level, such as at the city or state level, to protect individual-level data.
  • Data suppression: This involves removing sensitive information from the dataset altogether, such as by deleting specific columns or rows, or replacing sensitive values with a placeholder value (e.g., "******").
  • Data generalization: This involves replacing specific values with more general values, such as replacing a specific street address with just the city or state.
  • Data obfuscation: This involves replacing sensitive information with fake or misleading data, such as through random name generation or generating fake addresses.
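The sketch below applies five of these techniques (suppression, generalization, perturbation by Gaussian noise and by rounding, shuffling, and aggregation) to a small invented table using only Python's standard library. The function names, the noise level and the rounding base are illustrative assumptions; a seeded random.Random keeps the random steps reproducible.

```python
import random
from collections import Counter


def suppress(rows, column, placeholder="******"):
    """Data suppression: replace every value in a column with a placeholder."""
    return [{**r, column: placeholder} for r in rows]


def generalize_age(rows, width=10):
    """Data generalization: replace an exact age with a range, e.g. 34 -> '30-39'."""
    out = []
    for r in rows:
        low = (r["age"] // width) * width
        out.append({**r, "age": f"{low}-{low + width - 1}"})
    return out


def add_gaussian_noise(rows, column, sigma, rng):
    """Data perturbation: add zero-mean Gaussian noise, then round to an integer."""
    return [{**r, column: round(r[column] + rng.gauss(0, sigma))} for r in rows]


def round_to_multiple(rows, column, base):
    """Data perturbation: round to the nearest multiple of `base`.
    (Python's round() sends exact halves to the even neighbour.)"""
    return [{**r, column: base * round(r[column] / base)} for r in rows]


def shuffle_column(rows, column, rng):
    """Data shuffling: permute one column's values across records."""
    values = [r[column] for r in rows]
    rng.shuffle(values)
    return [{**r, column: v} for r, v in zip(rows, values)]


def aggregate_counts(rows, column):
    """Data aggregation: publish counts per group instead of individual rows."""
    return Counter(r[column] for r in rows)


people = [
    {"name": "Ana", "age": 34, "income": 52300, "city": "Ottawa"},
    {"name": "Ben", "age": 47, "income": 61800, "city": "Ottawa"},
    {"name": "Cy", "age": 29, "income": 48100, "city": "Halifax"},
]
rng = random.Random(42)
print([r["age"] for r in generalize_age(people)])  # ['30-39', '40-49', '20-29']
print([r["income"] for r in round_to_multiple(people, "income", 1000)])  # [52000, 62000, 48000]
print(aggregate_counts(people, "city"))  # Counter({'Ottawa': 2, 'Halifax': 1})
noisy = add_gaussian_noise(people, "income", sigma=500, rng=rng)
shuffled = shuffle_column(people, "income", rng)
```

Each function returns new dictionaries and leaves the input untouched, which makes it easy to chain techniques (for example, generalize ages and then shuffle incomes) as the combined approach described below recommends.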

It is essential to understand that no single anonymization technique is completely foolproof. Therefore, it is usually necessary to use a combination of techniques to effectively protect sensitive data. It is also crucial to continuously evaluate and update anonymization techniques as new re-identification risks and techniques arise.

b) Anonymizing unstructured data

Anonymizing unstructured data, such as text or images, is a more challenging task. It entails detecting where sensitive information occurs in the data and then applying anonymization techniques to it. Because unstructured data has no fixed schema, simple rule-based models used on their own often perform poorly.

Therefore, Natural Language Processing (NLP) techniques have been applied to text anonymization. In particular, Named Entity Recognition (NER), a sequence labeling task, indicates whether each token (such as a word) corresponds to a named entity such as PERSON (PER), LOCATION, DATETIME or ORGANIZATION (ORG), as shown below. The label O indicates that no entity has been recognized.

Image 1. Sequence Labeling Task – Named Entity Recognition

Description - Image 1. Sequence Labeling Task – Named Entity Recognition

This image shows the result of passing a string through a Named Entity Recognizer (NER). The input is the string “John bought 30 Amazon shares in 2022”, and the NER model classifies each word with its corresponding entity: John is tagged as PERSON, Amazon as ORGANIZATION, 2022 as DATETIME, and all other words as OTHERS.
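Downstream code usually needs whole entities rather than per-token labels. As a minimal sketch (not the output format of any particular NER library), the hypothetical function below merges consecutive tokens that share a label into entities, treating O as "no entity". Many real systems use the BIO tagging scheme instead, which can also separate two adjacent entities of the same type.

```python
from typing import List, Tuple


def labels_to_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge runs of consecutive tokens that share a non-'O' label."""
    entities = []
    current_label, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label == current_label and label != "O":
            current_tokens.append(token)  # extend the running entity
            continue
        if current_label not in (None, "O"):
            entities.append((" ".join(current_tokens), current_label))
        current_label, current_tokens = label, [token]
    if current_label not in (None, "O"):  # flush the final entity
        entities.append((" ".join(current_tokens), current_label))
    return entities


tokens = ["John", "bought", "30", "Amazon", "shares", "in", "2022"]
labels = ["PERSON", "O", "O", "ORGANIZATION", "O", "O", "DATETIME"]
print(labels_to_entities(tokens, labels))
# [('John', 'PERSON'), ('Amazon', 'ORGANIZATION'), ('2022', 'DATETIME')]
```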

Several neural models have achieved state-of-the-art performance on NER tasks using datasets with general named entities. When trained on medical domain data containing various types of personal information, they have also been shown to achieve state-of-the-art performance on those data. These model architectures include Recurrent Neural Networks (RNNs) with character embeddings and Transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers).

spaCy's transformer-based English pipeline (en_core_web_trf), for example, uses a RoBERTa-based language model fine-tuned on the OntoNotes dataset, which has 18 named entity categories such as PERSON, GPE, CARDINAL and LOC.

Microsoft Presidio uses a combination of rule-based and natural language processing methods to detect and anonymize sensitive content, as discussed next.

Microsoft Presidio

Why do we need Microsoft Presidio?

When we apply PII anonymization to real-world applications, different business requirements can make it challenging to use pre-trained models directly. For example, the Government of Canada (GoC) receives many applications during an advertised staffing process, which are then reviewed. Before the review, PII needs to be redacted to ensure personal information is not leaked and to avoid bias. Apart from the common PII entities, the GoC also assigns every employee a Personal Record Identifier (PRI), whose last digit is a modulus-11 check digit [Source: TBS - Incumbent Data Element Dictionary].

A pre-trained NER model cannot identify such special entities, and fine-tuning it to do so requires extra labeled data. What is needed, therefore, is a tool that can use a pre-trained NER model and can easily be customized and extended.

Presidio (from the Latin praesidium, 'protection, garrison') helps to ensure sensitive data is properly managed and governed. It provides fast identification and anonymization modules for private entities in text and images, such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.

One of the key benefits of the Presidio framework is its ability to scale. It can handle large data sets, making it suitable for use by organizations with large amounts of data. It is also designed to be flexible and adaptable, allowing organizations to customize its use to meet their specific needs.

Image 2: PII detection workflow in Microsoft Presidio [Source: Presidio: Data Protection and De-identification SDK]

Description - Image 2: PII detection workflow in Microsoft Presidio

The image shows the Presidio detection flow used to detect PII. The input passes through regex-based pattern recognition, followed by a Named Entity Recognition algorithm to detect entities, checksums to validate patterns, context words to increase detection confidence, and multiple anonymization techniques. The input shown is ‘Hi, my name is David, and my number is 212 555 1234’. After it passes through the Presidio detection flow, David and the number 212 555 1234 are detected as PII.
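To illustrate how the regex and context-word stages of this workflow can interact, here is a simplified, self-contained sketch. It is not Presidio's implementation: the phone-number pattern, the base score of 0.4, the boost of 0.35, the context word list and the four-word window are all illustrative assumptions. A pattern match alone receives a modest score, which rises when a context word appears just before it.

```python
import re

# Illustrative values only; Presidio's built-in recognizers use their own.
BASE_SCORE = 0.4
CONTEXT_BOOST = 0.35
PHONE_PATTERN = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")
CONTEXT_WORDS = {"phone", "number", "call", "cell"}


def detect_phone_numbers(text, window=4):
    """Return (match, start, end, score) tuples for phone-like numbers."""
    results = []
    for match in PHONE_PATTERN.finditer(text):
        # The last `window` words before the match serve as its context.
        preceding = re.findall(r"[a-z]+", text[:match.start()].lower())[-window:]
        score = BASE_SCORE
        if CONTEXT_WORDS.intersection(preceding):
            score = min(1.0, score + CONTEXT_BOOST)
        results.append((match.group(), match.start(), match.end(), round(score, 2)))
    return results


print(detect_phone_numbers("Hi, my name is David, and my number is 212 555 1234"))
# [('212 555 1234', 39, 51, 0.75)]
print(detect_phone_numbers("Order 212 555 1234 shipped"))
# [('212 555 1234', 6, 18, 0.4)]
```

The same digits score differently depending on their surroundings, which is why context words help separate real phone numbers from, say, order or reference numbers with the same shape.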

Goals

  • Introduce de-identification technologies to organizations in a user-friendly manner to promote privacy and transparency in decision-making.
  • Make the technology flexible and customizable to fit specific business needs.
  • Support both fully automated and semi-automated PII de-identification on multiple platforms.

Main features

  • Provides PII recognition using a variety of methods such as Named Entity Recognition, regular expressions, rule-based logic, and checksum with context, in multiple languages.
  • Offers the ability to connect to external PII detection models.
  • Offers multiple options for use, including Python or PySpark workloads, Docker, and Kubernetes.
  • Allows for customization in PII identification and anonymization.
  • Includes a module for redacting PII text in images.

Main modules of Presidio

a) Presidio Analyzer:

(i) Overview

The Presidio analyzer is a Python-based service for detecting PII entities in text. During analysis, it runs a set of PII recognizers, each responsible for detecting one or more PII entities using different mechanisms. The Presidio analyzer comes with a set of predefined recognizers but can easily be extended with custom recognizers. Predefined and custom recognizers use Named Entity Recognition, regular expressions, rule-based logic, and checksums with the relevant context, in multiple languages, to detect PII in unstructured text, as shown in the detection workflow below:

Image 3: Presidio Analyzer for Identifying PII [Source: Presidio Analyzer]

Description - Image 3: Presidio Analyzer for Identifying PII

The image shows how the Presidio Analyzer is used to identify PII. The input text is passed through multiple PII recognizers, which include built-in recognizers, custom recognizers, and custom models. The built-in recognizers use regex, checksums, NER, and context words. After the text has passed through all the recognizers, the PII is detected.

The entities that Microsoft Presidio can recognize by default are listed on the "Supported entities" page of the Microsoft Presidio documentation.

(ii) Installation

Presidio Analyzer can be installed using pip or Docker, or built from source.

(iii) Running a Basic Analyzer

Once installation is complete, a basic analyzer can be run with a few lines of code as shown:

from presidio_analyzer import AnalyzerEngine
# Set up the engine, loads the NLP module (spaCy model by default) and other PII recognizers
analyzer = AnalyzerEngine()
# Call analyzer to get results
results = analyzer.analyze(text="Mr. John lives in Vancouver. His email id is john@sfu.ca", language='en')
print(results)

[type: EMAIL_ADDRESS, start: 45, end: 56, score: 1.0,
type: PERSON, start: 4, end: 8, score: 0.85,
type: LOCATION, start: 18, end: 27, score: 0.85,
type: URL, start: 50, end: 56, score: 0.5]

By default, Presidio uses spaCy's en_core_web_lg model. As the output above shows, the PERSON, EMAIL_ADDRESS, LOCATION and URL entities have been identified. We can extend the analyzer to support detection of new entities, as discussed next.
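In the output above, the low-scoring URL result (positions 50 to 56, "sfu.ca") lies entirely inside the EMAIL_ADDRESS result (45 to 56). Presidio's anonymizer has its own handling for overlapping results; as a standalone illustration only, the sketch below post-processes results represented by a hypothetical Result tuple that mirrors the printed fields (it is not Presidio's RecognizerResult class) and drops any result contained in a span that scored at least as high.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """Minimal stand-in for an analyzer result (not Presidio's class)."""
    entity_type: str
    start: int
    end: int
    score: float


def drop_contained(results: List[Result]) -> List[Result]:
    """Drop results whose span lies inside a span that scored at least as high."""
    kept: List[Result] = []
    # Visit results from highest to lowest score so stronger matches win.
    for r in sorted(results, key=lambda x: (-x.score, x.start)):
        if not any(k.start <= r.start and r.end <= k.end for k in kept):
            kept.append(r)
    return sorted(kept, key=lambda x: x.start)


# Values copied from the analyzer output above.
raw = [
    Result("EMAIL_ADDRESS", 45, 56, 1.0),
    Result("PERSON", 4, 8, 0.85),
    Result("LOCATION", 18, 27, 0.85),
    Result("URL", 50, 56, 0.5),
]
print([r.entity_type for r in drop_contained(raw)])
# ['PERSON', 'LOCATION', 'EMAIL_ADDRESS']
```

Partially overlapping spans are kept as-is by this sketch; a fuller policy would also need a rule for those.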

(iv) Capabilities of Presidio Analyzer

1. Support detection of new PII entities

To expand Presidio's detection abilities to new types of PII entities, EntityRecognizer objects should be added to the current list of recognizers. These objects are Python-based and can detect one or more entities in a specific language.

The following class diagram shows the different types of recognizer families Presidio contains.

Image 4: Class Diagram for different types of Recognizers in Presidio [Source: Supporting detection of new types of PII entities]

Description - Image 4: Class Diagram for different types of Recognizers in Presidio

The image shows the class diagram for different types of recognizers in Presidio. The EntityRecognizer is an abstract class for all recognizers. The RemoteRecognizer is an abstract class for calling external PII detectors. The abstract class LocalRecognizer is implemented by all recognizers running within the Presidio-analyzer process. The PatternRecognizer is a class for supporting regex and deny-list based recognition logic, including validation (e.g., with checksum) and context support.

In the above diagram:

  • The EntityRecognizer is an abstract class for all recognizers.
  • The RemoteRecognizer is an abstract class for calling external PII detectors.
  • The abstract class LocalRecognizer is implemented by all recognizers running within the Presidio-analyzer process.
  • The PatternRecognizer is a class for supporting regex and deny-list based recognition logic, including validation (e.g., with checksum) and context support.

The analyzer can be extended to identify additional PII entities in two simple steps:

  1. Create a new class based on EntityRecognizer.
  2. Add the new recognizer to the recognizer registry so that the AnalyzerEngine can use it during analysis.

Example:

For simple recognizers based on regular expressions or deny-lists, we can leverage the provided PatternRecognizer and call the recognizer itself as shown:

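from presidio_analyzer import PatternRecognizer
# Deny-list recognizer: any exact occurrence of these titles is reported as a TITLE entity
titles_recognizer = PatternRecognizer(supported_entity="TITLE", deny_list=["Mr.","Mrs.","Miss"])
# Run the recognizer on its own; entities is a list of entity names to look for
print(titles_recognizer.analyze(text="Mr. John lives in Vancouver. His email id is john@sfu.ca", entities=["TITLE"]))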

[type: TITLE, start: 0, end: 3, score: 1.0]

Next, we can add it to the list of Recognizers for the detection of additional PII entities:

from presidio_analyzer import AnalyzerEngine, RecognizerRegistry
# Start from a registry containing Presidio's predefined recognizers
registry = RecognizerRegistry()
registry.load_predefined_recognizers()
# Add the recognizer to the existing list of recognizers
# (titles_recognizer is the PatternRecognizer created in the previous snippet)
registry.add_recognizer(titles_recognizer)
# Set up analyzer with our updated recognizer registry
analyzer = AnalyzerEngine(registry=registry)
# Run with input text
text = "Mr. John lives in Vancouver. His email id is john@sfu.ca"
results = analyzer.analyze(text=text, language="en")
print(results)

[type: TITLE, start: 0, end: 3, score: 1.0,
type: EMAIL_ADDRESS, start: 45, end: 56, score: 1.0,
type: PERSON, start: 4, end: 8, score: 0.85,
type: LOCATION, start: 18, end: 27, score: 0.85,
type: URL, start: 50, end: 56, score: 0.5]

For more complex recognizers, such as one that detects the Government of Canada PRI, the recognizer can be created in code using the following steps:

  • Create a new Python class which implements LocalRecognizer. (LocalRecognizer implements the base EntityRecognizer class). This class has the following functions:
    • load: load a model / resource to be used during recognition
    • analyze: The main function to be called for getting entities out of the new recognizer
  • Add it to the recognizer registry using registry.add_recognizer(my_recognizer). For more examples, see the Customizing Presidio Analyzer Jupyter notebook.
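The detection logic such a PRI recognizer needs can be sketched in plain Python. The class below is a hypothetical, standalone stand-in that mirrors the load and analyze functions described above; it does not subclass Presidio's LocalRecognizer. The PRI format is an explicit assumption: we take a PRI to be 9 consecutive digits whose last digit is a weighted modulus-11 check digit (weights 9 down to 2 on the first 8 digits), modelled on common mod-11 schemes rather than taken from the TBS specification, which should be consulted for the real rule. In Presidio itself, analyze would return RecognizerResult objects and the class would be registered with registry.add_recognizer.

```python
import re
from typing import List, NamedTuple


class Result(NamedTuple):
    entity_type: str
    start: int
    end: int
    score: float


class PriRecognizerSketch:
    """Hypothetical, standalone stand-in for a LocalRecognizer subclass.

    ASSUMPTIONS (not taken from the TBS specification): a PRI is written as
    9 consecutive digits, and its last digit is a weighted modulus-11 check
    digit with weights 9, 8, ..., 2 applied to the first 8 digits.
    """

    ENTITY = "CA_PRI"  # hypothetical entity name

    def load(self) -> None:
        # A pattern-based recognizer has no model to load; compile the regex.
        self.pattern = re.compile(r"\b\d{9}\b")

    @staticmethod
    def has_valid_check_digit(digits: str) -> bool:
        payload, check = digits[:-1], int(digits[-1])
        weights = range(len(payload) + 1, 1, -1)  # 9, 8, ..., 2
        total = sum(int(d) * w for d, w in zip(payload, weights))
        expected = (11 - total % 11) % 11
        # An expected value of 10 can never equal one digit, so it is rejected.
        return expected == check

    def analyze(self, text: str) -> List[Result]:
        # Keep only 9-digit candidates whose check digit validates.
        return [
            Result(self.ENTITY, m.start(), m.end(), 1.0)
            for m in self.pattern.finditer(text)
            if self.has_valid_check_digit(m.group())
        ]


recognizer = PriRecognizerSketch()
recognizer.load()
print(recognizer.analyze("PRI 123456789 is on file; 123456788 is a typo."))
# [Result(entity_type='CA_PRI', start=4, end=13, score=1.0)]
```

For "123456789", the weighted sum of the first 8 digits is 156, and 156 mod 11 = 2, so the expected check digit is 11 - 2 = 9, which matches. The checksum is what lets the recognizer reject arbitrary 9-digit numbers such as "123456788".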

There are several other ways to create a Custom Recognizer in Presidio, such as:

  • Creating a remote recognizer: A remote recognizer interacts with an external service for PII detection. This could be a third-party service or a custom service running alongside Presidio.
  • Creating ad-hoc recognizers: Ad-hoc recognizers, in JSON form, can be added to a request to the Presidio Analyzer API's /analyze endpoint and are used only for that specific request.
  • Reading pattern recognizers from YAML: Pattern recognizers can be defined in YAML files, which allows users to add recognition logic without writing code. An example YAML file is available in the Presidio documentation (Example Recognizers). Once created, the YAML file can be loaded into the RecognizerRegistry instance.
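As a sketch of the ad-hoc option, the snippet below builds a JSON request body carrying one pattern recognizer. The field names (ad_hoc_recognizers, supported_language, supported_entity, patterns, context) follow the ad-hoc recognizer example in the Presidio documentation as we understand it and should be checked against the API version you deploy; the CA_POSTAL_CODE entity, the simplified postal-code regex and the score are hypothetical. The body would be POSTed to the analyzer's /analyze endpoint, which this snippet does not do.

```python
import json
import re


def build_analyze_request(text: str) -> str:
    """Return a JSON body for the Presidio Analyzer /analyze endpoint that
    carries one ad-hoc pattern recognizer, used for this request only."""
    postal_code_recognizer = {
        "name": "Canadian postal code recognizer",  # illustrative name
        "supported_language": "en",
        "supported_entity": "CA_POSTAL_CODE",  # hypothetical entity
        "patterns": [
            {
                "name": "postal code (weak)",
                # Simplified: letter-digit-letter, optional space, digit-letter-digit.
                "regex": r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b",
                "score": 0.3,
            }
        ],
        "context": ["postal", "code", "address"],
    }
    return json.dumps(
        {"text": text, "language": "en", "ad_hoc_recognizers": [postal_code_recognizer]}
    )


body = build_analyze_request("My postal code is K1A 0B1")
# Sanity-check the pattern locally before sending the request.
regex = json.loads(body)["ad_hoc_recognizers"][0]["patterns"][0]["regex"]
print(re.search(regex, "My postal code is K1A 0B1").group())  # K1A 0B1
```

Testing the regex locally first is a cheap way to catch escaping mistakes, since backslashes are easy to mangle when a pattern is embedded in JSON.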

2. Multi-language support

Presidio can detect PII in multiple languages using its built-in recognizers and models. By default, it includes recognizers and models for English. The recognizers are language-dependent, either through their logic or through the context words used to scan for entities.

To improve results for a specific language, you can update the context words of existing recognizers or add new recognizers that support that language. Because each recognizer supports only one language, supporting an additional language requires adding recognizers for it.
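To make the two ideas above concrete (language-specific context words, and one language per recognizer), here is a stand-alone sketch. It is not Presidio's implementation: the recognizer fields, the 30-character look-back window and the 0.35 score boost are hypothetical values chosen for illustration, and Presidio has its own context-enhancement logic.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContextPatternRecognizer:
    """Hypothetical pattern recognizer that serves exactly one language."""
    language: str
    entity: str
    regex: str
    base_score: float
    context_words: Tuple[str, ...]  # language-specific context words
    window: int = 30                # characters to look back for context (assumed)
    boost: float = 0.35             # score increase when context is found (assumed)

    def analyze(self, text: str) -> List[Tuple[str, int, int, float]]:
        results = []
        for m in re.finditer(self.regex, text):
            prefix = text[max(0, m.start() - self.window):m.start()].lower()
            score = self.base_score
            if any(word in prefix for word in self.context_words):
                score = min(1.0, score + self.boost)
            results.append((self.entity, m.start(), m.end(), round(score, 2)))
        return results


# Same pattern, but one recognizer per language, each with its own context words.
PHONE = r"\b\d{3}-\d{4}\b"
REGISTRY = [
    ContextPatternRecognizer("en", "PHONE", PHONE, 0.4, ("phone", "call")),
    ContextPatternRecognizer("fr", "PHONE", PHONE, 0.4, ("téléphone", "appelez")),
]


def analyze(text: str, language: str):
    """Run only the recognizers registered for the requested language."""
    return [r for rec in REGISTRY if rec.language == language for r in rec.analyze(text)]


print(analyze("Appelez-moi au 555-0100", "fr"))  # → [('PHONE', 15, 23, 0.75)]
print(analyze("Appelez-moi au 555-0100", "en"))  # → [('PHONE', 15, 23, 0.4)]
```

The French text only gets the context boost from the French recognizer, which is why updating context words per language can change detection confidence even when the pattern itself is identical.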

3. Customizing the NLP models

As mentioned before, the Presidio Analyzer uses spaCy's en_core_web_lg model by default, but it can easily be customized to use other NLP models, either public or proprietary. Presidio uses NLP engines for two main tasks: NER-based PII identification, and feature extraction for custom rule-based logic (such as leveraging context words for improved detection). These models can be trained, or downloaded from existing NLP frameworks such as spaCy, Stanza and Transformers.

The new model can be configured in either of two ways:

  • Via code: create an NlpEngine using the NlpEngineProvider class and pass it to the AnalyzerEngine as input.
  • Via configuration: set up the models to be used in the default configuration file, which is read during the default initialization of the AnalyzerEngine. Alternatively, the path to a custom configuration file can be passed to the NlpEngineProvider.
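For the "via code" route, a minimal sketch follows. It assumes presidio-analyzer and the named spaCy model are installed; the configuration keys (nlp_engine_name, models, lang_code, model_name) follow the format used in the Presidio documentation as we understand it. The helper name build_analyzer is hypothetical, and the Presidio imports are kept inside it so the configuration can be inspected without the library installed.

```python
from typing import Any, Dict, List, Optional

# NLP engine configuration: the engine name and one model per language.
NLP_CONFIGURATION: Dict[str, Any] = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "en", "model_name": "en_core_web_lg"},
    ],
}


def build_analyzer(config: Optional[Dict[str, Any]] = None):
    """Create an AnalyzerEngine backed by the NLP engine described in `config`.

    Requires presidio-analyzer and the configured spaCy model(s) to be installed.
    """
    # Imported here so the configuration above can be inspected without Presidio.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    config = config or NLP_CONFIGURATION
    nlp_engine = NlpEngineProvider(nlp_configuration=config).create_engine()
    languages: List[str] = [model["lang_code"] for model in config["models"]]
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=languages)
```

With the dependencies installed, analyzer = build_analyzer() followed by analyzer.analyze(text=text, language="en") should behave like the default setup; changing model_name to another installed spaCy model is the only edit needed to swap the model.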

In addition to the built-in spaCy, Stanza and Transformers capabilities, it is possible to create new recognizers that serve as interfaces to other models, for example Flair.

b) Presidio Anonymizer:

The Anonymizer is also a Python-based service. It anonymizes the detected PII entities by applying operators such as replace, mask and redact. By default, it replaces each detected PII entity directly in the text with its entity type, such as <EMAIL_ADDRESS> or <PHONE_NUMBER>. This default can be customized to apply different anonymization logic to different entity types.

The Presidio-Anonymizer package contains both Anonymizers and Deanonymizers.

  • Anonymizers replace the text of a PII entity with another value by applying an operator. The built-in operators are:
    • replace: replace the PII with a desired value
    • redact: remove the PII completely from the text
    • hash: hash the PII text (using sha256, sha512 or md5)
    • mask: replace characters of the PII with a given character
    • encrypt: encrypt the PII using a given cryptographic key
    • custom: replace the PII with the result of a function executed on the PII

Image 5: PII Anonymizer workflow [Source: Presidio Anonymizer]


The image shows how the Presidio Anonymizer works. On the left, the text and the detected PII are passed to both the built-in and the custom anonymizers. The built-in anonymizers consist of operators such as redact, hash and replace. After the text and detected PII pass through the PII Anonymizer, the anonymized text is returned.

Example:

from presidio_anonymizer import AnonymizerEngine
# Initialize the engine:
engine = AnonymizerEngine()
# Invoke the anonymize function with the text and the analyzer results
# (here, the results returned by presidio-analyzer above). Operators are
# optional; without them, each entity is replaced by its entity type:
result = engine.anonymize(
    text="Mr. John lives in Vancouver. His email id is john@sfu.ca",
    analyzer_results=results
)

print(result)

Output:

text: <TITLE> <PERSON> lives in <LOCATION>. His email id is <EMAIL_ADDRESS>
items:
[
    {'start': 54, 'end': 69, 'entity_type': 'EMAIL_ADDRESS', 'text': '<EMAIL_ADDRESS>', 'operator': 'replace'},
    {'start': 26, 'end': 36, 'entity_type': 'LOCATION', 'text': '<LOCATION>', 'operator': 'replace'},
    {'start': 8, 'end': 16, 'entity_type': 'PERSON', 'text': '<PERSON>', 'operator': 'replace'},
    {'start': 0, 'end': 7, 'entity_type': 'TITLE', 'text': '<TITLE>', 'operator': 'replace'}
]

The Presidio Anonymizer can also be extended to support additional operators.
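To show what the built-in operators listed above do to a single detected span, and how an operator table can be extended, here is a stand-alone sketch in plain Python. It is not Presidio's implementation: the parameter names (new_value, masking_char, chars_to_mask, from_end, hash_type) are modelled on Presidio's operator parameters as we recall them, the "initials" operator is hypothetical, and the span offsets come from the analyzer output shown earlier. Encryption is omitted because it needs a key and a cryptography library.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple


def replace(pii: str, entity_type: str, new_value: Optional[str] = None) -> str:
    # Default replacement is the entity type, e.g. <PERSON>.
    return new_value if new_value is not None else f"<{entity_type}>"


def redact(pii: str, entity_type: str) -> str:
    return ""  # remove the PII entirely


def mask(pii: str, entity_type: str, masking_char: str = "*",
         chars_to_mask: Optional[int] = None, from_end: bool = False) -> str:
    n = len(pii) if chars_to_mask is None else min(chars_to_mask, len(pii))
    if from_end:
        return pii[:len(pii) - n] + masking_char * n
    return masking_char * n + pii[n:]


def hash_pii(pii: str, entity_type: str, hash_type: str = "sha256") -> str:
    return hashlib.new(hash_type, pii.encode("utf-8")).hexdigest()


# Operator table; adding an entry is how this sketch "extends" the anonymizer.
OPERATORS: Dict[str, Callable[..., str]] = {
    "replace": replace, "redact": redact, "mask": mask, "hash": hash_pii,
}
# Hypothetical extra operator: keep only initials.
OPERATORS["initials"] = lambda pii, entity_type: "".join(w[0] + "." for w in pii.split())


def anonymize(text: str, spans: List[Tuple[str, int, int]],
              config: Dict[str, Tuple[str, dict]]) -> str:
    """Apply the configured operator per entity type; spans must not overlap."""
    # Work right-to-left so earlier offsets stay valid as lengths change.
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        name, params = config.get(entity_type, config["DEFAULT"])
        text = text[:start] + OPERATORS[name](text[start:end], entity_type, **params) + text[end:]
    return text


text = "Mr. John lives in Vancouver. His email id is john@sfu.ca"
spans = [("TITLE", 0, 3), ("PERSON", 4, 8), ("LOCATION", 18, 27), ("EMAIL_ADDRESS", 45, 56)]
config = {
    "DEFAULT": ("replace", {}),
    "PERSON": ("initials", {}),
    "EMAIL_ADDRESS": ("mask", {"masking_char": "*", "chars_to_mask": 4}),
}
print(anonymize(text, spans, config))
# → <TITLE> J. lives in <LOCATION>. His email id is ****@sfu.ca
```

The DEFAULT entry plays the role of the fallback operator, so only the entity types that need special handling have to be listed.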

  • Deanonymizers are used to revert an anonymization operation (e.g., to decrypt text that was encrypted).
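Presidio's deanonymization reverses reversible operators such as encrypt. As a simpler, standard-library-only illustration of the same idea (a different technique, not Presidio's), the sketch below swaps each span for a numbered token and keeps a token-to-value mapping so the original text can be restored later. The token format <ENTITY_N> and the function names are assumptions; in practice the mapping must itself be stored securely, because anyone holding it can re-identify the text.

```python
import re
from typing import Dict, List, Tuple


def pseudonymize(text: str, spans: List[Tuple[str, int, int]]) -> Tuple[str, Dict[str, str]]:
    """Replace each non-overlapping span with a numbered token; return text and mapping."""
    mapping: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    pieces, cursor = [], 0
    for entity_type, start, end in sorted(spans, key=lambda s: s[1]):
        counters[entity_type] = counters.get(entity_type, 0) + 1
        token = f"<{entity_type}_{counters[entity_type]}>"
        mapping[token] = text[start:end]
        pieces += [text[cursor:start], token]
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Restore every known token; unknown tokens are left untouched."""
    return re.sub(r"<[A-Z_]+_\d+>", lambda m: mapping.get(m.group(0), m.group(0)), text)


original = "Mr. John lives in Vancouver. His email id is john@sfu.ca"
masked, mapping = pseudonymize(original, [("PERSON", 4, 8), ("EMAIL_ADDRESS", 45, 56)])
print(masked)  # → Mr. <PERSON_1> lives in Vancouver. His email id is <EMAIL_ADDRESS_1>
print(deanonymize(masked, mapping) == original)  # → True
```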

Because the input text may contain overlapping PII entities, the anonymizer handles several scenarios:

  • No overlap (single PII): when entity spans do not overlap, Presidio Anonymizer uses the given or default anonymization operator to anonymize and replace each PII entity.
  • Full overlap of PII entity spans: when several entities cover the same span, the PII with the higher score is taken. Between PIIs with identical scores, the selection is arbitrary.
  • One PII contained in another: Presidio Anonymizer uses the PII with the larger text, even if its score is lower.
  • Partial intersection: Presidio Anonymizer anonymizes each PII individually and returns a concatenation of the anonymized text.

To get started, install Presidio by following the Installing Presidio instructions in the official documentation.
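The four overlap scenarios described above can be sketched in plain Python as follows. This is an illustrative re-implementation of the rules as stated, not Presidio's source code; ties between identical spans are broken here by keeping the first span seen, whereas Presidio's choice is arbitrary. The span values reuse the analyzer output from the earlier example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    entity_type: str
    start: int
    end: int
    score: float


def resolve_conflicts(spans: List[Span]) -> List[Span]:
    # Full overlap (identical span): keep the higher score; first seen wins ties.
    best: Dict[Tuple[int, int], Span] = {}
    for s in spans:
        key = (s.start, s.end)
        if key not in best or s.score > best[key].score:
            best[key] = s
    unique = list(best.values())
    # Containment: drop a span that lies inside a larger one, whatever its score.
    kept = [
        s for s in unique
        if not any(o is not s and o.start <= s.start and s.end <= o.end for o in unique)
    ]
    return sorted(kept, key=lambda s: (s.start, s.end))


def anonymize(text: str, spans: List[Span]) -> str:
    kept = resolve_conflicts(spans)
    out, cursor, i = [], 0, 0
    while i < len(kept):
        # Partial intersection: merge chained overlaps into one region and
        # concatenate their placeholders, e.g. <PERSON><LOCATION>.
        region_start, region_end = kept[i].start, kept[i].end
        labels = [f"<{kept[i].entity_type}>"]
        i += 1
        while i < len(kept) and kept[i].start < region_end:
            region_end = max(region_end, kept[i].end)
            labels.append(f"<{kept[i].entity_type}>")
            i += 1
        out += [text[cursor:region_start], "".join(labels)]
        cursor = region_end
    out.append(text[cursor:])
    return "".join(out)


text = "Mr. John lives in Vancouver. His email id is john@sfu.ca"
analyzer_spans = [
    Span("TITLE", 0, 3, 1.0), Span("EMAIL_ADDRESS", 45, 56, 1.0),
    Span("PERSON", 4, 8, 0.85), Span("LOCATION", 18, 27, 0.85),
    Span("URL", 50, 56, 0.5),  # contained in EMAIL_ADDRESS, so dropped
]
print(anonymize(text, analyzer_spans))
# → <TITLE> <PERSON> lives in <LOCATION>. His email id is <EMAIL_ADDRESS>

print(anonymize("I'm George Washington Square Park.",
                [Span("PERSON", 4, 21, 0.85), Span("LOCATION", 11, 33, 0.85)]))
# → I'm <PERSON><LOCATION>.
```

The first call reproduces the text shown in the Presidio example output above: the URL span inside the email address is discarded by the containment rule.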

Conclusion

In conclusion, Microsoft Presidio is a valuable tool for detecting personally identifiable information (PII) in text data. Its flexible design allows users to create custom recognizers and models to match specific use cases, and its support for multiple languages extends PII detection to a wide range of scenarios. In addition, remote recognizers that call external services, ad-hoc recognizers, and pattern recognizers read from YAML files let users easily incorporate new detection capabilities. Overall, Presidio's comprehensive PII detection capabilities, together with its customization options, make it an asset for organizations looking to protect sensitive data.

Meet the Data Scientist

Register for the Data Science Network's Meet the Data Scientist Presentation

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Register for the Meet the Data Scientist event. We hope to see you there!

MS Teams – link will be provided to the registrants by email

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.

References

Summary of privacy laws in Canada - Office of the Privacy Commissioner of Canada

What is GDPR, the EU's new data protection law? - GDPR.eu

How we protect the privacy and confidentiality of your personal information

Pierre Lison, Ildikó Pilán, David Sánchez, Montserrat Batet, and Lilja Øvrelid, Anonymisation Models for Text Data: State of the Art, Challenges and Future Directions (2021). Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing

Official Documentation: Microsoft Presidio

GitHub - microsoft/presidio: Context aware, pluggable and customizable data protection and de-identification SDK for text and images

PII anonymization made easy by Presidio | by Lingzhen Chen | Towards Data Science

Presidio Research · spaCy Universe

Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting - ScienceDirect

Statistics Canada’s Trust Centre


Classifications, variables and statistical units

This page contains the full collection of standard classifications, variables and statistical units approved by Statistics Canada.


Classifications, variables and statistical units by subject
Title Subject Type
Sexual orientation of person Population and demography Variable
Sexual orientation Population and demography Classification
Province or territory of intended destination of immigrant Immigration and ethnocultural diversity Variable
Pre-admission experience of immigrant Immigration and ethnocultural diversity Variable
Pre-admission experience Immigration and ethnocultural diversity Classification
Admission category Immigration and ethnocultural diversity Classification
Accommodations of collective dwellings Housing Classification
After-tax income Income, pensions, spending and wealth Classification
After-tax income of persons Income, pensions, spending and wealth Classification
Age categories by five-year age groups Population and demography Classification
Apprenticeship certificates Education, training and learning Classification
Canadian citizenship status Immigration and ethnocultural diversity Classification
Canadian Classification of Institutional Units and Sectors (CCIUS) 2012 Education, training and learning Classification
Canadian Research and Development Classification (CRDC) 2020 Version 2.0 - Field of Research (FOR) Science and technology Classification
Canadian Research and Development Classification (CRDC) 2020 Version 2.0 - Socioeconomic Objective (SEO) Science and technology Classification
Canadian Research and Development Classification (CRDC) 2020 Version 2.0 - Type Of Activity (TOA) Science and technology Classification
Census family status Families, households and marital status Classification
Census family status, variant with children Families, households and marital status Classification
Census family structure (for census family) Families, households and marital status Classification
Census family type Families, households and marital status Classification
Chart of accounts (COA) Canada 2006 - Balance sheet accounts - Assets Economic accounts Classification
Chart of accounts (COA) Canada 2006 - Balance sheet accounts - Equity Economic accounts Classification
Chart of accounts (COA) Canada 2006 - Balance sheet accounts - Liabilities Economic accounts Classification
Chart of accounts (COA) Canada 2006 - Income statement - Expenses Economic accounts Classification
Chart of accounts (COA) Canada 2006 - Income statement - Extraordinary Gains/Losses, Non-recurring Items & Adjustments Economic accounts Classification
Chart of accounts (COA) Canada 2006 - Income statement - Gains/Losses, Corporate Taxes and Other Items Economic accounts Classification
Chart of accounts (COA) Canada 2006 - Income statement - Revenue Economic accounts Classification
Child presence Children and youth; Families, households and marital status Classification
Cisgender, transgender and non-binary Population and demography Classification
Class of Worker Labour Classification
Class of worker - variant on employees and self-employed Labour Classification
Classification of Instructional Programs (CIP) Canada 2021 Version 1.0 Education, training and learning Classification
Classification of the Economic Territory of Canada (CETC) 2011 Economic accounts Classification
College, CEGEP or other non-university certificates or diplomas Education, training and learning Classification
Combinations of college, CEGEP or other non-university certificate or diploma with a bachelor's degree or higher Education, training and learning Classification
Combinations of certificates, diplomas and degrees awarded Education, training and learning Classification
Combinations of trades certificates and diplomas Education, training and learning Classification
Commuting destination Labour Classification
Commuting vehicle occupancy Labour Classification
Completion of secondary (high) school diploma or equivalency certificate Education, training and learning Classification
Condominium status Housing Classification
Core housing need Housing Classification
Variant of the Standard Classification of Countries and Areas of Interest 2022 for Social Statistics Society and community Classification
Country of Citizenship 2021 Immigration and ethnocultural diversity Classification
Degree in medicine, dentistry, veterinary medicine or optometry Education, training and learning Classification
Drainage Regions - Variant of SDAC 2003 Environment Classification
Duration of commute Labour Classification
Dwelling condition Housing Classification
Ecological Land Classification (ELC) 2017 Environment Classification
Economic family status Families, households and marital status Classification
Economic family structure (for economic family) Families, households and marital status Classification
Economic family type Families, households and marital status Classification
Educational qualifications responses Education, training and learning Classification
Educational qualifications responses - variant for alternate reporting Education, training and learning Classification
Employed Status Labour Classification
Enrollment under an Inuit land claims agreement Indigenous peoples Classification
Ethnic or cultural origin: Single or multiple response indicator Immigration and ethnocultural diversity Classification
Ethnic or cultural origins 2021 - List Immigration and ethnocultural diversity Classification
Family size (for economic family) Families, households and marital status Classification
First Official Language Spoken Languages Classification
Frequency of Participation in Religious Activities (with once a day) Society and community Classification
Frequency of Participation in Religious Activities (without once a day) Society and community Classification
Full-time and Part-Time Work Hours Labour Classification
Full-time or part-time weeks worked during the reference year Labour Classification
Gender Population and demography Classification
Gender diversity status Families, households and marital status Classification
Gender diversity status of marriage or common-law union Families, households and marital status Classification
Gender diversity status of marriage or common-law union, variant Families, households and marital status Classification
Gender diversity status, variant Families, households and marital status Classification
Generation status Immigration and ethnocultural diversity Classification
Health problems, self-reported Health Classification
Health Regions (HR) 2017 Health Classification
Health Regions for Alternate Reporting - Variant of HR 2017 Health Classification
Health satisfaction, self-assessed Health Classification
Highest certificate, diploma or degree Education, training and learning Classification
Highest educational attainment Education, training and learning Classification
Highest educational attainment - variant for alternate reporting Education, training and learning Classification
Hours of activity per week Labour Classification
Household living arrangements Families, households and marital status Classification
Household maintainer status Families, households and marital status Classification
Household type Families, households and marital status Classification
Household type, multigenerational variant Families, households and marital status Classification
Housing suitability Families, households and marital status Classification
Immigration applicant type Immigration and ethnocultural diversity Classification
Immigrant status Immigration and ethnocultural diversity Classification
Indigenous ancestry Indigenous peoples Classification
Indigenous ancestry responses 2021 - List Indigenous peoples Classification
Indigenous group Indigenous peoples Classification
Indigenous group response Indigenous peoples Classification
Indigenous identity Indigenous peoples Classification
In-Migration, Five years Immigration and ethnocultural diversity Classification
In-Migration, One year Immigration and ethnocultural diversity Classification
Income sources Income, pensions, spending and wealth Classification
Institution of school attendance Education, training and learning Classification
Knowledge of Official Languages Languages Classification
Labour Force Status Labour Classification
Labour Force Status - Collapsed classification Labour Classification
Languages 2021 - Inuit languages variant Languages Classification
Languages 2021 - Indigenous languages variant - List Languages Classification
Languages 2021 - List Languages Classification
Languages 2021 - total responses - List Languages Classification
Languages 2021 - Collapsed classification Languages Classification
Legal marital status Families, households and marital status Classification
Level of perceived health Health Classification
Living in family household Families, households and marital status Classification
Location of study Education, training and learning Classification
Location of study compared with province or territory of residence Education, training and learning Classification
Low-income status Income, pensions, spending and wealth Classification
Main reason for absence from work (main reason for time lost) Labour Classification
Main reason for not working the full year Labour Classification
Main reason for work interruptions Labour Classification
Main reason for working mostly part time Labour Classification
Marital status Families, households and marital status; Society and community Classification
Marital status - short title variant for dissemination Society and community; Families, households and marital status Classification
Membership in a First Nation or Indian band Indigenous peoples Classification
Membership in a Métis organization or Settlement Indigenous peoples Classification
Military service status Labour Classification
Mobility status, five years Population and demography Classification
Mobility status, one year Population and demography Classification
Mode of commuting Labour Classification
National Occupational Classification (NOC) 2021 Version 1.0 Labour Classification
Non-official languages 2021 - List Languages Classification
North American Product Classification System (NAPCS) Canada 2022 Version 1.0 Manufacturing Classification
North American Industry Classification System (NAICS) Canada 2022 Version 1.0 Labour Classification
Number of bedrooms Housing Classification
Number of children ever born Population and demography Classification
Number of Citizenships Immigration and ethnocultural diversity Classification
Number of earners Income, pensions, spending and wealth Classification
Number of Non-official Language(s) Languages Classification
Number of persons per room Housing Classification
Number of rooms Housing Classification
Number of rooms - Collapsed classification Housing Classification
Number of years of education attended Education, training and learning Classification
Number of years of education completed Education, training and learning Classification
Number of years of other non-university education completed Education, training and learning Classification
Number of years of university education completed Education, training and learning Classification
Opposite- or same-sex married spouse or common-law partner Families, households and marital status Classification
Opposite- or same-sex status Families, households and marital status Classification
Out-Migration, Five years Population and demography Classification
Out-Migration, One year Population and demography Classification
Period of construction Construction; Housing Classification
Place of birth Population and demography Classification
Place of birth of parent Population and demography Classification
Place of birth of parents Population and demography Classification
Place of work status Labour Classification
Population Centre and Rural Area Classification 2016 Population and demography Classification
Population group Immigration and ethnocultural diversity; Population and demography Classification
Primary activity status Labour Classification
Recipient of unpaid service Labour Classification
Registered or Treaty Indian status Indigenous peoples Classification
Relationship structure Families, households and marital status Classification
Religions 2021 - Collapsed List Society and community Classification
Religions 2021 - List Society and community Classification
Residence inside or outside Inuit Nunangat Indigenous peoples Classification
Residence on or off reserve 2021 Indigenous peoples Classification
Residential structures Housing Classification
Retirement Labour; Older adults and population aging Classification
Sex at birth Population and demography Classification
Sex at birth, variant Population and demography Classification
Sex of reference person in a family Families, households and marital status Classification
Sex of reference person of lone-parent family Families, households and marital status Classification
Shelter cost to income ratio Housing; Income, pensions, spending and wealth Classification
Source(s) of financial compensation received during absence from work Labour Classification
Standard Classification of Countries and Areas of Interest (SCCAI) 2022 Population and demography Classification
Standard Drainage Area Classification (SDAC) 2003 Environment Classification
Standard Geographical Classification (SGC) 2021 Population and demography Classification
Status of presence of mortgage payments Families, households and marital status Classification
Status of school attendance Education, training and learning Classification
Status of school enrolment Education, training and learning Classification
Status of subsidized housing Housing Classification
Status of tenure of household Housing Classification
Stepfamily status Children and youth; Families, households and marital status Classification
Time arriving at work Labour Classification
Time leaving for work Labour Classification
Total income Income, pensions, spending and wealth Classification
Total income of person Income, pensions, spending and wealth Classification
Type of Canadian Citizenship Immigration and ethnocultural diversity Classification
Type of Citizenship Response Immigration and ethnocultural diversity Classification
Type of Participation in Religious Activities Society and community Classification
Type of union of couple Families, households and marital status Classification
Unpaid care Labour Classification
Usual Weekly Work Hours Labour Classification
Usual Weekly Work Hours - Collapsed classification Labour Classification
Variant: Population Centre and Rural Area 2016 by Province and Territory Population and demography Classification
Variant of the Classification of Instructional Programs (CIP) Canada 2021 Version 1.0 for Alternative primary groupings Education, training and learning Classification
Variant of the Classification of Instructional Programs (CIP) Canada 2021 Version 1.0 for Primary groupings Education, training and learning Classification
Variant of the Classification of Instructional Programs (CIP) Canada 2021 Version 1.0 for Science, technology, engineering and mathematics (STEM) and Business, humanities, health, arts, social science and education (BHASE) groupings Education, training and learning Classification
Variant of the National Occupational Classification (NOC) 2021 Version 1.0 for Analysis by TEER (Training, Education, Experience and Responsibility) categories Labour Classification
Variant of the National Occupational Classification (NOC) 2021 Version 1.0 for Science, Technology, Engineering and Mathematics (STEM) Science and technology Classification
Variant of the North American Product Classification System (NAPCS) Canada 2022 Version 1.0 for Agricultural goods (extension variant) Agriculture and food Classification
Variant of the North American Product Classification System (NAPCS) Canada 2022 Version 1.0 for Farm Product Price Index - FPPI (regrouping variant) Agriculture and food Classification
Variant of the North American Industry Classification System (NAICS) 2022 Version 1.0 for Content and media sector Digital economy and society Classification
Variant of the North American Industry Classification System (NAICS) 2022 Version 1.0 for Durable and non-durable manufacturing industries Manufacturing Classification
Variant of the North American Industry Classification System (NAICS) 2022 Version 1.0 for Energy sector Energy Classification
Variant of the North American Industry Classification System (NAICS) 2022 Version 1.0 for Goods and services producing industries Business and consumer services and culture Classification
Variant of the North American Industry Classification System (NAICS) 2022 Version 1.0 for Industrial production (based on the 2008 International Recommendations for Industrial Statistics) Manufacturing Classification
Variant of the North American Industry Classification System (NAICS) 2022 Version 1.0 for Industrial production (based on the 1950 United Nations definition) Manufacturing Classification
Variant of the North American Industry Classification System (NAICS) 2022 Version 1.0 for Information and communication technology (ICT) sector Digital economy and society Classification
Variant of NAPCS Canada 2012 Version 1.1 - Capital expenditures on non-residential construction Construction Classification
Variant of NAPCS Canada 2017 Version 1.0 - Merchandise import and export accounts International trade; Economic accounts Classification
Variant of NAPCS Canada 2017 Version 2.0 - Industrial Product Price Index (IPPI) Prices and price indexes Classification
Variant of NAPCS Canada 2017 Version 2.0 - Manufacturing and Logging Rev.1 Manufacturing Classification
Variant of NAPCS Canada 2017 Version 2.0 - Raw Materials Price Index (RMPI) Prices and price indexes Classification
Variant of NOC 2016 Version 1.0 - Highly aggregated data Labour Classification
Variant of Standard Geographical Classification (SGC) 2021 for Statistical area classification Population and demography Classification
Variant of Standard Geographical Classification (SGC) 2021 for Statistical area classification by Province and Territory Population and demography Classification
Variant of Standard Geographical Classification (SGC) 2021 for Economic Regions Population and demography Classification
Variant of Standard Geographical Classification (SGC) 2021 for Agricultural Regions Population and demography Classification
Variant of Standard Geographical Classification (SGC) 2021 for North and South Population and demography Classification
Variant of status of tenure of household Housing Classification
Visible minority Immigration and ethnocultural diversity Classification
Weeks worked during the reference year Labour Classification
Work activity during the reference year Labour Classification
Work interruption in the reference period Labour Classification
Industry of establishment Business and consumer services and culture Variable
Industry of experienced labour force person Business and consumer services and culture Variable
Stepfamily status of couple family with children Children and youth; Families, households and marital status Variable
Period of construction of private dwelling Construction Variable
College, CEGEP or other non-university certificates or diplomas of person Education, training and learning Variable
Degree in medicine, dentistry, veterinary medicine or optometry of person Education, training and learning Variable
Educational attainment of person Education, training and learning Variable
Educational qualifications of person Education, training and learning Variable
Field of study of person Education, training and learning Variable
Location of study of person Education, training and learning Variable
Number of years of education attended of person Education, training and learning Variable
Number of years of education completed of person Education, training and learning Variable
Number of years of other non-university education completed of person Education, training and learning Variable
Number of years of university education completed of person Education, training and learning Variable
School attendance of person Education, training and learning Variable
Secondary (high) school diploma or equivalency certificate of person Education, training and learning Variable
Trades certificates and diplomas of person Education, training and learning Variable
Year of certificate, diploma or degree completion of person Education, training and learning Variable
Census family status of person Families, households and marital status Variable
Census family structure (for census family) Variable Families, households and marital status Variable
Child presence in census family Children and youth; Families, households and marital status Variable
Child presence in economic family Families, households and marital status Variable
Economic family status of person Families, households and marital status Variable
Economic family structure (for economic family) Variable Families, households and marital status Variable
Family size (for census family) Families, households and marital status Variable
Family size (for economic family) Variable Families, households and marital status Variable
Gender diversity status of couple family Families, households and marital status Variable
Gender diversity status of marriage or common-law union of person Families, households and marital status Variable
Household living arrangements of person not in a census family Families, households and marital status Variable
Household maintainer status of person Families, households and marital status Variable
Household size of private household Families, households and marital status Variable
Household type of private household Families, households and marital status Variable
Housing suitability of private household Families, households and marital status Variable
Legal marital status of person Families, households and marital status Variable
Living in a family household (for person) Families, households and marital status Variable
Marital status of person Families, households and marital status Variable
Opposite- or same-sex married spouse or common-law partner of person Families, households and marital status Variable
Opposite- or same-sex status of couple family Families, households and marital status Variable
Relationship structure of stepfamily Families, households and marital status Variable
Sex of reference person of economic family Families, households and marital status Variable
Sex of reference person of lone-parent family Variable Families, households and marital status Variable
Type of census family Families, households and marital status Variable
Type of economic family Families, households and marital status Variable
Type of union of couple Variable Families, households and marital status Variable
Health problems, self-reported of person Health Variable
Health satisfaction, self-assessed of person Health Variable
Perceived health of person Health Variable
Accommodation type of collective dwelling Housing Variable
Bedrooms of private dwelling Housing Variable
Condominium status of private dwelling Housing Variable
Core housing need of private household Housing Variable
Dwelling condition of private dwelling Housing Variable
Persons per room of private household Housing Variable
Presence of mortgage payments of owner household Housing; Income, pensions, spending and wealth Variable
Rooms of private dwelling Housing Variable
Shelter cost of private household Housing; Income, pensions, spending and wealth Variable
Shelter-cost-to-income ratio of private household Housing; Income, pensions, spending and wealth Variable
Structural type of private dwelling Housing Variable
Subsidized housing of renter household Housing Variable
Tenure of private household Housing Variable
Value (owner estimated) of private dwelling Housing; Income, pensions, spending and wealth Variable
Admission category of immigrant Immigration and ethnocultural diversity Variable
Age at immigration of immigrant Immigration and ethnocultural diversity Variable
Applicant type of immigrant Immigration and ethnocultural diversity Variable
Citizenship of person Immigration and ethnocultural diversity Variable
Ethnic or cultural origin of person Immigration and ethnocultural diversity Variable
Generation status of person Immigration and ethnocultural diversity Variable
Immigrant status of person Immigration and ethnocultural diversity Variable
Place of origin of person Immigration and ethnocultural diversity Variable
Population group of person Immigration and ethnocultural diversity Variable
Visible minority of person Immigration and ethnocultural diversity Variable
Year of arrival of person Immigration and ethnocultural diversity Variable
Year of immigration of immigrant Immigration and ethnocultural diversity Variable
Adjusted after-tax income of economic family Income, pensions, spending and wealth Variable
Adjusted after-tax income of person not in economic family Income, pensions, spending and wealth Variable
Adjusted after-tax income of private household Income, pensions, spending and wealth Variable
Adjusted total income of private household Income, pensions, spending and wealth Variable
After-tax income of census family Income, pensions, spending and wealth Variable
After-tax income of economic family Income, pensions, spending and wealth Variable
After-tax income of person Income, pensions, spending and wealth Variable
After-tax income of private household Income, pensions, spending and wealth Variable
Income sources of person Income, pensions, spending and wealth Variable
Low-income status of economic family Income, pensions, spending and wealth Variable
Low-income status of person Income, pensions, spending and wealth Variable
Low-income status of private household Income, pensions, spending and wealth Variable
Number of earners of economic family Income, pensions, spending and wealth Variable
Total income of census family Income, pensions, spending and wealth Variable
Total income of economic family Income, pensions, spending and wealth Variable
Total income of person Income, pensions, spending and wealth Variable
Total income of private household Income, pensions, spending and wealth Variable
Enrollment under an Inuit land claims agreement of person Indigenous peoples Variable
Indigenous ancestry of person Indigenous peoples Variable
Indigenous group of person Indigenous peoples Variable
Indigenous identity of person Indigenous peoples Variable
Membership in a First Nation or Indian band of person Indigenous peoples Variable
Membership in a Métis organization or Settlement of person Indigenous peoples Variable
Registered or Treaty Indian status of person Indigenous peoples Variable
Residence inside or outside Inuit Nunangat of person Indigenous peoples Variable
Residence on or off reserve of person Indigenous peoples Variable
Absences from work of employed person Labour Variable
Class of worker of employed person Labour Variable
Class of worker of experienced labour force person Labour Variable
Class of worker of person with recent work experience Labour Variable
Commuting destination of employed person Labour Variable
Commuting duration of employed person Labour Variable
Commuting vehicle occupancy of employed person Labour Variable
Distance (straight-line) from home to work of employed person Labour Variable
Industry of employed person Labour Variable
Industry of person with recent work experience Labour Variable
Labour force status of person Labour Variable
Location of workplace of employed person Labour Variable
Main mode of commuting of employed person Labour Variable
Main reason for not working the full year of person 15 years or over Labour Variable
Main reason for working mostly part time of person 15 years or over Labour Variable
Military service status of person Labour Variable
Multiple modes of commuting of employed person Labour Variable
Occupation of employed person Labour Variable
Occupation of experienced labour force person Labour Variable
Occupation of person with recent work experience Labour Variable
Place of work status of employed person Labour Variable
Retirement of person aged 55 or over Labour; Older adults and population aging Variable
Time arriving at work of employed person Labour Variable
Time leaving for work of employed person Labour Variable
Unpaid care provided by person Labour Variable
Unpaid housework by person Labour Variable
Usual work hours of employed person Labour Variable
Work activity during the reference year of person 15 years or over Labour Variable
Work interruptions of person Labour Variable
All languages spoken at home of person Languages Variable
All languages used at work of person 15 years or over Languages Variable
First official language spoken of person Languages Variable
Knowledge of non-official languages of person Languages Variable
Knowledge of official languages of person Languages Variable
Language spoken most often at home of person Languages Variable
Language used most often at work of person 15 years or over Languages Variable
Mother tongue of person Languages Variable
Other language(s) spoken regularly at home of person Languages Variable
Other language(s) used regularly at work of person 15 years or over Languages Variable
Age of person Population and demography Variable
Age-specific fertility rate of females Population and demography Variable
Children ever born of female Population and demography Variable
Components of migration (in and out), five years, of geographic area Population and demography Variable
Components of migration (in and out), one year, of geographic area Population and demography Variable
Gender of person Population and demography Variable
Location of residence five years ago of person Population and demography Variable
Location of residence one year ago of person Population and demography Variable
Mobility status, five years, of person Population and demography Variable
Mobility status, one year, of person Population and demography Variable
Place of birth of parent of person Population and demography Variable
Place of birth of person Population and demography Variable
Sex at birth of person Population and demography Variable
Total fertility rate of females Population and demography Variable
Usual place of residence in Canada of person Population and demography Variable
Participation in religious activities of person Society and community Variable
Religion of person Society and community Variable
Building Housing; Construction Statistical unit
Building unit Housing; Construction Statistical unit
Census family Population and demography Statistical unit
Collective dwelling Housing; Construction Statistical unit
Company Business performance and ownership Statistical unit
Couple Population and demography; Families, households and marital status Statistical unit
Couple family Population and demography; Families, households and marital status Statistical unit
Couple family with children Population and demography; Families, households and marital status Statistical unit
Dwelling Housing; Construction Statistical unit
Economic family Population and demography; Families, households and marital status Statistical unit
Employed person Population and demography; Labour Statistical unit
Enterprise Business performance and ownership Statistical unit
Establishment Business performance and ownership Statistical unit
Experienced labour force person Population and demography; Labour Statistical unit
Female Population and demography Statistical unit
Foreign resident Population and demography; Immigration and ethnocultural diversity Statistical unit
Geographic area Population and demography Statistical unit
Household Population and demography; Families, households and marital status Statistical unit
Immigrant Population and demography; Immigration and ethnocultural diversity Statistical unit
Location Business performance and ownership Statistical unit
Lone-parent family Population and demography; Families, households and marital status Statistical unit
Non-permanent resident Population and demography; Immigration and ethnocultural diversity Statistical unit
Owner household Families, households and marital status; Housing Statistical unit
Person Population and demography Statistical unit
Person 15 years or over Population and demography Statistical unit
Person aged 55 or over Population and demography Statistical unit
Person not in a census family Population and demography; Families, households and marital status Statistical unit
Person not in economic family Population and demography; Families, households and marital status Statistical unit
Person with recent work experience Population and demography; Labour Statistical unit
Private dwelling Housing Statistical unit
Private household Families, households and marital status; Housing Statistical unit
Renter household Families, households and marital status; Housing Statistical unit
Stepfamily Population and demography Statistical unit

2026 Census of Agriculture dissemination consultative engagement

Opened: February 2024
Closed: April 2024

Before each Census of Agriculture, Statistics Canada conducts consultative engagement activities to obtain user feedback on the Census of Agriculture dissemination strategy and products.

Consultative engagement objectives

The consultative engagement activities provided opportunities for you to share feedback and indicate your satisfaction with the 2021 Census of Agriculture dissemination products. The results will inform decisions on the 2026 Census of Agriculture dissemination strategy, its products and its services.

How to get involved

This consultative engagement activity is now closed.

Individuals who wish to obtain more information or to participate in engagement activities should contact us at statcan.censusconsultation-consultationrecensement.statcan@statcan.gc.ca

Statistics Canada is committed to respecting the privacy of participants. All personal information created, held or collected by the agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Results

Summary results of the consultative engagement will be published online in fall 2024.