Supplement to Statistics Canada's Generic Privacy Impact Assessment related to the Longitudinal Immigration Statistical Environment (LISE)

Date: October 2022

Program manager: Director of Diversity and Sociocultural Statistics
Director General, Health, Justice, Diversity and Populations

Reference to Personal Information Bank (PIB)

In accordance with the Privacy Act, Statistics Canada has registered Personal Information Banks (PIBs) for its holdings of personal data contained in all databases linked through the LISE. These are: the Longitudinal Immigration Database (IMDB) (PPU 135), the Longitudinal Administrative Databank (LAD) (PPU 112), the Postsecondary Student Information System (PSIS) (PPU 090), the Registered Apprenticeship Information System (RAIS) (PPU 083), Health Research (PPU 076), and the Census of Population and National Household Survey (PPU 005). The use of personal information from these programs for linkage purposes is described in these PIBs as part of the consistent uses.

When supplementary data sources are integrated into the LISE, the relevant PIBs are added or updated as required.

Please refer to Information about Programs and Information Holdings for descriptions of these Personal Information Banks.

Description of statistical activity

The Longitudinal Immigration Statistical Environment (LISE) is a secure linkage environment containing anonymous linkage keys that connect one of two primary databases, either the Longitudinal Immigration Database (IMDB) or the Longitudinal Administrative Databank (LAD), with one of six secondary databases. Each linkage is pairwise: one primary database is linked with one secondary database. The secondary databases are five administrative health or education databases (the Discharge Abstract Database (DAD), the National Ambulatory Care Reporting System (NACRS), the Ontario Mental Health Reporting System (OMHRS), the Postsecondary Student Information System (PSIS) and the Registered Apprenticeship Information System (RAIS)) and the 2016 Census of Population. The program is conducted under the authority of the Statistics Act (Footnote 1). The databases are categorized and described as follows:

Pairwise linkages between databases
Primary database (immigration): Longitudinal Immigration Database (IMDB)

The IMDB combines administrative files on immigrant admissions and non-permanent resident permits from Immigration, Refugees and Citizenship Canada (IRCC) with tax files from the Canada Revenue Agency (CRA).

  • Information is available for immigrants admitted since 1952.
  • Information is available for individuals who obtained non-permanent resident permits since 1980.
  • Tax records for 1982 and subsequent years are available for immigrant and non-permanent resident tax filers (T1 Family File (T1FF)).
Primary database (tax): Longitudinal Administrative Databank (LAD)

The LAD is a random 20% sample of the T1 Family File (T1FF) tax database.

  • Selection for LAD is based on an individual's SIN.
  • There is no age restriction.
Secondary database (health): Discharge Abstract Database (DAD)

The DAD captures administrative, clinical and demographic information on hospital discharges.

  • It includes in-hospital deaths, sign-outs and transfers from all provinces and territories except Quebec.
  • Day surgery procedures, long-term care, rehabilitation and other types of care were added in later years.
  • The most recent version of the linkage keys connects the primary databases to the DAD from fiscal year 1994/1995 onward.
Secondary database (health): National Ambulatory Care Reporting System (NACRS)

The NACRS contains data for hospital-based and community-based ambulatory care, including day surgery, outpatient and community-based clinics, and emergency departments.

  • NACRS consists of five categories of data elements: demographic, clinical, administrative, financial and service-specific.
  • Information on discharges, deaths and transfers is provided within a fiscal year, from April 1 (start of fiscal year) to March 31 (end of fiscal year).
  • Day surgery procedures, diagnostic imaging visits and numerous clinic visits (including renal dialysis, cardiac catheterization, oncology and mental health) were added in later years.
  • The most recent version of the linkage keys connects the primary databases to the NACRS from fiscal year 2002/2003 onward.
Secondary database (health): Ontario Mental Health Reporting System (OMHRS)

The OMHRS contains data for individuals who receive services in designated adult inpatient mental health beds in the province of Ontario, as well as those who receive services in designated adult mental health facilities outside Ontario that voluntarily submit data to the reporting system.

  • OMHRS contains the following information: mental and physical health, social supports and service use, care planning, outcome measurement, quality improvement, and case-mix funding applications.
  • The most recent version of the linkage keys connects the primary databases to the OMHRS from fiscal year 2005/2006 onward.
Secondary database (education): Postsecondary Student Information System (PSIS)

The PSIS provides detailed information on enrolments in, and graduates from, Canadian public postsecondary institutions. The survey is a census with a cross-sectional design.

  • The PSIS contains information on the programs and courses offered at an institution, as well as on the students themselves and the programs and courses in which they are enrolled or from which they have graduated. The PSIS is also designed to collect continuing education data.
  • The most recent version of the linkage keys connects the primary databases to the PSIS from 2009 onward.
Secondary database (education): Registered Apprenticeship Information System (RAIS)

The RAIS is conducted on an annual basis as a census of all registered apprentices and trade qualifiers in Canada.

  • It compiles data on the number of registered apprentices taking in-class and/or on-the-job training in trades, which are either Red Seal or non-Red Seal, and where apprenticeship training is either compulsory or voluntary.
  • Information on the number of provincial and interprovincial certificates granted to apprentices or trade qualifiers (challengers) is also included.
  • The most recent version of the linkage keys connects the primary databases to the RAIS from 2008 onward.
Secondary database (demographics): Census of Population (2016)

The Census of Population (2016) is conducted every 5 years as a census with a cross-sectional design.

  • The short form (2A), which collects basic demographic information, surveys every household in Canada.
  • The long form (2A-L), which collects more detailed socio-economic information, surveys approximately 1 in 4 households.
  • This linkage key allows variables of interest, such as education at landing and knowledge of official languages at landing, among many others, to be examined in the context of the Census.
  • The most recent version of this linkage key connects the IMDB to the long-form 2016 Census.
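The coverage periods listed above for the DAD (1994/1995), NACRS (2002/2003) and OMHRS (2005/2006) are expressed in fiscal years running from April 1 to March 31. The following is a minimal Python sketch, with hypothetical helper names, of mapping a record date to its fiscal year and checking whether it falls within a linkage key's coverage period; it is an illustration only, not part of the LISE itself.

```python
from datetime import date


def fiscal_year_label(d: date) -> str:
    """Return the fiscal year (April 1 to March 31) containing d, e.g. '2002/2003'."""
    # Dates from January to March belong to the fiscal year that began the previous April.
    start_year = d.year if d.month >= 4 else d.year - 1
    return f"{start_year}/{start_year + 1}"


def within_coverage(d: date, first_fiscal_year: str) -> bool:
    """True if d falls in the given fiscal year (e.g. '2002/2003') or any later one."""
    first_start = int(first_fiscal_year.split("/")[0])
    return d >= date(first_start, 4, 1)


# Example: NACRS keys cover fiscal year 2002/2003 onward.
print(fiscal_year_label(date(2003, 3, 31)))             # 2002/2003
print(fiscal_year_label(date(2003, 4, 1)))              # 2003/2004
print(within_coverage(date(2002, 3, 31), "2002/2003"))  # False
```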

Development

The development of the LISE was initiated by the Ontario Ministry of Children, Community and Social Services (MCCSS) to support evidence-based decisions about programs and services delivered to the Syrian refugee cohort and future refugee cohorts.

This new linkage environment supports a wide range of research on the relationships between employment, education, health status and social inclusion over time. It enables comparisons between the refugee group, other types of immigrants and native-born Canadians, to inform provincial and national policies related to the resettlement and integration of the immigrant population, and of refugees in particular.

Statistics Canada will make the LISE data and its analytical capacity known to Canadians through a variety of products, such as aggregated data tables and analytical reports (e.g., articles in The Daily). The results will benefit Canadian society by, for example, providing researchers and policy makers with information on earnings by immigrant admission category; comparing healthcare resource use by immigrants across admission cohorts; and providing insights into the pre-admission characteristics of international students that play a significant role in postsecondary education participation and completion. These results can inform future immigration policies that benefit the socioeconomic and overall well-being of all Canadians in the long run.

To address additional related research questions, further datasets can be integrated into the LISE using the Social Data Linkage Environment (SDLE) (Footnote 2), described in more detail below. Personal Information Banks for these new datasets will be added or updated as required. Additional years of data from existing sources can also be added to keep the LISE timely.

Privacy and confidentiality

The personal identifiers obtained for immigrants and non-immigrants are used in the SDLE to assign the anonymous statistical identifiers that allow Statistics Canada to link to other sources of information for statistical analysis and research, once approval has been obtained in accordance with Statistics Canada's Directive on Microdata Linkage. The personal identifiers are separated from the rest of the information and stored securely, with access restricted to a limited number of Statistics Canada employees who have an approved operational requirement to access them and whose access is removed when no longer required. The retention period for their storage, and their destruction, are prescribed by Statistics Canada's Directive on the Management of Statistical Microdata Files.

The integrated datasets in the LISE are subject to the confidentiality requirements of the Statistics Act. As with all data collected under the Statistics Act, the integrated analytical datasets available for research do not contain any personal identifiers. Access is granted only to researchers who have become deemed employees of Statistics Canada, after they have obtained a security clearance and sworn an oath of confidentiality under the Statistics Act. Data access is approved for a specific purpose and for a specified period of time, and must occur in a secure setting, such as Statistics Canada offices, the Research Data Centres or approved access to the Virtual Data Lab (VDL) environment. Statistics Canada vets all output for confidentiality before it is removed from the secure setting or released to the public.

Only aggregated, non-confidential statistical information will be made publicly available; as such, individuals will not be identifiable in any product disseminated to the public. No personal information will be disclosed without the consent of the original data collector and the authorization of the Chief Statistician, as required by the Statistics Act.

Reason for supplement

While the Generic Privacy Impact Assessment (PIA) presents and addresses most of the privacy principles and security risks related to statistical activities conducted by Statistics Canada, this supplement addresses any additional privacy risks associated with this new data environment. As with all PIAs, Statistics Canada's privacy framework ensures that elements of privacy protection and privacy controls are documented and applied.

Necessity and Proportionality

The use of personal information for the LISE can be justified against Statistics Canada's Necessity and Proportionality Framework:

1. Necessity

The LISE was requested and funded by the Ontario Ministry of Children, Community and Social Services. MCCSS has commissioned Statistics Canada to create a linkage environment that would allow MCCSS to study the outcomes of immigrants across multiple indicators, such as their economic resilience, educational status, social integration, and their physical and mental health.

The LISE enables the integration of different datasets, both longitudinal and cross-sectional, to help address a wide range of priority policy questions pertaining to immigrant mobility, educational status, social integration, physical and mental health, as well as labour market outcomes over time, that could not be addressed with the underlying datasets alone. The LISE facilitates the production and publication of analysis, indicators and data tables on these topics.

2. Effectiveness

The LISE is essentially a centralized data environment that provides access to indicators for data integration and research activities. This new linkage environment supports a wide range of research on the relationships between employment, education, physical health, mental health and social inclusion over time.

The linkages to health and education administrative data, as well as to the 2016 Census, expand the research indicators available for the LAD and the IMDB. They enable comparisons between the refugee group, other types of immigrants and native-born Canadians, to inform provincial as well as national policies related to the resettlement and integration of the immigrant population, and of refugees in particular.

The linkage environment reduces or removes the redundancy of separate data integration projects (e.g., LAD-IMDB, Census-IMDB), relieving financial and personnel burdens on Canadian taxpayers and the Government of Canada.

The LISE is also in line with Statistics Canada's mandate to provide disaggregated data, offering new indicators on all Canadians and especially on vulnerable population groups (e.g., Syrian refugees).

As a linkage environment of multiple administrative databases (and the Census), the LISE offers more accurate information on some key variables, such as immigration admission category, than could be obtained from surveys, owing to the size and geographical distribution of the population of interest and the technical nature of some of these variables.

Integrating administrative education and health databases with the IMDB and the LAD will provide contextual and outcome information for immigrants, non-permanent residents and Canadians, filling existing data gaps without adding any response burden. For example, large-scale investigations of the "healthy immigrant effect" are scarce. This commonly observed effect refers to the fact that immigrants usually arrive in their host countries in better health, on average, than their native-born counterparts, but this health advantage gradually disappears over time. In this context, health care use recorded in health administrative databases can serve as a proxy for health status, so linkages between the IMDB (or the LAD) and health data are an effective tool for examining immigrants' physical and mental well-being and how it changes over time.

Access to the analytical datasets and the LISE anonymous matching keys expands research opportunities using this rich information and enables evidence-based research and policy analysis by researchers and by governmental and non-governmental stakeholders. The LISE helps decision makers develop well-informed policies that aim to:

  • facilitate a smooth and effective process of resettlement and integration for refugees, newcomers and the immigrant population in general;
  • contribute to building a healthier Canadian society by enhancing Canadian residents' access to healthcare services, in particular for vulnerable groups;
  • boost opportunities for educational/skill attainment among disadvantaged children and youth, such as those from refugee families and/or those residing in remote rural areas; and
  • improve all Canadians' health, socioeconomic and overall well-being by, for example, stimulating economic development with increasingly diverse labour forces, channelling immigrant workforces to jobs that are in high demand, and creating new jobs and services that benefit all Canadians.

3. Proportionality

The methods and practices behind the LISE (and the SDLE) have been designed to ensure the protection of privacy and personal information, while retaining the ability to integrate analytical variables from different existing sources to fill data gaps.

More specifically, by building linkages between the IMDB (or the LAD) and health and education administrative databases as well as the Census, the LISE makes it possible to examine complex phenomena that are often difficult, if not impossible, to understand when existing data sources are analyzed in isolation. For example, earnings of international students who completed different types of education or training programs can be compared using administrative data rather than survey data, greatly reducing the burden on respondents. Pathways through postsecondary education can be examined for Canadian and international students of various ages and sociodemographic profiles, while economic outcomes can support analysis of skill utilization. Program completion rates for immigrant and non-immigrant students or apprentices can be compared using several years of existing administrative data, rather than waiting for one or more cycles of a new survey to be completed.

Beyond the immigrant population, the research capacity of the LISE benefits all Canadians. For instance, the linkages between the LAD and health administrative databases provide first-hand information on Canadians' access to healthcare services by demographic characteristics, by geographic areas, as well as over time. Such investigations will effectively inform policy making that provides timely, accessible and sufficient healthcare services to all immigrants as well as vulnerable population groups and Canadians in general, in order to maintain a healthy workforce for a sustainable and resilient Canadian society and economy.

In addition, the LISE enables in-depth data disaggregation to focus on particularly vulnerable population groups. Within the refugee population, little is known about differences among government-assisted refugees, privately sponsored refugees and Protected Persons in Canada in terms of their mobility patterns after arriving in Canada, their access to healthcare services, their integration into Canadian labour markets, and their children's access to and interactions with the Canadian educational system. Such research can help adjust programs such as the Resettlement Assistance Program and facilitate a smooth integration process for all refugees.

In short, the development of the LISE allows for additional research opportunities using the core datasets to inform policy and practice. By integrating Statistics Canada's current administrative datasets, new and expanded statistical analysis can be undertaken. It also enables future linkage work with other Statistics Canada administrative and survey-based databases – again, enriching and expanding analytical opportunities to better inform public policy and research.

The associated invasion of privacy is proportional to the research, policy and other benefits the LISE offers to vulnerable population groups, immigrants and Canadians in general. In most cases, the most privacy-intrusive data are basic personal information for refugee claimants (e.g., date of claim, country of birth, postal code), as there are risks associated with being identified, especially for those living in a sparsely populated postal code area. For linkages involving the Census, Census information might also make it possible to identify racialized groups among immigrants. However, this information is also the most important information for the IMDB, as well as for the immigrant sub-population in the LAD, as it enables robust longitudinal data integration. Because some racialized groups and refugee groups are small, their members may also face a risk of being identified from health and education administrative data records, particularly when these records are further grouped by immigration category, refugee program, country of origin, landing destination, etc.

After the application of Statistics Canada's dissemination rules, this privacy intrusion is deemed minimal and risk of external reidentification is low. This minimal intrusion will lead to the provision of better services to immigrants (including refugees) and the creation of policies that improve the settlement process, ultimately benefitting not only the immigrant population but also the Canadian society and economy as a whole. Policy makers can access the analytical and statistical products from the LISE to shape new policies and programs. Likewise, immigrant service providers can access the data to determine the services required to help immigrants settle in Canada successfully.

Because it is built on the IMDB and the LAD, the LISE also shares the privacy concerns of those two databases. For example, tax microdata provide detailed information on each individual's industry of employment and income sources, while immigration data identify source countries and year of immigration. In some cases, the postal code from tax microdata (the lowest level of geography in the database) can be a low enough level of geography to identify individuals with unique characteristics, for example, members of minority population groups residing in remote areas. However, sub-provincial data are often required to respond to the needs of individuals at their place of residence (to more effectively guide local policy and support services), and any dissemination is subject to Statistics Canada suppression rules, which minimize the chance of such reidentification. For immigrants in particular, socioeconomic outcomes vary with place of residence, time since admission and pre-admission characteristics, which supports the use of this personal information as proportional to the public good resulting from its use.

4. Alternatives

The LISE provides the keys to match annual records over time. Without this option for comprehensive longitudinal data, analyzing the pathways of Canadians and of permanent and non-permanent residents through education or healthcare institutions would be impossible. Longitudinal indicators using these keys capture changing trajectories over time and enhance the analytical capabilities of the IMDB and the LAD. Matching administrative data to additional health and education information allows analysis at a refined level of geography and enables analysis of vulnerable sub-populations. In an approved secure environment, employees and deemed employees can analyze, on an annual basis, the relationships between population characteristics and subsequent labour market, healthcare or education outcomes. No other sources allow such detailed analysis. Survey sources are limited by sample size, response rates, less frequent collection and a lack of granularity in the data. Moreover, Statistics Canada has longstanding evidence that response rates to longitudinal surveys decline considerably over time, introducing biases and substantially reducing quality and accuracy. For these reasons, most longitudinal surveys have been discontinued.

Creating a new longitudinal survey capable of capturing the same variables over the length of time required for longitudinal analysis would require a great deal of time and resources, add significantly to respondent burden, and be subject to the survey limitations mentioned above, which could make the survey data less accurate or of lower quality than the administrative data in question.

Moreover, the health and education administrative data sources contain objective and accurate data collected by trained professionals using pre-established classification systems. Such information is much more reliable than information obtained through surveys that collect self-reported information, which is often biased by respondents' memories, feelings and judgements. For example, the DAD receives information directly from acute care facilities (or from their respective health/regional authority or ministry/department of health), with a focus on standardized, patient-centred clinical outcomes. Since 2004/2005, all DAD records have been reported using ICD-10-CA (International Statistical Classification of Diseases and Related Health Problems) and CCI (Canadian Classification of Health Interventions) (Footnote 3). The NACRS pick-lists are the Presenting Complaint List (data element 136) and ED Discharge Diagnosis (data element 137). The Canadian Emergency Department Diagnoses Shortlist (CED-DxS) includes more than 800 diagnoses in common terms, which are mapped to ICD-10-CA codes (Footnote 4). The OMHRS collects data using the Resident Assessment Instrument – Mental Health (RAI-MH) version 2.0, a standardized clinical instrument used to regularly assess people receiving inpatient mental health care (Footnote 5). All of these health administrative databases support the collection and analysis of clinical outcomes, health system use and performance reporting. Likewise, the two education administrative databases collect objective information about students' enrolment in and graduation from programs at postsecondary institutions (including vocational and training schools) across the country. Linking to these administrative databases greatly expands the analytical capabilities of the IMDB and the LAD, drawing on the most reliable and objective information sources while minimizing respondent burden and data collection costs.

Mitigation factors

The integrated datasets in the LISE do not contain any personal identifiers and are subject to the privacy and confidentiality requirements of the Statistics Act. Anonymous linkage keys are created using the Social Data Linkage Environment (SDLE), for which a separate privacy impact assessment has been approved. Furthermore, the privacy principle of limited collection is applied to data linkages: each linkage integrates only one primary database with one secondary database at a time, which further mitigates the risks described above:

Privacy principles of limited collection
Primary database, one of either: IMDB or LAD
Integrated with one secondary database, one of: DAD, NACRS, OMHRS, PSIS, RAIS or Census
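The limited-collection rule above can be expressed as a simple check on a linkage request. The following is a minimal Python sketch, assuming a request is given as a list of database short names; the function name and data structures are hypothetical and do not describe how the SDLE actually enforces the rule.

```python
# Hypothetical illustration of the one-primary, one-secondary rule.
PRIMARY = {"IMDB", "LAD"}
SECONDARY = {"DAD", "NACRS", "OMHRS", "PSIS", "RAIS", "Census"}


def validate_linkage_request(databases: list[str]) -> tuple[str, str]:
    """Return (primary, secondary) if the request pairs exactly one of each.

    Raises ValueError for any other combination, such as two primaries,
    two secondaries, an unknown database, or more than two databases at once.
    """
    if len(databases) != 2:
        raise ValueError("a linkage must involve exactly two databases")
    primaries = [d for d in databases if d in PRIMARY]
    secondaries = [d for d in databases if d in SECONDARY]
    if len(primaries) != 1 or len(secondaries) != 1:
        raise ValueError("a linkage must pair one primary with one secondary database")
    return primaries[0], secondaries[0]


print(validate_linkage_request(["PSIS", "IMDB"]))  # ('IMDB', 'PSIS')
try:
    validate_linkage_request(["IMDB", "DAD", "PSIS"])
except ValueError as err:
    print("rejected:", err)
```

A fuller version would also check which key versions exist for a given pair; for example, the current Census linkage key connects the IMDB to the long-form 2016 Census.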

The personal identifiers obtained for immigrants, non-permanent residents and Canadians are used in the SDLE to assign the anonymous statistical identifiers that allow Statistics Canada to link to other sources of information for statistical analysis and research, once approval is obtained in accordance with Statistics Canada's Directive on Microdata Linkage. The personal identifiers are separated from the rest of the information and stored securely, with access restricted to no more than twenty-five Statistics Canada employees who have an approved operational requirement to access them and whose access is removed when no longer required. The retention period for their storage, and their destruction, are prescribed by Statistics Canada's Directive on the Management of Statistical Microdata Files. As mentioned, refugee claimants, particularly those residing in sparsely populated areas or belonging to small racialized groups, may still be exposed to a higher risk of reidentification. To protect respondents' confidentiality, any group with a small number of members will be collapsed into a larger group, for example, by using a higher level of geographic classification or by combining an ethnic group or country of origin with adjacent ethnic groups or countries.
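The collapsing of small groups described above can be sketched as follows. This is a minimal Python illustration, assuming each record carries both a detailed category (e.g., country of origin) and a broader parent category (e.g., world region); the minimum group size of 10 and all names are hypothetical, since the actual thresholds follow Statistics Canada's confidentiality rules.

```python
from collections import Counter

# Hypothetical threshold; actual minimum group sizes are set by
# Statistics Canada's confidentiality rules and are not stated here.
MIN_GROUP_SIZE = 10


def collapse_small_groups(records, fine_key, coarse_key, min_size=MIN_GROUP_SIZE):
    """Assign each record a reporting group, falling back to a broader category.

    records: list of dicts, one per person
    fine_key: detailed grouping field (e.g. country of origin)
    coarse_key: broader grouping field (e.g. world region)

    Records whose detailed group has fewer than min_size members are
    reported under their broader group instead.
    """
    counts = Counter(r[fine_key] for r in records)
    return [
        {**r, "reporting_group": r[fine_key] if counts[r[fine_key]] >= min_size else r[coarse_key]}
        for r in records
    ]


records = (
    [{"country": "A", "region": "R1"}] * 12
    + [{"country": "B", "region": "R1"}] * 3
    + [{"country": "C", "region": "R2"}] * 2
)
groups = Counter(r["reporting_group"] for r in collapse_small_groups(records, "country", "region"))
print(dict(groups))  # {'A': 12, 'R1': 3, 'R2': 2}
```

As the example shows, a collapsed group can itself remain below the threshold, so a fuller version would repeat the step at an even broader level or suppress the remaining small cells before release.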

Therefore, the overall risk of harm to the individuals whose information resides in the LISE has been deemed manageable with existing Statistics Canada safeguards that are described in Statistics Canada's Generic Privacy Impact Assessment. These include, but are not limited to:

Collection

The information is transmitted electronically to Statistics Canada using a secure electronic file transfer protocol.

Storage & processing

Identity and roles-based access management controls are in place in support of least-privilege and need-to-know principles. Access is restricted to employees who have demonstrated a valid requirement to access the data. Data is fully encrypted at rest and in transit as required by GC policy, and encryption keys are managed by the Government of Canada to ensure that only authorized users can decrypt the data. Statistics Canada's cloud implementation aligns with Treasury Board of Canada Secretariat (TBS) cloud services direction, including the Government of Canada Cloud Security Risk Management Approach and Procedures.
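The least-privilege and need-to-know principles above, together with access approved for a specific purpose and period, can be illustrated by a deny-by-default check against explicit, time-limited grants. This is a minimal Python sketch under that assumption, not a full role-based access control system and not Statistics Canada's implementation; the class, function and dataset names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of an approved access request."""
    user: str
    dataset: str
    purpose: str
    expires: date  # access is approved for a specified period only


def may_access(grants, user, dataset, today):
    """Deny by default; allow only with an unexpired grant for this user and dataset."""
    return any(
        g.user == user and g.dataset == dataset and today <= g.expires
        for g in grants
    )


grants = [AccessGrant("analyst1", "LISE_IMDB_DAD", "approved project", date(2023, 3, 31))]
print(may_access(grants, "analyst1", "LISE_IMDB_DAD", date(2023, 1, 10)))  # True
print(may_access(grants, "analyst1", "LISE_IMDB_DAD", date(2023, 4, 1)))   # False: grant expired
print(may_access(grants, "analyst2", "LISE_IMDB_DAD", date(2023, 1, 10)))  # False: no grant
```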

After initial processing, a statistical identifier is generated by Statistics Canada to facilitate data integration. As per standard practice, following linkages with other sources of information, data is stripped of direct identifiers such as name and address, to help protect privacy and confidentiality.
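The separation of direct identifiers from analytical data can be illustrated with a simple pseudonymization step: each person receives a random statistical identifier, and the mapping from the original identifier is kept in a separate key map that would be stored apart under restricted access. This is a minimal Python sketch with hypothetical field names; it is not the SDLE's actual method, which is covered by its own privacy impact assessment.

```python
import secrets

# Hypothetical field names; the SDLE's actual identifiers and methods differ.
DIRECT_IDENTIFIERS = ("name", "address", "sin")


def pseudonymize(records, person_key="sin", direct_identifiers=DIRECT_IDENTIFIERS):
    """Split records into a de-identified analytical file and a separate key map.

    Each distinct person receives a random statistical identifier. The key map
    linking the original identifier to that statistical identifier is returned
    separately so that it can be stored apart from the analytical data.
    """
    key_map = {}  # original identifier -> statistical identifier
    used = set()
    analytical = []
    for r in records:
        pid = r[person_key]
        if pid not in key_map:
            new_id = secrets.token_hex(8)
            while new_id in used:  # regenerate on the (very unlikely) collision
                new_id = secrets.token_hex(8)
            used.add(new_id)
            key_map[pid] = new_id
        clean = {k: v for k, v in r.items() if k not in direct_identifiers}
        clean["stat_id"] = key_map[pid]
        analytical.append(clean)
    return analytical, key_map


rows = [
    {"sin": "X1", "name": "Ann", "address": "1 Main St", "year": 2016, "income": 41000},
    {"sin": "X1", "name": "Ann", "address": "1 Main St", "year": 2017, "income": 43000},
    {"sin": "X2", "name": "Bo", "address": "2 Oak Ave", "year": 2016, "income": 38000},
]
data, keys = pseudonymize(rows)
print(data[0].keys())  # dict_keys(['year', 'income', 'stat_id'])
```

Because the same person always maps to the same statistical identifier, records from different years can still be linked in the analytical file without any direct identifier being present.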

Access

Access to any confidential data held by Statistics Canada is closely monitored. For information with personal identifiers, only a limited number of employees with a work-related need-to-know are allowed access.

The purpose of making the LISE available to approved researchers (as 'deemed employees') (Footnote 6) through Statistics Canada's Research Data Centres (RDCs) is to ease data access for users while enforcing the security requirements for the microdata. Access to the LISE at RDCs is granted only after a successful security screening and on a need-to-know basis.

Dissemination

The Statistics Act (Footnote 7) provides the legal basis for maintaining the confidentiality of personal information that Statistics Canada collects. Statistics Canada will not disclose confidential information to any third party, other than with the permission of the original data provider and the authorization of the Chief Statistician, as required by the Statistics Act.

Statistics Canada will publish only aggregated statistical information or anonymized public use microdata files as part of its general dissemination strategy.

Openness

Information concerning datasets available within the LISE through Statistics Canada's Research Data Centres or approved LISE-based research projects is available on Statistics Canada's website. This supplement to Statistics Canada's Generic PIA will also be made publicly available on Statistics Canada's website.

Conclusion

This assessment concludes that, with the existing Statistics Canada safeguards, any remaining risks are ones that Statistics Canada is prepared to accept and manage.

Formal approval

This Supplementary Privacy Impact Assessment has been reviewed and recommended for approval by Statistics Canada's Chief Privacy Officer, Director General for Modern Statistical Methods and Data Science, and Assistant Chief Statistician for Social, Health and Labour Statistics.

The Chief Statistician of Canada has the authority for section 10 of the Privacy Act for Statistics Canada, and is responsible for the Agency's operations, including the program area mentioned in this Supplementary Privacy Impact Assessment.

This Privacy Impact Assessment has been approved by the Chief Statistician of Canada.


Registered Apprenticeship Information System (RAIS) Guide, 2021

Concepts used by the Registered Apprenticeship Information System (RAIS)

Designated trades

Apprenticeship training and trade qualifications in Canada are governed by the provincial and territorial jurisdictions. These jurisdictions determine the trades for which apprenticeship training is made available, as well as the trades for which certificates are granted. These are referred to as designated trades. The jurisdictions also determine which of the designated trades require certification in order to work unsupervised in the trade. The list of designated trades varies considerably between jurisdictions. Data from the Registered Apprenticeship Information System (RAIS) include those trades that are designated in at least one province or territory.

Registered apprentices are people who are in a supervised work training program in a designated trade within their provincial or territorial jurisdiction. The apprentice must be registered with the appropriate governing body (usually a Ministry of Education or Labour, or a trade-specific industry governing body) in order to complete the training.

Trade Qualifiers or Trade Challengers are people who have worked in a specific trade for an extended period of time, without necessarily having ever been an apprentice, and who have received certification from a jurisdiction, usually through a skills assessment examination in the trade.

Registrations

The total number of registrations in apprenticeship programs is the count of all registrations during the reporting period (January to December of the calendar year) within one of the 13 jurisdictions (provinces and territories).

Total registrations = Already registered + New registrations + Reinstatements

  • Already registered - the number of registrations carried forward from the previous calendar year
  • New registrations - new entrants to any apprenticeship program during the 12-month reporting period
  • Reinstatements - registrations by people who had left an apprenticeship program in a specific trade in a previous year and had returned to the same apprenticeship program during the reporting period
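
The registration identity above is simple addition over three components. The minimal Python sketch below illustrates it; the class name, field names and counts are hypothetical, chosen for illustration only, and are not RAIS variable names or published figures.

```python
from dataclasses import dataclass


@dataclass
class RegistrationCounts:
    """Hypothetical registration counts for one trade, jurisdiction and calendar year."""

    already_registered: int  # carried forward from the previous calendar year
    new_registrations: int   # new entrants during the 12-month reporting period
    reinstatements: int      # people returning to the same program they left in a previous year

    def total(self) -> int:
        # Total registrations = Already registered + New registrations + Reinstatements
        return self.already_registered + self.new_registrations + self.reinstatements


# Illustrative numbers only, not RAIS data.
counts = RegistrationCounts(already_registered=1200, new_registrations=300, reinstatements=25)
print(counts.total())  # 1525
```

Because already registered apprentices are carried forward, total registrations for a year are not the same as new entrants in that year.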

Red Seal and non-Red Seal Programs

The Red Seal Program sets common standards for assessing the skills of tradespersons across Canada in specific trades, referred to as the "Red Seal" trades. Tradespersons who meet the Red Seal standards, through examination, receive a Red Seal endorsement on their provincial or territorial trade certificates. The Red Seal endorsement recognizes that a certificate meets an interprovincial standard that is recognized in every province and territory.

Non-Red Seal trades do not have interprovincial standards. Many of these trades do not have an examination requirement in order to work in the trade.

Certification

The requirements for granting a certificate vary by jurisdiction in Canada. In most instances, an apprentice is issued a certificate after completing requirements such as supervised on-the-job training and technical training, and passing one or more examinations. Most trade qualifiers (Challengers), meanwhile, become certified once they pass an examination.

Certification terminology

There are jurisdictional differences in the names of certificates awarded.

They may include:

  • Certificate of Apprenticeship
  • Diploma of Qualification
  • Certificate of Qualification
  • Journeyperson's Certificate
  • Certificat d'aptitude
  • Certificat de compagnon
  • Certificat de compétence
  • Diplôme d'apprentissage

Federal, provincial and territorial changes pertinent to the interpretation of RAIS data

1. Revisions have been made to the Quebec 1991 to 2005 data, which also changed the previous Canada totals.

2. Prior to 1999, Nunavut was part of the Northwest Territories.

3. Starting in 2003, a change occurred in the reporting of Newfoundland and Labrador's information concerning newly registered apprentices and cancellations/suspensions.

4. The British Columbia data for 2005 have been revised. This changed the previous Canada totals for 2005.

5. Starting with the 2005 reporting year, Prince Edward Island changed its information system, which may have affected historical comparisons. At the end of 2006, Prince Edward Island made some adjustments and revisions to its database, which account for the change in the carry-over of registered apprentices at the beginning of 2007. In 2007, the increase in new registrations was, to some extent, related to demand for skilled workers outside the province. In 2008, due to technical difficulties during the redesign of its Registered Apprenticeship Information System, Prince Edward Island was not able to report a number of apprentices.

6. In 2006, minor trade code revisions were made to Manitoba's data.

7. In 2006 and 2007, differences may occur in Ontario in the carry-over totals of active apprentices between the two years. This is a result of the conversion of client data into Ontario's new database system. As a result, inactive clients were cleaned up, which adjusted the total of active registered apprentices and their carry-over into 2007.

8. As of 2008, the portion of total Quebec trade information coming from Emploi-Quebec (EQ) is no longer being provided in aggregated form. The data from the province includes all trades with the exception of the automotive sector.

9. In 2008, Alberta incorrectly included the Industrial warehousing trade with the Partsperson and Partsperson (material) trades and also excluded the Construction Craft Worker trade.

10. In 2008, a distinctive feature of the Rig Technician trade in Ontario is that, although individuals may be registered as apprentices in the trade, their certificates are granted as trade qualifiers (Challengers).

11. In 2008, Alberta reported a large number of discontinued apprentices as a result of implementing a series of cancellations and suspensions of inactive apprentices.

12. In 2008 and 2009, new Quebec legislation affecting trades in the Emploi-Quebec (EQ) sector was introduced. This resulted in some changes in the reporting of apprenticeship registrations.

13. An adjustment has been made to the Joiner trade in British Columbia, to include the trade in the Interior finishing major trade group, rather than in the previous Carpenter's major trade group.

14. In 2010, the Emploi-Quebec (EQ) data included revised trade programs in which some trades were segmented into several levels. As a result, a single apprentice may have multiple registrations and completions where previously only one registration and one completion existed for that individual.

15. In 2011, the Electronics technician (Consumer Products) trade was no longer designated as a Red Seal trade.

16. In 2012, the Gasfitter - Class A and Gasfitter - Class B trades were designated as Red Seal trades.

17. In 2013, changes in provincial regulations governing drinking-water-related trades reported by Emploi-Quebec (EQ) resulted in program changes, as well as the transfer of responsibility for some of these trades to the Conseil de la Construction du Québec (CCQ).

18. Beginning in 2013, Ontario's data are received from two organizations. The registration data continue to be reported by the Ministry of Advanced Education and Skills Development (MASED), which is also responsible for issuing Certificates of Apprenticeship upon the completion of technical training and on-the-job hours. The Ontario College of Trades (OCOT) is responsible for reporting data on Certificates of Qualification, which are issued to apprentices upon the completion of a certification exam. This administrative practice has affected the RAIS data in several ways:

  1. On April 8, 2013, MASED awarded a Certificate of Apprenticeship to approximately 6,000 apprentices who had completed their technical training and on-the-job hours, and had not yet received a Certificate of Qualification.
  2. There are discrepancies in the number of apprentices in Ontario because MASED and OCOT define an apprentice differently. OCOT considers apprentices to be its members, for whom it has received membership applications with payment of annual membership fees. MASED considers apprentices to be individuals for whom it has received signed training agreements. In the MASED registration data, apprentices can have active or inactive status, which can also contribute to discrepancies. Inactive apprentices are those for whom MASED has not received information about their progress in their apprenticeship program for more than a certain period of time. Both active and inactive apprentices are included in the RAIS data. As such, the RAIS data may include previously registered apprentices who have since discontinued their apprenticeship program but have not yet informed MASED.
  3. Beginning in 2013, apprentices who had discontinued their apprenticeship programs in the past, but who remained on the database as already registered apprentices, began to be removed from MASED records. These removals appear in the RAIS data files of the following years. The clean-up occurred in odd years (2013, 2015 and 2017). In discussions in 2019, the Ontario data partners indicated that the last of these batch discontinuations was completed in 2017. As a result, there should be smaller spikes in discontinuations and a more normal trend from 2018 onward. Normal discontinuation figures for the province are expected to be about 5,000 to 7,000 per year.
  4. In 2014 and 2015, apprentices who did not receive their Certificate of Qualification or Certificate of Apprenticeship in the same year were classified as trade qualifiers (Challengers) rather than apprentices. To align the RAIS data with the standard definition of trade qualifiers (Challengers), these records were reclassified as apprentices with the release of the 2016 RAIS data. This revision led to a decrease of about 2,600 trade qualifiers (Challengers) in Ontario in both 2014 and 2015 compared with the previously released data.

19. In 2013, a regulatory change came into effect which affects both Ornamental ironworkers and Structural steel erectors under the jurisdiction of the Conseil de la Construction du Québec (CCQ). Workers in these two trades are now considered Ironworkers. Both the 2014 and 2015 reference years were also impacted by these regulatory changes.

20. In 2013, changes were made to the Automotive Service Technician trades in British Columbia. Apprentices no longer have to complete mandatory work-based training hours at each program level before progressing to the next level of technical training. The 2014 reference year was also impacted by these changes.

21. Certificates in the Steamfitter/Pipefitter trade under the Conseil de la Construction du Québec (CCQ) also include Plumbers.

22. Starting in 2013, Building/Construction Metalworkers are coded to Metal Workers (other) instead of being included in the 'Other' category.

23. In 2014, the Heavy Equipment Operator (Dozer), Heavy Equipment Operator (Excavator) and Heavy Equipment Operator (Tractor-Loader-Backhoe) trades were designated as Red Seal trades.

24. Trade qualifiers (Challengers) in trades governed by Emploi-Quebec (EQ) represent certificates granted to individuals who received recognition for previously completed training. Emploi-Quebec (EQ) may, for example, recognize training where an individual holds a certificate from another province, territory or country, or has received a Diploma of Vocational Studies (DVS) in Quebec. These trade qualifiers (Challengers) also represent certificates granted as part of the regular re-certification process required in certain trades.

25. In March of 2014, there were changes made to the eligibility for the Apprenticeship Training Tax Credit (ATTC) in Ontario. This may have affected registration counts in some trades including those for information technology.

26. Prior to 2014, three welder programs (level A, level B, and level C) were offered in British Columbia. Starting in 2014, these three programs began to be phased out and replaced by a single apprenticeship program for welders. This change will impact registrations and certifications in this trade for the years following 2014.

27. Starting in 2017, changes are being made to the Automotive Service Technician program in British Columbia. The program is being restructured to align with the Automotive Service Technician Red Seal programs of other Canadian jurisdictions. These changes affected reinstatement totals for 2017 and may influence registration counts for years following 2017.

28. In July 2018, Manitoba announced that it will perform a data clean-up every two years, starting with the 2019 reporting year. This clean-up resulted in lower numbers for both registrations and certifications for the 2019 reporting year.

29. In 2013, the structural steel erector and ornamental ironworker trades merged to become the ironworker trade. Transitional measures were put in place for journeypersons in these trades, which ended in July 2018.

30. British Columbia has some broad categories of trades in which a certificate can be received after each level is completed, whereas other jurisdictions certify apprentices only after they complete the final level.

  1. In 2019, the Industry Training Authority (ITA) made a decision to group some of their trades under one general trade. For example, Automotive Service Technician 1, Automotive Service Technician 2, and Automotive Service Technician 3 were combined into Automotive Service Technician.
  2. All the trades under Welder were not consolidated, but a general version of the Welder trade was created in 2019.
  3. Also, some apprenticeships were deactivated for certain trades and replaced by Challenge Pathway only, which is for trade qualifiers. Rig Technician, Petroleum Equipment Service Technician, and Water Well Driller are examples of these trades.

31. Starting December 1, 2019, British Columbia no longer offers technical training for the Rig Technician apprenticeship program. Apprentices continuing in this trade were taking their technical training in Alberta; however, Alberta no longer offers technical training for this trade and is in the process of de-designating this apprenticeship. Individuals can still receive a designation in the trade by challenging the exam in British Columbia.

32. In 2020, as a result of the COVID-19 pandemic, some provinces cancelled or postponed in-class training, exams and apprenticeships. Counts for various indicators may be historical lows because of the pandemic, which created larger deviations in the 2020 RAIS data for registrations, certifications and discontinuations.

Federal Patents, Licences and Royalties Survey 2021-2022

Information for respondents

This information is collected under the authority of the Statistics Act, Revised Statutes of Canada, 1985, Chapter S-19.

Completion of this questionnaire is a legal requirement under this act.

Survey Objective

This survey collects information that is necessary for monitoring federal patent, royalty and licensing related activities in Canada, and to support the development of science and technology policy. The data collected will be used by federal science policy analysts. Your information may also be used by Statistics Canada for other statistical and research purposes.

Confidentiality

Your answers are confidential. Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Statistics Canada will use the information from this survey for statistical purposes.

Security of emails and faxes

Statistics Canada advises you that there could be a risk of disclosure during the transmission of information by facsimile or e-mail. However, upon receipt, Statistics Canada will provide the guaranteed level of protection afforded to all information collected under the authority of the Statistics Act.

Data sharing agreement

To reduce response burden and to ensure more uniform statistics, Statistics Canada has entered into an agreement under Section 12 of the Statistics Act with Innovation, Science and Economic Development Canada (ISED) and National Research Council Canada (NRC) for sharing information from this survey. ISED and NRC have agreed to keep the information confidential and use it only for statistical purposes.

Under Section 12, you may refuse to share your information with ISED and NRC by writing a letter of objection to the Chief Statistician, specifying the organizations with which you do not want Statistics Canada to share your data and returning it with the completed questionnaire.

Record linkages

To enhance the data from this survey and to minimize the reporting burden, Statistics Canada may combine it with information from other surveys or from administrative sources.

I hereby authorize Statistics Canada to disclose any or all portions of the data supplied on this questionnaire that could identify this department.

  • Yes
  • No
  • Name of person authorized to sign
  • Official Position
  • Program
  • Department or agency
  • Email address
  • Telephone number
  • Extension

Section 1 - Identifying Intellectual Property (IP)

1.1 Reports and disclosures

Please indicate the number of new instances of Intellectual Property reported or disclosed during the reference year 2021/2022.

Please indicate how many instances of Intellectual Property (not necessarily new) resulted in protection activity by this organization and how many were declined for protection by this organization.

The types of Intellectual Property are defined in the Respondent Guide, Section 4.1.1.

In this question, the number of new IP reports and disclosures, and the number of IP reports and disclosures resulting in protection activity or declined for protection, are requested for the following categories:

  • Inventions
  • Copyrightable IP (computer software, databases, educational material, other material)
  • Industrial designs
  • Trademarks
  • Integrated circuit topographies
  • New plant varieties
  • Know-how
  • Other (please specify):

Section 2 - Protecting Intellectual Property (IP)

2.1 Patents

2.1 a) During reference year 2021/2022, how many initiating and follow-on patents were applied for and how many patents were issued with the support of this organization? Initiating patent applications include provisional or first filings.

Follow-on patent applications include any that claim priority from an initiating patent application.

International applications (for example, Patent Cooperation Treaty (PCT) applications) and regional applications (for example, European Patent Office applications) should each be counted as a single application.

In this question, the numbers of new patent applications (initiating, follow-on and total) and the total number of patents issued are requested.

2.1 b) Patents held, commercialized and pending

In this question, the total number is requested for each of the following categories:

  • Total patents held (including patents issued during the reference year)
  • Total patents pending
  • Patents (held or pending) licensed, assigned or otherwise commercialized during the reference year

Section 3 - Licences

3.1 New and active licences

Please report the number of new licences executed during the reference year 2021/2022 and the number of active licences at the end of the reference year 2021/2022. If detailed figures are not available, please report totals in the appropriate cells. Please see the Respondent Guide, Section 4.3.1, for detailed definitions.

In this question, the numbers of exclusive or sole licences, non-exclusive or multiple licences, and the total are requested for each of the following categories:

  a. New licences executed with Canadian licensees
  b. New licences executed with foreign licensees
    Total new licences (a + b)
  c. Active licences executed with Canadian licensees
  d. Active licences executed with foreign licensees
    Total active licences (c + d)
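
The totals in question 3.1 are column sums over pairs of rows. The minimal Python sketch below illustrates that arithmetic; the dictionary keys and the counts are hypothetical and are not part of the questionnaire.

```python
# Hypothetical counts for question 3.1, keyed by the row letters a-d.
licences = {
    "a": {"exclusive_or_sole": 3, "non_exclusive_or_multiple": 5},    # new, Canadian licensees
    "b": {"exclusive_or_sole": 1, "non_exclusive_or_multiple": 2},    # new, foreign licensees
    "c": {"exclusive_or_sole": 10, "non_exclusive_or_multiple": 14},  # active, Canadian licensees
    "d": {"exclusive_or_sole": 4, "non_exclusive_or_multiple": 6},    # active, foreign licensees
}


def total_row(rows: list[str]) -> dict[str, int]:
    """Sum each licence-type column over the given rows and add a grand total."""
    exclusive = sum(licences[r]["exclusive_or_sole"] for r in rows)
    non_exclusive = sum(licences[r]["non_exclusive_or_multiple"] for r in rows)
    return {
        "exclusive_or_sole": exclusive,
        "non_exclusive_or_multiple": non_exclusive,
        "total": exclusive + non_exclusive,
    }


total_new = total_row(["a", "b"])     # Total new licences (a + b)
total_active = total_row(["c", "d"])  # Total active licences (c + d)
print(total_new["total"], total_active["total"])  # 11 34
```

Each total row keeps the exclusive and non-exclusive columns separate, so the same figures can be entered in every column of the total lines.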

3.2 Income received from IP

Please specify the nature of the income received during the reference year 2021/2022 from IP commercialization.

In this question, income received from IP commercialization (in thousands of Canadian dollars) is requested for the following:

  • Running royalties and milestone payments
  • One-time sale of IP (in exchange for a single payment or several payments)
  • Reimbursement of patent, legal and related costs
  • Licence income received from another Canadian institution under a revenue sharing agreement
  • Other (please specify):
  • Other (please specify):
  • Total income received from IP commercialization

Section 4 - Respondent Guide

This questionnaire, in general, covers the intellectual property generated from R&D activities. We acknowledge that commercializable IP arises from other activities as well and that it may be difficult to differentiate. Whenever possible, please report figures for IP generated from R&D activities. If this is not possible, please note that the figures include IP generated from non-R&D activities.

If exact numbers are not readily available, please provide estimates with a note indicating this.

Please do not leave any question blank. Enter zero responses with the digit «0» if the value is known to be zero. If the data are not available, enter «N/A». In cases where the question is not applicable, please indicate this.

Report all dollar amounts in thousands of Canadian dollars.
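
The entry conventions above (the digit «0» for a known zero, «N/A» for unavailable data, and amounts in thousands of Canadian dollars) can be expressed as a small Python helper. This is only a sketch: the function name is hypothetical, and rounding half-up to the nearest thousand is an assumption, since the guide does not say how partial thousands should be handled.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def format_dollar_entry(amount_cad: Optional[Decimal]) -> str:
    """Format a dollar amount for the questionnaire (hypothetical helper).

    None means the data are not available, so the cell gets "N/A".
    A known zero is entered as the digit "0".
    Any other amount is reported in thousands of Canadian dollars,
    rounded half-up to the nearest thousand (an assumption).
    """
    if amount_cad is None:
        return "N/A"
    # Note: under this rounding assumption, amounts below $500 also become "0".
    thousands = (amount_cad / Decimal(1000)).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return str(thousands)


print(format_dollar_entry(Decimal("1234567")))  # 1235
print(format_dollar_entry(Decimal("0")))        # 0
print(format_dollar_entry(None))                # N/A
```

Using `Decimal` rather than floating point avoids binary rounding surprises when dividing dollar amounts by 1,000.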

Notes on survey questions

1.1 Identifying IP – Reports and disclosures:

  • Invention: Includes any new and useful art, process, machine, manufacture or composition of matter, or any new and useful improvement in any art, process, machine, manufacture or composition of matter (Public Servants Inventions Act. R.S., c. P-31, s. 1.). Some inventions are patentable in some jurisdictions but not in others: these include novel genetically-engineered life forms, new microbial life forms, methods of medical treatment and computer software.
  • Copyrightable IP can be broken into the following:
    • Computer software or databases: As noted above, computer software can be patented but normally it is protected by copyright. Databases may also be copyrighted.
    • Educational materials: This category includes special materials that may be copyrighted but are not necessarily in the form of printed books. This could include broadcast lessons, Internet pages, booklets, posters or computer files, among others.
    • Other material: This category includes any copyrightable works other than computer software, databases and special educational materials, such as literary, artistic, dramatic or musical works, books, and papers.
  • Industrial designs: These are original shapes, patterns or ornamentations applied to a manufactured article. Industrial designs are protected by registration with the Canadian Intellectual Property Office.
  • Trademarks: These are words, symbols, designs, or combinations thereof used to distinguish your wares or services from someone else's. Trademarks are registered with the Canadian Intellectual Property Office.
  • Integrated circuit topographies: This is a three-dimensional configuration of the electronic circuits used in microchips and semiconductor chips. Integrated circuit topographies can be protected by registration with the Canadian Intellectual Property Office.
  • New plant varieties: Certain plant varieties that are new, different, uniform and stable may be protected by registration with the Plant Breeders' Rights Office, Canadian Food Inspection Agency.
  • Know-how: This is practical knowledge, technique or expertise. For example, certain information is codified in the patent application but a researcher's know-how could be valuable for commercial optimization of the product. Know-how can be licensed independently of the terms of a related patent.

2.1 Patents:

  • Initiating patent applications include provisional or first filings.
  • Follow-on patent applications include any that claim priority from an initiating patent application.
  • Patents pending: A label sometimes affixed to new products informing others that the inventor has applied for a patent and that legal protection from infringement (including retroactive rights) may be forthcoming.

3.1 New and active licences:

  • "New licences executed" refers to the completion of an agreement with a client to use the institution's intellectual property for a fee or other consideration (such as equity in the company).
  • "Exclusive or Sole licences" refers to agreements allowing only one client the right to use the intellectual property.
  • "Exclusive licence" refers to one granted that is exclusive for a territory, for a field of use worldwide or otherwise. Hence, there may be multiple exclusive licences for a single patent.

3.2 Income received is in thousands of Canadian dollars:

  • Running royalties are those based on the sale of products.
  • Milestone payments are those made by a licensee at predetermined points in the commercialization process.
  • One-time sales of IP include income from assignments to commercial exploiters.
  • Other income received from IP: For example, if a potential licensee contributes the funds to apply for the patent, this could be considered another source of income. Please list all items whether or not figures are available.

Contact Person

Name of the contact person who completed this questionnaire:

  • First name
  • Last name
  • Title
  • Email address
  • Telephone number
  • Extension
  • Fax number

How long did you spend collecting the data and completing the questionnaire?

  • hour(s)
  • minutes

Comments

We invite your comments below.

If necessary, please attach a separate sheet.

Please be assured that we review all comments with the intent of improving the survey.

Thank you for completing this questionnaire.

Survey on Sexual Misconduct in the Canadian Armed Forces

Date: September 2022

Program manager: Director, Centre for Social Data Integration and Development
Director General, Social Data Insights, Integration and Innovation Branch

Reference to Personal Information Bank (PIB):

Personal information collected through the Survey on Sexual Misconduct in the Canadian Armed Forces is described in Statistics Canada's "Special Surveys" Personal Information Bank. The Personal Information Bank refers to information collected through Statistics Canada's ad hoc surveys which are conducted on behalf of other government departments, under the authority of the Statistics Act. "Special surveys" covers a variety of socio-economic topics including health, housing, labour market, education and literacy, as well as demographic data.

The "Special Surveys" Personal Information Bank (Bank number: StatCan PPU 026) is published on the Statistics Canada website under the latest Info Source chapter.

Description of statistical activity:

Statistics Canada will be conducting the Survey on Sexual Misconduct in the Canadian Armed Forces on a cost-recovery basis on behalf of the Department of National Defence. The survey will provide insight into sexual assault, sexualized and discriminatory behaviours, and knowledge and perceptions of policies and responses to sexual misconduct. This will be the third collection cycle for the Department of National Defence on this topic; the survey is conducted every two years, with the previous two cycles in 2016 and 2018 (the 2020 collection was postponed due to COVID-19).

The survey content includes questions on witnessing and experiencing inappropriate sexual behaviours, discrimination based on sex, sexual orientation or gender identity, and incidents of sexual assault. It also includes questions about the characteristics of sexual misconduct behaviours and incidents, their impact, and the reporting of these experiences. Additionally, it contains questions on the respondent's age, sex at birth, gender identity, visible minority status, Indigenous status and disability. The survey includes specific questions about military members and reservists and their rank in the 12 months leading up to collection.

These data will be collected from all Regular Force members (approximately 56,000 members, with some exclusions) and members of the Primary Reserve (approximately 27,000) using an employee list provided by the Department of National Defence. This survey is conducted under the authority of the Statistics Act, and the response rate is expected to be 30%. Although this collection is being performed for the Department of National Defence, there is no data sharing agreement, and there is no intent or plan to share any microdata from this survey with the Department; only aggregate results will be reported. As with previous cycles, the 2022 Survey on Sexual Misconduct in the Canadian Armed Forces (SSMCAF) is requesting an exemption from the Directive on Informing Survey Respondents (ISR) to remove the general statement related to data linkage.

Reason for supplement:

While the Generic Privacy Impact Assessment (PIA) addresses most of the privacy and security risks related to statistical activities conducted by Statistics Canada, and was applied to the two previous cycles of the survey (2016 and 2018), this supplement describes the measures (see Mitigation Factors, below) being implemented for the collection of, and access to, the information in this cycle. These measures respond to the sensitivity of the questions asked and to the public scrutiny surrounding sexual misconduct in the Canadian Armed Forces following the release, in May 2022, of the Independent External Comprehensive Review on the Department of National Defence and the Canadian Armed Forces, which highlighted deficiencies in the management of sexual misconduct. This supplement also presents an analysis of the necessity and proportionality of this new collection of personal information.

Necessity and Proportionality

The collection and use of personal information for the Survey on Sexual Misconduct in the Canadian Armed Forces can be justified against Statistics Canada's Necessity and Proportionality Framework:

  1. Necessity:

    The Survey on Sexual Misconduct in the Canadian Armed Forces will support the Department of National Defence's continued efforts to address and prevent sexual misconduct in its workplace and amongst its workforce. The content of the survey, including the personal information being requested, was deemed necessary for understanding, and, ultimately, preventing and addressing experiences of inappropriate sexual behaviours. Research suggests the risk of experiencing sexual harassment and victimization varies according to a number of factors, many of which require the collection of personal information, such as age. Gathering non-identifiable data would not enable the identification of these risk factors and would result in potentially ineffective interventions.

    Research on sexual misconduct has identified certain risk factors such as gender, education, income, visible minority status, disability status and marital status. The data will be analyzed according to these factors to determine if they are also associated with an increased risk of sexual harassment and victimization in the workplace specifically.

    This work has become even more necessary in light of the Independent External Comprehensive Review on the Department of National Defence and the Canadian Armed Forces, released in May 2022, which highlighted deficiencies in the management of sexual misconduct. Notably, the review also highlighted privacy concerns about the Department of National Defence's own sexual misconduct tracking and analysis system, further justifying the need for Statistics Canada, Canada's foremost statistical expert, to collect and analyze data independently.

  2. Effectiveness - Working assumptions:

    Conducting surveys is the only way to obtain estimates of both reported and unreported sexual misconduct. This is required in order to fully understand the scope of sexual misconduct in the workplace and to put in place preventative measures. These high-quality, timely and relevant data will help inform workplace codes of conduct, as well as other policies, laws and programs designed to prevent and respond to sexual misconduct in the workplace. The survey is a census of individuals working for the Canadian Armed Forces. The expected benefit of the project will be proportional to the quality of the data.

    Other surveys of a similar nature have been carried out by Statistics Canada, such as:

    • Survey of Sexual Misconduct at Work (SSMW) (PIA);
    • Survey of Safety in Public and Private Spaces (SSPPS);
    • Survey of Individual Safety in the Postsecondary Student Population (SISPSP) (PIA);
    • General Social Survey (GSS) on Victimization, 1999, 2004, 2014, 2019; and,
    • General Social Survey (GSS) at Work and Home.
    These surveys provide valuable insights and are also used to study the prevalence of sexual harassment over time.
  3. Proportionality:
    Proportionality has been considered based on the following elements – sensitivity and ethics:
    • Sensitivity: The Survey on Sexual Misconduct in the Canadian Armed Forces is a voluntary survey, and the collection method is similar to other voluntary household surveys. Because this information is submitted voluntarily, the risk related to the sensitivity of this data collection method is considered low. However, the questions in this survey are more sensitive in nature. As such, additional mitigation factors (see below) are being implemented to ensure that the collection methods are proportional to the need for the data.
    • Ethics: The Survey on Sexual Misconduct in the Canadian Armed Forces has been developed using past, similar surveys as precedents for determining best practices, in particular to assist victims in accessing support and to reduce response burden. Additional steps are being taken to reduce burden and assist respondents to the Survey on Sexual Misconduct in the Canadian Armed Forces (see below, Mitigation Factors).

    Data collected through the Survey on Sexual Misconduct in the Canadian Armed Forces will contain only the variables required to achieve the statistical goals of the survey. The public benefits of the survey findings, which are expected to inform policies, programs and support services aimed at improving workplace culture and work-related settings, are believed to be proportional to the potential privacy intrusion for this voluntary survey. The results will be used to inform policies and training to promote culture change and future support services for those affected by sexual misconduct.

  4. Alternatives:
    Few sources have gathered data on self-reported sexual victimization in the workplace. In 2016, the General Social Survey provided some insight on sexual harassment in a survey focused on the larger topic of Canadians at work and home. In 2017, Insights West, a market research firm, surveyed women exclusively on whether and how often they experience sexual harassment at work. That same year, Employment and Social Development Canada surveyed 1,000 people and held public consultations to better understand the types of harassment behaviours that take place in Canadian workplaces. However, no other quality sources report comprehensive and in-depth information such as the characteristics, impact and reporting of these incidents or the industries and settings in which they occur. Furthermore, existing crime data available from administrative data sources are limited to officially reported events that meet the threshold for criminality and are known to significantly underrepresent true rates of sexual victimization in the population. As such, data gaps exist and more information is needed to help guide policies, laws, programs and support services that prevent and respond to these behaviours in the workplace. Additionally, considering the potential bias in the Department of National Defence's own reporting and analysis system, no viable data alternatives exist that could provide such information on the Canadian Armed Forces population specifically. Finally, although previous cycles of the Survey on Sexual Misconduct in the Canadian Armed Forces provided similar insight, the issue persists, necessitating this regular data collection. This cycle will also provide more up-to-date information than the previous 2018 cycle, and regular collections allow for time-series analyses, which may provide even greater insight in the form of trends and comparisons.

Mitigation factors:

This content has undergone in-person testing, including a voluntary round of sensitivity testing to identify and address potential sources of harm for future respondents. As expected, some questions were considered sensitive by the test respondents but the overall risk of harm to survey participants was deemed manageable through the mitigating actions outlined here.

Consent

All respondents will be informed that their participation is voluntary before being asked any questions.

Access to personal information

Statistics Canada has established that answers collected from survey respondents will not be disclosed to the Department of National Defence or Canadian Armed Forces members. As with previous cycles, the master files for analysis will be placed in Research Data Centres (where all data sets have been stripped of personal details such as names, addresses and phone numbers that could be used to identify particular individuals), with additional clear restrictions preventing employees of the Department of National Defence and members of the Canadian Armed Forces from accessing them. Furthermore, all results from analyses conducted at Research Data Centres are vetted by Statistics Canada, thus ensuring the confidentiality of survey respondents with respect to their employer.

Support Services

Since survey questions may evoke emotional reactions from the respondents, contact information for support services and resources for victims of sexual violence will be made available to respondents in various forms, including in material communicated in their workplace, material included on the survey questionnaire and on the Statistics Canada website.

Feedback

At the end of the survey questionnaire, we have included an open question to understand the experience and impact that the survey had on respondents. We hope to be able to draw the same conclusions that other surveys on the topic have made: that although this topic is a difficult one, respondents appreciate being heard, feel valued and believe there are benefits to the survey.

Conclusion:

This assessment concludes that, with the existing Statistics Canada safeguards, any remaining risks are such that Statistics Canada is prepared to accept and manage the risk.

Instruction in the Minority Official Language – 2021 Census promotional material

Help spread the word about 2021 Census data on instruction in an official language minority in Canada. These data were released on November 30, 2022.

Quick facts

  • The 2021 Census of Population provides new data on the children eligible for instruction in the minority official language at the primary and secondary levels, based on the three criteria established by the Canadian Charter of Rights and Freedoms.
  • In 2021, 897,000 children were eligible for instruction in the minority official language at the primary and secondary levels, namely in English in Quebec (304,000) and in French in Canada outside Quebec (593,000).
  • Among the provinces and territories in Canada outside Quebec, Ontario (350,000), Alberta (67,000), British Columbia (56,000), New Brunswick (49,000) and Manitoba (30,000) had the largest populations of children eligible for instruction in French.
  • Among the provinces and territories, New Brunswick (36.0%), Quebec (18.1%), Yukon (14.1%) and Ontario (12.6%) had the largest proportions of children eligible for instruction in the minority official language. About 1 in 10 children (10.5%) were eligible for instruction in French in Canada outside Quebec.
  • Across Canada, over 90% of eligible children were living within 15 kilometres of a minority official language school.
  • In Canada outside Quebec, 292,000 school-aged children attended a regular French program at a primary or secondary French-language school in Canada, representing 64.7% of eligible children aged 5 to 17. This proportion was higher in New Brunswick (80.6%) and Yukon (71.0%), but lower in British Columbia (55.7%), Newfoundland and Labrador (54.2%) and Alberta (49.6%). In Quebec, 175,000 school-aged children attended an English primary or secondary school in Canada, representing 76.2% of eligible children aged 5 to 17 in this province.
  • The new data on language of instruction show that, among persons in Canada outside Quebec aged 5 years and older, almost 1.2 million studied in a regular French program in a French-language school, 1.6 million in a French immersion program, and 137,000 in both types of programs.
  • Nearly 1 million people aged 5 and older living in Quebec at the time of the 2021 Census studied at an English primary or secondary school in Canada.

Resources

Social media content

Statistics Canada encourages our community supporters to share our content and images to their own social media accounts. You can save the images to your device and copy and paste the text content to your social media platforms.

Post 1

Almost one in eight children in Canada was admissible for instruction in the official minority language in 2021. Check out the new #2021Census data on the topic here:

bit.ly/3VR0byb

Download image for post 1

Post 2

New data from the #2021Census reveal an updated portrait on instruction in the official minority language in Canada, not only for children, but also for adults.

For more info:

bit.ly/3VR0byb

Download image for post 2

Web images

Official Language Tile (JPG, 103 KB)

Terms of use

See the terms of use for information on the approved use of official wordmarks, identifiers and content.

Date modified:

Labour and Language of Work – 2021 Census promotional material

Help spread the word about 2021 Census data on labour and language of work in Canada. These data were released on November 30, 2022.

Quick facts

  • In the face of population aging and the COVID-19 pandemic, the number of health care workers increases by over 200,000 in five years to 1.5 million in 2021.
  • The construction industry, with over 1.3 million workers, continues to be an important employer for men, who work mostly as labourers and in skilled trades.
  • Growth in professional, scientific and technical services employment outpaces that of all other industries, with 1.5 million employed in 2021.
  • Four million Canadians are working in sales and service occupations.
  • The participation rate fell from 65.2% in 2016 to 63.7% in 2021 as more baby boomers near or enter retirement age.
  • From 2016 to 2021, a record 1.3 million new immigrants came to Canada seeking opportunities, boosting labour market growth.
  • Recent immigrants in 2021 experienced lower unemployment rates than earlier cohorts.
  • Participation rates increased from 2016 to 2021 for many racialized groups, with notable increases for Korean and West Asian Canadians.
  • Participation rates declined for First Nations people and Inuit as their labour force growth lags behind their population increases.
  • In Canada's biggest cities, employment rates in 2021 are highest among those in Quebec and the Prairies.
  • The information and communication technology sector is a key employer in six Canadian high-tech hubs, and employed more than 600,000 workers nationally in 2021.
  • In May 2021, there were 4.2 million people working at home, up from 1.3 million in 2016.
  • Working at home is most prominent in big cities and among people in professional occupations—with over 5% of teleworkers relocating from where they lived 12 months earlier.
  • Despite a record-high number and share of Canadians speaking a non-official language at home, English and French remained the languages of convergence in workplaces across the country as 98.7% of workers used one of these two languages most often at work. Overall, 77.1% of workers mainly used English at work, 19.9% mainly used French, and 1.7% used English and French equally.

Resources

Social media content

Statistics Canada encourages our community supporters to post our content and images to their own social media accounts. You can save the images to your device and copy and paste the text content to your social media platforms to share.

Post 1

#DYK? Healthcare and social assistance; construction; and professional, scientific and technical services accounted for nearly one third of all employment in Canada in 2021.

To learn more, check out our new #2021Census data:

bit.ly/3gJqpDK

Download image for post 1

Post 2

In 2021, immigrants made up over one-quarter of Canada's core-aged labour force.

For more info from the #2021Census:

bit.ly/3gJqpDK

Download image for post 2

Post 3

While more Canadians than ever speak a non-official language at home, 77.1% of workers mainly used English at work, 19.9% mainly used French, and 1.7% used both equally.

Learn more from the #2021Census data:

bit.ly/3gJqpDK

Download image for post 3

Web images

Labour Tile (JPG, 111 KB)

Terms of use

See the terms of use for information on the approved use of official wordmarks, identifiers and content.


Privacy preserving technologies, part three: Private statistical analysis and private text classification based on homomorphic encryption

By: Benjamin Santos and Zachary Zanussi, Statistics Canada

Introduction

What's possible in the realm of the encrypted and what use cases can be captured with homomorphic encryption? The Data Science Network's first article in the privacy preserving series, A Brief Survey of Privacy Preserving Technologies, introduces privacy enhancing technologiesFootnote 1 (PETs) and how they enable analytics while protecting data privacy. The second article in the series, Privacy Preserving Technologies Part Two: Introduction to Homomorphic Encryption, took a deeper look at one of the PETs, more specifically homomorphic encryption (HE). In this article, we describe applications explored by data scientists at Statistics Canada in encrypted computation.

HE is an encryption technique that allows computation on encrypted data and enables several paradigms for secure computing. These include secure outsourced computing, where a data holder allows a third party (perhaps the cloud) to compute on sensitive data while ensuring that the input data is protected. Indeed, if the data holder wants the cloud to compute some (polynomial) function f on their data v, they can encrypt it into a ciphertext, denoted [v], and send it safely to the cloud. The cloud computes f homomorphically to obtain [f(v)] and forwards the result back to the data holder, who can decrypt and view f(v). The cloud has no access to the input, output, or any intermediate data values.
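The workflow above can be made concrete with a toy sketch. To keep it short, it deliberately swaps in the Paillier cryptosystem instead of CKKS. Paillier is additively homomorphic only, so the cloud's function f is limited to a linear function, here the hypothetical f(v) = 3*v1 + v2. The tiny hard-coded primes make this completely insecure, and the function names (`encrypt`, `decrypt`, `add`, `scale`, `cloud_f`) are illustrative assumptions, not any library's API.

```python
import math
import random

# Toy Paillier keypair with tiny, hard-coded primes: illustration only, NOT secure.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                       # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)    # secret: Carmichael function of N


def _L(x: int) -> int:
    return (x - 1) // N


MU = pow(_L(pow(G, LAM, N_SQ)), -1, N)   # secret: modular inverse used in decryption


def encrypt(m: int) -> int:
    """Data holder: encrypt an integer 0 <= m < N into a ciphertext [m]."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Data holder: only the holder of LAM and MU can recover m."""
    return (_L(pow(c, LAM, N_SQ)) * MU) % N


def add(c1: int, c2: int) -> int:
    """Cloud: homomorphic addition, [a] * [b] mod N^2 is an encryption of a + b."""
    return (c1 * c2) % N_SQ


def scale(c: int, k: int) -> int:
    """Cloud: multiply an encrypted value by a public integer k."""
    return pow(c, k, N_SQ)


def cloud_f(c1: int, c2: int) -> int:
    """Cloud: evaluate f(v) = 3*v1 + v2 without ever seeing v1 or v2."""
    return add(scale(c1, 3), c2)


v = (12, 30)
result = decrypt(cloud_f(encrypt(v[0]), encrypt(v[1])))
print(result)  # 66
```

The cloud only ever handles integers modulo N^2. Decryption requires `LAM` and `MU`, which never leave the data holder.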

Figure 1: Illustration of a typical HE workflow.

An illustration of a typical HE workflow. The data, v, is encrypted, putting it in a locked box [v]. This value is sent to the compute party (the cloud). Gears turn and the input encryption [v] is transformed into the output encryption, [f(v)], as desired. This result is forwarded back to the owner who can take it out of the locked box and view it. The cloud doesn't have access to input, output, or intermediate values.

HE is currently being considered by international groups for standardization. The Government of Canada does not recommend HE or the use of any cryptographic technique before it's standardized. While HE is not yet ready for use on sensitive data, this is a good time to explore its capabilities and potential use cases.

Scanner data

Statistics Canada collects real-time data from major retailers for a variety of data products. This data describes the daily transactions performed, such as a description of the product sold, the transaction price, and metadata about the retailer. This data is called "scanner data", after the price scanners used to ring a customer through checkout. One use of scanner data is to increase the accuracy of the Consumer Price Index, which measures inflation and the strength of the Canadian dollar. This valuable data source is treated as sensitive data: we respect the privacy of the data and the retailers that provide it.

The first step in processing this data is to classify the product descriptions into an internationally standardized system of product codes known as the North American Product Classification System (NAPCS) Canada 2017 Version 1.0. This hierarchical system of seven-digit codes is used to classify different types of products for analysis. For example, one code may correspond to coffee and related products. Each entry in the scanner data needs to be assigned one of these codes based on the product description given by the retailer. These descriptions, however, are not standardized and may differ widely between different retailers or across different brands of similar products. Thus, the desired task is to convert these product descriptions, which often include abbreviations and acronyms, into their codes.

After they've been classified, the data is grouped based on its NAPCS code and statistics are computed on these groups. This allows us to gain a sense of how much is spent on each type of product across the country, and how this value changes over time.

Figure 2: High level overview of the scanner data workflow with sample data.

High level overview of the scanner data workflow. First, the product descriptions are classified into NAPCS codes. Examples are given: "mochi ice cream bon bons" is assigned NAPCS code 5611121, while "chipotle barbeque sauce" is assigned 5611132. Application 2 is to assign these codes to the descriptions. The product descriptions have a few identifiers and a price value attached. Application 1 is to sort the data by these codes and identifiers, and compute statistics on the price values.

Sample dataset 1
Description ID1 ID2 Value
"mochi ice cream bon bons" 054 78 $5.31
"chipotle barbeque sauce" 201 34 $3.80

Application 2

Sample dataset 2
NAPCS ID1 ID2 Value
5611121 054 78 $5.31
5611132 201 34 $3.80

Application 1

Statistics (total, mean, variance)

Given the data's sensitivity and importance, we've targeted it as a potential area where PETs can preserve our data workflow while maintaining the high level of security required. The two tasks above have, up to now, been performed within Statistics Canada's secure infrastructure, where we can be sure the data is safe at the time of ingestion and throughout its use. In 2019, when we were first investigating PETs within the agency, we decided to experiment using the cloud as a third-party compute resource, secured by HE.

We model the cloud as a semi-honest party, meaning it will follow the protocol we assign it, but it will try to infer whatever it can about the data during the process. This means we need sensitive data to always be encrypted or obscured. As a proof-of-concept, we replaced the scanner data with a synthetic data source, which allows us to conduct experiments without putting the security of the data at risk.

Application 1: Private statistical analysis

Our first task was to perform the latter part of the scanner data workflow – the statistical analysis. We constructed a synthetic version of the scanner data to ensure its privacy. This mock scanner data consisted of thirteen million records, each consisting of a NAPCS code, a transaction price, and some identifiers. This represents about a week's worth of scanner data from a single retailer. The task was to sort the data into lists, encrypt it, forward it to the cloud, and instruct the cloud to compute the statistics. The cloud would then forward us the still-encrypted results, so we could decrypt and use them for further analysis.

Suppose our dataset is sorted into lists of the form $v = (v_1, \ldots, v_l)$. It's relatively straightforward to encrypt each value $v_i$ into a ciphertext $[v_i]$ and send the list of ciphertexts $([v_1], \ldots, [v_l])$ to the cloud. The cloud can use homomorphic addition and multiplication to calculate the total, mean, and variance and return these as ciphertexts to us (we'll see how division is handled for the mean and variance later in this article). We do this for every list, and decrypt and view our data. Simple, right?

The problem with a naïve implementation of this protocol is data expansion. A single CKKSFootnote 2 ciphertext is a pair of polynomials of degree $2^{14}$ with 240-bit coefficients. All together, it may take 1 MB to store a single record. Over the entire dataset of thirteen million records, this becomes 13 TB of data! The solution to this problem is called packing.
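The figures above can be checked with back-of-the-envelope arithmetic. This is a minimal sketch. It assumes the 1 MB counts only the raw coefficients of the two polynomials (2^14 coefficients each, 240 bits per coefficient) and ignores any serialization overhead.

```python
# Size of one CKKS ciphertext with the parameters quoted in the text.
POLYS_PER_CIPHERTEXT = 2
COEFFS_PER_POLY = 2 ** 14
COEFF_BITS = 240

bytes_per_ct = POLYS_PER_CIPHERTEXT * COEFFS_PER_POLY * COEFF_BITS // 8
print(bytes_per_ct)              # 983040 bytes, roughly 1 MB

# Naive approach: one ciphertext per record, no packing.
RECORDS = 13_000_000
naive_total_tb = RECORDS * bytes_per_ct / 1e12
print(round(naive_total_tb, 1))  # 12.8, i.e. about 13 TB
```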

Packing

Ciphertexts are big, and we have many small pieces of data. We can use packing to store an entire list of values in a single ciphertext. The CKKS scheme allows us to perform Single Instruction Multiple Data (SIMD) type operations on that ciphertext, so we can compute several statistics at once! This ends up being a massive increase in efficiency for many HE tasks, and a clever data packing structure can make the difference between an intractable problem and a practical solution.

Suppose we have a list of l values, $v = (v_1, v_2, \ldots, v_l)$. Using CKKS packing, we can pack this entire list into a single ciphertext, denoted by [v]. Now, the operations of homomorphic addition and multiplication occur slot-wise in a SIMD fashion. That is, if $u = (u_1, u_2, \ldots, u_l)$ encrypts to [u], then we can compute homomorphic addition to get

$$[u] \oplus [v] = [u+v]$$

where $[u+v]$ is anFootnote 3 encryption of the list $(u_1+v_1, u_2+v_2, \ldots, u_l+v_l)$. This homomorphic addition takes as much time to compute as if there was only one value in each ciphertext, so it's clear we can get an appreciable efficiency boost via packing. The downside is that we now must use this vector structure in all of our calculations, but with a little effort, we can figure out how to vectorize relevant calculations to take advantage of packing.

Figure 3: An illustration of packing. The four values can either be encrypted into four separate ciphertexts, or all be packed into one.

An illustration of packing. Four values, $v_1, v_2, v_3, v_4$, need to be encrypted. In one case, they can all be encrypted into separate ciphertexts, depicted as locked boxes. In another, we can pack all four values into a single box. In the former case, it will take four boxes, which is less efficient to store and to work with. The latter case, packing as many values as possible, is almost always preferred.

Now, I know what you are thinking: doesn't packing, which stores a bunch of values within a vector, make it impossible to combine values within a list? That is, if we have $v = (v_1, v_2, \ldots, v_l)$, what if I wanted $v_1 + v_2$? We have access to an operation known as rotation. Rotation takes a ciphertext that is an encryption of $(v_1, v_2, \ldots, v_l)$ and turns it into Rot([v]), which is an encryption of $(v_2, v_3, \ldots, v_l, v_1)$. That is, it shifts all the values left by one slot, sliding the first value into the last slot. So, by computing $[v] \oplus \mathrm{Rot}([v])$, we get

$$(v_1+v_2, v_2+v_3, \ldots, v_l+v_1),$$

and the desired value is in the first slot.
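The slot semantics of packed addition and rotation can be imitated on plain Python lists. This sketch models only what a packed ciphertext would decrypt to. Nothing is encrypted, and `he_add` and `rot` are hypothetical names, not calls from any HE library.

```python
from typing import List

# Plaintext stand-in for a packed ciphertext: only the slot semantics are modelled.
Packed = List[float]


def he_add(a: Packed, b: Packed) -> Packed:
    """Slot-wise addition, mirroring [u] (+) [v] = [u + v]."""
    return [x + y for x, y in zip(a, b)]


def rot(a: Packed, k: int = 1) -> Packed:
    """Rotate left by k slots: (v1, ..., vl) -> (v_{k+1}, ..., vl, v1, ..., vk)."""
    k %= len(a)
    return a[k:] + a[:k]


v = [1.0, 2.0, 3.0, 4.0]
u = [10.0, 20.0, 30.0, 40.0]
print(he_add(u, v))       # [11.0, 22.0, 33.0, 44.0]
print(rot(v))             # [2.0, 3.0, 4.0, 1.0]
print(he_add(v, rot(v)))  # [3.0, 5.0, 7.0, 5.0]; the first slot holds v1 + v2
```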

Mathematically, packing is achieved by exploiting the properties of the cleartext, plaintext and ciphertext spaces. Recall that the encryption and decryption functions are maps between the latter two spaces. Packing requires another step called encoding, which encodes a vector of (potentially complex, though in our case, real) values v from the cleartext space into a plaintext polynomial p. While the data within p is not human-readable as-is, it can be decoded into the vector of values by any computer without requiring any keys. The plaintext polynomial p can then be encrypted into the ciphertext [v] and used to compute statistics on scanner data.Footnote 4
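The encoding step can be illustrated with a simplified, encryption-free sketch of a CKKS-style encoder. Slot values are interpolated at the complex roots of $X^N + 1$, scaled by a factor DELTA and rounded to integer coefficients. Several choices here are assumptions: `N = 8` and `DELTA = 2**20` are illustrative toy values, and the slot ordering is a simple conjugate-mirrored one rather than the ordering used by production libraries. The key fact it relies on is that the Vandermonde matrix at these roots has its conjugate transpose divided by N as its inverse.

```python
import cmath
from typing import List, Sequence

N = 8            # hypothetical ring degree (a power of two): gives N/2 = 4 slots
DELTA = 2 ** 20  # hypothetical scaling factor turning real coefficients into integers

# The N complex roots of X^N + 1 are xi^(2k+1), with xi = exp(i*pi/N).
ROOTS = [cmath.exp(1j * cmath.pi * (2 * k + 1) / N) for k in range(N)]


def encode(values: Sequence[float]) -> List[int]:
    """Slot values (N/2 reals) -> integer coefficients of a plaintext polynomial."""
    z = [complex(x) for x in values]
    # ROOTS[N-1-k] is the conjugate of ROOTS[k]; mirroring the conjugated slots
    # makes the interpolating polynomial's coefficients real.
    z_full = z + [c.conjugate() for c in reversed(z)]
    coeffs = []
    for j in range(N):
        # Inverse Vandermonde at these roots = conjugate transpose / N.
        c_j = sum(z_full[k] * ROOTS[k].conjugate() ** j for k in range(N)) / N
        coeffs.append(round(DELTA * c_j.real))
    return coeffs


def decode(coeffs: Sequence[int]) -> List[float]:
    """Evaluate the polynomial at the first N/2 roots and undo the scaling."""
    return [sum(c * ROOTS[k] ** j for j, c in enumerate(coeffs)).real / DELTA
            for k in range(N // 2)]


u = [1.5, -2.0, 0.25, 3.0]
v = [5.31, 3.80, 1.25, 0.99]
p = encode(v)
print(p)          # integer coefficients: not human-readable as-is
print(decode(p))  # approximately [5.31, 3.8, 1.25, 0.99], no key required

# Adding plaintext polynomials adds the encoded vectors slot-wise.
p_sum = [a + b for a, b in zip(encode(u), p)]
print(decode(p_sum))  # approximately [6.81, 1.8, 1.5, 3.99]
```

The rounding to integers introduces a small error (here on the order of 1e-6). This is why CKKS results are approximate and why the scaling factor matters.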

Efficient statistical analysis using packing

Getting back to the statistical analysis on scanner data, remember that the problem was that encrypting every value into its own ciphertext was too expensive. Packing will allow us to vectorize this process, making it orders of magnitude more efficient in terms of communication and computation.

We can now begin to compute the desired statistics on our list $v = (v_1, v_2, \ldots, v_l)$. The first value of interest is the total, $T_v = \sum_{i=1}^{l} v_i$, obtained by summing all the values in the list. After encrypting v into a packed ciphertext [v], we can simply add rotations of the ciphertext [v] to itself until we have a slot with the sum of all the values. In fact, we can do better than this naïve strategy of l rotations and additions: we can do it in $\log_2 l$ steps by rotating by one slot first, then by two slots, then four, then eight, and so on until we get the total $T_v$ in a slot.
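The logarithmic rotate-and-sum strategy can be sketched on a plaintext stand-in for a packed ciphertext. It assumes the list length is a power of two (pad with zeros otherwise). The helper names are hypothetical.

```python
import math
from typing import List


def rot(a: List[float], k: int) -> List[float]:
    """Rotate left by k slots, the plaintext analogue of a homomorphic rotation."""
    k %= len(a)
    return a[k:] + a[:k]


def he_add(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


def rotate_and_sum(v: List[float]) -> List[float]:
    """Fill every slot with the total using log2(l) rotations and additions.

    After the step with shift s, slot i holds v[i] + ... + v[i + 2s - 1] (cyclically).
    """
    l = len(v)
    assert l & (l - 1) == 0, "pad the list to a power-of-two length"
    acc = list(v)
    step = 1
    while step < l:
        acc = he_add(acc, rot(acc, step))   # shifts of 1, 2, 4, ... slots
        step *= 2
    return acc


v = [5.31, 3.80, 1.25, 0.99, 0.0, 0.0, 0.0, 0.0]   # four prices padded to 8 slots
total = rotate_and_sum(v)
print(total[0], int(math.log2(len(v))))  # the total, reached after 3 steps
```

Because every slot ends up holding the total, the result can later be multiplied slot-wise by other packed values without first moving it to a particular position.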

Next, we want the mean, $M_v = T_v / l$. To do this, we encrypt the value 1/l into the ciphertext [1/l] and send it along with the list [v]. We can then simply multiply this value by the ciphertext that we got when computing the total. It's a similar story for the variance, $V_v = \frac{1}{l}\sum_{i=1}^{l}(v_i - M_v)^2$, where we subtract the mean from [v], multiply the result by itself, compute the total again, and then multiply again by the [1/l] ciphertext.
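Putting these steps together, here is a sketch of the total, mean and variance that uses only slot-wise addition, subtraction, multiplication and rotation. Plain lists again stand in for ciphertexts. Two assumptions go beyond the text. First, the list is zero-padded to a power-of-two length. Second, a plaintext mask zeroes the padding slots before squaring, so they do not add spurious (0 - M_v)^2 terms to the variance.

```python
from typing import List, Tuple


def rot(a: List[float], k: int) -> List[float]:
    """Rotate left by k slots."""
    k %= len(a)
    return a[k:] + a[:k]


def he_add(a, b):
    return [x + y for x, y in zip(a, b)]


def he_sub(a, b):
    return [x - y for x, y in zip(a, b)]


def he_mul(a, b):
    """Slot-wise product; in real CKKS each one consumes a level."""
    return [x * y for x, y in zip(a, b)]


def rotate_and_sum(a):
    acc, step = list(a), 1
    while step < len(acc):
        acc = he_add(acc, rot(acc, step))
        step *= 2
    return acc


def encrypted_stats(v: List[float], l: int) -> Tuple[float, float, float]:
    """Total, mean and population variance of the first l slots of v."""
    inv_l = [1.0 / l] * len(v)                # stands in for the ciphertext [1/l]
    mask = [1.0] * l + [0.0] * (len(v) - l)   # assumed plaintext mask over real entries
    total = rotate_and_sum(v)                 # T_v in every slot
    mean = he_mul(total, inv_l)               # M_v = T_v * (1/l), no division needed
    centred = he_mul(he_sub(v, mean), mask)   # v_i - M_v, padding slots zeroed
    var = he_mul(rotate_and_sum(he_mul(centred, centred)), inv_l)
    return total[0], mean[0], var[0]


print(encrypted_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], 8))  # (40.0, 5.0, 4.0)
print(encrypted_stats([1.0, 2.0, 3.0, 0.0], 3))  # a list of 3 padded to 4 slots
```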

Let's investigate the savings that packing afforded us. In our case, we had about 13 million data points, which separate into 18,000 lists. Assuming that we could pack every list into a single ciphertext, that reduces the size of the encrypted dataset by almost three orders of magnitude. In reality, though, the lists were all different sizes: some were as large as tens of thousands of entries and others as small as two or three, with the majority falling in the range of hundreds to thousands. Through some clever manipulation, we were able to pack multiple lists into single ciphertexts and run the total, mean, and variance algorithms for them all at once. By using ciphertexts that can pack 8,192 values at once, we were able to reduce the number of ciphertexts to just 2,124. At about 1 MB per ciphertext, this makes the encrypted dataset about two gigabytes (GB). With the cleartext data taking 84 megabytes (MB), this left us with a data expansion factor of about 25 times. Overall, the encrypted computation took around 19 minutes, which is 30 times longer than unencrypted.
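The article does not describe exactly how several lists were packed into one ciphertext. The sketch below shows one possible approach, and it is an assumption rather than the method used. Each list gets its own zero-padded, block-aligned segment whose size is a power of two. A rotate-and-sum with shifts 1, 2, ..., block/2 then leaves each segment's total in that segment's first slot. The last lines re-check the quoted storage figures from the stated ciphertext parameters.

```python
from typing import List


def rot(a: List[float], k: int) -> List[float]:
    k %= len(a)
    return a[k:] + a[:k]


def pack_lists(lists: List[List[float]], block: int, slots: int) -> List[float]:
    """Place each list in its own zero-padded, block-aligned segment of one vector."""
    assert block & (block - 1) == 0 and slots % block == 0
    assert len(lists) * block <= slots and all(len(x) <= block for x in lists)
    packed = [0.0] * slots
    for j, values in enumerate(lists):
        packed[j * block: j * block + len(values)] = values
    return packed


def segment_totals(packed: List[float], block: int, n_lists: int) -> List[float]:
    """Rotate-and-sum with shifts 1, 2, ..., block/2. Slot i then holds
    packed[i] + ... + packed[i + block - 1], so the first slot of each segment
    holds exactly that segment's total (other slots mix neighbouring segments)."""
    acc, step = list(packed), 1
    while step < block:
        acc = [x + y for x, y in zip(acc, rot(acc, step))]
        step *= 2
    return [acc[j * block] for j in range(n_lists)]


lists = [[5.31, 3.80], [1.25, 0.99, 2.0], [7.0]]
packed = pack_lists(lists, block=4, slots=16)
totals = segment_totals(packed, block=4, n_lists=len(lists))
print(totals)  # approximately [9.11, 4.24, 7.0]

# Rough check of the storage figures quoted in the text.
BYTES_PER_CT = 2 * 2 ** 14 * 240 // 8         # 983,040 bytes per ciphertext
encrypted_gb = 2_124 * BYTES_PER_CT / 1e9
expansion = encrypted_gb / 0.084              # cleartext size: 84 MB
print(round(encrypted_gb, 2), round(expansion))  # 2.09 25
```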

Application 2: Private text classification based on homomorphic encryption


Next, we tackled the machine learning training task. Machine learning training is a notoriously expensive task, so it was unclear whether we'd be able to implement a practical solution. Recall the first task in the scanner data workflow: the noisy, retailer-dependent product descriptions need to be classified into the NAPCS codes. This is a multiclass text classification task. We created a synthetic dataset from an online repository of product descriptions and tagged them with one of five NAPCS codes.

Running a neural network basically amounts to multiplying a vector by a series of matrices. Training a neural network involves forward passes, which evaluate the training data in the network, as well as backward passes, which use (stochastic) gradient descent and the chain rule to find the best way to update the model parameters to improve performance. All this boils down to multiplying values by other values, so with access to homomorphic multiplication, training an encrypted network is possible in theory. In practice, this is hampered by a core limitation of the CKKS scheme: the leveled nature of homomorphic multiplications. We'll discuss this element first, and then explore the different protocol aspects designed to mitigate it.

Ciphertext levels in CKKS

In order to protect your data during encryption, the CKKS scheme adds a small amount of noise to each ciphertext. The downside is that this noise accumulates with consecutive operations and needs to be kept under control. CKKS has a built-in mechanism for this, but unfortunately it only allows for a bounded number of multiplications on a single ciphertext.

Suppose we have two freshly encrypted ciphertexts, $[v_1]$ and $[v_2]$. We can homomorphically multiply them to get the ciphertext $[v_1 v_2]$. The problem is that the noiseFootnote 5 in this resulting ciphertext is much larger than in the freshly encrypted ones, so if we multiplied it by a freshly encrypted $[v_3]$, the result would be affected by this mismatch.

We would first have to rescale the ciphertext $[v_1 v_2]$. This is transparently handled by the HE library, but under the hood, the ciphertext is moved to a slightly different space. We say that $[v_1 v_2]$ has been moved down a level, meaning the ciphertext started out on level L, and after rescaling, it is left on level L-1. The value L is determined by the security parameters we chose when we set up the HE scheme.

Now we have $[v_1 v_2]$, which has a normal amount of noise but is on level L-1, and the freshly encrypted $[v_3]$, which is still on level L. Unfortunately, we can't perform operations on ciphertexts that are on different levels, so we first have to reduce the level of $[v_3]$ to L-1 by modulus switching. Now that both ciphertexts are on the same level, we can finally multiply them as desired. We don't need to rescale the result of additions, but we do for every multiplication.

Figure 4: An illustration of levels

An illustration of levels. On the left we can see the level on which each ciphertext resides: from top to bottom, we have levels L, L-1, and L-2. Freshly encrypted $v_1$, $v_2$, and $v_3$ all inhabit level L on top. After multiplying, $v_1 v_2$ moves down to level L-1. If we want to multiply $v_1 v_2$ by $v_3$, we need to first bring $v_3$ down to level L-1. The resulting product, $v_1 v_2 v_3$, lives on level L-2.

This leveled business has two consequences. One, the developer needs to be conscious of the level of the ciphertexts they're using. And two, the ciphertexts will eventually reach level 0 after many consecutive multiplications, at which point it's spent, and we can't perform any more multiplications.

There are a few options for extending computations beyond the number of levels available. The first is a process called bootstrapping, where the ciphertext is homomorphically decrypted and re-encrypted, resulting in a fresh ciphertext. In theory, this allows an unbounded number of multiplications, but bootstrapping is itself expensive and adds considerable cost to the computation. Alternatively, one can refresh the ciphertexts by returning them to the secret key holder, who can decrypt and re-encrypt them before returning them to the cloud. Sending ciphertexts back and forth adds a communication cost, but this is sometimes worth it when there aren't many ciphertexts to send.
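The level bookkeeping described in this section can be simulated without any cryptography. In this sketch, `Ct` carries a plaintext value and a level. The top level `L = 3` is hypothetical, since the real value follows from the encryption parameters. `client_refresh` models the decrypt-and-re-encrypt round trip to the secret key holder. All names are illustrative, not an HE library's API.

```python
from dataclasses import dataclass

L = 3   # hypothetical top level; in practice fixed by the chosen encryption parameters


@dataclass(frozen=True)
class Ct:
    """Bookkeeping-only stand-in for a CKKS ciphertext: a value plus its level."""
    value: float
    level: int = L


def mod_switch(c: Ct, level: int) -> Ct:
    """Bring a ciphertext down to a lower level (levels can never go up)."""
    if level > c.level:
        raise ValueError("modulus switching can only lower the level")
    return Ct(c.value, level)


def multiply(a: Ct, b: Ct) -> Ct:
    """Align levels, multiply, then rescale, which costs one level."""
    level = min(a.level, b.level)
    a, b = mod_switch(a, level), mod_switch(b, level)
    if level == 0:
        raise RuntimeError("ciphertext is spent: no levels left to rescale")
    return Ct(a.value * b.value, level - 1)


def add(a: Ct, b: Ct) -> Ct:
    """Addition needs matching levels but does not consume one."""
    level = min(a.level, b.level)
    return Ct(mod_switch(a, level).value + mod_switch(b, level).value, level)


def client_refresh(c: Ct) -> Ct:
    """Round trip to the secret key holder: decrypt, re-encrypt at the top level."""
    return Ct(c.value, L)


v1, v2, v3 = Ct(2.0), Ct(3.0), Ct(4.0)
v1v2 = multiply(v1, v2)            # level L - 1
v1v2v3 = multiply(v1v2, v3)        # v3 is switched down first; result on level L - 2
print(v1v2.level, v1v2v3.level)    # 2 1
```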

Impact of levels on our network structure

We had to consider this fundamental constraint on HE when designing our neural network. The process of training a network involves performing a prediction, evaluating the prediction, and updating the model parameters. This means that every round, or epoch, of training consumes multiplicative levels. We tried to minimize the number of multiplications needed to traverse forward and backward through the network to maximize the number of training rounds available. We'll now describe the network structure and the data encoding strategy.

The network architecture was inspired by the existing solution in production, which amounted to an ensemble of linear learners. We trained several single-layer networks separately and, at prediction time, had each learner vote on each entry. We chose this approach because it reduced the amount of work required to train each model: less training time meant fewer multiplications.
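The voting step can be sketched in cleartext as below. The article says each learner votes but does not say how votes are combined, so simple majority voting is an assumption here, as are the tiny example weights; in the encrypted setting the per-class scores would be computed homomorphically.

```python
from collections import Counter


def predict_linear(weights, x):
    """Single-layer (linear) learner: one weight row per class, return the top-scoring class."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def ensemble_vote(models, x):
    """Each learner votes for one class; the most common vote wins (ties go to the first class voted)."""
    votes = Counter(predict_linear(w, x) for w in models)
    return votes.most_common(1)[0][0]


# Hypothetical ensemble of three learners over two classes and two features.
models = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[2.0, 0.0], [0.0, 1.0]],
]
print(ensemble_vote(models, [1.0, 0.5]))  # two of three learners pick class 0
```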

Each layer in a neural network is a matrix of weight parameters that multiplies the data vectors during the forward pass. We can adapt this to HE by encrypting each input vector into a single ciphertext and each row of the weight matrix into its own ciphertext. The forward pass then becomes several component-wise ciphertext multiplications, each followed by logarithmically many rotations and additions to sum the products (recall that matrix multiplication is a series of dot products, and a dot product is a component-wise multiplication followed by a sum of the resulting values).
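The rotate-and-sum pattern can be illustrated without any encryption library. In this sketch, plain Python lists stand in for ciphertext slot vectors, a cyclic shift stands in for a ciphertext rotation, and the vector length is assumed to be a power of two. Only one multiplication is performed per dot product; the summation uses rotations and additions, so it consumes no further levels.

```python
def rotate(slots, k):
    """Cyclic left rotation of a slot vector, mimicking a ciphertext rotation."""
    return slots[k:] + slots[:k]


def he_style_dot(x, w):
    """Dot product as one slot-wise multiply plus log2(n) rotate-and-add steps."""
    n = len(x)
    assert n & (n - 1) == 0, "slot count must be a power of two"
    acc = [a * b for a, b in zip(x, w)]  # the only multiplication
    step = n // 2
    while step >= 1:
        acc = [a + b for a, b in zip(acc, rotate(acc, step))]  # additions only
        step //= 2
    return acc[0]  # every slot now holds the full sum


def forward(weight_rows, x):
    """Single-layer forward pass: one HE-style dot product per weight-matrix row."""
    return [he_style_dot(x, row) for row in weight_rows]


print(he_style_dot([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```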

Preprocessing is an important part of any text classification task. Our data were short sentences that often contained acronyms or abbreviations. We chose a character n-gram encoding with n equal to three, four, five, and six; for example, "ice cream" was broken into the 3-grams {"ice", "cre", "rea", "eam"}. These n-grams were collected and enumerated over the entire dataset and used to one-hot encode each entry. A hashing vectorizer (Footnote 6) was then used to reduce the dimension of the encoded entries.
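A minimal sketch of this encoding follows. It goes straight from n-grams to hashed indices rather than first enumerating a vocabulary, and it assumes n-grams are taken within whitespace-separated words (which reproduces the "ice cream" example) and that CRC-32 from `zlib` is an acceptable stable hash. The article does not say which hash function or library was used; both helper names are hypothetical.

```python
import zlib


def char_ngrams(text, sizes=(3, 4, 5, 6)):
    """Character n-grams taken within each whitespace-separated word."""
    grams = []
    for n in sizes:
        for word in text.lower().split():
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def hash_encode(text, dim=4096, sizes=(3, 4, 5, 6)):
    """Binary (one-hot style) vector; each n-gram sets the slot its hash maps to."""
    vec = [0.0] * dim
    for gram in char_ngrams(text, sizes):
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dim] = 1.0
    return vec


print(char_ngrams("ice cream", sizes=(3,)))  # ['ice', 'cre', 'rea', 'eam']
```

Hash collisions simply merge n-grams into the same slot, which is the usual trade-off a hashing vectorizer makes for a fixed, small dimension.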

Similarly to how we packed multiple lists together in the statistical analysis, we found we could pack multiple models together and train them at once. Using N = 2^15 meant we could pack 16,384 values into each ciphertext, so by hashing our data to 4,096 dimensions, we could fit four models into each ciphertext and train all four simultaneously. This had the added benefit of reducing the number of ciphertexts required to encrypt our dataset by a factor of four.
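The packing arithmetic can be checked directly. This sketch assumes a CKKS-style scheme (the reference list cites Cheon et al.), in which a ring dimension N gives N/2 slots per ciphertext, and it assumes dataset entries are packed four per ciphertext in the same block layout, which is one reading of the factor-of-four saving. No encryption is performed; `pack`, `unpack`, and `ciphertexts_needed` are hypothetical helpers.

```python
import math

N = 2 ** 15        # ring dimension from the text
SLOTS = N // 2     # CKKS-style packing: N/2 slots per ciphertext
DIM = 4096         # hashed feature dimension

models_per_ciphertext = SLOTS // DIM


def pack(vectors, slots=SLOTS):
    """Concatenate equal-length vectors into one slot vector, zero-padding the rest."""
    flat = [v for vec in vectors for v in vec]
    if len(flat) > slots:
        raise ValueError("vectors do not fit in one ciphertext")
    return flat + [0.0] * (slots - len(flat))


def unpack(slots, dim=DIM):
    """Split a slot vector back into dim-sized blocks."""
    return [slots[i:i + dim] for i in range(0, len(slots), dim)]


def ciphertexts_needed(n_vectors, per_ciphertext=models_per_ciphertext):
    return math.ceil(n_vectors / per_ciphertext)


print(SLOTS, models_per_ciphertext)  # 16384 4
```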

Our choice of encryption parameters gave us between 12 and 16 multiplications before we ran out of levels. With a single-layer network, the forward and backward passes took two multiplications each, leaving room for three to four epochs before our model ciphertexts were spent. Because of the ensemble design, we could train several ciphertexts' worth of models if desired, so the number of learners was limited only by the additional training time. Carefully controlling which models learned from which data helped us maximize the overall performance of the ensemble.
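The level budget reduces to simple arithmetic. This sketch takes the four multiplications per epoch stated above and additionally assumes that each refresh by the secret key holder restores the full budget; the helper names are hypothetical.

```python
import math

MULTS_PER_EPOCH = 2 + 2  # forward pass and backward pass, two multiplications each


def epochs_per_fresh_ciphertext(levels):
    """Whole epochs that fit in one ciphertext's multiplicative budget."""
    return levels // MULTS_PER_EPOCH


def refreshes_needed(target_epochs, levels):
    """Round trips to the key holder needed to reach target_epochs."""
    per_fresh = epochs_per_fresh_ciphertext(levels)
    if per_fresh == 0:
        raise ValueError("not enough levels for a single epoch")
    return math.ceil(target_epochs / per_fresh) - 1


for levels in (12, 16):
    print(levels, epochs_per_fresh_ciphertext(levels), refreshes_needed(6, levels))
```

Under these assumptions, 12 levels allow three epochs and 16 allow four per fresh ciphertext, so a target of six epochs would need one refresh with either budget.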

Our dataset consisted of 40,000 training examples and 10,000 test examples, each set evenly distributed over our five classes. Training four submodels for six epochs took five hours and produced a model that obtained 74% accuracy on the test set. Using the ciphertext-refreshing tactic described earlier, we could in principle train for as many epochs as we like, though every refresh adds communication cost to the process (Footnote 7). After training, the cloud sends the encrypted model back to StatCan, where we can decrypt it and run it in the clear on production data. Alternatively, we can keep the encrypted model in the cloud and run encrypted inference when we have new data to classify.

Conclusion

This concludes our series on the applications of HE to scanner data that Statistics Canada has explored to date. HE has other applications that may interest a national statistical organization, such as Private Set Intersection, in which two or more parties jointly compute the intersection of their private datasets without sharing them, and Privacy Preserving Record Linkage, in which the parties additionally link, share, and compute on microdata attached to their private datasets.

There is a lot left to explore in the field of PETs, and StatCan is working to leverage this new field to protect the privacy of Canadians while still delivering quality information that matters.

Meet the Data Scientist


If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Thursday, December 15
2:00 to 3:00 p.m. ET
MS Teams – link will be provided to the registrants by email

Register for the Data Science Network's Meet the Data Scientist Presentation. We hope to see you there!

References

North American Product Classification System (NAPCS) Canada 2017 Version 1.0

Cheon, J. H., Kim, A., Kim, M., & Song, Y. (2016). Homomorphic Encryption for Arithmetic of Approximate Numbers. Cryptology ePrint Archive.

Gentry, C. (2009). A fully homomorphic encryption scheme (PhD thesis). Stanford University.

Zanussi, Z., Santos, B., & Molladavoudi, S. (2021). Supervised Text Classification with Leveled Homomorphic Encryption. In Proceedings of the 63rd ISI World Statistics Congress (Vol. 11, p. 16). International Statistical Institute.


Quarterly Survey of Financial Statements: Weighted Asset Response Rate - third quarter 2022

Weighted Asset Response Rate
This table displays the weighted asset response rate by release date (appearing as row headers) and by reference quarter, from the third quarter of 2021 to the third quarter of 2022 (appearing as column headers), in percentage.
Release date        2021 Q3  2021 Q4  2022 Q1  2022 Q2  2022 Q3
                    quarterly (percentage)
November 23, 2022   79.0     80.9     76.2     76.1     56.2
August 25, 2022     79.0     80.9     75.0     55.7     ..
May 25, 2022        79.0     77.3     56.7     ..       ..
February 23, 2022   75.6     54.2     ..       ..       ..
November 23, 2021   56.7     ..       ..       ..       ..
.. not available for a specific reference period
Source: Quarterly Survey of Financial Statements (2501)

Amendment to the Employee Wellness Surveys Privacy Impact Assessment (PIA) & Supplement to Statistics Canada's Generic PIA

Statistics Act Employment and Social Development Canada (ESDC) Employee Wellness Survey (EWS)
Privacy Impact Assessment (PIA) Summary

Introduction

This amendment applies to the Employee Wellness Surveys and Pulse Check Surveys PIA (signed by the Chief Statistician on November 5, 2021), and shall also be considered a supplement to Statistics Canada's Generic Privacy Impact Assessment for statistical survey activities, because this ESDC EWS will be conducted under the authority of the Statistics Act, on a cost-recovery basis for the client, ESDC, and administered by Statistics Canada to ESDC employees.

Objective

An Amendment to the Employee Wellness Surveys and Pulse Check Surveys PIA & Supplement to Statistics Canada's Generic Privacy Impact Assessment – Statistics Act Employment and Social Development Canada (ESDC) Employee Wellness Survey (EWS) was conducted to determine if there were any privacy, confidentiality or security issues with this activity and, if so, to make recommendations for their resolution or mitigation.

Description

The original EWS was conducted under the authority of the Financial Administration Act (FAA) among Statistics Canada and Statistical Survey Operations employees and was examined in the Employee Wellness Surveys PIA, whereas this new collection will be conducted under the authority of the Statistics Act, on a cost-recovery basis for ESDC, among ESDC employees. While Statistics Canada's Generic Privacy Impact Assessment (PIA) addresses most of the privacy and security risks related to statistical activities conducted by Statistics Canada, this amendment and supplement is required to describe how the internal HR personal information activity framework that operates under the authority of the FAA (the original EWS) is being modified to collect personal information externally under the authority of the Statistics Act.

  • This ESDC EWS will be administered one time, with the potential for future cycles.
  • One key change is that, unlike in the original EWS analysis, linking activities involving the following PIBs will not be performed for the ESDC EWS:
  • Another change is that, for this survey, the sample file will be provided by ESDC and, following collection, matched to the survey frame that Statistics Canada will build from the Incumbent file. The sample file will contain basic personal information for each ESDC employee (first and last name, email address, first official language and Personal Record Identifier [PRI]). The Incumbent file comes from the Treasury Board Secretariat (TBS) and is an extract from the Public Services and Procurement Canada (PSPC) pay system. It is the most comprehensive administrative file available to federal Government of Canada institutions, because it underpins their pay and staffing. Although it contains a great deal of information on employees, their positions, status and pay, only a small number of variables are required and retained from this file for inclusion on the survey frame, which will only be used internally at Statistics Canada for statistical processing purposes (see Section 4 for more detail on the variables taken from the Incumbent file for ESDC employees).
  • New content has been added to the questionnaire:
    • Questions about organizational unit, at a level of granularity that describes where within the ESDC portfolio an employee works, down to branch or region (level 4), to ensure that the diverse yet distinct work environments found across portfolios and regions are represented and identifiable in the data.
    • Questions under the TBS Personal Information Bank for Employment Equity and Diversity (PSE 918) which include Indigenous Identity, Gender, and Sociodemographic Characteristics.
      • These questions will provide important context for understanding the unique challenges experienced by specific populations, in support of the Call to Action on Equity, Diversity, and Inclusion, "Nothing about us, without us".
    • A question which asks "Would you say you are: Heterosexual, Lesbian or gay, Bisexual, Or please specify", which provides important information about the unique experiences that employees may have based on how they identify.
    • A question which asks "On a scale from 1 to 10, where 1 is 'not at all important' and 10 is 'critically important', how important is addressing psychological health and safety within ESDC?", in order to determine how much weight employees give to particular services or programs.
    • A question which asks "How far along do you think ESDC is in terms of creating and sustaining a psychologically healthy and safe work environment? Use a scale from 1 to 10, where 1 is 'Just getting started' and 10 is 'Sustaining well established policies/programs/supports'", in order to gauge employee perception of how mature ESDC's Mental Health strategy implementation is.
    • A question which asks "Below is a list of workplace-based services and supports available to help employees cope with challenging situations and issues related to mental health. Please indicate all the services/supports of which you are aware" in order to understand which programs employees are aware of.

Risk area identification and categorization

The risk area identification has changed from the original Employee Wellness Surveys and Pulse Check Surveys (EWSPCS) PIA in the following sub-sections; privacy risk has decreased.

Risk area identification
b) Type of personal information involved and context: 1
Only personal information, with no contextual sensitivities, collected directly from the individual or provided with the consent of the individual for disclosure under an authorized program. (This was "2" for the EWSPCS and is "1" for this ESDC Statistics Act collection.)
g) Technology and privacy: No
(The specific technology category was "yes" for the EWSPCS and is "no" for this ESDC Statistics Act collection.)

Conclusion

This assessment of the Amendment to the Employee Wellness Surveys and Pulse Check Surveys PIA & Supplement to Statistics Canada's Generic Privacy Impact Assessment – Statistics Act Employment and Social Development Canada (ESDC) Employee Wellness Survey (EWS) did not identify any privacy risks that cannot be managed using existing safeguards.