Supplement to Statistics Canada's Generic Privacy Impact Assessment related to the Longitudinal Immigration Statistical Environment (LISE)

Date: October 2022

Program manager: Director of Diversity and Sociocultural Statistics
Director General, Health, Justice, Diversity and Populations

Reference to Personal Information Bank (PIB)

In accordance with the Privacy Act, Statistics Canada has registered Personal Information Banks (PIBs) for its holdings of personal data contained in all databases linked through the LISE. These are: the Longitudinal Immigration Database (IMDB) (PPU 135), the Longitudinal Administrative Databank (LAD) (PPU 112), the Postsecondary Student Information System (PSIS) (PPU 090), the Registered Apprenticeship Information System (RAIS) (PPU 083), Health Research (PPU 076), and Census of Population and National Household Survey (PPU 005). The use of personal information from these programs for linkage purposes is described in these PIBs as part of the consistent uses.

When supplementary data sources are integrated into the LISE, the relevant PIBs are added or updated as required.

Please refer to Information about Programs and Information Holdings for descriptions of these Personal Information Banks.

Description of statistical activity

The Longitudinal Immigration Statistical Environment (LISE) is a secure linkage environment containing anonymous linkage keys that connect one of two primary databases, either the Longitudinal Immigration Database (IMDB) or the Longitudinal Administrative Databank (LAD), with one of a set of secondary databases. Pairwise linkages from one of the primary databases can be conducted with one of six secondary databases: five administrative health or education databases (the Discharge Abstract Database (DAD), National Ambulatory Care Reporting System (NACRS), Ontario Mental Health Reporting System (OMHRS), Postsecondary Student Information System (PSIS) and Registered Apprenticeship Information System (RAIS)) or the 2016 Census of Population. The program is conducted under the authority of the Statistics Act (Footnote 1). The databases are categorized and described as follows:

Pairwise linkages between databases
Database | Content | Name | Description
Primary | Immigration | Longitudinal Immigration Database (IMDB)

The IMDB combines administrative files on immigrant admissions and non-permanent resident permits from Immigration, Refugees and Citizenship Canada (IRCC) with tax files from the Canada Revenue Agency (CRA).

  • Information is available for immigrants admitted since 1952.
  • Information is available for individuals who obtained non-permanent resident permits since 1980.
  • Tax records for 1982 and subsequent years are available for immigrant and non-permanent tax filers (T1 Family File (T1FF)).
Primary | Tax | Longitudinal Administrative Databank (LAD)

The LAD is a random, 20% sample of the T1 Family File (T1FF) tax database.

  • Selection for LAD is based on an individual's SIN.
  • There is no age restriction.
Secondary | Health | Discharge Abstract Database (DAD)

The DAD captures administrative, clinical and demographic information on hospital discharges.

  • It includes in-hospital deaths, sign-outs and transfers, from all provinces and territories except Quebec.
  • Day surgery procedures, long-term care, rehabilitation and other types of care were added in later years.
  • The most recent version of the linkage keys connects the primary databases to the DAD from fiscal year 1994/1995 onward.
Secondary | Health | National Ambulatory Care Reporting System (NACRS)

The NACRS contains data for hospital-based and community-based ambulatory care, including day surgery, outpatient and community-based clinics, and emergency departments.

  • NACRS consists of five categories of data elements: demographic, clinical, administrative, financial and service-specific.
  • Information on discharges, deaths and transfers is provided within a fiscal year, from April 1 (start of fiscal year) to March 31 (end of fiscal year).
  • Day surgery procedures, diagnostic imaging visits and numerous clinic visits, including renal dialysis, cardiac catheterization, oncology and mental health, were added in later years.
  • The most recent version of the linkage keys connects the primary databases to the NACRS from fiscal year 2002/2003 onward.
Secondary | Health | Ontario Mental Health Reporting System (OMHRS)

The OMHRS contains data for individuals who receive services in designated adult inpatient mental health beds in the province of Ontario, as well as those who receive services in designated adult mental health facilities outside Ontario that voluntarily submit data to the reporting system.

  • OMHRS contains the following information: mental and physical health, social supports and service use, care planning, outcome measurement, quality improvement, and case-mix funding applications.
  • The most recent version of the linkage keys connects the primary databases to the OMHRS from the fiscal year of 2005/2006 onward.
Secondary | Education | Postsecondary Student Information System (PSIS)

The PSIS provides detailed information on enrolments and graduates of Canadian public postsecondary institutions. The survey is a census with a cross-sectional design.

  • The PSIS contains information pertaining to programs and courses offered at an institution, as well as information regarding the students themselves and the programs and courses in which they enrolled, or from which they have graduated. The PSIS is also designed to collect continuing education data.
  • The most recent version of the linkage keys connects the primary databases to the PSIS from 2009 onward.
Secondary | Education | Registered Apprenticeship Information System (RAIS)

The RAIS is conducted on an annual basis as a census of all registered apprentices and trade qualifiers in Canada.

  • It compiles data on the number of registered apprentices taking in-class and/or on-the-job training in trades, which are either Red Seal or non-Red Seal, and where apprenticeship training is either compulsory or voluntary.
  • Information on the number of provincial and interprovincial certificates granted to apprentices or trade qualifiers (challengers) is also included.
  • The most recent version of the linkage keys connects the primary databases to the RAIS from 2008 onward.
Secondary | Demographics | Census of Population (2016)

The Census of Population is conducted every 5 years as a census with a cross-sectional design; the LISE links to the 2016 cycle.

  • The short form (2A), which collects basic demographic information, surveys every household in Canada.
  • The long form (2A-L), which collects more detailed socio-economic information, surveys approximately 1 in 4 households.
  • Using this linkage key, variables of interest will be investigated within the context of the Census, such as education at landing, knowledge of official languages at landing, and many more.
  • The most recent version of this linkage key connects the IMDB to the long-form 2016 Census.

Development

The development of the LISE was originally initiated by the Ministry of Children, Community and Social Services (MCCSS) of the Government of Ontario, for the purpose of making evidence-based decisions about programs and services delivered to the Syrian refugee cohort and future refugee cohorts.

This new linkage environment supports a wide range of research on the relationships between employment, education, health status and social inclusion over time. It enables comparisons between refugees, other types of immigrants and native-born Canadians, so as to inform provincial and national policies related to the resettlement and integration of the immigrant population, and refugees in particular.

Statistics Canada will make the information in the LISE and its analytical capacity known to Canadians through a variety of products, such as aggregated data tables and analytical reports (e.g., The Daily articles). Examples of how the results will benefit Canadian society include providing information to researchers and policy makers on earnings by immigrant admission category; comparing immigrant healthcare resource utilization across admission cohorts; and providing insights into the pre-admission characteristics of international students that play a significant role in postsecondary education participation and completion. All of these results can be used to inform future immigration policies that benefit the socioeconomic and overall well-being of all Canadians in the long run.

In order to address additional related research questions, further datasets can be integrated into the LISE using the Social Data Linkage Environment (SDLE) (Footnote 2), described in more detail below. Personal Information Banks for these new datasets will be added or updated as required. Additional years of data from sources already included can also be added to maintain the timeliness of the LISE.

Privacy and confidentiality

The personal identifiers obtained for immigrants and non-immigrants are used in the SDLE to assign the anonymous statistical identifiers that allow Statistics Canada to link to other sources of information for statistical analysis and research, once approval has been obtained in accordance with Statistics Canada's Directive on Microdata Linkage. The personal identifiers obtained are removed from the rest of the information and securely stored, with access limited to a restricted number of Statistics Canada employees who have an approved operational requirement to access them and whose access is removed when no longer required. The retention period for their storage and their destruction is prescribed by Statistics Canada's Directive on the Management of Statistical Microdata Files.

The integrated datasets in the LISE are subject to the confidentiality requirements of the Statistics Act. As with all data collected under the Statistics Act, the integrated analytical datasets available for research do not contain any personal identifiers. Access is granted to researchers who have been deemed as Statistics Canada employees after they have obtained a security clearance and have sworn an oath of confidentiality under the Statistics Act. Data access is approved for a specific purpose and for a specified period of time, and must occur in a secure setting, such as Statistics Canada offices, the Research Data Centres or approved access to the virtual data lab (VDL) environment. Statistics Canada vets all output for confidentiality before removal from the secure setting or release to the public.

Only aggregated and non-confidential statistical information will be made publicly available; as such, individuals will not be identifiable in any product disseminated to the public. No personal information will ever be disclosed without the consent of the original data collector and the authorization of the Chief Statistician, as required by the Statistics Act.

Reason for supplement

While the Generic Privacy Impact Assessment (PIA) presents and addresses most of the privacy principles and security risks related to statistical activities conducted by Statistics Canada, this supplement addresses any privacy risks associated with this new data environment. As is the case with all PIAs, Statistics Canada's privacy framework ensures that elements of privacy protection and privacy controls are documented and applied.

Necessity and Proportionality

The use of personal information for the LISE can be justified against Statistics Canada's Necessity and Proportionality Framework:

1. Necessity

The LISE was requested and funded by the Ontario Ministry of Children, Community and Social Services. MCCSS has commissioned Statistics Canada to create a linkage environment that would allow MCCSS to study the outcomes of immigrants across multiple indicators, such as their economic resilience, educational status, social integration, and their physical and mental health.

The LISE enables the integration of different datasets, both longitudinal and cross-sectional, to help address a wide range of priority policy questions pertaining to immigrant mobility, educational status, social integration, physical and mental health, as well as labour market outcomes over time, which were not possible to address with the underlying datasets alone. The LISE facilitates the production and publication of analysis, indicators and data tables on these topics.

2. Effectiveness

The LISE is essentially a centralized data environment that enables access to indicators for data integration and research activities. This new linkage environment supports a wide range of research on the relationships between employment, education, physical health, mental health and social inclusion over time.

The linkages to health and education administrative data, as well as to the 2016 Census, expand the research indicators available for the LAD and IMDB. They enable comparisons between refugees, other types of immigrants and native-born Canadians, so as to inform provincial and national policies related to the resettlement and integration of the immigrant population, and refugees in particular.

The linkage environment reduces or removes redundancies across specific data integration projects (e.g., LAD-IMDB, Census-IMDB), relieving financial and personnel burdens on Canadian taxpayers and the Government of Canada.

The LISE is also in line with Statistics Canada's mandate for disaggregated data, offering new indicators that cover all Canadians and especially vulnerable population groups (e.g., Syrian refugees).

As a linkage environment of multiple administrative databases (and the Census), it offers more accurate information, such as immigration admission categories, than what could be obtained from surveys, owing to the size and geographical distribution of the population of interest and the technical nature of some of the key variables.

Integrating administrative education and health databases with the IMDB and LAD will provide contextual and outcome information for immigrants, non-permanent residents and Canadians, resolving existing data gaps while avoiding any additional response burden. For example, large-scale investigations of the "healthy immigrant effect" are scarce. This term refers to a commonly observed trend: immigrants usually arrive in their host countries in better health, on average, than their native-born counterparts, but this health advantage gradually disappears over time. In this context, information about health care use from health administrative databases can be used as a proxy for health conditions, so that linkages between the IMDB (or the LAD) and health data serve as an effective tool to examine immigrants' physical and mental well-being and how it changes over time.

The access to analytical datasets as well as the LISE anonymous matching keys expands access and research opportunities for using the rich information and enables evidence-based research and policy analyses conducted by researchers, governmental and non-governmental stakeholders. The utilization of the LISE helps decision makers develop well-informed policies, which aim to:

  • facilitate a smooth and effective process of resettlement and integration for refugees, newcomers and the immigrant population in general;
  • contribute to building a healthier Canadian society by enhancing Canadian residents' access to healthcare services, in particular vulnerable groups;
  • boost opportunities for educational/skill attainment among disadvantaged children and youth, such as those from refugee families and/or those residing in remote rural areas; and
  • improve all Canadians' health, socioeconomic and overall well-being by, for example, stimulating economic development with increasingly diverse labour forces, channelling immigrant workforces to jobs that are in high demand, and creating new jobs and services that benefit all Canadians.

3. Proportionality

The methods and practices behind the LISE (and the SDLE) have been designed to ensure the protection of privacy and personal information, while retaining the ability to integrate analytical variables from different existing sources to fill data gaps.

More specifically, by building the linkages between the IMDB (or the LAD) and health/education administrative databases as well as the Census, the LISE makes it possible to examine complex phenomena that are often difficult, if not impossible, to understand when analyzing the existing data sources in isolation. For example, earnings for international students who completed different types of education or training programs can be compared using administrative data rather than survey data, thus greatly reducing the burden on respondents. Pathways through postsecondary education can be examined for Canadian and international students of various ages and sociodemographic profiles, while economic outcomes can support analysis on skill utilization. Rates of program completion for immigrant compared to non-immigrant students or apprentices can be determined using several years of existing administrative data rather than waiting for the completion of one or more cycles of a new survey.

Beyond the immigrant population, the research capacity of the LISE benefits all Canadians. For instance, the linkages between the LAD and health administrative databases provide first-hand information on Canadians' access to healthcare services by demographic characteristics, by geographic areas, as well as over time. Such investigations will effectively inform policy making that provides timely, accessible and sufficient healthcare services to all immigrants as well as vulnerable population groups and Canadians in general, in order to maintain a healthy workforce for a sustainable and resilient Canadian society and economy.

In addition, the LISE enables in-depth data disaggregation to focus on particularly vulnerable population groups. Within the refugee population, little is known about differences among government-assisted refugees, privately-sponsored refugees and Protected Persons in Canada, in terms of their mobility patterns after arriving in Canada, access to healthcare services, integration patterns in Canadian labour markets, as well as their children's access to and interactions with the Canadian educational system. Such research will immensely benefit the adjustment of programs such as the Resettlement Assistance Program, facilitating a smooth integration process for all refugees.

In short, the development of the LISE allows for additional research opportunities using the core datasets to inform policy and practice. By integrating Statistics Canada's current administrative datasets, new and expanded statistical analysis can be undertaken. It also enables future linkage work with other Statistics Canada administrative and survey-based databases – again, enriching and expanding analytical opportunities to better inform public policy and research.

The associated invasion of privacy is proportional to the research, policy and various other benefits the LISE offers to vulnerable population groups, immigrants and Canadians in general. In most cases, the most privacy-intrusive data are basic personal information (e.g., date of claim, country of birth, postal code) for refugee claimants, as there are risks associated with being identified (especially if they live in a sparsely populated postal code). In data linkages involving the Census, information from the Census might also make it possible to identify racialized groups amongst immigrants. However, such information is also the most important information for the IMDB, as well as the immigrant sub-population in the LAD, as it enables robust longitudinal data integration. Because certain racialized groups and refugee groups are small in number, their members may also face the risk of being identified from health and education administrative data records, particularly when they are further grouped by immigration category, refugee program, country of origin, landing destination, etc.

After the application of Statistics Canada's dissemination rules, this privacy intrusion is deemed minimal and risk of external reidentification is low. This minimal intrusion will lead to the provision of better services to immigrants (including refugees) and the creation of policies that improve the settlement process, ultimately benefitting not only the immigrant population but also the Canadian society and economy as a whole. Policy makers can access the analytical and statistical products from the LISE to shape new policies and programs. Likewise, immigrant service providers can access the data to determine the services required to help immigrants settle in Canada successfully.

Because it is built on the IMDB and the LAD, the LISE also shares the privacy concerns of those two databases. For example, tax microdata provide detailed information on industry of employment and income sources for each individual, while immigration data identify source countries and year of immigration. In some cases, postal code from tax microdata (the lowest level of geography on the database) can be a low enough level of geography to identify individuals with unique characteristics, for example, minority population groups residing in remote areas. However, sub-provincial data are often required to respond to the needs of individuals at their place of residence (to more effectively guide local policy and support services), and any dissemination is subject to Statistics Canada suppression rules, which minimize the chance of such reidentification. For immigrants in particular, socioeconomic outcomes vary depending on place of residence, time since admission and pre-admission characteristics, supporting the use of this personal information as proportional to the public good resulting from its use.

4. Alternatives

The LISE provides the keys to match annual records over time. Without this option for comprehensive longitudinal data, analysis of the pathways of Canadians, permanent residents and non-permanent residents through education or healthcare institutions would be impossible. Longitudinal indicators using these keys capture changing trajectories over time and enhance the analytical capabilities of the IMDB and the LAD. Matching administrative data to additional health and education information allows analysis at a refined level of geography and enables analysis for vulnerable sub-populations. In an approved secure environment, employees and deemed employees can analyze, on an annual basis, the relationships between population characteristics over time and future labour market, healthcare or education outcomes. No other sources allow such a detailed analysis. Survey sources are restricted by sample size, response rates, less frequent collection, and lack of granularity in the data. Moreover, Statistics Canada has longstanding evidence that response rates to longitudinal surveys decline considerably over time, introducing biases and substantially reducing quality and accuracy. For these reasons, most longitudinal surveys have been discontinued.

Creating a new longitudinal survey capable of capturing the same variables over the amount of time required for longitudinal analysis would cost a great amount of time and resources, add significantly greater response burden to respondents, and be subject to aforementioned survey restrictions that could diminish the accuracy or quality of the survey data when compared to the administrative data in question.

Moreover, the sources of the health and education administrative data contain objective and accurate data collected by trained professionals using pre-established classification systems. Such information is much more reliable than what could be obtained through surveys that collect self-reported information, which is often biased by respondents' memories, feelings and judgements. For example, the DAD receives information directly from acute care facilities (or from their respective health/regional authority or ministry/department of health), with a focus on standardized patient-centred clinical outcomes. Since 2004-2005, all DAD records have been reported in ICD-10-CA (International Statistical Classification of Diseases and Related Health Problems) and CCI (Canadian Classification of Health Interventions) (Footnote 3). The NACRS pick-lists are the Presenting Complaint List (data element 136) and ED Discharge Diagnosis (data element 137). The Canadian Emergency Department Diagnoses Shortlist (CED-DxS) includes more than 800 diagnoses in common terms, which are mapped to ICD-10-CA codes (Footnote 4). OMHRS collects data using the Resident Assessment Instrument – Mental Health (RAI-MH) version 2.0, a standardized clinical instrument that is used to regularly assess those receiving inpatient mental health care (Footnote 5). All of these health administrative databases support the collection and analysis of clinical outcomes, health systems use and performance reporting. Likewise, the two education administrative databases collect objective information about students' enrolment in and graduation from programs in postsecondary institutions (vocational/training schools included) throughout the country. With linkages to those administrative databases, the analytical capabilities of the IMDB and LAD are expanded tremendously, using the most reliable and objective information sources while minimizing respondents' burden and the costs of data collection.

Mitigation factors

The integrated datasets in the LISE do not contain any personal identifiers and are subject to the privacy and confidentiality requirements of the Statistics Act. Anonymous linkage keys are created using the Social Data Linkage Environment (SDLE), for which a separate privacy impact assessment has been approved. Furthermore, the privacy principle of limited collection is applied to data linkages, which are limited to integrating only one primary database with one secondary database at a time, serving to further mitigate the aforementioned risks:

Privacy principles of limited collection
Primary (one of either) | Secondary (integrated with one of)
IMDB, LAD | DAD, NACRS, OMHRS, PSIS, RAIS, Census
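
The pairwise rule described above can be illustrated with a short sketch. The function name `validate_linkage` and the two sets are hypothetical names introduced here for illustration only; this is not Statistics Canada's implementation, and it does not encode any further restriction on which primary database a given linkage key supports.

```python
# Illustrative sketch only: all names are hypothetical, not part of the LISE.
PRIMARY = {"IMDB", "LAD"}
SECONDARY = {"DAD", "NACRS", "OMHRS", "PSIS", "RAIS", "Census"}


def validate_linkage(databases):
    """Return (primary, secondary) when the request pairs exactly one primary
    database with exactly one secondary database; raise ValueError otherwise."""
    requested = list(databases)
    unknown = [d for d in requested if d not in PRIMARY | SECONDARY]
    if unknown:
        raise ValueError(f"Unknown database(s): {unknown}")
    primaries = [d for d in requested if d in PRIMARY]
    secondaries = [d for d in requested if d in SECONDARY]
    if len(primaries) != 1 or len(secondaries) != 1:
        raise ValueError(
            "A linkage must pair exactly one primary database "
            "with exactly one secondary database."
        )
    return primaries[0], secondaries[0]
```

Under these assumptions, a request such as `validate_linkage(["IMDB", "DAD"])` is accepted, while a request naming both primary databases, or two secondary databases, is rejected.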

The personal identifiers obtained for immigrants, non-permanent residents and Canadians are used in the SDLE to assign the anonymous statistical identifiers that allow Statistics Canada to link to other sources of information for statistical analysis and research, once approval is obtained in accordance with Statistics Canada's Directive on Microdata Linkage. The personal identifiers obtained are removed from the rest of the information and securely stored, with access restricted to no more than twenty-five Statistics Canada employees who have an approved operational requirement to access them and whose access is removed when no longer required. The retention period for their storage and their destruction is prescribed by Statistics Canada's Directive on the Management of Statistical Microdata Files. As mentioned, refugee claimants, particularly those residing in sparsely populated areas and/or originating from small racialized groups, may still be exposed to a higher risk of reidentification. To protect respondents' confidentiality, any group with a small number of members will be collapsed into larger groups, for example, by using a higher level of geographic classification or by combining a given ethnic group or country of origin with adjacent ethnic groups or countries.
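
The collapsing of small groups described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the `parent_of` mapping and the `min_count` threshold are all hypothetical, and the actual thresholds and dissemination rules applied by Statistics Canada are not given in this document.

```python
from collections import Counter


def collapse_small_groups(counts, parent_of, min_count):
    """Merge every category whose count is below min_count into its parent group.

    counts    -- mapping of category -> number of individuals
    parent_of -- mapping of category -> broader group (hypothetical hierarchy)
    min_count -- hypothetical minimum group size; not an official threshold
    """
    collapsed = Counter()
    for category, n in counts.items():
        if n < min_count:
            # Fall back to a catch-all group when no parent is defined.
            collapsed[parent_of.get(category, "Other")] += n
        else:
            collapsed[category] += n
    return dict(collapsed)
```

For instance, with made-up counts of 6 and 5 for two small areas that share a parent region and a threshold of 10, the two areas are reported together as one region with a count of 11. The sketch makes a single pass; if a merged group were still below the threshold, the same step would need to be repeated at a higher level of the hierarchy.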

Therefore, the overall risk of harm to the individuals whose information resides in the LISE has been deemed manageable with existing Statistics Canada safeguards that are described in Statistics Canada's Generic Privacy Impact Assessment. These include, but are not limited to:

Collection

The information is transmitted electronically to Statistics Canada using a secure electronic file transfer protocol.

Storage & processing

Identity and roles-based access management controls are in place in support of least-privilege and need-to-know principles. Access is restricted to employees who demonstrated a valid requirement to access the data. Data is fully encrypted at rest and in transit as required by GC policy, and encryption keys are managed by the Government of Canada to ensure that only authorized users can decrypt the data. Statistics Canada's cloud implementation aligns with Treasury Board of Canada Secretariat (TBS) cloud services direction including the Government of Canada Cloud Security Risk Management Approach and Procedures.

After initial processing, a statistical identifier is generated by Statistics Canada to facilitate data integration. As per standard practice, following linkages with other sources of information, data is stripped of direct identifiers such as name and address, to help protect privacy and confidentiality.
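
The separation of direct identifiers from analytical data can be sketched as below. This is an assumption-laden illustration, not the SDLE method: the function name, the field names and the use of a random key are hypothetical choices made for the example, since the document does not describe how the statistical identifier is generated.

```python
import secrets

# Direct identifiers named as examples in the text; the field names are hypothetical.
DIRECT_IDENTIFIERS = ("name", "address")


def separate_identifiers(records):
    """Split records into an identifier file and an analytical file.

    The two files share only a random key (an assumed stand-in for the
    statistical identifier), so the analytical file carries no direct identifiers.
    """
    identifier_file, analytical_file = [], []
    for record in records:
        stat_id = secrets.token_hex(16)  # random key; carries no personal information
        ids = {k: record[k] for k in DIRECT_IDENTIFIERS if k in record}
        rest = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        identifier_file.append({"stat_id": stat_id, **ids})
        analytical_file.append({"stat_id": stat_id, **rest})
    return identifier_file, analytical_file
```

In this sketch the identifier file would be the one held under restricted access, while only the analytical file would be used for research.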

Access

Access to any confidential data held by Statistics Canada is closely monitored. For information with personal identifiers, only a limited number of employees with a work-related need-to-know are allowed access.

The purpose of making the LISE available to approved researchers (as 'deemed employees') (Footnote 6) through the Statistics Canada Research Data Centres (RDCs) is to ease data access for users while enforcing the security requirements of the microdata. Access to the LISE at RDCs is granted only after a successful security screening and on a need-to-know basis.

Dissemination

The Statistics Act (Footnote 7) provides the legal basis for maintaining the confidentiality of personal information that Statistics Canada collects. Statistics Canada will not disclose confidential information to any third party, other than with the permission of the original data provider and the authorization from the Chief Statistician, as required by the Statistics Act.

Statistics Canada will publish only aggregated statistical information or anonymized public use microdata files as part of its general dissemination strategy.

Openness

Information concerning datasets available within the LISE through Statistics Canada's Research Data Centres or approved LISE-based research projects is available on Statistics Canada's website. This supplement to Statistics Canada's Generic PIA will also be publicly available on Statistics Canada's website.

Conclusion

This assessment concludes that, with the existing Statistics Canada safeguards, any remaining risks are such that Statistics Canada is prepared to accept and manage them.

Formal approval

This Supplementary Privacy Impact Assessment has been reviewed and recommended for approval by Statistics Canada's Chief Privacy Officer, Director General for Modern Statistical Methods and Data Science, and Assistant Chief Statistician for Social, Health and Labour Statistics.

The Chief Statistician of Canada has the authority for section 10 of the Privacy Act for Statistics Canada, and is responsible for the Agency's operations, including the program area mentioned in this Supplementary Privacy Impact Assessment.

This Privacy Impact Assessment has been approved by the Chief Statistician of Canada.
