2015 submissions

Linkage of records from the 2011 survey for the Programme for the International Assessment of Adult Competencies (PIAAC), the 2011 Census of Population and the 2011 National Household Survey (NHS) (002-2015)

Purpose: Given that the linguistic practices of official language minorities in the labour market and in their communities affect the cultural and economic vitality of these minorities, linking PIAAC data with 2011 NHS and 2011 Census data will provide information on each member of a PIAAC respondent's household, something that is not currently possible. This will help us gain a better understanding of the relationship between the characteristics of household members and the level of skills measured in the PIAAC. As well, the census and NHS include questions that were not asked in the PIAAC survey (e.g., questions used to derive the first official language spoken and a question on languages used regularly at work), hence the importance of linking data from different sources.

Description: The PIAAC survey is part of a series of international surveys that have been conducted since the mid-1980s to measure the various dimensions of adult literacy, numeracy and problem-solving skills. The 2011 Census and 2011 NHS contain information on respondents' first official language spoken and on the members of the respondents' household.

Records from the PIAAC, 2011 Census and 2011 NHS are linked using a deterministic hierarchical record linkage program.

Only PIAAC survey respondents and members of their households will be retained for this record linkage.
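A deterministic hierarchical linkage runs a sequence of exact-match passes, from the strictest key to looser ones, and removes records linked in one pass before the next. The sketch below illustrates that general idea only. The pass definitions (`PASSES`), the field names and the sample records are all hypothetical; the notice does not say which variables or passes the actual program uses.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical match passes, strictest first. The actual keys and pass order
# used for the PIAAC/Census/NHS linkage are not described in this notice.
PASSES: List[Tuple[str, ...]] = [
    ("surname", "given_name", "birth_date", "sex", "postal_code"),
    ("surname", "given_name", "birth_date", "sex"),
    ("surname", "birth_date", "postal_code"),
]


def make_key(rec: Record, fields: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Normalized exact-match key, or None if any field is missing."""
    values = []
    for field in fields:
        value = rec.get(field)
        if not value:
            return None
        values.append(value.strip().upper())
    return tuple(values)


def index_by_key(records: Dict[str, Record], fields: Sequence[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Group record ids by their match key for one pass."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for rid, rec in records.items():
        key = make_key(rec, fields)
        if key is not None:
            index.setdefault(key, []).append(rid)
    return index


def hierarchical_link(left: Dict[str, Record], right: Dict[str, Record]) -> Dict[str, Tuple[str, int]]:
    """Return {left_id: (right_id, pass_number)}.

    Each pass accepts only keys that are unique among the still-unlinked
    records on both sides; linked records are removed before the next pass.
    """
    links: Dict[str, Tuple[str, int]] = {}
    free_left, free_right = dict(left), dict(right)
    for pass_no, fields in enumerate(PASSES, start=1):
        index_left = index_by_key(free_left, fields)
        index_right = index_by_key(free_right, fields)
        for key, left_ids in index_left.items():
            right_ids = index_right.get(key, [])
            if len(left_ids) == 1 and len(right_ids) == 1:
                links[left_ids[0]] = (right_ids[0], pass_no)
                del free_left[left_ids[0]]
                del free_right[right_ids[0]]
    return links


# Made-up records for illustration only.
piaac = {
    "p1": {"surname": "Tremblay", "given_name": "Marie", "birth_date": "1970-05-01", "sex": "F", "postal_code": "K1A0B1"},
    "p2": {"surname": "Roy", "given_name": "Luc", "birth_date": "1982-11-12", "sex": "M", "postal_code": None},
}
census = {
    "c9": {"surname": "TREMBLAY", "given_name": "marie", "birth_date": "1970-05-01", "sex": "F", "postal_code": "k1a0b1"},
    "c7": {"surname": "ROY", "given_name": "LUC", "birth_date": "1982-11-12", "sex": "M", "postal_code": "H2X1Y4"},
}
links = hierarchical_link(piaac, census)
```

Here `p1` links in the first pass, while `p2`, which lacks a postal code, can only link in the second. Rejecting keys that are shared by several records is one common way to keep deterministic passes from producing false links.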

Output: Only estimates that present aggregate data in accordance with the confidentiality provisions of the Statistics Act will be published outside Statistics Canada. Products derived from the linkage between PIAAC, the census and the NHS will be in the form of cross-tabulations, charts, geographical maps and results of multivariate logistic and linear regressions.

The results of the linkage, including the variables used to link the records, such as personal identifiers and information for measuring linkage quality, will be destroyed by March 31, 2016 at the latest, or as soon as they are no longer required. All files will be stored on a server in a secure location. Access to the linkage results is restricted to Statistics Canada employees and deemed employees of Statistics Canada whose work assignment requires this access.

Development of predictive models for admission to long-term care and residential care facilities in Canada – Canadian Community Health Survey to Hospital, Mortality, Census (2011), Tax and Continuing Care Reporting System Linkage (003-2015)

Purpose: The purpose of this linkage project is to identify the factors associated with admission to long-term care and residential care facilities among Canadians living in private households, and to develop a predictive model that can be used to estimate future demand for these services. In Canada, as in other developed countries, there is ongoing concern and debate regarding the future demand for long-term and residential care. Estimates of the future need for long-term care typically rely on age and sex population projections alone, without accounting for other factors known to be associated with admission, including physical and mental disability, acute health events (e.g., stroke and hip fractures), social support, household composition (i.e., living alone or with others) and income. Although a few Canadian studies on population-based predictors of long-term care admission have been conducted, none has considered a full range of health states and acute health events or changes in household dwelling. Furthermore, there is currently little to no information regarding the factors associated with admission to residential care, an increasingly important service for seniors with less serious health needs. This data linkage will create a unique retrospective longitudinal cohort by linking health survey, health administrative and census data to follow survey respondents over time and identify the factors associated with transitioning from the community to either long-term or residential care.

Description: Building on existing linkages, this project will extend the linkage of the Canadian Community Health Survey (CCHS) (2000/2001 to 2011, 4.2) to the Discharge Abstract Database (DAD) (1996/1997 to 2013/2014), the Canadian Mortality Database (CMDB) (2000 to 2012) and the Historical Tax Summary File (HTSF) (1990 to 2012) (Record Linkage #030-2012) to include the 2011 Census, the T1 Family File (T1FF) (2000 to 2013) and the Continuing Care Reporting System (CCRS), in order to provide information on those who have been institutionalized. Linkages will be performed only for CCHS respondents who have consented to the linkage of their survey data.

The CCHS provides comprehensive information regarding the health, socio-economic and household dwelling status of community-dwelling seniors. The DAD and the National Ambulatory Care Reporting System (NACRS) provide comprehensive information regarding the use of hospital services, including diagnosis, treatment and use of resources, which can be used to identify a major adverse health event. The CMDB will provide information regarding mortality outcomes and primary cause of death, and will allow the calculation of loss to follow-up and competing events. The HTSF will be used to assist in record linkage. The T1FF will provide information on individual and family income and household composition. The 2011 Census of Population and the CCRS will be used to identify individuals who have transitioned into institutional and/or residential care after responding to the survey. The final analysis file will not contain direct personal identifiers such as names, health information numbers, death registration numbers or tax information.

Output: The linked files will at all times remain on Statistics Canada Head Office premises. Only non-confidential aggregate data that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. Major findings will be used to prepare research papers for publication in peer-reviewed journals (including Statistics Canada's Health Reports) and presentation at workshops and conferences.

The linked analysis files, stripped of direct personal identifiers, will be retained until no longer required by Statistics Canada, up to December 31, 2020, at which time they will be destroyed. The corresponding linkage key files housed at Statistics Canada Head Office will also be retained until no longer required by Statistics Canada, up to December 31, 2020, at which time they will be destroyed.

2015 General Social Survey on Time Use: linking tax data from the T1 Personal Master File, T1 Family File and T4 Summary and Supplementary file (007-2015)

Purpose: The General Social Survey (GSS) program, established in 1985, conducts telephone surveys from a sample selected across the 10 provinces (excluding the Territories). The GSS is recognized for its regular collection of cross-sectional data that allows for trend analysis, and its capacity to test and develop new concepts that address emerging issues. Each year the GSS focuses on a different topic, such as family, victimization, social support and aging, and time use. A specific topic is usually repeated approximately every 5 years. The 2015 GSS will focus on Time Use.

The 2015 GSS on Time Use is the fifth iteration of a series of surveys that began in 1986. By linking the 2015 GSS on Time Use responses to the personal tax files of respondents and the tax files of all household members, more accurate personal and household income information will be obtained for respondents. At the same time, response burden will be minimized, and collection and data processing costs will be reduced.

Description: The 2015 GSS on Time Use is a sample-based survey with a cross-sectional design. Telephone and/or Internet surveys are conducted through computer-assisted telephone interviews or the Electronic Questionnaire (EQ), from a sample selected across the 10 Canadian provinces.

By linking data, we aim to obtain better-quality personal and household income data.

Income questions show relatively high non-response rates, the incomes reported by respondents are usually rough estimates, and donor imputation is used for partial and item non-response.

The information collected during the 2015 GSS on Time Use will be linked to the personal tax records (T1, T1FF or T4) of respondents, and tax records of all household members.

Respondents will be notified of the planned linkage before and during the survey. Any respondents who object to the linkage of their data will have their objections recorded, and no linkage to their tax data will take place.

Output: The availability of the 2015 GSS on Time Use analytical data file will be announced in The Daily. The analysis file will be made available to Statistics Canada researchers, and to deemed employees at the Statistics Canada Research Data Centres. All data will remain confidential and protected under the Statistics Act.

Along with the availability announcement of the analytical data file (in The Daily), only non-confidential aggregate statistics will be released.

Business Development Bank of Canada: The Importance of Business Development Bank of Canada Client Services on Firm Performance and Survival (008-2015)

Purpose: This project examines the performance of BDC clients relative to firms that do not receive guidance and support from BDC, the firm characteristics that are most affected by BDC support, and which services have the greatest effect on the growth and survival of businesses. A better understanding of the effect of its services will enable BDC to adjust those services to better support its clients in the future.

Description: A list of firms in the BDC portfolio during the 2008 to 2012 period will be linked to data from the National Accounts Longitudinal Microdata File (data from the Business Register, the T2 corporate tax database, PD7 and T4). The BDC firm records will be linked probabilistically using name and address. This is a one-time linkage.
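Probabilistic linkage on name and address is often done with Fellegi-Sunter match weights: each compared field adds log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, and the total is compared with two thresholds. The sketch below assumes that framework; the notice does not name the method. The m/u probabilities, the thresholds, the 0.85 name-similarity cutoff, the field names and the legal-form stop list are all hypothetical values chosen for illustration.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical (m, u) pairs: P(field agrees | true match), P(field agrees | non-match).
M_U = {"name": (0.95, 0.01), "street": (0.90, 0.05), "postal_code": (0.92, 0.002)}
UPPER, LOWER = 8.0, 2.0  # hypothetical decision thresholds on the total weight


def normalize(text: str) -> str:
    """Upper-case, strip punctuation and drop common legal-form tokens."""
    text = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    text = re.sub(r"\b(INC|LTD|LTEE|CORP|CO)\b", " ", text)
    return " ".join(text.split())


def agrees(field: str, a: str, b: str) -> bool:
    a, b = normalize(a), normalize(b)
    if field == "name":
        # Approximate agreement tolerates small spelling differences in names.
        return SequenceMatcher(None, a, b).ratio() >= 0.85
    return a == b


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum of Fellegi-Sunter agreement/disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if agrees(field, rec_a[field], rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float) -> str:
    if weight >= UPPER:
        return "link"
    if weight <= LOWER:
        return "non-link"
    return "clerical review"


# Made-up firm records for illustration only.
bdc = {"name": "Acme Widgets Inc.", "street": "100 Main St", "postal_code": "K1A 0B1"}
same_firm = {"name": "ACME WIDGETS", "street": "100 MAIN ST", "postal_code": "K1A 0B1"}
moved_firm = {"name": "ACME WIDGETS", "street": "100 Main St", "postal_code": "H2X 1Y4"}
other_firm = {"name": "Zeta Foods Ltd", "street": "5 King St", "postal_code": "M5V 2T6"}
```

With these parameters, `same_firm` is classified as a link, `other_firm` as a non-link, and `moved_firm` (postal code disagrees) falls between the thresholds and is sent to clerical review. In practice, m and u are usually estimated from the data rather than fixed by hand.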

Output: Only non-confidential aggregate statistical outputs and analyses that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. The information will be presented in the form of tables of regression results and summary statistics related to the project's goal of ascertaining the impact of receiving BDC support.

All direct business identifiers will be removed from the analysis file once linkage is complete and placed in a separate linkage key file. The linked file and the linkage key file will be retained until no longer required, up to March 31, 2020, at which time they will be destroyed.

Linkage of the 2014 Teacher's Questionnaire to the 2014 Ontario Child Health Study (OCHS) (010-2015)

Purpose: The main objective of this linkage is to combine the data provided by the teacher of a child who participated in the 2014 Ontario Child Health Study (OCHS) with the data collected in the main survey. The linkage will allow for a more complete portrait of children's mental health in Ontario.

Description: Responses to the Teacher's Questionnaire and the 2014 Ontario Child Health Study (OCHS) will be matched for each respondent using the variable sample_id (which uniquely identifies a child in the sample). This linkage will be used to create an analytical file.

Output: The data obtained from the Teacher's Questionnaire will include information about the child's school achievement and behaviour at school, and will be combined with parent- or guardian-reported data from the 2014 Ontario Child Health Study. The data are processed and prepared for dissemination using a regular suite of statistical products, including analytical files (with personal identifiers removed) made available in the Research Data Centres.

Prostate Cancer Surveillance and Occupational Exposures: a Subsequent Use of Linkage 049-2012 (011-2015)

Purpose: Using a large and accessible population dataset such as the 1991 Canadian Census cohort will provide evidence on multiple occupational exposures and prostate cancer, furthering research to identify risk factors for prostate cancer. There is a need to explore occupational exposures and prostate cancer at a national level with the available occupational information. This large dataset will provide more knowledge and understanding while adding to the limited evidence in the literature. This is an important area of research because prostate cancer incidence is increasing and little is known about the etiology of this cancer. The research question is whether there is a relationship between occupational exposures and prostate cancer in Canadian workers. The objectives are to test relevant hypotheses related to occupational exposures in the etiology of prostate cancer and to evaluate whether specific industry and occupational exposures are related to prostate cancer in Canadian workers.

Description: The 1991 Canadian Census of Population, Canadian Mortality & Cancer Follow-up study is a probabilistically linked database. Approximately 2.7 million individuals aged 25 or older, who were enumerated by the 1991 long-form census, were followed for mortality, cancer and annual place of residence.

Output: Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer-reviewed journal articles, presentations at conferences and as part of a graduate-level thesis/dissertation.

2012 Canadian Survey on Disability and 2011 National Household Survey Linkage (013-2015)

Purpose: The Canadian Survey on Disability (CSD) is a post-censal survey that provides information on Canadians whose everyday activities may be limited because of a condition or health-related problem. Information from this survey is essential for the effective development and operation of national programs such as employment equity, and is required by the Government of Canada to fulfill various international commitments, including the United Nations Convention on the Rights of Persons with Disabilities.

An application (024-2013) was made in April 2013 to conduct a linkage between the 2012 Canadian Survey on Disability (CSD) and the 2011 National Household Survey (NHS). Approval was received to link the data files, and a final analytical file was created in the fall of 2013. The retention period for the CSD-NHS composite file expired in August 2014; however, we are receiving a number of requests to include additional NHS variables on the file. This request is therefore to further enrich the analytical potential of the 2012 CSD microdata file by including additional variables that were not included in the original request.

Researchers who use the CSD data have expressed the desire to have the new NHS variables added to the existing file in order to further enhance their analytical objectives. NHS data complement the findings of the CSD, providing information on topics that were either beyond the scope of the CSD or which were explored in the survey in only a very limited way in order to reduce response burden. The new variables being proposed are consistent with the goal of enriching the analytical potential of the 2012 CSD microdata file.

Description: Responses to the 2012 CSD and 2011 NHS will be matched for each respondent using the variables frame_id (which identifies a household uniquely in Canada) and persnr (which identifies a person uniquely within the household). This linkage will result in the CSD-NHS linked analytical microdata file.
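Because frame_id identifies a household and persnr identifies a person within that household, the pair (frame_id, persnr) acts as a unique person key, and the match is an exact join on it. The sketch below is a minimal illustration under two assumptions that the notice does not state: every CSD record is kept (a left join, with a `linked` flag), and the NHS variable names (`nhs_vars`, `income`, `disability`) are hypothetical placeholders.

```python
from typing import Dict, List, Optional, Tuple


def link_on_person_key(csd: List[dict], nhs: List[dict], nhs_vars: List[str]) -> List[dict]:
    """Attach selected NHS variables to CSD records matched on (frame_id, persnr)."""
    index: Dict[Tuple[str, str], dict] = {}
    for rec in nhs:
        key = (rec["frame_id"], rec["persnr"])
        if key in index:
            # The composite key should be unique; a duplicate signals a data problem.
            raise ValueError(f"duplicate NHS key {key}")
        index[key] = rec

    linked = []
    for rec in csd:
        match: Optional[dict] = index.get((rec["frame_id"], rec["persnr"]))
        out = dict(rec)
        for var in nhs_vars:
            out[var] = match[var] if match is not None else None
        out["linked"] = match is not None
        linked.append(out)
    return linked


# Made-up records for illustration only.
csd = [
    {"frame_id": "H001", "persnr": "1", "disability": "yes"},
    {"frame_id": "H002", "persnr": "2", "disability": "no"},
]
nhs = [
    {"frame_id": "H001", "persnr": "1", "income": "52000"},
    {"frame_id": "H001", "persnr": "2", "income": "0"},
]
result = link_on_person_key(csd, nhs, ["income"])
```

Note that persnr alone is not enough: "2" exists in household H001 but not in H002, so the second CSD record stays unlinked.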

Output: Linked data, including the newly requested variables, from the 2012 Canadian Survey on Disability and 2011 National Household Survey will be disseminated on the analytical microdata file produced for the 2012 CSD. A microdata file was released to the National Research Data Centre in January 2014 and will be re-released in Spring 2015, once the new variables have been added. Any CSD products containing linked data will be disseminated in accordance with Statistics Canada's policies, guidelines and standards. Only aggregate statistical estimates that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada.

Economic Development Agency of Canada for the Regions of Quebec (EDAC): Economic Impact – 2001 to 2012 (018-2015)

Purpose: To support the evaluation of the Economic Development Agency of Canada for the Regions of Quebec's (EDAC) financing services program by producing objective measures of its economic impact on the performance of small and medium-sized enterprises (SMEs). Key performance indicators and value-added measures, such as sales, profits, firm survival rate and employment, will be calculated for EDAC client businesses and for comparable non-client businesses.

Description: A list of firms that were EDAC clients in the period 2001 to 2012 will be linked to the Business Register to obtain the Business Number and Statistical Enterprise Number, to facilitate linkage to payroll and tax data. In order to measure the effectiveness and the impact of EDAC financing services, a comparison group of non-EDAC client firms with similar characteristics will be selected.

Records of EDAC clients and the businesses in the comparison group will be linked to the Payroll Deduction Account (PD7), T2 Corporate Tax data, the General Index of Financial Information (GIFI), Research and Development in Canadian Industry (RDCI), and the Chart of Accounts database for the period 2001 to 2015. The records will be linked using the Business Number and Statistical Enterprise Number. The resulting linked analysis file will enable longitudinal analysis of each cohort. The characteristics of the matched and unmatched businesses will also be compared. This is a one-time linkage.
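The notice says only that non-client firms "with similar characteristics" will be selected as a comparison group. One simple way to do this is nearest-neighbour matching without replacement inside cells defined by categorical characteristics. The sketch below assumes that approach; the matching variables (industry sector, region, base-year employment), the `Firm` type and the processing order are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Firm:
    bn: str            # Business Number
    naics2: str        # industry sector (hypothetical matching variable)
    region: str        # region (hypothetical matching variable)
    employment: float  # base-year employment (hypothetical matching variable)


def select_comparison_group(clients: List[Firm], pool: List[Firm]) -> Dict[str, Optional[str]]:
    """Pair each client with the closest non-client in the same industry-region cell.

    Matching is without replacement, so each non-client is used at most once.
    Clients with no eligible candidate are mapped to None.
    """
    used: Set[str] = set()
    pairs: Dict[str, Optional[str]] = {}
    # Larger clients choose first; this order is arbitrary but deterministic.
    for client in sorted(clients, key=lambda f: (-f.employment, f.bn)):
        candidates = [
            p for p in pool
            if p.bn not in used and p.naics2 == client.naics2 and p.region == client.region
        ]
        if not candidates:
            pairs[client.bn] = None
            continue
        best = min(candidates, key=lambda p: (abs(p.employment - client.employment), p.bn))
        used.add(best.bn)
        pairs[client.bn] = best.bn
    return pairs


# Made-up firms for illustration only.
clients = [Firm("C1", "31", "QC", 50), Firm("C2", "31", "QC", 10), Firm("C3", "44", "QC", 5)]
pool = [Firm("N1", "31", "QC", 12), Firm("N2", "31", "QC", 48), Firm("N3", "31", "ON", 50)]
pairs = select_comparison_group(clients, pool)
```

Once pairs are chosen, both clients and comparison firms can be linked to PD7, T2 and the other sources by Business Number, as the text describes. Unmatched clients (here `C3`) are worth reporting separately, since they may differ systematically from matched ones.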

Output: Only non-confidential aggregate statistical outputs and analyses that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. These will be in the form of separate summary tables of regression analysis results relating to the study hypotheses on the economic impact of EDAC's financing services, in addition to profiling tables. A methodology report will be prepared, explaining the file matching processes and constraints and key issues related to the quality of the data. An analytic report will be produced by Statistics Canada.

Extending the Relevance of Longitudinal Files (023-2015)

Purpose: The goal of this linkage is to add repeated measures of important outcome domains for each respondent to five terminated longitudinal surveys, extending their analytical relevance with minimal investment compared with the cost of new data collection. This linkage will allow researchers, both inside Statistics Canada and through the Research Data Centres, to analyze longer-term outcomes for the cohorts in the five longitudinal surveys.

Description: The longitudinal surveys involved are:

  • Youth in Transition Survey (YITS),
  • National Population Health Survey (NPHS), Household component,
  • Survey of Labour and Income Dynamics (SLID),
  • National Longitudinal Survey of Children and Youth (NLSCY), and
  • Longitudinal Survey of Immigrants to Canada (LSIC).

To extend the analytic value of these surveys, a file containing variables that measure key outcomes (e.g., income, health, employment and mobility) will be created using data from the following Statistics Canada surveys and administrative databases:

  • Census 2006 and 2011
  • National Household Survey 2011
  • Vital Statistics - Deaths (1993 to 2011)
  • Canadian Cancer Registry (containing cancer diagnoses from 1992 to 2011)
  • T1 Family File (T1FF) (1993 to 2011)

Output: Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer-reviewed journal articles, presentations at conferences and as part of a graduate-level thesis/dissertation.

Examining the association between melanoma cancer incidence and environmental UVR exposure using the 1991 Canadian Census Cohort. A subsequent use of linkage 049-2012. (030-2015)

Purpose: This national epidemiological study will examine the relationship between average ambient residential UV radiation over a 16-year period and the relative risk of developing melanoma in Canada. Respondents in the 1991 Canadian Census Cohort will be spatially linked by postal code to modelled monthly mean environmental ultraviolet radiation (UVR) and UV Index estimates. Surface models of UV values for each month will be constructed using spatial interpolation in Geographic Information Systems (GIS). Using these surface models, unique postal code localities will be assigned UVR and UV Index values for each of the summer months, and these values will be joined with the 1991 Canadian Census Cohort respondents by postal code for each of the 16 years of cohort follow-up. Cox proportional hazards models will be used to estimate the hazard of melanoma diagnosis associated with summer UV exposure by gender, age group, visible minority group and socioeconomic group.
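Since each postal code locality carries its own modelled monthly UV values and respondents may move during follow-up, exposure is assigned year by year from the postal code of residence in that year. The sketch below shows only that join-and-average step, not the GIS interpolation or the Cox models. It assumes that "summer" means June to August and that exposure is summarized as the mean of yearly summer values; the table layout and postal codes are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

# Assumption: "summer" means June to August; the study's definition is not given here.
SUMMER_MONTHS = (6, 7, 8)

UVTable = Dict[str, Dict[int, float]]  # postal code -> {month: modelled UV value}


def summer_uv(uv_by_postal: UVTable, postal_code: str) -> Optional[float]:
    """Mean modelled summer UV for one postal code locality, or None if unmodelled."""
    months = uv_by_postal.get(postal_code)
    if months is None:
        return None
    return mean(months[m] for m in SUMMER_MONTHS)


def mean_follow_up_exposure(residence_by_year: Dict[int, str], uv_by_postal: UVTable) -> Optional[float]:
    """Average the yearly summer UV values over a person's residential history."""
    yearly = []
    for year in sorted(residence_by_year):
        value = summer_uv(uv_by_postal, residence_by_year[year])
        if value is not None:  # skip years whose postal code has no modelled value
            yearly.append(value)
    return mean(yearly) if yearly else None


# Made-up values for illustration only.
uv = {"K1A0B1": {6: 6.0, 7: 7.0, 8: 5.0}, "V6B1A1": {6: 4.0, 7: 5.0, 8: 3.0}}
history = {1991: "K1A0B1", 1992: "K1A0B1", 1993: "V6B1A1", 1994: "X0X0X0"}
exposure = mean_follow_up_exposure(history, uv)
```

A per-person summary like `exposure` is one candidate covariate for a survival model; a time-varying exposure that changes with each move is another common design.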

Description: Environmental UV models will be provided to Statistics Canada researchers through collaboration with Environment Canada and the World Ozone and Ultraviolet Radiation Data Centre (WOUDC). The UVR datasets include monthly mean daily UVR and mean noon UV Index values for the period 1980 to 1990.

The 1991 Canadian Census Cohort: mortality & cancer follow-up is a probabilistically linked database. Approximately 2.7 million individuals aged 25 or older, who were enumerated by the 1991 long-form census, were followed for mortality, cancer and annual place of residence.

Output: The linked data file will remain on Statistics Canada premises. Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada, in the form of peer reviewed journal articles or correspondence with collaborators.

Examining the mortality effects and socioeconomic inequalities of industrial emissions using the 1991 Canadian Census Cohort. A subsequent use of linkage 049-2012. (032-2015)

Purpose: This research will examine socioeconomic differences and potential mortality effects of residing near industrial facilities. Using the 1991 Canadian Census Cohort, subjects will be spatially joined to industrial facilities within a specified geographic range using the residential postal code (1984 through 2006) and facility location. The analysis will first examine the socio-economic differences in the potential for exposure to industrial emissions using individual-level variables from the 1991 Census of Population and ecological variables from the 1991 Census. Second, survival modelling will be undertaken with emissions assigned to subjects based on their residential postal codes. The analysis will include the individual-level variables from the 1991 Census and ecological variables at the Census Tract and Census Division calculated previously by Statistics Canada.
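Joining subjects to "industrial facilities within a specified geographic range" amounts to finding all facilities within some distance of the residential location. The sketch below assumes that each postal code is represented by a latitude/longitude centroid, that distance is great-circle (haversine) distance, and that exposure is the sum of reported emissions within a radius. The radius, the facility tuple layout and the coordinates are hypothetical; the notice does not specify them.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# (facility_id, latitude, longitude, annual emissions in tonnes), hypothetical layout
Facility = Tuple[str, float, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def emissions_within(centroid: Tuple[float, float], facilities: Iterable[Facility], radius_km: float) -> float:
    """Sum emissions from all facilities within radius_km of a postal code centroid."""
    lat, lon = centroid
    return sum(
        tonnes
        for _, f_lat, f_lon, tonnes in facilities
        if haversine_km(lat, lon, f_lat, f_lon) <= radius_km
    )


# Made-up facilities for illustration only.
facilities = [("F1", 45.0, -75.0, 10.0), ("F2", 45.01, -75.0, 5.0), ("F3", 46.0, -75.0, 100.0)]
exposure = emissions_within((45.0, -75.0), facilities, radius_km=5.0)
```

Repeating this for each year's residential postal code (1984 through 2006) would give a yearly exposure history per subject. For millions of subjects, a spatial index would replace the linear scan shown here.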

Description: The 1991 Canadian Census Cohort: mortality & cancer follow-up is a probabilistically linked database. Approximately 2.7 million individuals aged 25 or older, who were enumerated by the 1991 long-form census, were followed for mortality, cancer and annual place of residence.

Output: Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer-reviewed journal articles.

National Apprenticeship Survey (NAS) - 2015: linking tax data from the T1 Family File (033-2015)

Purpose: The National Apprenticeship Survey (NAS), established in 1989, conducts telephone surveys from a sample selected across the 10 provinces and 3 territories. The NAS is an occasional survey, the most recent having been conducted in 2007. The NAS 2015 is a survey of apprentices that targets individuals who completed or discontinued their apprenticeship in 2011, 2012 or 2013. This survey aims to understand the factors that influence whether apprentices complete or discontinue their apprenticeship, the challenges of obtaining certification and the effectiveness of the most recent financial support programs. It also serves to examine the transition to the labour market of apprentices who completed or discontinued their apprenticeship.

By linking the NAS 2015 responses to the personal tax files of respondents, more accurate personal income information will be obtained for respondents. At the same time, response burden will be minimized, data quality will be improved, and collection and data processing costs will be reduced.

Description: The NAS 2015 is a sample-based survey with a cross-sectional design. Telephone surveys are conducted through computer-assisted telephone interviews (CATI) from a sample selected across the 13 Canadian provinces and territories. By linking data, we aim to obtain better-quality personal income data.

The information collected during the NAS 2015 will be linked to the personal tax records (T1FF) of respondents.

Respondents will be notified of the planned linkage during the survey. Any respondents who object to the linkage of their data will have their objections recorded, and no linkage to their tax data will take place.

Output: The availability of the NAS 2015 analytical data file will be announced in The Daily. The analysis file will be made available to Statistics Canada researchers, and to deemed employees at the Statistics Canada Research Data Centres. All data will remain confidential and protected under the Statistics Act.

Along with the availability announcement of the analytical data file (in The Daily), only non-confidential aggregate statistics will be released.

Amendment to include the T1FF and extending by one year the retention period of the linked files; 2014 General Social Survey on Victimization: linking tax data from the T1 Personal File and T4 Summary and Supplementary file (040-2015, 075-2013)

Purpose: This amendment will add the T1FF file to the previously approved record linkage 075-2013. There is no change to the proposal other than the addition of this file.

The General Social Survey (GSS) program, established in 1985, conducts telephone surveys from a sample selected across the 10 provinces. The populations of Yukon, the Northwest Territories and Nunavut are not usually part of the GSS target population, except in cycles on victimization. The GSS is recognized for its regular collection of cross-sectional data that allows for trend analysis, and its capacity to test and develop new concepts that address emerging issues. Each year the GSS focuses on a different topic, such as family, victimization, social support and aging, and time use. A specific topic is usually repeated approximately every 5 years. The 2014 GSS, which focuses on victimization, is the sixth iteration on this topic.

This survey is an important source of information to better understand how safe people feel, what they think of the justice system and their experiences of crime.

By linking the 2014 GSS on victimization responses to personal tax files of respondents, and the tax files of all household members, more accurate income (personal and household) information will be obtained for respondents. At the same time, response burden will be minimized, and collection, data processing, and testing costs will be reduced.

Description: The 2014 GSS on Victimization is a sample-based survey with a cross-sectional design. Telephone surveys are conducted through computer-assisted telephone interviews from a sample selected across the 10 Canadian provinces, and interviews in the territories are conducted through a mix of computer-assisted telephone interviews and computer-assisted personal interviews. By linking data, we aim to obtain better-quality personal and household income data.

Income questions show relatively high non-response rates, and the incomes reported by respondents are usually rough estimates. Linkage makes it possible to obtain this information without asking income questions.

The information collected during the 2014 GSS on Victimization will be linked to the personal tax records (T1, T1FF or T4) of respondents, and to the tax records of all household members. Household information (address, postal code and telephone number), respondent information (social insurance number, surname, given name, date of birth/age, sex) and information on other members of the household (surname, given name, age, sex and relationship to respondent) will be the key variables for the linkage.

Respondents will be notified of the planned linkage before and during the survey. Any respondents who object to the linkage of their data will have their objections recorded, and no linkage to their tax data will take place.

Output: The availability of the 2014 GSS on Victimization analytical aggregated data file will be announced in The Daily. The analysis file containing only aggregated data created using confidentiality procedures as required by Statistics Canada's directives will be made available to Statistics Canada researchers, and to deemed employees at the Statistics Canada Research Data Centres. All data will remain confidential and protected under the Statistics Act.

Along with the availability announcement of the analytical data file (in The Daily), only non-confidential aggregate statistics will be released.

Long-Term Family Economic Consequences of a Childhood Cancer Diagnosis (041-2015)

Purpose: The objectives of this study are (1) to evaluate the proportion of families whose income is affected, and to quantify the extent to which income is affected, after a diagnosis of childhood cancer, by linking incident childhood cancer patients to their parents' income tax file data and exploring the short- and long-term economic impacts in comparison with a matched set of controls; (2) to evaluate patient, disease and family factors that may be associated with greater economic disparity, or that may ameliorate or temper any disparity; and (3) to evaluate the proportion of individual cancer survivors whose income is affected, and to quantify the extent to which their income is affected, by exploring the short- and long-term economic impacts in comparison with a matched set of controls.

Description: Information on children diagnosed with cancer in Ontario, held in the Pediatric Oncology Group of Ontario Networked Information System (POGONIS), will be linked to Statistics Canada's T1 Family File (T1FF) and to two variables from the Immigrant Landing File (ILF). Specifically, cancers diagnosed in children aged 0 to 14 years in Ontario between 1992 and 2006 will be linked to the T1FF, from 1989 to the most recent year of T1FF available at the time of linking, and to the immigrant identifier and landing year from the ILF. T1FF/ILF information for families and individual cancer survivors will be examined and compared with T1FF/ILF information from a set of matched families who did not experience a childhood cancer diagnosis.

The linkage will be produced by Statistics Canada staff on the agency's premises.

Output: Only aggregate data that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. Findings such as research papers will be offered for publication consideration in regular Statistics Canada reports such as Health Reports and will be prepared for submission to peer-reviewed international science journals. All reports will be made available to the Pediatric Oncology Group of Ontario. To support ongoing analysis, the linked analysis file will be retained at Statistics Canada until December 31, 2022, at which time it will be destroyed. The retention period may end sooner if the data file is no longer needed. Access to this linked analysis file will be limited to Statistics Canada employees or deemed employees whose assigned work activities require such access.

AMENDMENT: Data linkage to examine pathways of students through post-secondary education (PSE) and into the labour market, 2005-2013 (042-2015)

Purpose: The main objective of this project is to link the PSE institution student administrative data from 2005 to 2013 to the tax data (using the T1 Family File) for the corresponding years. This amendment expands the observation period, which originally ended in 2012, to 2013. Now that T1FF data for 2013 have been released, this additional year can be added to the scope of the project.

The linkage will enable the tracking of the post-schooling earnings trajectories of PSE institutions' students by field of study and by cohort of graduates.

Description: Records of students from 14 PSE institutions will be linked to the T1 Family File (T1FF) over a nine-year period (2005 to 2013). The data linkage will be done in two stages:

In the first stage, the PSE institutions will send a file containing the student identification variables as well as a pseudo-identification variable for each student. This file will be linked to the T1 Family File, which contains an identification number and a selection of variables needed to conduct the research. Once the linkage is finalized, all student identification variables except the pseudo-identification variable will be destroyed.

In the second stage, the PSE institutions will provide a file containing the pseudo-identification variable and the student information. This second file will be linked to the reduced T1FF file from the first stage. The record linkage will be done by Statistics Canada personnel.
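The two-stage procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not Statistics Canada's linkage program: the field names (`pseudo_id`, `name`, `birth_date`, `income`, `field_of_study`) are hypothetical, and an exact match on name and birth date stands in for whatever matching rules are actually applied in the first stage.

```python
"""Hypothetical sketch of a two-stage linkage built around a pseudo-ID.

Field names are illustrative assumptions, not actual PSE or T1FF layouts.
"""


def stage_one(pse_identifiers, t1ff):
    """Link identifying PSE records to T1FF; keep only the pseudo-ID and research variables."""
    # Assumed deterministic key: exact match on (name, birth_date).
    t1ff_index = {(r["name"], r["birth_date"]): r for r in t1ff}
    reduced_t1ff = {}
    for rec in pse_identifiers:
        match = t1ff_index.get((rec["name"], rec["birth_date"]))
        if match is not None:
            # Identifying fields are not carried forward; only the pseudo-ID survives.
            reduced_t1ff[rec["pseudo_id"]] = {"income": match["income"]}
    return reduced_t1ff


def stage_two(student_info, reduced_t1ff):
    """Attach student information to the reduced T1FF file through the pseudo-ID alone."""
    return [
        {**info, **reduced_t1ff[info["pseudo_id"]]}
        for info in student_info
        if info["pseudo_id"] in reduced_t1ff
    ]


if __name__ == "__main__":
    pse_ids = [
        {"pseudo_id": "P1", "name": "A", "birth_date": "1990-01-01"},
        {"pseudo_id": "P2", "name": "B", "birth_date": "1991-02-02"},
    ]
    t1ff = [{"name": "A", "birth_date": "1990-01-01", "income": 30000}]
    reduced = stage_one(pse_ids, t1ff)
    students = [
        {"pseudo_id": "P1", "field_of_study": "Engineering"},
        {"pseudo_id": "P2", "field_of_study": "Arts"},
    ]
    print(stage_two(students, reduced))
```

The point of the design is that the second-stage file never needs names or birth dates: once the first stage is done, the pseudo-ID is the only bridge between the institution's student information and the reduced tax file.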

Output: The outputs will be of two types: (1) a report submitted to ESDC containing data tables and regression models on all participating PSE institutions; and (2) individual reports for each participating institution consisting of aggregate statistics on its own students.

Two types of microdata files will be created to produce these outputs: one linked file covering all participating institutions, for the first report, and one linked file per institution, for the production of the individual reports.

Further analytical research may be produced from the resulting linked files.

Women's Enterprise Initiative Project: Linkage of Client List to the Canadian Employer-Employee Dynamics Database, 2007 to 2012 (044-2015)

Purpose: To provide statistical information to support the assessment of the effectiveness of the Women's Enterprise Initiative (WEI) program in assisting women-owned enterprises, by comparing the performance of enterprises that received financial assistance under the program with that of unassisted enterprises in the same region. This information will be used by Western Economic Diversification Canada (WD), which manages the WEI program, to determine more effective means of assisting its clients. Employment dynamics, enterprise entries and exits, selected financial statistics and measures of employment will be analyzed.

Description: A list of enterprises assisted by the WEI program will be linked to the Canadian Employer-Employee Dynamics Database (CEEDD) for 2007 to 2012 to identify the assisted and non-assisted groups and to produce custom tabulations on the two groups.

Output: The outputs released outside of Statistics Canada will be non-confidential aggregate statistics and analyses that conform to the confidentiality provisions of the Statistics Act. The information will be presented in the form of statistical tables, broken down by industry sector and enterprise size.

The linked analysis file, containing the linkage keys and identifiers, will be retained until March 31, 2018, or until no longer required, at which time it will be destroyed.

Air Pollution Study: Linkage of 2001 Census of Population, T1 Universe Files, Mortality and Cancer Databases (045-2015)

Purpose: To assess the impact of long-term exposure to air pollution on human health, with the objective of informing the development of Canada-wide standards for key criteria pollutants. Linking separate sources of information is an important way for Statistics Canada to fill identified gaps in environmental data related to human exposure to air pollution.

The specific objectives of this study include determining whether non-accidental deaths and cancers are associated with long-term exposure to low levels of ambient air pollutants.

Description: A sample of approximately 3.7 million Canadians was selected from respondents to the 2001 Census of Population long-form questionnaire, and their census information was linked to the T1 Universe Files (1981 to 2021), the Amalgamated Mortality Database (2001 to 2021) and the Canadian Cancer Registry (1992 to 2021). Air pollution data (e.g., fine particulate matter (PM2.5), nitrogen dioxide (NO2) and ozone (O3)) will be spatially integrated with these files.

The linked files will contain only those data items required to conduct the study. Personal identifiers, such as name and social insurance number, will be used only for linkage purposes, then removed from the linked analytical microdata file. Only a sample of individuals who completed the 2001 Census of Population long-form questionnaires are included on the file.

Output: All access to the linked microdata file will be restricted to Statistics Canada personnel (including Statistics Canada deemed employees) whose work activities require access. Only aggregate data that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. Major findings will be used to create research papers for publication in peer-reviewed journals and presentation at workshops and conferences.

The linked file, stripped of personal identifiers, will be retained until no longer required, at which time the file will be destroyed.

Linkage of the Annual Return of Broadcasting Distribution (Statistics Canada survey title), or Annual Return of 'Broadcasting Distribution' Licensee (CRTC title), to T2 business tax data for imputation of non-surveyed small cable entities (053-2015)

Purpose: To link to business tax data using the General to Detail Allocation (GDA) and its Chart of Accounts (COA) in order to impute selected financial variables for non-surveyed small cable entities. Using tax data for data replacement and for deriving tax ratios minimizes response burden while ensuring better coverage of the industry.

Description: Selected financial variables for non-surveyed small cable entities, such as operating revenue, operating expenses, salaries and wages, and amortization, are imputed through direct data replacement using the General to Detail Allocation (GDA) and its Chart of Accounts (COA) cells. Revenue and expense ratios are then produced and used to derive detailed revenue and expense items. Business Numbers (BNs) and entity names are used in the linkage process.
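The ratio step described above can be illustrated with a small sketch. It assumes, hypothetically, that ratios are pooled across surveyed entities and that a total taken from the GDA tax data is split into detailed items in proportion to those ratios. The item names are invented for illustration, and the ratio cells used in production may well be defined differently (for example, by size class or province).

```python
"""Hypothetical sketch: derive detail ratios from surveyed entities and apply them
to a total taken from tax data for a non-surveyed entity."""


def derive_ratios(surveyed_entities):
    """Return each detailed item's share of the pooled total across surveyed entities."""
    pooled = {}
    for entity in surveyed_entities:
        for item, value in entity.items():
            pooled[item] = pooled.get(item, 0.0) + value
    grand_total = sum(pooled.values())
    return {item: value / grand_total for item, value in pooled.items()}


def impute_detail(tax_total, ratios):
    """Split a tax-based total into detailed items using the derived ratios."""
    return {item: tax_total * share for item, share in ratios.items()}


if __name__ == "__main__":
    surveyed = [
        {"subscription_revenue": 80.0, "advertising_revenue": 20.0},
        {"subscription_revenue": 60.0, "advertising_revenue": 40.0},
    ]
    ratios = derive_ratios(surveyed)  # pooled shares of 140/200 and 60/200
    print(impute_detail(1000.0, ratios))
```

Because the shares sum to one, the imputed detail items always add back to the tax-based total.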

Output: Only aggregated data at the national or provincial grouping level are disseminated, after confidentiality procedures are applied. No tax records are provided to data-sharing partners. Temporary files containing individual tax information are destroyed after each survey cycle, once the imputed data have been validated and exported to the final production database.

Education Longitudinal Linkage Platform (ELLP): Creation of a record linkage platform to allow development of key education indicators and analysis related to postsecondary education and apprenticeship programs (059-2015)

Purpose: Longitudinal data are needed to develop key pan-Canadian longitudinal indicators and analysis related to postsecondary education and apprenticeship programs. These outputs will lead to a better understanding of student pathways through postsecondary education and training, including completion rates and outcomes. They will be useful for education and labour market policy and planning, and will fill gaps in current knowledge.

Administrative data files from the Postsecondary Student Information System (PSIS), the Registered Apprenticeship Information System (RAIS) and the T1 Family Files (T1FF) will be used to create a linkage platform for relating longitudinal education information to the other data sources listed below.

The linkage platform will permit use of the longitudinal administrative data while protecting the privacy of individuals.

Description: The target population for the linkage platform and education indicator development comprises individuals who were enrolled in postsecondary institutions (PSIS) or registered in apprenticeship programs or as trade qualifiers (RAIS), at some time since 2008. Data for selected jurisdictions will go back as far as 2004.

Anonymized linking keys will be associated with the records of analytical variables from the data source files, and all personal identifiers will be removed. A registry of these linking keys will be created. To protect this sensitive information, the registry of keys and the personal identifiers required for updating the linkage platform will be stored in separate files in a separate location accessible only to the few Statistics Canada employees whose job duties require access. The registry of keys will be used to create customized linked files that merge variables from the different data sources for creating longitudinal education indicators and for analytical purposes. These customized linked files will not include the data source linking keys or personal identifiers.

Data sources used to construct the linkage platform or that will be linked for analytical purposes include:

  • Postsecondary Student Information System (PSIS) annual pan-Canadian records beginning with 2008-09 and ongoing, and records for selected jurisdictions for 2004-05, 2005-06, 2006-07 and 2007-08;
  • Registered Apprenticeship Information System (RAIS) annual pan-Canadian records beginning with 2008 and ongoing, and records for selected jurisdictions for 2004, 2005, 2006 and 2007;
  • selected tax-related, administrative and concordance files needed to establish and validate record matches between the annual PSIS and RAIS data files;
  • the T1 Family Files (T1FF), beginning with 1997 and ongoing;
  • the National Apprenticeship Survey, 2015 and ongoing;
  • the National Graduate Survey, beginning with 2013 (graduates of 2009/2010) and ongoing;
  • the Alberta Graduate Outcomes Survey, beginning with 2004 or the first year available after that and ongoing;
  • additional files from the Alberta data systems that are used to report PSIS and RAIS data to Statistics Canada, and files from selected institutions, beginning with 2004 and ongoing;
  • the Citizenship and Immigration Canada Landing File; the Census; the National Household Survey (NHS); the Canadian Employer-Employee Dynamics Database (CEEDD); and the Longitudinal Apprentices and Trade Qualifiers Database; all beginning with 2004 or the first year available after that and ongoing;
  • data on student and apprenticeship loans and grants from provinces, territories and/or Employment and Social Development Canada (ESDC), beginning with 2004 and ongoing.

New years of data will be added when they become available.

The eventual production of the ELLP within the Social Data Linkage Environment will be explored.

Output: Analytical data linked using this linkage platform will be used to prepare indicators, tables, analytical reports and research papers for publication, for presentation at conferences, workshops and meetings and to fill cost-recovery requests for clients. They will also be used to provide insights for improving education data collection and data quality.

Only non-confidential aggregate statistics and analysis conforming to the confidentiality provisions of the Statistics Act or as permitted by the Statistics Act will be released outside of Statistics Canada. A discretionary disclosure approval has been granted to allow the PSIS program to release aggregated enrolment and graduation information at the postsecondary institution level for institutions that have signed a waiver covering the specific PSIS release period.

Longitudinal Immigration Database (IMDB): Extension and Updates (060-2015)

Purpose: The Longitudinal Immigration Database (IMDB) is used to analyze immigrants' economic integration as well as internal mobility. It is a unique source of data at Statistics Canada that provides a direct link between immigration policy and the economic performance of immigrants.

Specifically, the IMDB provides federal and provincial departments involved in immigration issues and programs, the research community, and immigrant settlement agencies in Canada with crucial data to conduct research regarding the selection process of immigrants, their settlement patterns and their economic integration.

Description: The Longitudinal Immigration Database (IMDB) is a database that is created by linking Immigration, Refugees and Citizenship Canada's (IRCC) administrative immigrant files with personal tax files obtained by Statistics Canada from Canada Revenue Agency.

The IMDB currently includes tax data from 1982 to 2013 and covers immigrants who landed in Canada from 1980 to 2013. The IMDB is being redesigned as follows:

  • to extend the universe of the database to include immigrants who landed before 1980 (1952 to 1979) to ensure a better coverage of the immigrant population in Canada;
  • to extend the universe of the database to include temporary residents who arrived in Canada from 1980 to 2019 to account for pre-landing experience in Canada and to study pathways from arriving as temporary residents to landing;
  • to include a date of citizenship to study pathways to citizenship;
  • to include a date of death from the Amalgamated Mortality Database to better account for the population in scope;
  • to take advantage of newly developed files such as the Dependent Registry to improve record linkage; and
  • to use this new methodology for seven reference years of updates, i.e., data reference years 2013 to 2019 (for immigration records and tax files).

Immigrant identifiers will continue to be added to the Longitudinal Administrative Databank (LAD).

Output: Only aggregate statistics and analysis conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. These will be in the form of tables on income distribution, interprovincial mobility, industry of employment, and provincial indicators produced for IRCC, as well as other federal and provincial organizations. On request, multivariate analyses and statistical tables will be produced from the IMDB analysis file for researchers. All access to the analysis file will be on Statistics Canada premises and will be restricted to only those employees and deemed employees of Statistics Canada whose assigned work duties require such access.

The IMDB linked analysis database will be retained until at least July 2021, at which time the Executive Management Board will be asked to review continuance of the program.

Canadian Health Measures Survey (CHMS) – 2016, Cycle 5 Linkage to Tax Data (061-2015)

Purpose: The Canadian Health Measures Survey (CHMS), launched in 2007, is collecting key information relevant to the health of Canadians by means of direct physical measurements such as blood pressure, height, weight and physical fitness. In addition, the survey is collecting blood and urine samples to test for chronic and infectious diseases, and for nutrition and environmental markers.

Through household interviews, the CHMS is gathering information related to nutrition, smoking habits, alcohol use, medical history, current health status, sexual behaviour, lifestyle and physical activity, the environment and housing characteristics, as well as demographic and socioeconomic variables.

All of this valuable information will create national baseline data on the extent of such major health concerns as obesity, hypertension, cardiovascular disease, exposure to infectious diseases, and exposure to environmental contaminants. In addition, the survey will provide clues about illness and the extent to which many diseases may be undiagnosed among Canadians. The CHMS will enable us to determine relationships between disease risk factors and health status, and to explore emerging public health issues.

CHMS data are representative of the population, whether individuals are healthy or not, and therefore provide a better picture of the actual health of Canadians.

By linking CHMS cycle 5 to the personal tax files of respondents, more accurate income data (personal and household) will be obtained. At the same time, response burden will be minimized, and collection and data processing costs will be reduced.

Description: The CHMS cycle 5 is a sample-based survey with a cross-sectional design.

The sample is allocated over 11 age-gender groups, with between 500 and 600 units per group (5,700 in total) required to produce national estimates.

Collection includes a combination of a personal interview using a computer-assisted interviewing method and, for the physical measures, a visit to a mobile examination centre (MEC) specifically designed for the survey.

For the cycle 5 collection period (January 2016 to December 2017), approximately 5,700 participants aged 3 to 79 will complete both a household health questionnaire and a physical measures test. The MEC will be set up at 16 different sites across Canada and will remain at each location for approximately five weeks.

The information collected during the 2016 CHMS will be linked to the personal tax records (T1, T1FF or T4) of respondents, and tax records of all household members.

Respondents will be notified of the planned linkage at the end of the household interview. Any respondents who object to the linkage of their data or the data of other members of their household will have their objections recorded, and no linkage to the tax data will take place.

Output: The availability of the Canadian Health Measures Survey cycle 5 data file(s) will be announced in The Daily. The data file containing the income data is expected to be made available in fall 2016 to approved researchers in partner departments (Health Canada and the Public Health Agency of Canada) through a data-sharing agreement, to Statistics Canada researchers, and to deemed employees at the Statistics Canada Research Data Centres. All data will remain confidential and protected under the Statistics Act. There will be no personal identifiers on this data file.

Along with the availability announcement of the analytical data file (in The Daily), only non-confidential aggregate statistics will be released.

Canadian Income Survey: Linkage to Income Data Files (063-2015)

Purpose: The purpose of this linkage is to obtain income data and to reduce respondent burden, interviewer time and collection costs for the Canadian Income Survey. The linkage makes it possible to obtain information on income variables without burdening respondents with detailed questions about their income.

Description: The Canadian Income Survey database and the T1, T1IDENT and T5007 Files will be linked using the address, city, date of birth, first name, surname, sex, province, social insurance number, codes for surname, postal code, marital status, telephone number and first initial. This information will be removed from the linked file as soon as the linkage is completed, and stored separately. Access to these files will be restricted to Statistics Canada employees whose assigned work activities require access.

Output: No information containing personal identifiers will be released outside of Statistics Canada from this linkage activity. Only non-confidential aggregate statistics and analysis conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada.

Occupational Cancer Surveillance using the 1991-2009 Canadian Census Mortality and Cancer Cohort: Secondary use of 049-2012 (064-2015)

Purpose: The objectives of this study are to: (1) test topical and relevant hypotheses related to occupational exposures in the etiology of cancer; (2) target cancer sites and suspected carcinogens for informed hypothesis-generating analyses; and (3) conduct a global analysis of relationships between occupation and cancer in Canada. The broad objectives of this study include creating a platform for surveillance of occupational cancer in Canada, identifying industries, occupations and target exposures for prevention and risk mitigation efforts, and generating hypotheses for future etiologic research.

Description: The 1991 Canadian Census Cohort: mortality & cancer follow-up is a probabilistic linked database. Approximately 2.7 million individuals aged 25 or older, who were enumerated by the 1991 long-form census, were followed for mortality, cancer and annual place of residence. A custom-tabulation from the Survey of Labour and Income Dynamics (1996) was used to develop a Shift Work Job-Exposure Matrix. This matrix will be used to estimate exposure to shift work in the 1991 Canadian Census Cohort based on sex, occupation and industry.
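Applying a job-exposure matrix (JEM) amounts to a lookup keyed on sex, occupation and industry. The sketch below is illustrative only: the occupation and industry labels and the probabilities are placeholders, not values derived from the Survey of Labour and Income Dynamics, and treating each cell as a probability of shift-work exposure is an assumption about how the matrix is expressed.

```python
"""Hypothetical sketch of assigning shift-work exposure from a job-exposure matrix."""

# Placeholder cells: (sex, occupation, industry) -> assumed probability of shift work.
# These numbers are invented for illustration and are NOT SLID estimates.
JEM = {
    ("F", "occupation_a", "industry_x"): 0.5,
    ("M", "occupation_a", "industry_x"): 0.4,
    ("M", "occupation_b", "industry_y"): 0.1,
}


def assign_exposure(person, jem=JEM):
    """Return the JEM probability for a cohort member, or None if the cell is missing."""
    return jem.get((person["sex"], person["occupation"], person["industry"]))


def expected_exposed(cohort, jem=JEM):
    """Expected number of shift workers: sum of cell probabilities over matched members."""
    probs = (assign_exposure(p, jem) for p in cohort)
    return sum(p for p in probs if p is not None)
```

Cohort members whose sex, occupation and industry combination has no cell in the matrix are left unassigned rather than being treated as unexposed.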

Output: Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer reviewed journal articles and presentations.

Strengthening the Longitudinal Worker File (065-2015)

Purpose: The objective of this initiative is to strengthen the analytical value of the Longitudinal Worker File by increasing its sample size and by incorporating additional input data files. The Longitudinal Worker File is a multi-purpose file used to support research on a range of labour market issues such as worker mobility, layoffs and retirement.

Description: Information at the level of the business-enterprise will be drawn from the Longitudinal Employment Analysis Program (LEAP) file, while individual- and job-level data will be drawn from T1 files, the T4 Supplementary File, the T4 Statement of Employment Insurance Benefits Paid file, and the Record of Employment (ROE) file. All linkages will be done on a deterministic basis using Business Numbers (BNs) and/or Social Insurance Numbers (SINs).

Business Numbers and SINs will be transformed into unique identifiers that will remain on the individual-level linked file in scrambled form. The use of scrambled identifiers will allow users to follow individuals longitudinally. Postal code information will be used to create aggregated geography variables and will then be removed from the files.
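One common way to turn SINs and BNs into stable scrambled identifiers, while reducing postal codes to a coarser geography, is a keyed hash. The sketch below makes several choices that the description does not state: it uses HMAC-SHA256 with a secret key held apart from the linked file, truncates the digest to 16 hexadecimal characters, and uses the forward sortation area (the first three characters of a postal code) as the aggregated geography. The field names are hypothetical.

```python
"""Hypothetical sketch: replace SINs/BNs with keyed-hash identifiers and
reduce postal codes to forward sortation areas (FSAs)."""
import hashlib
import hmac

# Assumption: the key is stored separately, with restricted access.
SECRET_KEY = b"example-key-stored-separately"


def scramble(identifier: str, domain: str) -> str:
    """Stable pseudonymous ID; the domain prefix keeps SIN and BN spaces apart."""
    message = f"{domain}:{identifier}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers replaced or removed."""
    out = dict(record)
    out["person_id"] = scramble(out.pop("sin"), "SIN")
    out["enterprise_id"] = scramble(out.pop("bn"), "BN")
    postal_code = out.pop("postal_code")
    out["fsa"] = postal_code.replace(" ", "").upper()[:3]
    return out
```

Because the same SIN always yields the same `person_id`, records for different years can be chained together. Without the key, however, the identifier cannot be recomputed from a known SIN, which an unkeyed hash of a nine-digit number would allow.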

All BNs, SINs and postal codes will be removed from the linked file and stored in a separate location accessible only to Statistics Canada employees whose job duties require them to access this information.

Output: Methodological and analytical findings resulting from these linked data will be used to prepare research papers for publication in analytical reports, peer-reviewed scientific journals (including Statistics Canada's Health Reports) and CANSIM, and for presentation at conferences, workshops and meetings.

Only aggregate statistics and analysis conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. The linked file will be retained by Statistics Canada until December 31, 2025, at which time the continued retention of the file will be reviewed. All linkage keys and identifiers will be removed from the linked file and retained separately, with access limited to Statistics Canada employees whose assigned work requires access to the file.

Adding new cohorts to the Intergenerational Income Mobility Database (066-2015)

Purpose: The objective of this initiative is to extend the coverage of the Intergenerational Income Mobility Database by incorporating additional cohorts of Canadian youth and their parents into the file and by updating several input data files. The database is used to examine the extent to which the financial outcomes achieved by teenagers later in life are correlated with the incomes of their parents.

Description: Using information from the T1 Family File, the Intergenerational Income Mobility Database links together teenagers and co-resident parents. These teenager-parent matches are subsequently linked to the T1 Personal Master Files, making it possible to track the income trajectories of youth into their thirties and forties. Information is also available on the incomes that the parents of these teenagers reported when they too were in their thirties and forties. This allows researchers to compare the incomes of youth and parents at the same stage of the life course.

To select the 1991 cohort, a sample of individuals aged 16 to 19 in 1991 will be identified in the 1991 T1FF. The child-parent link is taken from the Family Identification Number (FIN) available in the T1FF. If no child-parent match is found in 1991, a linkage will be attempted in each year from 1992 to 1995 in order to improve the coverage of the sample and reduce potential sample selection bias. Once the family linkage file has been constructed, all individuals in that file will be linked to the administrative information from the T1 files and the T4-ROE-LEAP linked files to obtain longitudinal information on their income and employment dynamics. These are deterministic linkages based on SINs. The same process will be used to identify the cohorts of youth who were aged 16 to 19 in 1996 or in 2001.

Once the linked data files have been constructed, the Social Insurance Number will be removed from the linked file and replaced by a 15-digit unique personal identifier, which allows observations to be identified across years without knowing the SIN. A confidential program used to convert SINs into personal identifiers will be kept separately, with access limited to Statistics Canada employees whose assigned work requires access to the file. In addition, the Payroll Deduction Account Number (PD) and the Business Number (BN) will be removed and replaced by a unique Longitudinal Business Register Identifier (LBRID).
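The year-by-year search for a child-parent match described above can be sketched as a simple fallback loop. The data structure is a hypothetical simplification: for each tax year, a mapping from a child's identifier to the identifiers of co-resident parents, assumed to be derived from the FIN.

```python
"""Hypothetical sketch of the 1991-first, then 1992-1995, child-parent match."""
from typing import Dict, List, Optional, Tuple

# For one tax year: child ID -> co-resident parent IDs (assumed to be derived from the FIN).
FamilyIndex = Dict[str, List[str]]


def find_parents(
    child_id: str,
    family_by_year: Dict[int, FamilyIndex],
    years: range = range(1991, 1996),
) -> Optional[Tuple[int, List[str]]]:
    """Return (year, parent IDs) for the first year with a match, or None."""
    for year in years:
        parents = family_by_year.get(year, {}).get(child_id)
        if parents:
            return year, parents
    return None
```

For the 1996 and 2001 cohorts, the `years` argument would presumably shift to the corresponding five-year windows.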

Output: Methodological and analytical findings resulting from these linked data will be used to prepare research papers for publication in analytical reports, peer-reviewed scientific journals and CANSIM, and for presentation at conferences, workshops and meetings. The linked file will also be used to develop tabular data and indicators for release on Statistics Canada's website. Only non-confidential aggregate statistics that will not result in the identification of an individual person, business or organization will be released outside of Statistics Canada.

Strengthening the Refugee Claimant Database (067-2015)

Purpose: The objective of this initiative is to strengthen the Refugee Claimant Database, a data file used to examine the income characteristics of refugee claimants in Canada. Including information on the outcomes of refugee claims and linking to landing records will increase record linkage rates, improve data quality and increase the analytical value of the database. This will yield better information on refugee claimants in Canada and on their financial and labour market characteristics over time.

Description: The linked data file will provide better information on the income characteristics of refugee claimants by incorporating the claim decision and decision date from the Immigration and Refugee Board. These two pieces of information are required to identify and remove refugee claimants who have left the country, a group that would otherwise remain in the data with uncertain income characteristics. In addition, landing information from the Immigrant Landing File at Statistics Canada will be included in the data.

Output: Analytical findings resulting from the linked data file will be used to prepare tabulations and research papers for publication.

Only aggregate statistics and analysis conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. The linked file will be retained by Statistics Canada until no longer required, but no later than December 31, 2017, at which time it will be destroyed. All linkage keys and identifiers will be removed from the linked file and retained separately, with access limited to Statistics Canada employees whose assigned work requires access to the file.

2016 Census of Population Program linkage to income information from personal income tax and benefits records (071-2015)

Purpose: The purpose is to obtain information on the income of respondents to the 2016 Census of Population Program. The Census Program requires detailed information on several different sources of income, as well as income taxes paid and various deductions and contributions, and accurate reporting would require that respondents consult their own personal records. Linking the Census records to the administrative records reduces response burden and improves the data quality. The income data are used, among other uses, to measure total income, after-tax income, contributions to various programs, disposable income and the Market-Basket Measure of low-income.

Description: Respondents' information on income, income taxes and various expenditures is extracted from their personal income tax and benefits records (including the T1 income tax return, various information slips held by the CRA, and records from the Canada Child Tax Benefit (CCTB) and GST credit programs) and added to their responses to the Census of Population Program (short and long forms).

Output: Only aggregate statistical estimates and analyses conforming to the confidentiality provisions of the Statistics Act are released outside of Statistics Canada. The linked Census and personal income tax and benefits information is used to produce income estimates for dissemination as part of the Census product line. Outputs for the Census include a wide range of analyses and standard data tables, as well as custom tabulations.

The linked Census edit and imputation files will be retained indefinitely. The linking key file, containing personal identifiers, will be kept until June 2020, or until no longer required, at which time it will be destroyed. All files are password-protected and kept on a server in a secure area. Access to the linking keys and linked Census edit and imputation files is restricted to Statistics Canada employees whose assigned work activities require such access.

2016 Canadian Community Health Survey Annual Component (CCHS) Linkage to Tax Data (072-2015)

Purpose: The purpose of this linkage is to reduce respondent burden while improving data quality. It will also shorten the overall survey time, which will reduce collection costs.

Description: Health Statistics Division (HSD) is planning to link the 2016 CCHS survey data to existing tax files to obtain income information.

The first step is to determine whether tax data are available for the CCHS 2016 sampled households. When this information is available, respondents will be given a linkage statement that includes a specific reference to linking to tax data, and they will have the opportunity to refuse the linkage. Respondents who refuse the linkage will then be asked a set of income questions. For households where no tax data are available, the income questions will be asked, followed by the linkage statement.

After collection, the second step will be to link the 2016 CCHS data to the most recent available tax files (generally with a two-year lag from the collection year) to obtain income information for the respondents who did not refuse the linkage.

Given that the CCHS sample is drawn from two frames (the Canada Child Tax Benefit file for respondents aged 12 to 17 and the Labour Force Survey (LFS) area frame for those aged 18 and over), there will be slightly different approaches to the two-step linking strategy. For those aged 18 years or older, the sample records will all contain an ARUID. Prior to collection of the CCHS Annual 2016, the ARUIDs for the selected sample will be linked to the 2014 IDENT_ARUID file using the ARUID and then linked to the most recent tax data available at the time of collection, to identify cases that do not have 2014 tax data; for the 2016 CCHS master data file, this will be the 2014 T1 Personal Master File (T1). Cases that do not have 2014 tax data will be asked income questions as a back-up measure to provide income data. All respondents will also be given the tax linkage statement. For all those agreeing to the tax linkage statement (regardless of whether they were also asked income questions), data linkage will be attempted as follows:

  1. Link the ARUID to the 2015 IDENT_ARUID, then use this link to obtain tax data from the 2015 T1, T1FF or T4.
  2. If no link is found for 2015, link to the 2014 IDENT_ARUID and use that link to find the 2014 T1, T1FF or T4 tax data.
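The adult fallback described above (2015 first, then 2014) can be sketched as follows. This is a minimal illustration only, assuming the IDENT_ARUID and tax files have been loaded into in-memory dictionaries; the dictionary layout, the tax-file key, and the order in which T1, T1FF and T4 are checked are hypothetical and are not taken from the CCHS processing system.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the files named in the text:
#   ident[year] maps an ARUID to a tax-file key (the key format is assumed)
#   tax[year] is a list of (file_name, {key: record}) pairs, checked in an
#   assumed order such as T1, then T1FF, then T4
def link_adult(aruid: str, consented: bool, ident: dict, tax: dict,
               years=(2015, 2014)) -> Optional[tuple]:
    """Return (year, file_name, record) for the first match, or None."""
    if not consented:
        return None  # respondent refused the tax linkage statement
    for year in years:  # most recent year first, then fall back to 2014
        key = ident.get(year, {}).get(aruid)
        if key is None:
            continue  # no IDENT_ARUID link for this year
        for file_name, records in tax.get(year, []):
            if key in records:
                return year, file_name, records[key]
    return None  # unlinked; income comes from the survey questions, if asked
```

For example, an ARUID that links only through the 2014 IDENT_ARUID file returns a 2014 record, while a respondent who refused the linkage returns `None`.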

Personal information such as name, date of birth and gender, or contact information such as telephone number or postal code, may be used to verify the links (made through the ARUID) or to improve linkage rates.

For selected respondents aged 12 to 17, records can be linked through the parents' SIN to identify those without 2014 T1, T1FF or T4 data. Those without 2014 T1 data will be asked the income questions as a back-up measure. All respondents will also be given the tax data linkage statement. For those agreeing to it (regardless of whether they are asked the income questions), we will attempt linkage as follows:

  1. If the child still lives with the recipient (parent/guardian), link the SIN of the parent to the 2015 T1, T1FF or T4 to obtain the most recent tax data.
  2. If no link is found in the 2015 T1 or T1FF and the child still lives with the recipient (parent/guardian), use the SIN to link to the 2014 T1, T1FF or T4 to obtain tax data.

If the child no longer lives with the recipient (parent/guardian) then linking through contact information such as name, address or phone number may be attempted.
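For respondents aged 12 to 17, the same fallback keys on the recipient's SIN instead of the ARUID. The sketch below is hypothetical: the parameter names and the in-memory file layout are assumptions, and the contact-information linkage mentioned above is represented only as a status value, because the text does not describe how it would be done.

```python
# Hypothetical layout: tax[year] is a list of (file_name, {sin: record}) pairs.
def link_child(parent_sin: str, lives_with_parent: bool, consented: bool,
               tax: dict, years=(2015, 2014)) -> dict:
    """Return a dict describing the linkage outcome for a 12-17 year old respondent."""
    if not consented:
        return {"status": "refused"}
    if not lives_with_parent:
        # The text says linkage through name, address or phone number
        # "may be attempted"; that step is outside this sketch.
        return {"status": "try_contact_information"}
    for year in years:  # 2015 first, then 2014
        for file_name, records in tax.get(year, []):
            if parent_sin in records:
                return {"status": "linked", "year": year,
                        "file": file_name, "record": records[parent_sin]}
    return {"status": "not_linked"}
```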

Output: The release of data from the 2016 Canadian Community Health Survey will be announced in The Daily. Data will be made available to deemed employees at the Statistics Canada Research Data Centres. All data will remain confidential and protected under the Statistics Act. There will be no personal identifiers on the data file.

Along with the availability announcement of the analytical data file (in The Daily), only non-confidential aggregate statistics will be released.

Adding immigrant admission category variables to the 2016 Census of Population long-form (075-2015)

Purpose: This project would add immigrant admission category (e.g., economic class, family class, refugees, etc.) and principal applicant status to the 2016 Census of Population database by linking to the Immigration, Refugees and Citizenship Canada (IRCC) Immigrant Landing File.

Description: The Census of Population provides detailed information on the demographic, social and economic characteristics of people in Canada, as well as providing information about the housing units in which they live. The IRCC Immigrant Landing File provides information on immigrants to Canada since 1980 such as admission category.

This project will build on the previous record linkage project funded by IRCC (037-2013), which linked the Immigrant Landing File with the 2011 National Household Survey. It will integrate the admission category variables into the 2016 Census of Population Program databases, process the variables to address inconsistencies and missing values, develop reference material, and disseminate the resulting variables with the 2016 Census of Population Program variables for broad access.

Output: Only aggregate statistical estimates that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. The admission category variables will follow the same dissemination and output considerations as any other 2016 Census of Population Program variable and could be included in custom tabulations, standard tables or articles.

A linkage key will be retained indefinitely as part of this record linkage. The linkage results including variables used to perform the record linkage such as personal identifiers and information used to measure the linkage quality will be destroyed by March 31, 2019. All files will be kept on a server in a secure area. Access to these files will be restricted to Statistics Canada employees whose assigned work activities require such access.

Black-White disparities in mortality in Canada: A subsequent use of linkage 049-2012 – (076-2015)

Purpose: To estimate absolute and relative Black-White mortality gaps for all-cause and cardiovascular mortality in Canada and compare with those estimated for the US.

Description: The 1991 Canadian Census Cohort: mortality & cancer follow-up is a probabilistically linked database. Approximately 2.7 million individuals aged 25 or older, who were enumerated by the 1991 long-form census, were followed for mortality, cancer and annual place of residence.

We will estimate age- and sex-specific standardized mortality rates for cohort members identifying as "Black" and non-visible minority. Person time for each cohort member will be calculated from the beginning of the study (June 4, 1991) until the date of death or the end of the study (December 31, 2009). The Canadian cohort population structure will be used as the standard population for estimating mortality rates (also for the US to facilitate comparability between the countries). Age standardized mortality rate differences (RD) and rate ratios (RR) will be estimated to compare between Blacks and non-visible minorities. We will also fit standardized survival curves to assess absolute survival probabilities for the two groups, adjusted for age and socio-demographic variables. This method overcomes some of the limitations of the standard Cox proportional hazards model and permits estimation of absolute effect measures.
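A minimal sketch of the calculations described above: person-time from June 4, 1991 to death or December 31, 2009, and direct age standardization to a common standard population, from which the rate difference (RD) and rate ratio (RR) follow. The function names, age groups and counts are illustrative assumptions. A full cohort analysis would also split each member's person-time across the age groups they pass through; this sketch takes person-time per age group as an input instead.

```python
from datetime import date
from typing import Optional

STUDY_START = date(1991, 6, 4)
STUDY_END = date(2009, 12, 31)

def person_years(death_date: Optional[date] = None) -> float:
    """Follow-up time in years, from study start to death or study end."""
    end = STUDY_END if death_date is None else min(death_date, STUDY_END)
    return (end - STUDY_START).days / 365.25

def standardized_rate(deaths: dict, person_time: dict, std_pop: dict) -> float:
    """Direct standardization: weight each age-specific rate by the
    standard population's share in that age group."""
    total = sum(std_pop.values())
    return sum(deaths[g] / person_time[g] * std_pop[g] / total for g in std_pop)

def rd_rr(rate_a: float, rate_b: float) -> tuple:
    """Absolute (difference) and relative (ratio) comparison of two rates."""
    return rate_a - rate_b, rate_a / rate_b
```

Running the rate functions separately for males and females gives the sex-specific comparison; using the same `std_pop` for both countries is what makes the Canadian and US rates comparable.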

Output: Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer reviewed journal articles and presentations.

A Microsimulation Analysis of Hidden Heterogeneity in Population Mortality: A subsequent use of linkage 049-2012 – (077-2015)

Purpose: The purpose of this study is to investigate hidden heterogeneity in the Canadian population using, as a starting point, differences in survival patterns among broad ethnic groups in the Canadian Census Cohort: mortality & cancer follow-up.

Description: The 1991 Canadian Census Cohort: mortality & cancer follow-up is a probabilistically linked database. Approximately 2.7 million individuals aged 25 or older, who were enumerated by the 1991 long-form census, were followed for mortality, cancer and annual place of residence.

The principal research questions of this study are:

  1. Do differences in survival exist among different ethnic groups in Canada?
  2. Can these differences in survival, once account is taken of various covariates, be explained by posited or hypothetical differences in proportions of low frailty and high frailty subpopulations among the different ethnic groups?
  3. Are the proportions of low frailty and high frailty subpopulations among different ethnic groups comparable to the frequency of low frailty and high frailty subpopulations in the published literature?
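To illustrate what hidden heterogeneity means for the second question, the sketch below mixes a low-frailty and a high-frailty subpopulation that share a Gompertz baseline hazard and differ only by a multiplicative frailty. Everything here (the Gompertz form, the parameter values, the two-group split) is an assumption for illustration, not the study's model. It shows the key mechanism: high-frailty people die earlier, so their share among survivors falls with age, and the observed population hazard rises proportionally more slowly with age than the hazard of either subgroup.

```python
import math

def mixture(age: float, p_high: float, z_low: float = 1.0, z_high: float = 3.0,
            a: float = 1e-4, b: float = 0.09) -> tuple:
    """Return (population survival, population hazard, high-frailty share of survivors).

    Assumed model: baseline hazard h0(x) = a * exp(b * x); a subgroup with
    frailty z has hazard z * h0(x) and survival exp(-z * H0(x)).
    """
    h0 = a * math.exp(b * age)
    H0 = (a / b) * (math.exp(b * age) - 1.0)  # cumulative baseline hazard
    s_low = math.exp(-z_low * H0)
    s_high = math.exp(-z_high * H0)
    s_pop = (1 - p_high) * s_low + p_high * s_high
    # Population hazard is the survivor-weighted average of subgroup hazards.
    hazard = ((1 - p_high) * z_low * s_low + p_high * z_high * s_high) * h0 / s_pop
    share_high = p_high * s_high / s_pop
    return s_pop, hazard, share_high
```

Comparing observed group survival curves with curves generated under different assumed values of `p_high` is one way such a model could be confronted with the data.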

Output: Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer reviewed journal articles and presentations.

Social and Spatial Determinants of Mortality in the Maritimes using the 1991 Canadian Census Cohort: A subsequent use of linkage 049-2012 – (078-2015)

Purpose: The purpose of this project is to examine the social and spatial determinants of health in the Maritimes.

The social gradient in health is well established, with studies showing differences in rates of morbidity and mortality across social strata (Mackenbach et al. 2008). Differences in health can be explained by individual characteristics, social dimensions, and environmental attributes (Marmot 2005; O'Neill et al. 2003). Within Canada, there has been extensive research on the social determinants of health, with recent advances making use of the 1991 Canadian Census Cohort (Hwang et al. 2009; Omariba, Ng, and Vissandjée 2014; Peters et al. 2013; Simonet et al. 2010; Tjepkema and Wilkins 2011; Tjepkema et al. 2011; Wilkins et al. 2008). This project focuses on geographic variations in the social determinants of health in the Maritimes.

Description: The 1991 Canadian Census Cohort: mortality & cancer follow-up is a probabilistically linked database. Approximately 2.7 million individuals aged 25 or older, who were enumerated by the 1991 long-form census, were followed for mortality, cancer and annual place of residence.

The study addresses several research questions:

  1. What socio-economic inequalities in health exist for individuals who resided in the Maritimes at baseline, and how do these differ from those in other Canadian regions?
  2. How do migration patterns for residents of the Maritimes relate to socio-economic inequalities in health? What are the differences in health outcomes for those who migrated versus those who stayed?

Output: Only aggregate data and analyses conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer reviewed journal articles and presentations.

The socioeconomic determinants of changes in the distribution of deaths by age, sex and cause in Canada: subsequent use of linkage no. 049-2012 – (079-2015)

Purpose: This project will examine the role of certain socioeconomic determinants in relation to new patterns in old age mortality, namely compression of mortality or movement of mortality to older ages, and longevity differences by sex. The level of education—an important determinant for old age survival—will be examined in particular.

Description: The 1991 Canadian Census Cohort: mortality & cancer follow-up is a probabilistically linked database. Approximately 2.7 million individuals aged 25 or older, who completed the 1991 long-form census questionnaire, were followed for mortality, cancer and annual place of residence.

The study will primarily attempt to answer the following research questions:

  • Can certain socioeconomic characteristics, in particular the level of education, explain the differences in modal age at death (the age at which deaths are most frequent) and in the dispersion of old age deaths, for general mortality and for certain causes of death?
  • How does the situation in Canada compare to that of the United States in terms of survival inequalities by level of education?
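The two summary measures in the first question can be computed from a distribution of deaths by single year of age. The sketch below takes the modal age as the age with the most deaths (published work usually smooths the distribution first) and measures old-age dispersion with the standard deviation of ages at death above the mode, SD(M+). The choice of SD(M+) is an assumption; the text does not name a dispersion measure.

```python
import math

def modal_age(deaths_by_age: dict) -> int:
    """Age with the largest number of deaths (no smoothing applied)."""
    return max(deaths_by_age, key=deaths_by_age.get)

def sd_above_mode(deaths_by_age: dict) -> float:
    """SD(M+): spread of ages at death at or above the mode, measured around the mode."""
    m = modal_age(deaths_by_age)
    upper = {x: d for x, d in deaths_by_age.items() if x >= m}
    total = sum(upper.values())
    return math.sqrt(sum(d * (x - m) ** 2 for x, d in upper.items()) / total)
```

Computing both measures separately for each level of education (and each cause of death) gives the comparisons the question describes.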

Output: Only aggregate data and analyses that comply with the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada Research Data Centres, in the form of peer reviewed journal articles.

Changes in work and earnings following health shocks (082-2015)

Purpose: The objective of this initiative is to create a linked database that will support research on the labour market and financial outcomes experienced by individuals and families following hospitalizations resulting from accidents and acute illness. The proposed linkage will combine data on acute inpatient hospitalizations with data from various taxation- and employment-based administrative files. The resulting analytical files will support research on the economic consequences of 'health shocks' for individuals and their families and the implications for income, labour and health policies. This information does not currently exist and this linkage project will fill an important data gap.

Description: Health related information will be drawn from the Discharge Abstract Database (DAD) which contains demographic, administrative and clinical data on all hospital admissions in Canada (excluding Quebec), from April 1, 1999 to the present. The data include information on hospital admission and discharge dates, admission to intensive care, and hospital diagnoses. These data allow 'health shocks' to be identified in terms of type and severity.

Information on labour market and financial outcomes of individuals as well as job-level data will be drawn from the following administrative data files: T1 Family File, T1 Personal Master File, T1 Historical Personal Master File, T4 Summary File, T4E Statement of EI Benefits Received, EI Status Vector File, Record of Employment, and Longitudinal Employment Analysis Program. These data allow the economic characteristics of individuals (and their spouses) to be identified both before and after a health shock, so that the impacts of the health shock can be estimated on outcomes such as cessation of employment, earnings losses, job instability, and receipt of income support.

The linkage process will use identifiers available in both data files (date of birth, postal code and sex) to create a link between the Health Insurance Number (HIN) in the DAD and the Social Insurance Number (SIN) in the tax files. The linkage keys will be kept separately, with access limited to Statistics Canada employees whose assigned work requires access to the file.
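A minimal sketch of a deterministic linkage on the three shared identifiers, assuming exact agreement on date of birth, postal code and sex and keeping only keys that point to a single SIN. The field names (`hin`, `sin`, `dob`, `postal_code`, `sex`) and the rule for discarding ambiguous matches are assumptions; the text does not say how ties or near matches are resolved. The resulting HIN-to-SIN mapping plays the role of the separately stored linkage keys.

```python
from collections import defaultdict

KEY_FIELDS = ("dob", "postal_code", "sex")  # shared identifiers named in the text

def _key(rec: dict) -> tuple:
    return tuple(rec[f] for f in KEY_FIELDS)

def link_hin_to_sin(dad_records: list, tax_records: list) -> dict:
    """Return {HIN: SIN} for DAD records whose key matches exactly one SIN."""
    tax_index = defaultdict(set)
    for rec in tax_records:
        tax_index[_key(rec)].add(rec["sin"])
    links = {}
    for rec in dad_records:
        candidates = tax_index.get(_key(rec), set())
        if len(candidates) == 1:  # unmatched or ambiguous keys stay unlinked
            links[rec["hin"]] = next(iter(candidates))
    return links
```

Requiring a unique match trades linkage rate for precision: two people sharing a birth date, postal code and sex cannot be told apart on these fields alone.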

Output: Methodological and analytical findings resulting from these linked data will be used to prepare research papers for publication. The linked file will also be used to develop tabular data and indicators for release on Statistics Canada's website. Only non-confidential aggregate statistics that will not result in the identification of an individual person, business or organization will be released outside of Statistics Canada.

Farm Financial Survey Linkage to Taxation Data (083-2015)

Purpose: Linkage of the Farm Financial Survey with taxation data will allow Statistics Canada to continue to produce estimates on the financial and physical aspects of farm operations in Canada while at the same time reducing burden on survey respondents. Respondents will have the option to replace 18 specific questions on revenues and expenses, previously included on the Farm Financial Survey, with taxation data.

The linkage will further allow critical cross-tabulations that are used by the survey sponsor to inform policy decisions and as performance measures for government funded agriculture programs that benefit survey respondents. These data are also used within the Agriculture Division (in processes that feed the System of Macroeconomic Accounts) and by the Investment, Science and Technology Division.

Description: Commencing with the 2015 reference year, data from the Farm Financial Survey will be linked with taxation data from the T1 business and personal master files, as well as T2, T3 and T4 tax files.

Output: Linkage results will be used to produce non-confidential aggregate estimates that will be published outside of Statistics Canada; published estimates will conform to the confidentiality provisions of the Statistics Act.

The linked file used to produce the aggregate estimates will be retained indefinitely so that Statistics Canada can continue to respond to client requests for custom tabulations of the data.

Atlantic Canada Opportunities Agency (ACOA) – Update of Business Performance Evaluation Report (2016) (087-2015)

Purpose: To assess the effectiveness of ACOA's programs and activities, to assess the usefulness of the Agency's efforts to assist small and medium-sized enterprises, and to determine more effective means of providing assistance to this business community. ACOA assists businesses by providing loans, as well as a broad range of programs and services, for purposes of establishing, expanding, or modernizing businesses, and for the development of human resources. Information resulting from the linkage will be used by ACOA to measure the performance of businesses which received financial assistance under the Agency's programs, and compare it to the performance of other firms in the Atlantic region. Employment dynamics, businesses entering and exiting, selected financial statistics, as well as measures of labour productivity and business owners' characteristics will be analysed. Findings from this assessment may be used by ACOA to improve assistance to businesses.

Description: A list of ACOA-assisted businesses will be linked to the following files: the 2003 to 2013 Business Register, the 2013 vintage Longitudinal Employment Analysis Program (LEAP) file, the 2003 to 2013 Corporate Tax-General Index of Financial Information (GIFI) and Scientific Research and Experimental Development Expenditures (SRED), the 2013 Trade by Enterprise Characteristics (TEC), and the 2013 Canadian Employer-Employee Dynamics Database (CEEDD). The files will be linked using the Business Number (BN), Statistical Enterprise Number (SNUM) and the legal/operating name.
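The three matching variables suggest a hierarchical approach: try the most reliable identifier first and fall back to weaker ones. The sketch below assumes that order (BN, then SNUM, then a normalized legal/operating name) and uses hypothetical lookup dictionaries. The actual pass order and name-matching rules are not stated in the text, and production name matching would usually be more tolerant than exact agreement after normalization.

```python
import re
import unicodedata

def normalize_name(name: str) -> str:
    """Upper-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name.upper())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^A-Z0-9 ]", " ", no_accents).split())

def link_business(rec: dict, by_bn: dict, by_snum: dict, by_name: dict) -> tuple:
    """Return (pass_name, enterprise) from the first pass that matches, else (None, None).

    by_bn, by_snum and by_name are hypothetical lookups built from the
    Business Register; by_name must be keyed on normalize_name() output.
    """
    passes = (
        ("bn", rec.get("bn"), by_bn),
        ("snum", rec.get("snum"), by_snum),
        ("name", normalize_name(rec.get("name", "")), by_name),
    )
    for pass_name, key, index in passes:
        if key and key in index:
            return pass_name, index[key]
    return None, None
```

Recording which pass produced each link makes it possible to report linkage quality by method.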

Output: Only non-confidential aggregate statistical outputs and analysis that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. These will be in the form of statistical tables at the business sector and business size level for Atlantic Canada. ACOA will publish these results in their annual performance report to Parliament, which will be available on the ACOA website, and in research studies on topics such as entrepreneurial start-ups, employment patterns and growth in Atlantic Canada.

Canadian Income Survey: Linkage to Income Data Files (091-2015)

Purpose: The purpose of this linkage is to obtain income data and to reduce respondent burden, interviewer time and collection costs for the Canadian Income Survey. The linkage makes it possible to obtain information on income variables without burdening respondents with detailed questions about their income.

Description: The Canadian Income Survey database and the T1, T1IDENT, T5007 and CCTB files will be linked using the address, city, date of birth, first name, surname, sex, province, social insurance number, codes for surname, postal code, marital status, telephone number and first initial. This information will be removed from the linked file as soon as the linkage is completed, and stored separately. Access to these files will be restricted to Statistics Canada employees whose assigned work activities require access.
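The list above includes "codes for surname" without naming the coding scheme. Phonetic codes of this kind let spelling variants of a surname agree. The sketch below implements American Soundex purely as a familiar example of the idea; it should not be read as the code actually used for the Canadian Income Survey. Accents are stripped first so that, for example, French surnames with and without accents receive the same code.

```python
import unicodedata

# American Soundex digit groups; vowels and Y have no digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        _CODES[letter] = digit

def soundex(surname: str) -> str:
    """Four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    decomposed = unicodedata.normalize("NFKD", surname.upper())
    letters = [c for c in decomposed if "A" <= c <= "Z"]  # drops accents, spaces
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _CODES.get(c, "")  # vowels map to "" and reset prev
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit
    return (code + "000")[:4]
```

For example, "Robert" and "Rupert" both code to R163, so a transcription difference in the surname does not by itself block a link.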

Output: No information containing personal identifiers would be released outside of Statistics Canada from this linkage activity. Only non-confidential aggregate statistics and analysis conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada.

Longitudinal perspectives on employment, income and health: Linkage of the Longitudinal Worker File, 1991 Census, Canadian Mortality Database and Canadian Cancer Database (092-2015)

Purpose: The objective of this project is to create a new database that will support longitudinal analysis and outcome measures pertaining to employment, income and health. The database will be used to examine various issues pertaining to returns to education and training, the labour market outcomes of immigrants, retirement transitions, and changes in individual- and family-level earnings in the wake of layoffs or a cancer diagnosis. In addition, the file will be used to strengthen inputs into Statistics Canada's Population Health Model (POHEM) cancer module and the Dynamic Socio-Economic Modelling (DYSEM).

Description: This project builds on previous initiatives undertaken by Statistics Canada. First, in 2003, Statistics Canada's Policy Committee approved an initiative that drew a 15% sample of Canadians aged 25 or older from the 1991 Census 2B and 2D Long Forms and linked them to their 1991 and/or 1992 T1 tax returns, and subsequently to the Canadian Mortality Database (record linkage #012-2001). In 2009, this database was extended to cover a longer reference period and expanded to include information from the Canadian Cancer Database as well as postal code information on an annual basis (record linkage #052-2009).

Second, in 1999, Statistics Canada's Policy Committee approved the creation and annual update of the Longitudinal Worker File (LWF) (record linkage #006-99) for data year 1983 onward. An amendment was approved in 2007 (record linkage #007-07) to add additional variables from the T1 personal tax file. In 2015, an improvement to the LWF was approved which expanded the file's sample size from a 10% random sample of Canadian workers to 100% of Canadian workers (record linkage #065-2015). The LWF contains information drawn from the T1, T4, Record of Employment, and the Longitudinal Employment Analysis Program (LEAP) files. The LWF provides longitudinal information on employment and earnings outcomes from 1987 onward.

These two initiatives have yielded large and complementary databases: the first containing rich socio-demographic information (but little information on economic outcomes) and the second containing rich information on economic outcomes (but little socio-demographic information). Because of the large size of both databases, the overlap between them yields a subsample comprising approximately 15% of Canadians who were aged 25 or older in 1991.

Output: Four separate analytical files will be created; they will be linkable through a randomly-generated Statistics Canada respondent number.

Longitudinal Worker File output file: This file contains the demographic and economic variables from the 1987 onward LWF, individual- and family-level variables appended from the T1 Family File, and a randomly-generated Statistics Canada respondent number.

Census of Population output file: This file contains the socio-demographic variables from the 1991 Census of Population 2B and 2D (long forms) available in the original 1991 Census mortality cohort, as well as a randomly-generated Statistics Canada respondent number.

Mortality Output file: This file will contain the randomly-generated Statistics Canada number for each individual in the cohort, and the following mortality information: age, province/country of birth, underlying cause of death, nature of injury, province/country of death, sex, postal code and standard geographic codes of residence (e.g. Census Sub-Division), year, month, and day of death, derived person-years at risk, and mortality linkage weight.

Cancer Output file: This file will contain the randomly-assigned Statistics Canada number for each individual in the cohort, and the following information from the cancer database (CCDB): sex, province and year, month and day of diagnosis, year of birth, age, province or country of birth, diagnostic information (diagnostic codes, morphology and topography, morphology code indicator, source of registration, method of diagnosis, laterality, primary site number), patient vital status, province of residence, postal code of residence at diagnosis, year and province of death (if applicable), postal code of death (if applicable), cause of death (if applicable), and the cancer incidence linkage weight.

Methodological and analytical findings resulting from these linked data will be used to prepare research papers for publication in analytical reports, peer-reviewed scientific journals (including Statistics Canada's Health Reports) and CANSIM, and for presentation at conferences, workshops and meetings.

Only aggregate statistics and analysis conforming to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. The output files will be retained by Statistics Canada until December 31, 2022, at which time the continued retention of the files will be reviewed. All linkage keys and identifiers will be removed from the output files and retained separately, with access limited to Statistics Canada employees whose assigned work requires access to the file.

Linkage of the Census of Population 2006 and the Indian Register (IR) to mortality records for the purpose of estimating mortality rates for First Nations including Registered Indians, Inuit and Métis and examining the effect of social determinants of health on relative risk of death among Aboriginal populations. (093-2015)

Purpose: The record linkage and analysis is a Social and Aboriginal Statistics Division (SASD)-initiated project to generate new estimates of mortality rates for First Nations including Registered Indians, Inuit and Métis. The objective of the initiative is to address important information gaps in mortality rates, life expectancies and the role of social determinants of health in disparities in mortality rates, and also to explore improvements in the methodology for estimating mortality rates. To this end, the mortality datasets (AMDB, CMDB) will be linked to the 2006 Census of Population and the Indian Register. The validity of the linked file for analytical use will also be assessed.

Estimates of mortality rates from different causes are vital for developing policies that may allocate scarce resources to prevention programs; developing prevention programs; informing and guiding future research by government agencies (including Statistics Canada), academic researchers, Aboriginal organizations and other organizations; and informing the public and policy makers of potential disparities in mortality rates between First Nations, Métis and Inuit and non-Aboriginal populations. This information is expected to be useful to government agencies at all levels and to any organization that works on First Nations, Métis and Inuit issues or develops policies or programs.

Finally, a better understanding of the risk of death in the different Aboriginal groups compared with the non-Aboriginal group, after taking socio-economic characteristics into account, will enable policy makers to see how much of the disparities in mortality rates can be attributed to these characteristics.

Description: The Census of Population 2006 and the Indian Register will be linked to the mortality datasets (AMDB, CMDB), 2006 to 2011.

The linked Census/IR/CMDB file will contain only those data items required to conduct the studies. All direct personal identifiers and addresses will be removed from the analysis file. Personal identifiers used for linkage purposes, such as name, death registration number and health insurance number, will be stored in separate files.

Output: The linked Census/IR/CMDB file will remain within Statistics Canada. All access to the linked microdata file will be restricted to Statistics Canada employees whose work activities require access. Only aggregate data that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada. Research papers based on analyses of the linked data will be published by SASD or submitted for publication in the Statistics Canada peer-reviewed quarterly, Health Reports.

The linked analysis file will be retained until December 31, 2025, or until no longer required by Statistics Canada, at which point the continued retention of the file will be reviewed.

Creation of a Derived Record Depository and Key Registry for the Purposes of the Social Data Linkage Environment (094-2015)

Purpose: The Social Data Linkage Environment (SDLE) builds on past record linkage experience to make possible a program of pan-Canadian socio-economic record linkage research. A well-structured and regulated program of record linkage will increase the relevance of existing Statistics Canada surveys; substantially increase the use of administrative data; facilitate the integration of data from various social domains, such as health, education, justice and income thereby increasing the ability to analyse the impact of social determinants from any of these domains to the outcomes in other domains; reduce the burden on survey respondents by re-using already collected data; and maintain the highest data privacy and security standards.

A Derived Record Depository and separate Key Registry will be created to reduce privacy risks and to improve the efficiency and quality of the linkages.

Statistics Canada has responsibility for securely storing and processing data files and for the production of analysis files needed to carry out approved research studies. SDLE research projects will involve the use of linked records, and in accordance with Statistics Canada's Directive on Record Linkage, approval by the Chief Statistician is required for each new linkage project.

Description: The Derived Record Depository (DRD) is created by linking various Statistics Canada data files for the purpose of producing a list of unique individuals. Each individual in the DRD is assigned an anonymous SDLE identifier. The identifier is randomly assigned and has no value outside of the SDLE. Some of the data files used for the DRD include the T1 Personal Master Files (Tax), Canadian Child Tax Benefits (CCTB) files, Canadian Child Tax Benefits – Ident (CCTB-Ident) files, the SIN_ARUID, the DIN_ARUID, the SINSIN, the DINDIN, the DINSIN, the Dependant Registry, the Social Insurance Registry, the Canadian Birth Database (CBDB), the Canadian Mortality Database (CMDB), the Landed Immigrant Database and the Indian Register. Future updates to these files will be used for further updates to the DRD.

The DRD would initially comprise the following personal identifiers: Surnames; Given names; Date of birth; Sex; Marital status; Date of landing/immigration; Date of emigration; Date of death; Social Insurance Numbers (SIN), Temporary Taxation Numbers (TTN), Dependant Identifier Numbers (DIN); Spouse's SIN/TTN; Dependant/Disabled individual SIN/TTN/DIN; Parent SIN/TTN; Health Insurance Numbers; Addresses; Address Registry Unique Identifier (ARUID); Standard Geography Classification (SGC) codes; Telephone numbers; Spouse's surname; Mother's surname; Father's surname; Alternate surname and a Statistics Canada-generated sequential identification number for each individual identified through the annual Derived Record Depository linkage process. Access to the Derived Record Depository will be restricted to the Statistics Canada employees responsible for its development and maintenance.

Linkage of the Derived Record Depository to administrative and survey databases held by Statistics Canada will be performed in a dedicated social data linkage environment (the "SDLE"). To ensure a high level of data security and privacy, the association of Statistics Canada-generated identification numbers from the Derived Record Depository and the administrative and survey database Record Identifiers will be stored in a separate Key Registry, thus avoiding the need to store survey data with personal identifiers. For analytical studies, the associated SDLE Identifiers and the Record Identifiers will be used to link an individual's records within and among the databases in the SDLE environment. All such analytical studies will require prior linkage approval from Statistics Canada's Executive Management Board. Access to the Key Registry will be restricted to the Statistics Canada employees responsible for its development and maintenance and those responsible for the creation of linked analysis data files.
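The separation between the Derived Record Depository, the Key Registry and the analysis files can be illustrated with plain dictionaries. In this hypothetical sketch, a "person key" stands for whatever individual the DRD linkage resolves a record to, the random identifier is drawn with Python's `secrets` module, and the function names are invented for illustration; the access restrictions described above are not modelled. The point it shows is that an analysis file is assembled by joining on the anonymous identifier through the Key Registry, so neither the source data nor the output needs to carry personal identifiers.

```python
import secrets

def assign_sdle_ids(person_keys) -> dict:
    """Give each unique individual a random identifier with no meaning outside the SDLE."""
    ids, used = {}, set()
    for person in person_keys:
        new_id = secrets.token_hex(8)
        while new_id in used:  # guard against the (very unlikely) collision
            new_id = secrets.token_hex(8)
        used.add(new_id)
        ids[person] = new_id
    return ids

def register_keys(key_registry: dict, source: str, record_links: dict, sdle_ids: dict) -> None:
    """Store (source, record_id) -> SDLE ID; record_links maps record_id -> person key."""
    for record_id, person in record_links.items():
        key_registry[(source, record_id)] = sdle_ids[person]

def build_analysis_file(key_registry: dict, source_a: str, rows_a: dict,
                        source_b: str, rows_b: dict) -> list:
    """Join two de-identified files via the Key Registry; output rows carry no identifiers."""
    b_by_id = {key_registry[(source_b, rid)]: row
               for rid, row in rows_b.items() if (source_b, rid) in key_registry}
    joined = []
    for rid, row in rows_a.items():
        sdle_id = key_registry.get((source_a, rid))
        if sdle_id in b_by_id:
            joined.append({**row, **b_by_id[sdle_id]})
    return joined
```

For example, a survey record and a hospital record that resolve to the same person are combined into one analysis row containing only the survey and hospital variables.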

The Key Registry will contain linkage keys to permit linkage for approved studies to data files held at Statistics Canada. Some of these files include but are not limited to:

  • T1 Personal Master File;
  • Canadian Child Tax Benefits;
  • Longitudinal Immigration Database;
  • Birth and death databases;
  • Census of Population (1991 onward);
  • National Household Survey;
  • National Longitudinal Survey of Children and Youth;
  • Longitudinal Survey of Immigrants to Canada;
  • Survey of Labour and Income Dynamics;
  • Youth in Transition Survey;
  • National Population Health Survey;
  • T1 Family File;
  • Clinical administrative databases (inpatient and outpatient hospital records, 1992 onward);
  • Canadian Cancer Registry;
  • Canadian Community Health Survey (all cycles);
  • Canadian Health Measures Survey (all cycles).

Output: No information from the Derived Record Depository will be released outside of Statistics Canada. The Derived Record Depository and Key Registry will be used exclusively to support the development of research files within the SDLE. Statistics Canada will retain the Derived Record Depository and Key Registry files until it is determined that there is no further need for them.

Research projects will be approved on a study-by-study basis. These may be carried out as part of a research agenda initiated by Statistics Canada or in response to client requests. A summary of each approved study will be posted on the Statistics Canada web site.
