Canadian Population Health Survey data (CCHS Annual and Focus Content) integrated with mortality, hospitalization, historical postal codes, cancer, tax data and Census

The purpose of this project was to create an integrated dataset that allows researchers to analyze behavioural, socioeconomic and environmental factors, together with the use of hospital services and health outcomes, at the population level, using Canadian Community Health Survey (CCHS) data from the Annual Component and Focus Content surveys.

The Data

Canadian Community Health Survey (Annual Component and Focus Content Surveys)

The Annual Component of the Canadian Community Health Survey (CCHS) collects cross-sectional information about the health, health behaviours and health care use of the non-institutionalized household population aged 12 or older. The survey excludes full-time members of the Canadian Forces and residents of reserves and some remote areas, together representing about 4% of the target population. CCHS annual cycles 1.1 (2000-2001), 2.1 (2003), 3.1 (2005) and 2007 to 2017 were included in this linkage project.

The CCHS focus content surveys are designed to provide cross-sectional, provincial-level results on specific focused health topics. Five focus content files were included in this linkage project: CCHS - Mental Health and Well-being (2002), CCHS - Mental Health (2012), CCHS - Nutrition (2004 and 2015) and CCHS - Healthy Aging (2008/2009). The CCHS questionnaires are available on the Statistics Canada website or through your RDC analyst. Non-confidential information about the CCHS is available through the Data Liberation Initiative or your local RDC analyst, or can be requested from Centre for Population Health Data client services (statcan.hd-ds.statcan@canada.ca). Once approval to access the linked files has been granted, user guides, questionnaires and other confidential documentation will be accessible to approved researchers in the RDC.

Discharge Abstract Database

(1999/00 to 2017/18)

The Discharge Abstract Database (DAD) includes administrative, clinical and demographic information on hospital discharges (including in-hospital deaths, sign-outs and transfers) from all provinces and territories, except Quebec. Over time, the DAD has also been used to capture data on day surgery, long-term care, rehabilitation and other types of care. For this record linkage, the DAD files covering fiscal years from 1999/00 to 2017/18 were linked to the CCHS Annual and Focus surveys.
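The fiscal-year labels used for the DAD files can be generated and checked with a short helper. This is a minimal sketch: it assumes a fiscal year that runs from April 1 to March 31, and the function names `fiscal_year_label` and `fiscal_years` are illustrative, not part of any DAD documentation.

```python
from datetime import date


def fiscal_year_label(d: date) -> str:
    """Return a label such as '1999/00' for the fiscal year containing d.

    Assumption: the fiscal year starts on April 1 and ends on March 31.
    """
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def fiscal_years(first_start: int, last_start: int) -> list:
    """List fiscal-year labels from first_start to last_start inclusive."""
    return [fiscal_year_label(date(y, 4, 1))
            for y in range(first_start, last_start + 1)]


# DAD coverage in this linkage: 1999/00 to 2017/18.
dad_years = fiscal_years(1999, 2017)
print(len(dad_years), dad_years[0], dad_years[-1])
```

Under that assumption, a discharge dated March 31, 2000 belongs to 1999/00, while one dated April 1, 2000 belongs to 2000/01.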

In the DAD, jurisdiction-specific instructions for collection of data elements evolve over time. Collection of each data element may be mandatory, mandatory if applicable, optional or not applicable. Collection requirements can vary by jurisdiction and by data year.

Researchers will find the listings of DAD data elements on the Canadian Institute for Health Information (CIHI) website under the heading "Data Elements" at the DAD Metadata website (footnote 1). Please note that not all DAD data elements are included in the dataset used for this linkage project. The documents on the website include information on mandatory versus optional collection status for each data element by jurisdiction, which is key to understanding coverage of data elements in the DAD.

National Ambulatory Care Reporting System

(2002/03 to 2017/18)

The National Ambulatory Care Reporting System (NACRS) contains data for hospital-based and community-based ambulatory care including day surgery, outpatient and community-based clinics, and emergency departments. Client visit data is collected at time of service in participating facilities from several jurisdictions. NACRS data for fiscal years 2002/2003 to 2017/2018 were linked to the CCHS Annual and Focus surveys.

Researchers will find the listings of NACRS data elements on the Canadian Institute for Health Information (CIHI) website under the heading "Data Elements" at the NACRS Metadata website (footnote 2). The documents on the website include "a comparative list of NACRS data elements for all data submission options, along with a brief description of each data element." As with the DAD, the status of a data element may vary by service type and/or jurisdiction. Please note that not all NACRS data elements are available in the analytical file for this record linkage project.

Ontario Mental Health Reporting System

(2006/07 to 2017/18)

The Ontario Mental Health Reporting System (OMHRS) contains data for all individuals receiving adult mental health services in Ontario, in addition to some individuals receiving services in youth inpatient beds and selected facilities in other provinces starting in fiscal year 2006/07. Information regarding mental and physical health, social supports and service use, care planning, outcome measurement, quality improvement, and case-mix funding applications are all part of the OMHRS. For this record linkage, the OMHRS files covering the fiscal years from 2006/07 to 2017/18 were linked to the CCHS Annual and Focus surveys.

Researchers will find the listings of OMHRS data elements on the Canadian Institute for Health Information (CIHI) website under the heading "Data elements" at the OMHRS Metadata website (footnote 3).

Canadian Vital Statistics Death Database

(2000 to 2017)

The Canadian Vital Statistics – Death Database (CVSD) is an administrative dataset that includes demographic and cause of death information collected annually from all provincial and territorial vital statistics registries on all deaths in Canada. Death data are received from the province or territory in which the death occurred. Records eligible for record linkage were deaths that occurred from January 1, 2000 to December 31, 2017; these deaths were linked to the CCHS Annual and Focus surveys.

Cause-of-death information is coded using the version of the International Classification of Diseases (ICD) in effect at the time of death. Details about the variables contained on the file can be obtained in the CVSD Data Dictionary, available from your RDC analyst.

Historical Postal Codes (2000-2016)

In Canada, income tax returns are submitted annually to the Canada Revenue Agency (CRA). The T1 Personal Master File (T1PMF), also known as the T1 General and Schedules, is a collection of the income tax returns shared by the CRA with Statistics Canada, and it provides income and demographic (e.g., date of death) information on tax filers in Canada. Every resident of Canada who earns taxable income is required to complete an income tax return, known as a T1 form, at the end of the year in which the income was received. Therefore, the T1PMF includes almost all individuals who filed an individual T1 tax return for the year of reference (i.e., some late filers may not be included) or those who received Canada Child Tax Benefits (CCTB) and their non-filing spouses.

The T1PMF is the principal data source for the Historical Postal Code file. Mailing address postal codes reported on these tax files were extracted to estimate a person's place of residence for that reference year. Note that for some tax filers, the mailing address used for filing T1 tax records may not be associated with their place of residence (e.g., P.O. boxes, accountants' or lawyers' offices, parents' addresses for young adults, children's addresses for elderly parents). See Bérard-Chagnon (footnote 4) for more information.

The postal code history of each person who responded to the CCHS Annual and Focus surveys was included in this linkage from 2000 to 2016 for each year that a postal code was available on the CRA files. Due to the nature of the data source, postal codes are not available for all CCHS respondents (e.g., non-tax filers do not have a postal code history) and, due to the tax filing habits of Canadians, postal codes may not be available for all years for all respondents.
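Because coverage can vary by respondent and by year, it may help to list the years with no postal code before using the history as a residence measure. The sketch below assumes a hypothetical layout in which one respondent's history is a mapping from tax year to postal code; the actual file layout and variable names may differ, and the postal codes shown are placeholders.

```python
def missing_years(history, first_year=2000, last_year=2016):
    """Return the years in [first_year, last_year] with no postal code.

    `history` is a hypothetical dict mapping tax year -> postal code.
    A year counts as missing if it is absent or its value is empty.
    """
    return [year for year in range(first_year, last_year + 1)
            if not history.get(year)]


# Placeholder history for one respondent (values are made up).
history = {2000: "A1A1A1", 2001: "A1A1A1", 2003: "B2B2B2", 2004: ""}

gaps = missing_years(history, 2000, 2004)
print(gaps)
```

How gaps are handled (for example, leaving them missing rather than carrying a previous postal code forward) is an analytical decision that the documentation does not prescribe.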

Canadian Cancer Registry

The Canadian Cancer Registry (CCR) is a national, dynamic, population-based registry that includes data collected and reported to Statistics Canada by the 13 Canadian Provincial and Territorial Cancer Registries (PTCRs). The person-oriented information provided by the PTCRs is compiled by Statistics Canada into a national dataset known as the CCR incidence tabulation master file (TMF), which includes information about each new primary cancer diagnosed among Canadian residents since 1992. However, cancer incidence data from Quebec are not part of the CCR after diagnosis year 2010 because the data have not been submitted to the CCR.

After the CCR incidence TMF has been validated, Statistics Canada applies the International Agency for Research on Cancer (IARC) rules6 for determining multiple primary tumours to facilitate comparability between provinces and territories. The resulting dataset is referred to as the IARC incidence TMF and it contains lower total case counts than the CCR incidence TMF. More information on the TMFs for cancer incidence is found in the CCR user guide—Diagnosis year 2016.

For this linkage project, the CCHS annual component and focus content files were linked to the CCR incidence TMF (1992 to 2016).

Census of Population

The Census of Population provides a detailed statistical portrait of the country every five years. All households in Canada are required to fill out the Census short-form questionnaire, which includes basic information about the people who usually live at that address, such as their name, date of birth, sex, marital status and languages spoken, as well as the relationship between all the individuals living in the home. For more information on the variables included in the short-form Census, analysts should consult the Census user documentation available from the RDC analyst. The short-form information from the 2011 and 2016 Censuses was included in this linkage to identify those who moved from living in the community to living in a nursing home or seniors' residence.

T1 Family File

The Annual Income Estimates for Census Families and Individuals (T1 Family File or T1FF) is a file that was created to allow the development and dissemination of annual small area socio-economic data for Canadians and their families. It is derived primarily from income tax returns, which are provided to Statistics Canada by the CRA. The T1FF contains information on sources of income (from T1 tax filers or those receiving the CCB) and some demographic indicators which are derived from both the tax filers and non-tax filers (e.g., non-tax filing spouse, children, etc.). Personal identifiers from the T1PMF are used to link to the T1FF to create the analytical files specific to each project. For this project, T1FF information from 1993 to 2017 was linked to the CCHS annual component and focus content files.

File structure, layout

Cohort

All variables from the CCHS Annual and Focus Content survey share files are available for analysis. Please see the appropriate documents and data dictionaries.

Analytical files

The DAD, NACRS and OMHRS are event-based files, meaning that there will be more than one record for a person who made contact with the health care system more than once during the period of interest. During the linkage process, each CCHS annual and focus content record was assigned a unique STC_ID that allows the researcher to identify individuals on the DAD, NACRS and OMHRS with multiple interactions in the same dataset or across datasets, and within a fiscal year or across fiscal years. The DAD contains 19 files, the NACRS 16 files and the OMHRS one file. To use the data as a person-based file, the researcher must transform the data so that all hospital information for one person appears as one record (one row on the data file).
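The event-to-person transformation described above can be sketched as a group-by on STC_ID. This is only an illustration under stated assumptions: STC_ID is the one field name taken from the documentation, while `source`, `fiscal_year`, the summary columns and the sample records are hypothetical. A real analysis would choose its own summary variables from the DAD, NACRS and OMHRS data elements.

```python
from collections import defaultdict

# Illustrative event records stacked from several event-based files.
# Only STC_ID is a documented field; the rest are assumed for the example.
events = [
    {"STC_ID": "A001", "source": "DAD", "fiscal_year": "2005/06"},
    {"STC_ID": "A001", "source": "NACRS", "fiscal_year": "2005/06"},
    {"STC_ID": "A001", "source": "DAD", "fiscal_year": "2010/11"},
    {"STC_ID": "B002", "source": "NACRS", "fiscal_year": "2012/13"},
]


def to_person_level(event_rows):
    """Collapse event-level rows into one summary row per STC_ID."""
    grouped = defaultdict(list)
    for row in event_rows:
        grouped[row["STC_ID"]].append(row)

    persons = []
    for stc_id in sorted(grouped):
        rows = grouped[stc_id]
        # Labels like '2005/06' sort chronologically as plain strings.
        years = sorted({r["fiscal_year"] for r in rows})
        persons.append({
            "STC_ID": stc_id,
            "n_events": len(rows),
            "n_dad": sum(r["source"] == "DAD" for r in rows),
            "n_nacrs": sum(r["source"] == "NACRS" for r in rows),
            "n_omhrs": sum(r["source"] == "OMHRS" for r in rows),
            "first_fiscal_year": years[0],
            "last_fiscal_year": years[-1],
        })
    return persons


person_rows = to_person_level(events)
for person in person_rows:
    print(person)
```

Respondents with no linked events do not appear in the output; to keep them, the person-level rows would need to be merged back onto the full cohort file by STC_ID.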

For the CVSD, one analytical file is available that includes all deaths from January 1, 2000 to December 31, 2017 for respondents of all CCHS annual and focus content cycles.

One file includes the historical postal codes for all CCHS annual and focus content data years included in the linkage project.

For the CCR incidence TMF, two analytical files corresponding to the respondents from the annual component files and focus content files are produced. The analytical files include linked primary tumour records, subject to a mixture of the IARC and CCR rules, from January 1, 1992 to December 31, 2016.

For the IARC incidence TMF, two analytical files that correspond to the respondents from the annual component files and focus content files are produced. These files include linked primary tumour records, as determined by the IARC rules, diagnosed between January 1, 1992 and December 31, 2016.

For the 2011 Census of Population (short form), two analytical files that include all census records linked to the CCHS respondents from the annual component files and focus content files are produced. Similarly, two analytical files are produced for the CCHS respondents linked to the 2016 Census of Population (short form).

Researchers should refer to the 2011 Census code book and 2016 Census Dictionary for definitions for all the concepts, variables, and geographic terms of the 2011 Census and 2016 Census, respectively. Please be aware that there could be some thematic and conceptual changes from 2011 to 2016 or from short-form to long-form (e.g., the 2011 Census vs. the 2011 National Household Survey). Any queries about the Census variables should be submitted to the RDC analyst.

For the T1FF, analytical files were created for each dataset by taxation year. There are 25 taxation years, from 1993 to 2017. The file for each taxation year is divided into two portions: T1FF records that belong to the respondents from (i) the annual component files, and (ii) the focus content files. This yields 50 T1FF analytical files.
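The file count follows from crossing the taxation years with the two respondent groups. A minimal sketch of that arithmetic is shown below; the portion labels `annual` and `focus` are illustrative placeholders, not actual file names.

```python
# Taxation years 1993 to 2017 inclusive.
tax_years = list(range(1993, 2018))

# Hypothetical labels for the two respondent groups.
portions = ("annual", "focus")

# One analytical file per (taxation year, portion) pair.
t1ff_files = [(year, portion) for year in tax_years for portion in portions]

print(len(tax_years), len(t1ff_files))
```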

Weights

New linkage weights are available as of July 2019.

Footnotes

  1. DAD Metadata website.
  2. NACRS Metadata website.
  3. OMHRS Metadata website.
  4. Bérard-Chagnon J (2017). Comparison of Place of Residence between the T1 Family File and the Census: Evaluation using record linkage. Demographic Documents. September 26, 2017 Catalogue no. 91F0015M – No. 13, Statistics Canada.