Analytical Studies: Methods and References
Imputing Postal Codes to Analyze Ecological Variables in Longitudinal Cohorts: Exposure to Particulate Matter in the Canadian Census Health and Environment Cohort Database


by Philippe Finès, Lauren Pinault and Michael Tjepkema
Health Analysis Division

Release date: March 13, 2017


Abstract

This paper describes a method of imputing missing postal codes in a longitudinal database. The 1991 Canadian Census Health and Environment Cohort (CanCHEC), which contains information on individuals from the 1991 Census long-form questionnaire linked with T1 tax return files for the 1984-to-2011 period, is used to illustrate and validate the method. The cohort contains up to 28 consecutive fields for postal code of residence, but because of frequent gaps in postal code history, missing postal codes must be imputed. To validate the imputation method, two experiments were devised in which 5% and 10% of all postal codes from a subset with full history were randomly removed and imputed. The proportions of discrepancies in displacement and mean exposure were consistently higher in the experiment in which 10% of postal codes were removed.

Keywords: CanCHEC; census follow-up cohort; data linkage; environmental exposure; PM2.5; geographic information systems; imputation; longitudinal studies; pollution; postal codes; residential mobility

1. Introduction

The 1991 Canadian Census Health and Environment Cohort (CanCHEC), which contains information on more than 2.5 million respondents to the 1991 Census long-form questionnaire, was linked with T1 tax return data for the 1984-to-2011 period. Thus, the linked file contains up to 28 consecutive fields for postal code of residence. However, for many respondents, the postal code history is incomplete: individuals may not have filed a tax form, or they may have left the country. In fact, missing residential information is common in longitudinal databases.

Full postal code histories are important for environmental health research. Missing postal codes must be imputed in order to assign historical exposures to environmental hazards or to the ecological variables under study. Imputation must be done so that values of ecological variables assigned in years with missing postal codes likely represent real exposures. The method must be plausible, simple and parsimonious, and should yield reliable results.

This paper describes a method of imputing postal codes and assesses its validity. The method is demonstrated using a specific database, CanCHEC, and for a specific ecological variable, exposure to fine particulate matter 2.5 micrometres or less in diameter (PM2.5). The concept of a gap is used throughout the paper. A gap is defined as a string of consecutive years with missing postal codes.
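The definition of a gap can be made concrete with a short sketch. It assumes, purely for illustration, that a postal code history is stored as a list with one entry per year and `None` for a missing year; the function name `find_gaps` is hypothetical and is not taken from the paper.

```python
from typing import List, Optional, Tuple


def find_gaps(history: List[Optional[str]]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of consecutive missing years."""
    gaps = []
    start = None  # index where the current gap began, if we are inside one
    for i, code in enumerate(history):
        if code is None:
            if start is None:
                start = i
        elif start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:  # history ends inside a gap
        gaps.append((start, len(history) - start))
    return gaps


# Hypothetical six-year history: a two-year gap, then a one-year gap at the end.
history = ["K1A0T6", None, None, "K1A0T6", "K1A0T7", None]
print(find_gaps(history))
```

The gap at the end of the history has no following postal code, which is the situation the paper later treats separately from gaps that are surrounded on both sides.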

2. Data

The Historical Tax Summary File

The Historical Tax Summary File (HTSF) is an annual compilation of tax return data representing individuals for whom a tax declaration was filed in a given year. For the 1984-to-2011 period, the HTSF provides a history of individuals’ residential locations, with 28 consecutive fields for postal code (Wilkins et al. 2008; Peters et al. 2013). The HTSF was linked to the 1991 Canadian Census Cohort and the Canadian Mortality Database (CMDB) using social insurance number, thereby creating a new database—CanCHEC—that provides cohort members’ postal code histories. These postal codes are used in environmental health research to assign exposure data to cohort members over time.

Postal codes

A postal code is a six-character alphanumeric identifier devised and maintained by the Canada Post Corporation for sorting and delivering mail. The characters are arranged in the form “ANA NAN,” where “A” represents a letter and “N” a single-digit number (for example, K1A 0T6). The first three characters refer to a set of stable, well-defined areas, known as forward sortation areas (FSAs); the last three characters identify routes known as local delivery units (Statistics Canada 2014).

The first character represents a large region or province/territory. Postal codes with zero (0) as the second character are rural; by default, those that do not have a 0 in the second position may be suburban or urban (hereafter urban).

Postal codes have a hierarchical structure. Areas with the same first two characters are in the major region designated by the first character. This hierarchy continues with each additional character (Statistics Canada 2014). Postal codes do not include the letters D, F, I, O, Q or U, and the first position does not use the letters W or Z (Table 1).

Table 1
Regions determined by the first character of a postal code
First character | Region
A | Newfoundland and Labrador
B | Nova Scotia
C | Prince Edward Island
E | New Brunswick
G | Eastern Quebec
H | Metropolitan Montréal
J | Western Quebec
K | Eastern Ontario
L | Central Ontario
M | Metropolitan Toronto
N | Southwestern Ontario
P | Northern Ontario
R | Manitoba
S | Saskatchewan
T | Alberta
V | British Columbia
X | Northwest Territories and Nunavut
Y | Yukon Territory
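The structure described above (the ANA NAN pattern, the excluded letters, the rural indicator in the second position and the regions of Table 1) can be sketched in a few lines. The function name `parse_postal_code` and the returned dictionary layout are illustrative assumptions; the check is purely syntactic and does not confirm that a code is actually in use.

```python
import re
from typing import Dict, Optional, Union

# D, F, I, O, Q and U are never used; W and Z are not used in the first position.
_PATTERN = re.compile(
    r"^[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ] ?"
    r"\d[ABCEGHJKLMNPRSTVWXYZ]\d$"
)

# Regions from Table 1, keyed by the first character.
REGIONS = {
    "A": "Newfoundland and Labrador", "B": "Nova Scotia",
    "C": "Prince Edward Island", "E": "New Brunswick",
    "G": "Eastern Quebec", "H": "Metropolitan Montréal",
    "J": "Western Quebec", "K": "Eastern Ontario",
    "L": "Central Ontario", "M": "Metropolitan Toronto",
    "N": "Southwestern Ontario", "P": "Northern Ontario",
    "R": "Manitoba", "S": "Saskatchewan", "T": "Alberta",
    "V": "British Columbia", "X": "Northwest Territories and Nunavut",
    "Y": "Yukon Territory",
}


def parse_postal_code(raw: str) -> Optional[Dict[str, Union[str, bool]]]:
    """Split a postal code into its parts, or return None if it is malformed."""
    code = raw.strip().upper()
    if not _PATTERN.match(code):
        return None
    code = code.replace(" ", "")
    return {
        "fsa": code[:3],            # forward sortation area
        "ldu": code[3:],            # local delivery unit
        "region": REGIONS[code[0]],
        "rural": code[1] == "0",    # 0 in second position means rural
    }


print(parse_postal_code("K1A 0T6"))
print(parse_postal_code("A0A 1A0"))
```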

3. Methods

Preliminary steps on postal codes

The CanCHEC is used to demonstrate the method of imputing postal codes, but the method can be applied to other longitudinal cohorts.

The CanCHEC database contains 2,734,835 observations, 2,644,370 of which (those with a valid history) were used for analyses. With the Postal Code Conversion File Plus (PCCF+) program, probable non-residential (business) postal codes (Statistics Canada 2014) were flagged and removed from the database because they were unlikely to relate to a private residence. Among the remaining observations, 1,238,825 (47%; see Note 1) had a full postal code history, that is, a postal code in each year of follow-up. The others were missing at least one postal code between the year of entry (1984) and the year of exit (2011 or year of death).

Corrections and adjustments were then applied to the database.

  1. Censoring: A cohort member not identified as dead was considered to be alive up to the final year of follow-up (2011). In such cases, if the last year with a postal code was before the final year of follow-up:
    • for (up to) the first two years after the last year with a postal code, the last postal code was assigned;
    • for the years thereafter until the final year of follow-up, a postal code was assigned based on imputation Case 2b (see below).
  2. Postal code at death: For cohort members identified as dead (from the CMDB or HTSF), the following rules were applied:
    • if the CMDB contained a postal code, that postal code was assigned in that year and used as a surrounding postal code (“following”) for a gap, and imputation was based on Case 1 (see below);
    • if the CMDB did not contain a postal code, the postal code at death was missing. Imputation Case 2b (see below) was used for the last gap.
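The first censoring rule can be sketched as follows. The sketch assumes a history stored as a list with `None` for missing years and applies only to cohort members not identified as dead; the function name `carry_forward_last_code` is hypothetical. Years beyond the two-year window are deliberately left missing, since the paper hands them to imputation Case 2b.

```python
from typing import List, Optional


def carry_forward_last_code(
    history: List[Optional[str]], max_years: int = 2
) -> List[Optional[str]]:
    """Copy the last observed postal code into (up to) the next max_years years."""
    filled = list(history)  # work on a copy; the input is not modified
    observed = [i for i, code in enumerate(filled) if code is not None]
    if not observed:
        return filled  # nothing to carry forward
    last = observed[-1]
    stop = min(last + 1 + max_years, len(filled))
    for i in range(last + 1, stop):
        filled[i] = filled[last]
    return filled


# Last postal code observed in year index 1; follow-up continues four more years.
print(carry_forward_last_code(["K1A0T6", "K1A0T7", None, None, None, None]))
```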

Imputing Historical Tax Summary File postal codes

The objective is to use imputation to fill the gaps in postal code histories in the HTSF. All postal codes in a gap are imputed during the same step. Consequently, the imputed values of the postal codes in the same gap are not necessarily related, and newly imputed postal codes are not used for imputation.

Because a large majority of cohort members do not move in a given year, missing postal codes can be largely based on the postal codes reported in the surrounding years. For example, if a postal code for a given year is missing, but the same postal code is recorded in the years before and after, it can be inferred that the address during the gap was the same as that of the surrounding years. However, there is always a non-zero probability that the postal code for the gap should not be imputed based on the surrounding postal codes. For example, in a given year, a person may have temporarily lived away from their usual place of residence.

As well, the probability that missing postal codes can be imputed based on surrounding postal codes diminishes as the gap lengthens. The imputation method takes account of a probability threshold (p) that varies by gap length such that p is not 100%, and that p does not increase if gap length increases. Suggested values for this threshold are shown in Table 2-1.

Table 2-1
Suggested values of p threshold according to gap length
Gap length in years | p threshold (value)
1 or 2 | 0.95
3 or 4 | 0.80
5 or more | 0.60
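The suggested thresholds amount to a non-increasing step function of gap length. A minimal sketch, with the hypothetical function name `p_threshold`:

```python
def p_threshold(gap_length: int) -> float:
    """Suggested probability threshold p for a gap of the given length (Table 2-1)."""
    if gap_length < 1:
        raise ValueError("gap length must be at least 1 year")
    if gap_length <= 2:
        return 0.95
    if gap_length <= 4:
        return 0.80
    return 0.60


for years in (1, 3, 5, 12):
    print(years, p_threshold(years))
```

Analysts who prefer other values only need to change the three constants, provided p stays below 1 and does not increase with gap length.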

Taking advantage of the hierarchical coding structure of postal codes, similarity (k) and dissimilarity (d) values of the postal codes surrounding a gap are defined as in Table 2-2. The similarity value (k) is the number of consecutive identical characters (beginning from the left) in the surrounding postal codes. The dissimilarity value (d) is equal to 6 minus k.

Table 2-2
Similarity and dissimilarity of postal codes surrounding a gap, using examples
Postal code before gap | Postal code after gap | Common characters before and after the gap | Similarity (k) (Table 2-2 Note 1) | Dissimilarity (d) (Table 2-2 Note 2)
K1A 1A1 | K1A 1A1 | K1A 1A1 | 6 | 0
K1A 1A1 | K1A 1A2 | K1A 1A | 5 | 1
K1A 1A1 | K1A 1B1 | K1A 1 | 4 | 2
K1A 1A1 | K1A 2A1 | K1A | 3 | 3
K1A 1A1 | K1B 1A1 | K1 | 2 | 4
K1A 1A1 | K2A 1A1 | K | 1 | 5
K1A 1A1 | L1A 1A1 | (none) | 0 | 6
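The similarity and dissimilarity values follow directly from a character-by-character comparison starting at the left. A minimal sketch, with hypothetical function names, that reproduces the rows of Table 2-2:

```python
def similarity(before: str, after: str) -> int:
    """Number of identical leading characters (k) in two postal codes."""
    a = before.replace(" ", "").upper()
    b = after.replace(" ", "").upper()
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break  # stop at the first mismatch; later matches do not count
        k += 1
    return k


def dissimilarity(before: str, after: str) -> int:
    """Dissimilarity d = 6 - k."""
    return 6 - similarity(before, after)


for after in ("K1A 1A1", "K1A 1A2", "K1A 1B1", "K1A 2A1",
              "K1B 1A1", "K2A 1A1", "L1A 1A1"):
    print(after, similarity("K1A 1A1", after), dissimilarity("K1A 1A1", after))
```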

Definition of rules and cases

Two rules are defined using threshold (p), similarity (k) and dissimilarity (d):

Case 1: For gaps with surrounding postal codes:

Case 2: For gaps missing at least one surrounding postal code, Rule B is always applied.

The rules and cases are illustrated with examples in the Appendix. The imputation method thus replaces a missing postal code with (1) a complete postal code; (2) a value that contains characters from a postal code followed by a certain number of “*”; or (3) a value that starts with “DUMMY” followed by a number from 0 through 9.

Validation

The purpose of validating the imputation method is to determine if exposure data calculated using imputed postal codes are similar to those calculated using the original, complete postal code data. As an example, estimates of PM2.5 are calculated.

Cohort member residences were spatially linked to estimates of a surface layer of PM2.5 concentration for all mainland North America, derived from a model that provides average PM2.5 concentrations at an approximately 1 km2 resolution from 2004 to 2011 (van Donkelaar et al. 2015; Pinault et al. 2016). Estimates were backcast for 1998 to 2003 using the inter-annual variation in Boys et al. (2014). Outliers with PM2.5 values greater than 20 micrograms per cubic metre (µg/m3) were excluded from analysis (fewer than 1% of cohort members in any year) (Pinault et al. 2016). Air pollution data for exposure were not available before 1998 (fifteenth year of follow-up). When multiple observations of the same postal code were available in the exposure database, one was randomly selected.

Since the exposure file used in this analysis contained exposure data for all postal codes with at least the first 3 characters present (for example, postal codes such as A0A, A0A1, A0A1A, A0A1A0), it follows that (1) all postal codes built with Rule A and containing at least 4 “*” and (2) all postal codes built with Rule B are defined as uninformative postal codes, that is, imputed postal codes for which no exposure data are available. For all uninformative postal codes, exposure is set to missing.

By contrast, imputed postal codes ending with d “*”, where d < 4, are partially informative. The environmental exposure for these postal codes is assigned as the average for postal codes starting with the same k similarity characters.
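The distinction between uninformative and partially informative imputed codes can be sketched as a lookup. The exposure table here is a hypothetical dictionary keyed by full six-character postal codes with invented values, and `assign_exposure` is an illustrative name; the actual exposure file and its averaging procedure may differ.

```python
from statistics import mean
from typing import Dict, Optional


def assign_exposure(imputed: str, exposure: Dict[str, float]) -> Optional[float]:
    """Return an exposure value for an imputed code, or None if uninformative."""
    if imputed.startswith("DUMMY"):
        return None  # built with Rule B: always uninformative
    prefix = imputed.rstrip("*")
    stars = len(imputed) - len(prefix)
    if stars >= 4:
        return None  # fewer than 3 known characters: uninformative
    if stars == 0:
        return exposure.get(imputed)  # complete postal code
    # Partially informative: average over codes sharing the known prefix.
    matches = [v for code, v in exposure.items()
               if len(code) == 6 and code.startswith(prefix)]
    return mean(matches) if matches else None


# Invented exposure values, for illustration only.
exposure = {"K1A1A1": 6.0, "K1A1B2": 8.0, "K1B0A1": 10.0}
for code in ("K1A1A1", "K1A1**", "K1A***", "K1****", "DUMMY6"):
    print(code, assign_exposure(code, exposure))
```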

Consider a missing postal code that belongs to a gap for which the surrounding postal codes are present: Case 1 is applied. If the surrounding postal codes have no common characters, either Rule A is applied: the resulting postal code will be “******”; or Rule B is applied: the resulting postal code will be “DUMMY6.” Therefore, even though the two resulting postal codes differ (because they are built with different rules), they each represent an uninformative postal code. In other words, all postal codes in gaps surrounded by postal codes with a similarity equal to 0 are always uninformative.
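A rough sketch of how a single missing postal code might be imputed in Case 1 is given here. It rests on assumptions drawn from the examples in the text rather than on the formal statement of the rules: Rule A is taken to be chosen with probability p and to keep the k common leading characters followed by d asterisks, and Rule B is taken to yield “DUMMY” followed by d, which is consistent with the “DUMMY6” example for k = 0. The function name `impute_case1` is hypothetical.

```python
import random


def impute_case1(before: str, after: str, p: float, rng: random.Random) -> str:
    """Impute one missing postal code from the two codes surrounding its gap."""
    a = before.replace(" ", "").upper()
    b = after.replace(" ", "").upper()
    if len(a) != 6 or len(b) != 6:
        raise ValueError("surrounding postal codes must have six characters")
    k = 0
    while k < 6 and a[k] == b[k]:
        k += 1                      # similarity: identical leading characters
    d = 6 - k                       # dissimilarity
    if rng.random() < p:
        return a[:k] + "*" * d      # assumed Rule A: common prefix, then d stars
    return f"DUMMY{d}"              # assumed Rule B: digit taken to be d


rng = random.Random(2017)  # fixed seed so the run is repeatable
# One draw per missing year, so codes in the same gap need not agree.
print([impute_case1("K1A 1A1", "K1A 1B1", 0.95, rng) for _ in range(2)])
```

Calling the function once per missing year reflects the statement that imputed values within the same gap are not necessarily related.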

Validation was performed on the 1,238,825 observations with complete postal code histories. A percentage of these postal codes was randomly erased, and the imputation method was applied to the missing postal codes. The percentages erased were 5% in the first experiment (Experiment A) and 10% in the second (Experiment B). These percentages approximate the actual percentage of missing postal codes in the original database (8.1%). Imputation used the threshold values in Table 2-1. The two new files with imputed postal code histories were compared with the original dataset with complete postal code histories. The percentages of postal codes imputed according to each rule and the percentages of imputed postal codes matching the originals were calculated.
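The erase-and-impute experiment can be sketched on toy data. The sketch assumes that an exact fraction of all postal code positions is erased uniformly at random, which is one reading of the description; the function names and the tiny histories are illustrative only.

```python
import random
from typing import List, Optional, Set, Tuple

History = List[Optional[str]]


def erase_at_random(
    histories: List[History], fraction: float, seed: int = 0
) -> Tuple[List[History], Set[Tuple[int, int]]]:
    """Blank out a fraction of all (person, year) postal codes, chosen at random."""
    rng = random.Random(seed)
    positions = [(i, j) for i, h in enumerate(histories) for j in range(len(h))]
    n_erase = round(fraction * len(positions))
    erased = set(rng.sample(positions, n_erase))
    masked = [
        [None if (i, j) in erased else code for j, code in enumerate(h)]
        for i, h in enumerate(histories)
    ]
    return masked, erased


def perfect_match_rate(
    original: List[History], imputed: List[History], erased: Set[Tuple[int, int]]
) -> float:
    """Percentage of erased positions where the imputed code equals the original."""
    if not erased:
        raise ValueError("no erased positions to evaluate")
    hits = sum(original[i][j] == imputed[i][j] for i, j in erased)
    return 100.0 * hits / len(erased)


# Ten hypothetical people with ten years of complete history each.
complete = [["K1A0T6"] * 10 for _ in range(10)]
masked, erased = erase_at_random(complete, 0.10)
print(len(erased), "postal codes erased")
```

After the masked histories are imputed, `perfect_match_rate(complete, imputed, erased)` would give the kind of perfect-match percentage reported in the results.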

Results for individuals were also examined. The discrepancy per individual between the original and new datasets was used to validate the imputation method. The measures were:

The relevant statistics were the percentages of observations for which the absolute value of the difference between each of the two new files and the original file reached or exceeded a threshold for large values. For mean number of moves, mean displacement and mean exposure, the threshold was set at 0.1; for number of displacements, it was set at 2. These thresholds roughly corresponded to the upper 5% tail of the distributions of the variables.
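The summary statistic described above is the share of observations whose absolute difference reaches a threshold. A minimal sketch with invented per-person values and the hypothetical function name `pct_large_discrepancy`:

```python
from typing import Sequence


def pct_large_discrepancy(
    original: Sequence[float], experimental: Sequence[float], threshold: float
) -> float:
    """Percentage of observations with |experimental - original| >= threshold."""
    if len(original) != len(experimental) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    flagged = sum(abs(e - o) >= threshold for o, e in zip(original, experimental))
    return 100.0 * flagged / len(original)


# Invented mean exposures for four people, before and after imputation.
original = [5.0, 6.0, 7.0, 8.0]
experimental = [5.05, 6.2, 7.0, 7.5]
print(pct_large_discrepancy(original, experimental, threshold=0.1))
```

The same function applies to the number of displacements by passing a threshold of 2 instead of 0.1.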

Analyses of the postal codes were performed globally, then according to the first character of the postal code (which defines large regions), then according to the rural/urban designation of the second character of the postal code. Analyses of individuals were performed globally, then according to the first character of the first postal code in the history, then according to the rural/urban designation of the second character of that code.

4. Results

Global results

In the whole CanCHEC database of 2,644,370 observations,

Table 3
Distribution of length of gaps in CanCHEC after removing non-residential postal codes
Length of gap in years | Number | Percent
1 | 972,655 | 40.0
2 | 360,670 | 14.8
3 | 204,310 | 8.4
4 | 145,625 | 6.0
5 | 125,000 | 5.1
6 | 111,065 | 4.6
7 | 74,300 | 3.1
8 | 44,515 | 1.8
9 | 42,860 | 1.8
10 | 39,975 | 1.6
11 | 34,040 | 1.4
12 | 35,225 | 1.4
13 | 33,810 | 1.4
14 | 32,710 | 1.3
15 | 32,600 | 1.3
16 | 31,335 | 1.3
17 | 29,220 | 1.2
18 | 27,670 | 1.1
19 | 26,265 | 1.1
20 | 19,780 | 0.8
21 | 7,815 | 0.3
22 | 220 | 0.0
23 | 115 | 0.0
24 | 75 | 0.0
25 | 80 | 0.0
26 | 50 | 0.0
27 | 25 | 0.0
Total | 2,431,995 | 100.0

Analyses of postal codes

A total of 1,735,620 gaps were created in Experiment A, and 3,468,405 in Experiment B (Table 4-1). In Experiment A, 91% of missing postal codes belonged to gaps that were one year long and 9% to gaps that were two years long; in Experiment B, the figures were 82% and 16%, respectively. Rules A and B were applied in proportions corresponding to the parameters in Table 2-1. The overall percentage of perfectly matched postal codes (i.e., situations where the imputed and original postal codes are the same) was 76%; percentages were higher for shorter (one or two years) than for longer (five or more years) gaps. Results by region and geography showed the same patterns (Tables 4-2 and 4-3).

Table 4-1
Performance of imputation for Experiments A and B
Experiment and length of gap in years | Number of postal codes erased and imputed | Percentage of postal codes erased and imputed (percent) | Rule A applied (percent) | Rule B applied (percent) | Perfect matches (percent)
Experiment A
1 | 1,572,760 | 90.6 | 91.6 | 8.4 | 77.0
2 | 151,305 | 8.7 | 91.5 | 8.5 | 72.1
3 | 10,815 | 0.6 | 75.9 | 24.1 | 56.5
4 | 720 | 0.0 | 76.7 | 23.3 | 51.7
5 | 20 | 0.0 | 60.0 | 40.0 | 35.0
Total – Experiment A | 1,735,620 | 100.0 | 91.5 | 8.5 | 76.4
Experiment B
1 | 2,830,755 | 81.6 | 91.4 | 8.6 | 76.8
2 | 547,870 | 15.8 | 91.4 | 8.6 | 72.0
3 | 78,510 | 2.3 | 76.1 | 23.9 | 56.8
4 | 10,050 | 0.3 | 76.2 | 23.8 | 53.5
5 | 1,090 | 0.0 | 58.2 | 41.8 | 38.0
6 | 115 | 0.0 | 59.6 | 40.3 | 29.8
7 | 15 | 0.0 | 64.3 | 35.7 | 35.7
Total – Experiment B | 3,468,405 | 100.0 | 91.0 | 9.0 | 75.5
Table 4-2
Percentage of postal codes imputed with Rule A and percentage of matching postal codes — Experiment A
All values are percentages. Cells shown as (note) refer to a table note.
Region or geography | Rule A, 1-year gap | Rule A, 2-year gap | Rule A, 3-year gap | Rule A, 4-year gap | Rule A, 5-year gap | Rule A, 6-year gap | Rule A, 7-year gap | Perfect match, 1-year gap | Perfect match, 2-year gap | Perfect match, 3-year gap | Perfect match, 4-year gap | Perfect match, 5-year gap | Perfect match, 6-year gap | Perfect match, 7-year gap
Region (Table 4-2 Note 1)
Newfoundland and Labrador | 91.8 | 91.1 | 73.7 | 85.0 | (note) | (note) | (note) | 81.3 | 78.7 | 65.7 | 70.0 | (note) | (note) | (note)
Nova Scotia | 91.6 | 91.4 | 75.0 | 79.2 | (note) | (note) | (note) | 78.7 | 75.7 | 60.7 | 62.5 | (note) | (note) | (note)
Prince Edward Island | 91.8 | 92.1 | 72.7 | (note) | (note) | (note) | (note) | 80.7 | 78.1 | 59.1 | (note) | (note) | (note) | (note)
New Brunswick | 91.7 | 92.5 | 73.4 | 75.0 | 60.0 | (note) | (note) | 76.0 | 71.1 | 50.4 | 33.3 | 60.0 | (note) | (note)
Eastern Quebec | 91.5 | 91.5 | 77.0 | 70.3 | (note) | (note) | (note) | 78.2 | 73.3 | 58.5 | 54.7 | (note) | (note) | (note)
Metropolitan Montréal | 91.0 | 90.8 | 75.8 | 72.4 | (note) | (note) | (note) | 75.3 | 68.7 | 54.2 | 42.1 | (note) | (note) | (note)
Western Quebec | 91.9 | 92.0 | 76.8 | 79.1 | (note) | (note) | (note) | 76.8 | 72.0 | 53.7 | 47.3 | (note) | (note) | (note)
Eastern Ontario | 91.7 | 91.6 | 76.0 | 77.3 | (note) | (note) | (note) | 77.6 | 73.1 | 58.4 | 49.3 | (note) | (note) | (note)
Central Ontario | 92.1 | 92.2 | 76.1 | 80.8 | (note) | (note) | (note) | 77.5 | 72.6 | 57.0 | 51.3 | (note) | (note) | (note)
Metropolitan Toronto | 90.4 | 90.0 | 74.0 | 57.1 | (note) | (note) | (note) | 76.7 | 72.4 | 55.7 | 35.7 | (note) | (note) | (note)
Southwestern Ontario | 91.7 | 91.6 | 77.0 | 73.5 | 80.0 | (note) | (note) | 78.4 | 74.7 | 60.2 | 41.2 | 40.0 | (note) | (note)
Northern Ontario | 91.6 | 91.7 | 76.0 | 75.0 | (note) | (note) | (note) | 78.8 | 74.5 | 59.1 | 75.0 | (note) | (note) | (note)
Manitoba | 91.5 | 91.8 | 78.2 | 80.0 | (note) | (note) | (note) | 78.3 | 74.7 | 57.6 | 80.0 | (note) | (note) | (note)
Saskatchewan | 91.4 | 91.4 | 72.3 | 81.8 | 80.0 | (note) | (note) | 79.3 | 75.8 | 57.0 | 54.5 | (note) | (note) | (note)
Alberta | 91.6 | 91.3 | 74.9 | 80.7 | (note) | (note) | (note) | 74.9 | 68.5 | 55.0 | 47.4 | (note) | (note) | (note)
British Columbia | 91.9 | 91.8 | 74.2 | 77.4 | (note) | (note) | (note) | 73.7 | 68.2 | 51.0 | 53.6 | (note) | (note) | (note)
Northwest Territories and Nunavut | 91.0 | 91.4 | 69.4 | (note) | (note) | (note) | (note) | 73.3 | 65.8 | 41.7 | (note) | (note) | (note) | (note)
Yukon Territory | 92.3 | 95.0 | 93.3 | (note) | (note) | (note) | (note) | 72.7 | 76.1 | 53.3 | (note) | (note) | (note) | (note)
Total | 91.6 | 91.6 | 75.7 | 76.3 | 75.0 | (note) | (note) | 76.9 | 72.2 | 56.2 | 50.1 | 35.0 | (note) | (note)
Geography (Table 4-2 Note 2)
Rural postal code | 91.2 | 91.2 | 77.0 | 80.9 | 69.2 | (note) | (note) | 80.9 | 77.9 | 63.8 | 68.5 | 53.8 | (note) | (note)
Urban postal code | 91.8 | 91.7 | 75.2 | 74.9 | 85.7 | (note) | (note) | 75.6 | 70.2 | 53.6 | 44.8 | 0.0 | (note) | (note)
Total | 91.6 | 91.6 | 75.7 | 76.3 | 75.0 | (note) | (note) | 76.9 | 72.2 | 56.2 | 50.1 | 35.0 | (note) | (note)
Table 4-3
Percentage of postal codes imputed with Rule A and percentage of matching postal codes — Experiment B
All values are percentages. Cells shown as (note) refer to a table note.
Region or geography | Rule A, 1-year gap | Rule A, 2-year gap | Rule A, 3-year gap | Rule A, 4-year gap | Rule A, 5-year gap | Rule A, 6-year gap | Rule A, 7-year gap | Perfect match, 1-year gap | Perfect match, 2-year gap | Perfect match, 3-year gap | Perfect match, 4-year gap | Perfect match, 5-year gap | Perfect match, 6-year gap | Perfect match, 7-year gap
Region (Table 4-3 Note 1)
Newfoundland and Labrador | 91.4 | 90.9 | 75.7 | 72.2 | 58.8 | (note) | (note) | 80.9 | 77.5 | 62.4 | 57.6 | 58.8 | (note) | (note)
Nova Scotia | 91.5 | 91.3 | 77.8 | 76.3 | 63.6 | (note) | (note) | 78.6 | 74.5 | 60.2 | 57.2 | 54.5 | (note) | (note)
Prince Edward Island | 91.9 | 91.1 | 75.7 | 67.8 | 40.0 | (note) | (note) | 80.8 | 75.7 | 63.2 | 39.0 | 0.0 | (note) | (note)
New Brunswick | 91.5 | 91.4 | 76.8 | 88.8 | 70.0 | (note) | (note) | 76.0 | 69.8 | 54.5 | 52.3 | 30.0 | (note) | (note)
Eastern Quebec | 91.4 | 91.1 | 75.9 | 78.5 | 63.2 | 50.0 | (note) | 78.1 | 73.6 | 57.8 | 58.2 | 50.6 | 50.0 | (note)
Metropolitan Montréal | 90.8 | 90.9 | 76.2 | 78.5 | 65.0 | 50.0 | (note) | 75.1 | 69.9 | 53.7 | 53.7 | 36.7 | 50.0 | (note)
Western Quebec | 91.8 | 91.6 | 75.6 | 77.5 | 55.3 | 53.3 | (note) | 76.6 | 71.3 | 55.9 | 53.6 | 40.7 | 33.3 | (note)
Eastern Ontario | 91.7 | 91.5 | 75.6 | 74.1 | 51.9 | 55.6 | (note) | 77.7 | 72.7 | 57.1 | 50.3 | 29.6 | 44.4 | (note)
Central Ontario | 91.9 | 91.9 | 77.2 | 75.7 | 58.2 | 70.0 | (note) | 77.4 | 72.4 | 57.7 | 52.1 | 39.7 | 0.0 | (note)
Metropolitan Toronto | 90.1 | 90.1 | 74.9 | 70.8 | 58.9 | (note) | (note) | 76.7 | 72.3 | 58.8 | 51.2 | 32.9 | (note) | (note)
Southwestern Ontario | 91.5 | 91.7 | 76.8 | 75.1 | 53.8 | (note) | 71.4 | 78.3 | 74.4 | 60.1 | 56.1 | 41.3 | (note) | 71.4
Northern Ontario | 91.4 | 91.3 | 76.0 | 76.8 | 72.0 | 62.5 | 57.1 | 78.6 | 74.6 | 58.6 | 55.3 | 32.0 | 50.0 | (note)
Manitoba | 91.3 | 91.5 | 75.3 | 77.9 | 60.0 | (note) | (note) | 78.2 | 74.8 | 58.9 | 58.1 | 41.4 | (note) | (note)
Saskatchewan | 91.2 | 91.1 | 74.4 | 79.6 | 50.0 | (note) | (note) | 79.1 | 75.0 | 58.6 | 61.2 | 26.2 | (note) | (note)
Alberta | 91.5 | 91.5 | 76.8 | 76.2 | 55.0 | 69.2 | (note) | 74.8 | 69.2 | 55.2 | 48.7 | 32.9 | 0.0 | (note)
British Columbia | 91.7 | 91.8 | 76.1 | 75.6 | 56.8 | 66.7 | (note) | 73.6 | 68.0 | 51.8 | 50.2 | 35.8 | 11.1 | (note)
Northwest Territories and Nunavut | 91.0 | 91.6 | 78.2 | 72.7 | 60.0 | (note) | (note) | 72.7 | 68.2 | 55.8 | 27.3 | 0.0 | (note) | (note)
Yukon Territory | 92.6 | 92.1 | 77.8 | 70.6 | (note) | (note) | (note) | 73.8 | 66.9 | 43.3 | 47.1 | (note) | (note) | (note)
Total | 91.4 | 91.4 | 76.1 | 76.2 | 58.2 | 59.6 | 64.3 | 76.8 | 72.0 | 56.8 | 53.5 | 38.0 | 29.8 | 35.7
Geography (Table 4-3 Note 2)
Rural postal code | 91.0 | 90.8 | 76.0 | 76.6 | 53.3 | 52.6 | 57.1 | 80.8 | 77.3 | 62.3 | 60.7 | 34.4 | 39.5 | 0.0
Urban postal code | 91.6 | 91.6 | 76.2 | 76.1 | 59.6 | 63.2 | 71.4 | 75.5 | 70.2 | 54.9 | 51.0 | 39.0 | 25.0 | 71.4
Total | 91.4 | 91.4 | 76.1 | 76.2 | 58.2 | 59.6 | 64.3 | 76.8 | 72.0 | 56.8 | 53.5 | 38.0 | 29.8 | 35.7

Analyses of individuals

In Experiments A and B, the mean number of moves differed by at least 0.1 in 1.2% and 4.7% of observations, respectively; the mean number of latitude-longitude coordinates was different by at least 2 in 3.5% and 11.5% of observations; mean displacement differed by at least 0.1 degree of latitude-longitude in 2.4% and 4.5% of observations; and mean exposure differed by at least 0.1 µg/m3 in 4.1% and 8.1% of observations (results not shown).

Mean displacement

Differences in mean displacement between the experimental datasets and the original dataset were examined by region (defined by first character of first postal code in history), and urban versus rural location (defined by second character of first postal code in history). Generally, the percentage of observations where the absolute difference in mean distance was at least 0.1 degree did not vary systematically between regions, except for the Northwest Territories and Nunavut, where it was higher (Table 5). The percentages of observations where the absolute difference in mean distance was at least 0.1 degree were slightly higher for rural than urban postal codes.

Table 5
Mean displacement, by region and geography, Experiment A and Experiment B
Region or geography | Experiment A: observations (number) | Experiment A: observations with absolute difference in mean distance ≥ 0.1 degree (percent) | Experiment B: observations (number) | Experiment B: observations with absolute difference in mean distance ≥ 0.1 degree (percent)
Region (Table 5 Note 1)
Newfoundland and Labrador | 26,835 | 4.83 | 26,830 | 8.99
Nova Scotia | 40,180 | 3.12 | 40,180 | 5.46
Prince Edward Island | 6,395 | 2.31 | 6,400 | 3.94
New Brunswick | 35,720 | 2.33 | 35,725 | 4.33
Eastern Quebec | 116,715 | 1.75 | 116,720 | 3.28
Metropolitan Montréal | 100,825 | 1.10 | 100,825 | 2.03
Western Quebec | 137,210 | 1.04 | 137,205 | 1.93
Eastern Ontario | 72,110 | 2.57 | 72,105 | 4.78
Central Ontario | 116,930 | 1.40 | 116,935 | 2.64
Metropolitan Toronto | 93,710 | 1.33 | 93,710 | 2.46
Southwestern Ontario | 98,330 | 1.24 | 98,330 | 2.44
Northern Ontario | 41,830 | 3.64 | 41,830 | 7.00
Manitoba | 56,335 | 3.77 | 56,330 | 6.91
Saskatchewan | 54,305 | 4.58 | 54,300 | 8.48
Alberta | 112,945 | 4.29 | 112,950 | 7.96
British Columbia | 122,230 | 3.70 | 122,230 | 6.71
Northwest Territories and Nunavut | 4,610 | 6.75 | 4,585 | 13.24
Yukon Territory | 1,420 | 3.39 | 1,410 | 6.17
Total | 1,238,635 | 2.42 | 1,238,600 | 4.48
Geography (Table 5 Note 2)
Rural postal code | 354,180 | 2.83 | 354,150 | 5.22
Urban postal code | 884,455 | 2.25 | 884,450 | 4.18
Total | 1,238,635 | 2.42 | 1,238,600 | 4.48

Mean exposure

The percentage of observations where the absolute difference in mean PM2.5 exposure was at least 0.1 µg/m3 did not vary systematically between regions (Table 6); it was slightly higher for observations with the first postal code indicating an urban region.

Table 6
Mean PM2.5 exposure, by region and geography, Experiment A and Experiment B
Table summary
This table displays the results of Mean PM2.5 exposure Observations with absolute difference in mean PM2.5 exposure ≥ 0.1 µg/m, Experiment A and Experiment B, calculated using percent units of measure (appearing as column headers).
  Observations with absolute difference in mean PM2.5 exposure ≥ 0.1 µg/m3
Experiment A Experiment B
percent
RegionTable 6 Note 1  
Newfoundland and Labrador 2.68 5.14
Nova Scotia 2.77 5.56
Prince Edward Island 1.59 3.08
New Brunswick 2.88 6.00
Eastern Quebec 3.66 7.21
Metropolitan Montréal 5.45 10.49
Western Quebec 5.34 10.44
Eastern Ontario 4.10 7.96
Central Ontario 4.79 9.31
Metropolitan Toronto 4.74 9.20
Southwestern Ontario 4.63 9.12
Northern Ontario 3.54 6.88
Manitoba 2.49 4.93
Saskatchewan 3.24 6.32
Alberta 4.17 8.10
British Columbia 3.42 6.75
Northwest Territories and Nunavut 3.59 7.41
Yukon Territory 3.53 7.61
Total 4.15 8.11
GeographyTable 6 Note 2  
Rural postal codes 3.39 6.70
Urban postal codes 4.45 8.67
Total 4.15 8.11
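
The indicator reported in Table 6 can be sketched as follows. This is a minimal illustration in Python, not the authors' SAS code: the function name `percent_discrepant` and the example values are hypothetical, and the sketch assumes that a mean PM2.5 exposure has already been computed for each observation, once from the original postal code history and once from the imputed history.

```python
def percent_discrepant(original_means, imputed_means, tolerance=0.1):
    """Percentage of observations whose mean exposure differs by at least
    `tolerance` (in µg/m3) between the original and the imputed history.

    Both arguments are equal-length sequences of per-observation means
    (hypothetical inputs; one value per cohort member).
    """
    if len(original_means) != len(imputed_means):
        raise ValueError("both sequences must have the same length")
    if not original_means:
        raise ValueError("at least one observation is required")
    flagged = sum(
        1
        for original, imputed in zip(original_means, imputed_means)
        if abs(original - imputed) >= tolerance
    )
    return 100.0 * flagged / len(original_means)


# Made-up exposures for four observations (not data from the cohort).
original = [8.0, 6.5, 10.2, 7.0]
imputed = [8.0, 6.75, 10.25, 6.0]
share = percent_discrepant(original, imputed)
```

Differences that fall exactly on the threshold are sensitive to floating-point rounding; rounding the exposures to a fixed number of decimals beforehand would avoid that ambiguity.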

5. Discussion

Validation was conducted on a subset of the database in which all postal codes were present, but from which small percentages (5% in Experiment A; 10% in Experiment B) were erased and then imputed. The percentage of postal codes erased and imputed fit with the percentage observed in the subset.

Results for postal codes showed that Rules A and B were applied according to the a priori threshold p, and that the percentage of perfectly matching postal codes was usually greater than two-thirds for gaps that did not exceed two years.

Postal code results for individuals (number of moves, number of latitude-longitude coordinates, displacement based on latitude-longitude coordinates) revealed discrepancies in 1.2% to 3.5% of observations for Experiment A (4.5% to 11.5% for Experiment B). Results for PM2.5 exposure revealed discrepancies in 4.1% of observations (Experiment A) and 8.1% of observations (Experiment B) (Table 6). In the context of assigning environmental health exposures, these percentages were considered satisfactory. They also did not vary drastically across geographic regions or between rural and urban locations.

Based on these results, the imputation method was considered to be valid. However, in the subsets used for validation, most of the randomly created gaps were short (one or two years), whereas in the original database, only 55% of gaps were one or two years long (Table 3). Thus, the validation subsets contained a lower percentage of long gaps than the original file. This is because the random rule used to erase postal codes from the histories does not account for the correlation that may exist between successive missing postal codes. Consequently, the performance of the imputation could be slightly overestimated. This is not a limitation of the method, but rather a limitation of the validation, arising from the database used for the analyses.
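
The effect of erasing postal codes without correlation can be illustrated with a short simulation. The sketch below is written in Python under stated assumptions: each year of each 28-year history is erased independently with probability equal to the erasure rate (one reading of the random rule described above), and the function names are hypothetical rather than taken from the authors' programs.

```python
import random
from collections import Counter


def erase_at_random(n_years, rate, rng):
    """Flags for one history: True marks a year whose postal code is erased.
    Each year is erased independently (assumption)."""
    return [rng.random() < rate for _ in range(n_years)]


def gap_lengths(erased):
    """Lengths of the maximal runs of consecutive erased years."""
    lengths, run = [], 0
    for flag in erased:
        if flag:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # a gap that extends to the end of the history
        lengths.append(run)
    return lengths


def share_short_gaps(n_people, n_years, rate, seed=0):
    """Share of simulated gaps that are one or two years long."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_people):
        counts.update(gap_lengths(erase_at_random(n_years, rate, rng)))
    total = sum(counts.values())
    return (counts[1] + counts[2]) / total


share_a = share_short_gaps(2000, 28, 0.05)  # Experiment A erasure rate
share_b = share_short_gaps(2000, 28, 0.10)  # Experiment B erasure rate
```

Under independent erasure at rate r, a gap continues into the next year with probability r, so the share of gaps lasting one or two years is approximately 1 − r², which is close to 1 for both experiments and far above the 55% observed in the original database.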

Nevertheless, for any longitudinal database in which gaps generally do not exceed two years, the imputed postal codes would be similar to those of the original database. It is the nature of CanCHEC that long gaps will occur, and the only control available is the choice of the a priori threshold (p). Depending on the situation, analysts using databases with long gaps could apply much lower values of p (for instance, 0.2) when the gap becomes too long, thereby increasing the percentage of occurrences of Rule B. This suggests that the presence of long gaps (four years or more) may make imputation of postal codes arduous.
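
How the threshold governs the mix of rules can be sketched as follows. The sketch assumes, as the discussion above implies, that Rule A is applied when a uniform random draw falls below p and Rule B otherwise. The values 0.95 and 0.80 appear in the appendix examples and 0.2 is the value suggested above; the function names are hypothetical.

```python
import random


def choose_rule(p, rng):
    """Return "A" with probability p and "B" otherwise (assumed mechanism)."""
    return "A" if rng.random() < p else "B"


def share_rule_b(p, n_draws=10_000, seed=1):
    """Simulated share of Case 1 imputations that fall under Rule B."""
    rng = random.Random(seed)
    hits = sum(choose_rule(p, rng) == "B" for _ in range(n_draws))
    return hits / n_draws


for p in (0.95, 0.80, 0.20):
    print(f"p = {p:.2f}: simulated share of Rule B = {share_rule_b(p):.3f}")
```

The expected share of Rule B is 1 − p, so lowering p from 0.95 to 0.2 would move Rule B from roughly 1 in 20 imputations to roughly 4 in 5.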

Other methods could be used to impute the long postal code gaps that frequently occur in databases. Postal codes at the ends of the gaps might be imputed first, and then, working inward, those in the interior of the gap could be imputed in subsequent steps. However, this would generate postal codes that are highly dependent on the first ones imputed, and for which the level of confidence would vary with the distance to the ends of the gap. Another possibility is to impute postal codes based not only on the two surrounding the gap, but also on those that are one or two years further away. This could involve many hypotheses and a series of complex rules.

6. Conclusion

This paper describes a method to impute postal codes in a longitudinal cohort. Imputation was largely based on the postal codes immediately surrounding the gaps. Validation was conducted by randomly erasing a percentage of postal codes from a subset of full histories, imputing the postal codes that were erased, and evaluating the results. This method of imputing postal codes is fully functional for the Canadian Census Health and Environment Cohort database and is considered valid. It can be adapted for any longitudinal file and for any pollutant or ecological variable.

The SAS programs used to implement the methods described in this paper are available from the authors on request. A user guide is in preparation.

7. Appendix

Illustration of imputation rules

To illustrate the imputation rules, seven examples with five years of follow-up are presented. The first six columns of Appendix Table 1 identify each example and show its postal codes before imputation. For each gap in each example, the two central columns describe the gap, identify the case and compare the surrounding postal codes. According to the rules, gaps belonging to Case 1 are randomly assigned to Rule A or Rule B; gaps belonging to Cases 2a, 2b or 2c always use Rule B. The results (postal codes after imputation) are shown in the last columns of Appendix Table 1. Example 4 illustrates a point explained earlier: the three missing postal codes in the gap are imputed simultaneously and independently. Had the random generator produced different random numbers, the imputed postal codes for Example 1, Example 2 (both imputed postal codes), Example 3 (year 3 only) and Example 4 (all three) could have been different: Rule A could have been applied instead of Rule B, and vice versa.

Appendix Table 1
Illustration of cases and rules on hypothetical examples with 5 years of follow-up
Table summary
This table displays the results of Illustration of cases and rules on hypothetical examples with 5 years of follow-up. The information is grouped by Example (appearing as row headers), Postal codes before imputation, Gap 1: Description → Identification of case; Comparison of surrounding postal codes → Rule(s) used, Gap 2: Description → Identification of case; Comparison of surrounding postal codes → Rule(s) used and Postal codes after imputation (appearing as column headers).
Example | Before imputation: Year 1 | Before imputation: Year 2 | Before imputation: Year 3 | Before imputation: Year 4 | Before imputation: Year 5 | Gap 1: Description → Identification of case; Comparison of surrounding postal codes → Rule(s) used | Gap 2: Description → Identification of case; Comparison of surrounding postal codes → Rule(s) used | After imputation: Year 1 | After imputation: Year 2 | After imputation: Year 3 | After imputation: Year 4 | After imputation: Year 5
1 | K1A1A1 | (empty) | K1A1A1 | K1A1A1 | K1A1A1 | Missing postal code in year 2 = 1-year length; both surrounding postal codes present → Case 1 with p = 0.95; k = 6; d = 0 → Rule A | n/a | K1A1A1 | K1A1A1 (imputed value) | K1A1A1 | K1A1A1 | K1A1A1
2 | K1A1A1 | (empty) | K1A2B2 | (empty) | K1A2B2 | Missing postal code in year 2 = 1-year length; both surrounding postal codes present → Case 1 with p = 0.95; k = 3; d = 3 → Rule A | Missing postal code in year 4 makes 1-year gap; both surrounding postal codes present → Case 1 with p = 0.95; k = 6; d = 0 → Rule B | K1A1A1 | K1A*** (imputed value) | K1A2B2 | DUMMY0 (imputed value) | K1A2B2
3 | (empty) | K1A1A1 | (empty) | K1A1A1 | K1A1A1 | Missing postal code in year 1 = 1-year length; no postal code before gap → Case 2c; n/a → Rule B | Missing postal codes in year 3 = 1-year gap; both surrounding postal codes present → Case 1 with p = 0.95; k = 6; d = 0 → Rule A | DUMMY9 (imputed value) | K1A1A1 | K1A1A1 (imputed value) | K1A1A1 | K1A1A1
4 | K1A1A1 | (empty) | (empty) | (empty) | K1A1A2 | Missing postal code in years 2, 3, 4 = 3-year length; both surrounding postal codes present → Case 1 with p = 0.80; k = 5; d = 1 → Rule A for 1st missing postal code; Rule B for 2nd missing postal code; Rule A for 3rd missing postal code | n/a | K1A1A1 | K1A1A* (imputed value) | DUMMY1 (imputed value) | K1A1A* (imputed value) | K1A1A2
5 | K1A1A1 | K1A1A1 | K1A1A1 | (empty) | (empty) | Missing postal code in years 4, 5 = 2-year length; no postal code after gap → Case 2b; n/a → Rule B used for 2 missing postal codes | n/a | K1A1A1 | K1A1A1 | K1A1A1 | DUMMY8 (imputed value) | DUMMY8 (imputed value)
6 | (empty) | (empty) | (empty) | (empty) | (empty) | One gap of 5 years → Case 2a; n/a → Rule B used for 5 missing postal codes | n/a | DUMMY7 (imputed value) | DUMMY7 (imputed value) | DUMMY7 (imputed value) | DUMMY7 (imputed value) | DUMMY7 (imputed value)
7 | K1A1A1 | K1A1A1 | K1A1A2 | K1A1A1 | K1A1A1 | No gaps → n/a; n/a → No imputation | n/a | K1A1A1 | K1A1A1 | K1A1A2 | K1A1A1 | K1A1A1
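
The cases and rules illustrated in Appendix Table 1 can be sketched in Python as follows. This is a reading of the table, not the authors' SAS programs, and every function name is hypothetical. The sketch assumes that k is the number of leading characters shared by the two surrounding postal codes, that Rule A keeps those k characters and masks the remaining d = 6 − k with asterisks, and that Rule A is chosen when a uniform draw falls below p. Rule B is represented by a single placeholder label, because the numbering of the dummy codes (DUMMY0 to DUMMY9) is not described in this section.

```python
import random


def classify_gap(before, after):
    """Case label from the postal codes surrounding a gap (None = absent)."""
    if before is not None and after is not None:
        return "1"
    if before is None and after is None:
        return "2a"
    return "2b" if after is None else "2c"


def rule_a(before, after):
    """Keep the k shared leading characters; mask the other d with '*'."""
    k = 0
    for x, y in zip(before, after):
        if x != y:
            break
        k += 1
    return before[:k] + "*" * (len(before) - k)


def impute_history(history, p_for_gap, rng, dummy="DUMMY"):
    """Impute every gap in a list of yearly postal codes (None = missing).

    p_for_gap maps a gap length to the threshold p (hypothetical schedule).
    Each missing year in a gap is drawn independently, as in Example 4.
    """
    out = list(history)
    i, n = 0, len(history)
    while i < n:
        if history[i] is not None:
            i += 1
            continue
        j = i
        while j < n and history[j] is None:
            j += 1  # j ends one past the last missing year of the gap
        before = history[i - 1] if i > 0 else None
        after = history[j] if j < n else None
        case = classify_gap(before, after)
        for t in range(i, j):
            if case == "1" and rng.random() < p_for_gap(j - i):
                out[t] = rule_a(before, after)  # Rule A
            else:
                out[t] = dummy  # Rule B (always used for Cases 2a, 2b, 2c)
        i = j
    return out


# Example 2 from Appendix Table 1, forcing Rule A by setting p = 1.
example_2 = ["K1A1A1", None, "K1A2B2", None, "K1A2B2"]
forced_a = impute_history(example_2, lambda length: 1.0, random.Random(0))
```

Setting p to 1 or 0 makes the outcome deterministic, which is convenient for checking the logic against the table; with intermediate values of p, the imputed codes depend on the random draws, as noted for Examples 1 to 4.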

References

Boys, B.L., R.V. Martin, A. van Donkelaar, R.J. MacDonell, N.C. Hsu, M.J. Cooper, R.M. Yantosca, Z. Lee, D.G. Streets, Q. Zhang, and S.W. Wang. 2014. “Fifteen-year global time series of satellite-derived fine particulate matter.” Environmental Science and Technology 48 (19): 11109–11118.

Peters, P.A., M. Tjepkema, R. Wilkins, P. Fines, D.L. Crouse, P.C.W. Chan, and R.T. Burnett. 2013. “Data Resource Profile: 1991 Canadian Census Cohort.” International Journal of Epidemiology 42 (5): 1319–1326.

Pinault, L., M. Tjepkema, D.L. Crouse, S. Weichenthal, A. van Donkelaar, R.V. Martin, M. Brauer, H. Chen, and R.T. Burnett. 2016. “Risk estimates of mortality attributed to low concentrations of ambient fine particulate matter in the Canadian Community Health Survey cohort.” Environmental Health 15 (1): 18.

Statistics Canada. 2014. Postal CodeOM Conversion File Plus (PCCF+) Version 6C, Reference Guide  November 2014 Postal CodesOM. Statistics Canada Catalogue no. 82-F0086-XDB. Ottawa: Statistics Canada.

van Donkelaar, A., R.V. Martin, J.D. Spurr, and R.T. Burnett. 2015. “High-resolution satellite-derived PM2.5 from optimal estimation and geographically weighted regression over North America.” Environmental Science and Technology 49 (17): 10482–10491.

Wilkins, R., M. Tjepkema, C. Mustard, and R. Choinière. 2008. “The Canadian census mortality follow-up study, 1991 through 2001.” Health Reports 19 (3): 25–43. Statistics Canada Catalogue no. 82-003-XPE.

Notes
