Chapter 2
Base population

A base population is the population at the beginning of a period used as a reference or starting point for the estimation process. For postcensal estimates, the base population is the population enumerated in the most recent census, adjusted for census net undercoverage (also referred to as the censal estimateNote 1). Postcensal estimates as of July 1 of a census year are calculated by the component method, using the most recent census of population adjusted for census net undercoverage (CNU)Note 2 and taking into account the demographic events that occurred between the Census Day and June 30. The intercensal estimates are based on postcensal estimates corrected for the error of closure.

Since census net undercoverage (census undercoverage minus census overcoverage) is an important aspect of estimating population counts used in the Demographic Estimates Program, this chapter focuses on the census population adjusted for census net undercoverage. It begins with a brief description of the census collection procedures and the definition of the population universe of the 2011 Census. It then examines the studies used to estimate census coverage error, including a section on adjustments for non-enumerated Indian reserves and settlements, and concludes with the procedures used to estimate census net undercoverage for the domains of age, sex and marital status.

2.1 Censal estimates as the base population

The census requires the participation of the entire population of Canada. Every five years, Statistics Canada conducts a census. Up to and including the 1966 Census, the Census of Canada was conducted by interview. Starting with the 1971 Census, two collection methods have been used: self-enumeration and interview. In 2011, about 98% of households were enumerated using self-enumeration. A letter was sent to 60% of Canadian households. This letter replaced the traditional paper questionnaire, and provided the necessary information to enable respondents to complete the questionnaire online. Another group of households (approximately 20%) received a census package by mail. For the remaining households (approximately 20%), enumerators delivered questionnaires (approximately 18%) or completed the questionnaires during personal interviews (approximately 2%).

This interview method was normally used in remote areas of the country and on most Indian reserves. It was also used in large urban downtown areas where many residents are transient.Note 3

As was done for the 2006 Census, the 2011 Census offered all households in Canada the option of completing their questionnaire online. Each letter or paper questionnaire had a unique internet access code printed on the front along with the 2011 Census website address. Respondents needed this access code to complete their questionnaire online; the information was directly submitted into the Data Processing Centre system and was verified for completeness. Approximately 54% of households responded via the Internet. Details about 2011 Census data collection and data processing procedures are described in 2011 Census Technical Report: Coverage.Note 4

The base populations in the Demographic Estimates Program are derived from the quinquennial censuses between 1971 and 2011. The population universe of the 2011 Census includes the following groups:

  • Canadian citizens (by birth or by naturalization) and immigrants with a usual place of residence in Canada;
  • Canadian citizens (by birth or by naturalization) and immigrants who are abroad, either on a military base or attached to a diplomatic mission;
  • Canadian citizens (by birth or by naturalization) and immigrants at sea or in port aboard merchant vessels under Canadian registry or Canadian government vessels;
  • Non-permanent residents:
    • persons with a usual place of residence in Canada who are claiming refugee status and members of their families living with them;
    • persons with a usual place of residence in Canada who hold study permits and members of their families living with them;
    • persons with a usual place of residence in Canada who hold work permits and members of their families living with them.

The population universe of the 2011 Census does not include foreign residents but, since 1991, non-permanent residents are included in the population universe.

Foreign residents have not been enumerated since the 1991 Census. Foreign residents are persons who belong to the following groups:

  • government representatives of another country attached to the embassy, high commission, or other diplomatic body of that country in Canada, and members of their families living with them;
  • members of the Armed Forces of another country who are stationed in Canada, and members of their families living with them;
  • residents of another country visiting Canada temporarily (for example, a foreign visitor on vacation or on business, with or without a visitor's permit).

The definition of the population universe indicates which persons should be included in the census, but not where these persons should be enumerated. The Canadian census uses the modified de jure method of enumeration, whereby persons are to be enumerated at their usual place of residence, even if they are temporarily away at the time of the census. Persons away from their usual place of residence and residing elsewhere in Canada are to be enumerated at their usual place of residence and are considered temporary residents at the other location. Persons without a usual place of residence are to be enumerated wherever they happen to be on Census Day.

Each base population for the Demographic Estimates Program (Pt, where t = the census year) is adjusted as follows (unless otherwise noted, adjustments to the base population apply to provincial, territorial and subprovincial levels):

  • adjustment of the population for census net undercoverage (CNU);
  • addition of independent estimates for incompletely enumerated Indian reserves in 1991, 1996, 2001, 2006 and 2011;
  • adjustment for early enumeration in 1991 and 1996 in parts of northern Quebec, Newfoundland and Labrador, Yukon and the Northwest Territories;
  • addition of estimates of non-permanent residents in 1971, 1976, 1981 and 1986. Since 1991, non-permanent residents are included in the census universe;
  • at the provincial level, the first postcensal population estimate is for July 1 of the census year. It is obtained by adding or subtracting the components of growth between Census Day and June 30. At the subprovincial level, the July 1 population estimate is obtained by applying to the annual components of growth the fraction of the year that corresponds to the period between Census Day and June 30. These components are then adjusted to the appropriate provincial and territorial components.
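
As a rough illustration of the subprovincial step in the last item, the sketch below computes the fraction of a year between Census Day (May 10 in 2011) and July 1 and applies it to annual components of growth. The function names, the 365-day convention and all input figures are assumptions made for illustration only; the subsequent adjustment to provincial and territorial components is not shown.

```python
from datetime import date

CENSUS_DAY = date(2011, 5, 10)  # 2011 Census Day
JULY_1 = date(2011, 7, 1)       # reference date of the first postcensal estimate


def year_fraction(start, end, days_in_year=365):
    """Share of a year elapsed from `start` up to (but excluding) `end`."""
    return (end - start).days / days_in_year


def subprovincial_july1(base_pop, annual_components):
    """Add the prorated annual growth to the adjusted census population.

    `annual_components` is a hypothetical dict with the keys
    'births', 'deaths' and 'net_migration' (annual totals).
    """
    frac = year_fraction(CENSUS_DAY, JULY_1)
    growth = (annual_components["births"]
              - annual_components["deaths"]
              + annual_components["net_migration"])
    return base_pop + frac * growth


# Hypothetical figures for one subprovincial area.
estimate = subprovincial_july1(
    10_000, {"births": 365, "deaths": 0, "net_migration": 0}
)
```

May 10 to June 30 inclusive is 52 days, so the prorating factor under these assumptions is 52/365.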

2.2 Adjustment for census net undercoverage (CNU)

Coverage errors are defined as errors caused by the miscounting of the population on Census Day. There are two types of coverage error. Population undercoverage refers to the error of excluding someone who should have been enumerated. Population overcoverage refers to the error of either enumerating someone more than once or including someone who should not have been enumerated. The latter error is considered negligible. Undercoverage is more common than overcoverage. The net impact of undercoverage and overcoverage on the size of a population of interest is census net undercoverage (CNU). Census net undercoverage is calculated as the number of persons excluded who should have been enumerated (undercoverage) less the number of excess enumerations of persons enumerated more than once (overcoverage). Coverage errors are one of the most important types of error since they affect not only the accuracy of the counts of the various census universes, but also the accuracy of all of the census data describing the characteristics of these universes.

Following each census, Statistics Canada undertakes coverage studies to measure coverage errors. Coverage studies provide undercoverage estimates for the 1991 to 2011 Censuses at the provincial and territorial levels, and for the 1971 to 1986 Censuses at the provincial level only. Estimates of overcoverage at the provincial and territorial levels are only available as of the 1991 Census. Overcoverage for previous censuses was estimated by assuming that the overcoverage-to-undercoverage ratio for each census between 1971 and 1986 was the same as in 1991. The CNU for Yukon and the Northwest Territories prior to 1991 was estimated by assuming that the ratio between the CNU for each territory and the 10 provinces for each census between 1971 and 1986 was the same as in 1991.
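
The ratio assumption used for the 1971 to 1986 Censuses can be sketched as follows. This is a minimal illustration, not the program's actual code; the function names and every figure are hypothetical.

```python
def backcast_overcoverage(undercoverage, undercoverage_1991, overcoverage_1991):
    """Estimate overcoverage for an earlier census by assuming the same
    overcoverage-to-undercoverage ratio as in 1991."""
    ratio_1991 = overcoverage_1991 / undercoverage_1991
    return undercoverage * ratio_1991


def net_undercoverage(undercoverage, overcoverage):
    """Census net undercoverage: undercoverage minus overcoverage."""
    return undercoverage - overcoverage


# Hypothetical figures for one province.
over_1981 = backcast_overcoverage(
    undercoverage=400_000,
    undercoverage_1991=800_000,
    overcoverage_1991=200_000,
)
cnu_1981 = net_undercoverage(400_000, over_1981)
```

With these made-up inputs the 1991 ratio is 0.25, so the earlier census is assigned an overcoverage equal to one quarter of its measured undercoverage.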

For consistency, 1991 Census undercoverage and overcoverage were revised in 1998 to take into account the methodological improvements made in the 1996 Census coverage studies. This revision altered the CNU in all censuses between 1971 and 1986. Similarly, 1996 Census undercoverage and overcoverage were revised in 2003.

The following discussions on the procedures to estimate CNU are based on the 2011 Census coverage studies.

2.2.1 Census coverage studies

Census coverage error of the 2011 Census is measured by three studies. The 2011 Dwelling Classification Survey (DCS) addressed coverage error resulting from dwelling occupancy classification error. Census data were adjusted for this type of coverage error. The 2011 Reverse Record Check (RRC) measured population undercoverage. The 2011 Census Overcoverage Study (COS) measured population overcoverage. Census data are not adjusted for the population coverage error measured by the RRC and the COS. Rather, estimates of census net undercoverage are used in the production of Statistics Canada's demographic estimates of population.

The methodology of each of the 2011 coverage studies is described below.

A. Dwelling Classification Survey (DCS)

One of the potential sources of error in a census is the misclassification of dwellings. When a questionnaire is not returned from a household, the enumerator has to determine if the dwelling is occupied or not. Two types of errors can occur. First, an occupied dwelling can be incorrectly classified as unoccupied. This classification error results in census dwelling and population undercoverage because the dwelling is excluded from the census database. Second, an unoccupied dwelling can be incorrectly classified as occupied. When this error occurs, no questionnaire will be received for this dwelling and it will be subject to Non-response follow-up (NRFU). The dwelling will be considered as a non-respondent dwelling and therefore subject to imputation. This would add persons to the census database when, in fact, no one is living at that dwelling thus resulting in population overcoverage. Estimates from the DCS are used to adjust census data for both of these coverage errors.

An additional type of dwelling classification error measured by the DCS is the error incurred when marginal dwellings or dwellings under construction are classified in error as dwellings. This misclassification can result in dwelling overcoverage; however, census data are not adjusted for these dwellings, so census estimates of the dwelling stock include some degree of overcoverage.

The DCS target population was all non-response private dwellings and all unoccupied private dwellings excluding dwellings in collective collection units (CU), canvasser CUs and Indian reserves CUs. A sample of private dwellings in the sampled CUs that were classified as unoccupied on Census Day or classified as occupied but for which no census form had been returned, was to be checked again in late June or early July 2011 to determine the true occupancy status of the dwellings on Census Day. A DCS questionnaire was used for this purpose.

At this point in processing, the unoccupied dwellings and the non-response dwellings in the sample were separated and the classification of these dwellings was confirmed against the final census listing. The questionnaires completed for each sampled CU were matched to the final census listing of unoccupied dwellings. If a match could not be found, the sampled dwelling was discarded and no further processing was required. Dwellings listed as unoccupied on the census list for which no DCS questionnaire was received were considered as total non-response and went on to the next step of processing. Similarly, the final census listing of all dwellings for which a census questionnaire was not received was used to establish which of the DCS dwellings for which a DCS questionnaire was not received would be considered as total non-response.

Total non-response was addressed by a weighting adjustment while item imputation was used for item non-response. The procedure was the same for the unoccupied dwellings and non-response dwellings. When there was no information for a dwelling, the design weights of the respondents were adjusted by the design weight of the non-respondents.

Once the DCS estimates were produced, census data were adjusted for non-response dwellings and for occupied dwellings classified in error as unoccupied. This process resulted in all private dwellings on the database being classified as either occupied or unoccupied. A second procedure was used to impute the household dwelling size and other variables for the selected non-response dwelling. Household size was determined by randomly selecting a dwelling from all dwellings that had completed a census questionnaire in the same CU (nearest neighbour imputation). The complete record from this donor household was then assigned to the non-response dwelling. If no donor was found, then only a household size was assigned.
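
The donor step described above can be sketched as a random draw among responding dwellings in the same CU. The data layout, the function name and the seed are assumptions for illustration; the source does not say how a household size is chosen when no donor exists, so that case simply returns `None` here.

```python
import random


def impute_from_donor(cu_id, respondents_by_cu, rng):
    """Copy the record of a randomly chosen responding dwelling in the same CU.

    Returns None when the CU has no responding dwelling (fallback not modelled).
    """
    donors = respondents_by_cu.get(cu_id, [])
    if not donors:
        return None
    # Copy the donor record so the donor itself is never modified.
    return dict(rng.choice(donors))


# Hypothetical responding dwellings, keyed by collection unit.
respondents_by_cu = {
    "CU-001": [{"household_size": 2}, {"household_size": 4}],
}
rng = random.Random(2011)  # fixed seed so the sketch is reproducible
imputed = impute_from_donor("CU-001", respondents_by_cu, rng)
no_donor = impute_from_donor("CU-999", respondents_by_cu, rng)
```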

B. Reverse Record Check (RRC)

The Reverse Record Check (RRC) is a postcensal study carried out to estimate 2011 Census population undercoverage. The target population, which consisted of all persons who should have been enumerated in the 2011 Census, was formed from six sources (sampling frames). The first five frames were used to estimate undercoverage in the ten provinces, whereas estimates for the three territories were calculated based on samples from the last frame only. The six sampling frames of the 2011 RRC are:

  1. 2006 Census: all persons enumerated in the 2006 Census for which names and dates of birth were completed and valid;
  2. Missed: all persons from the 2006 RRC sample who were classified as missed including all persons enumerated for which names or dates of birth were missing or invalid;
  3. Births: all children born between May 16, 2006 and May 9, 2011;
  4. Immigrants: all immigrants who arrived in Canada between May 16, 2006 and May 9, 2011;
  5. Non-permanent residents: all persons from another country, who held employment or student permits, covering May 10, 2011 and persons claiming refugee status before May 10, 2011. Family members living with them in Canada are also in this frame;
  6. Health care files: all persons listed in the health care files of Yukon, the Northwest Territories, and Nunavut who were living in these territories on May 10, 2011.

A problem that exists with using multiple frames is the possibility that persons may be listed on more than one frame. For example, a person in the immigrants frame may have been in Canada on a work permit in May 2006, and thus have been enumerable in the 2006 Census. The person would then be in both the immigrants frame and the census frame if he or she was enumerated, or in the immigrants frame and the missed frame if not enumerated. All potential cases of frame overlap must be identified to avoid double-counting.

Another difficulty is that none of the first five sampling frames covered people who had emigrated, or who were outside the country at the time of the 2006 Census without being enumerated and had returned during the intercensal period. Coverage error estimates do not include these populations.

Sampling fractions were not the same in all strata. To make the sample design more efficient, higher sampling rates were applied in subgroups for which high undercoverage or a lower tracing rate was expected.

The methodology for the territories that was changed in 2006 was once again used in 2011. As with RRCs previous to 2006, the sampling frames of the three territories were created from their respective health care files. However, the people listed in the sampling frames of each territory were then matched by name, sex and date of birth with the 2006 Census (or 2011 respectively) response database using exact matching. A manual verification was also performed. Matched people were classified as enumerated, and given a weight of 1. People not classified as enumerated were then stratified by age and sex.
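
A minimal sketch of the exact-matching step for the territories follows. The record layout and the name normalization (trimming and case folding) are assumptions, and the manual verification step is not represented.

```python
def match_key(person):
    """Exact-match key on name, sex and date of birth."""
    return (person["name"].strip().casefold(), person["sex"], person["dob"])


def classify_frame(frame, census_records):
    """Split a health care frame into enumerated persons (weight 1)
    and persons left for stratified sampling."""
    census_keys = {match_key(record) for record in census_records}
    enumerated, unmatched = [], []
    for person in frame:
        if match_key(person) in census_keys:
            enumerated.append({**person, "weight": 1.0})
        else:
            unmatched.append(person)
    return enumerated, unmatched


# Hypothetical records.
frame = [
    {"name": "Ann Lee", "sex": "F", "dob": "1980-01-02"},
    {"name": "Bo Roy", "sex": "M", "dob": "1975-06-30"},
]
census_records = [{"name": " ann lee ", "sex": "F", "dob": "1980-01-02"}]
enumerated, unmatched = classify_frame(frame, census_records)
```

In this sketch the unmatched persons would then be stratified by age and sex before sampling.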

After sample selection and checking the sample for quality of information for different variables of interest (i.e., geographic or demographic), the sample was ready for processing and classification. The goal of processing is to determine whether each selected person (SP) was part of the census target population and, if so, to determine whether each SP was enumerated. In addition, processing is undertaken to provide further information for the non-response adjustment.

Most of the work in processing involved searching the RRC version of the 2011 Census Response Database (RRC RDB) to determine whether the SP was enumerated at one of the addresses associated with him or her. The addresses were obtained from various sources including:

  • the sampling frame for the selection address;
  • updates with help from tax records;
  • the computer-assisted telephone interview (CATI) and paper questionnaires;
  • matches with the RRC Response Data Base (RDB) using birth date and sex of the SP and members of his or her household, or, the SP's name, postal code or telephone number as well as for all the members of his or her household.

Two outcomes could result from this process. First, when the SP was found, the classification of enumerated was usually assigned and no further processing was required. An exception was SPs who were later identified as deceased before the census from vital statistics for deaths. Second, when the SP was not found or identified as deceased before the Census, the case was sent for collection. While collection was taking place, searching the RRC RDB continued. When data from the CATI interview was available, it could be determined whether or not each SP was part of the census target population. If so, the CATI data could enable further searching.

Processing provides the information required to determine which SPs were:

  1. listed;
  2. mobile;
  3. included in the census target population;
  4. classified;
  5. enumerated;
  6. missed.

Selected persons for whom one or more of the above-mentioned characteristics could not be determined were considered as non-respondents. The persons that were classified were considered as partial non-responses as we knew they were part of the census target population without having enough information to determine if they were enumerated or not. Selected persons, who were in the census target population but were not enumerated, thus classified as missed, were the basis for the estimate of undercoverage.

The final weights of the selected persons (SP) began with their initial (or design) weights. The initial weight of an SP from the missed frame was the final weight assigned to him or her during the previous Reverse Record Check (RRC) when the SP was classified as missed or as being enumerated but with missing or invalid information on names or dates of birth. For the other sampling frames, the initial weights were generally equal to the inverse of the probability of selection. The exception was the non-permanent residents frame, where the initial weight was higher to account for the small number of non-permanent residents who were not in the sampling frame when the sample was selected. Final non-permanent resident counts were only available after the sample was selected. Initial weights were adjusted to add to these counts. The census frame may contain people enumerated more than once. For the first time in 2011, we set out to identify the persons selected from this frame who had been enumerated more than once in order to correct the problem. The weights of these persons were adjusted downward to compensate for the fact that they are in the frame more than once.

In order to reduce bias, the initial weights of the respondents had to be adjusted to account for non-response. The weight of the non-respondents was redistributed among the respondents. Where possible, this was done by ensuring that the weight of non-respondents with certain characteristics was redistributed only to respondents with the same characteristics. In the rare cases where a respondent with the same characteristics as a non-respondent could not be identified in a stratum, the stratum was grouped with another stratum deemed similar.
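
The redistribution of non-respondent weight can be sketched as a weighting-class adjustment. The field names and figures below are hypothetical, and the grouping of a stratum with a similar one is left to the caller: the sketch only signals when a class has no respondent.

```python
from collections import defaultdict


def adjust_for_nonresponse(units):
    """Redistribute non-respondent weight to respondents in the same class.

    Each unit is a dict with hypothetical keys 'cls', 'weight', 'responded'.
    """
    total = defaultdict(float)      # weight of all units, by class
    responding = defaultdict(float)  # weight of respondents, by class
    for unit in units:
        total[unit["cls"]] += unit["weight"]
        if unit["responded"]:
            responding[unit["cls"]] += unit["weight"]

    adjusted = []
    for unit in units:
        cls = unit["cls"]
        if responding[cls] == 0:
            raise ValueError(f"class {cls!r} has no respondents; collapse it first")
        if unit["responded"]:
            factor = total[cls] / responding[cls]
            adjusted.append({**unit, "weight": unit["weight"] * factor})
    return adjusted


# Hypothetical sample.
units = [
    {"id": 1, "cls": "A", "weight": 10.0, "responded": True},
    {"id": 2, "cls": "A", "weight": 10.0, "responded": True},
    {"id": 3, "cls": "A", "weight": 5.0, "responded": False},
    {"id": 4, "cls": "B", "weight": 20.0, "responded": True},
]
adjusted = adjust_for_nonresponse(units)
```

The total weight is unchanged by the adjustment: the 5 units of weight carried by the non-respondent in class A are shared between the two respondents of that class.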

After adjusting for non-response, the estimated number of enumerated persons in the territories has traditionally been lower than the comparable census count. This is likely due to undercoverage of the census target population in the health care files. To address this bias, the weight of SPs selected in a territory was adjusted so that the estimated number of enumerated persons equaled the comparable census count for that territory.
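
One way to picture this territorial adjustment is as a single ratio factor that makes the weighted number of enumerated persons agree with the census count. Applying the same factor to every SP in the territory, including those classified as missed, is an assumption of this sketch, as are the field names and figures.

```python
def calibrate_territory(selected_persons, census_count):
    """Scale weights so that weighted enumerated persons equal the census count."""
    estimated_enumerated = sum(
        sp["weight"] for sp in selected_persons if sp["status"] == "enumerated"
    )
    factor = census_count / estimated_enumerated
    return [{**sp, "weight": sp["weight"] * factor} for sp in selected_persons]


# Hypothetical territory: weighted enumerated total is 80, census count is 100.
selected_persons = [
    {"status": "enumerated", "weight": 40.0},
    {"status": "enumerated", "weight": 40.0},
    {"status": "missed", "weight": 8.0},
]
calibrated = calibrate_territory(selected_persons, census_count=100)
```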

The RRC RDB differs from the final census database in that it does not include imputations made during Whole household imputation (WHI), enumerations with an invalid or missing name or an incomplete or invalid birth date, or enumerations added after the start of the RRC data processing phase. People from the target population who are not in the RRC RDB are classified as missed. Census population undercoverage is estimated by the number (weighted) of missed persons less the number of persons excluded from the RRC RDB.

C. Census Overcoverage Study (COS)

Population overcoverage is the number of excess enumerations of persons who are included in census tabulations more than once, usually twice. It biases census counts and estimates because these persons should have been included only once. Following the 2001 Census, the level of overcoverage due to duplication of individuals was measured by three studies, each one covering a part of the overcoverage: the Automated Match Study (AMS), the Collective Dwelling Study (CDS) and the Reverse Record Check (RRC). The introduction of names to the Census Response Database (RDB) since 2006 provides an opportunity to use name matching to measure overcoverage and therefore estimate overcoverage with a single study, the Census Overcoverage Study (COS). The 2011 COS is based on a series of automated exact and probabilistic matching operations and manual work. These matching operations also involve the use of various administrative data files. Therefore, since 2006 the RRC no longer measures overcoverage.

In principle, the RDB could have been matched to itself to detect duplicate enumerations. However, on a practical level, and for methodological considerations, the COS was conducted in two steps as outlined below.

Step 1 – Probabilistic matching with administrative data

The first step was based on probabilistic matching procedures, and involved matching the RDB with a set of administrative data files representing a large portion of the census target population. It was expected that this process would create a base sample of cases, including a good proportion of overcoverage cases. In particular, the majority of RDB records assigned to the same administrative record through 'many-to-one' matches were identified to be cases of overcoverage after manual review, since they pointed to the same individual from the administrative data files. A sample of these cases was chosen and verified manually to determine if they were cases of overcoverage or not and then weighted to produce the estimates.

The following administrative data files were used:

  • 2005 to 2009 income tax records;
  • Canadian child tax benefits;
  • Birth files for Canadian citizens born between 1974 and 2008;
  • Immigration files for immigrants and non-permanent residents up until September 2011;
  • Health care files from Yukon, the Northwest Territories and Nunavut up until July 2011.

Since some individuals can obviously be in more than one administrative file, the files are used sequentially. As a result, once a person from the census was linked to an administrative file, no attempts were made to link that person to subsequent files. This strategy was used to prevent incidents of individuals who are in more than one file, rather than the 2006 strategy that aimed to identify all cases of overlap and remove them at the source.
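
The sequential use of the files can be sketched as below. Exact key lookup stands in for the probabilistic matching actually used, and the file names, keys and identifiers are all hypothetical.

```python
def link_sequentially(census_records, admin_files):
    """Link each census record to the first administrative file that contains it.

    `census_records` maps a census id to a matching key.
    `admin_files` is an ordered list of (file_name, {key: admin_id}) pairs.
    """
    links = {}
    for file_name, admin_index in admin_files:
        for census_id, key in census_records.items():
            if census_id in links:
                continue  # already linked to an earlier file; do not try again
            if key in admin_index:
                links[census_id] = (file_name, admin_index[key])
    return links


# Hypothetical data: 'key-ann' appears in both files.
census_records = {"c1": "key-ann", "c2": "key-bo", "c3": "key-cy"}
admin_files = [
    ("tax", {"key-ann": "t-17"}),
    ("births", {"key-ann": "b-03", "key-bo": "b-44"}),
]
sequential_links = link_sequentially(census_records, admin_files)
```

Because record `c1` is linked to the first file, its presence in the second file is never examined, which is how overlap between files is kept out of the results in this sketch.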

Overcoverage was identified by taking a sample of cases where two or more RDB records matched to the same administrative record (or group of administrative records). The sampled cases were manually verified to determine if they were overcoverage cases. We then proceeded with the weighting and the estimation. For evaluation purposes, a sample of the one-to-one cases was also manually verified.
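
Finding the many-to-one cases amounts to grouping RDB records by the administrative record they were linked to. The sketch below uses hypothetical identifiers and leaves out sampling, manual verification and weighting.

```python
from collections import defaultdict


def many_to_one_groups(census_to_admin):
    """Return administrative ids linked to two or more census records."""
    groups = defaultdict(list)
    for census_id, admin_id in census_to_admin.items():
        groups[admin_id].append(census_id)
    return {
        admin_id: sorted(census_ids)
        for admin_id, census_ids in groups.items()
        if len(census_ids) >= 2
    }


# Hypothetical links: two census records point to the same tax record.
census_to_admin = {"c1": "t-17", "c2": "t-17", "c3": "b-44"}
candidates = many_to_one_groups(census_to_admin)
```

Each group returned is only a candidate; in the study a sample of such cases was checked manually before being counted as overcoverage.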

In Step 1, for technical reasons, RDB records for the provinces were matched to provincial administrative records, and RDB records for the territories were matched to the records in the territorial administrative Health Care Files. Hence, cases of overcoverage between the provinces and the territories were missed at Step 1, but they were included in Step 2.

Before Step 2, the RDB was split into two parts, A and B. Part A consisted of all RDB records that were matched to at least one administrative record, whether overcovered or not. Part B consisted of all RDB records that were not matched to an administrative record, as well as territorial records. The latter was done to take into account provincial and territorial matches that were missed in Step 1.

Step 2 – Probabilistic match with the Census Response Database (RDB)

Step 2 of the COS is a probabilistic record linkage between RDB records that were not matched with an administrative record (Part B) and the complete RDB (Part A + Part B). Statistics Canada's Generalized Record Linkage System (GRLS) was used for this step.

Within the framework of GRLS, variables such as first name, last name, sex, date of birth and some variables related to geography were considered during the record linkage. GRLS provided results as pairs of individuals with an associated weight that indicates the strength of the match. The higher the matching weight, the more likely the pair is a true match and therefore a case of overcoverage.

The standard Fellegi-Sunter (1969) approach was implemented in the GRLS. A lower threshold, S1, was established below which matches were rejected without further review (i.e., no overcoverage), in order to minimize cases of overcoverage below threshold S1. In order to verify cases above the S1 threshold (i.e., pairs whose matching weight was greater than S1), a sample of these matches was selected for manual verification.
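
For readers unfamiliar with the Fellegi-Sunter approach, the sketch below shows the textbook form of the matching weight (a sum of log-likelihood ratios over compared fields) and a screening rule against a lower threshold S1. It does not reproduce GRLS; the m and u probabilities, the field names, the threshold value and the treatment of a weight exactly equal to S1 are all assumptions.

```python
import math


def match_weight(agreements, m_probs, u_probs):
    """Sum of field weights: log2(m/u) on agreement,
    log2((1-m)/(1-u)) on disagreement."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def screen_pair(weight, s1):
    """Reject pairs below S1; the rest are eligible for the verification sample."""
    return "rejected" if weight < s1 else "eligible_for_sampling"


# Hypothetical probabilities: m = P(agree | same person), u = P(agree | different).
m_probs = {"last_name": 0.95, "dob": 0.90}
u_probs = {"last_name": 0.01, "dob": 0.05}

w_agree = match_weight({"last_name": True, "dob": True}, m_probs, u_probs)
w_disagree = match_weight({"last_name": False, "dob": False}, m_probs, u_probs)
S1 = 5.0  # hypothetical lower threshold
```

A pair agreeing on both fields gets a large positive weight and stays eligible for manual verification, while a pair disagreeing on both gets a negative weight and is rejected without review.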

The household members of the persons in steps 1 and 2 were also considered, and pairs (from the two households in question) were created using the individuals with similar data. They were then sampled and checked manually. These additional pairs were weighted and used in estimation.

In 2011, overcoverage was measured primarily by the Census Overcoverage Study (COS). The total overcoverage estimate comprised individuals overcovered in the first two steps, and those identified as overcovered among household members.

To evaluate the COS, the Automated Match Study (AMS) was repeated in 2011. The COS estimates were compared to those of the AMS. The comparison revealed a bias in the COS estimates whereby some pairs identified in the AMS were not found in the COS frames. Since the AMS provided an estimate of overcoverage not included in the COS, the last step in estimating overcoverage was to account for this bias by using the AMS estimates to adjust the COS estimates.

2.3 Calculating census net undercoverage

Let T represent the total or true number of persons in the census target population. Then, let C be the published census count of the number of persons in the census target population. The error in using C instead of T is denoted by N, and it is the census net coverage error, defined as:

Equation 2.1:      $N = T - C$

The censal population P is defined as:

Equation 2.2:      $P = C + N$

Let U denote population undercoverage. U is the number of persons not included in C who should have been.

Let O denote population overcoverage where O is the number of persons included in C who should not have been. There are two components to O. The first is persons who were enumerated more than once. These duplicate enumerations should not have been included in C. The census coverage studies focus on duplicate enumerations. The second component of O is persons who were included in C who are not in the census target population. Foreign residents visiting Canada, for example, who are listed on a census form as usual residents of a dwelling should not be included in C. Fictitious persons are another example. The number of persons included that are not in the census target population has been seen by previous studies to be negligibly small. Therefore, the 2006 and 2011 Census coverage studies did not measure this component of coverage error.

Since U refers to persons who should be included in C and O refers to persons who should not be included in C, the difference between T and C is U less O. That is:

Equation 2.3:      $N = U - O$

The true number of persons in the census target population is then:

Equation 2.4:      $T = C + N = C + U - O$

An estimate of T is given by $\hat{T}$, where:

Equation 2.5:      $\hat{T} = C + \hat{N} = C + \hat{U} - \hat{O}$

$\hat{U}$ is an estimate of the number of persons not included in C that should have been;
$\hat{O}$ is an estimate of the number of persons included in C who should not have been. Let us assume that overcoverage from persons included in C who are not in the census target population is zero. Therefore, $\hat{O}$ is restricted to an estimate of the number of duplicate enumerations. It is the goal of the census coverage studies to produce $\hat{U}$ and $\hat{O}$.

Census population coverage error can be usefully expressed as rates relative to the true population. The undercoverage rate $R_U$ is U expressed as a percentage of T. The overcoverage rate $R_O$ is O expressed as a percentage of T. The census net undercoverage rate $R_N$ is the difference between U and O expressed as a percentage of the census target population.
These three rates can be estimated by $\hat{R}_U$, $\hat{R}_O$ and $\hat{R}_N$ as follows:

Equation 2.6:      $\hat{R}_U = 100 \times \frac{\hat{U}}{\hat{T}} = 100 \times \frac{\hat{U}}{C + \hat{N}}$

Equation 2.7:      $\hat{R}_O = 100 \times \frac{\hat{O}}{\hat{T}} = 100 \times \frac{\hat{O}}{C + \hat{N}}$

Equation 2.8:      $\hat{R}_N = 100 \times \frac{\hat{N}}{\hat{T}} = 100 \times \left[ \frac{\hat{U} - \hat{O}}{C + \hat{N}} \right]$

A positive census net undercoverage rate indicates that undercoverage is larger than overcoverage. That is, there are more people not included in the published census count C than the number of duplicated enumerations. This has been, and continues to be, the experience of the Canadian census. For some domains of interest, however, negative census net undercoverage has recently been observed.
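As an illustrative sketch of Equations 2.5 to 2.8, the short Python function below computes the estimated true population and the three coverage rates from a census count and the two coverage-study estimates. The function name and the input values are hypothetical and are not figures from any census.

```python
def coverage_rates(census_count, under_hat, over_hat):
    """Return T-hat, N-hat and the three rates (in percent) from Equations 2.5 to 2.8."""
    net_hat = under_hat - over_hat          # Equation 2.3 applied to the estimates
    true_hat = census_count + net_hat       # Equation 2.5
    return {
        "T_hat": true_hat,
        "N_hat": net_hat,
        "R_U": 100 * under_hat / true_hat,  # Equation 2.6
        "R_O": 100 * over_hat / true_hat,   # Equation 2.7
        "R_N": 100 * net_hat / true_hat,    # Equation 2.8
    }


# Hypothetical inputs: 1,000 persons counted, 50 missed, 10 duplicate enumerations.
rates = coverage_rates(1000, 50, 10)
print(rates)
```

Because all three rates share the denominator $\hat{T}$, the net rate equals the undercoverage rate minus the overcoverage rate, and it is positive whenever $\hat{U}$ exceeds $\hat{O}$.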

2.4 Adjustments for non-enumerated Indian reserves and settlements

On some Indian reserves and settlements, enumeration is not permitted or is interrupted before it can be completed. These areas, a total of 18 in the 2011 Census, are called incompletely enumerated Indian reserves and Indian settlements. Census data for these areas are not available and therefore have not been included in any census tabulations. Additionally, in 2011, 13 Indian reserves in Ontario were enumerated late because their residents had been temporarily relocated due to forest fires.

Neither the 2011 Census nor the Reverse Record Check is in a position to produce an estimate of the population living in the 18 incompletely enumerated Indian reserves and settlements. In order to produce official estimates of population, a model-based methodology was used to prepare estimates of population for these geographical areas. For the 13 Ontario reserves, the population numbers are from a Statistics Canada enumeration that took place after the Census.

A two-step model was developed to estimate the population of the 18 incompletely enumerated Indian reserves. The first step uses a simple linear regression to predict the census count in 2011. The linear regression was constructed using all Indian reserves that were completely enumerated in both the 2006 and the 2011 Census. The model assumes linear growth from 2006 to 2011 for all provinces, with separate estimates of the intercept and the regression parameters for each province. For each incompletely enumerated reserve, the input variable for the regression model was either the actual census count in 2006 or the best predicted census count from the 2006 model. The output of the model was the estimated census count in 2011.

The second step is done to produce consistency with the results of the census coverage studies. An adjustment was made to the estimated census count to account for census net undercoverage of all subjected census counts. Census net undercoverage for the incompletely enumerated reserves was estimated by calculating the census net undercoverage rate for all completely enumerated reserves in each province and then applying that rate to the estimated census count of all the incompletely enumerated Indian reserves in the province. The estimated census count and the estimated net missed persons in each reserve were then summed to create an estimated population for the incompletely enumerated Indian reserves. This procedure was also applied to the 13 Ontario reserves.
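The two steps can be sketched in a few lines of Python for a single province. This is a minimal illustration under stated assumptions, not the production model: the function names and all counts are hypothetical, and the net undercoverage rate is assumed to be expressed relative to the true population (as in Equation 2.8), so that the net missed persons equal the estimated count times r / (100 - r).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_reserve_population(count_2006, intercept, slope, net_rate_pct):
    """Step 1: predict the 2011 count. Step 2: add the estimated net missed persons."""
    est_count = intercept + slope * count_2006
    # Assumption: the rate is relative to the true population (Equation 2.8),
    # so net missed persons = count * r / (100 - r).
    net_missed = est_count * net_rate_pct / (100 - net_rate_pct)
    return est_count, net_missed, est_count + net_missed


# Hypothetical completely enumerated reserves in one province (2006 and 2011 counts).
counts_2006 = [100, 200, 300]
counts_2011 = [110, 220, 330]
intercept, slope = fit_line(counts_2006, counts_2011)

# Hypothetical incompletely enumerated reserve with a 2006 count of 400
# and a hypothetical provincial net undercoverage rate of 10%.
est_count, net_missed, population = estimate_reserve_population(400, intercept, slope, 10.0)
print(round(est_count, 1), round(net_missed, 1), round(population, 1))
```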

2.5 Net census undercoverage estimates by single year of age and sex

The Demographic Estimates Program requires an estimate of net census undercoverage for various domains of estimation. The sections above describe the methodology used in coverage studies to estimate undercoverage and overcoverage. These estimates are reliable for large domains (e.g., provinces and territories or broad age groups), but would likely be less accurate for smaller domains because of small sample sizes.

Net census undercoverage figures must be whole numbers (positive or negative), and must be produced for domains of estimation that correspond to the cross-tabulation of the following variables: single year of age, sex, provinces and territories, and subprovincial areas. They are also required for the domains composed of marital status (legal and historical), single year of age, sex, and provinces and territories.

In addition, the modelled net census undercoverage figures, when summed across domains of interest, must match the net census undercoverage totals for large domains published in the coverage studies. Furthermore, net census undercoverage must be a smooth function of single year of age; that is, the change in net census undercoverage from one single year of age to another must be regular and, ideally, must not fluctuate too rapidly or abruptly. The following sections present a detailed description of the net census undercoverage estimation methodology for small domains.

2.5.1 Provinces and territories

This section describes the methodology for estimating net census undercoverage by single year of age and sex in the provinces and territories. Normally, direct estimates (or estimates produced by the coverage studies) could be used for all domains of interest. However, direct estimates may be much too volatile because of the small sample size of some of these domains. A model better suited to small domains was therefore used: the empirical Bayes or Fay-Herriot model. This model is used to produce more accurate estimates for small domains, despite their potential bias. Estimation with this model is thus a trade-off between bias and variance in net census undercoverage estimates.

Moreover, producing estimates using the empirical Bayes model requires direct estimates and their variance for every domain of interest, as well as independent variablesNote 5 related to net census undercoverage. In the Demographic Estimates Program, direct estimates and their variances were not available for single years of age, but were available for the intermediate domains of broad age groups (0 to 19 years, 20 to 29 years, 30 to 44 years, and 45 years and older). The totals of the independent variables for these domains were also available.

However, all the parameters of the empirical Bayes model must be estimated, specifically model error variance and the regression slope. Regression slope is estimated using the ordinary least squares technique. The independent variables are selected from a series of variables according to their contribution to R squared in the model. The selected variables ultimately result in a more parsimonious model (one that contains the fewest terms possible, while remaining highly predictive).

Model error variance was estimated using the restricted maximum likelihood technique. The results of this method were compared with the results produced using the Fay-Herriot and Wang-Fuller adjusted density maximization methods. All these methods use iterations to estimate model variance. The first method was selected since the net census undercoverage estimates that it produced generally had the smallest mean square error.

Lastly, to produce modelled estimates for the required domains (that is, by single year of age), synthetic expansion was used. It is based on the implicit assumption that the net census undercoverage adjustment factor is constant for all single years of age in an intermediate domain. For example, the net census undercoverage adjustment factor at age 1 is the same as the one applied at age 5, since both ages are in the intermediate domain of 0 to 19 years.

Formally, it is described as follows:

Equation 2.9:      $\hat{M}_{jka} = C_{jka} \times \left( \hat{F}_{jk} - 1 \right)$

where

$\hat{M}_{jka}$ = net number of people omitted for a given age a in province or territory j and broad age group k;
$C_{jka}$ = number of people counted in the census for a given age a in a province or a territory j and a broad age group k;
$\hat{F}_{jk}$ = adjustment factor produced by the empirical Bayes model by broad age group k.
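Synthetic expansion (Equation 2.9) can be sketched as follows for one province. The broad age groups are those named in the text; the function names, census counts and adjustment factors are hypothetical values chosen only to show the calculation.

```python
# Broad age groups (intermediate domains) named in the text; None marks an open upper bound.
BROAD_AGE_GROUPS = {"0-19": (0, 19), "20-29": (20, 29), "30-44": (30, 44), "45+": (45, None)}


def broad_age_group(age):
    """Return the label of the broad age group that contains a single year of age."""
    for label, (low, high) in BROAD_AGE_GROUPS.items():
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age {age} is outside every broad age group")


def synthetic_expansion(census_counts, factors):
    """Equation 2.9 for one province: M_hat[a] = C[a] * (F_hat[group of a] - 1)."""
    return {age: count * (factors[broad_age_group(age)] - 1)
            for age, count in census_counts.items()}


# Hypothetical census counts by single year of age and hypothetical adjustment factors.
counts = {1: 1000, 5: 2000, 25: 500}
factors = {"0-19": 1.02, "20-29": 1.08, "30-44": 1.04, "45+": 1.01}
net_missed = synthetic_expansion(counts, factors)
print({age: round(value, 1) for age, value in net_missed.items()})
```

Ages 1 and 5 receive the same factor because both fall in the 0 to 19 group, so their net missed persons differ only through their census counts.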

As stated above, modelled net census undercoverage estimates must be consistent with published coverage study estimates. In other words, the sum of the modelled estimates must equal the net census undercoverage produced by the coverage studies in large domains, such as provinces and territories. Still, it is preferable for net census undercoverage to change gradually from year to year. To satisfy these two constraints, the modelled estimates must therefore be adequately adjusted. To this end, the raking ratio technique (Deming 1943) is applied to the modelled net census undercoverage estimates. This technique uses provincial net census undercoverage totals as the first margin and smoothed direct estimates by single year of age as the second margin.

The totals of the first margin were taken directly from the published coverage studies and were used without modification. The totals of the second margin were produced by smoothing the direct estimates by single year of age at the national level. They were then calibrated to match the total of the provincial estimates from the coverage studies.
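The raking ratio technique alternately rescales the rows and the columns of a table until both sets of margins are reproduced. The sketch below is a generic two-margin version, with rows standing for provinces and columns for single years of age. It assumes strictly positive starting values and margins with equal grand totals; the function name and all numbers are hypothetical, and the production procedure may differ (for example, in how it handles negative net undercoverage).

```python
def rake(seed, row_totals, col_totals, tol=1e-9, max_iter=1000):
    """Alternately scale rows and columns of `seed` until both margins are matched."""
    table = [[float(value) for value in row] for row in seed]
    n_cols = len(col_totals)
    for _ in range(max_iter):
        # Scale each row (province) to its target total.
        for i, row in enumerate(table):
            total = sum(row)
            if total:
                table[i] = [value * row_totals[i] / total for value in row]
        # Scale each column (single year of age) to its target total.
        for j in range(n_cols):
            total = sum(row[j] for row in table)
            if total:
                for row in table:
                    row[j] *= col_totals[j] / total
        # Columns now match exactly; stop once the rows also match within tolerance.
        if all(abs(sum(row) - target) < tol for row, target in zip(table, row_totals)):
            break
    return table


# Hypothetical modelled estimates (2 provinces x 2 ages) and target margins.
seed = [[1, 2], [3, 4]]
raked = rake(seed, row_totals=[40, 60], col_totals=[35, 65])
print([[round(value, 2) for value in row] for row in raked])
```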

Specifically, smoothing can be modelled using an expression such as

Equation 2.10:      $M_a = g(a) + \epsilon_a$

where $M_a$ is the national net census undercoverage at age a, such that a varies from 0 to 100, and g(a) is a smooth function of age.

The smoothed estimates are produced by solving the following optimization problem with constraints:

Equation 2.11:      $S(g) = \sum_{a=0}^{100} w_a \left( M_a - g(a) \right)^2 + \lambda \int g''(x)\,dx$

where $w_a$ is a given constant, often the inverse of the variance of $M_a$, and $\lambda$ is a smoothing parameter. The smaller the value of the constant $w_a$, the less weight is put on the direct estimate in question (which occurs when variance of the direct estimate is large). Moreover, the smoothing parameter is estimated mechanically using the generalized cross-validation criterion.
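The penalized least squares idea behind Equation 2.11 can be illustrated with a discrete analogue, often called a Whittaker smoother, in which g is represented by one value per age and the roughness term is replaced by the sum of squared second differences. This is a swapped-in simplification for illustration, not the fit or the generalized cross-validation step used by the program; the function names, the fixed value of `lam` and the input series are all hypothetical.

```python
def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def whittaker_smooth(values, weights, lam):
    """Minimize sum w*(M - g)^2 + lam * sum(second differences of g)^2."""
    n = len(values)
    # Normal equations: (W + lam * D'D) g = W M, with D the second-difference operator.
    system = [[0.0] * n for _ in range(n)]
    for i in range(n):
        system[i][i] = float(weights[i])
    coef = (1.0, -2.0, 1.0)
    for start in range(n - 2):
        for p in range(3):
            for q in range(3):
                system[start + p][start + q] += lam * coef[p] * coef[q]
    rhs = [weights[i] * values[i] for i in range(n)]
    return solve(system, rhs)


def roughness(values):
    """Sum of squared second differences, the discrete roughness measure used above."""
    return sum((values[i] - 2 * values[i + 1] + values[i + 2]) ** 2
               for i in range(len(values) - 2))


# Hypothetical noisy direct estimates by age, equal weights, fixed smoothing parameter.
noisy = [0, 10, 0, 10, 0, 10]
smooth = whittaker_smooth(noisy, [1] * 6, lam=100.0)
print([round(value, 2) for value in smooth])
```

A larger `lam` puts more emphasis on smoothness, while a smaller weight for a given age lets the fitted value move further from that age's direct estimate.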

The smoothed net census undercoverage estimates are smoother than direct estimates. However, the sum of these smoothed estimates was inconsistent with the national net census undercoverage estimate published in the coverage studies. To address this inconsistency, the smoothed estimates were calibrated with the direct estimates for the broad age groups (intermediate domains) using the Denton method (1971), which is based on the principle of fluctuation preservation. In other words, the smoothed estimates are adjusted, but the year-to-year fluctuations between net census undercoverage levels are preserved. This is in line with the assumption that net census undercoverage is a smooth function of age.

2.5.2 Subprovincial areas

Base populations for census metropolitan areas and census divisions are obtained by applying the corresponding provincial and territorial census net undercoverage rates, available by age and sex. This synthetic estimate assumes that within a province or a territory and for a single year of age there is a constant census net undercoverage rate. For example, in British Columbia a 20-year-old male would be assumed to be missed at the same rate across the entire province. Late enumeration and non-enumerated Indian reserves and settlements were adjusted by adding the provincial or territorial estimates to the appropriate geographic regions. All figures sum to provincial, territorial and national totals.

2.6 Estimates of census net undercoverage by marital status and legal marital status (and age and sex)

2.6.1 Provinces and territories

At this point, estimates of census net undercoverage are available by single year of age and sex for each province and territory. However, the Demographic Estimates Program also requires estimates of provincial and territorial census net undercoverage by age, sex and marital status.

The estimates of census net undercoverage (CNU) by single year of age, sex and marital statusNote 6 and legal marital statusNote 7 for each province or territory were modelled by the Social Survey Methods Division of Statistics Canada. The method used to estimate the marital status and legal marital status CNU is a two-step raking ratio procedure. As was the case with CNU by single year of age, sex, province and territory, two margins are necessary. One margin was the previously released estimate of net missed persons by single year of age and sex for each province or territory. The other margin came from the coverage studies and showed the estimated net missed persons by age for marital status or legal marital status for each province or territory. Some minor modifications were made to the direct survey estimates to ensure coherence between the estimates for marital status and the estimates modelled by single year.

The raking ratio procedure was used in a two-step process. The first step was to use the five-year age groups and the second step was to use these results to produce the single year estimates. The choice was to create a design matrix using the census distribution of marital status, for each sex, within each five-year age grouping. The resulting distribution of marital status within an age group determined the structure of the design matrix. If any single marital status estimate in any five-year age group constituted at least 1 percent of the total for that age group, then the corresponding entry of the design matrix was set to one; otherwise it was set to zero. These conditions imply that the various categories are independent of each other if they meet this threshold. Finally, for each province or territory and sex, the legal marital status used the identical design matrix derived from the marital status. After convergence, the estimated CNU rates were checked, and if any estimate had a CNU rate exceeding 50 percent then the initial value for that cell was reset to zero and the raking procedure was repeated.

Single year of age estimates by province and territory were calculated by a second raking ratio procedure. Again two marginal totals had to be fixed. One margin was the counts of previously released single year of age by sex while the other margin was the marital status or legal marital status estimates for each sex by province or territory. The initial counts were the five-year age estimates. The final single year estimates were then checked to ensure that for every single age the explicit constraint between marital status and legal marital status was maintained. This constraint means that for the adjusted population, the difference between the marital status married or common law and legal marital status married is greater than or equal to zero. On the other hand, the difference between the marital status and legal marital status for the other category (divorced, separated, single and widowed) is less than or equal to zero. The estimates were then rounded to the nearest integer all while respecting the totals already published in the coverage studies.

2.7 Demographic adjustment to age structure

2.7.1 Adjustment at age 0

The postcensal population estimate at age 0 almost entirely stems from the previous year's birth estimates. The estimate is therefore very reliable and can be used as a benchmark in adjusting the censal estimate. For 2011, the postcensal estimate at age 0 was first considered when modelling net census undercoverage. However, a demographic adjustment was required.

The demographic adjustment involves applying the sex ratio at age 0 of the postcensal estimates on the date of the census to the censal estimates for each province and territory. The adjustment concerns the number of men or women—whichever has the greatest error of closure—but could be made to both if the adjustment to be made to one is relatively greater.

The demographic adjustment reduces the error of closure at age 0 and makes it more equal by sex. The adjustment is then redistributed to other ages to maintain the total population by sex of the censal estimate for each province and territory.

2.7.2 Adjustments for older seniors

An analysis of the age structure of census counts and postcensal population estimates shows that populations of seniors over 85 years, and especially centenarians (population 100 years of age and older), were affected by an overestimation problem. For Canadian censuses from 1951 to 1991, an evaluation of the quality of reporting of older seniors reveals that overestimation problems affect men from 95 years of age and women from 100 years of age. The overestimation of centenarian populations was even more marked in the 1950s and 1960s (Bourbeau and Lebel 2000). In addition, the proportion of centenarians in Canada was shown to be greater than that of a set of countries with comparable death rates (Bourbeau and Lebel 2000; Kannisto 1999).

The problem of overestimating the number of centenarians, which seems to affect the Canadian census, is not unique to Canada. Specifically, such problems have been observed in the United States (Siegel and Passel 1976; Spencer 1987; Krach and Velkoff 1999; Kestenbaum and Ferguson 2005; Humes and Velkoff 2007; US Census Bureau 2012) and in the United Kingdom (Dini and Goldring 2008; Thatcher 1992, 1999). In the case of the very elderly, these problems can stem from errors related to voluntary reporting (from false declarations), involuntary reporting (from reporting errors, illiteracy or cognitive problems of the respondent, or from misreporting by proxy), data entry (manual or by scanner), questionnaire design, or from bias not detected by the data validation processes.

Census coverage studies that aim to adjust differential coverage, especially by age and sex, have trouble countering this phenomenon because of the small numbers of people surveyed beyond the age of 65, difficulty reaching seniors, and significant natural attrition (mortality) of these populations. The coverage studies of the last three censuses have made very few changes to the age structure of the base population in the population estimates beyond 85 years of age. To counter this phenomenon, manual validation procedures for the 2006 Census were implemented using a stratified sample of 95- to 99-year-olds and a complete selection of centenarians. An adjustment method for older seniors was also developed at the time of the previous rebasing in 2008, using the 2006 Census, and when the postcensal estimates from 2006 were produced.

However, when compared with the 2011 Census counts, the postcensal estimates of the number of centenarians based on the 2006 Census were 29% higher for women and 88% higher for men. This indicates that the downward adjustment made to the 2006 Census counts for the population aged 95 years and over—the population that would become the population of centenarians in 2011—was insufficient, or that the manual validation process for some questionnaires did not eliminate false nonagenarians and centenarians. This prompted Statistics Canada's Demographic Estimates Program to partially review its adjustment method for the age structure of the base population.

The new demographic adjustment for older seniors was made for the 2001, 2006 and 2011 censal estimates, and it reduced the number of people by 5% overall. The largest adjustments involved centenarians, reducing their counts on average by 28% for men and 4% for women. For the 95-to-99 age group, population counts were reduced by 13% for men and 3% for women. For example, with the new adjustments, the number of centenarians was revised downward in 2006 from 830 to 595 (-39%) for men and from 3,891 to 3,784 (-3%) for women. These revisions should not be interpreted as actual decreases in the number of older seniors, including centenarians, but rather as a new series of estimates based on a more robust method that offers more accurate estimates of populations of older seniors.

Using data on deaths between 1951 and 2011 and a combination of two methods (the method of extinct generations and the survival ratio method), the Demographic Estimates Program made a demographic adjustment to the age structure of the 2011 Census population, to prevent overestimation of the population of older seniors that was observed in the 2006 cycle. Assuming that populations of older seniors are hardly affected by migration, the basic principle of the method of extinct generations (Vincent 1951) is simple. Once all the individuals in a given generation have died, the historical number of survivors for this censal cohort, namely those born between May 10 in year x and May 9 in year x + 1, can be calculated for earlier years by inversely cumulating the number of deaths. The effectiveness of the method of extinct generations has been demonstrated many times by comparing populations reconstructed using this method with observed populations for countries with high-quality data (Coale and Kisker 1990; Coale and Caselli 1990; Kannisto et al. 1994; Human Mortality Database 2013). The file required to use this method comprises data on deaths by date of birth and date of death.

One drawback to this method is that the count for a given generation cannot be estimated exhaustively until all the individuals in that generation have died. An extinct generation is a cohort of individuals born in the same censal year who have all died by a given time (May 2011 in this case). This threshold is normally set at 110 years, meaning that generations for which death information is known through to the age of 110 will be considered extinct and complete. Since 1950, very few deaths have occurred beyond the age of 110 in Canada. This number has usually been 349 deaths in 10.6 million, or an average of one to two deaths per generation.

For non-extinct cohorts, the survival ratio method (Thatcher 1992; Thatcher et al. 2002; Andreev 2004) was used to produce an estimate based on the same principle, but according to the assumption that deaths in non-extinct cohorts are distributed by age like deaths in extinct cohorts. The population at a given age is estimated using the survival ratios of non-extinct cohorts, based on the ratios of the last five extinct cohorts. This method can be expressed as follows:

Equation 2.12:      $P_x^t = P_{x+1}^{t+1} + D_x^t$

where P is the population aged x at the start of census year t and D is the number of deaths at age x in census year t.
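The recursion in Equation 2.12 can be illustrated with a minimal Python sketch. It is not the program's actual implementation: the function name, the input layout (a dictionary of deaths by age for a single cohort) and the death counts are assumptions made for illustration, and the closing age of 110 is taken from the text above.

```python
def reconstruct_extinct_cohort(deaths_by_age, closing_age=110):
    """Rebuild the population of one extinct cohort from its deaths.

    deaths_by_age: dict mapping age -> number of deaths in the cohort
    (hypothetical input layout). Nobody is assumed to survive beyond
    closing_age, so the population above that age is taken as zero.
    """
    youngest = min(deaths_by_age)
    population = {}
    survivors = 0  # population assumed to be zero after the closing age
    # Walk backward from the closing age: P_x = P_{x+1} + D_x
    for age in range(closing_age, youngest - 1, -1):
        survivors += deaths_by_age.get(age, 0)
        population[age] = survivors
    return population


# Made-up death counts for a single cohort, ages 107 to 110
cohort_deaths = {107: 5, 108: 3, 109: 2, 110: 1}
reconstructed = reconstruct_extinct_cohort(cohort_deaths)
print(reconstructed)
```

With these made-up counts, the cohort is reconstructed as 1 person at age 110, 3 at age 109, 6 at age 108 and 11 at age 107.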

For a given year, the population (P) is calculated using the following equation:

Equation 2.13:      $$P_x^T = \left( D_{x-1}^{T-1} + D_{x-2}^{T-2} + D_{x-3}^{T-3} + D_{x-4}^{T-4} + D_{x-5}^{T-5} \right) \times S_x^T \times c$$

where S is the survival ratio, T is the cohort and c is an adjustment factor that makes it possible for the ratio of older seniors to change over time (the modelled mortality is slightly reduced). This adjustment factor is based on the approach used by the United Kingdom's Office for National Statistics (ONS 2013). Adjustment factor c was set at 1.009 for men and 1.004 for women, and is based on the annual average change in life expectancy at 90 years of age for the last six extinct generations (that is, for people born between May 10, 1896 and May 9, 1901), by sex.

S is calculated as follows:

Equation 2.14:      $$S_x^T = \frac{\sum\limits_{T-1}^{T-5} P_x}{\left( \sum\limits_{T-2}^{T-6} D_{x-1} + \sum\limits_{T-3}^{T-7} D_{x-2} + \sum\limits_{T-4}^{T-8} D_{x-3} + \sum\limits_{T-5}^{T-9} D_{x-4} + \sum\limits_{T-6}^{T-10} D_{x-5} \right)}$$
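The following Python sketch shows how Equations 2.13 and 2.14 fit together. It is a simplified illustration rather than the program's implementation: the function names and input layout are hypothetical, the data are made up, and the index bookkeeping of Equation 2.14 is assumed to have been done beforehand, so that each of the five reference (extinct) cohorts is supplied with its population at age x and its deaths at the five preceding ages.

```python
def survival_ratio(ref_pop_at_x, ref_deaths_before_x):
    """Ratio S: survivors at age x over deaths at ages x-5 to x-1,
    pooled over the reference (extinct) cohorts.

    ref_pop_at_x: list of populations at age x, one per reference cohort.
    ref_deaths_before_x: list of lists; for each reference cohort, its
    deaths at the five ages preceding x (hypothetical layout).
    """
    total_pop = sum(ref_pop_at_x)
    total_deaths = sum(sum(deaths) for deaths in ref_deaths_before_x)
    if total_deaths == 0:
        raise ValueError("reference cohorts have no recorded deaths")
    return total_pop / total_deaths


def estimate_population(target_deaths_before_x, s, c=1.0):
    """Population at age x of a non-extinct cohort: its deaths at the five
    preceding ages, times the ratio S, times the adjustment factor c."""
    return sum(target_deaths_before_x) * s * c


# Made-up figures for five reference cohorts and one target cohort
ref_pop = [10, 12, 8, 10, 10]
ref_deaths = [[20, 20, 20, 20, 20] for _ in range(5)]
s = survival_ratio(ref_pop, ref_deaths)

target_deaths = [30, 25, 20, 15, 10]
estimate = estimate_population(target_deaths, s, c=1.009)  # c for men, from the text
print(s, estimate)
```

Pooling the five reference cohorts before taking the ratio, rather than averaging five separate ratios, follows the form of Equation 2.14, in which both the numerator and the denominator are sums.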

The demographic adjustment was made by calculating, for older seniors, the difference between the adjusted censal estimates and the estimate produced using the methods briefly discussed earlier. To ensure the greatest consistency of estimates by cohort, the adjustment was made for the census populations of the last three censuses (2001, 2006 and 2011), by age and sex for each province and territory, starting at age 85 in 2001, age 90 in 2006, and age 95 in 2011. In addition, to minimize the impact on the age structure and on several indicators that base their calculations on population estimates (e.g., total fertility rate, life expectancy at birth, immigration rate by age group), the surplus population counts resulting from the difference between the censal estimates and the estimate of older seniors calculated using data on deaths were redistributed among the population between 5 and 74 years of age, based on the relative weight of the provinces and territories, by sex.
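The redistribution step can be sketched as a proportional allocation. The text does not spell out the exact procedure, so the example below is only one possible reading: it assumes that, for a given sex, a surplus is shared among provinces and territories in proportion to their population aged 5 to 74. The function name, the jurisdiction labels and the figures are hypothetical.

```python
def redistribute_surplus(surplus, pop_5_to_74):
    """Share a surplus count among jurisdictions in proportion to their
    population aged 5 to 74 (assumed allocation rule, for one sex).

    pop_5_to_74: dict mapping jurisdiction -> population aged 5 to 74.
    Returns a dict mapping jurisdiction -> share of the surplus.
    """
    total = sum(pop_5_to_74.values())
    if total == 0:
        raise ValueError("no population to receive the surplus")
    return {name: surplus * pop / total for name, pop in pop_5_to_74.items()}


# Made-up populations for three hypothetical jurisdictions
weights = {"A": 600, "B": 300, "C": 100}
shares = redistribute_surplus(50, weights)
print(shares)
```

Because the shares are proportional, they sum to the original surplus, so the total population is unchanged by the redistribution.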
