Data quality, concepts and methodology: Quality of demographic data


The estimates contain certain inaccuracies stemming from two types of errors:

  1. errors in the census data;
  2. imperfections in other data sources and the method used to estimate the components.

Census data

A. Coverage, response and imputation errors

The errors attributable to census data can be divided into two groups: response and processing errors, and coverage errors. The first group includes non-response, misinterpretation of questions by respondents, incorrect coding and errors introduced when imputing for non-response. Errors in the second group result primarily from undercoverage and, to a lesser extent, overcoverage. Both types of error are intrinsic to any survey data.

Coverage errors occur when dwellings or individuals are missed, incorrectly included (except for the 2006 and 2011 censuses, where people incorrectly included were not considered in the Census Overcoverage Study) or counted more than once. Following each census, Statistics Canada undertakes coverage studies to measure these errors. The main studies are the Reverse Record Check Survey (RRC) and the Census Overcoverage Study (COS). Based on these studies, estimates of census net undercoverage (CNU), defined as undercoverage minus overcoverage, are produced. Demography Division uses these estimates to adjust the population enumerated in the census, by province and territory.
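The relationship between undercoverage, overcoverage and the adjusted population can be sketched as follows. This is a minimal illustration, assuming that the adjustment simply adds CNU to the enumerated count; the function names and the figures are hypothetical and do not come from the coverage studies.

```python
def census_net_undercoverage(undercoverage: float, overcoverage: float) -> float:
    """CNU as defined in the text: undercoverage minus overcoverage."""
    return undercoverage - overcoverage


def adjusted_population(enumerated: float, undercoverage: float, overcoverage: float) -> float:
    """Enumerated census count corrected for net undercoverage.

    Assumption: the correction is additive (people missed are added,
    people counted more than once are subtracted).
    """
    return enumerated + census_net_undercoverage(undercoverage, overcoverage)


# Hypothetical province: 1,000,000 enumerated, 30,000 missed, 10,000 overcounted.
print(adjusted_population(1_000_000, 30_000, 10_000))
```

In practice the undercoverage and overcoverage inputs are themselves sample-based estimates, so the adjusted population carries their sampling error.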

When creating base populations, the Estimates Program corrects the census populations only for coverage errors. This correction, which is based on the findings of coverage studies, is primarily subject to sampling errors and, to a lesser extent, processing errors. Statistical tests indicate that coverage adjustments improve the quality of census data. The Estimates Program uses the estimates from coverage studies for the provinces and territories. However, given the size of the samples in these studies, estimates by age and sex are modelled. Furthermore, it is assumed that the coverage rates estimated for a province or territory apply to the regions within that geographic area.

Prior to 1991, the Estimates Program used census data that were unadjusted for coverage errors. Coverage studies had been done to measure undercoverage, but none measured overcoverage. Following the decision to incorporate a coverage correction into the enumerated population in 1991, the Program had to revise the population estimates for the period from 1971 to 1986. That revision is based on the findings of the coverage studies conducted during this period and on hypotheses about the ratio of overcoverage to undercoverage, drawn from the findings of subsequent coverage studies.

In general, correcting the census data for CNU improved the quality of the estimates by compensating for differential undercoverage by age, sex and province or territory across censuses.

The adjustment also incorporates the results of a study on the estimates of the number of people living on incompletely enumerated Indian reserves to complete the corrections for coverage errors in the census. The results of the coverage studies contain mainly sampling errors.

These adjustments have a direct impact on:

  1. the error of closure and its distribution by age and sex within a province or territory, as well as by province or territory, since the CNU and its distribution vary from one census to another;
  2. the within-cohort consistency of population estimates. For example, if the male cohort aged 0 to 4 in 1981 were tracked up to the 2001 Census using data unadjusted for CNU, the age group 20 to 24 in 2001 would be noticeably smaller than the age group 15 to 19 in 1996. Since Canada receives many immigrants in these age groups, the opposite would be expected. Only after adjustment for CNU does the cohort size increase from 1996 to 2001.

For further information on the main coverage studies, see the 1996, 2001, 2006 and 2011 Census Technical Reports on Coverage on Statistics Canada’s website.

Components

Errors due to estimation methodologies and data sources other than the census can also be significant.

A. Births and deaths

Since the law requires the registration of vital events, the final estimates of births and deaths meet very high standards. Nevertheless, because preliminary estimates are derived, they can differ slightly from the final estimates.

B. Immigration and non-permanent residents

Citizenship and Immigration Canada (CIC) administers special data files on immigrants and non-permanent residents (NPRs). Since immigration is controlled by law, data on immigrants and NPRs are compiled upon arrival in Canada. These data represent only “legal” immigration and exclude illegal immigrants. Thus, for the “legal” part of international movement into Canada, the data are considered to be of high quality. However, some biases may persist, such as the difference between the province of intended residence stated at the time of arrival and the actual province of residence. Finally, since the information provided by CIC's Visitor Data System (VDS) is incomplete (age and sex of dependents, province of residence for certain groups of permit holders), estimates of NPRs are more prone to error than data on immigrants.

C. Emigration, returning emigration and net temporary emigration

Of all the demographic components used in the population estimates program, emigration, returning emigration and net temporary emigration are the most difficult to estimate precisely. Canada does not have a complete border registration system. While immigration and non-permanent residents (NPRs) are well documented by the federal government, Statistics Canada has always relied on indirect techniques to estimate the number of persons leaving the country. For this reason, the available statistics on these three components have historically been of lower quality than those on other components.

Estimates of the number of emigrants and returning emigrants are both derived using Canada Child Tax Benefit (CCTB) data provided by the Canada Revenue Agency (CRA). These estimates must be adjusted to take into account the incomplete coverage of the program and to derive the emigration and returning emigration of adults.

These adjustments and the delay in obtaining the data are the two main sources of error. As no current information exists on the number of persons living temporarily abroad, estimates are based on the Reverse Record Check (RRC) and the census. Estimates for the intercensal period, distributed equally among the five years, are held constant for the postcensal period. Moreover, assumptions were made to allow the annual estimates to be distributed to the quarterly level. Any geographical or quarterly variation may introduce error in the estimation of these components.

D. Interprovincial migration

Since July 1993, preliminary interprovincial migration estimates have been based on Canada Child Tax Benefit (CCTB) files. Under this program, only 76% of children aged 0 to 17 at the national level were entitled to benefits on July 1, 2001. Consequently, preliminary CCTB-based estimates are subject to larger error than final estimates derived from Canada Revenue Agency (CRA) tax files.

E. Level of detail of components

As a more detailed breakdown of the data introduces a greater risk of inaccuracy into the estimates, the possibility of error in the components is increased by the method used to distribute the estimates by age and sex. In general, the initial errors should be minimal for the distribution of annual estimates of births, deaths and immigrants, and more significant for the distribution of the other components (non-permanent residents, emigrants, returning emigrants, net temporary emigrants and interprovincial migrants). Finally, the size of the error due to the age and sex distribution may vary by period, and errors in some components may have a greater impact on a given age group or sex.

Quality assessment

In order to assess the quality of our estimates, two evaluation measures are used: precocity errors and errors of closure.

A. Precocity error

The quality of preliminary estimates of components is evaluated using precocity errors. Precocity error is defined as the difference between the preliminary and final estimates of a particular component, expressed relative to the total population of the relevant geographical area. It can be calculated for both population and component estimates. The precocity error measures the impact on the estimated population of trading accuracy for timeliness. The precocity error is calculated as:

Figure 1: Precocity error

Precocity error allows for useful comparisons between components, as well as between provinces and territories or geographical areas of different population size. Precocity error can either be positive or negative. A positive precocity error denotes that the preliminary estimate is larger than the final estimate while a negative precocity error indicates the opposite.
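The definition above can be sketched in a few lines of code. This is an illustration only: the function name is hypothetical, the per-thousand scaling is assumed from the way the results are reported, and the choice of population used as the denominator is an assumption, since the original figure is not reproduced here.

```python
def precocity_error(preliminary: float, final: float,
                    total_population: float, per: float = 1000.0) -> float:
    """Difference between preliminary and final estimates of a component,
    relative to the total population of the area (per thousand by default).

    Assumption: total_population is the population estimate of the relevant
    geographical area; the source does not specify which estimate is used.
    """
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return (preliminary - final) * per / total_population


# Hypothetical component: preliminary 10,500, final 10,000, population 1,000,000.
error = precocity_error(10_500, 10_000, 1_000_000)
print(f"{error:.2f} per thousand")  # positive: preliminary exceeds final
```

Dividing by the population is what makes errors comparable between areas of different size, and it also explains why the same absolute discrepancy yields a larger precocity error in a small territory than in a large province.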

Precocity error by component for Canada

At the national level, the immigration component yielded the smallest precocity errors, with values close to zero per thousand throughout the years under consideration. On the other hand, interprovincial in-migrants and out-migrants yielded the greatest precocity errors, ranging from 0.92 per thousand to 2.55 per thousand between 2008/2009 and 2011/2012 (see text table 3).

Precocity errors for births were mostly small compared with other components; the largest was -0.29 per thousand in 2007/2008. Precocity errors for deaths were also low, with values less than 0.32 per thousand in recent years.

Precocity errors for emigration and returning emigration were mostly negative, i.e. preliminary estimates were smaller than final estimates. During the years under consideration, the precocity error for emigration was smallest in absolute value in 2009/2010, at 0.05 per thousand, and largest in 2007/2008, at -0.48 per thousand. For returning emigration, the values ranged from -0.20 per thousand in 2008/2009 to -0.33 per thousand in 2007/2008 and 2010/2011. From 2007/2008 to 2010/2011, the precocity errors for net temporary emigration were positive and low, at 0.05 per thousand in each year.

Precocity errors for net non-permanent residents were negative and low during the period under consideration. From 2007/2008 to 2009/2010, they were stable at -0.04 per thousand, then grew slightly in magnitude to -0.08 per thousand in 2010/2011.

Precocity error by component for provinces and territories

Precocity error is typically larger for smaller provinces or territories, as it is measured relative to population size. At the provincial level, a precocity error larger than ±10.0 per thousand occurred only once, for Prince Edward Island, during the four years under consideration; for the three territories, however, this occurred many times (refer to text table 3).

At the provincial and territorial level, precocity errors for births were small and mostly negative, ranging from close to 0.0 per thousand (Quebec in 2009/2010) to -1.67 per thousand (Nunavut in 2009/2010). Precocity errors for deaths were also low but predominantly positive. Over the years, the largest precocity error for deaths was 1.03 per thousand (Northwest Territories in 2010/2011).

Compared with other demographic components, precocity errors for immigration were low among the provinces and territories, with absolute values of no more than 0.50 per thousand in recent years. Net non-permanent residents was another component that yielded small precocity errors, with absolute values less than 0.42 per thousand across the provinces and territories.

Precocity errors for emigration ranged from 0.01 per thousand (Saskatchewan in 2007/2008) and -0.01 per thousand (New Brunswick in 2009/2010), the smallest in absolute value, to -1.56 per thousand (Northwest Territories in 2007/2008), the largest. Precocity errors for returning emigration were mostly negative; the values ranged from around zero per thousand for some years in the three territories to -0.74 per thousand for Yukon in 2009/2010. Precocity errors for net temporary emigration were positive during the years under consideration, except for British Columbia (2007/2008 to 2010/2011) and the Northwest Territories (2008/2009 to 2010/2011 only).

Precocity errors for interprovincial in-migrants and out-migrants show that final estimates of these components were systematically lower than preliminary estimates (with three exceptions for in-migrants and one exception for out-migrants).

At the provincial level, the largest precocity error for net interprovincial migration was -6.63 per thousand (Prince Edward Island in 2009/2010), while the smallest was close to zero per thousand (Quebec in 2009/2010). Compared with the other provinces, precocity errors for Ontario and Quebec were relatively low during the years under consideration, with the largest error for these two provinces at -0.42 per thousand for Quebec in 2008/2009. For Alberta, which has gained population through net interprovincial migration in recent years, precocity errors for net interprovincial migration ranged from a low of 0.30 per thousand in 2009/2010 to a high of 2.73 per thousand in 2008/2009.

Contribution of components to the sum of precocity errors

When looking at aggregated estimates of precocity errors, there is the potential for a “netting-out” effect, in which negative precocity errors in one component cancel out positive errors in another. The contribution of each component to the sum of precocity errors can be analyzed without the netting-out effect by using the absolute values of the precocity errors. A mean absolute percentage precocity error by component is calculated by dividing each component's mean absolute precocity error by the sum of the mean absolute precocity errors across all components, and is expressed as a percentage. In this case, the mean absolute precocity error by component is the mean of the absolute precocity errors for the 2006/2007 to 2010/2011 period.
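The calculation described above can be sketched as follows. The function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the text tables.

```python
from statistics import mean


def mean_absolute_shares(errors_by_component: dict[str, list[float]]) -> dict[str, float]:
    """Share (in %) of each component in the sum of mean absolute precocity errors.

    errors_by_component maps a component name to its precocity errors
    (per thousand), one value per year of the reference period.
    """
    # Step 1: mean of the absolute errors for each component (no netting-out).
    mean_abs = {name: mean([abs(e) for e in errors])
                for name, errors in errors_by_component.items()}
    # Step 2: divide each component's mean by the sum over all components.
    total = sum(mean_abs.values())
    if total == 0:
        raise ValueError("all precocity errors are zero; shares are undefined")
    return {name: 100.0 * value / total for name, value in mean_abs.items()}


# Hypothetical two-year series for three components.
example = {
    "births": [-0.2, 0.2],       # nets out to zero if signed values are summed
    "deaths": [0.1, -0.1],
    "emigration": [-0.7, -0.7],
}
for component, share in mean_absolute_shares(example).items():
    print(f"{component}: {share:.1f}%")
```

In this example the signed errors for births and deaths each sum to zero, which would hide their contribution entirely; taking absolute values first is what prevents that.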

At the national level, between 2006/2007 and 2010/2011, the total emigration component contributed the most to the sum of mean absolute precocity errors (62.37%), followed by births (15.22%) and deaths (13.89%). Immigration and net non-permanent residents each accounted for less than 9.0% of the sum of mean absolute precocity errors (refer to text table 4).

At the provincial and territorial level, the contribution of individual components to the sum of mean absolute precocity errors was not uniform across the country. Net interprovincial migration accounted for the largest share of the sum in ten of the thirteen provinces and territories, ranging from 35.30% in New Brunswick to 79.43% in Nunavut. In Quebec, Ontario and British Columbia, total emigration contributed the most to the sum, with 56.59%, 47.96% and 66.54%, respectively (refer to text table 4).

On the other hand, births accounted for the smallest share of the sum of mean absolute precocity errors in Quebec, at 1.92%. Between 2006/2007 and 2010/2011, net non-permanent residents accounted for the smallest share of the sum of mean absolute precocity errors in Newfoundland and Labrador (1.02%), Prince Edward Island (1.63%), New Brunswick (0.42%), Saskatchewan (0.89%) and British Columbia (0.79%). For the rest of the provinces and territories, the contribution of immigration to the sum was the smallest, at 2.18% or below.

Precocity errors by age and sex are not currently available.

B. Error of closure

The error of closure measures the exactness of the final postcensal estimates. It is defined as the difference between the final postcensal population estimates on Census Day and the enumerated population of the most recent census adjusted for census net undercoverage (CNU). A positive error of closure means that the postcensal population estimates have overestimated the population.

The error of closure comes from two sources: errors primarily due to sampling when measuring census coverage and errors related to the components of population growth over the intercensal period. For each five-year intercensal period, the error of closure can only be calculated following the release of census data and estimates of CNU. The error of closure can be calculated for the total population of each province and territory as well as by age and sex.
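As a minimal sketch of this definition, the error of closure can be computed both as a count and as a percentage. The function name and the figures are hypothetical, and expressing the percentage relative to the adjusted census population is an assumption; the text does not state which denominator is used.

```python
def error_of_closure(postcensal_estimate: float,
                     adjusted_census: float) -> tuple[float, float]:
    """Return the error of closure as (count, percentage).

    postcensal_estimate: final postcensal population estimate on Census Day.
    adjusted_census: enumerated census population adjusted for CNU.
    Assumption: the percentage is taken relative to adjusted_census.
    """
    if adjusted_census <= 0:
        raise ValueError("adjusted_census must be positive")
    difference = postcensal_estimate - adjusted_census
    return difference, 100.0 * difference / adjusted_census


# Hypothetical jurisdiction: estimate of 1,005,000 against an adjusted count of 1,000,000.
count, pct = error_of_closure(1_005_000, 1_000_000)
print(f"{count:,.0f} ({pct:.2f}%)")  # positive: the estimate overshoots the census
```

A negative result would indicate that the postcensal estimates underestimated the adjusted census population.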

Text table 5 shows the postcensal population estimates on May 10, 2011, the census counts adjusted for CNU, and the errors of closure for Canada, the provinces and the territories for 2001, 2006 and 2011.

For Canada as a whole, the error of closure was estimated at 171,115 or 0.50% in 2011. This is an increase over the errors for 2001 (0.16%) and 2006 (0.14%).

The population estimates overestimated the population of six provinces, two territories and Canada as a whole. Four provinces and two territories posted errors of closure greater than 1% or less than -1%. Of these jurisdictions, only Newfoundland and Labrador’s estimated population differed from the adjusted census population by more than 2% (-2.09%). In 2006, two provinces and three territories posted errors of closure greater than 1% or less than -1%, while this was the case for three provinces and two territories in 2001.

By considering the variance in CNU, it is possible to identify errors of closure that are statistically significant. Text table 5 shows the results of this analysis.
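The source does not describe the test that was actually used. One simple possibility, shown here purely as an assumption, is to compare the error of closure with a multiple of the standard error of the CNU estimate and to flag it as significant when it falls outside that margin. The function name, the 1.96 critical value (a two-sided 5% level under a normal approximation) and the figures are all hypothetical.

```python
def is_significant(error_of_closure: float, cnu_standard_error: float,
                   z_critical: float = 1.96) -> bool:
    """Flag an error of closure as statistically significant.

    Assumption: the only uncertainty considered is the sampling error of
    the CNU estimate, summarized by its standard error.
    """
    if cnu_standard_error <= 0:
        raise ValueError("cnu_standard_error must be positive")
    return abs(error_of_closure) > z_critical * cnu_standard_error


# Hypothetical values: a CNU standard error of 2,000 gives a margin of 3,920.
print(is_significant(5_000, 2_000))   # outside the margin
print(is_significant(-3_000, 2_000))  # inside the margin
```

Under this kind of rule, a large error of closure in a jurisdiction with a very uncertain CNU estimate may not be flagged, while a smaller error in a jurisdiction with a precise CNU estimate may be.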

The error of closure is statistically significant for Canada, five provinces and one territory. This means that the population estimates significantly overestimated or underestimated the adjusted census population in these jurisdictions. As noted above, these results are due to both the sampling for census coverage studies and errors in the components of population growth over the intercensal period. Among these components, interprovincial migration and emigration are mostly associated with large errors of closure.

The error of closure can be calculated for total population estimates and for age and sex.
