Population Projections for Canada (2013 to 2063), Provinces and Territories (2013 to 2038): Technical Report on Methodology and Assumptions

Chapter 1: Statistics Canada's cohort-component population projection model

By Patrice Dion

Introduction

This chapter describes Statistics Canada’s cohort-component projection model in its entirety. The first section outlines the general premises behind the model and presents a brief overview of its history. The second section describes the model used by Statistics Canada and its specificities, including the relationship between the population estimates and population projections programs, and how the latter can be seen as an extension of the former. The final section contains a more detailed analysis of the algorithm used to transform the parameters into projections.

The cohort-component model

Genesis of the model

The idea of creating population projections became popular in the 18th century, in a context where Europe was experiencing some serious social and epidemiological crises. The first population projections consisted of extrapolations of the total population. At that time, the main focus was on discovering a universal law of population growth, a ‘universal multiplier’ that obeyed certain laws of nature. For example, at the end of the 18th century, Malthus suggested that populations grow exponentially, resulting in an imbalance with available resources, which exhibit linear growth.

Up until the beginning of the 20th century, mortality and fertility, though recognized as having an impact on population growth, were not taken into account in population projection calculations. It was only later, and gradually, that the cohort-component method was developed, and the first of these projections were published early in the 20th century. It was an important step for demography, providing a greater understanding of population dynamics and bringing together several fields of knowledge to form classical demography as it is known today (Le Bras 2008).

The first component-based projections included mortality rates that varied by age, but the number of births was set in advance, regardless of the population. An important development took place in 1924, with the publication of Alfred Lotka’s Elements of Physical Biology. By adding women’s probabilities of giving birth to the life table, Lotka introduced the idea that women could have fertility rates that were a function of their age. He thus demonstrated the possibility of producing projections using variable mortality and fertility rates that could be applied to cohorts.^Note 1

The way in which migration was handled in the first cohort-component projection models remained problematic, however. Those models usually employed a uniregional perspective, in which each region was projected independently of the others. When considered, migration was incorporated using predetermined net migration counts or rates. Hence, the advantages of using age-specific fertility and mortality rates, applied directly to the populations at risk, were absent for migration. It was not until the mid-1960s, when a new paradigm in demography, known as multiregional demography, began to emerge, that projection models began to treat regions as a system composed of a number of interdependent populations connected by migration flows (Rogers 2006). Through the use of matrix operations, multiregional projection models make it possible to incorporate specific rates of migration from each of the system’s regions to every other region.^Note 2,^Note 3

Cohort-component projection models constitute a considerable advance over models which extrapolate the total population because they associate quantitative measures of mortality and fertility with population growth and its composition, and because they permit the development of specific assumptions for each component which take advantage of what is known about it (O’Neill et al. 2001). Rather than attempting to predict population growth, the goal is to forecast changes in fertility and mortality.

Statistics Canada’s cohort-component model

Today, most statistical agencies produce their official projections with the cohort-component model. Statistics Canada first used the model to make ‘official projections’ in the 1970s, when population projections became an important activity at the agency.^Note 4 Since the first series was released in 1974, there have been seven other series which have generally followed the cycle of population censuses.^Note 5 The model has evolved over the years. For example, the purely multiregional version of the model did not appear until the 1984 to 2006 edition.

Statistics Canada’s cohort-component model was developed to extend the data series of Statistics Canada’s Population Estimates Program (PEP) further in time. Thus, the provincial estimation model and the provincial projection model are accounting models that have the same components:

$\begin{array}{rcl} \text{Population}_{t + 1} & = & \text{Population}_{t} + \text{Births}_{t, t + 1} - \text{Deaths}_{t, t + 1} + \text{Immigrants}_{t, t + 1} - \text{Emigrants}_{t, t + 1} \\ & & - \text{Net temporary emigrants}_{t, t + 1} + \text{Returning emigrants}_{t, t + 1} \\ & & + \text{Net non-permanent residents}_{t, t + 1} + \text{Net interprovincial migration}_{t, t + 1} \end{array}$
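The accounting identity above can be illustrated with a minimal Python sketch. The class name, the function name and all the figures are hypothetical and serve only to show how the components combine; they are not taken from the Population Estimates Program.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Components of growth over the period (t, t+1); all values are counts."""
    births: float
    deaths: float
    immigrants: float
    emigrants: float
    net_temporary_emigrants: float
    returning_emigrants: float
    net_non_permanent_residents: float
    net_interprovincial_migration: float


def next_population(population_t: float, c: Components) -> float:
    """Apply the accounting equation to obtain the population at t+1."""
    return (
        population_t
        + c.births - c.deaths
        + c.immigrants - c.emigrants
        - c.net_temporary_emigrants + c.returning_emigrants
        + c.net_non_permanent_residents
        + c.net_interprovincial_migration
    )


# Made-up figures (in thousands) for a single province.
example = Components(
    births=380, deaths=250, immigrants=260, emigrants=50,
    net_temporary_emigrants=20, returning_emigrants=30,
    net_non_permanent_residents=15, net_interprovincial_migration=-10,
)
population_next = next_population(35_000, example)
```

Because the estimation and projection models share this identity, the same function would serve both; only the origin of the component values (observed or assumed) differs.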

In the context of the projections, each scenario makes assumptions about the future evolution of each of these components,^Note 6 separately for each province and territory. In fact, Statistics Canada uses a 'hybrid bottom-up' approach: 'bottom-up' because the projected values for Canada are the sum of the individual projections for the provinces and territories, with no projection produced at the Canada level, and ‘hybrid’ because the assumptions are often developed initially at the national level. In other words, the assumptions for each province and territory are derived from assumptions first developed at the national level.

Relationship between the Population Estimates Program and the projections

As noted above, the PEP data are the reference universe and the primary source of population projections in the context of the cohort-component model. A brief description of the various series produced by the PEP is necessary in order to understand the nature of the relationship between Statistics Canada’s population estimates and population projections.

Data sources for the projections

To meet data timeliness and accuracy requirements, the PEP produces more than one series of population estimates for the same reference date, though it does so at different times. Postcensal estimates are produced using data from the most recent census adjusted for census net undercoverage (CNU), including an adjustment for incompletely enumerated Indian reserves. There are three series of postcensal estimates. Preliminary postcensal estimates are available shortly after the reference date, but they are derived in part through certain assumptions because there are no data for several components. Updated postcensal estimates and final postcensal estimates are produced one year and two years, respectively, after the preliminary postcensal estimates. Though not as timely, these series include data that were unavailable when the preliminary estimates were produced, and therefore they are usually more accurate. In general, however, the accuracy of postcensal data tends to diminish as they get further away from the date of the last census.

The accuracy of postcensal estimates can be estimated with data from each new census as well as the results of coverage studies conducted following the census. The difference between the postcensal population estimates on Census Day and the population enumerated in that census (after adjustment for CNU, including incompletely enumerated Indian reserves) is referred to as the error of closure. It stems from errors in the components of population growth during the period between two censuses and from precision errors in measuring census coverage, mainly sampling errors. When a new base population is calculated following a census, an additional series of estimates is produced, the intercensal estimates, which revise the final postcensal estimates to take the error of closure into account.^Note 7 This revision consists of adding a component known as the residual deviation, which includes the error of closure, while the other components of population growth remain the same as in the final postcensal estimates.
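As a rough illustration of the definition above, the following Python sketch computes an error of closure and expresses it as a proportion of the adjusted census population. The function names, the sign convention (postcensal estimate minus adjusted census population), the treatment of the CNU adjustment as a simple additive count, and the figures are all assumptions made for the example.

```python
def error_of_closure(postcensal_estimate, census_count, net_undercoverage):
    """Postcensal estimate on Census Day minus the census population adjusted for CNU.

    Assumption: the CNU adjustment is treated as one additive count.
    """
    adjusted_census = census_count + net_undercoverage
    return postcensal_estimate - adjusted_census


def closure_proportion(postcensal_estimate, census_count, net_undercoverage):
    """Error of closure as a proportion of the adjusted census population."""
    adjusted_census = census_count + net_undercoverage
    error = error_of_closure(postcensal_estimate, census_count, net_undercoverage)
    return error / adjusted_census


# Hypothetical figures for one region.
error = error_of_closure(1_002_000, 975_000, 25_000)
proportion = closure_proportion(1_002_000, 975_000, 25_000)
```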

Thus, each series of estimates involves a degree of compromise between data timeliness and accuracy. The starting point for the population projections in this edition is the population of Canada on July 1, 2013, according to the preliminary postcensal estimates. It is preferable to use data that are as timely as possible—rather than data that are more ‘exact’ but less up to date—in order to take account of the latest demographic trends.

Nevertheless, other considerations apply in the calculation of the projection parameters when the latter are based on data from population estimates. First, it makes sense in this context to favour estimates that are considered more accurate. Second, postcensal estimates are historically consistent only over a five-year period, since they start from a new base following each census. For these two reasons, intercensal estimates are the ultimate projection reference series for the development of assumptions about the components of growth. In fact, when postcensal estimates are used, it is only because the intercensal series are not yet available.

However, this does present a conundrum: in intercensal estimates, the demographic equation is balanced only if the residual component is included, but it is both undesirable and very difficult to project that component because of its nature and historical trends. Although the error of closure has only a minor impact on the projections at the national level, the difference can be more significant at the provincial/territorial level.^Note 8 Moreover, unlike the PEP data, a projection series does not have the luxury of revision. For this reason, strategies are introduced in this edition to take account of the residual component in the projections. These strategies consist of analyzing the sources of the residual deviation so that whenever possible, it can be distributed among the other demographic components. The goal of these efforts is to minimize the residual deviation and to increase the accuracy of the other components of population growth. Two different approaches are used, one for the immigration component and the other for the emigration component. They are described in detail in the relevant chapters of this report.

Projection assumptions

The connections between the projections and the PEP data affect not only the cohort-component model’s structure but also the way in which the projection assumptions are designed. The assumptions always contain, in one way or another, a function that remains constant. If we take the mortality component as an example, an assumption might be that in the short term, the number of deaths will remain constant in the future. However, an assumption that mortality risks will remain constant is more likely to contain constant mortality rates or death probabilities. Since it is the future of age-sex cohorts that is being projected, those mortality rates or death probabilities should be disaggregated by age and sex, so that both the size and the structure of the populations at risk can be taken into account.

With rare exceptions, assumptions are developed in the form of rates rather than probabilities, because population estimates and vital statistics are better suited to the calculation of rates:^Note 9 the measurement of demographic events (i.e., the components) is not associated with a population at risk, which is required for the calculation of probabilities. If we go back to the mortality example, the deaths counted during a year may include persons who were in Canada at the beginning of the year, as well as immigrants who arrived during the year. Thus, the various components of population growth affect the population at risk simultaneously, which makes it impossible to determine an exact number of persons at risk. However, it is possible to find a suitable denominator by estimating the average number of person-years, which combines the number of persons (at a location) and the duration of their presence during a year.^Note 10 For example, a person who was in Canada for six months will theoretically contribute 0.5 person-years to the denominator. The number of person-years is usually estimated by taking the average of the population at the beginning of the period and the population at the end of the period (one year later).
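The person-years denominator can be sketched in a few lines of Python. The function names and the figures are hypothetical; the calculation simply averages the populations at the start and end of a one-year period, as described above.

```python
def person_years(pop_start, pop_end):
    """Approximate person-years lived over one year as the mean of the two populations."""
    return 0.5 * (pop_start + pop_end)


def annual_rate(events, pop_start, pop_end):
    """Events during the year divided by the estimated person-years at risk."""
    return events / person_years(pop_start, pop_end)


# A person present for six of the twelve months contributes half a person-year.
half_year_contribution = 6 / 12

# Hypothetical cohort: 1,000 persons at t, 1,040 at t+1, 51 deaths in between.
exposure = person_years(1_000, 1_040)
death_rate = annual_rate(51, 1_000, 1_040)
```

The resulting rate is an annualized rate per person-year, not a probability, which is why it can later be combined with the rates of other components.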

Algorithm of the model

In addition to the reasons mentioned above, the use of rates has another advantage: the rates for the various demographic events can be added together (unlike probabilities) to take the interaction between events into account instead of applying each event in a predetermined order. The projection model sums all the rates and combines them to form out-migration rates in what is known as a transition matrix. The transition matrix contains one row and one column for each combination of age, sex and province/territory. More specifically, the (net) out-migration rates are on the diagonal:

$\begin{matrix} M_{a, s} (i, i) = D_{r, a, s} - I_{r, a, s} + E_{r, a, s} - R E_{r, a, s} + \sum_{z \neq i} M I_{z, a, s} & (1.1) \end{matrix}$

A given cell located on the diagonal of transition matrix $M$ applies to a specific region and is therefore composed of all the rates for that region: mortality rate $D$ , immigration rate $I$ , total emigration rate $E$ , return emigration rate $R E$ , and total rate of out-migration from the region to other regions $M I$ . The indexes r, a and s refer to region, age and sex respectively. Note that at this stage, non-permanent residents (NPRs) are excluded from the calculation.^Note 11 The cells that are not on the diagonal are used exclusively for internal migration. The values in these cells are negative, representing rates of interregional migration, from each region to every other region:

$\begin{matrix} M_{a, s} (i, j) = - M I_{j, i, a, s} & (1.2) \end{matrix}$
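The construction of the transition matrix for one age-sex group can be sketched as follows. This is an illustration under stated assumptions, not Statistics Canada's implementation: the function and argument names are hypothetical, `internal[o][d]` is taken to be the migration rate from region `o` to region `d`, and the off-diagonal cell (i, j) is read as the negative of the rate from region j to region i. All rates are invented.

```python
def transition_matrix(death, immigration, emigration, return_emigration, internal):
    """Build the transition rate matrix M for one age-sex group.

    Each of the first four arguments is a list with one rate per region;
    internal[o][d] is the assumed migration rate from region o to region d.
    """
    n = len(death)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Total out-migration rate from region i to every other region.
        out_internal = sum(internal[i][z] for z in range(n) if z != i)
        # Diagonal cell: all rates that apply to region i (equation 1.1).
        m[i][i] = (death[i] - immigration[i] + emigration[i]
                   - return_emigration[i] + out_internal)
        for j in range(n):
            if j != i:
                # Off-diagonal cell: negative of the rate from region j to region i.
                m[i][j] = -internal[j][i]
    return m


# Two hypothetical regions.
M = transition_matrix(
    death=[0.010, 0.012],
    immigration=[0.008, 0.004],
    emigration=[0.002, 0.001],
    return_emigration=[0.001, 0.0005],
    internal=[[0.0, 0.020], [0.010, 0.0]],
)
```

With this layout, each column sum equals the net rate of loss from the whole system for the origin region (deaths and international movements), since internal migration cancels out within a column.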

The transition rate matrices are then transformed into survival probability matrices using matrix operations:

$\begin{matrix} S_{a, s} = (I - 0.5 M_{a, s}) {(I + 0.5 M_{a, s})}^{- 1} & (1.3) \end{matrix}$

where $S$ is the survival probability and $I$ is the identity matrix. The projected population for year t+1 is derived by multiplying the population of the previous year t, excluding NPRs, by the probabilities in matrix $S$ :

$\begin{matrix} P_{a + 1, s}^{t + 1} = S_{a, s} * (P_{a, s}^{t} - N P R_{a, s}^{t}) & (1.4) \end{matrix}$

where $P_{a, s}^{t}$ is the population vector at the beginning of the period, $P_{a + 1, s}^{t + 1}$ is the population vector at the end of the period, and $N P R_{a, s}^{t}$ is the population vector for non-permanent residents present at the beginning of the period.
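Equations 1.3 and 1.4 can be illustrated with a small Python sketch restricted to two regions, so that the matrix inverse can be written out by hand. The helper names and the numbers are hypothetical, and a production system would presumably rely on a linear algebra library rather than these hand-written routines. The vectors hold one age-sex cohort, with one entry per region.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_vec(a, v):
    """Multiply a 2x2 matrix by a vector of length 2."""
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]


def survival_matrix(m):
    """S = (I - 0.5 M)(I + 0.5 M)^-1, as in equation 1.3."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    minus = [[identity[i][j] - 0.5 * m[i][j] for j in range(2)] for i in range(2)]
    plus = [[identity[i][j] + 0.5 * m[i][j] for j in range(2)] for i in range(2)]
    return mat_mul(minus, inverse_2x2(plus))


def project(m, population, npr):
    """Equation 1.4: survive the population, excluding NPRs, to t+1."""
    base = [p - n for p, n in zip(population, npr)]
    return mat_vec(survival_matrix(m), base)


# Internal migration only: each column of M sums to zero, so nobody leaves the system.
migration_only = [[0.02, -0.01], [-0.02, 0.01]]
projected = project(migration_only, [1000.0, 500.0], [100.0, 50.0])

# Mortality only: S reduces to (1 - 0.5 d) / (1 + 0.5 d) on the diagonal.
mortality_only = survival_matrix([[0.02, 0.0], [0.0, 0.04]])
```

In the migration-only case the projected total equals the starting total excluding NPRs, which is a convenient check that the matrix only redistributes people between regions.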

However, the model does not rule out the use of parameters in the form of ratios or counts. In the case of ratios, for a given component, they are first transformed into counts:

$\begin{matrix} C n t_{r, a, s} = [P_{r, a, s}^{t} - N P R_{r, a, s}^{t}] * Q_{r, a, s} & (1.5) \end{matrix}$

where $Q_{r, a, s}$ is the vector of prospective ratios.
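A minimal sketch of equation 1.5 follows, assuming one entry per region for a given age-sex group. The function name and the values are hypothetical.

```python
def counts_from_ratios(population, npr, ratios):
    """Equation 1.5: apply prospective ratios to the population excluding NPRs."""
    return [(p - n) * q for p, n, q in zip(population, npr, ratios)]


# Two hypothetical regions for one age-sex group.
counts = counts_from_ratios(
    population=[1000.0, 500.0],
    npr=[100.0, 0.0],
    ratios=[0.01, 0.02],
)
```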

Whether they are derived from ratios or not, the counts are summed, and the net result is multiplied by the probability of survival over half the period between t and t+1 and added to the population at time t+1, which is calculated as shown above using components whose parameters consist of rates:

$\begin{matrix} P_{a + 1, s}^{t + 1} = S_{a, s} * (P_{a, s}^{t} - N P R_{a, s}^{t}) + S_{a, s}^{'} C n t_{a, s}^{n e t} & (1.6) \end{matrix}$

where $C n t_{a, s}^{n e t}$ is the net value of the components expressed as counts, and $S_{a, s}^{'}$ , the probability of survival over half the period, is calculated as follows:

$\begin{matrix} S_{a, s}^{'} = {(I + 0.5 M_{a, s})}^{- 1} & (1.7) \end{matrix}$

Then the number of non-permanent residents at time t+1 is added at the end:

$\begin{matrix} P_{a + 1, s}^{t + 1} = S_{a, s} * (P_{a, s}^{t} - N P R_{a, s}^{t}) + S_{a, s}^{'} C n t_{a, s}^{n e t} + N P R_{a + 1, s}^{t + 1} & (1.8) \end{matrix}$
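The complete step in equation 1.8 can be sketched for two regions as follows. As with any such illustration, the helper names and figures are hypothetical; the sketch only shows the order of operations: survive the population excluding NPRs with $S$, survive the net counts over half the period with $S'$, then add the NPRs present at t+1.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_vec(a, v):
    """Multiply a 2x2 matrix by a vector of length 2."""
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]


def project_with_counts(m, population, npr, net_counts, npr_next):
    """Equation 1.8 for one age-sex cohort across two regions."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    minus = [[identity[i][j] - 0.5 * m[i][j] for j in range(2)] for i in range(2)]
    plus = [[identity[i][j] + 0.5 * m[i][j] for j in range(2)] for i in range(2)]
    half = inverse_2x2(plus)        # S', equation 1.7
    full = mat_mul(minus, half)     # S, equation 1.3
    base = [p - n for p, n in zip(population, npr)]
    by_rates = mat_vec(full, base)          # components expressed as rates
    by_counts = mat_vec(half, net_counts)   # components expressed as counts
    return [r + c + n for r, c, n in zip(by_rates, by_counts, npr_next)]


# With no rate-based events, S and S' are both the identity matrix.
no_rates = [[0.0, 0.0], [0.0, 0.0]]
simple = project_with_counts(
    no_rates, [1000.0, 500.0], [100.0, 50.0], [10.0, -5.0], [110.0, 40.0]
)

# A case combining rates and counts (made-up rates).
mixed = project_with_counts(
    [[0.023, -0.010], [-0.020, 0.0185]],
    [1000.0, 500.0], [100.0, 50.0], [10.0, -5.0], [110.0, 40.0],
)
```

The zero-rate case is a useful sanity check: the result is simply the starting population excluding NPRs, plus the net counts, plus the NPRs at t+1.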

The last step is to include the births for the permanent resident (PR) population and the non-permanent resident population.^Note 12 If they are in the form of counts, the births are simply added.

If they are fertility rates, the rates are multiplied by the estimated average population between t and t+1. In the case of permanent residents, for a given region, total births are calculated as follows:

$\begin{matrix} N^{P R} = \sum_{x = 10}^{54} F_{x}^{P R} * P R a v g_{x, f e m}^{t} & (1.9) \end{matrix}$

where $N^{P R}$ are the total births for the permanent resident population, $F_{x}^{P R}$ are the age-specific fertility rates for the PRs and $P R a v g_{x, f e m}^{t}$ is the average PR female population between t and t+1, estimated as follows:

$\begin{matrix} P R a v g_{x, f e m}^{t} = (P_{x, f e m}^{t} - N P R_{x, f e m}^{t} + P_{x + 1, f e m}^{t + 1} - N P R_{x + 1, f e m}^{t + 1}) / 2 & (1.10) \end{matrix}$

NPR births are estimated in much the same way as PR births:

$\begin{matrix} N^{N P R} = \sum_{x = 10}^{54} F_{x}^{N P R} * N P R a v g_{x, f e m}^{t} & (1.11) \end{matrix}$

where $N P R a v g_{x, f e m}^{t}$ is calculated as follows:

$\begin{matrix} N P R a v g_{x, f e m}^{t} = (N P R_{x, f e m}^{t} + N P R_{x + 1, f e m}^{t + 1}) / 2 & (1.12) \end{matrix}$
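Equations 1.9 to 1.12 can be sketched as follows for a single region. The function names and all the figures are hypothetical, and the toy example covers only two ages, whereas the equations above sum over ages 10 to 54. Populations are held in dictionaries keyed by age.

```python
def average_pr(pop_t, npr_t, pop_t1, npr_t1, age):
    """Equation 1.10: average female PR population for the cohort aged `age` at t."""
    return (pop_t[age] - npr_t[age] + pop_t1[age + 1] - npr_t1[age + 1]) / 2


def average_npr(npr_t, npr_t1, age):
    """Equation 1.12: average female NPR population for the cohort aged `age` at t."""
    return (npr_t[age] + npr_t1[age + 1]) / 2


def births(rates, average_population):
    """Equations 1.9 and 1.11: sum of age-specific rates times average population."""
    return sum(rate * average_population[age] for age, rate in rates.items())


# Female populations by age at t and at t+1 (the cohort is one year older at t+1).
pop_t = {25: 1000.0, 26: 900.0}
pop_t1 = {26: 1010.0, 27: 905.0}
npr_t = {25: 100.0, 26: 50.0}
npr_t1 = {26: 110.0, 27: 55.0}

ages = [25, 26]
pr_avg = {x: average_pr(pop_t, npr_t, pop_t1, npr_t1, x) for x in ages}
npr_avg = {x: average_npr(npr_t, npr_t1, x) for x in ages}

births_pr = births({25: 0.10, 26: 0.12}, pr_avg)
births_npr = births({25: 0.08, 26: 0.08}, npr_avg)
```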

Lastly, the number of births of each sex is calculated using the sex ratio (or masculinity ratio) specified in advance for the projection.^Note 13 For example, for the PR population in a given region, births of boys and girls will be calculated as follows:

$\begin{matrix} N (males) = N (t o t a l) * \frac{m r}{100 + m r} & (1.13) \end{matrix}$

$\begin{matrix} N (females) = N (t o t a l) * \frac{100}{100 + m r} & (1.14) \end{matrix}$
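The split by sex in equations 1.13 and 1.14 can be sketched as below. The function name is hypothetical; the default ratio of 105 males per 100 females is the value reported for this edition, and the number of births is invented.

```python
def split_births_by_sex(total_births, masculinity_ratio=105.0):
    """Split total births using the number of male births per 100 female births."""
    males = total_births * masculinity_ratio / (100.0 + masculinity_ratio)
    females = total_births * 100.0 / (100.0 + masculinity_ratio)
    return males, females


boys, girls = split_births_by_sex(205.0)
```

By construction the two shares add up to the total number of births, whatever ratio is specified.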

Conclusion

The cohort-component projection model has numerous advantages. Its relative simplicity and therefore transparency aid the involvement of experts in the consultation processes, the communication of assumptions to users and the reproduction of results. Despite its simplicity, the model is highly effective at producing plausible projections. In this regard, the innovations included in this edition of the projections enhance the quality, transparency and relevance of Statistics Canada’s National Projections Program.

The cohort-component projection model is also used to produce customized projections for specific regions and/or based on particular assumptions. The improvements made in the program in recent years increase Statistics Canada’s capacity to respond quickly to these requests.

References

George, M.V. 2001. “Population forecasting in Canada: Conceptual and methodological developments”, Canadian Studies in Population, volume 28, issue 1, pages 111 to 154.

Le Bras, H. 2008. The Nature of Demography, Princeton University Press, New Jersey, 362 pages.

Lotka, A.J. 1924. Elements of Physical Biology, Williams & Wilkins, Baltimore, Maryland.

O’Neill, B., D. Balk, M. Brickman and M. Ezra. 2001. “A guide to global population projections”, Demographic Research, volume 4, pages 203 to 288.

Rogers, A. 2006. "Demographic modeling of the geography of migration and population: A multiregional perspective", working paper presented at the session “The Legends of Quantitative Geography” convened at the International Geographical Union (IGU) regional conference, Brisbane, Australia, July 3 to 7, 2006.

Statistics Canada. 2010. Projections of the Diversity of the Canadian Population, 2006 to 2031, Statistics Canada Catalogue no. 91-551-X.

Statistics Canada. 2012a. Population and Family Estimation Methods at Statistics Canada, Statistics Canada Catalogue no. 91-528-X.

Statistics Canada. 2012b. Population Projections by Aboriginal Identity in Canada, 2006 to 2031, Statistics Canada Catalogue no. 91-552-X.

Notes

Note 1.

For more details on the genesis of the cohort-component projection model, see Le Bras (2008). The information in this paragraph is largely based on that source.

Note 2.

For more details on the genesis of the multiregional model, see Rogers (2006). The information in this paragraph is largely based on that source.

Note 3.

However, unrestricted use of the multiregional model results in certain difficulties. This is discussed in the chapter on interprovincial migration of this report, and a solution is proposed.

Note 4.

The Dominion Bureau of Statistics produced projections before this; however, they were not intended for official publication (George 2001).

Note 5.

Note that the cohort-component projections are not the only projections produced by Statistics Canada. Since 2005, Statistics Canada has published a series of projections for particular populations using a microsimulation projection model. This additional projection tool currently enables Statistics Canada to meet the varied needs of population projection users, especially for the production of highly detailed projections. Microsimulation is better suited to coherent projections for a large number of characteristics of the population. It produces results that cannot be achieved with the component model. Microsimulation projections are usually requested and funded by other federal government departments. For analyses based on microsimulation models, see Statistics Canada (2010; 2012b).

Note 6.

For a description of these components in the Population Estimates Program (PEP), see Statistics Canada (2012a).

Note 7.

An adjustment is produced to make this error accurate for the reference date of the population estimates and not for the date of the census.

Note 8.

At the Canada level, the error of closure as a proportion of the enumerated population adjusted for CNU was 0.16% in 2001, 0.14% in 2006 and 0.50% in 2011. It is generally larger at the provincial/territorial level, in particular because of the higher variability associated with the estimates of interprovincial migration (Statistics Canada 2012a).

Note 9.

More specifically, in the demographic components of the PEP, the events associated with age x during a one-year period from t to t+1 actually relate to individuals aged x in year t, who will all be aged x+1 in year t+1. The resulting rates are so-called ‘prospective’ rates.

Note 10.

In this case, we have annualized rates.

Note 11.

In fact, NPRs are not subject to the risks of dying or emigrating, and their number is determined only by the annual net counts. For more details, see Chapter 7.

Note 12.

The capacity to incorporate distinct fertility rates by residence status is one of the innovations in the present edition. For more details, see Chapter 3.

Note 13.

The ratio is 105 males to 100 females in the present edition.

Date modified: 2015-11-30
