## Introduction

There are a number of indicators that are useful for monitoring the evolution and the impact of a pandemic like COVID-19 in terms of fatalities. Excess mortality is considered a better indicator for monitoring the scale of the pandemic and making comparisons.^{Footnote 1}^{Footnote 2} Excess mortality refers to the "mortality above what would be expected based on the non-crisis mortality rate in the population of interest."^{Footnote 3} Excess mortality also encompasses collateral impacts of the pandemic, such as deaths occurring because of the overwhelming of the health care system, or deaths avoided due to decreased air pollution or traffic.^{Footnote 4}^{Footnote 5}^{Footnote 6}

## Estimating excess deaths

There are a number of challenges associated with measures of excess deaths. The most important challenge is to properly estimate some level of expected deaths that would occur in non-Covid19 context as a comparison basis for the current counts of deaths.^{Footnote 1} Indeed, death is a statistically rare event, and important variations may be observed from year to year in the annual counts of deaths, in particular in the less populated provinces and in the territories. Moreover, yearly counts of deaths may be affected by changes in the composition of the population, in regard to age more particularly, and changes in mortality rates (e.g. improvement of mortality).

A second challenge is the difficulty to collect timely counts of deaths. In Canada, death data are collected by the provincial and territorial vital statistical offices. The capacity to provide death data to Statistics Canada in a timely manner varies greatly.^{Footnote 7} Moreover, it is possible that the pandemic imposes a burden on health care and other institutions that disturb the data collection process, although it could instead add pressure for accelerated collection. The incomplete coverage of the numbers of deaths makes it difficult to draw any conclusions on the extent of excess deaths in Canada that could be caused by the COVID-19 pandemic.

Beginning on May 13, 2020, Statistics Canada has been releasing provisional counts of excess deaths for 2020.^{Footnote 8} Although the data were published for transparency and as information to be tracked and updated regularly, the uncertainty associated with the baseline expected death counts and the incomplete coverage of the numbers of deaths made it difficult to draw conclusions on the extent of excess deaths in Canada that could be caused by the COVID-19 pandemic. Statistical models are used to obtain estimated death counts adjusted for incompleteness and to estimate baseline non-Covid mortality. Estimates of excess deaths are obtained by comparing adjusted counts with modeled baseline mortality for all weeks in 2020 up to July 4. A description of the models is provided in the next section.

## Methodology

This section describes the distinct models used for estimation of baseline mortality and adjustment of observed death counts.

### Estimating expected mortality

The model used to estimate the expected number of deaths is based on a quasi-Poisson regression model fit to weekly death count data. Adapted from an infectious disease detection algorithm developed by Farrington et al.,^{Footnote 9} which has been largely utilized in the context of mortality surveillance in recent years^{Footnote 10}. Later modifications to the algorithm, originally implemented by Noufaily et al.^{Footnote 11} and further expanded by Salmon et al.,^{Footnote 12} that aim at addressing certain limitations of the model were also adopted in this implementation.

The model was implemented in the R programming language with the use of the surveillance package,^{Footnote 12} and was applied to weekly^{Footnote 13} death counts (all-cause) spanning a selected reference period of approximately four years (2016-2019). Historical counts are a combination of published death data from the Canadian Vital Statistics Death Database (2016-2018) and provisional death counts (2019) coming from the National Routing System (NRS). Estimates of expected deaths are produced for all weeks of 2020 up until the week ending July 4, 2020.

An overdispersed Poisson generalized linear model with a linear time trend and a seasonal factor is fit to the data. The seasonal component aims to represent the expected pattern across weeks that repeats from year-to-year, and consists of a zero-order spline term with 11 knots, representing 10 distinct periods within a given year.^{Footnote 14} The 10 periods are split between a single 7-week period corresponding to the current week being estimated and the 3 preceding and subsequent weeks, and 9 other 5-week periods corresponding to the rest of the year.

The model can be expressed using the following log-linear configuration:

$\text{log}{\mu}_{t}=\alpha +\beta t+{\gamma}_{c\left(t\right)}$

where ${\mu}_{t}$ is the expectation of the count in week *t*, $\beta $ is the coefficient corresponding to the linear time trend, and ${\gamma}_{c\left(t\right)}$ the seasonal factor for week *t*, with *c(t) *indicating the period in the year that week *t *belongs to. ^{Footnote 12}

The quasi-Poisson model relaxes the Poisson assumption that the variance must equal the mean. Instead, $E\left({y}_{t}\right)={\mu}_{t}$, and $Var\left({y}_{t}\right)={\varphi}_{t}{\mu}_{t}$, where the overdispersion parameter $\varphi $ is estimated from the model using the formula:

$\hat{\varphi}=\text{max}\left\{\frac{1}{n-p}\sum _{i=1}^{n}{w}_{i}\frac{({y}_{i}-{{\displaystyle \hat{\mu}}}_{i}{)}^{2}}{{\hat{\mu}}_{i}},1\right\}$

where *n* is the number of weeks used in the baseline, and *p* the number of parameters in the model. A value of $\varphi =1$ implies no overdispersion (regular Poisson model), and $\varphi <1$ implies underdispersion (a rare occurrence, hence the condition on $\hat{\varphi}$). A weight *w* is assigned to each of the historical observations, based on the value of its standard deviation in an unweighted model. This reduces the influence of potential outliers on the estimation of the expected counts and corresponding prediction interval.

Finally, a 95% prediction interval is computed for the expected count in week *t* by assuming that the count follows a negative binomial distribution with mean ${\mu}_{t}$ and size parameter set to ${\mu}_{t}\u2044({\varphi}_{t}-1)$.

### Adjusting death counts for incompleteness

Analysis of death by date (or week) of death is inevitably distorted by delays in reporting. This necessitates appropriate correction of the observed data to estimate the number of death that have occurred but not yet reported. The data received by Statistics Canada via the NRS contain information of day of death, date of report and some demographic information (e.g. age and sex).

Reporting delays are susceptible to change over time, and this is all the more true in a time of a pandemic. For this reason, the model estimates adjustment factors that are based on recent data, and uses different period for weeks that are during the pandemic and those preceding it. Weekly counts of deaths that occurred between December 29, 2019 and March 22, 2020 were adjusted based on the distribution of reporting delays estimated from death records received prior to March 22, 2020. Deaths counts for weeks between March 22 and July 4 were adjusted based on reporting delays observed between March 22 and August 7. In some jurisdictions, the level of data completeness of death records can be very low for the most recent weeks. Weekly adjusted counts are provided only for weeks where the estimated coverage rates satisfy a minimum threshold.^{Footnote 15}

The method used for adjusting observed death counts was originally developed by Brookmeyer and Damiano^{Footnote 16} to model daily counts. It was adapted here to work on a weekly scale. The model was implemented in R programming language using code sent by the authors.^{Footnote 17} The number of deaths occurring on week *t* and reported in week *t+d *(i.e. with a delay of *d* weeks), ${Y}_{td}$, is modeled using the following Poisson regression model:

$\text{LOG}\left(E\right({Y}_{td}\left)\right)={\alpha}_{t}+{\beta}_{d}$

where ${\alpha}_{t}$ represents the log-transformed preliminary reported count in week *t*, and ${\beta}_{d}$ is the term representing the adjustment for underreporting. Note that the right side of the equation is in a log scale so the underreporting adjustment can be seen as a multiplicative adjustment in the original scale. The adjusted number of deaths occurring in week *t *is then the observed deaths count divided by the estimated probability that the lag on the death being reported is less or equal to a maximum of *x *weeks, with x+t being the last observable week to report deaths, i.e. x is the maximum delay time in the dataset minus the week of the death t:

Adjusted number of deaths (t)=$\frac{{\sum}_{d=0}^{{d}_{m{ax}^{-t}}}{Y}_{td}}{P(\text{delay}\le {d}_{\mathrm{max}}-t|\text{time}=t,\text{delay}\le {d}_{\mathrm{max}})}$

## Estimating excess mortality

To calculate the weekly number of excess deaths, the baseline number of deaths in the absence of the pathogen (COVID-19) is subtracted from the observed (and adjusted for reporting delay) number of deaths for the period of interest. The method involves the following steps:

- Quasi Poisson models are fit to the weekly death counts at the provincial and territorial level from January 1st 2016 to January 1st 2020 to obtain a baseline measure of the expected mortality.
- Baseline deaths are projected for year 2020 until July 4.
- Adjust for reporting delays the weekly counts of deaths that occurred from December 29, 2019 to March 22, 2020 based on the distribution of reporting delays estimated from death records received prior to March 22, 2020.
- Adjust for reporting delays the weekly counts of deaths that occurred from the week ending March 22 to July 4 based on reporting delays observed between March 22 and August 7.
- Apply an additional correction to the counts and prediction intervals for the period for March 22 to July 4. This correction factor is the ratio of the adjusted count for the week of March 22 based on the distribution of reporting delays estimated from death records received prior to March 22, 2020 to the unadjusted count for the same week.
- Excess mortality is defined as the adjusted observed mortality minus the baseline for the period of interest.

The 95% prediction intervals surrounding estimates of excess deaths were computed by combining the variances from the two models. An empirical distribution of excess deaths is calculated by randomly pairing 10,000 estimates (replicates) from each model, as per the bootstrap method. The bounds of the prediction intervals represent quantiles of the empirical distribution. The method assumes independence between the two processes that are weekly mortality and collection of death records, but makes no assumption about the statistical distribution of excess deaths.

## Validation

The computation of excess mortality requires the estimation of two greatly uncertain processes: how many deaths there should be in a given week, and how many deaths occurred that were not yet recorded at the time of estimation. The use of modelling for estimation of excess deaths aims at improving estimation but also, importantly, to reflect the uncertainty of these processes.

Validation of the models tend to show that they perform well in many regards. Expected counts tend to mimic the seasonal patterns typically observed during a year and follow the increase observed in past years (mainly due to population growth, particularly at old ages). However, because they captured over periods comprising several weeks, these seasonal patterns tend to be smoothed to some extent. For example, the application of time series models to weekly counts tend to produce more defined peaks, in particular in the month of January (likely due to influenza outbreaks). Another limitation has to do with the way prediction intervals are computed. In the model for estimating expected counts, the death counts are assumed to follow a negative binomial distribution, which is well adapted for modelling discrete counts data susceptible to present overdispersion. However, the bounds of the prediction intervals are defined as the quantiles of the negative binomial distribution, and thus do not reflect the variance due to parameter estimation. A better statistical representation would also account for uncertainty in parameter estimation.

The model for adjusting death counts was designed mainly for its capacity to capture recent trends in reporting delays. Experimentation with different time periods suggest that indeed, there have been changes in the pace at which deaths are registered in the provincial and territorial vital statistics database, at least in some provinces. However, the model assumes that there are no changes within the reference period considered. This is not guaranteed, in particular in a time of pandemic. Another limitation is that with the reference period is too short for capturing adequately potential seasonal patterns. The application of times series models to the data reveals the presence of some seasonal patterns in the coverage rates for some lags (number of days between date of death and report date). It is assumed that biases due to changes in reporting patterns are more important than those due to seasonality. Likewise, potential patterns of underreporting related to some specific days of the week, such as Sundays or holidays, were not considered.

Statistics Canada will continue to refine the methodology in an effort to better inform Canadians of the effects of the COVID-19 pandemic.