Evaluating Health Rankings

1. Which aspects of health or health care are considered in the ranking?

Health rankings are often published to highlight important differences in the health status of populations, to evaluate the relative success of health care systems and to compare the availability of health-related resources across jurisdictions. While there is a great deal of interest in learning from inter- and intra-national comparisons, it is often hard to interpret the results of such rankings and difficult to draw firm conclusions regarding the relative merits of jurisdictions' health policies.

Differences in the purpose of the ranking and the aspects of health considered in rankings may explain some of the observed variation in Canada's international health-related rankings. Organizations may differ in the questions they are attempting to answer and, importantly, in how they conceptualize health and health care and then identify indicators to quantify these aspects of health. There are no internationally agreed-upon standards for health-related rankings.

The examples of international rankings shown in Box 1 illustrate how organizations can arrive at quite different conclusions depending on the aspects of health considered and the indicator set used. The Conference Board of Canada, in its ranking of Canada relative to other OECD countries, relied on indicators of mortality and morbidity (such as life expectancy, rates of death and disease, immunization rates, self-reported health and risk factors such as obesity), while The Frontier Centre for Health Policy ranking considered a broader array of indicators (for example, patient rights and information, waiting times for treatment, clinical outcomes, generosity of public health care systems and provision of pharmaceuticals). The World Health Organization (WHO) used a complex ranking method based on an aggregation of several factors: the burden of disease expressed in terms of disability-adjusted life years; inequality of health status; responsiveness of the health care systems; and a measure of fairness in financing health services. This ranking methodology was criticized5 and the WHO has not since published similar rankings, although it continues to develop methodologies to assess health status and the performance of health care systems.6

The intent of international rankings is often to compare the achievement of specific health outcomes within health care systems that have different underlying delivery or financing mechanisms. A fundamental reason such health-related rankings are difficult to interpret is that, simply put, there is more to health than health care. While the general public tends to be preoccupied with the health care system, the determinants of a population's health are much broader than health care and the systems to manage that care. These determinants include a host of other factors (genetics, socio-economic status and government policies and programs in areas as diverse as transportation, education, zoning and early child development) that lie beyond the traditional boundaries of health care institutions and providers. Thus, a country's or region's performance on a population health indicator need not be strongly correlated with the performance of its health care system.

Box 2 – Conceptual Framework for Statistics Canada/CIHI Health Indicators

Health Status

How healthy are Canadians? Health status can be measured in a variety of ways, including well-being, health conditions, disability or death.

  • Well-being
  • Health conditions
  • Human function
  • Death

Non-Medical Determinants of Health

Non-medical determinants of health are known to affect our health and, in some cases, when and how we use health care.

  • Health behaviours
  • Living and working conditions
  • Personal resources
  • Environmental factors

Health System Performance

How healthy is the health system? These indicators measure various aspects of the quality of health care.

  • Acceptability
  • Accessibility
  • Appropriateness
  • Competence
  • Continuity
  • Effectiveness
  • Efficiency
  • Safety

Community and Health System Characteristics

These measures provide useful contextual information, but are not direct measures of health status or the quality of health care.

  • Community
  • Health system
  • Resources

Recognizing the complexity of the determinants of health, Statistics Canada and the Canadian Institute for Health Information (CIHI), for example, developed a conceptual framework for a set of indicators to measure 1) the proverbial "bottom line"—health status; 2) non-medical determinants of health; and 3) health system performance. Summary data on community and health system characteristics are also included in the framework, but these factors are not considered indicators because a higher (or lower) value may not unambiguously be indicative of a better state of affairs (Box 2).7

Having an agreed-upon conceptual framework for the determinants of health will facilitate analyses of population health and health system performance within Canada.

In summary, to judge the value of a ranking report (or understand differences between two ranking reports) one must first evaluate whether the conceptual framework that underlies the ranking is sound and encompasses the domains of health that are important given the purpose of the ranking. It is then necessary to assess whether the indicators of health or health care that are included in the ranking methodology are consistent with the conceptual framework.

2. How meaningful and valid are the indicators chosen to quantify these aspects of health and health care?

Following the choice of the aspects of health and health care to be compared, specific indicators are needed before jurisdictions or facilities can be ranked. For example, using the framework described in Box 2, influenza immunization, screening mammography and Pap smear are the indicators used to gauge the accessibility of health systems. Here, the general objective is to have indicators that reflect the quality aims central to the health system and that, in addition, are measurable and actionable on the part of health decision-makers.

The specific measures that have so far been included within this indicator framework represent an inevitable compromise between what measurements have been judged most useful and what data are, in fact, available. A quick scan of the available indicators in Box 3 reveals many gaps in coverage of important aspects of health and health care. For example, measurement of the appropriateness of care is limited to two indicators of obstetrical care: the rate of Caesarean section and the rate of vaginal birth after Caesarean section. Indicators relevant to many other aspects of care are clearly needed to provide a fuller picture of the appropriateness of Canadian health care. Meaningful national and international comparisons will become easier to make as the most important determinants of health and health outcomes are identified, standardized measures of these factors are developed and health information systems improve.

Vital statistics such as mortality data are readily available and essential to understanding population health. However, as life expectancy has increased, measures that capture how people are faring while alive are needed to monitor population health more comprehensively. Data from surveys and medical encounters are being used to assess individual well-being, the presence of disease and other health conditions and the ability to carry out everyday functions. One useful indicator, health-adjusted life expectancy (HALE), estimates the number of years a person can expect to live free of disease or functional impairment.
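
As an illustration, HALE-type measures are commonly derived with Sullivan's method, which weights the person-years lived in each age interval of a period life table by the proportion of that interval spent in good health. The following minimal sketch assumes that approach; all numbers are invented for illustration, not real life-table data:

```python
# Sullivan-style HALE sketch with invented life-table numbers.
# Person-years lived in each age interval (Lx, per 100,000 births)
# and the proportion of each interval spent in good health.
age_intervals = ["0-1", "1-20", "20-40", "40-60", "60-80", "80+"]
person_years  = [99_000, 1_880_000, 1_970_000, 1_900_000, 1_600_000, 600_000]
prop_healthy  = [0.99, 0.97, 0.93, 0.85, 0.70, 0.50]

radix = 100_000  # hypothetical cohort size at birth

# Weight person-years by the healthy proportion, then divide by the
# starting cohort to get expected years of healthy life.
healthy_years = sum(L * h for L, h in zip(person_years, prop_healthy))

print(f"Life expectancy at birth: {sum(person_years) / radix:.1f} years")  # 80.5
print(f"HALE at birth:            {healthy_years / radix:.1f} years")      # 67.9
```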

Box 3 – Examples of Health Indicators Used to Assess Canadian Health System Performance

Acceptability

  • Patient satisfaction

Accessibility

  • Influenza immunization
  • Screening mammography
  • Pap smear

Appropriateness

  • Caesarean section
  • Vaginal birth after Caesarean section

Effectiveness

  • Number of cases of pertussis, measles, tuberculosis, HIV, chlamydia
  • Pneumonia and influenza hospitalizations
  • Deaths due to medically treatable diseases (such as bacterial infections, cervical cancer, hypertensive disease)
  • 30-day acute myocardial infarction in-hospital mortality rate
  • 30-day stroke in-hospital mortality rate
  • Readmission rates for asthma, hysterectomy, pneumonia, prostatectomy

Efficiency

  • May not require hospitalization
  • Expected compared to actual stay

Safety

  • Hip fracture hospitalization
  • In-hospital hip fracture

Note: Measures are not yet available for the indicators competence and continuity.
Source: Canadian Institute for Health Information, 2008.

As already noted, the broader determinants of health include much more than the health care system. As a result, using population health indicators such as life expectancy and HALE, or disease-specific incidence and mortality rates, to rank the performance of health care systems may confuse rather than educate decision-makers. The reason, simply, is that the ministry of health is responsible for health care per se, but generally not for many of the other determinants of health, so a given jurisdiction may have a fine health care system and yet still have poor population health outcomes for reasons that lie outside the health care system. Rankings that focus on the incidence of certain diseases, mortality rates or the prevalence of chronic conditions are therefore invaluable in assessing the health of a population, but they can be misleading if used to assess a health care system's performance. For example, using the incidence rate of lung cancer to assess a health system's performance is problematic because this rate is determined largely by historical smoking prevalence. One jurisdiction may have a higher lung cancer incidence rate than another because its policies 20 years ago did not discourage smoking (for example, low cigarette taxes or a lack of public education campaigns).

Likewise, using the rate of breast cancer incidence to rank jurisdictions can be misleading because the rate could simply reflect differences in screening practices. A jurisdiction with a high rate might have an active screening program that detects more cases, whereas a jurisdiction with a low rate might be failing to detect this serious disease. In short, health status indicators are not necessarily helpful in gauging the performance of a jurisdiction's health care system. More informative to health care decision-makers are indicators that measure specific aspects of the aims of the health care system, for example, its accessibility, effectiveness and appropriateness.

In summary, the second step in judging the value of a ranking report is checking to see if meaningful and valid measures have been used for each of the indicators included in the ranking scheme. The absence of data for a specific indicator or difficulties in measurement may impede the incorporation of important aspects of health into ranking schemes. The development and standardization of health measures and new data collection efforts are under way nationally and internationally to overcome these limitations of data systems that have precluded inclusion of important aspects of health or health care into rankings.

3. Are ranking indicators based on accurate, reliable and comparable information?

Every day we read in the newspaper and online about medical breakthroughs, the connections between our environment and our health and innovations in health care communications technology. Some of these reports are based on clinical research studies, others on analyses of large data sets maintained by health authorities or on large surveys. It can be hard to judge when the data from a particular study, data set or survey can be generalized to all residents of a jurisdiction or to a subset of the population. It is important to know when the data were collected, whether findings were based on a sample that is representative of the population of interest and whether the data are complete. Suspicions should arise if, for example, the indicator being used as part of a ranking scheme is based on 10-year-old information that does not reflect current medical practice. If the indicator is based on a survey in which only 25 percent of the population of interest responded, it may not accurately represent the true values, experiences or attitudes of the population. Biases can arise, for example, if only those dissatisfied with their care are motivated to participate in a survey or study. Self-reported data from individuals, health care providers or institutions may be inaccurate if the information is not validated.
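
To make the nonresponse concern concrete, consider a sketch with invented figures: 80 percent of a population is satisfied with its care, but dissatisfied patients are assumed to be more likely to return the survey, so a 25 percent response rate produces a misleading estimate.

```python
# Nonresponse bias sketch; all counts and response rates are invented.
satisfied, dissatisfied = 8_000, 2_000   # true population: 80% satisfied
resp_sat, resp_dis = 0.22, 0.37          # assumed response probabilities

responders_sat = satisfied * resp_sat        # 1,760 respondents
responders_dis = dissatisfied * resp_dis     # 740 respondents
response_rate = (responders_sat + responders_dis) / (satisfied + dissatisfied)
observed = responders_sat / (responders_sat + responders_dis)

print(f"response rate: {response_rate:.0%}")                     # 25%
print(f"observed satisfaction: {observed:.1%} vs true 80.0%")    # 70.4%
```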

Rankings based on published indicators may also be inaccurate because of differences in how jurisdictions define and measure them. Even a health indicator as carefully monitored as infant mortality may be hard to interpret because of differences in registration practices. Infant mortality is defined as the number of deaths of children under one year of age that occur in a given year, expressed per 1,000 live births.8 Canada and the United States are among countries that register the death of a very premature baby as an infant death. In other countries, very premature babies who do not survive are registered as fetal deaths. This practice increases infant mortality rates in Canada and the U.S. relative to those countries that register these deaths as fetal instead of infant deaths.8,9 The low position of Canada and the U.S. relative to other countries when ordered by infant mortality rate (Chart 1) could, in part, be attributable to these differences in registration practices.
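
A small worked example, using hypothetical counts, shows how this registration decision alone shifts the reported rate:

```python
# Illustrative only: how registration practice moves the infant
# mortality rate (deaths under age one per 1,000 live births).
live_births = 350_000
infant_deaths = 1_600          # deaths of registered live-born infants
very_premature_deaths = 200    # borderline very premature cases

# Jurisdiction A registers borderline cases as live births that die in
# infancy (adding them to both numerator and denominator);
# jurisdiction B registers them as fetal deaths (excluding them).
imr_a = (infant_deaths + very_premature_deaths) / (live_births + very_premature_deaths) * 1000
imr_b = infant_deaths / live_births * 1000

print(f"counted as infant deaths: {imr_a:.2f} per 1,000")  # 5.14
print(f"counted as fetal deaths:  {imr_b:.2f} per 1,000")  # 4.57
```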

Chart 1 Infant mortality, 2005

In summary, the third step in judging a ranking report is to assess whether the data used to rank jurisdictions are accurate, reliable and comparable. Ideally, data should be from sources that are free from bias, current, complete and represent the population of interest. How data elements are defined and collected should also be scrutinized to ensure that "apples to apples" comparisons are being made.

4. Do sound methods underlie the ranking process?

Even when meaningful and valid indicators are used, and accurate and comparable data are available for ranking jurisdictions on the basis of their health or health care systems, the results of any ranking may still be unsound. Several statistical and methodological issues may arise that can compromise the validity of the ranking.

Distinguishing meaningful differences in performance—in order for an indicator to be useful in distinguishing low- and high-performing jurisdictions, it needs to vary in value. If all jurisdictions score highly on a particular indicator, it is not useful in discriminating "good" from "poor" performers. Before using a measure in a ranking scheme, statisticians like to examine the statistical properties of the indicator, especially the shape of the distribution of the indicator's values. In some cases, there may be values clustered at the low-performance and high-performance ends of the scale. In other cases, most values will be clustered in the middle, with just a few cases at the low and high ends of the distribution. Chart 1 shows the distribution of infant mortality rates within countries belonging to the OECD. Here, there are two countries, Turkey and Mexico, with outlying high values. Statisticians use information about the distribution of an indicator's values to determine where to make cut points to distinguish levels of performance. When these statistical properties are not examined and arbitrary cut points are used, such as simply dividing the jurisdictions being compared into thirds, the indicator has less value in correctly identifying levels of performance. For example, if cut points are chosen to identify the top 10 and bottom 10 performers in terms of infant mortality among OECD countries in Chart 1, the bottom 10 would combine countries with very different rates.
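
The following sketch, using made-up rates for 30 jurisdictions with two high outliers, illustrates why the distribution should be examined before cut points are chosen:

```python
import statistics

# Hypothetical indicator values (e.g., deaths per 1,000) for 30
# jurisdictions, sorted ascending, with two outlying high values,
# as with Turkey and Mexico in Chart 1.
rates = [3.1, 3.3, 3.5, 3.6, 3.8, 3.9, 4.0, 4.1, 4.2, 4.4,
         4.5, 4.6, 4.8, 4.9, 5.0, 5.1, 5.3, 5.4, 5.6, 5.8,
         6.0, 6.2, 6.5, 6.8, 7.0, 7.3, 7.8, 8.2, 16.9, 22.6]

# Arbitrary cut points: simply split the ordered list into thirds.
n = len(rates)
thirds = [rates[i:i + n // 3] for i in range(0, n, n // 3)]
# The "bottom 10" lumps a rate of 6.0 together with one of 22.6.
print("bottom third spans", min(thirds[-1]), "to", max(thirds[-1]))

# Examining the distribution first: the outliers pull the mean well
# above the median, a warning that equal-sized groups will combine
# very different performers.
print("median:", statistics.median(rates), "mean:", round(statistics.mean(rates), 1))
```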

Interpreting absolute versus relative comparisons—rankings by nature provide a summary measure of the relative difference between one jurisdiction and another. This relative measure can obscure the absolute magnitude of any difference. Two jurisdictions may be close in rank order; however, the absolute difference in the value of the underlying indicators can be either small or large, which may mean that the jurisdictions are in fact quite distinct in terms of their health status or the performance of their health care systems. Looking again at Chart 1, the first two countries and the second- and third-last countries are the same distance apart in terms of rank order, but the actual differences in the underlying infant mortality rates are 0.1 deaths per 1,000 in the former case and 11.6 in the latter, more than 100 times larger. Readers of ranking reports should be able to review both the relative and absolute differences of the individual indicators included in the ranking scheme.
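
A minimal sketch with hypothetical rates makes the distinction concrete: two pairs of jurisdictions sit one rank apart, yet their absolute gaps differ enormously.

```python
# Hypothetical rates: adjacent ranks can hide very different gaps.
rates = {"A": 3.0, "B": 3.1, "C": 5.2, "D": 16.8}

ranked = sorted(rates.items(), key=lambda kv: kv[1])
for rank, (name, rate) in enumerate(ranked, start=1):
    print(rank, name, rate)

# A and B are one rank apart and differ by 0.1;
# C and D are also one rank apart but differ by 11.6.
```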

Chart 2 Hospital Ranking by Outcome Indicator (Hypothetical)

Chart 2 shows a ranking of 24 hospitals by a hypothetical outcome indicator that ranges in value from 0 to 10. The scores for the hospitals shown in the figure range from 3.5 to 9.1. Media reports often highlight the merits of the 10 top-scoring hospitals and single out the 10 bottom-ranking hospitals for scrutiny. However, differentiating the hospitals ranked within the top 10 from those ranked below this arbitrary cut point is difficult. The hospitals ranked 10 and 11 both have a score of 6.5, making the relative rankings difficult to interpret. In this hypothetical example, distinguishing hospitals ranked in the top third (rank 1 to 8), the middle third (rank 9 to 16) and the bottom third (rank 17 to 24) would make more sense given the distribution of the outcome indicator.
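
The same point can be sketched in code; the scores below are invented to match the chart's range, with the tie at ranks 10 and 11:

```python
# 24 hypothetical hospital scores in descending order (range 3.5-9.1).
scores = [9.1, 8.8, 8.6, 8.3, 8.1, 7.9, 7.6, 7.4,
          7.1, 6.5, 6.5, 6.2, 6.0, 5.8, 5.6, 5.4,
          5.1, 4.9, 4.7, 4.4, 4.2, 3.9, 3.7, 3.5]

# A "top 10" cut point splits two hospitals with identical scores.
print("score at rank 10:", scores[9], "| score at rank 11:", scores[10])

# Grouping into thirds instead avoids splitting the tie.
for label, grp in zip(("top", "middle", "bottom"),
                      (scores[0:8], scores[8:16], scores[16:24])):
    print(f"{label} third: {min(grp)} to {max(grp)}")
```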

Making adjustments for underlying differences in a population—the values of indicators used in ranking jurisdictions can vary significantly due to underlying differences in the health or demographic composition of the populations being compared. The results of rankings can be misleading if these differences are not taken into account. For example, if one jurisdiction's population is disproportionately made up of retirees, its health status indicators will likely be skewed toward the unhealthy end of the spectrum unless some form of age standardization has been applied. When combining several indicators into one ranking scheme, it is also important to consistently use the same reference population (for example, the population as measured during the same census year) when making adjustments, especially when a jurisdiction's performance is being monitored over time.
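
Direct age standardization is one common form of adjustment: each jurisdiction's age-specific rates are applied to a shared reference population. A minimal sketch, with all numbers invented:

```python
# Direct age standardization with hypothetical age-specific rates
# (per 1,000) and a shared reference population.
age_groups    = ["0-39", "40-64", "65+"]
reference_pop = [500_000, 300_000, 200_000]   # same for every comparison

rates_a = [1.0, 4.0, 30.0]   # jurisdiction A
rates_b = [1.2, 4.5, 28.0]   # jurisdiction B

def standardized_rate(rates, ref_pop):
    # Expected events if these age-specific rates applied to the
    # reference population, re-expressed per 1,000.
    expected = sum(r * p / 1000 for r, p in zip(rates, ref_pop))
    return expected / sum(ref_pop) * 1000

print(f"A: {standardized_rate(rates_a, reference_pop):.2f} per 1,000")  # 7.70
print(f"B: {standardized_rate(rates_b, reference_pop):.2f} per 1,000")  # 7.55
```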

Combining the values of many indicators into the ranking scheme—rankings reported in the popular media often involve combining and weighting dozens of often diverse data elements to create a single measure whose value is then used to determine rank order. The specific formula used in such an aggregation procedure, and how much weight each factor receives, can be very influential in determining the relative positions of jurisdictions in the overall ranking. But the choice of formula and weighting scheme is usually arbitrary. An equal weighting of all factors considered in the ranking may not be appropriate if some indicators are publicly viewed as having greater bearing on health and well-being.

Weighting schemes may be value-laden; in other words, they may embody some sort of ethical or social norm. Other weighting schemes may simply be arbitrary (for example, the organization authoring a report assigns a higher priority to some areas than to others). Ideally, for public reporting of summary indices of performance, the weights and the ranking formula should be based on broadly acceptable principles. For example, the consumer price index aggregates hundreds of price changes, but its formula and weights are not at all arbitrary: they are based directly on regularly collected, detailed data on household expenditure patterns, which supply the weights applied to each commodity's price change.
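
The sensitivity of rank order to the weighting scheme is easy to demonstrate. In the following sketch, three hypothetical jurisdictions are scored on two indicators, and the ordering changes with the weights chosen:

```python
# Hypothetical scores on two indicators (0-10 scale).
scores = {"X": (9.0, 5.0), "Y": (6.5, 8.0), "Z": (7.0, 7.2)}

def composite_order(weights):
    # Weighted sum of the two indicators, then rank descending.
    combined = {k: sum(w * s for w, s in zip(weights, v))
                for k, v in scores.items()}
    return sorted(combined, key=combined.get, reverse=True)

print("equal weights:      ", composite_order((0.5, 0.5)))  # ['Y', 'Z', 'X']
print("indicator 1 favored:", composite_order((0.8, 0.2)))  # ['X', 'Z', 'Y']
print("indicator 2 favored:", composite_order((0.2, 0.8)))  # ['Y', 'Z', 'X']
```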

Identifying statistically significant differences in performance—some level of uncertainty underlies all measurement. Measures are often taken on a sample of people or institutions and the uncertainty that exists from just measuring a small subset of a population can be estimated. Statistical estimates of uncertainty can be expressed as confidence intervals, which reveal the precision of the measure of interest. The confidence interval gives a range of values likely to include the population's true value. A narrow confidence interval suggests a more precise estimate than a wide confidence interval for a given level of confidence (a 95% confidence interval is frequently used). To accurately distinguish levels of performance, rankings should be accompanied by upper and lower limits of a confidence interval for the relevant rates, percentages or other values used as part of the ranking. When a measure used as part of a ranking is very unstable, for example, due to small sample sizes, it may be advisable to use two- or three-year averages to smooth out short-term fluctuations.
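
For a proportion, a rough normal-approximation confidence interval shows how sample size drives precision; both samples below are hypothetical, and the smaller one yields a far wider interval around a similar point estimate.

```python
import math

def ci95(successes, n):
    # Normal-approximation 95% CI for a proportion (rough sketch only;
    # it is unreliable for very small samples or extreme proportions).
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - 1.96 * se, p + 1.96 * se

# Hypothetical samples: surgeries performed same or next day.
for successes, n in [(120, 180), (13, 22)]:
    p, lo, hi = ci95(successes, n)
    print(f"n={n}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
# n=180: 66.7% (59.8% to 73.6%)
# n=22:  59.1% (38.5% to 79.6%)
```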

Chart 3 illustrates the importance of understanding the levels of uncertainty associated with estimates. The figure provides information on wait times for hip fracture surgery by province/territory for the period 2006–2007. The proportion of individuals with surgery performed on the same or the next day is shown along with the 95% confidence interval (shown as vertical bars) associated with each wait time estimate. The proportion of patients receiving surgery on the same or next day in the Yukon Territory appears visually to be lower than in the other areas shown. However, the confidence interval surrounding the Yukon estimate (59.4%) is wide (from 38.0 to 80.9), reflecting a relatively high level of uncertainty. Statistically speaking, the Yukon's result is not significantly different from those of the other areas during this period.

Chart 3 Wait Time for Hip Fracture Surgery (Proportion With Surgery Same or Next Day), by Province/Territory, 2006–2007

Consideration of other statistical issues—the validity of any ranking scheme depends on many other statistical considerations. One issue that must be taken into account is the extent to which adjustments are made when two or more indicators used within the ranking scheme are correlated. For example, the 30-day in-hospital stroke mortality rate and the 180-day net survival rate for stroke are highly correlated measures. The overall ranking may be misleading if these two indicators both enter a ranking scheme independently without an adjustment for their correlated nature, as otherwise a form of double-counting, or at least an overweighting of this one finding, will occur. Another factor to scrutinize is how ties in rank are handled. Rounding the values of indicators to fewer significant digits can create ties and avoid the false impression of a difference between very similar ranking scores, for example, scores of 4.1 and 4.2. However, any rounding of the number of significant digits published should be based on confidence intervals or some sense of what differences are statistically supportable.
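
A sketch with invented stroke indicators (statistics.correlation requires Python 3.10+) shows the problem and one simple mitigation: collapsing a highly correlated pair into a single component before weighting.

```python
from statistics import correlation, mean

# Invented values: 30-day in-hospital stroke mortality (%) and 180-day
# net survival (%) for five jurisdictions; they move almost in lockstep.
mortality_30d = [8.0, 9.5, 7.0, 10.0, 6.5]
survival_180d = [80.0, 76.0, 83.0, 74.0, 85.0]

print(f"correlation: {correlation(mortality_30d, survival_180d):.2f}")  # -1.00

def zscores(xs):
    # Standardize to mean 0, sample standard deviation 1.
    m = mean(xs)
    sd = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - m) / sd for x in xs]

# Subtracting the survival z-score flips its direction so both measures
# point the same way; averaging yields one stroke component instead of
# double-counting the same underlying signal in the composite.
component = [(zm - zs) / 2
             for zm, zs in zip(zscores(mortality_30d), zscores(survival_180d))]
print("stroke component:", [round(v, 2) for v in component])
```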

In addition to the many statistical and methodological issues to be considered, the critical reader of ranking results must question their source and potential biases. Trust in any ranking scheme can only be achieved through transparency, that is, full disclosure of the methods underlying the ranking. Confidence in the underlying methodology can also be bolstered by careful peer review. This process helps to assure the reader that there are no inherent biases associated with the methodology and limits any suggestion of a conflict of interest on the part of the organization issuing the ranking report. With full disclosure, the sophisticated reader may also be able to critically evaluate the ranking by testing its sensitivity to modifications to the ranking methodology.

In summary, the fourth step in judging a ranking report is to examine the soundness of the methods that underlie the ranking process. The ranking report should allow the careful reader to 1) distinguish meaningful differences in performance; 2) interpret absolute versus relative comparisons; 3) understand whether appropriate adjustments are made for underlying differences in the populations being compared; 4) understand how the values of indicators are combined in the ranking scheme; 5) identify statistically significant differences in performance; 6) be assured that statistical issues, such as the handling of ties in rank, have been dealt with appropriately; and 7) be confident that the authors of the report have reduced the potential for bias through full disclosure of ranking methods and peer review.
