Findings

Obesity is a public health problem in both the developed and developing world. Globally, an estimated 400 million people are obese.¹ In Canada, the prevalence is estimated to be 23% in adults² and 8% in children,³ with rates expected to rise in coming years.⁴,⁵ The costs associated with obesity represent approximately 2% of Canada's total health care expenditures.⁶

Because collecting measured data is expensive, national estimates of the prevalence of obesity are usually based on self-reported survey data. In most countries, body mass index (BMI) is used to estimate the prevalence of obesity because BMI can be easily calculated from self-reported height and weight. However, in both clinical and population samples, self-reports have tended to overestimate height and underestimate weight, which results in a systematic underestimation of obesity prevalence.⁷⁻¹⁰ This tendency has recently been confirmed in a review of 64 international studies,¹¹ as well as in Canadian research.¹²

Underestimating the prevalence of obesity is important not only because obesity itself can cause social and physical impairment, but also because it is a risk factor for disease.¹³⁻¹⁵ When estimates of obesity are based on self-reported data, the relationship between obesity and conditions such as diabetes, hypercholesterolemia, hypertension, arthritis and heart disease is substantially exaggerated.¹⁶⁻¹⁸

Given that much population health surveillance will continue to rely on self-reported data, it has been suggested¹⁹ that estimates of obesity based on self-reports could be adjusted to more closely approximate measured values. Using data from the 2005 Canadian Community Health Survey (CCHS), which collected both self-reported and measured height and weight, this study examines the feasibility of developing correction equations to adjust self-reported estimates, and assesses whether these equations improve the estimation of obesity (when based on BMI).

Methods

Data source

Data for this study come from the 2005 CCHS. The CCHS is an ongoing survey designed to provide timely cross-sectional estimates of health determinants, health status and health system use at a sub-provincial level.²⁰ Sampling is based on a multi-stage cluster sampling technique that is representative of over 98% of the Canadian population (members of the Canadian Forces, individuals living on Indian Reserves or Crown lands, and residents of institutions, Canadian Forces bases and certain remote regions are excluded). Three sampling frames were used to select households for the 2005 survey: 49% of the sample of households came from an area frame; 50% from a list frame of telephone numbers; and the remaining 1% from a Random Digit Dialing (RDD) sampling frame. More details about the CCHS are available in Béland, 2002.²¹

The 2005 CCHS collected data from 132,947 respondents, yielding a response rate of 79%. A subsample of 7,376 respondents aged 12 years or older were asked their height and weight, and later in the interview, were directly measured. These respondents were all drawn from the area frame and were selected across the ten provinces in proportion to their populations (residents of the territories were excluded). Measured height and weight were obtained for 4,735 individuals—a response rate of 64%. (The main reason for non-response was refusal.)

Because of the high non-response to measured height and weight, an adjustment was made to minimize non-response bias. A special sampling weight was created by redistributing the sampling weights of non-respondents to respondents using response propensity classes. The variables used to create these classes were region (British Columbia, Prairies, Ontario, Quebec, Atlantic provinces), age, sex, household size, marital status, rural/urban indicator, and quarter of data collection.
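The redistribution step can be illustrated with a minimal sketch. It assumes the propensity classes have already been formed; the source does not describe how the classes were built from the listed variables. The record keys (`cls`, `weight`, `responded`) and the toy values are hypothetical and are not CCHS data.

```python
from collections import defaultdict


def redistribute_weights(records):
    """Return respondents only, with weights inflated to absorb non-respondents.

    Each record is a dict with hypothetical keys:
      "cls"       - label of the response propensity class
      "weight"    - original sampling weight
      "responded" - True if height and weight were measured
    """
    total = defaultdict(float)       # weight of everyone selected, per class
    responding = defaultdict(float)  # weight of respondents only, per class
    for r in records:
        total[r["cls"]] += r["weight"]
        if r["responded"]:
            responding[r["cls"]] += r["weight"]

    adjusted = []
    for r in records:
        if r["responded"]:
            # Within a class, respondents share the non-respondents' weight
            factor = total[r["cls"]] / responding[r["cls"]]
            adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


# Toy records (illustrative only)
sample = [
    {"cls": "a", "weight": 10.0, "responded": True},
    {"cls": "a", "weight": 10.0, "responded": True},
    {"cls": "a", "weight": 20.0, "responded": False},
    {"cls": "b", "weight": 5.0, "responded": True},
    {"cls": "b", "weight": 5.0, "responded": False},
]
adjusted_sample = redistribute_weights(sample)
```

In this sketch the adjusted weights of respondents sum to the original weight of everyone selected, which is the property such an adjustment is meant to preserve.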

The present study included only adults aged 18 years or older. Children are in a stage of development where weight and height may change over short periods of time. It has also been suggested that the nature of the reporting error in children and adolescents may differ from that in adults.¹⁰ Women who were pregnant (n=47) or breastfeeding (n=58) were also excluded, as BMI is not recommended for use in these groups. Respondents for whom the difference between self-reported and measured estimates of height, weight or BMI was more than 3 standard deviations from the mean were considered outliers and were excluded from the analysis (n=43, n=44 and n=39, respectively). This left 4,080 respondents with self-reported and measured values for height and weight.
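The outlier rule can be sketched as follows. This is an illustration only: it uses the unweighted sample standard deviation, which is an assumption, since the source does not say whether weights or the population formula were used. The function name and toy values are hypothetical.

```python
from statistics import mean, stdev


def within_three_sd(differences, n_sd=3.0):
    """Flag each self-reported minus measured difference as kept (True) or outlier (False)."""
    centre = mean(differences)
    spread = stdev(differences)  # sample SD (n - 1 denominator); an assumption
    return [abs(d - centre) <= n_sd * spread for d in differences]


# Toy differences: twenty exact reports and one extreme discrepancy
diffs = [0.0] * 20 + [100.0]
keep = within_three_sd(diffs)
```

The rule would be applied separately to the differences in height, weight and BMI, as the three exclusion counts in the text suggest.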

CCHS interviewers were trained to measure height and weight. Height was measured to the nearest 0.5 cm (without shoes) with a measuring tape attached to the wall. Weight was measured to the nearest 0.1 kg (without shoes) with a calibrated digital scale (ProFit UC-321 made by Lifesource). The interview lasted approximately 50 minutes and took place in the respondent's home. Self-reported height and weight were collected near the beginning of the interview; the measurements were taken near the end. Respondents were not told that they would be measured.

Self-reported height and weight were collected with the questions: "How tall are you without shoes on?" and "How much do you weigh?" Categories for height in feet and inches were listed on the questionnaire with corresponding metric values in brackets. Interviewers rounded up to the closest inch for respondents who reported half-inch measures. If questioned, interviewers told respondents to report their weight without clothing. Respondents were asked if they had reported in pounds or kilograms; 94% reported in pounds.

Analytical techniques

The first step was to use the full subsample (n=4,080) to determine which factors were associated with the bias between self-reported and measured height and weight. The bias was calculated by subtracting the measured value from the self-reported value. Negative values indicated underestimation; positive values, overestimation.
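As a small worked example of this sign convention, the sketch below computes the bias for one hypothetical respondent. The function names and values are illustrative and are not taken from the CCHS.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def reporting_bias(self_reported, measured):
    """Self-reported minus measured; negative values mean underestimation."""
    return self_reported - measured


# Hypothetical respondent: reports 176 cm and 80 kg, measures 175 cm and 82 kg
height_bias = reporting_bias(176.0, 175.0)   # positive: height overestimated
weight_bias = reporting_bias(80.0, 82.0)     # negative: weight underestimated
bmi_bias = reporting_bias(bmi(80.0, 176.0), bmi(82.0, 175.0))  # negative
```

Overstating height and understating weight both push the calculated BMI down, which is why the BMI bias is negative here.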

Multiple linear regression was used with the bias as the dependent variable in the model. Socio-demographic and health variables, selected based on a review of the literature, were entered as independent variables. Separate models were estimated with the bias in weight, height and BMI as dependent variables. All models were estimated separately for men and women, because the bias differs between the sexes.⁸,²²⁻²⁴ Variables that were significant (p<0.05) were used to develop the correction equations.

The sample was then randomly divided into two parts: split-sample A and split-sample B, each containing approximately 50% of the respondents (2,029 or 49.7% and 2,051 or 50.3%, respectively). Split-sample A was used to generate the correction equations using the variables that were significantly associated with the bias in height, weight and BMI identified in the first step. Split-sample B was used to test the equations. To generate the correction equations, the measured value was the dependent variable, and the self-reported value and any variables that were significantly associated with the bias from the first part of the study were independent variables. Only significant independent variables (or categorical variables for which at least one category was significant) were retained for the final correction equations.
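The split-sample procedure can be sketched for the simplest case, in which the self-reported value is the only predictor of the measured value. This is a simplified illustration: it is unweighted ordinary least squares, whereas the study used survey weights and SAS; the function names, the seed and the synthetic data are assumptions.

```python
import random


def split_sample(rows, seed=0):
    """Randomly divide rows into two halves of (nearly) equal size."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    return rows[:half], rows[half:]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic (self-reported, measured) BMI pairs lying exactly on a line
rows = [(float(sr), 1.0 + 1.02 * sr) for sr in range(18, 40)]
sample_a, sample_b = split_sample(rows)

# Fit the correction equation on split-sample A ...
intercept, slope = fit_line([r[0] for r in sample_a], [r[1] for r in sample_a])

# ... and apply it to the self-reported values in split-sample B
corrected_b = [intercept + slope * sr for sr, _ in sample_b]
```

Keeping the test half out of the fitting step means the comparison of corrected and measured values is not flattered by having been tuned on the same respondents.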

Four models were tested: two Full Models and two Reduced Models. In Model 1 (the first Full Model), estimates of height and weight were first adjusted based on the predictors that were significantly related to the bias in height and weight, respectively, in step 1. BMI was then calculated using the adjusted values of height and weight. In Model 2 (the second Full Model), BMI was adjusted by regressing the predictors of the bias in BMI from step 1 directly onto measured BMI. The Reduced Models were similar, except only self-reported height, weight and BMI were used as independent predictors of the measured values. The models are shown in Table 1.
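The difference between the two approaches can be sketched as follows, in the spirit of the Reduced Models, where the self-reported value is the only predictor. The (intercept, slope) pairs are placeholders supplied by the caller, not the published coefficients, and the function names are hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    return weight_kg / (height_cm / 100.0) ** 2


def corrected_bmi_components(sr_weight, sr_height, weight_eq, height_eq):
    """Correct weight and height separately, then calculate BMI.

    weight_eq and height_eq are (intercept, slope) placeholders.
    """
    weight = weight_eq[0] + weight_eq[1] * sr_weight
    height = height_eq[0] + height_eq[1] * sr_height
    return bmi(weight, height)


def corrected_bmi_direct(sr_weight, sr_height, bmi_eq):
    """Calculate BMI from self-reports, then correct the BMI itself.

    bmi_eq is an (intercept, slope) placeholder.
    """
    return bmi_eq[0] + bmi_eq[1] * bmi(sr_weight, sr_height)


# With identity equations, both routes return the self-reported BMI
identity = (0.0, 1.0)
via_components = corrected_bmi_components(80.0, 200.0, identity, identity)
direct = corrected_bmi_direct(80.0, 200.0, identity)
```

The Full Models would add further terms (for example age or perception of weight) to each equation; the structure of the two routes stays the same.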

All analyses were run for men and women separately. Interactions and quadratic terms were tested as appropriate. All variables were entered into the models simultaneously, but only significant variables were retained to generate the final correction equations. Final models were tested to ensure they met the assumptions of independence, linearity, equal variance, and normality.

The correction equations generated from split-sample A were applied to the data in split-sample B. Descriptive statistics (means, prevalence of selected categories) were used to compare the self-reported, measured and corrected estimates of obesity. Sensitivity (proportion of obese, overweight or normal weight respondents, based on measured values, who were classified into the same category based on self-reported and corrected estimates) and specificity (proportion of non-obese, non-overweight or non-normal weight respondents who were correctly classified based on self-reported and corrected estimates) were used to determine if the corrected estimates improved BMI classification, compared with self-reported estimates. According to the World Health Organization²⁵ and Canadian classification guidelines,²⁶ respondents were categorized as underweight (BMI less than 18.5 kg/m²), normal weight (BMI 18.5 to 24.9 kg/m²), overweight (BMI 25.0 to 29.9 kg/m²) or obese (BMI 30.0 kg/m² or more).
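The classification and the two accuracy measures can be sketched as below, treating measured BMI as the reference. The cut-points are written as half-open intervals (for example, normal weight is 18.5 up to but not including 25.0), which is an assumption about how unrounded values between 24.9 and 25.0 are handled. Function names and toy values are illustrative, and the sketch is unweighted.

```python
def bmi_category(bmi):
    """Assign a BMI value to one of the four categories."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def sensitivity_specificity(measured, estimated, category):
    """Sensitivity and specificity of an estimate for one category.

    Assumes at least one respondent inside and one outside the category.
    """
    tp = fn = tn = fp = 0
    for m, e in zip(measured, estimated):
        truly_in = bmi_category(m) == category
        placed_in = bmi_category(e) == category
        if truly_in and placed_in:
            tp += 1
        elif truly_in:
            fn += 1
        elif placed_in:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy values: two respondents are obese by measurement, one is caught by self-report
measured_bmi = [31.0, 32.0, 27.0, 22.0]
self_reported_bmi = [30.5, 28.0, 26.0, 21.0]
sens, spec = sensitivity_specificity(measured_bmi, self_reported_bmi, "obese")
```

The same function can be called with corrected BMI in place of self-reported BMI to see whether the correction moves sensitivity up without lowering specificity.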

Logistic regression was then used to determine if the corrected estimates more accurately modeled the relationship between obesity and obesity-related health conditions than did the self-reported estimates. All models controlled for age and sex and examined the relationship between BMI (self-reported, measured and corrected) and one of six conditions: diabetes, heart disease, hypertension, arthritis, activity limitations, and fair or poor self-rated health. The analysis was restricted to respondents aged 40 years or older, because the six conditions are more prevalent in that age range.

Data were appropriately weighted, and all measures of variance were estimated with the bootstrap technique to account for the complex survey design.²⁷,²⁸ SAS (version 9.1) was used for all analyses.
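A minimal sketch of replicate-weight bootstrap variance estimation follows. It assumes the survey file supplies a set of bootstrap replicate weights alongside the main sampling weight, and it uses one common form of the estimator (the mean squared deviation of the replicate estimates from the full-sample estimate). The exact formula and number of replicates used in the study are not stated in the text, and all names and values here are hypothetical.

```python
def weighted_prevalence(flags, weights):
    """Weighted proportion of respondents with flag == 1."""
    return sum(f * w for f, w in zip(flags, weights)) / sum(weights)


def bootstrap_variance(flags, full_weights, replicate_weights):
    """Variance of a weighted prevalence from bootstrap replicate weights."""
    full = weighted_prevalence(flags, full_weights)
    replicates = [weighted_prevalence(flags, w) for w in replicate_weights]
    return sum((r - full) ** 2 for r in replicates) / len(replicates)


# Toy example: two respondents, two replicate weight sets (illustrative only)
obese = [1, 0]
main_weights = [1.0, 3.0]
rep_weights = [[1.0, 4.0], [3.0, 7.0]]
variance = bootstrap_variance(obese, main_weights, rep_weights)
```

Re-estimating the statistic under each replicate weight set captures the effect of the clustered, multi-stage design without needing the design details themselves.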

Definitions

The socio-demographic variables included age (divided into seven groups: 18 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, 65 to 74, and 75 years or older); level of education (less than secondary graduation, secondary graduation, some postsecondary, and postsecondary graduation); geographic region (Atlantic, Québec, Ontario, West and British Columbia); urban or rural area; employment status the week before the interview (full-time, part-time or not working); immigrant status (10 or fewer years in Canada, more than 10 years in Canada and Canadian-born); ethnicity (collapsed because of sample size into White, East and South East Asian, and Other); and household income. Household income groups were derived by dividing total household income from all sources in the previous 12 months by Statistics Canada's low-income cutoff (LICO) specific to the number of people in the household, the size of the community, and the survey year. These adjusted income quotients were grouped into deciles.

The health variables were self-reported health status and mental health status (dichotomized into fair/poor versus good/very good/excellent); activity limitations imposed by a long-term health problem (sometimes/often versus never); smoking status (daily/occasional versus non-smoker); self-perceived stress (most days are quite a bit/extremely stressful versus a bit/not very stressful); life satisfaction (dissatisfied/very dissatisfied versus satisfied/very satisfied); perception of weight (overweight, underweight, or about right); number of physician consultations in the past year (continuous); and chronic conditions (asthma, arthritis/rheumatism, hypertension, diabetes, heart disease, cancer, mood disorders). Sample sizes were too small to examine associations with eating disorders.

Leisure-time physical activity level was based on total energy expenditure (EE) during leisure time. EE was calculated from the reported frequency and duration of all of a respondent's leisure-time physical activities in the three months before the 2005 CCHS interview and the metabolic energy demand (MET value) of each activity, which was independently established.²⁹

EE = ∑(N_i*D_i*MET_i / 365 days), where
N_i = number of occasions of activity i in a year,
D_i = average duration in hours of activity i, and
MET_i = a constant value for the metabolic energy cost of activity i.

An EE of 3 or more kilocalories per kilogram per day (KKD) was defined as active; 1.5 to 2.9 KKD, moderately active; and less than 1.5 KKD, inactive.
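The EE formula and the three activity levels can be worked through in a short sketch. The activity tuples are hypothetical, and the treatment of values between 2.9 and 3.0 KKD (classified here as moderately active) is an assumption, since the text gives the ranges to one decimal place.

```python
def energy_expenditure(activities):
    """Daily leisure-time energy expenditure in kcal/kg/day (KKD).

    activities: iterable of (N, D, MET) tuples, where
      N   = number of occasions of the activity in a year,
      D   = average duration of the activity in hours,
      MET = metabolic energy cost of the activity.
    """
    return sum(n * d * met for n, d, met in activities) / 365.0


def activity_level(ee):
    """Classify EE into the three leisure-time activity levels."""
    if ee >= 3.0:
        return "active"
    if ee >= 1.5:
        return "moderately active"
    return "inactive"


# Hypothetical respondent: one hour a day of an activity with a MET value of 3
daily_hour = [(365, 1.0, 3.0)]
ee = energy_expenditure(daily_hour)   # 365 * 1.0 * 3.0 / 365 = 3.0 KKD
level = activity_level(ee)            # "active"
```

Halving the duration to 30 minutes a day gives 1.5 KKD, which sits at the lower edge of the moderately active range.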

The influence of end-digit preference (the tendency to round responses to numbers ending in 0 and 5) was examined for weight, because past research has associated it with a reporting bias.⁸,¹⁰,³⁰ The majority of CCHS respondents (73% of men and 67% of women) reported values for their weight that ended in 0 or 5, although it would be expected that, by chance, only about 20% of respondents would have end-digits of 0 or 5.
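An end-digit preference indicator can be sketched as below. It assumes the flag is derived from the weight in the unit the respondent used (pounds or kilograms) and rounded to a whole number; the text does not specify how the variable was constructed, and the names and values are hypothetical.

```python
def ends_in_0_or_5(reported_weight):
    """True if the reported weight, as a whole number, ends in 0 or 5."""
    return int(round(reported_weight)) % 5 == 0


def share_with_end_digit_preference(reported_weights):
    """Proportion of reported weights ending in 0 or 5."""
    flags = [ends_in_0_or_5(w) for w in reported_weights]
    return sum(flags) / len(flags)


# Toy reported weights in pounds (illustrative only)
reported = [150, 155, 162, 170, 181]
share = share_with_end_digit_preference(reported)
```

Two of the ten possible end digits are 0 or 5, which is where the chance expectation of about 20% comes from; a share well above that suggests rounding.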

Results

Consistent with past research, mean values of self-reported height were overestimated, while weight and BMI were underestimated. Men overestimated their height by 1.08 cm, and underestimated their weight by 1.84 kg and hence, their calculated BMI by 0.94 kg/m². For women, height was overestimated by 0.56 cm, and weight and BMI were underestimated by 2.47 kg and 1.19 kg/m², respectively.

The regression results derived from split-sample A that were used to establish the correction equations for weight are shown in Table 2. In the Full Models for men, self-reported weight, age and the respondents' perception of being over- or underweight were significant predictors of measured weight. Those who perceived themselves as overweight tended to underestimate their weight, and those who perceived themselves as underweight tended to overestimate their weight; the model adjusted these values up or down as appropriate. The adjusted R² was .95 for both the Full and Reduced Models.

For women, factors associated with measured weight were self-reported weight, the perception of being overweight, and end-digit preference (the model added a positive adjustment to self-reported weight to compensate for this tendency). The adjusted R² for women for both the Full and Reduced Models was .97.

Results for height are found in Table 3. Among men, self-reported height, age and life dissatisfaction were significant predictors of measured height, with a negative adjustment related to age and a positive adjustment for those who reported being dissatisfied with their lives. The adjusted R² was .82 for the Full Model and .81 for the Reduced Model. For women, all age groups were significantly associated with measured height except for 45 to 54 years. Also significant were those whose ethnicity was a group other than White or East/South East Asian, and those who reported an activity limitation.

For BMI (Table 4), the Full Models adjusted self-reported estimates down for men who were dissatisfied with life and who perceived themselves as underweight, and positive adjustments were made for age. For women, significant predictors of measured BMI were self-reported BMI, education, perception of being overweight, and end-digit preference. The R² was higher for the female than the male models, but in both cases, was similar for the Full and Reduced Models.

To generate the final equations, adjustments were made for all of the variables in Tables 2 to 4. The final equations are shown in Table 5.

These equations were applied to data in split-sample B to generate corrected estimates of mean height, weight and BMI (Table 6). In all cases, self-reported estimates were statistically different from the measured values, and the corrected estimates were closer than the self-reported estimates to the measured values. In all but one case (the difference in BMI for females in Model 3), the corrected and measured means were not statistically different.

Among men, the proportion who were obese was 13.8% according to self-reported data and 23.1% according to measured data (Table 7); the corrected data generated estimates ranging from 19% to 22%. Self-reported, measured and corrected data yielded similar rates of overweight among men. However, self-reported data overestimated the percentage of men in the normal weight range; the corrected data reduced this bias by 9 to 11 percentage points, with the result that the corrected and measured estimates were similar.

Among women, the proportion who were obese was 12.5% according to self-reported data and 18.9% according to measured data; the corrected data generated estimates ranging from 18.2% to 18.7%. Similarly, for overweight, corrected values were closer than self-reported values to the measured prevalence, with a slight 1- to 2-percentage-point overestimate in the corrected values. Sample sizes in the underweight category were too small to generate reliable estimates.

Sensitivity values in the normal weight category for self-reported data were 93.9% for men and 91.8% for women (Table 8), meaning that in most cases, self-reports correctly classified people of normal weight into the normal weight category.

Sensitivities for the overweight and obese categories fell to 71.1% and 58.7% for men, and to 62.6% and 68.5% for women. When the data were corrected, sensitivities increased: the corrected numbers accurately classified as many as 86.1% of obese women, 76% of obese men, 79.7% of overweight women, and 82.8% of overweight men. However, the corrected estimates reduced sensitivities for those in the normal weight range.

Specificities were highest for the underweight and obese categories (Table 8), indicating that it is rare for someone to be classified into these groups based on self-reports unless they actually are underweight or obese.

Table 9 displays adjusted odds ratios relating self-reported, measured and corrected BMI to six obesity-related health conditions. An earlier study¹⁶ demonstrated that self-reported BMI exaggerates the relationship between obesity and these health conditions. Unique to the present analysis is that the models have been re-generated based on the corrected estimates. Compared with the odds ratios from the self-reported models, the odds ratios for the corrected models are reduced in most cases (that is, they are closer to the measured values). Arthritis is an exception, with the corrected estimates inflating the relationships for those who are overweight or obese (class II or III: BMI 35 kg/m² or more) even more than what they would be if based on self-reports. In addition, the odds ratios for obese class I are higher than the self-reported odds ratios for diabetes in Models 1 and 2, and for high blood pressure in Models 3 and 4.

Measured height and weight data were available for only a subsample of the 2005 CCHS. The ultimate goal of developing correction equations is to be able to apply them to the broader survey. When applied to the full sample of the 2005 CCHS (without different adjustments for telephone and in-person interviews) for respondents who were 18 years or older and who were not pregnant or breastfeeding (n=118,383), the models generated obesity estimates similar to, although slightly lower than, the measured values (Table 10). Based on data from both split-sample A and B, the self-reported prevalence of obesity was 16% for both sexes, while the measured prevalence was 25.6% for men and 22.3% for women. The models generated obesity rates of approximately 23% for men and 21% for women.

Limitations

The response rate for the measured height and weight subsample of the CCHS was only 64%. If people who agreed to participate had different height and weight profiles than did those who refused, the sample could be biased. The self-reported prevalence of obesity among everyone who was selected to have their height and weight measured was 15.9%: 19.1% among non-respondents and 14% among respondents. However, when the special sampling weight was applied to those who underwent the physical measures, the prevalence of obesity based on self-reported data rose to 15.2%, comparable to that for the entire subsample.¹²

Bias in self-reported height may be due to inconsistent rounding between self-reported and measured data. When half-inches were reported, interviewers asked respondents to round up to the nearest inch, but for the measured values, height was recorded to the nearest 0.5 cm. Moreover, because interviewers recorded self-reported height only in metres, it was impossible to determine how many people reported in feet and inches and thereby assess the extent of this rounding bias.

For measured weight, it is not known if interviewers consistently asked respondents to empty their pockets and remove their footwear. And for self-reported weight, it is not known if respondents reported their weight with or without clothing, since interviewers told them to report their weight without clothing only if they asked.

Although interviewers were trained in the correct procedures for measuring height and weight, and the weigh scales and measuring tapes were calibrated, intra- and inter-interviewer reliability was not assessed.

BMI is commonly used as a measure of obesity on population surveys, but it has limitations: it cannot distinguish between muscle mass and fat, nor does it consider fat distribution.²⁶

Finally, the models generated for this article were limited to the variables collected in the CCHS. It is possible that additional variables that were not part of the survey could be associated with the bias in weight, height or obesity.

Discussion

BMI calculated from self-reported height and weight underestimates obesity prevalence. This has implications for our understanding of the burden of obesity and the relationship between obesity and obesity-related health conditions. This study examined the feasibility of applying correction factors to self-reported estimates to determine if they could be adjusted to more closely approximate measured values.

In each of the four models tested, and in all analyses undertaken, the corrected estimates provided more accurate measures of overweight and obesity than did the self-reported values. However, this was not the case for the normal weight category. The sensitivity values for the normal weight population fell to as low as 84% in men (a 10-percentage-point decrease) and to 83% in women (a 9-percentage-point decrease). Kuskowska-Wolk et al. also found a reduction in sensitivity for normal weight individuals.¹⁹ We hypothesized that the decline in sensitivity was because heavier individuals have a greater reporting bias¹² (a greater tendency to underestimate their BMI), and therefore, different adjustments may be required depending on where the individual lies on the BMI distribution. Without these differing adjustments, sensitivity declines when a small proportion of normal weight individuals are erroneously shifted to the overweight category. We attempted to address this by incorporating polynomial regressions (quadratic terms for self-reported weight) and spline regression to determine if different slopes could be generated for different weight ranges. The quadratics and differential slopes were not significant, and we were unable to refine the estimates for those in the normal weight range. Therefore, although the adjustments improve the estimates for those who are overweight or obese, the non-adjusted numbers provide better estimates for respondents in the normal weight category because the reporting bias is smaller in this group. Further research is needed to better understand how to improve self-reported overweight and obesity estimates without decreasing sensitivity for those in the normal weight range. More research is also required to determine if differential adjustments are necessary for respondents who were interviewed by telephone.

Despite this drawback, the improvement in classification for overweight and obese individuals is significant, and thus, we recommend the use of corrected estimates in addition to self-reported values in studies examining overweight and obesity in the adult population of the 2005 CCHS. We attempted to adjust for independent variables that were related to the reporting bias, but the R² of the Full Models (Models 1 and 2) was either the same as or only slightly higher than that of the Reduced Models (Models 3 and 4, which used only weight, height or BMI). In most cases, including the extra variables offered no predictive advantage. Plankey et al.³¹ also found that more complex models (including self-reported BMI and additional covariates) only minimally improved predictive ability. Of the models we tested, all four generated similar means, prevalence rates and sensitivity values; no model stood out as being consistently superior. Model 4, however, had the further advantage of being the most parsimonious and would therefore offer the greatest utility if it is determined that the equations are generalizable.

This method of generating corrected estimates (linear regression with measured BMI as the outcome) has been used in the past,¹⁰,¹⁹,³¹⁻³⁴ but to our knowledge, has never been attempted on data for the Canadian population. Plankey et al.³¹ concluded that a systematic error was associated with the reporting bias, which was impossible to correct with this method. However, in their work, the self-reported sensitivity values for the obese population (BMI 27.3 kg/m² or more) were 80% in men and 85% in women and increased only marginally with the corrected models. By contrast, in the current study, self-reported sensitivity of obesity was much lower (59% for men and 69% for women), and the correction equations increased these values significantly. Also, the reporting bias in our study was two to three times larger than that in the 1976-1980 NHANES II, on which the analysis of Plankey et al. was based.

The generalizability of these equations has not been determined. Some authors³³ assume transportability, while others³⁰ have shown that correction equations are applicable only to the population for which they have been established. In one Swedish study,³² researchers demonstrated that because height was under- rather than over-reported in that country, self-reported estimates of BMI did not require calibration.

More research using Canadian data is required to determine if these equations are stable across Canadian populations and over time. It is probable that the increase in obesity in recent years³⁵ has been accompanied by a corresponding increase in reporting bias, which could indicate temporal instability in the equations. At least one study that has examined the bias over time has found that it has increased.³⁶

In the interim, surveys that collect self-reported and measured height and weight would benefit from standardization of protocols to ensure that equipment is regularly calibrated and that respondents are asked to report their weight in a consistent way and are measured in light clothing, without shoes. Rounding should also be minimized, if not eliminated.

Conclusion

Although measured data for height and weight provide the most accurate estimates of the prevalence of obesity based on BMI, the costs of collecting such data are often prohibitive for large population-based surveys. Corrected estimates, though not identical to measured BMI values, are a significant improvement over estimates based on self-reported data, which substantially underestimate obesity prevalence and overestimate the relationship between obesity and disease.