What are confidence intervals and margins of error?
These indicate the precision of a survey's results. The confidence level should always be reported as part of the margin of error statement. The confidence level is often stated as 19 times in 20 (95% confidence level) or 9 times in 10 (90% confidence level). For a given sample result, the higher the confidence level is, the larger the margin of error. Remember, however, that the confidence level and margin of error reflect sampling error only; they say nothing about non-sampling errors such as non-response or poorly worded questions.
Example: A survey recently published by XYZ Consultants found that 73% of Canadians regularly watch ice hockey games on television but only 2% watch field hockey.
The survey interviewed a representative sample of 1,200 Canadian adults and has a margin of error of plus or minus 3 percentage points, 19 times out of 20, i.e., 95% of the time.
This means that 73% is our best estimate of the percentage of ice hockey viewers in the whole population, and the true value is expected to lie within 3 percentage points of that number, in other words between 70% and 76%, at a confidence level of 95%.
Strictly speaking, the 95% refers to the sampling procedure rather than to this one interval: if the same procedure were repeated many times, about 95 out of every 100 of the confidence intervals it produced would include the true value.
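As a rough check on the XYZ figures, the sketch below computes an approximate margin of error for a proportion. It assumes a simple random sample and the normal approximation (z = 1.96 for 95% confidence, 1.645 for 90%). Neither assumption is stated in the survey description, and the function name margin_of_error is purely illustrative.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from
    a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1200  # sample size from the XYZ example

# Conservative margin: p = 0.5 gives the widest interval.
moe_95 = margin_of_error(0.5, n)            # 95% confidence
moe_90 = margin_of_error(0.5, n, z=1.645)   # 90% confidence

# Margin specific to the 73% ice hockey estimate.
moe_ice = margin_of_error(0.73, n)

print(f"95% margin (p=0.5):  {100 * moe_95:.1f} points")
print(f"90% margin (p=0.5):  {100 * moe_90:.1f} points")
print(f"95% margin (p=0.73): {100 * moe_ice:.1f} points")
```

Under these assumptions the conservative 95% margin for 1,200 respondents is a little under 3 percentage points, consistent with the reported plus or minus 3, and the 90% margin is smaller, as the text describes.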
What is a coefficient of variation?
A coefficient of variation (CV) is simply the margin of error expressed as a percentage of the estimate to which it refers. In the XYZ example, with an estimate of 73% and a margin of error of 3 percentage points, the CV is 100 × 3 / 73, or about 4.1% of the estimated level of ice hockey viewing.
The CV is useful in the interpretation of relative levels of precision, especially when widely varying quantities are being compared.
Example: In a province there may be an estimated 50,000 people unemployed with a margin of error of 1,300 people. At the same time, that province's estimated unemployment rate is 8% with a margin of error of 0.2 percentage points. It is difficult to compare these numbers directly. However, the CV of the estimated number of unemployed is 2.6%, while the CV of the estimated unemployment rate is 2.5%. (They need not be equal.) This shows that the two estimates have essentially the same level of precision.
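The CV figures quoted above can be reproduced with a few lines of arithmetic. The sketch follows the definition used in this text (margin of error divided by the estimate); many references define the CV from the standard error instead, so the helper name and its definition should be read as an assumption tied to this document.

```python
def coefficient_of_variation(estimate, margin):
    """CV as defined in this text: the margin of error expressed
    as a percentage of the estimate it refers to."""
    return 100 * margin / estimate

cv_hockey = coefficient_of_variation(73, 3)          # ice hockey viewing
cv_count = coefficient_of_variation(50_000, 1_300)   # number unemployed
cv_rate = coefficient_of_variation(8, 0.2)           # unemployment rate

print(f"Ice hockey viewing: {cv_hockey:.1f}%")
print(f"Number unemployed:  {cv_count:.1f}%")
print(f"Unemployment rate:  {cv_rate:.1f}%")
```

Putting the count and the rate on the same relative scale is what makes the two precisions comparable.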
What was the achieved response rate for the survey?
Response rates are important for a number of reasons:
- Non-respondents may be different from respondents in ways that can affect the survey results. Determine what techniques were applied to maximize response rates.
- A low response rate can be more damaging to data quality than a small sample size by contributing to total survey error.
- An unexpectedly high response rate can be indicative of other problems, as might be the case in quota sampling.
Example: If the survey results are based on an apparent 100% response rate obtained by interviewing the first 1,000 people willing to respond, then the results should be interpreted with caution. Such quota sampling provides no information about how many people were approached in total in order to get the 1,000 interviews. Nor does it provide any information about how the respondents may differ from those who did not respond.
Can statistics be misused?
Yes. For this reason you should request and use statistics that are produced with professional and scientific rigour, commensurate with their use. You should question what a statistic represents, how it was calculated, and its strengths and limitations. Some say that "some statistical information is better than none at all." This statement is true only to the extent that the user is aware of the limitations of the statistics and the risk of using them in their particular context.
Here are a few examples where statistics are to be interpreted or used with caution:
Representing an average
When reporting on salaries in a company, Person A claims that the average salary is over $60,000, Person B claims that the average worker gets $28,000, and Person C claims that "most" employees get only $26,000. Any or all of these statements may be true at the same time. How can we make sense of this? First, each person is trying to convey a single numerical representation of the salaries. Person A actually reports the mean, which is the sum of all salaries divided by the number of paid employees, including the CEO who makes $900,000. Person B reports the median, meaning that half of employees make less than $28,000 and half make more. Finally, Person C reports the mode, which is the most frequent or typical salary in the company. The mean, mode and median are clearly defined statistical concepts; the average is not.
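To see how all three claims can hold at once, consider the following sketch. The payroll is entirely hypothetical: 21 salaries invented for illustration, chosen only so that they include a $900,000 CEO and reproduce the mean, median and mode described above.

```python
import statistics

# Hypothetical payroll of 21 employees (illustrative values only).
salaries = (
    [22_000] * 2 + [24_000] * 2 + [26_000] * 5 + [27_000]
    + [28_000] + [30_000] * 3 + [35_000] * 2 + [40_000] * 2
    + [50_000, 75_000, 900_000]   # the last value is the CEO
)

mean_salary = statistics.mean(salaries)      # pulled upward by the CEO
median_salary = statistics.median(salaries)  # the middle (11th) salary
mode_salary = statistics.mode(salaries)      # the most frequent salary

print(f"Mean:   ${mean_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")
print(f"Mode:   ${mode_salary:,.0f}")
```

In this made-up payroll the mean is well over $60,000 while the median is $28,000 and the mode is $26,000, so Persons A, B and C are each reporting a legitimate but different summary of the same data.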
Exaggerating the precision
In a quick poll, 57.14% preferred X and 42.86% preferred Y. In fact, this could mean that 4 of the 7 persons interviewed preferred X over Y. If only one more person had preferred Y over X, the results would have been totally reversed. The size of the sample is far too small to support the level of precision expressed by the proportions.
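The arithmetic behind the quick poll can be sketched as follows. The rough margin at the end uses the normal approximation with z = 1.96, which is an assumption and is itself crude for a sample of 7; it is shown only to convey the order of magnitude.

```python
import math

n = 7        # persons interviewed
for_x = 4    # persons preferring X

pct_x = round(100 * for_x / n, 2)         # reported share for X
pct_y = round(100 * (n - for_x) / n, 2)   # reported share for Y

# Each single respondent moves the result by this many points.
step = 100 / n

# Very rough 95% margin of error, in percentage points.
p = for_x / n
rough_margin = 100 * 1.96 * math.sqrt(p * (1 - p) / n)

print(f"X: {pct_x}%  Y: {pct_y}%")
print(f"One respondent is worth {step:.1f} points")
print(f"Rough margin of error: +/- {rough_margin:.0f} points")
```

With one respondent worth more than 14 percentage points, quoting the result to two decimal places suggests a precision the sample cannot deliver.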
Finding the answer you want
"Seven out of ten dentists prefer Toothpaste X." How many different times did Toothpaste Company X ask groups of 10 dentists about their preferences before finally finding one group with 7 in favour?"
Up and down
Mr. A's income dropped by 40% from 2009 to 2010, but in 2011 it rose by 50%, so he's better off than ever. Is this so? A 40% drop from $100,000 took him down to $60,000. A 50% increase on that lower amount then brought him back up to only $90,000, so he is still down by 10%.
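The same reasoning in a few lines of arithmetic, using the $100,000 starting income from the example. The variable names are illustrative.

```python
start = 100_000

after_drop = start * (1 - 0.40)        # income after the 40% drop
after_rise = after_drop * (1 + 0.50)   # income after the 50% rise

# Net change relative to the starting income.
net_change = (after_rise - start) / start

# Rise that would have been needed to get back to the starting income.
required_rise = start / after_drop - 1

print(f"After the drop: ${after_drop:,.0f}")
print(f"After the rise: ${after_rise:,.0f}")
print(f"Net change:     {net_change:.0%}")
print(f"Rise needed to recover fully: {required_rise:.1%}")
```

Percentage changes apply to different bases, so they cannot simply be added: recovering from a 40% drop takes a rise of about 67%, not 40% or 50%.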