The mean of a numeric variable is calculated by adding the values of all observations in a data set and then dividing that sum by the number of observations in the set. This provides the average value of all the data.
There are two types of variables—discrete and continuous. Discrete variables are defined as variables that cannot be divided internally. For example, a hockey player can score 1 or 2 goals, but never 1 and a half goals. Continuous variables, however, can be divided into smaller units. A student's age can be 11 years, 7 months and 3 days, as opposed to just 11 or 12 years.
It is important that you understand the difference between these two types of variables, so that you can properly calculate the mean in any given situation. The following examples use discrete variables to calculate the mean:
Example 1 – Soccer tournament at Mount Rival I
Example 2 – Traffic fatalities
Example 3 – Soccer tournament at Mount Rival II
Example 4 – Height of 50 Grade 10 girls
Summary
Mount Rival hosts a soccer tournament each year. This season, in 10 games, the lead scorer for the home team scored 7, 5, 0, 7, 8, 5, 5, 4, 1 and 5 goals. What was the mean score?
Mean = (7 + 5 + 0 + 7 + 8 + 5 + 5 + 4 + 1 + 5) ÷ 10
= 47 ÷ 10
= 4.7
Therefore, in the 10-game tournament, the player scored an average of 4.7 goals per game. The average of 4.7 is not a whole number so it only has meaning in a statistical sense. In reality, it is impossible to score 4.7 goals, even if you are a top scorer.
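The mathematical notation to calculate the mean for a discrete variable is as follows:

$$\bar{x} = \frac{\sum x}{n}$$

where $\bar{x}$ stands for the mean,
x stands for an observed value,
n stands for the number of observations in the data set, and
Σ (the Greek letter sigma) stands for the sum over all observations.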
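The soccer calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method; the names `goals` and `mean` are chosen here for the example and do not come from the original text.

```python
# Goals scored by the lead scorer in each of the 10 games (from Example 1).
goals = [7, 5, 0, 7, 8, 5, 5, 4, 1, 5]


def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_goals = mean(goals)
print(mean_goals)  # 4.7
```

Python's standard library also offers `statistics.mean`, which gives the same result for this data.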
The following table lists the number of people killed in traffic accidents over a 10-year period. During this time period, what was the average number of people killed per year? How many people died each day on average in traffic accidents during this time period?
| Year | Fatalities |
|---|---|
| 1 | 959 |
| 2 | 1,037 |
| 3 | 960 |
| 4 | 797 |
| 5 | 663 |
| 6 | 652 |
| 7 | 560 |
| 8 | 619 |
| 9 | 623 |
| 10 | 583 |
Using the formula to calculate the mean for discrete variables, you can see that:
Mean = (959 + 1,037 + 960 + 797 + 663 + 652 + 560 + 619 + 623 + 583) ÷ 10
= 7,453 ÷ 10
= 745.3
The average number of people killed per year is 745.3.
To calculate the daily death rate from traffic accidents, the average yearly death rate is divided by the number of days in a year (leap years are ignored).
Daily average = 745.3 ÷ 365
≈ 2.0
Therefore, on average, 2 people died each day in traffic accidents.
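As a rough check of the arithmetic in this example, the yearly and daily averages can be recomputed in Python. The variable names are illustrative, and the 365-day year follows the text's choice to ignore leap years.

```python
# Fatalities for years 1 to 10, taken from the table above.
fatalities = [959, 1037, 960, 797, 663, 652, 560, 619, 623, 583]

DAYS_PER_YEAR = 365  # leap years are ignored, as in the text

# Mean number of deaths per year: total deaths divided by number of years.
yearly_mean = sum(fatalities) / len(fatalities)

# Mean number of deaths per day: yearly mean divided by days in a year.
daily_mean = yearly_mean / DAYS_PER_YEAR

print(yearly_mean)           # 745.3
print(round(daily_mean, 1))  # 2.0
```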
A frequency table lists the number of observations that take each value, or fall within each class, in a data set. It can be used with grouped or ungrouped variables.
For example, to provide a frequency table of the age of people in a data set, you can produce a table using the exact age (ungrouped), or you can group the ages (grouped).
An ungrouped variable can be regarded as a special type of grouped variable in which each group contains a single value. You can calculate the mean of a discrete variable using a frequency table. For an ungrouped variable this method gives the exact mean; for a grouped variable it gives an approximation of the true mean. How accurate the approximation is depends on how evenly the observed values are spread within each group.
Grouping observations in tables is useful when dealing with large amounts of data. The goal-scoring figures from the soccer tournament example can be displayed in a frequency table.
| Number of goals (x) | Frequency (f) | Total number of goals (xf) |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 1 | 1 |
| 4 | 1 | 4 |
| 5 | 4 | 20 |
| 7 | 2 | 14 |
| 8 | 1 | 8 |
| Total | 10 | 47 |
Because the observations are grouped, the mathematical notation changes slightly.
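For a discrete variable in a frequency table, the mean $\bar{x}$ is calculated as follows:

$$\bar{x} = \frac{\sum xf}{\sum f}$$

where x stands for an observed value and f stands for the frequency of that value.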
The calculation for the mean of the player's goals is:
Mean = (0 + 1 + 4 + 20 + 14 + 8) ÷ (1 + 1 + 1 + 4 + 2 + 1)
= 47 ÷ 10
= 4.7
Since the variable is ungrouped, this is the exact mean. The next example shows what happens when working with grouped variables.
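The frequency-table calculation can be sketched as follows. The dictionary layout and the function name `frequency_mean` are assumptions made for this illustration; the data are the goal counts and frequencies from the soccer example.

```python
# Frequency table from the soccer example: goals scored (x) -> number of games (f).
goal_frequencies = {0: 1, 1: 1, 4: 1, 5: 4, 7: 2, 8: 1}


def frequency_mean(table):
    """Return sum(x * f) / sum(f) for a mapping of value -> frequency."""
    total_xf = sum(x * f for x, f in table.items())  # total number of goals
    total_f = sum(table.values())                    # total number of games
    if total_f == 0:
        raise ValueError("the frequency table contains no observations")
    return total_xf / total_f


mean_goals_from_table = frequency_mean(goal_frequencies)
print(mean_goals_from_table)  # 4.7
```

Because each row of the table holds a single exact value, the result matches the mean computed directly from the 10 individual scores.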
The following table shows the heights of 50 randomly selected Grade 10 girls. What is the mean height of the girls?
Determine the midpoint of each class interval for a variable before calculating the mean from a frequency table.
| Height (cm) | Midpoint (x) | Frequency (f) | Midpoint × frequency (xf) |
|---|---|---|---|
| 150 –< 155 | 152.5 | 4 | 610.0 |
| 155 –< 160 | 157.5 | 7 | 1,102.5 |
| 160 –< 165 | 162.5 | 18 | 2,925.0 |
| 165 –< 170 | 167.5 | 11 | 1,842.5 |
| 170 –< 175 | 172.5 | 6 | 1,035.0 |
| 175 –< 180 | 177.5 | 4 | 710.0 |
| Total | - | 50 | 8,225.0 |
The calculation is the same as that used in the soccer tournament example above, except that xf is now the midpoint of the interval multiplied by the frequency of that interval. This approximation is required because we do not know the exact height of each girl.
As a result, we must treat all of the heights as if they were midpoints for their interval. For example, because there are four girls in the interval of 150 –< 155 cm, we will treat each of the four girls as measuring 152.5 cm. As was mentioned in the soccer tournament example, the accuracy of the approximation of the mean will depend on how close each of the girls is to the midpoint of her interval.
Thus,
Mean = (610.0 + 1,102.5 + 2,925.0 + 1,842.5 + 1,035.0 + 710.0) ÷ (4 + 7 + 18 + 11 + 6 + 4)
= 8,225.0 ÷ 50
= 164.5 cm
Therefore, the mean height of the 50 girls in Grade 10 is 164.5 cm.
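The grouped calculation can be sketched in Python as well. The tuple layout `(lower, upper, frequency)` and the function name `grouped_mean` are assumptions for this illustration; the class limits and frequencies are taken from the height table.

```python
# Height classes from the table: (lower bound, upper bound, frequency), in cm.
height_classes = [
    (150, 155, 4),
    (155, 160, 7),
    (160, 165, 18),
    (165, 170, 11),
    (170, 175, 6),
    (175, 180, 4),
]


def grouped_mean(classes):
    """Approximate the mean by treating every observation as its class midpoint."""
    total_xf = 0.0
    total_f = 0
    for lower, upper, frequency in classes:
        midpoint = (lower + upper) / 2  # e.g. 152.5 for the class 150 to 155
        total_xf += midpoint * frequency
        total_f += frequency
    if total_f == 0:
        raise ValueError("the frequency table contains no observations")
    return total_xf / total_f


mean_height = grouped_mean(height_classes)
print(mean_height)  # 164.5
```

The result is only an approximation of the true mean, since the individual heights within each class are unknown.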
The mean is used in computing other statistics (such as the variance) and does not exist for open-ended grouped frequency distributions. It is often not the most appropriate measure for skewed (unbalanced) distributions such as salary information. (See Measures of spread for more information on variance.)