Statistics Canada

Scatterplots

In science, the scatterplot is widely used to present measurements of two or more related variables. It is particularly useful when the variable on the y-axis is thought to depend on the variable on the x-axis (usually the independent variable).

In a scatterplot, the data points are plotted but not joined; the resulting pattern indicates the type and strength of the relationship between two or more variables (see Figure 1 below).

Figure 1. Car ownership in Anytowne, by household income.

Car ownership increases as the household income increases, showing that there is a positive relationship between these two variables.
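To make the idea of plotting points without joining them concrete, here is a minimal sketch that draws a text-only scatterplot. The function name `ascii_scatter`, the grid size, and the income and car values are all hypothetical illustrations; they are not the Anytowne data behind Figure 1.

```python
def ascii_scatter(points, width=20, height=8):
    """Render (x, y) points as a text grid; points are marked but never joined."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column or row index; "or 1" guards a zero range.
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)

# Hypothetical values: (household income in $ thousands, cars owned).
households = [(20, 0), (35, 1), (50, 1), (65, 2), (80, 2), (95, 3)]
plot = ascii_scatter(households)
print(plot)
```

The marks rise from the lower left to the upper right, which is the visual signature of a positive relationship.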

The pattern of the data points on the scatterplot reveals the relationship between the variables. Scatterplots can illustrate various patterns and relationships, such as:

Data correlation

The more closely the data points fall along a straight line on the graph, the stronger the linear relationship between the variables and the higher the correlation (Figure 2).

Figure 2. Scatterplot showing a strong linear relationship between variables.
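The strength of a linear relationship is commonly summarized by the Pearson correlation coefficient, r. The sketch below computes it in plain Python. The function name `pearson_r` and the sample values are assumptions made for illustration, not data from this page.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: points lying exactly on the line y = 2x + 1.
x = [1, 2, 3, 4, 5]
y_on_line = [3, 5, 7, 9, 11]
# The same x values with y moved off the line.
y_off_line = [3, 6, 6, 10, 10]

r_line = pearson_r(x, y_on_line)
r_off = pearson_r(x, y_off_line)
print(round(r_line, 3), round(r_off, 3))
```

Points that sit exactly on a rising line give r = 1; moving them off the line pulls r below 1.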

Positive or direct relationships

If the points cluster around a line that runs from the lower left to the upper right of the graph area, the relationship between the two variables is positive or direct (Figure 3). An increase in the value of x tends to be associated with an increase in the value of y. The closer the points are to the line, the stronger the relationship.

Figure 3. Scatterplot showing a positive or direct relationship between variables.

Negative or inverse relationships

If the points tend to cluster around a line that runs from the upper left to the lower right of the graph, the relationship between the two variables is negative or inverse (Figure 4). An increase in the value of x tends to be associated with a decrease in the value of y.

Figure 4. Scatterplot showing a negative or inverse relationship between variables.
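One way to check the direction of a relationship numerically is to fit a least-squares line and look at the sign of its slope. The following is a sketch under that assumption. The names `least_squares_slope` and `direction` and the sample values are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line fitted to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def direction(xs, ys):
    """Classify the relationship by the sign of the fitted slope."""
    slope = least_squares_slope(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"

# Hypothetical values for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # trends from lower left to upper right
falling = [9, 7, 6, 4, 3]   # trends from upper left to lower right

print(direction(x, rising), direction(x, falling))
```

The slope has the same sign as the correlation coefficient, so either quantity can be used to tell a direct relationship from an inverse one.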

Scattered data points

If the data points are randomly scattered, there is little or no relationship between the two variables; the correlation between them is low or zero (Figure 5).

Figure 5. Scatterplot showing a low or zero correlation between two variables.

Non-linear patterns

Very low or zero correlation may result from a non-linear relationship between two variables. If the relationship is, in fact, non-linear (i.e., points clustering around a curve, not a straight line), the linear correlation coefficient will not be a good measure of the strength of the relationship (Figure 6).

Figure 6. Scatterplot showing a non-linear relationship between two variables.
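A small example shows why the linear correlation coefficient can mislead here. In the sketch below, y is determined exactly by x through y = x², yet the coefficient comes out as zero because the x values are symmetric about zero. The helper name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes neither variable is constant)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical values: a perfect curve, with x symmetric about zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r = pearson_r(x, y)
print(r)
```

The relationship is perfectly predictable, but it is not linear, so a coefficient near zero should prompt a look at the scatterplot rather than a conclusion that the variables are unrelated.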

Spread of data

A scatterplot also shows whether the data are widely spread or concentrated within a smaller area (Figures 7 and 8).

Figure 7. Scatterplot showing widely spread data.

Figure 8. Scatterplot showing concentrated data.
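Spread can also be put into numbers. The sketch below reports the range and the population standard deviation along each axis, using Python's standard `statistics` module. The function name `spread` and both data sets are hypothetical.

```python
import statistics

def spread(points):
    """Summarize how widely a set of (x, y) points is spread along each axis."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "x_range": max(xs) - min(xs),
        "y_range": max(ys) - min(ys),
        "x_sd": statistics.pstdev(xs),  # population standard deviation
        "y_sd": statistics.pstdev(ys),
    }

# Hypothetical values for illustration only.
wide = [(1, 2), (10, 25), (20, 5), (30, 40), (40, 15)]
tight = [(18, 19), (20, 21), (21, 20), (19, 22), (22, 20)]

wide_summary = spread(wide)
tight_summary = spread(tight)
print(wide_summary["x_range"], tight_summary["x_range"])
```

Larger ranges and standard deviations correspond to points that occupy more of the graph area.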

Outliers

In addition to revealing non-linear relationships, a scatterplot can show whether the data contain any outliers (Figure 9).

Figure 9. Scatterplot showing an outlier in the data.
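Outliers that are visible on a scatterplot can also be flagged numerically. The sketch below uses one possible rule, which is an assumption and not a method described on this page: fit a least-squares line and flag any point whose residual is more than two residual standard deviations from it. The names `fit_line` and `find_outliers`, the threshold of 2, and the data are all hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def find_outliers(xs, ys, threshold=2.0):
    """Return points whose residual exceeds `threshold` residual standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Least-squares residuals average to zero, so the root mean square
    # of the residuals is their population standard deviation.
    sd = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    if sd == 0:
        return []  # no residual variation, so nothing stands out
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) > threshold * sd]

# Hypothetical values: y = 2x everywhere except one unusual point at x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 6, 8, 30, 12, 14, 16]

outliers = find_outliers(x, y)
print(outliers)
```

Because an extreme point also pulls the fitted line toward itself, a rule like this is best treated as a prompt to inspect the scatterplot, not as a replacement for it.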