Aside from the sampling error associated with the process of selecting a sample, a survey is subject to a wide variety of errors. These errors are commonly referred to as "non-sampling errors".
Non-sampling errors can be defined as errors arising during the course of all survey activities other than sampling. Unlike sampling errors, they can be present in both sample surveys and censuses.
Non-sampling errors can be classified into two groups: random errors and systematic errors.
Non-sampling errors are extremely difficult, if not impossible, to measure. Random errors tend to cancel out over a large number of observations, so systematic errors, which push results in one direction, are the principal cause for concern. Unlike sampling variance, the bias caused by systematic errors cannot be reduced by increasing the sample size.
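The contrast between sampling variance and bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name simulate_survey, the true mean of 50, the spread of 10 and the fixed systematic error of 2 are all hypothetical values chosen for illustration, not figures from any real survey.

```python
import random
import statistics


def simulate_survey(sample_size, true_mean=50.0, spread=10.0,
                    systematic_error=2.0, n_replications=500, seed=0):
    """Repeat a hypothetical survey many times and summarize its estimates.

    Returns (bias, standard_error), where bias is the average estimate minus
    the true mean and standard_error is the spread of the estimates.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replications):
        # Every reported value is shifted by the same systematic error,
        # e.g. a question that leads all respondents to over-report.
        sample = [rng.gauss(true_mean, spread) + systematic_error
                  for _ in range(sample_size)]
        estimates.append(statistics.fmean(sample))
    bias = statistics.fmean(estimates) - true_mean
    standard_error = statistics.stdev(estimates)
    return bias, standard_error


for n in (25, 400):
    bias, se = simulate_survey(n)
    print(f"n={n}: bias ~ {bias:.2f}, standard error ~ {se:.2f}")
```

Under these assumptions, the standard error shrinks as the sample grows, while the bias stays close to the assumed systematic error of 2 regardless of sample size.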
Non-sampling errors can occur because of problems in coverage, response, non-response, data processing, estimation and analysis. Each of these types of errors is explained below.
An error in coverage occurs when units are omitted, duplicated or wrongly included in the population or sample. Omissions are referred to as "undercoverage", while duplication and wrongful inclusions are called "overcoverage". Coverage errors are caused by defects in the survey frame, such as inaccuracy, incompleteness, duplications, inadequacy or obsolescence. Coverage errors may also occur in field procedures (e.g., while a survey is conducted, the interviewer misses several households or persons).
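The three kinds of coverage error can be sketched as a comparison between a frame and the target population. The household identifiers and the function coverage_report below are hypothetical; in practice the true population is rarely known in full, so a check like this would rely on an independent listing or a post-enumeration study.

```python
# Hypothetical household identifiers, invented for illustration.
target_population = {"H01", "H02", "H03", "H04", "H05"}
# The frame is a list rather than a set so that duplicate entries are visible.
frame_listing = ["H01", "H02", "H02", "H04", "H09"]


def coverage_report(population, frame):
    """Classify frame defects as undercoverage or overcoverage."""
    seen = set()
    duplicates = set()
    for unit in frame:
        if unit in seen:
            duplicates.add(unit)
        seen.add(unit)
    return {
        # Units in the population that the frame omits.
        "undercoverage": sorted(population - seen),
        # Units listed more than once (one form of overcoverage).
        "duplicates": sorted(duplicates),
        # Units on the frame that do not belong to the population
        # (the other form of overcoverage).
        "wrongly_included": sorted(seen - population),
    }


report = coverage_report(target_population, frame_listing)
print(report)
```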
Response errors result when data are incorrectly requested, provided, received or recorded. These errors may occur because of deficiencies in the questionnaire, the interviewer, the respondent or the survey process.
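Non-response errors are the result of not having obtained sufficient answers to survey questions. There are two types of non-response: complete, where no information is obtained from a sampled unit, and partial, where the unit answers some questions but not others.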
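Complete and partial non-response can be told apart by counting how many questions each sampled unit answered. The following is a minimal sketch, assuming a hypothetical three-question survey in which a missing answer is stored as None; the question names, records and the classify function are invented for illustration.

```python
QUESTIONS = ("age", "income", "employed")

# Hypothetical returns from four sampled units; None marks a missing answer.
returns = [
    {"age": 34, "income": 52000, "employed": True},
    {"age": None, "income": None, "employed": None},  # nothing answered
    {"age": 51, "income": None, "employed": True},    # one question skipped
    {"age": 29, "income": 41000, "employed": False},
]


def classify(record):
    """Label a record as a full respondent or as partial/complete non-response."""
    answered = sum(record[q] is not None for q in QUESTIONS)
    if answered == 0:
        return "complete"
    if answered < len(QUESTIONS):
        return "partial"
    return "respondent"


counts = {"respondent": 0, "partial": 0, "complete": 0}
for record in returns:
    counts[classify(record)] += 1

complete_rate = counts["complete"] / len(returns)
partial_rate = counts["partial"] / len(returns)
print(f"complete non-response: {complete_rate:.0%}, partial: {partial_rate:.0%}")
```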
More information on editing and imputation can be found in the chapter entitled Data processing.
Processing errors sometimes emerge during the preparation of the final data files. For example, errors can occur while data are being coded, captured, edited or imputed. Coding errors are usually the result of poor training or incomplete instructions, variation in coder performance (e.g., because of tiredness or illness), data entry errors, or machine malfunction (some processing errors are caused by errors in the computer programs). The same can be said of data capture errors. During the editing phase, errors are sometimes identified incorrectly. Even when errors are discovered, they can be corrected improperly because of poor imputation procedures.
Statistics Canada and other data-collecting agencies devote much effort to designing and monitoring surveys in order to make them as error-free as possible. However, if an inappropriate estimation method is used, bias can still be introduced, no matter how error-free the survey was before estimation.
Here is an example of potentially inappropriate estimation. Global warming is a subject of considerable debate. To measure this phenomenon accurately, one needs an acceptable way of calculating an "average global temperature". Figure 1 features a common portrayal of climate change data. It shows an increase in average global temperature of between 0.3°C and 0.6°C over nearly 140 years.
The measurements that make up the data set were taken at various weather stations around the world. In this case, the population is the set of all weather measurements, from which a sample can be taken.
Some scientists question the accuracy of a graph like Figure 1 because they feel that the estimates from the sample survey are biased.
These scientists argue that temperature measurements should reflect the ratio of the earth's land area to its water area. For example, if the land area is half the area covered by water (seas and oceans), then twice as many measurements should come from locations over water as from locations over land. In fact, in Figure 1, few measurements were taken from locations over water, whereas the great majority were taken from weather stations on land.
Why might this bias the estimates from the sample survey?
Temperatures recorded at land-based weather stations tend to be higher than those recorded over water, partly because many stations are located in or near cities and are affected by the phenomenon known as the "urban heat island effect". If the sample is too heavily weighted in favour of land-based temperatures, and the estimates do not take this into account (as some scientists claim), then the results may not truly reflect a global average.
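The effect of over-representing land stations can be sketched by comparing a simple average of all readings with an average weighted by surface area. Everything numeric here is an assumption made for illustration: the readings are invented, and the area shares follow the hypothetical case in which land covers half the area that water does (one third land, two thirds water).

```python
import statistics

# Hypothetical temperature increases in degrees Celsius; not real measurements.
land_readings = [0.8, 0.7, 0.9, 0.8, 0.6, 0.8, 0.7, 0.9]  # eight land stations
water_readings = [0.3, 0.5]                               # two ocean sites

# Assumed surface shares: land is half the area of water.
LAND_SHARE = 1 / 3
WATER_SHARE = 2 / 3

# Simple average: every reading counts equally, so land dominates (8 of 10).
unweighted = statistics.fmean(land_readings + water_readings)

# Area-weighted average: each surface type counts in proportion to its area.
weighted = (LAND_SHARE * statistics.fmean(land_readings)
            + WATER_SHARE * statistics.fmean(water_readings))

print(f"unweighted: {unweighted:.3f}, area-weighted: {weighted:.3f}")
```

With these invented figures, the simple average is pulled toward the warmer land readings, while the area-weighted estimate is lower. Weighting by area is only one possible correction and is shown here as an example rather than as the method used for any published temperature series.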
For more information on estimation, refer to the Sampling methods chapter.
Analysis errors occur when the wrong analytical tools are used or when preliminary results are provided instead of the final ones. Errors that occur during the publication of data results are also considered analysis errors.