Statistics Canada

Data editing


Data should be edited before being presented as information. Editing helps ensure that the information provided is accurate, complete and consistent. Whatever type of data you are working with, certain edits are performed on all surveys. Data editing can be performed manually, with the assistance of computer programs, or with a combination of both techniques, depending on the medium (electronic or paper) by which the data are submitted.

There are two levels of data editing: micro-editing and macro-editing.

Micro-editing corrects the data at the record level. This process detects errors in data through checks of the individual data records. The intent at this point is to determine the consistency of the data and correct the individual data records.

Macro-editing also detects errors in data, but does this through the analysis of aggregate data (totals). The data are compared with data from other surveys, administrative files, or earlier versions of the same data. This process determines the compatibility of data.
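
The difference between the two levels can be sketched in a few lines of Python. This is only an illustration: the field names, the limits (age 0 to 120, hours 0 to 168), the 10% tolerance and the reference total are all assumptions made for the example, not rules taken from any actual survey.

```python
# Hypothetical survey records; field names and limits are illustrative only.
records = [
    {"id": 1, "age": 34, "hours": 40},
    {"id": 2, "age": 210, "hours": 38},  # implausible age
    {"id": 3, "age": 51, "hours": 45},
]


def micro_edit(record):
    """Record-level check: return a list of problems found in one record."""
    problems = []
    if not 0 <= record["age"] <= 120:
        problems.append("age out of range")
    if not 0 <= record["hours"] <= 168:
        problems.append("hours out of range")
    return problems


def macro_edit(current_total, reference_total, tolerance=0.10):
    """Aggregate-level check: True when the current total lies within a
    relative `tolerance` of a reference total (for example, an earlier
    version of the same data)."""
    if reference_total == 0:
        return current_total == 0
    change = abs(current_total - reference_total) / abs(reference_total)
    return change <= tolerance


# Micro-editing: examine each record individually.
flagged = {}
for r in records:
    problems = micro_edit(r)
    if problems:
        flagged[r["id"]] = problems

# Macro-editing: compare a total with an assumed reference total of 120.
total_hours = sum(r["hours"] for r in records)
compatible = macro_edit(total_hours, reference_total=120)

print(flagged)
print(compatible)
```

Note that the macro-level check passes here even though one record fails a micro-level check, which is why the two levels complement each other.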

Why are there errors in our files? Errors can be introduced into the data in several ways, including the following:

  • A respondent could have misunderstood a question.
  • A respondent or an interviewer could have checked the wrong response.
  • An interviewer could have miscoded or misunderstood a written response.
  • An interviewer could have forgotten to ask a question or record the answer.
  • A respondent could have provided inaccurate responses.

Always keep in mind the objectives of data editing:

  • to ensure the accuracy of data;
  • to establish the consistency of data;
  • to determine whether or not the data are complete;
  • to ensure the coherence of aggregated data; and
  • to obtain the best possible data available.

Applying editing rules

So, how do we edit? The first step is to apply 'rules' (or factors to be taken into consideration) to the data. These rules are determined by the expert knowledge of a subject-matter specialist, the structure of the questionnaire, the history of the data, and any other related surveys or data.

Expert knowledge can come from a variety of sources. The specialist could be an analyst who has extensive experience with the type of data being edited. An expert could also be one of the survey sponsors who is familiar with the relationships between the data.

The layout and structure of the questionnaire will also impact the rules for editing data. For example, sometimes respondents are instructed to skip certain questions if the questions do not apply to them or their situation. This specification must be respected and incorporated into the editing rules.
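
A skip instruction of this kind can be turned into an editing rule. The sketch below assumes a hypothetical questionnaire in which respondents who answer "no" to `employed` are told to skip `hours_worked`; both field names and the rule itself are invented for illustration.

```python
def check_skip_pattern(record):
    """Return a list of skip-pattern violations for one record.

    Assumed rule (illustrative only): if `employed` is "no", the
    respondent is instructed to skip `hours_worked`, so that field must
    be blank (None). If `employed` is "yes", it must be answered.
    """
    violations = []
    employed = record.get("employed")
    hours = record.get("hours_worked")
    if employed == "no" and hours is not None:
        violations.append("hours_worked answered but should have been skipped")
    if employed == "yes" and hours is None:
        violations.append("hours_worked missing although the question applies")
    return violations


print(check_skip_pattern({"employed": "no", "hours_worked": None}))
print(check_skip_pattern({"employed": "no", "hours_worked": 35}))
print(check_skip_pattern({"employed": "yes", "hours_worked": None}))
```

A blank answer is therefore not always an error: whether it counts as one depends on the path the respondent was supposed to follow through the questionnaire.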

Lastly, other surveys relating to the same sort of variables or characteristics are used in order to establish some of the rules for editing data.

Data editing types

Several types of data edits are available:

  • Validity edits look at one question field or cell at a time. They check that record identifiers, invalid characters and values have been accounted for; that essential fields have been completed (e.g., no quantity field is left blank where a number is required); that specified units of measure have been properly used; and that the reporting time is within the specified limits.
  • Range edits are similar to validity edits in that they look at one field at a time. The purpose of this type of edit is to ensure that the values, ratios and calculations fall within the pre-established limits.
  • Duplication edits examine one full record at a time. These types of edits check for duplicated records, making certain that a respondent or a survey item has only been recorded once. A duplication edit also checks to ensure that the respondent does not appear in the survey universe more than once, especially if there has been a name change. Finally, it ensures that the data have been entered into the system only once.
  • Consistency edits compare different answers from the same record to ensure that they are coherent with one another. For example, if a person is declared to be in the 0 to 14 age group, but also claims that he or she is retired, there is a consistency problem between the two answers. Inter-field edits are another form of a consistency edit. These edits verify that if a figure is reported in one section, a corresponding figure is reported in another.
  • Historical edits are used to compare survey answers in current and previous surveys. For example, any dramatic changes since the last survey will be flagged. The ratios and calculations are also compared, and any percentage variance that falls outside the established limits will be noted and questioned.
  • Statistical edits look at the entire set of data. This type of edit is performed only after all other edits have been applied and the data have been corrected. The data are compiled and all extreme values, suspicious data and outliers are rejected.
  • Miscellaneous edits fall in the range of special-reporting arrangements; dynamic edits particular to the survey; correct classification checks; changes to physical addresses, locations and/or contacts; and legibility edits (i.e., making sure the figures or symbols are recognizable and easy to read).
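
Most of the edit types listed above can be expressed as small checks. The following is a minimal sketch, not a description of any production editing system: the field names, the limits, and in particular the outlier rule used for the statistical edit (distance from the median measured in median absolute deviations) are assumptions chosen for the example.

```python
from collections import Counter
from statistics import median


def validity_edit(record, required=("id", "age", "status")):
    """Validity: return the essential fields that are missing or blank."""
    return [f for f in required if record.get(f) in (None, "")]


def range_edit(record, field="age", low=0, high=120):
    """Range: a single value falls within pre-established limits."""
    value = record.get(field)
    return value is not None and low <= value <= high


def duplication_edit(records, key="id"):
    """Duplication: return identifiers that appear more than once."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def consistency_edit(record):
    """Consistency: answers within one record agree with each other.
    Mirrors the example in the list: age 0 to 14 combined with 'retired'."""
    return not (record["age"] <= 14 and record["status"] == "retired")


def historical_edit(current, previous, max_change=0.5):
    """Historical: relative change since the last survey is within limits."""
    if previous == 0:
        return current == 0
    return abs(current - previous) / abs(previous) <= max_change


def statistical_edit(values, k=3.0):
    """Statistical: return values lying more than k median absolute
    deviations from the median (an assumed, simple outlier rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) / mad > k]


# Hypothetical records used to exercise each check.
records = [
    {"id": 1, "age": 12, "status": "retired"},
    {"id": 2, "age": 45, "status": "employed"},
    {"id": 2, "age": 45, "status": "employed"},
    {"id": 3, "age": 150, "status": ""},
]

print(validity_edit(records[3]))      # blank essential fields
print(range_edit(records[3]))         # age within limits?
print(duplication_edit(records))      # repeated identifiers
print(consistency_edit(records[0]))   # age group versus status
print(historical_edit(160, 100))      # change since previous survey
print(statistical_edit([10, 11, 12, 11, 10, 95]))
```

In this sketch the checks only flag problems. Deciding what to do with a flagged value (correct it, confirm it with the respondent, or reject it) is a separate step.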

Data editing is influenced by the complexity of the questionnaire. Complexity refers to the length of the questionnaire as well as the number of questions asked. It also includes the level of detail in the questions and the range of subject matter that the questionnaire covers. In some cases, the terminology of a question can be very technical. For these types of surveys, special reporting arrangements and industry-specific edits may be needed.

Data errors

Data editing should detect and minimize errors, such as:

  • unasked questions;
  • unrecorded answers; and
  • inappropriate responses.

An inaccurate response can occur as a result of carelessness or a deliberate effort to give misleading answers. It can also occur if some of the answers require mathematical calculations. For example, converting days into hours or annual income into weekly income increases the possibility of making mistakes.

Example 1 – Inaccurate responses

This example of data editing shows how an inaccurate response can occur. Carefully read the following questions and answers based on the questions asked in Statistics Canada's Labour Force Survey form. Can you detect the error in the respondent's answers?

Question 151 - Excluding overtime, how many paid hours does Person 1 work per week?
Answer: 40

Question 153 - Last week, how many hours was Person 1 away from this job because of vacation, illness, or any other reason?
Answer: 0

Question 155 - Last week, how many hours of paid overtime did Person 1 work at this job?
Answer: 4

Question 156 - Last week, how many extra hours without pay did Person 1 work at this job?
Answer: 0

Question 157 - Last week, how many hours did Person 1 actually work at the main job?
Answer: 40

Question 151 shows that Person 1 normally works 40 hours per week. Question 153 shows that Person 1 had no time off the previous week, and Question 155 shows that 4 hours of paid overtime were worked. However, the answer to Question 157 states that all of this amounted to only 40 hours worked for the week. Taken together, the other answers imply 40 - 0 + 4 + 0 = 44 hours, so the response to Question 157 should be 44 hours.

The answers to individual questions look acceptable. It is only by comparing them with each other that we find one or more of the answers to be wrong.
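
The comparison in Example 1 can be written as a short check. The sketch below assumes that actual hours equal usual hours, minus hours away, plus paid overtime, plus unpaid extra hours. That formula is inferred from the example and is not necessarily the rule the Labour Force Survey itself applies; the dictionary keys are likewise made up for illustration.

```python
def expected_actual_hours(usual, away, paid_overtime, unpaid_extra):
    """Hours actually worked, as implied by the other four answers
    (assumed formula: usual - away + paid overtime + unpaid extra)."""
    return usual - away + paid_overtime + unpaid_extra


def cross_reference(answers):
    """Compare the reported actual hours (Question 157) with the value
    implied by Questions 151, 153, 155 and 156.

    Returns (expected_hours, is_consistent).
    """
    expected = expected_actual_hours(
        answers["q151"], answers["q153"], answers["q155"], answers["q156"]
    )
    return expected, answers["q157"] == expected


# Answers from Example 1.
answers = {"q151": 40, "q153": 0, "q155": 4, "q156": 0, "q157": 40}

expected, consistent = cross_reference(answers)
print(expected, consistent)  # 44 False
```

The check shows only that the five answers disagree. It cannot tell which one is wrong, which is why the record still has to be verified.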

Cross-referencing, a form of consistency edit, is just one type of data editing that compares the answers to various questions. It can be performed manually or with editing software.

In the above example, this edit indicates that further action is needed to obtain an accurate response: the interviewer will need to get in touch with the household and verify the number of hours worked by Person 1.

In computer-assisted personal or telephone interviews, the interviewer would receive an electronic warning when trying to enter 40 as a response to Question 157. The interviewer could then immediately double-check the answer with the respondent. This system is much faster, and eliminates the burden of trying to contact the respondent later on.

Editing as a management tool

The editing process can also be a valuable tool for assessing the quality of the data, because it indicates which modifications are required. By pointing to potential causes of problems, editing can also be an effective way of avoiding the need to repeat the survey.
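
One way to use edit results in this manner is to tally how often each rule fails. The sketch below assumes a hypothetical log of (record identifier, rule name) pairs produced during editing; the rule names and counts are invented for the example.

```python
from collections import Counter


def failure_summary(edit_log, n_records):
    """Return {rule: failure rate}, ordered from most to least frequent,
    given a log of (record_id, rule) pairs for every failed edit."""
    counts = Counter(rule for _, rule in edit_log)
    return {rule: n / n_records for rule, n in counts.most_common()}


# Hypothetical log from editing 10 records.
edit_log = [
    (1, "q157_hours"),
    (4, "q157_hours"),
    (7, "q157_hours"),
    (4, "age_range"),
]

summary = failure_summary(edit_log, n_records=10)
for rule, rate in summary.items():
    print(f"{rule}: {rate:.0%} of records failed")
```

A rule that fails for a large share of records may point to a confusing question or a collection problem rather than to many independent mistakes.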