Data should be edited before being presented as information. This action ensures that the information provided is accurate, complete and consistent. No matter what type of data you are working with, certain edits are performed on all surveys. Data editing can be performed manually, with the assistance of computer programs, or through a combination of both techniques. The choice depends on the medium (electronic or paper) by which the data are submitted.
There are two levels of data editing: micro-editing and macro-editing.
Micro-editing corrects the data at the record level. This process detects errors in data through checks of the individual data records. The intent at this point is to determine the consistency of the data and correct the individual data records.
Macro-editing also detects errors in data, but does this through the analysis of aggregate data (totals). The data are compared with data from other surveys, administrative files, or earlier versions of the same data. This process determines the compatibility of data.
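As a rough illustration of a macro-edit, the sketch below compares a current aggregate total with the total from an earlier version of the same data and flags it for review when the relative change exceeds a tolerance. The function name, the 10% tolerance and the totals are all hypothetical, chosen only to show the idea.

```python
def macro_edit(current_total, previous_total, tolerance=0.10):
    """Return True when the aggregate passes the edit, False when it needs review.

    The 10% tolerance is an illustrative assumption, not a standard threshold.
    """
    if previous_total == 0:
        # No meaningful baseline: pass only if the current total is also zero.
        return current_total == 0
    relative_change = abs(current_total - previous_total) / abs(previous_total)
    return relative_change <= tolerance


# Hypothetical totals of hours worked, summed over all records in two survey cycles.
previous = 1_250_000
current = 1_490_000

passes = macro_edit(current, previous)
print("passes macro-edit" if passes else "flag total for review")
```

A failed macro-edit does not say which records are wrong. It only signals that the individual records behind the total deserve a closer look.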
We might ask the question "Why are there errors in our files?" There are several situations where errors can be introduced into the data, and the following list gives some of them:
Always keep in mind the objectives of data editing:
So, how do we edit? The first step is to apply 'rules' (or factors to be taken into consideration) to the data. These rules are determined by the expert knowledge of a subject-matter specialist, the structure of the questionnaire, the history of the data, and any other related surveys or data.
Expert knowledge can come from a variety of sources. The specialist could be an analyst who has extensive experience with the type of data being edited. An expert could also be one of the survey sponsors who is familiar with the relationships between the data.
The layout and structure of the questionnaire will also impact the rules for editing data. For example, sometimes respondents are instructed to skip certain questions if the questions do not apply to them or their situation. This specification must be respected and incorporated into the editing rules.
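A skip instruction of this kind can be turned into an editing rule. The sketch below assumes a hypothetical two-question questionnaire: Q1 asks whether the respondent worked last week, and Q2 (hours worked) applies only when the answer to Q1 is "yes". The question names and record layout are invented for illustration.

```python
def check_skip_pattern(record):
    """Return a list of messages describing skip-pattern violations.

    Hypothetical questionnaire: Q1 is "yes" or "no"; Q2 applies only
    when Q1 is "yes". A value of None means the question was left blank.
    """
    problems = []
    worked = record.get("Q1")
    hours = record.get("Q2")
    if worked == "no" and hours is not None:
        problems.append("Q2 answered although Q1 = 'no' (should have been skipped)")
    if worked == "yes" and hours is None:
        problems.append("Q2 missing although Q1 = 'yes'")
    return problems


# Three hypothetical records: one valid, two that break the skip instruction.
records = [
    {"Q1": "yes", "Q2": 37},
    {"Q1": "no", "Q2": 12},
    {"Q1": "yes", "Q2": None},
]
for number, record in enumerate(records, start=1):
    for problem in check_skip_pattern(record):
        print(f"record {number}: {problem}")
```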
Lastly, other surveys relating to the same sort of variables or characteristics are used in order to establish some of the rules for editing data.
Several types of data edits are available. They include:
Data editing is influenced by the complexity of the questionnaire. Complexity refers to the length of the questionnaire as well as the number of questions asked. It also includes the detail of the questions and the range of subject matter that the questionnaire may cover. In some cases, the terminology of a question can be very technical. For these types of surveys, special reporting arrangements and industry-specific edits may be needed.
Data editing should detect and minimize errors, such as:
An inaccurate response can occur as a result of carelessness or a deliberate effort to give misleading answers. It can also occur if some of the answers require mathematical calculations. For example, converting days into hours or annual income into weekly income increases the possibility of making mistakes.
This example of data editing shows how an inaccurate response can occur. Carefully read the following questions and answers based on the questions asked in Statistics Canada's Labour Force Survey form. Can you detect the error in the respondent's answers?
Question 151 - Excluding overtime, how many paid hours does Person 1 work per week?
Answer - 40
Question 153 - Last week, how many hours was Person 1 away from this job because of vacation, illness, or any other reason?
Answer - 0
Question 155 - Last week, how many hours of paid overtime did Person 1 work at this job?
Answer - 4
Question 156 - Last week, how many extra hours without pay did Person 1 work at this job?
Answer - 0
Question 157 - Last week, how many hours did Person 1 actually work at the main job?
Answer - 40
Question 151 shows that Person 1 normally works 40 hours per week. Question 153 shows the respondent had no time off the previous week, and Question 155 shows that, in fact, some overtime was worked. However, Question 157 gives us the answer that all of this amounted to 40 hours worked for the week! The actual response to Question 157 should be 44 hours.
The answers to individual questions look acceptable. It is only by comparing them with each other that we find one or more of the answers to be wrong.
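The comparison made in this example can be written as a small consistency check. The sketch below assumes that actual hours (Question 157) should equal usual hours (Question 151), minus hours away (Question 153), plus paid overtime (Question 155) and unpaid extra hours (Question 156). That relationship is inferred from the worked example and is not taken from the survey's official edit specifications; the function names are hypothetical.

```python
def expected_actual_hours(usual, away, paid_overtime, unpaid_extra):
    """Hours implied by the other answers (assumed relationship, see lead-in)."""
    return usual - away + paid_overtime + unpaid_extra


def consistency_edit(answers):
    """Return (passed, expected) for the reported actual hours in Q157."""
    expected = expected_actual_hours(
        answers["Q151"], answers["Q153"], answers["Q155"], answers["Q156"]
    )
    return expected == answers["Q157"], expected


# The respondent's answers from the example.
answers = {"Q151": 40, "Q153": 0, "Q155": 4, "Q156": 0, "Q157": 40}

ok, expected = consistency_edit(answers)
if not ok:
    print(f"Q157 reported {answers['Q157']} hours, but the other answers imply {expected}")
```

The check can only say that the answers disagree with one another. It cannot say which answer is wrong, which is why follow-up with the respondent may still be needed.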
Cross-referencing, a form of consistency edit, is only one type of data edit that compares the answers to various questions. Cross-referencing can be performed manually or with the use of editing software.
In the above example, this edit indicates that further action should be taken to ensure an accurate response. The interviewer will need to get in touch with the household and verify the number of hours worked by Person 1.
In computer-assisted personal or telephone interviews, the interviewer would receive an electronic warning when trying to enter 40 as a response to Question 157. The interviewer could then immediately double-check the answer with the respondent. This system is much faster, and eliminates the burden of trying to contact the respondent later on.
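The entry-time warning can be sketched as follows. The `InterviewSession` class is a hypothetical stand-in for interviewing software, not the interface of any real system, and its single edit rule assumes that the hours in Question 157 should equal Question 151 minus Question 153 plus Questions 155 and 156.

```python
class InterviewSession:
    """Hypothetical stand-in for a computer-assisted interviewing application."""

    def __init__(self):
        self.answers = {}

    def enter(self, question, value, confirmed=False):
        """Store an answer, or return a warning string if an edit fails.

        Passing confirmed=True records the value anyway, modelling an
        interviewer who has double-checked the answer with the respondent.
        """
        warning = self._check(question, value)
        if warning and not confirmed:
            return warning
        self.answers[question] = value
        return None

    def _check(self, question, value):
        # Single illustrative edit, based on an assumed relationship:
        # Q157 = Q151 - Q153 + Q155 + Q156.
        if question != "Q157":
            return None
        needed = ("Q151", "Q153", "Q155", "Q156")
        if not all(q in self.answers for q in needed):
            return None
        a = self.answers
        implied = a["Q151"] - a["Q153"] + a["Q155"] + a["Q156"]
        if value != implied:
            return f"Warning: earlier answers imply {implied} hours, not {value}."
        return None


session = InterviewSession()
for question, value in [("Q151", 40), ("Q153", 0), ("Q155", 4), ("Q156", 0)]:
    session.enter(question, value)

print(session.enter("Q157", 40))  # a warning is returned; nothing is stored
print(session.enter("Q157", 44))  # the value is accepted; prints None
```

Returning a warning rather than rejecting the value outright leaves the final decision with the interviewer, who can confirm an unusual but genuine answer.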
The editing process can also be a valuable tool in assessing the quality of the data by indicating the required modifications. By indicating potential causes of problems, editing can also be an effective way of avoiding the need to repeat the survey.