Data quality, concepts and methodology: Editing


Data editing is the application of checks to detect missing, invalid or inconsistent entries, or to point to data records that are potentially in error. In the survey process for the MRTS, data editing is done at two different stages.

First, editing is done during data collection. Once data are collected, whether by telephone or from completed mail-in questionnaires, they are captured using customized data capture applications, and all data are subjected to data editing. Edits applied during data collection are referred to as field edits and generally consist of validity edits and some simple consistency edits. They are used to detect mistakes made during the interview by the respondent or the interviewer, and to identify missing information during collection in order to reduce the need for follow-up later on. Another purpose of the field edits is to clean up responses. In the MRTS, the current month's responses are edited against the respondent's previous month's responses and/or the previous year's responses for the current month. Field edits are also used to identify problems with data collection procedures and the design of the questionnaire, as well as the need for more interviewer training.
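As an illustration only, the kind of field edit described above can be sketched as a small validity-and-consistency check on one reported value. The function name, the ratio tolerance `max_ratio`, and the flag wording are all assumptions for this sketch, not part of the MRTS systems:

```python
def field_edit_flags(current, previous_month, same_month_last_year,
                     max_ratio=3.0):
    """Illustrative field edit for one reported sales value.

    Validity edits catch missing or impossible entries; consistency edits
    compare the value against the previous month and the same month last
    year. max_ratio is an assumed tolerance, not an MRTS parameter.
    Returns a list of human-readable flags (empty if the value passes).
    """
    flags = []
    if current is None:
        flags.append("missing value: follow up with respondent")
        return flags
    if current < 0:
        flags.append("invalid value: sales cannot be negative")
        return flags
    for label, ref in (("previous month", previous_month),
                       ("same month last year", same_month_last_year)):
        if ref is not None and ref > 0:
            ratio = current / ref
            if ratio > max_ratio or ratio < 1 / max_ratio:
                flags.append(f"inconsistent with {label} (ratio {ratio:.2f})")
    return flags
```

A value that passes both comparisons returns an empty list; a flagged value would trigger follow-up with the respondent, as described above.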

Follow-up with respondents occurs to validate potentially erroneous data following any failed preliminary edit check. Once validated, the collected data are regularly transmitted to the head office in Ottawa.

Second, statistical editing is done after data collection; this editing is more empirical in nature. Statistical editing is run prior to imputation in order to identify the data that will be used as a basis to impute non-respondents. Large outliers that could disrupt a monthly trend are excluded from trend calculations by the statistical edits. It should be noted that adjustments are not made at this stage to correct the reported outliers.

The first step in the statistical editing is to identify which responses will be subjected to the statistical edit rules. Reported data for the current reference month will go through various edit checks.

The first set of edit checks is based on the Hidiroglou-Berthelot method, whereby a ratio of the respondent's current month data over historical (last month, same month last year) or auxiliary data is analyzed. When the respondent's ratio differs significantly from the ratios of respondents who are similar in terms of industry and/or geography group, the response is deemed an outlier.
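A simplified sketch of a Hidiroglou-Berthelot-style ratio edit follows. This is not the MRTS implementation: the parameters `u` (size weight) and `c` (quartile-distance multiplier), the fallback for tiny groups, and the minimum-distance guard are all assumptions chosen for illustration. The core ideas are the standard ones: transform each ratio symmetrically around the median ratio, scale by unit size, and flag effects far outside the interquartile range:

```python
import statistics

def hb_outliers(current, previous, u=0.5, c=4.0):
    """Simplified Hidiroglou-Berthelot ratio edit (illustrative sketch).

    current, previous: dicts mapping unit id -> strictly positive value.
    u: weight given to unit size; c: multiplier on the quartile distances.
    Returns the set of unit ids flagged as outliers.
    """
    ids = [k for k in current
           if k in previous and current[k] > 0 and previous[k] > 0]
    if not ids:
        return set()
    ratios = {k: current[k] / previous[k] for k in ids}
    r_med = statistics.median(ratios.values())
    # Centering transform: treats ratios above and below the median
    # symmetrically (a ratio of 2 and a ratio of 1/2 get equal magnitude).
    s = {k: (1 - r_med / r) if r < r_med else (r / r_med - 1)
         for k, r in ratios.items()}
    # Size adjustment: for the same ratio shift, larger units get larger
    # effects, so errors in important units are caught first.
    e = {k: s[k] * max(current[k], previous[k]) ** u for k in ids}
    vals = sorted(e.values())
    if len(vals) >= 2:
        q1, e_med, q3 = statistics.quantiles(vals, n=4)
    else:
        q1 = e_med = q3 = vals[0]
    # Guard against quartile distances collapsing to zero.
    d_lower = max(e_med - q1, abs(0.05 * e_med))
    d_upper = max(q3 - e_med, abs(0.05 * e_med))
    return {k for k in ids
            if e[k] < e_med - c * d_lower or e[k] > e_med + c * d_upper}
```

With a group of respondents whose month-to-month ratios cluster near 1, a unit whose value jumps tenfold falls far outside the acceptance interval and is flagged, while ordinary movements pass.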

The second set of edits consists of an edit known as the share of market edit. With this method, one is able to edit all respondents, even those for whom historical and auxiliary data are unavailable, because the method relies on current month data only. Within a group of respondents that are similar in terms of industry and/or geography, if the weighted contribution of a respondent to the group's total is too large, it is flagged as an outlier.

For edit checks based on the Hidiriglou-Berthelot method, data that are flagged as an outlier will not be included in the imputation models (those based on ratios). Also, data that are flagged as outliers in the share of market edit will not be included in the imputation models where means and medians are calculated to impute for responses that have no historical responses.

In conjunction with the statistical editing after data collection of reported data, there is also error detection done on the extracted GST data. Modeled data based on the GST are also subject to an extensive series of processing steps which thoroughly verify each record that is the basis for the model as well as the record being modeled. Edits are performed at a more aggregate level (industry by geography level) to detect records which deviate from the expected range, either by exhibiting large month-to-month change, or differing significantly from the remaining units. All data which fail these edits are subject to manual inspection and possible corrective action.
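The aggregate-level check on the modeled GST data can be illustrated with a minimal month-to-month change detector over industry-by-geography cells. The cell keys and the `max_change` tolerance are assumptions for this sketch; in the actual process, flagged cells go to manual inspection:

```python
def aggregate_change_flags(series_by_cell, max_change=0.3):
    """Illustrative aggregate edit on industry-by-geography cells.

    series_by_cell: dict mapping (industry, geography) -> (previous
    month total, current month total). max_change is an assumed
    tolerance for relative month-to-month movement.
    Returns the list of cells deviating beyond the tolerance, which
    would then be subject to manual inspection.
    """
    flagged = []
    for cell, (prev_total, curr_total) in series_by_cell.items():
        if prev_total > 0 and abs(curr_total - prev_total) / prev_total > max_change:
            flagged.append(cell)
    return flagged
```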
