Scope and purpose
Data quality evaluation is a process used to determine whether final
products meet the original objectives of the statistical activity, in
particular in terms of the data's accuracy, timeliness and reliability.
It allows users to better interpret survey results and the Agency to
improve the quality of its surveys.
There are two broad methods of evaluating data quality:
- Certification or validation is the process whereby data are analysed
before official release with a view to avoiding gross errors and eliminating
poor quality data. This process frequently coincides with an interpretative
analysis of the data and usually involves time constraints and deadlines,
so only methods that yield rapid results can be used.
- Sources of error studies generally provide quantitative information on
the specific sources of errors in the data. While timeliness is important,
the results of these studies are often available only after the official
release of the data.
Principles
Users must be able to determine to what extent data errors affect their
use; however, users are rarely able to independently evaluate the accuracy
of data produced by a statistical agency. It is therefore up to each agency
to evaluate data quality and quickly provide users with the results in
a usable form.
Data quality evaluations are also useful to the Agency. To the extent
that errors can be associated with certain stages of the survey process,
evaluations can be used to improve the quality of the next iteration of
the survey, as well as other similar surveys. Evaluations may include, for
instance, reviews of survey plans, of the significance of nonresponse and
of questionable imputation practices.
The timeliness of data quality evaluations is just as important as the
timeliness of the data. Ideally, evaluation results are valid and timely
enough to improve released data. For example, an evaluation of
coverage can be used to compensate for differences between the frame and
the target population. When this is not possible, evaluation results should
at least be timely enough to help users analyse the data and to help staff
design the next iteration of the survey.
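As an illustration of the coverage example above, the following minimal sketch inflates design weights by a factor derived from an estimated net undercoverage rate. It is a sketch under stated assumptions, not the Agency's method: the function names, subgroup labels, rates and weights are all hypothetical, and the approach assumes that units missed by the frame resemble covered units within the same subgroup.

```python
def coverage_factor(net_undercoverage_rate):
    """Return the inflation factor 1 / (1 - r) for a net undercoverage rate r.

    A negative r stands for net overcoverage and gives a factor below 1.
    """
    if net_undercoverage_rate >= 1.0:
        raise ValueError("net undercoverage rate must be below 1")
    return 1.0 / (1.0 - net_undercoverage_rate)


def adjust_weights(records, rates_by_subgroup):
    """Scale each (subgroup, weight) pair by its subgroup's coverage factor."""
    return [
        (subgroup, weight * coverage_factor(rates_by_subgroup[subgroup]))
        for subgroup, weight in records
    ]


# Hypothetical rates from a coverage study and hypothetical design weights.
rates = {"urban": 0.02, "rural": 0.05}
sample = [("urban", 100.0), ("rural", 80.0), ("rural", 120.0)]
adjusted = adjust_weights(sample, rates)

for subgroup, weight in adjusted:
    print(f"{subgroup}: adjusted weight {weight:.2f}")
```

Using a separate rate for each subgroup reflects the fact that coverage may differ across parts of the survey universe; a single overall rate would be the simplest special case.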
Guidelines
- Determine the extent of data quality evaluation required for a program or
a product. The factors to be considered are: data uses and users; risk of
errors and impact of errors on data use; quality variation over time; cost
of the evaluation in relation to the total cost of the program; improving
quality; increasing efficiency and productivity; usefulness of measures to
users and ease of interpretation; and whether the survey will be repeated.
- Data quality evaluations at Statistics Canada must be designed to meet the
mandatory and minimum requirements of the Policy
on Informing Users of Data Quality and Methodology (Statistics Canada,
2000d). In the case of census and survey data, minimum requirements include
measuring or evaluating coverage errors, response or imputation rates and,
for sample surveys, sampling errors for key characteristics.
- Provide a quality evaluation based on expert opinion or subjective analysis
whenever data quality evaluations will not yield quantitative measurements
because of the nature of the product, the user, time constraints, cost or
technical feasibility.
- Make planning of data quality evaluations part of the overall survey design,
as the information needed for such evaluations must often be collected during
the survey process. Data quality reports should be included in the dissemination
schedule for the survey.
- In the case of repeated surveys and statistical activities, it may not be
necessary, or even possible, to consistently produce detailed quality evaluations.
However, review the activity periodically, not just when problems arise, to
ensure that it meets its objectives.
- Involve users of evaluation results, whether they are associated with a
statistical agency or not, in establishing the data quality evaluation program
objectives. When circumstances permit, also involve them in the evaluation
process.
- Among certification methods, consider:
  - checking coherence in relation to external data sources (for example,
    other surveys, other iterations of the same survey, administrative data)
  - checking internal coherence, for example by calculating ratios
    that are known to be within certain limits (male-female ratios, average
    values of properties, etc.)
  - analysing the largest units individually as regards their contributions
    to overall estimates (generally applied to business surveys)
  - calculating data quality indicators, for example nonresponse rates,
    imputation rates and coefficients of variation
  - holding feedback sessions with staff involved in data collection and
    processing
  - having well-informed experts carry out "reasonableness" checks, including
    pre-release external review in the form of "work in progress."
- Evaluate the following sources of error:
  - Coverage errors consist of omissions, erroneous inclusions and
    duplications in the frame used to conduct the survey. Since they affect
    all survey estimates, they constitute one of the most important types
    of error. Coverage errors may translate into a negative or positive bias
    in the data, and the impact may vary by subgroup of the survey universe.
    Classification errors, notably industrial and geographical ones, are also
    a concern. For example, badly defined boundaries or erroneous coding may
    lead to the omission of part of the territory.
  - Nonresponse errors occur when there is no response to some or
    all of the survey questions. Nonresponse leads to an increase in variance
    as a result of a reduction in the actual size of the sample and the recourse
    to imputation, and produces a bias if the nonrespondents have characteristics
    of interest that are different from those of the respondents. Furthermore,
    there is a risk of significantly underestimating the sampling error if
    imputed data are treated as though they were observed data.
  - Measurement errors occur when the response provided differs
    from the true value; such errors may be attributable to the respondent,
    the interviewer, the questionnaire, the collection method or the respondent's
    record-keeping system. Such errors may be random or, if they are not
    random, may result in a systematic bias.
  - Processing errors occur at subsequent stages of the process,
    during the checking, coding, entry, imputation and tabulation of data.
    Like measurement errors, processing errors may lead to variance and bias.
    It is also necessary to look into the potential impact of problems in the
    survey process: uneven staff training, unusually high staff turnover,
    procedural changes in mid-operation, etc.
  - Sampling errors occur when survey results are obtained from
    a sample rather than the population as a whole. In practice, these errors
    may also include estimation errors that may be attributable to the use
    of estimators which, deliberately or otherwise, create a bias (e.g., some
    small area estimators).
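The warning under nonresponse errors, that treating imputed data as observed can understate the sampling error, can be illustrated with a small numerical sketch. It assumes simple random sampling, mean imputation and made-up values, and it ignores the finite population correction. The respondent-only standard error is shown only as a point of comparison; it is not a proper variance estimator under nonresponse.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Naive standard error of the mean: sample std. dev. / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical sample of 10 units, of which only 6 responded.
observed = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0]
n_missing = 4

# Mean imputation: every missing value is replaced by the respondent mean.
respondent_mean = statistics.mean(observed)
completed = observed + [respondent_mean] * n_missing

# Standard error computed from respondents only (n = 6).
se_respondents = standard_error_of_mean(observed)

# Standard error computed as if all 10 values had been observed.
se_naive = standard_error_of_mean(completed)

print(f"respondents only: {se_respondents:.3f}")
print(f"imputed treated as observed: {se_naive:.3f}")
```

The imputed values add no variability but increase the apparent sample size, so the naive figure is smaller even though no new information was collected.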
A good discussion of the subject can be found in Lessler and Kalsbeek
(1992) and Lyberg et al. (1997).
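Several of the certification methods listed in the guidelines reduce to simple calculations. The sketch below shows unweighted versions of the quality indicators named there (nonresponse rate, imputation rate, coefficient of variation), an internal coherence check on a ratio, and the shares of the largest contributors to an estimate. All function names, limits and figures are hypothetical and chosen only for illustration; a production system would normally use weighted rates and limits set by subject-matter experts.

```python
def nonresponse_rate(n_sampled, n_responded):
    """Unweighted share of sampled units that did not respond."""
    return 1 - n_responded / n_sampled


def imputation_rate(n_values, n_imputed):
    """Unweighted share of values that were imputed."""
    return n_imputed / n_values


def coefficient_of_variation(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate."""
    return standard_error / abs(estimate)


def ratio_within_limits(numerator, denominator, low, high):
    """Internal coherence check: is the ratio inside its expected range?"""
    return low <= numerator / denominator <= high


def top_contributors(contributions, k=3):
    """Return (unit, share of total) for the k largest contributions."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(unit, value / total) for unit, value in ranked[:k]]


# Hypothetical figures for one survey cycle.
print(nonresponse_rate(n_sampled=200, n_responded=150))
print(imputation_rate(n_values=150, n_imputed=30))
print(coefficient_of_variation(estimate=500.0, standard_error=25.0))
print(ratio_within_limits(98, 100, low=0.9, high=1.1))
print(top_contributors({"A": 50.0, "B": 30.0, "C": 20.0}, k=2))
```

A unit whose share is unusually large would be a natural candidate for the individual review of largest units mentioned in the guidelines.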
References
Biemer, P., Groves, R.M., Mathiowetz, N.A., Lyberg, L. and Sudman, S. (eds.)
(1991). Measurement Errors in Surveys. Wiley, New York.
Fuller, W. (1987). Measurement Error Models. Wiley, New York.
Gosselin, J.-F., Chinnappa, B.N., Ghangurde, P.D. and Tourigny, J. (1978).
Coverage. Chapter 2 in A Compendium of Methods of Error Evaluation in
Censuses and Surveys, Statistics Canada Catalogue no. 13-564E, 7-9.
Lessler, J.T. and Kalsbeek, W.D. (1992). Nonsampling Errors in Surveys.
Wiley, New York.
Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and
Trewin, D. (eds.) (1997). Survey Measurement and Process Quality.
Wiley, New York.
Statistics Canada (1996b). Proceedings of Symposium 1996: Nonsampling
Errors.
Statistics Canada (2000d). Policy
on Informing Users of Data Quality and Methodology. Policy Manual,
2.3.
Statistics Canada (2001e). Proceedings of Symposium 2001: Achieving
Quality in a Statistical Agency: A Methodological Perspective.
Statistics Canada (2002c). Statistics
Canada’s Quality Assurance Framework - 2002. Statistics Canada Catalogue
no. 12-586-XIE.
Statistics Canada (2003b). Guidelines on the Use of Data Accuracy Criteria
in the Dissemination of Statistical Information. Issued by the Methods and Standards
Committee, July 2003.